gogoduan committed on
Commit 548896c
· verified · 1 Parent(s): ba012f0

Update README.md

Files changed (1)
  1. README.md +5 -77
README.md CHANGED
@@ -20,87 +20,15 @@ tags:
  <div align="center">
  <a href="https://math-vr.github.io"><img src="https://img.shields.io/badge/Project-Homepage-green" alt="Home"></a>
  <a href="https://huggingface.co/papers/2510.11718"><img src="https://img.shields.io/badge/Paper-red" alt="Paper"></a>
- <a href="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT"><img src="https://img.shields.io/badge/Code-GitHub-blue" alt="GitHub"></a>
+ <a href="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT"><img src="https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github&style=flat-square" alt="GitHub"></a>
  </div>
 
- <div align="center">
- <img src="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT/raw/main/figures/teaser.png" width="100%"/>
- </div>
-
- ## Introduction
- This repository contains the **CodePlot-CoT** model, a code-driven Chain-of-Thought (CoT) paradigm for mathematical visual reasoning, as presented in the paper [CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images](https://huggingface.co/papers/2510.11718).
-
- Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have shown significant progress in mathematical reasoning, yet these models still face a critical bottleneck on problems that require visual assistance, such as drawing auxiliary lines or plotting functions. Most LLMs and VLMs are constrained to text-only reasoning chains, while multimodal unified models that can generate interleaved text and images lack the precision and controllability that such tasks demand.
-
- To address this, we propose **CodePlot-CoT**, a code-driven Chain-of-Thought paradigm for "thinking with images" in mathematics. Our approach leverages the VLM to generate text reasoning as well as executable plotting code, which is rendered into images as "visual thoughts" for solving mathematical problems. To achieve this, we first construct **Math-VR**, the first large-scale, bilingual dataset and benchmark for Mathematics problems with Visual Reasoning, comprising 178K samples. Second, to create high-quality training data, we develop a state-of-the-art image-to-code converter specialized in parsing complex mathematical figures into code. Finally, using these training data, we train the CodePlot-CoT model to solve mathematical problems. Experimental results show that our model achieves up to a 21% improvement over its base model on our new benchmark, fully validating the efficacy of the proposed code-driven reasoning paradigm. Our work opens a new direction for multimodal mathematical reasoning and provides the community with the first large-scale dataset, a comprehensive benchmark, and a strong approach for such problems.
-
- The main contributions of our work can be summarized as follows:
- * We propose a novel and efficient paradigm that enables VLMs to engage in visual reasoning through code generation.
- * We construct **Math-VR**, the first large-scale, bilingual dataset and benchmark (178K samples) for Mathematical problems with Visual Reasoning.
- * We develop **MatplotCode**, a state-of-the-art image-to-code converter for mathematical figures, and train the **CodePlot-CoT** model, a specialized model that achieves up to a 21% performance increase over strong baselines.
-
- ## Released Data: Math-VR-train and Math-VR-bench
- | Dataset | Link |
- |-------------------|-------------------------------------------------------------|
- | **Math-VR-train** | [🤗 HuggingFace](https://huggingface.co/datasets/gogoduan/Math-VR-train) |
- | **Math-VR-bench** | [🤗 HuggingFace](https://huggingface.co/datasets/gogoduan/Math-VR-bench) |
-
- ## Released Models: MatplotCode and CodePlot-CoT
-
- | Model | Link |
- |-------------------|-------------------------------------------------------------|
- | **MatPlotCode** | [🤗 HuggingFace](https://huggingface.co/gogoduan/MatPlotCode) |
- | **CodePlot-CoT** | [🤗 HuggingFace](https://huggingface.co/gogoduan/CodePlot-CoT) |
-
- ## Model Overview
- ### CodePlot-CoT: Mathematical Visual Reasoning with Code-Driven Images
- We introduce **CodePlot-CoT**, an innovative code-driven Chain-of-Thought (CoT) paradigm designed to enable Vision Language Models to "think with images" when solving mathematical problems. Rather than generating pixel-based images directly, the model outputs executable plotting code to represent its "visual thoughts". This code is executed to render a precise figure, which is then fed back to the model as visual input for subsequent reasoning steps.
-
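Concretely, the execute-and-feed-back step might look like the minimal sketch below. This is illustrative only: the plotting snippet stands in for a hypothetical model output, and `render_plot_code` is not the repository's actual API.

```python
# Minimal sketch of the code-driven reasoning loop: execute model-generated
# matplotlib code, render it to an image, and feed the image back to the VLM.
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
from PIL import Image


def render_plot_code(code: str) -> Image.Image:
    """Execute plotting code (assumed sandboxed/trusted) and return the figure."""
    exec(code, {"plt": plt})
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    plt.close("all")
    buf.seek(0)
    return Image.open(buf)


# A "visual thought" the model might emit for a geometry problem:
plot_code = """
plt.figure(figsize=(4, 4))
plt.plot([0, 1, 0.5, 0], [0, 0, 0.8, 0])  # triangle
plt.plot([0.5, 0.5], [0, 0.8], "--")      # auxiliary altitude
plt.gca().set_aspect("equal")
"""
figure = render_plot_code(plot_code)  # passed back to the VLM as image input
```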
- ### MatplotCode: A High-Fidelity Converter for Mathematical Figures
- To train the CodePlot-CoT model, we require high-quality data pairing images with their corresponding plotting code. Since such resources are rare and existing general-purpose models are unreliable for this specialized task, we develop **MatplotCode**, a state-of-the-art image-to-code converter designed specifically for mathematical figures. It specializes in converting complex mathematical figures into high-fidelity Python plotting code. In our evaluation, MatplotCode achieves a **100%** code execution success rate, and its image reconstruction fidelity is significantly higher than that of SOTA models, including GPT-o3 and Gemini-2.5-Pro. MatplotCode is the key to enabling the large-scale curation of our code-driven training data, laying the foundation for the successful training of the CodePlot-CoT model.
-
- <div align="center">
- <img src="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT/raw/main/figures/model.png" width="100%"/>
- </div>
-
- ## Usage
-
- ### Installation
- Clone the repository and install the required packages.
- ```bash
- conda create -n codeplot python==3.10
- conda activate codeplot
- git clone [email protected]:HKU-MMLab/Math-VR-CodePlot-CoT.git
- cd Math-VR-CodePlot-CoT
- pip install -r requirements.txt
- pip install flash_attn==2.7.4.post1
- ```
- For benchmark evaluation only, additionally install:
- ```bash
- pip install openai==4.1.1
- pip install datasets==2.0.0
- ```
-
- ### Model Weights
- The expected directory structure looks like:
- ```
- CodePlot-CoT
- ├── ckpts
- │   ├── CodePlot-CoT
- │   ├── MatPlotCode
- ├── ...
- ```
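One way to populate `ckpts/` with the released weights is via `huggingface_hub`, as sketched below using the model repos linked above; the repository may document its own download procedure.

```python
# Sketch: download the released checkpoints into the ckpts/ layout shown above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="gogoduan/MatPlotCode", local_dir="ckpts/MatPlotCode")
snapshot_download(repo_id="gogoduan/CodePlot-CoT", local_dir="ckpts/CodePlot-CoT")
```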
- ### Inference
- ```bash
- # Convert an image to Python code with MatPlotCode
- python image_to_code.py
- # Solve math problems with CodePlot-CoT
- python math_infer.py
- ```
-
- ## License
- This code is released under the MIT License.
-
+ This repository contains the **MatplotCode** model, a core component of the paper [CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images](https://huggingface.co/papers/2510.11718).
+ MatplotCode is a state-of-the-art image-to-code converter capable of converting mathematical figures into `matplotlib` code.
+
+ The model is built upon the Qwen2.5-VL architecture and is compatible with the `transformers` library.
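Loading the model with `transformers` should therefore look roughly like the sketch below, which assumes the standard Qwen2.5-VL processing flow (transformers >= 4.49), a placeholder input image, and an illustrative prompt; it is not the repository's official inference script.

```python
# Hedged sketch: load MatplotCode via transformers, assuming the checkpoint
# follows the Qwen2.5-VL layout stated above. Prompt and file names are
# placeholders, not the repository's prescribed usage.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "gogoduan/MatPlotCode", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("gogoduan/MatPlotCode")

image = Image.open("figure.png")  # placeholder: a mathematical figure
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this figure into matplotlib code."},
]}]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```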
+
+ For more details, please refer to the [project homepage](https://math-vr.github.io) and the [GitHub repository](https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT).
 
  ## Citation
  If you find this work helpful, please consider citing our paper: