Update README.md

README.md (changed)
<div align="center">
<a href="https://math-vr.github.io"><img src="https://img.shields.io/badge/Project-Homepage-green" alt="Home"></a>
<a href="https://huggingface.co/papers/2510.11718"><img src="https://img.shields.io/badge/Paper-red" alt="Paper"></a>
<a href="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT"><img src="https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github&style=flat-square" alt="GitHub"></a>
</div>

## Introduction

This repository contains the **CodePlot-CoT** model, a code-driven Chain-of-Thought (CoT) paradigm for mathematical visual reasoning, as presented in the paper [CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images](https://huggingface.co/papers/2510.11718).

Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have brought significant progress in mathematical reasoning, yet these models still face a critical bottleneck on problems that require visual assistance, such as drawing auxiliary lines or plotting functions. Most LLMs and VLMs are constrained to text-only reasoning chains, while multimodal unified models that can generate interleaved text and images lack the precision and controllability such tasks demand.

To address this, we propose **CodePlot-CoT**, a code-driven Chain-of-Thought paradigm for "thinking with images" in mathematics. Our approach leverages a VLM to generate text reasoning together with executable plotting code, which is then rendered into images as "visual thoughts" to solve mathematical problems. To achieve this, we first construct **Math-VR**, the first large-scale, bilingual dataset and benchmark for Mathematics problems with Visual Reasoning, comprising 178K samples. Second, to create high-quality training data, we develop a state-of-the-art image-to-code converter specialized in parsing complex mathematical figures into code. Finally, using these training data, we train the CodePlot-CoT model for solving mathematical problems. Experimental results show that our model achieves up to a 21% improvement over its base model on the new benchmark, validating the efficacy of the proposed code-driven reasoning paradigm. Our work opens a new direction for multimodal mathematical reasoning and provides the community with the first large-scale dataset, a comprehensive benchmark, and a strong approach for such problems.

The main contributions of our work can be summarized as follows:

* We propose a novel and efficient paradigm that enables VLMs to engage in visual reasoning through code generation.
* We construct **Math-VR**, the first large-scale, bilingual dataset and benchmark (178K samples) for Mathematical problems with Visual Reasoning.
* We develop **MatplotCode**, a state-of-the-art image-to-code converter for mathematical figures, and train the **CodePlot-CoT** model, which achieves up to a 21% performance increase over strong baselines.

## Released Data: Math-VR-train and Math-VR-bench

| Dataset | Link |
|-------------------|-------------------------------------------------------------|
| **Math-VR-train** | [🤗 HuggingFace](https://huggingface.co/datasets/gogoduan/Math-VR-train) |
| **Math-VR-bench** | [🤗 HuggingFace](https://huggingface.co/datasets/gogoduan/Math-VR-bench) |

## Released Models: MatplotCode and CodePlot-CoT

| Model | Link |
|-------------------|-------------------------------------------------------------|
| **MatPlotCode** | [🤗 HuggingFace](https://huggingface.co/gogoduan/MatPlotCode) |
| **CodePlot-CoT** | [🤗 HuggingFace](https://huggingface.co/gogoduan/CodePlot-CoT) |

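The released splits can be pulled with the `datasets` library. A minimal sketch follows; the dataset ids come from the table above, while split and column names are not documented here, so check the dataset cards before relying on them.

```python
# Minimal sketch: download a Math-VR split from the Hugging Face Hub.
# The repo ids come from the table above; split/column layout should be
# verified against the dataset cards.
def load_math_vr(repo_id: str = "gogoduan/Math-VR-bench"):
    from datasets import load_dataset  # imported lazily: requires the `datasets` package
    return load_dataset(repo_id)
```

Calling `load_math_vr()` downloads the full split on first use, after which it is served from the local Hugging Face cache.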
## Model Overview

### CodePlot-CoT: Mathematical Visual Reasoning with Code-Driven Images

We introduce **CodePlot-CoT**, a code-driven Chain-of-Thought (CoT) paradigm designed to enable Vision Language Models to "think with images" when solving mathematical problems. Rather than generating pixel-based images directly, the model outputs executable plotting code to represent its "visual thoughts". This code is executed to render a precise figure, which is then fed back to the model as visual input for subsequent reasoning steps.

### MatplotCode: A High-Fidelity Converter for Mathematical Figures

Training the CodePlot-CoT model requires high-quality data pairing images with corresponding plotting code. Since such resources are rare and existing general-purpose models are unreliable for this specialized task, we developed **MatplotCode**, a state-of-the-art image-to-code converter designed specifically for mathematical figures. It specializes in converting complex mathematical figures into high-fidelity Python plotting code. In our evaluation, MatplotCode achieves a **100%** code execution success rate, and its image reconstruction fidelity is significantly higher than that of SOTA models, including GPT-o3 and Gemini-2.5-Pro. MatplotCode is the key to enabling the large-scale curation of our code-driven training data, laying the foundation for training the CodePlot-CoT model.

<div align="center">
<img src="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT/raw/main/figures/model.png" width="100%"/>
</div>

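The render-and-feed-back step can be pictured with a small sketch. This is an illustration of the paradigm only, assuming `matplotlib`; the function name and toy "visual thought" are hypothetical, and the official inference logic lives in the repository's scripts.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: render to memory, never to a window
import matplotlib.pyplot as plt


def render_visual_thought(plot_code: str) -> bytes:
    """Execute model-generated matplotlib code and return the figure as PNG bytes.

    Illustration only: real model output should be executed in a sandbox,
    not with a bare exec().
    """
    plt.close("all")
    exec(plot_code, {"plt": plt})  # the generated code draws on the current figure
    buf = io.BytesIO()
    plt.gcf().savefig(buf, format="png")
    plt.close("all")
    return buf.getvalue()


# A toy "visual thought": plot y = x**2 together with an auxiliary horizontal line.
png_bytes = render_visual_thought(
    "xs = [x / 10 for x in range(-20, 21)]\n"
    "plt.plot(xs, [x * x for x in xs])\n"
    "plt.axhline(1.0, linestyle='--')\n"
)
```

The resulting PNG bytes are what would be passed back to the VLM as a visual input for the next reasoning step.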
## Usage

### Installation

Clone the repository and install the required packages:

```bash
conda create -n codeplot python=3.10
conda activate codeplot
git clone git@github.com:HKU-MMLab/Math-VR-CodePlot-CoT.git
cd Math-VR-CodePlot-CoT
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
```

For benchmark evaluation only:

```bash
pip install openai==4.1.1
pip install datasets==2.0.0
```

### Model Weights

The expected directory structure is:

```
CodePlot-CoT
├── ckpts
│   ├── CodePlot-CoT
│   └── MatPlotCode
└── ...
```

### Inference

```bash
# Convert an image to Python plotting code with MatPlotCode
python image_to_code.py

# Solve math problems with CodePlot-CoT
python math_infer.py
```

This code is released under the MIT License.

## Citation
If you find this work helpful, please consider citing our paper:

<div align="center">
<a href="https://math-vr.github.io"><img src="https://img.shields.io/badge/Project-Homepage-green" alt="Home"></a>
<a href="https://huggingface.co/papers/2510.11718"><img src="https://img.shields.io/badge/Paper-red" alt="Paper"></a>
<a href="https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT"><img src="https://img.shields.io/badge/GitHub-Code-keygen.svg?logo=github&style=flat-square" alt="GitHub"></a>
</div>

This repository contains the **MatplotCode** model, a core component from the paper [CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images](https://huggingface.co/papers/2510.11718).

MatPlotCode is a state-of-the-art image-to-code converter capable of converting mathematical figures into `matplotlib` code.
The model is built upon the Qwen2.5-VL architecture and is compatible with the `transformers` library.
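A hypothetical loading sketch follows, assuming the checkpoint exposes the standard Qwen2.5-VL chat interface in `transformers`. The model id comes from this page, but the prompt wording, generation settings, and function name are illustrative assumptions; the repository's `image_to_code.py` is the authoritative script.

```python
def convert_figure_to_code(image_path: str, model_id: str = "gogoduan/MatPlotCode") -> str:
    """Ask MatPlotCode to transcribe one mathematical figure into matplotlib code.

    Untested sketch: requires transformers >= 4.49 (Qwen2.5-VL support), Pillow,
    and enough memory to hold the checkpoint.
    """
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # One user turn: the figure plus an instruction (prompt wording is a guess).
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this mathematical figure into matplotlib code."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(
        text=[prompt], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=2048)
    new_tokens = generated[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```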
For more details, please refer to the [project homepage](https://math-vr.github.io) and the [GitHub repository](https://github.com/HKU-MMLab/Math-VR-CodePlot-CoT).
## Citation
If you find this work helpful, please consider citing our paper: