Update README.md
README.md
CHANGED
@@ -9,4 +9,39 @@ metrics:
base_model:
- Qwen/Qwen2.5-Math-7B
pipeline_tag: question-answering
---

# Making Mathematical Reasoning Adaptive

<p align="center">
<a href="https://arxiv.org/abs/2510.04617"> 📃 Paper</a> |
<a href="https://github.com/NJUNLP/AdaR"> ⚙️ Code</a> |
<a href="https://huggingface.co/collections/DreamW1ngs/adar-68e648e59b2c9aec1208b5ef"> 🤖 Project</a> |
<a href="https://resume.laizj.fun/"> 📭 Contact</a>
</p>

---

## 🌱 Overview

Large Language Models (LLMs) have shown impressive reasoning capabilities, yet they often rely on **spurious reasoning**, producing answers from superficial features, which undermines robustness and generalization.

We propose the **AdaR** framework to enable adaptive reasoning, in which models rely on problem-solving logic to produce answers. **AdaR** synthesizes logically equivalent queries by varying variable values, then trains models on this data with reinforcement learning with verifiable rewards (RLVR) to penalize spurious logic while encouraging adaptive logic.

The framework integrates *data synthesis* and *RLVR training* to enhance both **robustness (in-domain)** and **generalization (out-of-domain)**.
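
The synthesis idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TEMPLATE` string, the `solve` function, and the `synthesize` helper are hypothetical names and are not taken from the AdaR codebase. The sketch only shows the general pattern of keeping the problem-solving logic fixed, varying the variable values, and computing the gold answer by executing that logic.

```python
import random

# Hypothetical query template: the numbers are named slots that can be varied.
TEMPLATE = "A shop sells pens at {price} dollars each. How much do {count} pens cost?"


def solve(price: int, count: int) -> int:
    """Executable problem-solving logic for TEMPLATE (assumed for illustration)."""
    return price * count


def synthesize(n: int, seed: int = 0) -> list:
    """Sample new variable values and pair each query with its computed answer."""
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    samples = []
    for _ in range(n):
        values = {"price": rng.randint(1, 20), "count": rng.randint(2, 50)}
        samples.append({
            "values": values,
            "query": TEMPLATE.format(**values),  # same logic, new surface numbers
            "answer": solve(**values),           # gold answer from executing the logic
        })
    return samples


variants = synthesize(3)
for v in variants:
    print(v["query"], "->", v["answer"])
```

Because every variant shares the same underlying logic, a model that only memorized the answer to one instance would likely fail on the others, which is the kind of behavior the training stage is meant to penalize.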



> **Figure 1.**
> *Subfigure I:* Three reasoning modes: direct inference (black), spurious reasoning (red), adaptive reasoning (green).
> *Subfigure II:* Logic-preserving variable perturbation and gold-answer generation via executable logic.
> *Subfigure III:* RLVR optimization encouraging adaptive reasoning through comparative feedback.
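
To make the "verifiable" part of RLVR concrete, here is a generic sketch of a rule-based reward that compares a model's final numeric answer against a gold answer. This is an assumption for illustration only, not AdaR's actual reward or its comparative feedback scheme; the names `extract_final_number` and `verifiable_reward` are hypothetical.

```python
import re
from typing import Optional

# Matches integers and decimals, with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[float]:
    """Return the last number in the model output, or None if there is none."""
    matches = NUMBER.findall(completion.replace(",", ""))  # drop thousands separators
    return float(matches[-1]) if matches else None


def verifiable_reward(completion: str, gold: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the extracted answer matches the gold answer, else 0.0."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - gold) <= tol else 0.0


# A response that repeats the answer to the original problem (12) earns no
# reward on a perturbed variant whose gold answer is 35.
print(verifiable_reward("So the total is 12.", gold=35))  # 0.0
print(verifiable_reward("5 * 7 = 35", gold=35))           # 1.0
```

Under a reward of this kind, an answer carried over from a superficially similar problem scores zero on a perturbed variant, while re-applying the correct logic to the new values scores one.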

## 📈 Highlights

- 🚀 **+8.5 Average Improvement** across in-domain robustness tasks and out-of-domain tasks.
- 🧮 **Only 9K synthetic data samples** needed for significant gains.
- ⚖️ **Enables algebraic thinking** and improves stability under scaling.
- 🔁 **Generalizable framework** applicable to instruct models.

---