Update README.md
README.md
CHANGED
|
@@ -2,73 +2,194 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
|
| 5 |
|
| 6 |
## Introduction
|
| 7 |
-
**
|
| 8 |
|
| 9 |
-
|
| 10 |
-
Download the InfiMed model from the Hugging Face Hub into the `./models` directory.
|
| 11 |
|
| 12 |
-
```bash
|
| 13 |
-
# Create a directory for models
|
| 14 |
-
mkdir -p ./models
|
| 15 |
-
# Download the R1 model
|
| 16 |
-
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
|
| 17 |
-
````
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
```python
|
| 22 |
import torch
|
| 23 |
-
|
| 24 |
|
| 25 |
MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"
|
| 26 |
|
| 27 |
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
|
| 28 |
|
| 29 |
-
MAX_NEW_TOKENS = 256
|
| 30 |
-
TEMPERATURE = 0.8
|
| 31 |
-
DO_SAMPLE = True
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
)
|
| 40 |
|
| 41 |
messages = [
|
| 42 |
{"role": "user", "content": prompt_text}
|
| 43 |
]
|
| 44 |
-
|
| 45 |
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
temperature=TEMPERATURE,
|
| 51 |
-
do_sample=DO_SAMPLE,
|
| 52 |
-
pad_token_id=tokenizer.eos_token_id
|
| 53 |
-
)
|
| 54 |
|
| 55 |
-
generated_text =
|
| 56 |
|
| 57 |
-
|
| 58 |
-
llm_response = generated_text[response_start_index:].strip()
|
| 59 |
|
| 60 |
print("\n" + "="*70)
|
| 61 |
print(f"Prompt: \n{prompt_text}")
|
| 62 |
print("-" * 70)
|
| 63 |
print(f"(LLM Response): \n{llm_response}")
|
| 64 |
print("="*70)
|
| 65 |
```
|
| 66 |
|
| 67 |
-
|
| 68 |
|
| 69 |
* We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
|
| 70 |
|
| 71 |
-
|
| 72 |
|
| 73 |
If you find our work useful, please cite:
|
| 74 |
|
|
|
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
|
| 5 |
+
# InfiR2-R1-7B-FP8
|
| 6 |
+
|
| 7 |
+
<p align="center">
|
| 8 |
+
  <a href="https://arxiv.org/abs/2509.22536">Paper</a> |
|
| 9 |
+
  <a href="https://github.com/InfiXAI/InfiR2">GitHub</a> |
|
| 10 |
+
  <a href="https://infix-ai.com/research/infir2/">Project Website</a>
|
| 11 |
+
</p>
|
| 12 |
|
| 13 |
## Introduction
|
| 14 |
+
We performed **Reinforcement Learning (RL)** fine-tuning on the **InfiR2-7B-Instruct-FP8** model in two stages, using the **dapo-math-17k** dataset and the **FP8 format**, with the hyperparameters shown below.
|
| 15 |
|
| 16 |
+
<div align="center">
|
| 17 |
|
| 18 |
|
| 19 |
+
| Parameter | Value |
|
| 20 |
+
| :---: | :---: |
|
| 21 |
+
| **Batch Size (train\_prompt\_bsz)** | 128 |
|
| 22 |
+
| **N Samples Per Prompt** | 16 |
|
| 23 |
+
| **Global Batch Size** | 2048 |
|
| 24 |
+
| **Maximum Response Length** | 16384 |
|
| 25 |
+
| **Rollout Temperature** | 1.1 |
|
| 26 |
+
| **Learning Rate (LR)** | 1e-6 |
|
| 27 |
+
| **Weight Decay** | 0.1 |
|
| 28 |
+
| **Eps Clip** | 0.2 |
|
| 29 |
+
| **KL Loss Coefficient** | 0.00 |
|
| 30 |
+
|
| 31 |
+
</div>
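
The batch-size rows in the table appear to be related: 128 prompts per step times 16 sampled responses per prompt gives the listed global batch size of 2048. A minimal sketch of that arithmetic follows; the variable names are illustrative and are not taken from the training code.

```python
# Values copied from the hyperparameter table above.
# Variable names are illustrative assumptions, not the training script's own.
train_prompt_bsz = 128        # prompts per training step
n_samples_per_prompt = 16     # responses sampled for each prompt

# Assumption: the global batch size counts every sampled response.
global_batch_size = train_prompt_bsz * n_samples_per_prompt
print(global_batch_size)
```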
|
| 32 |
+
|
| 33 |
+
The resulting model is the **InfiR2-R1-7B-FP8**.
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
**Training Recipe**:
|
| 37 |
+
<p align="center">
|
| 38 |
+
<img src="fp8_recipe.png" width="100%"/>
|
| 39 |
+
</p>
|
| 40 |
+
|
| 41 |
+
- Stable and Reproducible Performance
|
| 42 |
+
- Efficient and Low-Memory Training
|
| 43 |
+
|
| 44 |
+
---
|
| 45 |
+
|
| 46 |
+
## InfiR2 Model Series
|
| 47 |
+
|
| 48 |
+
The InfiR2 framework offers multiple model variants with different sizes and training strategies:
|
| 49 |
+
|
| 50 |
+
- **1.5B**
|
| 51 |
+
- [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
|
| 52 |
+
- [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
|
| 53 |
+
- **7B**
|
| 54 |
+
- [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
|
| 55 |
+
- [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
|
| 56 |
+
- [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the dapo-math-17k dataset*
|
| 57 |
+
|
| 58 |
+
---
|
| 59 |
+
|
| 60 |
+
## Model Performance
|
| 61 |
+
Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.
|
| 62 |
+
|
| 63 |
+
<div align="center">
|
| 64 |
+
|
| 65 |
+
<table>
|
| 66 |
+
<thead>
|
| 67 |
+
<tr>
|
| 68 |
+
<th align="left">Model</th>
|
| 69 |
+
<th align="center">AIME 25</th>
|
| 70 |
+
<th align="center">AIME 24</th>
|
| 71 |
+
<th align="center">GPQA</th>
|
| 72 |
+
<th align="center">LiveCodeBench v5</th>
|
| 73 |
+
</tr>
|
| 74 |
+
</thead>
|
| 75 |
+
<tbody>
|
| 76 |
+
<tr>
|
| 77 |
+
<td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
|
| 78 |
+
<td align="center">43.00</td>
|
| 79 |
+
<td align="center">49.00</td>
|
| 80 |
+
<td align="center">48.20</td>
|
| 81 |
+
<td align="center">37.60</td>
|
| 82 |
+
</tr>
|
| 83 |
+
<tr>
|
| 84 |
+
<td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
|
| 85 |
+
<td align="center">33.75</td>
|
| 86 |
+
<td align="center">43.02</td>
|
| 87 |
+
<td align="center">48.11</td>
|
| 88 |
+
<td align="center">39.48</td>
|
| 89 |
+
</tr>
|
| 90 |
+
<tr>
|
| 91 |
+
<td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
|
| 92 |
+
<td align="center">40.62</td>
|
| 93 |
+
<td align="center">55.73</td>
|
| 94 |
+
<td align="center">45.33</td>
|
| 95 |
+
<td align="center">40.31</td>
|
| 96 |
+
</tr>
|
| 97 |
+
|
| 98 |
+
</tbody>
|
| 99 |
+
</table>
|
| 100 |
+
|
| 101 |
+
</div>
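
One way to read the table is as per-benchmark differences between InfiR2-R1-7B-FP8 and the SFT baseline row. The sketch below computes them from the scores listed above. Note that the baseline row is Qwen2.5-7B-base (w. InfiAlign), which is not necessarily the exact checkpoint that RL started from, so treat the differences as indicative only.

```python
# Scores copied from the table above (higher is better).
benchmarks = ["AIME 25", "AIME 24", "GPQA", "LiveCodeBench v5"]
sft_baseline = [33.75, 43.02, 48.11, 39.48]   # Qwen2.5-7B-base (w. InfiAlign)
infir2_r1 = [40.62, 55.73, 45.33, 40.31]      # InfiR2-R1-7B-FP8

# Difference per benchmark, rounded to two decimals to match the table.
deltas = {
    name: round(rl - sft, 2)
    for name, sft, rl in zip(benchmarks, sft_baseline, infir2_r1)
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

On these numbers the RL model is higher on AIME 25, AIME 24, and LiveCodeBench v5, and lower on GPQA.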
|
| 102 |
+
|
| 103 |
+
---
|
| 104 |
+
|
| 105 |
+
## Quick Start
|
| 106 |
|
| 107 |
```python
|
| 108 |
+
from vllm import LLM, SamplingParams
|
| 109 |
import torch
|
| 110 |
+
import os
|
| 111 |
|
| 112 |
MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"
|
| 113 |
|
| 114 |
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
|
| 115 |
|
| 116 |
+
MAX_NEW_TOKENS = 256
|
| 117 |
+
TEMPERATURE = 0.8
|
| 118 |
+
DO_SAMPLE = True  # Not used by vLLM; sampling is governed by SamplingParams (temperature > 0 samples)
|
| 119 |
|
| 120 |
+
llm = LLM(
|
| 121 |
+
    model=MODEL_NAME,
|
| 122 |
+
    dtype="auto",
|
| 123 |
+
)
|
| 124 |
|
| 125 |
+
sampling_params = SamplingParams(
|
| 126 |
+
    n=1,
|
| 127 |
+
    temperature=TEMPERATURE,
|
| 128 |
+
    max_tokens=MAX_NEW_TOKENS,
|
| 129 |
+
)
|
| 130 |
|
| 131 |
+
tokenizer = llm.get_tokenizer()
|
| 132 |
messages = [
|
| 133 |
    {"role": "user", "content": prompt_text}
|
| 134 |
]
|
| 135 |
+
prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 136 |
|
| 137 |
+
outputs = llm.generate(
|
| 138 |
+
    prompt_formatted,
|
| 139 |
+
    sampling_params
|
| 140 |
+
)
|
| 141 |
|
| 142 |
+
generated_text = outputs[0].outputs[0].text
|
| 143 |
|
| 144 |
+
llm_response = generated_text.strip()
|
| 145 |
|
| 146 |
print("\n" + "="*70)
|
| 147 |
print(f"Prompt: \n{prompt_text}")
|
| 148 |
print("-" * 70)
|
| 149 |
print(f"(LLM Response): \n{llm_response}")
|
| 150 |
print("="*70)
|
| 151 |
+
```
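
The Quick Start snippet handles a single prompt. As a hedged sketch, the same calls can be wrapped to process several prompts in one `llm.generate` call. The helper names `build_chat_prompts` and `generate_batch` are hypothetical and not part of the repository; vLLM is imported lazily so the formatting helper can be used on its own.

```python
def build_chat_prompts(tokenizer, user_prompts):
    """Format each user prompt with the model's chat template."""
    return [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": text}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for text in user_prompts
    ]


def generate_batch(user_prompts, model_name="InfiX-ai/InfiR2-R1-7B-FP8",
                   temperature=0.8, max_tokens=256):
    """Generate one response per prompt (hypothetical helper)."""
    # Imported here because vLLM needs a supported GPU environment.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name, dtype="auto")
    params = SamplingParams(n=1, temperature=temperature, max_tokens=max_tokens)
    prompts = build_chat_prompts(llm.get_tokenizer(), user_prompts)
    outputs = llm.generate(prompts, params)
    # Each request output holds a list of completions; n=1 means one each.
    return [out.outputs[0].text.strip() for out in outputs]
```

Calling `generate_batch(["What is a black hole?", "What is a neutron star?"])` would return a list of two response strings, in the same order as the prompts.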
|
| 152 |
+
|
| 153 |
+
-----
|
| 154 |
+
|
| 155 |
+
## Model Download
|
| 156 |
+
|
| 157 |
+
```bash
|
| 158 |
+
# Create a directory for models
|
| 159 |
+
mkdir -p ./models
|
| 160 |
+
# Download InfiR2-R1-7B-FP8 model
|
| 161 |
+
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
|
| 162 |
```
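
The same download can likely be done from Python with `huggingface_hub.snapshot_download`. The sketch below mirrors the directory layout of the CLI command above; the helper names `local_dir_for` and `download_model` are hypothetical, and the `huggingface_hub` package is assumed to be installed.

```python
from pathlib import Path


def local_dir_for(repo_id, root="./models"):
    """Mirror the CLI layout above: <root>/<repository name>."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id="InfiX-ai/InfiR2-R1-7B-FP8", root="./models"):
    """Download all files of the repository into the local directory."""
    # Imported lazily so the path helper works without the package.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

The path returned by `download_model()` can then be passed as the `model` argument of `LLM(...)` in place of the Hub repository name.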
|
| 163 |
|
| 164 |
+
-----
|
| 165 |
+
|
| 166 |
+
## Intended Uses
|
| 167 |
+
|
| 168 |
+
### Direct Use
|
| 169 |
+
|
| 170 |
+
This model is intended for research and commercial use. Example use cases include:
|
| 171 |
+
|
| 172 |
+
- Instruction following
|
| 173 |
+
- Mathematical reasoning
|
| 174 |
+
- Code generation
|
| 175 |
+
- General reasoning
|
| 176 |
+
|
| 177 |
+
### Out-of-Scope Use
|
| 178 |
+
|
| 179 |
+
The model should **not** be used for:
|
| 180 |
+
|
| 181 |
+
- Generating harmful, offensive, or inappropriate content
|
| 182 |
+
- Creating misleading information
|
| 183 |
+
|
| 184 |
+
-----
|
| 185 |
+
|
| 186 |
+
## Acknowledgements
|
| 187 |
|
| 188 |
* We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
|
| 189 |
|
| 190 |
+
-----
|
| 191 |
+
|
| 192 |
+
## Citation
|
| 193 |
|
| 194 |
If you find our work useful, please cite:
|
| 195 |
|