juezhi committed · verified
Commit b832317 · 1 Parent(s): fa4b212

Update README.md

Files changed (1)
  1. README.md +155 -34
README.md CHANGED
@@ -2,73 +2,194 @@
  license: apache-2.0
  ---

  ## Introduction
- **InfiR2-R1-7B-FP8** is a model derived from the **InfiR2-7B-base-FP8**, obtained through Supervised Fine-Tuning (SFT) utilizing **FP8** and the **InfiAlign dataset**.

- ## Model Download
- Download the InfiMed model from the Hugging Face Hub into the `./models` directory.

- ```bash
- # Create a directory for models
- mkdir -p ./models
- # Download the R1 model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
- ```

- ## Quick Start

  ```python
  import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer

  MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

- MAX_NEW_TOKENS = 256
- TEMPERATURE = 0.8
- DO_SAMPLE = True

- tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = AutoModelForCausalLM.from_pretrained(
-     MODEL_NAME,
-     torch_dtype=torch.bfloat16 if device == "cuda" else None
- ).to(device)

  messages = [
      {"role": "user", "content": prompt_text}
  ]
- input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

- with torch.no_grad():
-     output_ids = model.generate(
-         input_ids,
-         max_new_tokens=MAX_NEW_TOKENS,
-         temperature=TEMPERATURE,
-         do_sample=DO_SAMPLE,
-         pad_token_id=tokenizer.eos_token_id
-     )

- generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

- response_start_index = generated_text.rfind(prompt_text) + len(prompt_text)
- llm_response = generated_text[response_start_index:].strip()

  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
  print("-" * 70)
  print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```

- ## Acknowledgements

  * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

- ## Citation

  If you find our work useful, please cite:

 
  license: apache-2.0
  ---

+ # InfiR2-R1-7B-FP8
+
+ <p align="center">
+   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
+   <a href="https://github.com/InfiXAI/InfiR2">🐙 Github</a> &nbsp; | &nbsp;
+   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
+ </p>

  ## Introduction
+ We performed two-stage **Reinforcement Learning (RL)** fine-tuning on the **InfiR2-7B-Instruct-FP8** model, using the **dapo-math-17k** dataset in the **FP8 format**, with the hyperparameters shown below.

+ <div align="center">

+ | Parameter | Value |
+ | :---: | :---: |
+ | **Batch Size (train\_prompt\_bsz)** | 128 |
+ | **N Samples Per Prompt** | 16 |
+ | **Global Batch Size** | 2048 |
+ | **Maximum Response Length** | 16384 |
+ | **Rollout Temperature** | 1.1 |
+ | **Learning Rate (LR)** | 1e-6 |
+ | **Weight Decay** | 0.1 |
+ | **Eps Clip** | 0.2 |
+ | **KL Loss Coefficient** | 0.00 |
+
+ </div>
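+
+ For readers who script their runs, these settings map onto a config dict along the following lines (a minimal sketch; the key names are illustrative, not the exact fields of the training framework):
+
+ ```python
+ # Hypothetical config mirroring the table above; key names are illustrative.
+ rl_config = dict(
+     train_prompt_bsz=128,        # prompts per training batch
+     n_samples_per_prompt=16,     # rollouts sampled per prompt
+     global_batch_size=128 * 16,  # 2048 responses per optimizer step
+     max_response_length=16384,   # rollout truncation length, in tokens
+     rollout_temperature=1.1,     # sampling temperature during rollouts
+     lr=1e-6,                     # learning rate
+     weight_decay=0.1,
+     eps_clip=0.2,                # PPO-style clipping range
+     kl_loss_coef=0.0,            # KL regularization disabled
+ )
+ ```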
+
+ The resulting model is **InfiR2-R1-7B-FP8**.
+
+ **Training Recipe**:
+ <p align="center">
+   <img src="fp8_recipe.png" width="100%"/>
+ </p>
+
+ - Stable and reproducible performance
+ - Efficient, low-memory training
+
+ ---
+
+ ## 🚀 InfiR2 Model Series
+
+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:
+
+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining of Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 on the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining of Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 on the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the dapo-math-17k dataset*
+
+ ---
+
+ ## 📊 Model Performance
+ Below is a performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: "w. InfiAlign" denotes Supervised Fine-Tuning (SFT) on the InfiAlign dataset.
+
+ <div align="center">
+
+ <table>
+   <thead>
+     <tr>
+       <th align="left">Model</th>
+       <th align="center">AIME 25</th>
+       <th align="center">AIME 24</th>
+       <th align="center">GPQA</th>
+       <th align="center">LiveCodeBench v5</th>
+     </tr>
+   </thead>
+   <tbody>
+     <tr>
+       <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
+       <td align="center">43.00</td>
+       <td align="center">49.00</td>
+       <td align="center">48.20</td>
+       <td align="center">37.60</td>
+     </tr>
+     <tr>
+       <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
+       <td align="center">33.75</td>
+       <td align="center">43.02</td>
+       <td align="center">48.11</td>
+       <td align="center">39.48</td>
+     </tr>
+     <tr>
+       <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
+       <td align="center">40.62</td>
+       <td align="center">55.73</td>
+       <td align="center">45.33</td>
+       <td align="center">40.31</td>
+     </tr>
+   </tbody>
+ </table>
+
+ </div>
+
+ ---
+
+ ## 🎭 Quick Start

  ```python
+ from vllm import LLM, SamplingParams
  import torch

  MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

+ MAX_NEW_TOKENS = 256
+ TEMPERATURE = 0.8

+ # Load the FP8 checkpoint; vLLM reads the quantization setup from the model config
+ llm = LLM(
+     model=MODEL_NAME,
+     dtype="auto",
+ )

+ sampling_params = SamplingParams(
+     n=1,
+     temperature=TEMPERATURE,
+     max_tokens=MAX_NEW_TOKENS,
+ )

+ # Render the chat template into a plain prompt string
+ tokenizer = llm.get_tokenizer()
  messages = [
      {"role": "user", "content": prompt_text}
  ]
+ prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

+ outputs = llm.generate(
+     prompt_formatted,
+     sampling_params
+ )

+ generated_text = outputs[0].outputs[0].text

+ llm_response = generated_text.strip()

  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
  print("-" * 70)
  print(f"(LLM Response): \n{llm_response}")
  print("="*70)
+ ```
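+
+ Recent vLLM versions can also apply the chat template for you, skipping the manual `apply_chat_template` step (an equivalent sketch, assuming your vLLM build exposes `LLM.chat`):
+
+ ```python
+ # Alternative: let vLLM render the chat template internally
+ outputs = llm.chat(
+     [{"role": "user", "content": prompt_text}],
+     sampling_params,
+ )
+ print(outputs[0].outputs[0].text.strip())
+ ```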
+
+ -----
+
+ ## 📚 Model Download
+
+ ```bash
+ # Create a directory for models
+ mkdir -p ./models
+ # Download the InfiR2-R1-7B-FP8 model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
  ```
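+
+ The same download can also be done from Python via `huggingface_hub`, which the CLI wraps (a small sketch; adjust the paths as needed):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Fetch the full checkpoint into ./models/InfiR2-R1-7B-FP8
+ snapshot_download(
+     repo_id="InfiX-ai/InfiR2-R1-7B-FP8",
+     local_dir="./models/InfiR2-R1-7B-FP8",
+ )
+ ```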

+ -----
+
+ ## 🎯 Intended Uses
+
+ ### ✅ Direct Use
+
+ This model is intended for research and commercial use. Example use cases include:
+
+ - Instruction following
+ - Mathematical reasoning
+ - Code generation
+ - General reasoning
+
+ ### ❌ Out-of-Scope Use
+
+ The model should **not** be used for:
+
+ - Generating harmful, offensive, or inappropriate content
+ - Creating misleading information
+
+ -----
+
+ ## 🙏 Acknowledgements

  * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

+ -----
+
+ ## 📌 Citation

  If you find our work useful, please cite: