InfiX-ai
/

InfiR2-R1-7B-FP8-Preview

Safetensors

qwen2

fp8


juezhi committed 27 days ago

Commit

b832317

verified ·

1 Parent(s): fa4b212

Update README.md


README.md +155 -34

@@ -2,73 +2,194 @@
 license: apache-2.0
 ---
 ## Introduction
-**InfiR2-R1-7B-FP8** is a model derived from the **InfiR2-7B-base-FP8**, obtained through Supervised Fine-Tuning (SFT) utilizing **FP8** and the **InfiAlign dataset**.
-## Model Download
-Download the InfiMed model from the Hugging Face Hub into the `./models` directory.
-```bash
-# Create a directory for models
-mkdir -p ./models
-# Download the R1 model
-huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
-````
-## Quick Start
 ```python
 import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
 MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"
 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
-MAX_NEW_TOKENS = 256
-TEMPERATURE = 0.8
-DO_SAMPLE = True
-tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
-device = "cuda" if torch.cuda.is_available() else "cpu"
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_NAME,
-    torch_dtype=torch.bfloat16 if device == "cuda" else None
-).to(device)
 messages = [
     {"role": "user", "content": prompt_text}
 ]
-input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
-with torch.no_grad():
-    output_ids = model.generate(
-        input_ids,
-        max_new_tokens=MAX_NEW_TOKENS,
-        temperature=TEMPERATURE,
-        do_sample=DO_SAMPLE,
-        pad_token_id=tokenizer.eos_token_id
-    )
-generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
-response_start_index = generated_text.rfind(prompt_text) + len(prompt_text)
-llm_response = generated_text[response_start_index:].strip()
 print("\n" + "="*70)
 print(f"Prompt: \n{prompt_text}")
 print("-" * 70)
 print(f"(LLM Response): \n{llm_response}")
 print("="*70)
 ```
-## Acknowledgements
   * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
-## Citation
 If you find our work useful, please cite:

 license: apache-2.0
 ---
+# InfiR2-R1-7B-FP8
+<p align="center">
+  <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
+  <a href="https://github.com/InfiXAI/InfiR2">🐙 GitHub</a> &nbsp; | &nbsp;
+  <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a> &nbsp;
+</p>
 ## Introduction
+We performed two-stage **Reinforcement Learning (RL)** fine-tuning on the **InfiR2-7B-Instruct-FP8** model using the **dapo-math-17k** dataset in **FP8 format**, with the hyperparameters shown below.
+<div align="center">
+| Parameter | Value |
+| :---: | :---: |
+| **Batch Size (train\_prompt\_bsz)** | 128 |
+| **N Samples Per Prompt** | 16 |
+| **Global Batch Size** | 2048 |
+| **Maximum Response Length** | 16384 |
+| **Rollout Temperature** | 1.1 |
+| **Learning Rate (LR)** | 1e-6 |
+| **Weight Decay** | 0.1 |
+| **Eps Clip** | 0.2 |
+| **KL Loss Coefficient** | 0.00 |
+</div>
+The resulting model is the **InfiR2-R1-7B-FP8**.
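The batch-size rows in the hyperparameter table are consistent with one another if the global batch size is the number of prompts times the samples per prompt. The source does not state this relationship, so the sketch below treats it as an assumption; the variable and function names are illustrative only.

```python
# Values copied from the hyperparameter table above.
TRAIN_PROMPT_BSZ = 128        # prompts per training step
N_SAMPLES_PER_PROMPT = 16     # rollouts sampled for each prompt
MAX_RESPONSE_LENGTH = 16384   # maximum tokens per sampled response


def global_batch_size(prompts: int, samples_per_prompt: int) -> int:
    """Assumed relationship: one training step consumes every rollout of every prompt."""
    return prompts * samples_per_prompt


def max_response_tokens_per_step(batch: int, max_len: int) -> int:
    """Upper bound on response tokens in one step (every rollout at full length)."""
    return batch * max_len


gbs = global_batch_size(TRAIN_PROMPT_BSZ, N_SAMPLES_PER_PROMPT)
print(gbs)                                                     # 2048
print(max_response_tokens_per_step(gbs, MAX_RESPONSE_LENGTH))  # 33554432
```

The second figure is only a ceiling; actual responses are usually shorter than the maximum length.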
+**Training Recipe**:
+<p align="center">
+    <img src="fp8_recipe.png" width="100%"/>
+</p>
+- Stable and Reproducible Performance
+- Efficient and Low-Memory Training
+---
+## 🚀 InfiR2 Model Series
+The InfiR2 framework offers multiple model variants with different sizes and training strategies:
+- **1.5B**
+- [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
+- [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+- **7B**
+- [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
+- [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+- [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the dapo-math-17k dataset*
+---
+## 📊 Model Performance
+Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.
+<div align="center">
+<table>
+  <thead>
+    <tr>
+      <th align="left">Model</th>
+      <th align="center">AIME 25</th>
+      <th align="center">AIME 24</th>
+      <th align="center">GPQA</th>
+      <th align="center">LiveCodeBench v5</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
+      <td align="center">43.00</td>
+      <td align="center">49.00</td>
+      <td align="center">48.20</td>
+      <td align="center">37.60</td>
+    </tr>
+    <tr>
+      <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
+      <td align="center">33.75</td>
+      <td align="center">43.02</td>
+      <td align="center">48.11</td>
+      <td align="center">39.48</td>
+    </tr>
+    <tr>
+      <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
+      <td align="center">40.62</td>
+      <td align="center">55.73</td>
+      <td align="center">45.33</td>
+      <td align="center">40.31</td>
+    </tr>
+  </tbody>
+</table>
+</div>
+---
+## 🎭 Quick Start
 ```python
+from vllm import LLM, SamplingParams
 import torch
+import os
 MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"
 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
+MAX_NEW_TOKENS = 256
+TEMPERATURE = 0.8
+DO_SAMPLE = True  # carried over from the Transformers example; unused below, vLLM samples whenever temperature > 0
+llm = LLM(
+    model=MODEL_NAME,
+    dtype="auto",
+)
+sampling_params = SamplingParams(
+    n=1,
+    temperature=TEMPERATURE,
+    max_tokens=MAX_NEW_TOKENS,
+)
+tokenizer = llm.get_tokenizer()
 messages = [
     {"role": "user", "content": prompt_text}
 ]
+prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+outputs = llm.generate(
+    prompt_formatted,
+    sampling_params
+)
+generated_text = outputs[0].outputs[0].text
+llm_response = generated_text.strip()
 print("\n" + "="*70)
 print(f"Prompt: \n{prompt_text}")
 print("-" * 70)
 print(f"(LLM Response): \n{llm_response}")
 print("="*70)
+```
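The updated Quick Start still defines `DO_SAMPLE`, but `SamplingParams` has no such switch: in vLLM, a temperature of 0 selects greedy decoding and any positive temperature samples. A minimal sketch of that mapping follows; `to_sampling_kwargs` is a hypothetical helper, not part of vLLM or of this repository.

```python
def to_sampling_kwargs(do_sample: bool, temperature: float, max_new_tokens: int) -> dict:
    """Hypothetical helper: translate Transformers-style generation settings
    into keyword arguments accepted by vllm.SamplingParams."""
    return {
        "n": 1,
        # vLLM has no do_sample flag; temperature 0 means greedy decoding.
        "temperature": temperature if do_sample else 0.0,
        "max_tokens": max_new_tokens,
    }


kwargs = to_sampling_kwargs(do_sample=True, temperature=0.8, max_new_tokens=256)
print(kwargs)

# With vLLM installed, the dictionary can be unpacked directly:
#   from vllm import SamplingParams
#   sampling_params = SamplingParams(**kwargs)
```

Keeping the translation in one place makes it explicit that switching sampling off means setting the temperature to 0 rather than toggling a separate flag.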
+-----
+## 📚 Model Download
+```bash
+# Create a directory for models
+mkdir -p ./models
+# Download InfiR2-R1-7B-FP8 model
+huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
 ```
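The same download can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `local_model_dir` and `download_model` are illustrative and mirror the `./models/<model name>` layout of the CLI command above.

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-R1-7B-FP8"  # repository id used in the CLI command


def local_model_dir(repo_id: str, root: str = "./models") -> Path:
    """Return <root>/<model name>, matching the --local-dir used by the CLI."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID, root: str = "./models") -> str:
    """Download every file of the repository into the local model directory."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))


print(local_model_dir(REPO_ID))
# To fetch the weights (requires network access): download_model()
```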
+-----
+## 🎯 Intended Uses
+### ✅ Direct Use
+This model is intended for research and commercial use. Example use cases include:
+  - Instruction following
+  - Mathematical reasoning
+  - Code generation
+  - General reasoning
+### ❌ Out-of-Scope Use
+The model should **not** be used for:
+  - Generating harmful, offensive, or inappropriate content
+  - Creating misleading information
+-----
+## 🙏 Acknowledgements
   * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
+-----
+## 📌 Citation
 If you find our work useful, please cite: