---
license: apache-2.0
---

# InfiR2-R1-7B-FP8-Preview

📄 Paper | 🐙 Github | 🌐 Project Website

We performed multi-stage FP8 **Reinforcement Learning (RL)**. More experimental details will be released soon. Stay tuned!

## 🚀 InfiR2 Model Series

The InfiR2 framework offers multiple model variants with different sizes and training strategies:

- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining of Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining of Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 Reinforcement Learning*

**Training Recipe**:

- Stable and Reproducible Performance
- Efficient and Low-Memory Training (see the sketch after this list)
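The full recipe will be released with the paper and code. As a rough illustration of the core numeric idea behind low-memory FP8 training, the sketch below shows generic per-tensor FP8 (E4M3) quantize/dequantize in PyTorch. This is a toy example, not the InfiR2 recipe (which builds on TransformerEngine's fused FP8 kernels); the constant `FP8_E4M3_MAX` and the helper names are our own.

```python
# Minimal sketch of per-tensor FP8 (E4M3) quantization in PyTorch.
# Illustrates the generic idea only; it is NOT the InfiR2 training recipe.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_fp8(x: torch.Tensor):
    """Scale a tensor into the E4M3 range and cast it to FP8."""
    scale = FP8_E4M3_MAX / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)  # stored in 1 byte/element
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    """Recover an approximation of the original tensor."""
    return x_fp8.to(torch.float32) / scale

w = torch.randn(1024, 1024)
w_fp8, s = quantize_fp8(w)
print("max abs error:", (w - dequantize_fp8(w_fp8, s)).abs().max().item())
```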

## 📊 Hyperparameters & Model Performance

**Training hyperparameters**:

| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 128 |
| **N Samples Per Prompt** | 16 |
| **Global Batch Size** | 2048 |
| **Maximum Response Length** | 16384 |
| **Rollout Temperature** | 1.1 |
| **Learning Rate** | 1e-6 |
| **Weight Decay** | 0.1 |
| **Eps Clip** | 0.2 |
| **KL Loss Coefficient** | 0.00 |
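For intuition, the rollout-side settings above map naturally onto vLLM `SamplingParams`. The sketch below is illustrative only and not the actual training loop (the RL runs are driven by frameworks such as Slime); the variable name `rollout_params` is our own.

```python
# Hedged sketch: mapping the rollout hyperparameters above onto vLLM
# SamplingParams. Illustrative only; not the InfiR2 training code.
from vllm import SamplingParams

rollout_params = SamplingParams(
    n=16,              # N Samples Per Prompt
    temperature=1.1,   # Rollout Temperature
    max_tokens=16384,  # Maximum Response Length
)
# With a batch of 128 prompts, 128 prompts x 16 samples = 2048 rollouts,
# matching the global batch size in the table.
```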
Below is the performance comparison of **InfiR2-R1-7B-FP8-Preview** on reasoning benchmarks.
| Model | AIME 25 | AIME 24 | GPQA | LiveCodeBench v5 |
| :---: | :---: | :---: | :---: | :---: |
| Deepseek-Distill-Qwen-7B | 43.00 | 49.00 | 48.20 | 37.60 |
| InfiR2-R1-7B-FP8-Preview | 53.64 | 60.62 | 49.18 | 39.36 |
## 🎭 Quick Start

```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the FP8 model; vLLM infers the dtype from the checkpoint config.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the prompt with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate(prompt_formatted, sampling_params)
llm_response = outputs[0].outputs[0].text.strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```

## 📚 Model Download

```bash
# Create a directory for models
mkdir -p ./models

# Download the InfiR2-R1-7B-FP8-Preview model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
```

## 🎯 Intended Uses

### ✅ Direct Use

This model is intended for research and commercial use. Example use cases include:

- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning

### ❌ Out-of-Scope Use

The model should **not** be used for:

- Generating harmful, offensive, or inappropriate content
- Creating misleading information

## 🙏 Acknowledgements

We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine), and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

## 📌 Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```