
InfiR2-R1-7B-FP8-Preview

  πŸ“„ Paper   |   πŸ™ GitHub   |   🌐 Project Website

We performed multi-stage FP8 Reinforcement Learning (RL). More experimental details will be released soon. Stay tuned!

πŸš€ InfiR2 Model Series

The InfiR2 framework offers multiple model variants with different sizes and training strategies:

Training Recipe:

  • Stable and Reproducible Performance
  • Efficient and Low-Memory Training

πŸ“Š Hyperparameters & Model Performance

Training hyperparameters:

Parameter                 Value
Batch Size                128
N Samples Per Prompt      16
Global Batch Size         2048
Maximum Response Length   16384
Rollout Temperature       1.1
Learning Rate             1e-6
Weight Decay              0.1
Eps Clip                  0.2
KL Loss Coefficient       0.00
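
As a reading aid, the table maps onto an RL training configuration roughly as sketched below. The field names are illustrative, not the actual training framework's schema; note that the global batch size is the product of the per-step prompt batch and the samples per prompt (128 Γ— 16 = 2048).

# Illustrative config dict; field names are hypothetical, values come from the table above.
rl_config = {
    "batch_size": 128,             # prompts per RL step
    "n_samples_per_prompt": 16,    # rollouts sampled per prompt
    "global_batch_size": 2048,     # 128 prompts x 16 rollouts
    "max_response_length": 16384,  # token budget per rollout
    "rollout_temperature": 1.1,    # sampling temperature during rollout
    "learning_rate": 1e-6,
    "weight_decay": 0.1,
    "eps_clip": 0.2,               # PPO-style ratio clipping range
    "kl_loss_coef": 0.0,           # KL penalty disabled
}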

Below is the performance comparison of InfiR2-R1-7B-FP8-Preview on reasoning benchmarks.

Model                      AIME 25   AIME 24   GPQA    LiveCodeBench v5
Deepseek-Distill-Qwen-7B   43.00     49.00     48.20   37.60
InfiR2-R1-7B-FP8-Preview   53.64     60.62     49.18   39.36
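
For a quick aggregate view, the averages across the four benchmarks can be computed directly from the table (a small helper script; the scores are those listed above):

# Average the four benchmark scores from the table above.
baseline = {"AIME 25": 43.00, "AIME 24": 49.00, "GPQA": 48.20, "LiveCodeBench v5": 37.60}
infir2 = {"AIME 25": 53.64, "AIME 24": 60.62, "GPQA": 49.18, "LiveCodeBench v5": 39.36}

avg = lambda scores: sum(scores.values()) / len(scores)
print(f"Deepseek-Distill-Qwen-7B average: {avg(baseline):.2f}")  # 44.45
print(f"InfiR2-R1-7B-FP8-Preview average: {avg(infir2):.2f}")    # 50.70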

🎭 Quick Start

from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the FP8 checkpoint; vLLM reads the quantization config from the model files.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the user message with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt_formatted], sampling_params)
llm_response = outputs[0].outputs[0].text.strip()

print("\n" + "=" * 70)
print(f"Prompt:\n{prompt_text}")
print("-" * 70)
print(f"LLM Response:\n{llm_response}")
print("=" * 70)

πŸ“š Model Download

# Create a directory for models
mkdir -p ./models
# Download InfiR2-R1-7B-FP8-Preview model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
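
After downloading, vLLM can load the checkpoint from the local directory instead of the Hub ID. A minimal sketch, assuming the path used above:

from vllm import LLM

# Point vLLM at the locally downloaded checkpoint.
llm = LLM(model="./models/InfiR2-R1-7B-FP8-Preview", dtype="auto")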

🎯 Intended Uses

βœ… Direct Use

This model is intended for research and commercial use. Example use cases include:

  • Instruction following
  • Mathematical reasoning
  • Code generation
  • General reasoning

❌ Out-of-Scope Use

The model should not be used for:

  • Generating harmful, offensive, or inappropriate content
  • Creating misleading information

πŸ™ Acknowledgements

πŸ“Œ Citation

If you find our work useful, please cite:

@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models}, 
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536}, 
}