# P1: Mastering Physics Olympiads with Reinforcement Learning
P1 Project Page | HiPhO Leaderboard
 
High-performance mid-scale model for physics reasoning
## Model Description
P1-30B-A3B is the mid-size variant of the P1 series, a family of high-performance open-source language models specialized in physics reasoning. Built on Qwen3-30B-A3B-Thinking-2507 and refined through multi-stage reinforcement learning on curated physics competition data, P1-30B-A3B achieves strong olympiad-level results while keeping computational requirements modest, making it accessible to researchers working on physics problems.
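The training recipe itself is not detailed in this card. Purely as an illustration of reinforcement learning on competition problems with checkable numeric answers, a verification-style reward could be sketched as follows (hypothetical helper, not the P1 training code):

```python
import math
import re

def numeric_answer_reward(model_output: str, reference: float, rel_tol: float = 1e-2) -> float:
    """Illustrative reward: 1.0 if the final number in the model output matches the
    reference value within a relative tolerance, 0.0 otherwise."""
    # Extract decimal numbers (optionally in scientific notation) from the output.
    numbers = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", model_output)
    if not numbers:
        return 0.0
    predicted = float(numbers[-1])  # treat the last number as the final answer
    return 1.0 if math.isclose(predicted, reference, rel_tol=rel_tol) else 0.0

# Example: a pendulum period T = 2*pi*sqrt(L/g) ≈ 2.007 s for L = 1.0 m, g = 9.8 m/s^2
print(numeric_answer_reward("... so the period is T ≈ 2.01 s", 2.007))  # -> 1.0
```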
## Key Highlights
- 🥈 IPhO 2025 Silver-tier Performance: a strong competitive showing at the International Physics Olympiad (18.5/30 points)
- 🥇 HiPhO Excellence: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests
## Performance Benchmarks
### IPhO 2025 Results
| Model | Score (out of 30) | Medal |
|---|---|---|
| P1-30B-A3B | 18.5 | 🥈 Silver |
| DeepSeek-R1 | 18.5 | 🥈 Silver |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |
### HiPhO Comprehensive Results
| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
|---|---|---|---|---|
| Overall Score | 32.5 | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |
## Generalization to STEM Tasks
Beyond physics, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and general STEM benchmarks, showing that the physics-focused training generalizes to broader reasoning tasks.
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench | 
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 | 
| P1-30B-A3B | 91.0 | 91.0 | 76.9 | 74.4 | 14.3 | 68.1 | 77.0 | 
## Usage
### Basic Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "P1-30B-A3B"  # use the full Hugging Face repo id if loading from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Physics problem solving
prompt = """Solve this physics problem:
A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.
Use g = 9.8 m/s²"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,  # generous budget for long chain-of-thought solutions
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```
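The snippet above feeds the raw prompt string directly to the tokenizer. For a Qwen3-Thinking-derived model, formatting the request with the tokenizer's chat template is usually preferable; a variant of the same example is sketched below, reusing the `tokenizer`, `model`, and `prompt` objects defined above (standard `transformers` APIs, not a prescribed P1 recipe):

```python
# Format the request with the chat template rather than a raw string.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn so generation starts cleanly
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
solution = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(solution)
```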
## Acknowledgements
We are grateful to the open-source community for their invaluable contributions. Special thanks to:
- Qwen3 - for providing the foundational base models that powered our research
- slime - for their efficient reinforcement learning framework that powered our training pipeline
- verl - for the versatile reinforcement learning framework that enabled our training pipeline
- sglang - for the efficient LLM serving and inference infrastructure (a serving sketch follows this list)
- Megatron-LM - for the large-scale model training framework
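For higher-throughput inference, the model can also be served with sglang behind an OpenAI-compatible endpoint. A minimal sketch, assuming local installs of `sglang` and the `openai` client and a locally available copy of the weights (the port and model path below are placeholders):

```python
# First launch the server in a shell, e.g.:
#   python -m sglang.launch_server --model-path P1-30B-A3B --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="P1-30B-A3B",
    messages=[{
        "role": "user",
        "content": "A pendulum of length L = 1.0 m swings with small amplitude. "
                   "Calculate the period of oscillation. Use g = 9.8 m/s².",
    }],
    temperature=0.6,
    top_p=0.9,
)
print(response.choices[0].message.content)
```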
## Citation
```bibtex
@misc{p1-2025,
    title={P1: Mastering Physics Olympiads with Reinforcement Learning},
    author={P1 Team},
    year={2025},
    url={https://prime-rl.github.io/P1/}
}
```