P1: Mastering Physics Olympiads with Reinforcement Learning

🌐 P1 Project Page | 🏆 HiPhO Leaderboard

High-performance mid-scale model for physics reasoning

Model Description

P1-30B-A3B is the mid-size variant of the P1 series, a family of high-performance open-source language models specialized in physics reasoning. Built on Qwen3-30B-A3B-Thinking-2507 and refined through multi-stage reinforcement learning on curated physics competition data, it achieves strong results while keeping computational requirements modest, making it accessible to researchers working on physics problems.
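
The full training recipe is described on the project page. Purely as an illustration of the verifiable-reward idea behind RL pipelines of this kind (a hypothetical sketch, not the P1 team's released code; `numeric_reward` is an invented helper), a rule-based reward for physics answers might compare the model's final number against a reference within a tolerance:

import re

def numeric_reward(completion: str, reference: float, rel_tol: float = 1e-2) -> float:
    # Hypothetical rule-based reward, NOT the P1 team's released code:
    # extract the last number in the completion and grant reward 1.0 if it
    # matches the reference answer within a relative tolerance.
    numbers = re.findall(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?", completion)
    if not numbers:
        return 0.0
    value = float(numbers[-1])
    return 1.0 if abs(value - reference) <= rel_tol * abs(reference) else 0.0

# Example: a completion ending in "T = 2.01 s" scores 1.0 against reference 2.007
assert numeric_reward("... therefore T = 2.01 s", 2.007) == 1.0

Real competition grading is richer than a single number (partial credit, symbolic answers, multi-part problems), so this is only the simplest possible shape of such a reward.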

Key Highlights

  • 🥈 IPhO 2025 Silver-tier Performance: strong competitive showing at the International Physics Olympiad (18.5/30 points)
  • 🥇 HiPhO Excellence: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests

Performance Benchmarks

IPhO 2025 Results

| Model | Score (/30) | Medal |
| --- | --- | --- |
| P1-30B-A3B | 18.5 | 🥈 Silver |
| DeepSeek-R1 | 18.5 | 🥈 Silver |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |

HiPhO Comprehensive Results

| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
| --- | --- | --- | --- | --- |
| Overall Score | 32.5 | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |

Generalization to STEM Tasks

Beyond physics, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and general STEM benchmarks, demonstrating that physics-focused training generalizes to broader reasoning tasks.

| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
| P1-30B-A3B | 91.0 | 91.0 | 76.9 | 74.4 | 14.3 | 68.1 | 77.0 |

Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer from the Hugging Face Hub
model_name = "PRIME-RL/P1-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Physics problem solving
prompt = """Solve this physics problem:

A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.

Use g = 9.8 m/s²"""

# Format the prompt as a chat turn so the thinking model sees its expected template
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=81920,  # thinking models need a large output budget
    temperature=0.6,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens
solution = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(solution)
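
Because the model inherits Qwen3's thinking format, the decoded output contains a chain of thought terminated by </think> before the final answer. A minimal sketch for splitting the two, assuming the tokenizer follows the Qwen3 convention in which </think> has token id 151668:

# Separate the chain of thought from the final answer.
# Assumes the Qwen3 tokenizer convention: </think> is token id 151668.
output_ids = outputs[0][inputs["input_ids"].shape[-1]:].tolist()
try:
    # Index just past the last </think> token
    split = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    split = 0  # no </think> found; treat everything as the answer
thinking = tokenizer.decode(output_ids[:split], skip_special_tokens=True)
answer = tokenizer.decode(output_ids[split:], skip_special_tokens=True)
print(answer)

For the pendulum example above, the final answer should land near T = 2π√(L/g) = 2π√(1.0 / 9.8) ≈ 2.0 s.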

πŸ™ Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

  • Qwen3 - for providing the foundational base models that powered our research
  • slime - for their efficient reinforcement learning framework that powered our training pipeline
  • verl - for the versatile reinforcement learning framework that enabled our training pipeline
  • sglang - for the efficient LLM serving and inference infrastructure
  • Megatron-LM - for the large-scale model training framework

Citation

@misc{p1-2025,
    title={P1: Mastering Physics Olympiads with Reinforcement Learning},
    author={P1 Team},
    year={2025},
    url={https://prime-rl.github.io/P1/}
}