---
language:
- en
- multilingual
tags:
- physics
- reinforcement-learning
- olympiad
- reasoning
- competition
- education
license: apache-2.0
pipeline_tag: text-generation
---
<div align="center">
<h1 style="font-size: 2em; font-weight: bold;">P1: Mastering Physics Olympiads with Reinforcement Learning</h1>
</div>
<p align="center">
<a href="https://prime-rl.github.io/P1/"><b>P1 Project Page</b></a> |
<a href="https://phyarena.github.io/"><b>HiPhO Leaderboard</b></a>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/PRIME-RL/P1/main/docs/imgs/Score_IPhO_2025_P1_v2.jpg" style="width: 800px" align=center>
</p>
<p align="center">
<i>High-performance mid-scale model for physics reasoning</i>
</p>

## Model Description

**P1-30B-A3B** is the mid-size variant of the P1 series of high-performance open-source language models specialized in physics reasoning. Built on *Qwen3-30B-A3B-Thinking-2507* and refined through multi-stage reinforcement learning on curated physics-competition data, it delivers strong olympiad-level results at moderate computational cost, making it practical for researchers working on physics problems.

### Key Highlights

- 🥈 **IPhO 2025 Silver-tier Performance**: Strong competitive showing at the International Physics Olympiad (18.5/30 points)
- 🥇 **HiPhO Excellence**: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests

## Performance Benchmarks

### IPhO 2025 Results
<div align="center">
| Model | Score | Medal |
|:-----:|:-----:|:-----:|
| **P1-30B-A3B** | **18.5** | **🥈 Silver** |
| DeepSeek-R1 | 18.5 | 🥈 Silver |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |
</div>

### HiPhO Comprehensive Results
<div align="center">
| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
| **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |
</div>

### Generalization to STEM Tasks

Beyond physics, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and general STEM benchmarks, indicating that physics-focused reinforcement learning generalizes to broader reasoning tasks.
<div align="center">
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
| **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** |
</div>

## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "P1-30B-A3B"  # Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Physics problem to solve
prompt = """Solve this physics problem:
A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.
Use g = 9.8 m/s²"""

# Format the prompt with the chat template expected by Qwen3-style models
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=81920,  # generous budget for long chain-of-thought reasoning
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens
solution = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(solution)
```
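As a quick sanity check on model outputs for prompts like the one above, the small-amplitude pendulum period has a closed form, T = 2π√(L/g) ≈ 2.007 s for L = 1.0 m and g = 9.8 m/s². A minimal helper for verifying answers (not part of the model's API):

```python
import math

def pendulum_period(length_m: float, g: float = 9.8) -> float:
    """Small-amplitude period of a simple pendulum: T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / g)

T = pendulum_period(1.0)
print(f"T = {T:.3f} s")  # ~2.007 s for L = 1.0 m, g = 9.8 m/s^2
```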

## Acknowledgements
We are grateful to the open-source community for their invaluable contributions. Special thanks to:
- **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research
- **[slime](https://github.com/THUDM/slime)** - for the efficient reinforcement learning framework that powered our training pipeline
- **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline
- **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure
- **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework
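Since sglang (acknowledged above) powered our inference stack, it can also serve P1 for higher-throughput use than the `transformers` loop in the Usage section. A minimal sketch; the model path, port, and request body are illustrative assumptions to adjust for your setup:

```shell
# Launch an OpenAI-compatible server (assumes sglang is installed and the
# model weights are available locally or on the Hugging Face Hub)
python -m sglang.launch_server --model-path P1-30B-A3B --port 30000

# Query the standard chat completions endpoint
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "P1-30B-A3B",
       "messages": [{"role": "user", "content": "Derive the period of a 1 m pendulum."}],
       "temperature": 0.6}'
```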

## Citation
```bibtex
@misc{p1-2025,
title={P1: Mastering Physics Olympiads with Reinforcement Learning},
author={P1 Team},
year={2025},
url={https://prime-rl.github.io/P1/}
}
```