|
|
--- |
|
|
language: |
|
|
- en |
|
|
- multilingual |
|
|
tags: |
|
|
- physics |
|
|
- reinforcement-learning |
|
|
- olympiad |
|
|
- reasoning |
|
|
- competition |
|
|
- education |
|
|
license: apache-2.0 |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
<h1 style="font-size: 2em; font-weight: bold;">P1: Mastering Physics Olympiads with Reinforcement Learning</h1> |
|
|
</div> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://prime-rl.github.io/P1/"><b>π P1 Project Page</b></a> | |
|
|
<a href="https://phyarena.github.io/"><b>π HiPhO Leaderboard</b></a> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://raw.githubusercontent.com/PRIME-RL/P1/main/docs/imgs/Score_IPhO_2025_P1_v2.jpg" style="width: 800px" align=center> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<i>High-performance mid-scale model for physics reasoning</i> |
|
|
</p> |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**P1-30B-A3B** is the mid-size variant of the P1 series, a high-performance open-source language model specialized in physics reasoning. Built on *Qwen3-30B-A3B-Thinking-2507* and refined through multi-stage reinforcement learning on curated physics competition data, P1-30B-A3B achieves impressive results while maintaining reasonable computational requirements, making it accessible for researchers working with physics problems. |
|
|
|
|
|
### Key Highlights |
|
|
|
|
|
- π₯ **IPhO 2025 Silver-tier Performance**: Strong competitive showing at international physics olympiad (18.5/30 points) |
|
|
- π₯ **HiPhO Excellence**: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests |
|
|
|
|
|
|
|
|
## Performance Benchmarks |
|
|
|
|
|
### IPhO 2025 Results |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
| Model | Score | Medal | |
|
|
|:-----:|:-----:|:-----:| |
|
|
| **P1-30B-A3B** | **18.5** | **π₯ Silver** | |
|
|
| DeepSeek-R1 | 18.5 | **π₯ Silver** | |
|
|
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | **π₯ Silver** | |
|
|
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | **π₯ Silver** | |
|
|
</div> |
|
|
|
|
|
### HiPhO Comprehensive Results |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) | |
|
|
|:--------:|:----------:|:---------------:|:-----------:|:--------------------:| |
|
|
| **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 | |
|
|
| Gold Medals (π₯) | 8 | 10 | 9 | 6 | |
|
|
| Silver Medals (π₯) | 4 | 3 | 3 | 6 | |
|
|
| Bronze Medals (π₯) | 1 | 0 | 1 | 1 | |
|
|
| Total Contests | 13 | 13 | 13 | 13 | |
|
|
|
|
|
</div> |
|
|
|
|
|
### Generalization to STEM Tasks |
|
|
|
|
|
Beyond physics reasoning, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and STEM benchmarks, demonstrating strong generalization of physics reasoning. |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench | |
|
|
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:| |
|
|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 | |
|
|
| **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** | |
|
|
|
|
|
</div> |
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Basic Inference |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
# Load model and tokenizer |
|
|
model_name = "P1-30B-A3B" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto", |
|
|
trust_remote_code=True |
|
|
) |
|
|
|
|
|
# Physics problem solving |
|
|
prompt = """Solve this physics problem: |
|
|
|
|
|
A pendulum of length L = 1.0 m swings with small amplitude. |
|
|
Calculate the period of oscillation and explain your reasoning. |
|
|
|
|
|
Use g = 9.8 m/sΒ²""" |
|
|
|
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_length=81920, |
|
|
temperature=0.6, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
solution = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
print(solution) |
|
|
``` |
|
|
|
|
|
## π Acknowledgements |
|
|
|
|
|
We are grateful to the open-source community for their invaluable contributions. Special thanks to: |
|
|
|
|
|
- **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research |
|
|
- **[slime](https://github.com/THUDM/slime)** - for their innovative work on efficient reinforcement learning framework that powered our training pipeline |
|
|
- **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline |
|
|
- **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure |
|
|
- **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{p1-2025, |
|
|
title={P1: Mastering Physics Olympiads with Reinforcement Learning}, |
|
|
author={P1 Team}, |
|
|
year={2025}, |
|
|
url={https://prime-rl.github.io/P1/} |
|
|
} |
|
|
``` |
|
|
|
|
|
</div> |
|
|
|