---
language:
- en
- multilingual
tags:
- physics
- reinforcement-learning
- olympiad
- reasoning
- competition
- education
license: apache-2.0
pipeline_tag: text-generation
---
<div align="center">
<h1 style="font-size: 2em; font-weight: bold;">P1: Mastering Physics Olympiads with Reinforcement Learning</h1>
</div>
<p align="center">
<a href="https://prime-rl.github.io/P1/"><b>P1 Project Page</b></a> |
<a href="https://phyarena.github.io/"><b>HiPhO Leaderboard</b></a>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/PRIME-RL/P1/main/docs/imgs/Score_IPhO_2025_P1_v2.jpg" style="width: 800px" align=center>
</p>
<p align="center">
<i>High-performance mid-scale model for physics reasoning</i>
</p>

## Model Description

**P1-30B-A3B** is the mid-size variant of the P1 series of high-performance open-source language models specialized in physics reasoning. Built on *Qwen3-30B-A3B-Thinking-2507* and refined through multi-stage reinforcement learning on curated physics-competition data, it delivers strong olympiad-level results at moderate computational cost, making it practical for researchers working on physics problems.

### Key Highlights

- 🥈 **IPhO 2025 Silver-tier Performance**: Strong competitive showing at the International Physics Olympiad (18.5/30 points)
- 🥇 **HiPhO Excellence**: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests

## Performance Benchmarks

### IPhO 2025 Results
<div align="center">
| Model | Score | Medal |
|:-----:|:-----:|:-----:|
| **P1-30B-A3B** | **18.5** | **🥈 Silver** |
| DeepSeek-R1 | 18.5 | 🥈 Silver |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |
</div>

### HiPhO Comprehensive Results
<div align="center">
| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
| **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |
</div>

### Generalization to STEM Tasks

Beyond physics, P1 improves across multiple domains. As shown below, P1-30B-A3B outperforms its base model Qwen3-30B-A3B-Thinking-2507 on math, coding, and general STEM benchmarks, indicating that physics-focused reinforcement learning generalizes to broader reasoning tasks.
<div align="center">
| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
| **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** |
</div>

## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "P1-30B-A3B"  # Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Physics problem to solve
prompt = """Solve this physics problem:
A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.
Use g = 9.8 m/s²"""

# Format the prompt with the chat template expected by Qwen3-style models
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=81920,  # generous budget for long chain-of-thought reasoning
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens
solution = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(solution)
```
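As a quick sanity check on model outputs for prompts like the one above, the small-amplitude pendulum period has a closed form, T = 2π√(L/g) ≈ 2.007 s for L = 1.0 m and g = 9.8 m/s². A minimal helper for verifying answers (not part of the model's API):

```python
import math

def pendulum_period(length_m: float, g: float = 9.8) -> float:
    """Small-amplitude period of a simple pendulum: T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / g)

T = pendulum_period(1.0)
print(f"T = {T:.3f} s")  # ~2.007 s for L = 1.0 m, g = 9.8 m/s^2
```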

## Acknowledgements
We are grateful to the open-source community for their invaluable contributions. Special thanks to:
- **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research
- **[slime](https://github.com/THUDM/slime)** - for the efficient reinforcement learning framework that powered our training pipeline
- **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline
- **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure
- **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework
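Since sglang (acknowledged above) powered our inference stack, it can also serve P1 for higher-throughput use than the `transformers` loop in the Usage section. A minimal sketch; the model path, port, and request body are illustrative assumptions to adjust for your setup:

```shell
# Launch an OpenAI-compatible server (assumes sglang is installed and the
# model weights are available locally or on the Hugging Face Hub)
python -m sglang.launch_server --model-path P1-30B-A3B --port 30000

# Query the standard chat completions endpoint
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "P1-30B-A3B",
       "messages": [{"role": "user", "content": "Derive the period of a 1 m pendulum."}],
       "temperature": 0.6}'
```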

## Citation
```bibtex
@misc{p1-2025,
title={P1: Mastering Physics Olympiads with Reinforcement Learning},
author={P1 Team},
year={2025},
url={https://prime-rl.github.io/P1/}
}
```