	---
	language:
	- en
	- multilingual
	tags:
	- physics
	- reinforcement-learning
	- olympiad
	- reasoning
	- competition
	- education
	license: apache-2.0
	pipeline_tag: text-generation
	---

	<div align="center">
	<h1 style="font-size: 2em; font-weight: bold;">P1: Mastering Physics Olympiads with Reinforcement Learning</h1>
	</div>

	<p align="center">
	<a href="https://prime-rl.github.io/P1/"><b>🌐 P1 Project Page</b></a> |
	<a href="https://phyarena.github.io/"><b>🏆 HiPhO Leaderboard</b></a>
	</p>

	<p align="center">
	<img src="https://raw.githubusercontent.com/PRIME-RL/P1/main/docs/imgs/Score_IPhO_2025_P1_v2.jpg" style="width: 800px" align=center>
	</p>

	<p align="center">
	<i>High-performance mid-scale model for physics reasoning</i>
	</p>

	## Model Description

	P1-30B-A3B is the mid-size variant of the P1 series, a family of open-source language models specialized in physics reasoning. It is built on Qwen3-30B-A3B-Thinking-2507 and refined through multi-stage reinforcement learning on curated physics competition data. The model reaches silver-medal level on IPhO 2025 (18.5/30 points) while keeping compute requirements moderate, which makes it practical for researchers working on physics problems.

	### Key Highlights

	- 🥈 IPhO 2025 silver-medal performance: 18.5/30 points on the International Physics Olympiad 2025, level with DeepSeek-R1
	- 🥇 HiPhO results: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests


	## Performance Benchmarks

	### IPhO 2025 Results

	<div align="center">

	| Model | Score | Medal |
	|:-----:|:-----:|:-----:|
	| P1-30B-A3B | 18.5 | 🥈 Silver |
	| DeepSeek-R1 | 18.5 | 🥈 Silver |
	| Qwen3-235B-A22B-Thinking-2507 | 17.1 | 🥈 Silver |
	| Qwen3-30B-A3B-Thinking-2507 | 15.6 | 🥈 Silver |

	</div>

	### HiPhO Comprehensive Results

	<div align="center">

	| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
	|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
	| Overall Score | 32.5 | 33.5 | 32.9 | 29.9 |
	| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
	| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
	| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
	| Total Contests | 13 | 13 | 13 | 13 |

	</div>

	### Generalization to STEM Tasks

	The gains from physics training carry over to other domains. As shown below, P1-30B-A3B outperforms its base model, Qwen3-30B-A3B-Thinking-2507, on all seven math, coding, and general STEM benchmarks, with the largest improvements on AIME25 (+6.0) and HMMT (+5.6).

	<div align="center">

	| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
	|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
	| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
	| P1-30B-A3B | 91.0 | 91.0 | 76.9 | 74.4 | 14.3 | 68.1 | 77.0 |

	</div>
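
The per-benchmark gains over the base model can be tallied with a few lines of Python. This is only a convenience sketch: the scores are copied from the table above, and the variable names are illustrative.

```python
# Scores copied from the table above as (base model, P1-30B-A3B).
scores = {
    "AIME24": (90.4, 91.0),
    "AIME25": (85.0, 91.0),
    "HMMT": (71.3, 76.9),
    "GPQA": (73.0, 74.4),
    "HLE": (11.6, 14.3),
    "LiveCodeBench": (66.7, 68.1),
    "LiveBench": (76.6, 77.0),
}

# Gain of P1-30B-A3B over its base model, rounded to the table's precision.
gains = {name: round(p1 - base, 1) for name, (base, p1) in scores.items()}

for name, gain in gains.items():
    print(f"{name:<14}{gain:+.1f}")

mean_gain = sum(gains.values()) / len(gains)
print(f"Mean gain: {mean_gain:+.2f}")
```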


	## Usage

	### Basic Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer.
# Replace with the full Hugging Face Hub repo ID or a local checkpoint path.
model_name = "P1-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # bf16 weights to reduce memory use
    device_map="auto",           # place layers on the available devices
    trust_remote_code=True,
)

# Physics problem solving
prompt = """Solve this physics problem:

A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.

Use g = 9.8 m/s²"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=81920,  # upper bound on prompt plus generated tokens
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)

# The decoded text contains the prompt followed by the generated solution.
solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```
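
The example prompt has a closed-form answer, the small-angle period $T = 2\pi\sqrt{L/g}$, so the generated solution can be sanity-checked. The sketch below computes that reference value and separates the reasoning trace from the final answer. It assumes P1-30B-A3B keeps the output format of its Qwen3-Thinking base, where the reasoning ends with a `</think>` tag; `pendulum_period` and `split_reasoning` are hypothetical helper names, not part of any P1 or Transformers API.

```python
import math


def pendulum_period(length_m: float, g: float = 9.8) -> float:
    """Small-angle period of a simple pendulum: T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / g)


def split_reasoning(text: str, tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    Assumption: the reasoning trace ends with `tag`. If the tag is absent,
    the whole text is treated as the final answer.
    """
    reasoning, sep, answer = text.rpartition(tag)
    if not sep:
        return "", text.strip()
    return reasoning.strip(), answer.strip()


reference = pendulum_period(1.0)
print(f"Reference period: {reference:.3f} s")

# Made-up stand-in for a decoded model output, for illustration only.
sample_output = "T = 2*pi*sqrt(L/g) with L = 1.0 m ...</think>The period is about 2.01 s."
reasoning, answer = split_reasoning(sample_output)
print(answer)
```

In practice, `sample_output` would be replaced by the `solution` string from the inference example, and the final answer compared against `reference` by hand or with a tolerance check.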

	## 🙏 Acknowledgements

	We are grateful to the open-source community for their invaluable contributions. Special thanks to:

	- [Qwen3](https://huggingface.co/collections/Qwen/qwen3) - for providing the foundational base models that powered our research
	- [slime](https://github.com/THUDM/slime) - for the efficient reinforcement learning framework that powered our training pipeline
	- [verl](https://github.com/volcengine/verl) - for the versatile reinforcement learning framework that enabled our training pipeline
	- [sglang](https://github.com/sgl-project/sglang) - for the efficient LLM serving and inference infrastructure
	- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) - for the large-scale model training framework

	## Citation

	```bibtex
	@misc{p1-2025,
	title={P1: Mastering Physics Olympiads with Reinforcement Learning},
	author={P1 Team},
	year={2025},
	url={https://prime-rl.github.io/P1/}
	}
	```
