---
language:
- en
- multilingual
tags:
- physics
- reinforcement-learning
- olympiad
- reasoning
- competition
- education
license: apache-2.0
pipeline_tag: text-generation
---

# P1: Mastering Physics Olympiads with Reinforcement Learning

🌐 P1 Project Page | 🏆 HiPhO Leaderboard

High-performance mid-scale model for physics reasoning

## Model Description

**P1-30B-A3B** is the mid-size variant of the P1 series, a high-performance open-source language model specialized in physics reasoning. Built on *Qwen3-30B-A3B-Thinking-2507* and refined through multi-stage reinforcement learning on curated physics competition data, P1-30B-A3B reaches silver-medal performance on IPhO 2025 while keeping computational requirements moderate, making it accessible to researchers working on physics problems.

### Key Highlights

- 🥈 **IPhO 2025 Silver-tier Performance**: Scores 18.5/30 points on the International Physics Olympiad 2025, a silver-medal result
- 🥇 **HiPhO Excellence**: 8 gold medals, 4 silver medals, and 1 bronze medal across 13 physics contests

## Performance Benchmarks

### IPhO 2025 Results

| Model | Score | Medal |
|:-----:|:-----:|:-----:|
| **P1-30B-A3B** | **18.5** | **🥈 Silver** |
| DeepSeek-R1 | 18.5 | **🥈 Silver** |
| Qwen3-235B-A22B-Thinking-2507 | 17.1 | **🥈 Silver** |
| Qwen3-30B-A3B-Thinking-2507 | 15.6 | **🥈 Silver** |

### HiPhO Comprehensive Results
| Category | P1-30B-A3B | Qwen3-235B-A22B | DeepSeek-R1 | Qwen3-30B-A3B (Base) |
|:--------:|:----------:|:---------------:|:-----------:|:--------------------:|
| **Overall Score** | **32.5** | 33.5 | 32.9 | 29.9 |
| Gold Medals (🥇) | 8 | 10 | 9 | 6 |
| Silver Medals (🥈) | 4 | 3 | 3 | 6 |
| Bronze Medals (🥉) | 1 | 0 | 1 | 1 |
| Total Contests | 13 | 13 | 13 | 13 |

### Generalization to STEM Tasks

The gains from P1 training extend beyond physics. As shown below, P1-30B-A3B outperforms its base model, Qwen3-30B-A3B-Thinking-2507, on every listed math, coding, and general STEM benchmark, which suggests that the physics-focused training generalizes to other reasoning tasks.

| Model | AIME24 | AIME25 | HMMT | GPQA | HLE | LiveCodeBench | LiveBench |
|:-----:|:------:|:------:|:----:|:----:|:---:|:-------------:|:---------:|
| Qwen3-30B-A3B-Thinking-2507 (Base) | 90.4 | 85.0 | 71.3 | 73.0 | 11.6 | 66.7 | 76.6 |
| **P1-30B-A3B** | **91.0** | **91.0** | **76.9** | **74.4** | **14.3** | **68.1** | **77.0** |

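
To make the comparison concrete, the per-benchmark improvement over the base model can be computed directly from the table. The snippet below is a minimal sketch: the scores are copied from the table above, while the names `base`, `p1`, and `score_gains` are illustrative and not part of any released P1 tooling.

```python
# Scores copied from the generalization table (benchmark -> score).
base = {"AIME24": 90.4, "AIME25": 85.0, "HMMT": 71.3, "GPQA": 73.0,
        "HLE": 11.6, "LiveCodeBench": 66.7, "LiveBench": 76.6}
p1 = {"AIME24": 91.0, "AIME25": 91.0, "HMMT": 76.9, "GPQA": 74.4,
      "HLE": 14.3, "LiveCodeBench": 68.1, "LiveBench": 77.0}


def score_gains(tuned, reference):
    """Return the per-benchmark difference (tuned - reference), rounded to 0.1."""
    return {name: round(tuned[name] - reference[name], 1) for name in reference}


gains = score_gains(p1, base)

# Print one line per benchmark with a signed difference.
for name, delta in gains.items():
    print(f"{name:<14} {delta:+.1f}")
```

With these numbers, every difference is positive; the largest gain is on AIME25 (+6.0) and the smallest on LiveBench (+0.4).
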
## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer.
# If loading fails, replace model_name with the full Hub repo ID or a local path.
model_name = "P1-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
    device_map="auto",           # place layers on the available devices
    trust_remote_code=True
)

# Physics problem solving
prompt = """Solve this physics problem:

A pendulum of length L = 1.0 m swings with small amplitude.
Calculate the period of oscillation and explain your reasoning.
Use g = 9.8 m/s²"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a solution; max_length counts prompt tokens plus generated tokens
outputs = model.generate(
    **inputs,
    max_length=81920,
    temperature=0.6,
    top_p=0.9,
    do_sample=True
)

solution = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(solution)
```

## 🙏 Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

- **[Qwen3](https://huggingface.co/collections/Qwen/qwen3)** - for providing the foundational base models that powered our research
- **[slime](https://github.com/THUDM/slime)** - for the efficient reinforcement learning framework that powered our training pipeline
- **[verl](https://github.com/volcengine/verl)** - for the versatile reinforcement learning framework that enabled our training pipeline
- **[sglang](https://github.com/sgl-project/sglang)** - for the efficient LLM serving and inference infrastructure
- **[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)** - for the large-scale model training framework

## Citation

```bibtex
@misc{p1-2025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
```