Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Abstract
NPR, a teacher-free framework, equips Large Language Models with native parallel reasoning through self-distilled training, Parallel-Aware Policy Optimization, and a robust NPR Engine, achieving accuracy gains of up to 24.5% and inference speedups of up to 4.6x.
We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR moves the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors SGLang's memory management and flow control to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups of up to 4.6x. Unlike prior baselines, which often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.
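The branch-and-join decoding pattern the abstract describes can be pictured with a minimal sketch. Everything below is a hypothetical illustration: the prompts, the `generate` stub, and the aggregation step are assumptions, not NPR's actual interface. It only shows why decoding independent branches concurrently, rather than emulating them sequentially, yields a wall-clock speedup.

```python
# Hypothetical branch-and-join decoding sketch; not NPR's real API.
import asyncio

async def generate(prompt: str) -> str:
    # Placeholder for a real async LLM call (e.g., an SGLang endpoint).
    await asyncio.sleep(0.1)
    return f"answer to: {prompt}"

async def solve_in_parallel(question: str) -> str:
    # 1) The model proposes independent sub-problems (the "branch" step).
    plan = await generate(f"Decompose into independent subtasks: {question}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # 2) Branches are decoded concurrently instead of one after another,
    #    which is where the reported inference speedup comes from.
    branch_results = await asyncio.gather(*(generate(t) for t in subtasks))

    # 3) A final "join" step reduces branch outputs into one answer,
    #    respecting the topological order (all branches before the join).
    return await generate(
        f"Combine these partial results to answer '{question}': {branch_results}"
    )

print(asyncio.run(solve_in_parallel("What is 17 * 23 + 5?")))
```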
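Similarly, one way to read "optimizes branching policies directly within the execution graph" is a group-normalized policy gradient (in the GRPO family) applied over whole sampled graphs. The toy below is an assumption, not the published PAPO objective; it only illustrates how a single scalar reward per graph could credit the tokens of all of that graph's parallel branches at once.

```python
# Assumed GRPO-style credit assignment over sampled execution graphs;
# this is a toy reading of PAPO, not the paper's actual objective.
import torch

def papo_style_loss(branch_logprobs, rewards):
    """branch_logprobs: one 1-D tensor per sampled execution graph, holding
    the token log-probs of all of that graph's parallel branches.
    rewards: one scalar reward per graph (e.g., final-answer correctness)."""
    # Group-normalized advantage, shared by every branch of the same graph.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # REINFORCE-style surrogate: raise all branch tokens of high-reward graphs.
    losses = [-a * lp.sum() for a, lp in zip(adv, branch_logprobs)]
    return torch.stack(losses).mean()

# Toy usage: three sampled graphs whose branch structures differ in length.
logps = [torch.randn(n, requires_grad=True) for n in (4, 7, 5)]
loss = papo_style_loss(logps, torch.tensor([1.0, 0.0, 1.0]))
loss.backward()  # gradients flow to every branch's token log-probs
```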
Community
The following similar papers were recommended by the Semantic Scholar API:
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning (2025)
- ORION: Teaching Language Models to Reason Efficiently in the Language of Thought (2025)
- Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning (2025)
- Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards (2025)
- A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning (2025)
- Tailored Primitive Initialization is the Secret Key to Reinforcement Learning (2025)
- SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning (2025)