Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model • Paper • arXiv:2510.18855 • Published Oct 2025 • 60 upvotes
Ovis2.5 • Collection • Our next-generation MLLMs for native-resolution vision and advanced reasoning • 5 items • Updated Aug 19, 2025 • 55 upvotes
SYNTHETIC-1 • Collection • A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 21 days ago • 63 upvotes
Quartet: Native FP4 Training Can Be Optimal for Large Language Models • Paper • arXiv:2505.14669 • Published May 20, 2025 • 77 upvotes
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning • Paper • arXiv:2504.07128 • Published Apr 2, 2025 • 86 upvotes
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens • Paper • arXiv:2504.07096 • Published Apr 9, 2025 • 76 upvotes
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks • Paper • arXiv:2502.08235 • Published Feb 12, 2025 • 58 upvotes
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU • Paper • arXiv:2502.08910 • Published Feb 13, 2025 • 148 upvotes
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models • Paper • arXiv:2410.07985 • Published Oct 10, 2024 • 33 upvotes
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking • Paper • arXiv:2502.02339 • Published Feb 4, 2025 • 22 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • arXiv:2502.02737 • Published Feb 4, 2025 • 243 upvotes
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models • Paper • arXiv:2412.11605 • Published Dec 16, 2024 • 18 upvotes
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training • Paper • arXiv:2501.18511 • Published Jan 30, 2025 • 20 upvotes