Collections including paper arxiv:2508.10751

-
  - Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
    Paper • 2407.20798 • Published • 24
  - Offline Reinforcement Learning for LLM Multi-Step Reasoning
    Paper • 2412.16145 • Published • 38
  - REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
    Paper • 2501.03262 • Published • 102
  - SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
    Paper • 2502.18449 • Published • 75
-
  - Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
    Paper • 2508.07629 • Published • 41
  - Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
    Paper • 2508.07101 • Published • 13
  - Compressing Chain-of-Thought in LLMs via Step Entropy
    Paper • 2508.03346 • Published • 7
  - Train Long, Think Short: Curriculum Learning for Efficient Reasoning
    Paper • 2508.08940 • Published • 26
-
  - lusxvr/nanoVLM-222M
    Image-Text-to-Text • 0.2B • Updated • 249 • 96
  - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
    Paper • 2503.09516 • Published • 36
  - AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
    Paper • 2505.24863 • Published • 97
  - QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
    Paper • 2505.17667 • Published • 88
-
  - Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
    Paper • 2105.09501 • Published
  - Cross-modal Contrastive Learning for Speech Translation
    Paper • 2205.02444 • Published
  - ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
    Paper • 2210.03052 • Published
  - Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
    Paper • 2212.10240 • Published • 1
-
  - Adapting Vision-Language Models Without Labels: A Comprehensive Survey
    Paper • 2508.05547 • Published • 11
  - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
    Paper • 2508.10751 • Published • 28
  - SSRL: Self-Search Reinforcement Learning
    Paper • 2508.10874 • Published • 94
  - Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
    Paper • 2508.12040 • Published • 14
-
  - Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
    Paper • 2508.10751 • Published • 28
  - Reinforcement Pre-Training
    Paper • 2506.08007 • Published • 262
  - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
    Paper • 2508.14704 • Published • 42
  - AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
    Paper • 2508.16153 • Published • 153
-
  - Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
    Paper • 2506.06395 • Published • 131
  - Magistral
    Paper • 2506.10910 • Published • 64
  - Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
    Paper • 2506.07240 • Published • 7
  - Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
    Paper • 2506.09991 • Published • 55
-
  - Let LLMs Break Free from Overthinking via Self-Braking Tuning
    Paper • 2505.14604 • Published • 23
  - AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
    Paper • 2505.16944 • Published • 8
  - Training Step-Level Reasoning Verifiers with Formal Verification Tools
    Paper • 2505.15960 • Published • 7
  - The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
    Paper • 2505.15134 • Published • 6
-
  - OpenAI o1 System Card
    Paper • 2412.16720 • Published • 35
  - LearnLM: Improving Gemini for Learning
    Paper • 2412.16429 • Published • 22
  - NILE: Internal Consistency Alignment in Large Language Models
    Paper • 2412.16686 • Published • 8
  - Offline Reinforcement Learning for LLM Multi-Step Reasoning
    Paper • 2412.16145 • Published • 38