Collections

Collections including paper arxiv:2505.02387

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Skywork Open Reasoner 1 Technical Report

Paper • 2505.22312 • Published May 28 • 54
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

Paper • 2505.21191 • Published May 27 • 3
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 185
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 305

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76
Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 37
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 305
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published Apr 29 • 92
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78

RM-R1: Reward Modeling as Reasoning

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78
gaotang/RM-R1-Entire-RLVR-Train

Viewer • Updated May 20 • 73k • 81 • 2
gaotang/RM-R1-Reasoning-RLVR

Viewer • Updated May 20 • 73k • 37
gaotang/RM-R1-Distill-SFT

Viewer • Updated May 20 • 8.75k • 39 • 2

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 249 • 96
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Paper • 2505.04842 • Published May 7 • 12
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper • 2505.04588 • Published May 7 • 65
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30 • 59
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Paper • 2505.01441 • Published Apr 28 • 39

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Paper • 2504.20157 • Published Apr 28 • 37
The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 71
ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29 • 53
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78

TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 120
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 78
