zhuww's Collections

RL

updated 15 days ago

  • Large Reasoning Models Learn Better Alignment from Flawed Thinking

    Paper • 2510.00938 • Published 25 days ago • 56

  • What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

    Paper • 2509.19284 • Published Sep 23 • 22

  • Learning to Reason as Action Abstractions with Scalable Mid-Training RL

    Paper • 2509.25810 • Published 27 days ago • 5

  • Agent Learning via Early Experience

    Paper • 2510.08558 • Published 17 days ago • 241