dkkloimwieder's Collections
Mdl

updated Jun 3

  • S*: Test Time Scaling for Code Generation

    Paper • 2502.14382 • Published Feb 20 • 63

  • S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

    Paper • 2502.12853 • Published Feb 18 • 29

  • rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

    Paper • 2501.04519 • Published Jan 8 • 285

  • Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

    Paper • 2502.02508 • Published Feb 4 • 23

  • Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

    Paper • 2503.24290 • Published Mar 31 • 62

  • SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

    Paper • 2504.08600 • Published Apr 11 • 31

  • ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

    Paper • 2504.11536 • Published Apr 15 • 62

  • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Paper • 2506.01939 • Published Jun 2 • 185