유영준

colin31472

AI & ML interests

None yet

Organizations

None yet

Recent Activity

liked a dataset 13 days ago

hkust-nlp/SimpleRL-Zoo-Data

Viewer • Updated Mar 25 • 53.1k • 528 • 8

upvoted a paper 18 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published 23 days ago • 455

upvoted 3 papers about 1 month ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 185

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17 • 30

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10 • 183

upvoted 6 papers 3 months ago

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Paper • 2410.08146 • Published Oct 10, 2024 • 1

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Paper • 2410.01679 • Published Oct 2, 2024 • 27

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 141

AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 306

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 130

upvoted 3 papers 4 months ago

First Return, Entropy-Eliciting Explore

Paper • 2507.07017 • Published Jul 9 • 23

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 267

Magistral

Paper • 2506.10910 • Published Jun 12 • 64

upvoted a paper 5 months ago

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.09250 • Published Jun 10 • 27

liked a dataset 5 months ago

Zigeng/CoT-Verification-340k

Viewer • Updated May 27 • 340k • 297 • 5

upvoted 3 papers 5 months ago

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 150

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 138

liked a model 5 months ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Aug 12 • 13.6k • 222