kaicheng001

https://kaicheng001.github.io/

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 22 days ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published 23 days ago • 255

upvoted a paper about 1 month ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 110

upvoted a paper 2 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 254

upvoted 3 papers 3 months ago

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 177

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 306

upvoted 2 papers 4 months ago

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 257

A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 92

upvoted an article 4 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8 • 707

upvoted a paper 5 months ago

Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 37

upvoted 3 papers 6 months ago

Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 185

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185

upvoted an article 6 months ago

I trained a Language Model to schedule events with GRPO!

Article • Apr 29 • 90

upvoted a paper 6 months ago

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 48

upvoted an article 7 months ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 947

upvoted 2 papers 7 months ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published Apr 11 • 130

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published Apr 2 • 86

upvoted 2 papers 8 months ago

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 169

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 88
