The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9 • 40
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8 • 30
RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization Paper • 2510.02172 • Published Oct 2 • 7
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks Paper • 2509.25671 • Published Sep 30 • 6
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published May 15 • 24
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning Paper • 2505.02363 • Published May 5 • 7
Certified Mitigation of Worst-Case LLM Copyright Infringement Paper • 2504.16046 • Published Apr 22 • 13