arxiv:2509.02534
Tianjian Li
dogtooth
AI & ML interests
None yet
Recent Activity
upvoted a paper 17 days ago
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
upvoted a paper 17 days ago
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
upvoted a paper 24 days ago
RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization