arxiv:2503.15478
							
						Song Jiang
songjiang
		AI & ML interests
None yet
		Recent Activity
						upvoted a paper 21 days ago
						SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
						upvoted a paper 29 days ago
						Large Reasoning Models Learn Better Alignment from Flawed Thinking
						authored a paper 8 months ago
						SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
						Organizations
None yet