neutrino12's Collections
			 
		
			
				
				
	
	
	
				Snowflake/Arctic-Text2SQL-R1-7B
				
				
			
		
				8B
			• 
	
				Updated
					
				
				• 
					
					4.52k
				
	
				
• 
					
					51
				
  
		
	
	
	 
	
	
	
			
			Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
		
			Paper
			
•
			2505.24726
			
•
			Published
				
			•
				
				274
			
 
	
	 
	
	
	
			
			Reinforcement Pre-Training
		
			Paper
			
•
			2506.08007
			
•
			Published
				
			•
				
				262
			
 
	
	 
	
	
	
			
			Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
		
			Paper
			
•
			2506.16406
			
•
			Published
				
			•
				
				126
			
 
	
	 
	
	
	
			
			Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
		
			Paper
			
•
			2506.06395
			
•
			Published
				
			•
				
				131
			
 
	
	 
	
	
	
		
			Paper
			
•
			2505.09388
			
•
			Published
				
			•
				
				308
			
 
	
	 
	
	
	
			
			QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
		
			Paper
			
•
			2505.17667
			
•
			Published
				
			•
				
				88
			
 
	
	 
	
	
	
			
			Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
		
			Paper
			
•
			2507.16784
			
•
			Published
				
			•
				
				120
			
 
	
	 
	
	
	
			
			Group Sequence Policy Optimization
		
			Paper
			
•
			2507.18071
			
•
			Published
				
			•
				
				306
			
 
	
	 
	
	
	
			
			∇NABLA: Neighborhood Adaptive Block-Level Attention
		
			Paper
			
•
			2507.13546
			
•
			Published
				
			•
				
				123
			
 
	
	 
	
	
	
		
			Paper
			
•
			2507.15493
			
•
			Published
				
			•
				
				47
			
 
	
	 
	
	
	
			
			MUR: Momentum Uncertainty guided Reasoning for Large Language Models
		
			Paper
			
•
			2507.14958
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
		
			Paper
			
•
			2507.15758
			
•
			Published
				
			•
				
				35
			
 
	
	 
	
	
	
			
			Complex Logical Instruction Generation
		
			Paper
			
•
			2508.09125
			
•
			Published
				
			•
				
				39
			
 
	
	 
	
	
	
			
			SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
		
			Paper
			
•
			2508.05305
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts
		
			Paper
			
•
			2508.09848
			
•
			Published
				
			•
				
				67
			
 
	
	 
	
	
	
			
			Train Long, Think Short: Curriculum Learning for Efficient Reasoning
		
			Paper
			
•
			2508.08940
			
•
			Published
				
			•
				
				26
			
 
	
	 
	
	
	
			
			Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment
		
			Paper
			
•
			2508.07750
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal
		
			Paper
			
•
			2508.05988
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
		
			Paper
			
•
			2508.07101
			
•
			Published
				
			•
				
				13
			
 
	
	 
	
	
	
			
			Compressing Chain-of-Thought in LLMs via Step Entropy
		
			Paper
			
•
			2508.03346
			
•
			Published
				
			•
				
				7
			
 
	
	 
	
	
	
			
			Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
		
			Paper
			
•
			2508.09726
			
•
			Published
				
			•
				
				14
			
 
	
	 
	
	
	
			
			Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
		
			Paper
			
•
			2508.01191
			
•
			Published
				
			•
				
				236
			
 
	
	 
	
	
	
			
			On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
		
			Paper
			
•
			2508.05629
			
•
			Published
				
			•
				
				178
			
 
	
	 
	
	
	
			
			R-Zero: Self-Evolving Reasoning LLM from Zero Data
		
			Paper
			
•
			2508.05004
			
•
			Published
				
			•
				
				126
			
 
	
	 
	
	
	
			
			Story2Board: A Training-Free Approach for Expressive Storyboard Generation
		
			Paper
			
•
			2508.09983
			
•
			Published
				
			•
				
				68
			
 
	
	 
	
	
	
			
			Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
		
			Paper
			
•
			2508.02120
			
•
			Published
				
			•
				
				19
			
 
	
	 
	
	
	
			
			Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following
		
			Paper
			
•
			2508.02150
			
•
			Published
				
			•
				
				36
			
 
	
	 
	
	
	
			
			Trainable Dynamic Mask Sparse Attention
		
			Paper
			
•
			2508.02124
			
•
			Published
				
			•
				
				17
			
 
	
	 
	
	
	
			
			Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
		
			Paper
			
•
			2508.04017
			
•
			Published
				
			•
				
				11
			
 
	
	 
	
	
	
			
			Deep Think with Confidence
		
			Paper
			
•
			2508.15260
			
•
			Published
				
			•
				
				87
			
 
	
	 
	
	
	
			
			PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
		
			Paper
			
•
			2508.11116
			
•
			Published
				
			•
				
				22
			
 
	
	 
	
	
	
			
			Efficient Code Embeddings from Code Generation Models
		
			Paper
			
•
			2508.21290
			
•
			Published
				
			•
				
				18
			
 
	
	 
	
	
	
			
			Model-Task Alignment Drives Distinct RL Outcomes
		
			Paper
			
•
			2508.21188
			
•
			Published
				
			•
				
				8
			
 
	
	 
	
	
	
			
			Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
		
			Paper
			
•
			2508.20751
			
•
			Published
				
			•
				
				89
			
 
	
	 
	
	
	
			
			UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
		
			Paper
			
•
			2508.18756
			
•
			Published
				
			•
				
				36
			
 
	
	 
	
	
	
			
			Hermes 4 Technical Report
		
			Paper
			
•
			2508.18255
			
•
			Published
				
			•
				
				39
			
 
	
	 
	
	
	
			
			StepWiser: Stepwise Generative Judges for Wiser Reasoning
		
			Paper
			
•
			2508.19229
			
•
			Published
				
			•
				
				20
			
 
	
	 
	
	
	
			
			ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
		
			Paper
			
•
			2508.18773
			
•
			Published
				
			•
				
				15
			
 
	
	 
	
	
	
			
			Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
		
			Paper
			
•
			2508.18076
			
•
			Published
				
			•
				
				6
			
 
	
	 
	
	
	
			
			InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
		
			Paper
			
•
			2508.16072
			
•
			Published
				
			•
				
				4
			
 
	
	 
	
	
	
			
			Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills
		
			Paper
			
•
			2508.19500
			
•
			Published
				
			•
				
				2
			
 
	
	 
	
	
	
			
			CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
		
			Paper
			
•
			2508.15868
			
•
			Published
				
			•
				
				3
			
 
	
	 
	
	
	
			
			Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
		
			Paper
			
•
			2508.10390
			
•
			Published
				
			•
				
				1
			
 
	
	 
	
	
	
			
			LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
		
			Paper
			
•
			2509.00676
			
•
			Published
				
			•
				
				83
			
 
	
	 
	
	
	
			
			DCPO: Dynamic Clipping Policy Optimization
		
			Paper
			
•
			2509.02333
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
		
			Paper
			
•
			2509.22193
			
•
			Published
				
			•
				
				37
			
 
	
	 
	
	
	
			
			What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
		
			Paper
			
•
			2509.19284
			
•
			Published
				
			•
				
				22