kaizuberbuehler's Collections

LM Training (updated)

- Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965, 93 upvotes)
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time (arXiv:2404.10667, 22 upvotes)
- Instruction-tuned Language Models are Better Knowledge Learners (arXiv:2402.12847, 26 upvotes)
- DoRA: Weight-Decomposed Low-Rank Adaptation (arXiv:2402.09353, 29 upvotes)
- QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314, 56 upvotes)
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507, 189 upvotes)
- Reverse Training to Nurse the Reversal Curse (arXiv:2403.13799, 13 upvotes)
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166, 60 upvotes)
- ReFT: Representation Finetuning for Language Models (arXiv:2404.03592, 101 upvotes)
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715, 62 upvotes)
- Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656, 89 upvotes)
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801, 66 upvotes)
- Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634, 35 upvotes)
- JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413, 38 upvotes)
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (arXiv:2404.06395, 24 upvotes)
- SambaLingo: Teaching Large Language Models New Languages (arXiv:2404.05829, 13 upvotes)
- Advancing LLM Reasoning Generalists with Preference Trees (arXiv:2404.02078, 46 upvotes)
- Poro 34B and the Blessing of Multilinguality (arXiv:2404.01856, 15 upvotes)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219, 258 upvotes)
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions (arXiv:2404.13208, 40 upvotes)
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892, 39 upvotes)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752, 146 upvotes)
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619, 126 upvotes)
- Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887, 111 upvotes)
- Make Your LLM Fully Utilize the Context (arXiv:2404.16811, 55 upvotes)
- Tele-FLM Technical Report (arXiv:2404.16645, 18 upvotes)
- PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994, 36 upvotes)
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732, 121 upvotes)
- Iterative Reasoning Preference Optimization (arXiv:2404.19733, 49 upvotes)
- What matters when building vision-language models? (arXiv:2405.02246, 103 upvotes)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130, 50 upvotes)
- Your Transformer is Secretly Linear (arXiv:2405.12250, 158 upvotes)
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models (arXiv:2405.20541, 24 upvotes)
- How Do Large Language Models Acquire Factual Knowledge During Pretraining? (arXiv:2406.11813, 31 upvotes)
- Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464, 71 upvotes)
- The Llama 3 Herd of Models (arXiv:2407.21783, 116 upvotes)
- Gemma 2: Improving Open Language Models at a Practical Size (arXiv:2408.00118, 79 upvotes)
- MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts (arXiv:2407.21770, 22 upvotes)
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv:2408.07055, 67 upvotes)
- Data curation via joint example selection further accelerates multimodal learning (arXiv:2406.17711, 3 upvotes)
- Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (arXiv:2408.12570, 33 upvotes)
- OLMoE: Open Mixture-of-Experts Language Models (arXiv:2409.02060, 78 upvotes)
- Training Language Models to Self-Correct via Reinforcement Learning (arXiv:2409.12917, 140 upvotes)
- GRIN: GRadient-INformed MoE (arXiv:2409.12136, 16 upvotes)
- Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey (arXiv:2409.11564, 20 upvotes)
- NVLM: Open Frontier-Class Multimodal LLMs (arXiv:2409.11402, 74 upvotes)
- Instruction Following without Instruction Tuning (arXiv:2409.14254, 30 upvotes)
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models (arXiv:2409.17146, 121 upvotes)
- Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale (arXiv:2409.17115, 63 upvotes)
- Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data (arXiv:2406.14546, 2 upvotes)
- Thinking LLMs: General Instruction Following with Thought Generation (arXiv:2410.10630, 21 upvotes)
- Paper (arXiv:2412.08905, 121 upvotes)
- Offline Reinforcement Learning for LLM Multi-Step Reasoning (arXiv:2412.16145, 38 upvotes)
- RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922, 88 upvotes)
- Diving into Self-Evolving Training for Multimodal Reasoning (arXiv:2412.17451, 43 upvotes)
- Paper (arXiv:2412.16720, 36 upvotes)
- TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (arXiv:2411.15124, 66 upvotes)
- Natural Language Reinforcement Learning (arXiv:2411.14251, 31 upvotes)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (arXiv:2411.04905, 127 upvotes)
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (arXiv:2411.04996, 51 upvotes)
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (arXiv:2501.00958, 107 upvotes)
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought (arXiv:2501.04682, 99 upvotes)
- Scaling Laws for Floating Point Quantization Training (arXiv:2501.02423, 26 upvotes)
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM (arXiv:2501.01904, 33 upvotes)
- O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning (arXiv:2501.06458, 31 upvotes)
- Enhancing Human-Like Responses in Large Language Models (arXiv:2501.05032, 58 upvotes)
- Do generative video models learn physical principles from watching videos? (arXiv:2501.09038, 34 upvotes)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948, 422 upvotes)
- Kimi k1.5: Scaling Reinforcement Learning with LLMs (arXiv:2501.12599, 123 upvotes)
- Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv:2502.03373, 58 upvotes)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, 123 upvotes)
- Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback (arXiv:2501.12895, 61 upvotes)
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (arXiv:2501.10799, 15 upvotes)
- Qwen2.5-1M Technical Report (arXiv:2501.15383, 72 upvotes)
- Baichuan-Omni-1.5 Technical Report (arXiv:2501.15368, 62 upvotes)
- Optimizing Large Language Model Training Using FP4 Quantization (arXiv:2501.17116, 37 upvotes)
- Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling (arXiv:2501.16975, 31 upvotes)
- Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703, 58 upvotes)
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (arXiv:2502.02737, 246 upvotes)
- LIMO: Less is More for Reasoning (arXiv:2502.03387, 62 upvotes)
- QuEST: Stable Training of LLMs with 1-Bit Weights and Activations (arXiv:2502.05003, 43 upvotes)
- CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference (arXiv:2502.04416, 12 upvotes)
- Scaling Pre-training to One Hundred Billion Data for Vision Language Models (arXiv:2502.07617, 29 upvotes)
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (arXiv:2502.06857, 25 upvotes)
- Typhoon T1: An Open Thai Reasoning Model (arXiv:2502.09042, 16 upvotes)
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089, 165 upvotes)
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? (arXiv:2502.14502, 91 upvotes)
- Continuous Diffusion Model for Language Modeling (arXiv:2502.11564, 53 upvotes)
- Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models (arXiv:2502.13533, 12 upvotes)
- LongRoPE2: Near-Lossless LLM Context Window Scaling (arXiv:2502.20082, 38 upvotes)
- Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam (arXiv:2502.17055, 20 upvotes)
- Visual-RFT: Visual Reinforcement Fine-Tuning (arXiv:2503.01785, 84 upvotes)
- Predictive Data Selection: The Data That Predicts Is the Data That Teaches (arXiv:2503.00808, 56 upvotes)
- Gemini Robotics: Bringing AI into the Physical World (arXiv:2503.20020, 29 upvotes)
- Large-Scale Data Selection for Instruction Tuning (arXiv:2503.01807, 13 upvotes)
- Unified Reward Model for Multimodal Understanding and Generation (arXiv:2503.05236, 123 upvotes)
- Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond (arXiv:2503.10460, 29 upvotes)
- TTRL: Test-Time Reinforcement Learning (arXiv:2504.16084, 120 upvotes)
- Learning from Failures in Multi-Attempt Reinforcement Learning (arXiv:2503.04808, 18 upvotes)
- TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation (arXiv:2503.04872, 15 upvotes)
- Self-Taught Self-Correction for Small Language Models (arXiv:2503.08681, 15 upvotes)
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (arXiv:2503.15558, 50 upvotes)
- SkyLadder: Better and Faster Pretraining via Context Window Scheduling (arXiv:2503.15450, 12 upvotes)
- Paper (arXiv:2503.19786, 54 upvotes)
- Modifying Large Language Model Post-Training for Diverse Creative Writing (arXiv:2503.17126, 36 upvotes)
- FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models (arXiv:2503.17287, 11 upvotes)
- ZClip: Adaptive Spike Mitigation for LLM Pre-Training (arXiv:2504.02507, 88 upvotes)
- RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning (arXiv:2505.15034, 5 upvotes)
- Improved Visual-Spatial Reasoning via R1-Zero-Like Training (arXiv:2504.00883, 66 upvotes)
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model (arXiv:2503.24290, 62 upvotes)
- JudgeLRM: Large Reasoning Models as a Judge (arXiv:2504.00050, 62 upvotes)
- Inference-Time Scaling for Generalist Reward Modeling (arXiv:2504.02495, 56 upvotes)
- Understanding R1-Zero-Like Training: A Critical Perspective (arXiv:2503.20783, 56 upvotes)
- Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback (arXiv:2503.22230, 45 upvotes)
- Unicorn: Text-Only Data Synthesis for Vision Language Model Training (arXiv:2503.22655, 39 upvotes)
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources (arXiv:2504.00595, 36 upvotes)
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (arXiv:2504.02587, 32 upvotes)
- Scaling Analysis of Interleaved Speech-Text Language Models (arXiv:2504.02398, 31 upvotes)
- Scaling Language-Free Visual Representation Learning (arXiv:2504.01017, 32 upvotes)
- RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy (arXiv:2503.24388, 30 upvotes)
- Z1: Efficient Test-time Scaling with Code (arXiv:2504.00810, 26 upvotes)
- Command A: An Enterprise-Ready Large Language Model (arXiv:2504.00698, 28 upvotes)
- Expanding RL with Verifiable Rewards Across Diverse Domains (arXiv:2503.23829, 23 upvotes)
- ActionStudio: A Lightweight Framework for Data and Training of Large Action Models (arXiv:2503.22673, 12 upvotes)
- Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL (arXiv:2503.23157, 10 upvotes)
- Paper (arXiv:2504.07491, 132 upvotes)
- Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (arXiv:2504.05599, 85 upvotes)
- Rethinking Reflection in Pre-Training (arXiv:2504.04022, 79 upvotes)
- OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens (arXiv:2504.07096, 76 upvotes)
- Scaling Laws for Native Multimodal Models (arXiv:2504.07951, 29 upvotes)
- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (arXiv:2504.05118, 26 upvotes)
- A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility (arXiv:2504.07086, 21 upvotes)
- SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement (arXiv:2504.07934, 20 upvotes)
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning (arXiv:2504.06958, 12 upvotes)
- Efficient Reinforcement Finetuning via Adaptive Curriculum Learning (arXiv:2504.05520, 11 upvotes)
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (arXiv:2504.10479, 300 upvotes)
- CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training (arXiv:2504.13161, 93 upvotes)
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations (arXiv:2504.10481, 85 upvotes)
- BitNet b1.58 2B4T Technical Report (arXiv:2504.12285, 75 upvotes)
- Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning (arXiv:2504.08672, 55 upvotes)
- VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning (arXiv:2504.08837, 43 upvotes)
- Heimdall: test-time scaling on the generative verification (arXiv:2504.10337, 33 upvotes)
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models (arXiv:2504.11468, 30 upvotes)
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation (arXiv:2504.13055, 19 upvotes)
- A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce (arXiv:2504.11343, 19 upvotes)
- DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training (arXiv:2504.09710, 19 upvotes)
- DataDecide: How to Predict Best Pretraining Data with Small Experiments (arXiv:2504.11393, 18 upvotes)
- Breaking the Data Barrier -- Building GUI Agents Through Task Generalization (arXiv:2504.10127, 17 upvotes)
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (arXiv:2504.10449, 15 upvotes)
- SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL (arXiv:2504.11455, 14 upvotes)
- Efficient Process Reward Model Training via Active Learning (arXiv:2504.10559, 13 upvotes)
- Exploring Expert Failures Improves LLM Agent Tuning (arXiv:2504.13145, 12 upvotes)
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models (arXiv:2504.15271, 66 upvotes)
- ToolRL: Reward is All Tool Learning Needs (arXiv:2504.13958, 48 upvotes)
- OTC: Optimal Tool Calls via Reinforcement Learning (arXiv:2504.14870, 35 upvotes)
- QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining (arXiv:2504.16511, 22 upvotes)
- LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities (arXiv:2504.16078, 21 upvotes)
- Efficient Pretraining Length Scaling (arXiv:2504.14992, 20 upvotes)