JuanRafap's Collections

Library

- lusxvr/nanoVLM-222M • Image-Text-to-Text • 0.2B • 249 downloads • 96 likes
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning • Paper • 2503.09516 • 36 upvotes
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time • Paper • 2505.24863 • 97 upvotes
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper • 2505.17667 • 88 upvotes
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models • Paper • 2505.24864 • 138 upvotes
- AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning • Paper • 2505.24298 • 28 upvotes
- GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning • Paper • 2505.20355 • 35 upvotes
- Interleaved Reasoning for Large Language Models via Reinforcement Learning • Paper • 2505.19640 • 14 upvotes
- FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow • Paper • 2505.17399 • 14 upvotes
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles • Paper • 2505.19914 • 43 upvotes
- One RL to See Them All: Visual Triple Unified Reinforcement Learning • Paper • 2505.18129 • 59 upvotes
- Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models • Paper • 2505.14810 • 62 upvotes
- Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning • Paper • 2505.16410 • 57 upvotes
- JULI: Jailbreak Large Language Models by Self-Introspection • Paper • 2505.11790
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • 36 upvotes
- Paper • 2505.14674 • 37 upvotes
- RM-R1: Reward Modeling as Reasoning • Paper • 2505.02387 • 78 upvotes
- CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models • Paper • 2505.12504 • 24 upvotes
- Neuro-Symbolic Query Compiler • Paper • 2505.11932 • 18 upvotes
- Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models • Paper • 2506.01320 • 16 upvotes
- Aligning Latent Spaces with Flow Priors • Paper • 2506.05240 • 27 upvotes
- Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics • Paper • 2506.00070 • 29 upvotes
- A Controllable Examination for Long-Context Language Models • Paper • 2506.02921 • 33 upvotes
- MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs • Paper • 2506.01674 • 28 upvotes
- CodeContests+: High-Quality Test Case Generation for Competitive Programming • Paper • 2506.05817 • 9 upvotes
- FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion • Paper • 2506.01111 • 30 upvotes
- Reinforcement Pre-Training • Paper • 2506.08007 • 262 upvotes
- GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior • Paper • 2506.08012 • 7 upvotes
- Dreamland: Controllable World Creation with Simulator and Generative Models • Paper • 2506.08006 • 7 upvotes
- Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance • Paper • 2506.06444 • 73 upvotes
- BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation • Paper • 2506.07530 • 20 upvotes
- Solving Inequality Proofs with Large Language Models • Paper • 2506.07927 • 20 upvotes
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better • Paper • 2506.09040 • 34 upvotes
- Through the Valley: Path to Effective Long CoT Training for Small Language Models • Paper • 2506.07712 • 18 upvotes
- Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework • Paper • 2506.02454 • 7 upvotes
- Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation • Paper • 2506.04614 • 18 upvotes
- Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning • Paper • 2506.06205 • 30 upvotes
- Paper • 2506.10910 • 64 upvotes
- Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team • Paper • 2506.14234 • 41 upvotes
- Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers • Paper • 2506.14702 • 3 upvotes
- AR-RAG: Autoregressive Retrieval Augmentation for Image Generation • Paper • 2506.06962 • 28 upvotes
- DoTA-RAG: Dynamic of Thought Aggregation RAG • Paper • 2506.12571 • 50 upvotes
- syftr: Pareto-Optimal Generative AI • Paper • 2505.20266
- Scaling Test-time Compute for LLM Agents • Paper • 2506.12928 • 63 upvotes
- LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning • Paper • 2506.10082 • 8 upvotes
- General-Reasoner: Advancing LLM Reasoning Across All Domains • Paper • 2505.14652 • 23 upvotes
- Optimizing Length Compression in Large Reasoning Models • Paper • 2506.14755 • 10 upvotes
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation • Paper • 2506.17202 • 10 upvotes
- ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs • Paper • 2506.18896 • 29 upvotes
- Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding • Paper • 2506.16035 • 88 upvotes
- Robust Reward Modeling via Causal Rubrics • Paper • 2506.16507 • 9 upvotes
- LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion • Paper • 2507.02813 • 60 upvotes
- FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model • Paper • 2507.01953 • 19 upvotes
- Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective • Paper • 2506.17930 • 19 upvotes
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning • Paper • 2506.24119 • 50 upvotes
- katanemo/Arch-Router-1.5B • Text Generation • 2B • 2.45k downloads • 207 likes
- Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky • Paper • 2507.03336 • 5 upvotes
- SingLoRA: Low Rank Adaptation Using a Single Matrix • Paper • 2507.05566 • 112 upvotes
- CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization • Paper • 2507.06181 • 43 upvotes
- AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs • Paper • 2507.05687 • 27 upvotes
- Coding Triangle: How Does Large Language Model Understand Code? • Paper • 2507.06138 • 21 upvotes
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning • Paper • 2507.05920 • 11 upvotes
- RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs • Paper • 2507.03253 • 18 upvotes
- Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs • Paper • 2507.07996 • 34 upvotes
- Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective • Paper • 2507.08801 • 30 upvotes
- A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • 257 upvotes
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization • Paper • 2507.15061 • 59 upvotes
- AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning • Paper • 2507.12841 • 41 upvotes
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning • Paper • 2507.16784 • 120 upvotes
- MUR: Momentum Uncertainty guided Reasoning for Large Language Models • Paper • 2507.14958 • 46 upvotes
- Does More Inference-Time Compute Really Help Robustness? • Paper • 2507.15974 • 7 upvotes
- RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback • Paper • 2507.15024 • 14 upvotes
- ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting • Paper • 2507.15454 • 7 upvotes
- Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models • Paper • 2507.14241 • 17 upvotes
- TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation • Paper • 2507.18537 • 17 upvotes
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos • Paper • 2507.15597 • 33 upvotes
- A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning • Paper • 2507.14295 • 13 upvotes
- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction • Paper • 2507.15852 • 38 upvotes
- FLEXITOKENS: Flexible Tokenization for Evolving Language Models • Paper • 2507.12720 • 9 upvotes
- RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization • Paper • 2507.12142 • 36 upvotes
- Replacing thinking with tool usage enables reasoning in small language models • Paper • 2507.05065 • 15 upvotes
- Lizard: An Efficient Linearization Framework for Large Language Models • Paper • 2507.09025 • 18 upvotes
- MemOS: A Memory OS for AI System • Paper • 2507.03724 • 153 upvotes
- Agentic Reinforced Policy Optimization • Paper • 2507.19849 • 156 upvotes
- Deep Researcher with Test-Time Diffusion • Paper • 2507.16075 • 64 upvotes
- SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment • Paper • 2507.20984 • 56 upvotes
- MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents • Paper • 2507.19478 • 30 upvotes
- Geometric-Mean Policy Optimization • Paper • 2507.20673 • 31 upvotes
- UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities • Paper • 2507.19766 • 14 upvotes
- VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning • Paper • 2507.22607 • 46 upvotes
- Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models • Paper • 2508.00819 • 62 upvotes
- Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following • Paper • 2508.02150 • 36 upvotes
- Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training • Paper • 2508.00414 • 91 upvotes
- On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective • Paper • 2507.23632 • 6 upvotes
- Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving • Paper • 2507.23726 • 112 upvotes
- SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension • Paper • 2508.01959 • 56 upvotes
- Tool-integrated Reinforcement Learning for Repo Deep Search • Paper • 2508.03012 • 20 upvotes
- InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization • Paper • 2508.05731 • 25 upvotes
- MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh • Paper • 2508.01242 • 10 upvotes
- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning • Paper • 2508.08221 • 47 upvotes
- Reinforcement Learning in Vision: A Survey • Paper • 2508.08189 • 28 upvotes
- Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents • Paper • 2508.05954 • 6 upvotes
- Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments • Paper • 2508.08791 • 16 upvotes
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning • Paper • 2508.03501 • 56 upvotes
- Complex Logical Instruction Generation • Paper • 2508.09125 • 39 upvotes
- Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery • Paper • 2508.08401 • 42 upvotes
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing • Paper • 2508.09192 • 30 upvotes
- Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping • Paper • 2508.12466 • 8 upvotes
- Has GPT-5 Achieved Spatial Intelligence? An Empirical Study • Paper • 2508.13142 • 34 upvotes
- VertexRegen: Mesh Generation with Continuous Level of Detail • Paper • 2508.09062 • 37 upvotes
- XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization • Paper • 2508.10395 • 42 upvotes
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer • Paper • 2508.10893 • 31 upvotes
- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction • Paper • 2508.11987 • 69 upvotes
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents • Paper • 2508.13186 • 18 upvotes
- Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models • Paper • 2508.10751 • 28 upvotes
- UI-Venus Technical Report: Building High-performance UI Agents with RFT • Paper • 2508.10833 • 43 upvotes
- Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models • Paper • 2508.09968 • 15 upvotes
- CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search • Paper • 2508.02091 • 13 upvotes
- LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries • Paper • 2508.15760 • 46 upvotes
- Deep Think with Confidence • Paper • 2508.15260 • 87 upvotes
- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization • Paper • 2508.14460 • 82 upvotes
- Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs • Paper • 2508.14896 • 22 upvotes
- PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs • Paper • 2508.17188 • 17 upvotes
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning • Paper • 2508.16949 • 22 upvotes
- Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation • Paper • 2508.18032 • 41 upvotes
- Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling • Paper • 2508.16745 • 28 upvotes
- UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper • 2508.18756 • 36 upvotes
- Do What? Teaching Vision-Language-Action Models to Reject the Impossible • Paper • 2508.16292 • 9 upvotes
- MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting • Paper • 2508.17811 • 6 upvotes
- FastMesh: Efficient Artistic Mesh Generation via Component Decoupling • Paper • 2508.19188 • 16 upvotes
- Spacer: Towards Engineered Scientific Inspiration • Paper • 2508.17661 • 32 upvotes
- ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models • Paper • 2508.18773 • 15 upvotes
- VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space • Paper • 2508.19247 • 41 upvotes
- TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling • Paper • 2508.17445 • 80 upvotes
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning • Paper • 2508.20751 • 89 upvotes
- Provable Benefits of In-Tool Learning for Large Language Models • Paper • 2508.20755 • 11 upvotes
- Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models • Paper • 2508.21365 • 28 upvotes
- Efficient Code Embeddings from Code Generation Models • Paper • 2508.21290 • 18 upvotes
- CLIPSym: Delving into Symmetry Detection with CLIP • Paper • 2508.14197 • 8 upvotes
- Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR • Paper • 2509.02522 • 25 upvotes
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning • Paper • 2509.02479 • 83 upvotes
- Universal Deep Research: Bring Your Own Model and Strategy • Paper • 2509.00244 • 13 upvotes
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations • Paper • 2509.03405 • 23 upvotes
- Open Data Synthesis For Deep Research • Paper • 2509.00375 • 68 upvotes
- Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? • Paper • 2509.04292 • 57 upvotes
- Towards a Unified View of Large Language Model Post-Training • Paper • 2509.04419 • 73 upvotes
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench • Paper • 2508.20931 • 15 upvotes
- Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers • Paper • 2509.03059 • 24 upvotes
- NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings • Paper • 2509.04011 • 28 upvotes
- Symbolic Graphics Programming with Large Language Models • Paper • 2509.05208 • 45 upvotes
- Bootstrapping Task Spaces for Self-Improvement • Paper • 2509.04575 • 5 upvotes
- Behavioral Fingerprinting of Large Language Models • Paper • 2509.04504 • 5 upvotes
- Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers • Paper • 2509.06493 • 11 upvotes
- Reinforcement Learning Foundations for Deep Research Systems: A Survey • Paper • 2509.06733 • 31 upvotes
- Reconstruction Alignment Improves Unified Multimodal Models • Paper • 2509.07295 • 40 upvotes
- Visual Representation Alignment for Multimodal Large Language Models • Paper • 2509.07979 • 82 upvotes
- Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models • Paper • 2509.06949 • 56 upvotes
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper • 2509.07980 • 98 upvotes
- Towards General Agentic Intelligence via Environment Scaling • Paper • 2509.13311 • 70 upvotes
- WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning • Paper • 2509.13305 • 88 upvotes
- SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation • Paper • 2509.10708 • 17 upvotes
- HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering • Paper • 2509.09713 • 24 upvotes
- FlowRL: Matching Reward Distributions for LLM Reasoning • Paper • 2509.15207 • 110 upvotes
- Single-stream Policy Optimization • Paper • 2509.13232 • 33 upvotes
- World Modeling with Probabilistic Structure Integration • Paper • 2509.09737 • 13 upvotes
- Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning • Paper • 2509.13755 • 19 upvotes
- C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning • Paper • 2507.16518 • 2 upvotes
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance • Paper • 2509.15130 • 30 upvotes
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation • Paper • 2509.15194 • 33 upvotes
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning • Paper • 2509.13761 • 16 upvotes
- A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning • Paper • 2509.15937 • 20 upvotes
- BaseReward: A Strong Baseline for Multimodal Reward Model • Paper • 2509.16127 • 21 upvotes
- MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks • Paper • 2509.14638 • 11 upvotes
- Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents • Paper • 2509.15233 • 2 upvotes
- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer • Paper • 2509.16197 • 52 upvotes
- Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification • Paper • 2509.15591 • 45 upvotes
- BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent • Paper • 2509.15566 • 14 upvotes
- Paper • 2509.17336 • 9 upvotes
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback • Paper • 2501.10799 • 15 upvotes
- Table as Thought: Exploring Structured Thoughts in LLM Reasoning • Paper • 2501.02152
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning • Paper • 2412.09078
- TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection • Paper • 2412.08024 • 1 upvote
- LLM2: Let Large Language Models Harness System 2 Reasoning • Paper • 2412.20372
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective • Paper • 2501.11110 • 4 upvotes
- Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning • Paper • 2412.15797 • 18 upvotes
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement • Paper • 2412.12881 • 2 upvotes
- OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models • Paper • 2509.17627 • 65 upvotes
- Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation • Paper • 2509.18824 • 22 upvotes
- Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory • Paper • 2509.14662 • 13 upvotes
- VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models • Paper • 2509.19803 • 117 upvotes
- Tree Search for LLM Agent Reinforcement Learning • Paper • 2509.21240 • 87 upvotes
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention • Paper • 2509.24006 • 114 upvotes
- Fine-tuning Done Right in Model Editing • Paper • 2509.22072 • 28 upvotes
- No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping • Paper • 2509.21880 • 44 upvotes
- LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer • Paper • 2509.22414 • 21 upvotes
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning • Paper • 2509.22601 • 29 upvotes
- EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning • Paper • 2509.22576 • 132 upvotes
- Variational Reasoning for Language Models • Paper • 2509.22637 • 68 upvotes
- AutoIntent: AutoML for Text Classification • Paper • 2509.21138 • 32 upvotes
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning • Paper • 2509.25760 • 52 upvotes
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models • Paper • 2509.26628 • 14 upvotes
- Sequential Diffusion Language Models • Paper • 2509.24007 • 42 upvotes
- ReviewScore: Misinformed Peer Review Detection with Large Language Models • Paper • 2509.21679 • 63 upvotes
- ReviewRL: Towards Automated Scientific Review with RL • Paper • 2508.10308
 - ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks- 
			Paper
			 •- 
			2508.15804
			 •
			Published
				
			•- 
				15
			 
 - DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
  Verifiable Rewards via Monte Carlo Tree Search- 
			Paper
			 •- 
			2509.25454
			 •
			Published
				
			•- 
				133
			 
 - Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation • Paper • arXiv:2509.25849 • 46 upvotes
 - BroRL: Scaling Reinforcement Learning via Broadened Exploration • Paper • arXiv:2510.01180 • 17 upvotes
 - GEM: A Gym for Agentic LLMs • Paper • arXiv:2510.01051 • 86 upvotes
 - Interactive Training: Feedback-Driven Neural Network Optimization • Paper • arXiv:2510.02297 • 41 upvotes
 - More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models • Paper • arXiv:2509.25848 • 77 upvotes
 - CLUE: Non-parametric Verification from Experience via Hidden-State Clustering • Paper • arXiv:2510.01591 • 26 upvotes
 - LongCodeZip: Compress Long Context for Code Language Models • Paper • arXiv:2510.00446 • 106 upvotes
 - Efficient Multi-modal Large Language Models via Progressive Consistency Distillation • Paper • arXiv:2510.00515 • 39 upvotes
 - Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models • Paper • arXiv:2510.03561 • 23 upvotes
 - Large Language Models as Optimizers • Paper • arXiv:2309.03409 • 77 upvotes
 - Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers • Paper • arXiv:2309.08532 • 53 upvotes
 - PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback • Paper • arXiv:2307.14936 • 41 upvotes
 - Factuality Matters: When Image Generation and Editing Meet Structured Visuals • Paper • arXiv:2510.05091 • 17 upvotes
 - Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training • Paper • arXiv:2510.04996 • 15 upvotes
 - Paper2Video: Automatic Video Generation from Scientific Papers • Paper • arXiv:2510.05096 • 106 upvotes
 - SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs • Paper • arXiv:2510.05069 • 12 upvotes
 - MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information • Paper • arXiv:2510.03632 • 40 upvotes
 - Large Reasoning Models Learn Better Alignment from Flawed Thinking • Paper • arXiv:2510.00938 • 56 upvotes
 - Less is More: Recursive Reasoning with Tiny Networks • Paper • arXiv:2510.04871 • 445 upvotes
 - Multi-Agent Tool-Integrated Policy Optimization • Paper • arXiv:2510.04678 • 30 upvotes
 - Agent Learning via Early Experience • Paper • arXiv:2510.08558 • 241 upvotes
 - Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models • Paper • arXiv:2510.05034 • 45 upvotes
 - Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • arXiv:2510.03222 • 44 upvotes
 - QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • Paper • arXiv:2510.11696 • 165 upvotes
 - PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs • Paper • arXiv:2510.09507 • 10 upvotes
 - Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models • Paper • arXiv:2510.04618 • 105 upvotes
 - Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models • Paper • arXiv:2510.08492 • 7 upvotes
 - Dyna-Mind: Learning to Simulate from Experience for Better AI Agents • Paper • arXiv:2510.09577 • 6 upvotes
 - BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution • Paper • arXiv:2510.08697 • 32 upvotes
 - Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs • Paper • arXiv:2510.09201 • 46 upvotes
 - Diffusion Transformers with Representation Autoencoders • Paper • arXiv:2510.11690 • 157 upvotes
 - UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning • Paper • arXiv:2510.13515 • 11 upvotes
 - Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training • Paper • arXiv:2510.12586 • 106 upvotes
 - Understanding DeepResearch via Reports • Paper • arXiv:2510.07861 • 6 upvotes
 - RAG-Anything: All-in-One RAG Framework • Paper • arXiv:2510.12323 • 39 upvotes
 - The Art of Scaling Reinforcement Learning Compute for LLMs • Paper • arXiv:2510.13786 • 30 upvotes
 - Glyph: Scaling Context Windows via Visual-Text Compression • Paper • arXiv:2510.17800 • 57 upvotes
 - LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts • Paper • arXiv:2510.19363 • 54 upvotes
 - Unified Reinforcement and Imitation Learning for Vision-Language Models • Paper • arXiv:2510.19307 • 22 upvotes
 - Attention Is All You Need for KV Cache in Diffusion LLMs • Paper • arXiv:2510.14973 • 35 upvotes
 - Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents • Paper • arXiv:2510.14967 • 32 upvotes
 - Video Reasoning without Training • Paper • arXiv:2510.17045 • 6 upvotes
 - AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders • Paper • arXiv:2510.19779 • 55 upvotes