JuanRafap's Collections
Library
lusxvr/nanoVLM-222M
Image-Text-to-Text
•
0.2B
•
Updated
•
249
•
96
Search-R1: Training LLMs to Reason and Leverage Search Engines with
Reinforcement Learning
Paper
•
2503.09516
•
Published
•
36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper
•
2505.24863
•
Published
•
97
QwenLong-L1: Towards Long-Context Large Reasoning Models with
Reinforcement Learning
Paper
•
2505.17667
•
Published
•
88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in
Large Language Models
Paper
•
2505.24864
•
Published
•
138
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for
Language Reasoning
Paper
•
2505.24298
•
Published
•
28
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient
Fine-Tuning
Paper
•
2505.20355
•
Published
•
35
Interleaved Reasoning for Large Language Models via Reinforcement
Learning
Paper
•
2505.19640
•
Published
•
14
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering
Workflow
Paper
•
2505.17399
•
Published
•
14
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
•
2505.19914
•
Published
•
43
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper
•
2505.18129
•
Published
•
59
Scaling Reasoning, Losing Control: Evaluating Instruction Following in
Large Reasoning Models
Paper
•
2505.14810
•
Published
•
62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement
Learning
Paper
•
2505.16410
•
Published
•
57
JULI: Jailbreak Large Language Models by Self-Introspection
Paper
•
2505.11790
•
Published
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Paper
•
2505.13438
•
Published
•
36
Paper
•
2505.14674
•
Published
•
37
RM-R1: Reward Modeling as Reasoning
Paper
•
2505.02387
•
Published
•
78
CPGD: Toward Stable Rule-based Reinforcement Learning for Language
Models
Paper
•
2505.12504
•
Published
•
24
Neuro-Symbolic Query Compiler
Paper
•
2505.11932
•
Published
•
18
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time
Reward Alignment in Score Models
Paper
•
2506.01320
•
Published
•
16
Aligning Latent Spaces with Flow Priors
Paper
•
2506.05240
•
Published
•
27
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in
Robotics
Paper
•
2506.00070
•
Published
•
29
A Controllable Examination for Long-Context Language Models
Paper
•
2506.02921
•
Published
•
33
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal
LLMs
Paper
•
2506.01674
•
Published
•
28
CodeContests+: High-Quality Test Case Generation for Competitive
Programming
Paper
•
2506.05817
•
Published
•
9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal
Contextual Fusion
Paper
•
2506.01111
•
Published
•
30
Reinforcement Pre-Training
Paper
•
2506.08007
•
Published
•
262
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection
Behavior
Paper
•
2506.08012
•
Published
•
7
Dreamland: Controllable World Creation with Simulator and Generative
Models
Paper
•
2506.08006
•
Published
•
7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety
Assurance
Paper
•
2506.06444
•
Published
•
73
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
Paper
•
2506.07530
•
Published
•
20
Solving Inequality Proofs with Large Language Models
Paper
•
2506.07927
•
Published
•
20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand
Better
Paper
•
2506.09040
•
Published
•
34
Through the Valley: Path to Effective Long CoT Training for Small
Language Models
Paper
•
2506.07712
•
Published
•
18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports
From Scratch with Agentic Framework
Paper
•
2506.02454
•
Published
•
7
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error
Diagnosis in GUI Automation
Paper
•
2506.04614
•
Published
•
18
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal
Learning
Paper
•
2506.06205
•
Published
•
30
Paper
•
2506.10910
•
Published
•
64
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just
Like an Olympiad Team
Paper
•
2506.14234
•
Published
•
41
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time
Markers
Paper
•
2506.14702
•
Published
•
3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Paper
•
2506.06962
•
Published
•
28
DoTA-RAG: Dynamic of Thought Aggregation RAG
Paper
•
2506.12571
•
Published
•
50
syftr: Pareto-Optimal Generative AI
Paper
•
2505.20266
•
Published
Scaling Test-time Compute for LLM Agents
Paper
•
2506.12928
•
Published
•
63
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware
LoRA Fine-Tuning
Paper
•
2506.10082
•
Published
•
8
General-Reasoner: Advancing LLM Reasoning Across All Domains
Paper
•
2505.14652
•
Published
•
23
Optimizing Length Compression in Large Reasoning Models
Paper
•
2506.14755
•
Published
•
10
UniFork: Exploring Modality Alignment for Unified Multimodal
Understanding and Generation
Paper
•
2506.17202
•
Published
•
10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
•
2506.18896
•
Published
•
29
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal
Document Understanding
Paper
•
2506.16035
•
Published
•
88
Robust Reward Modeling via Causal Rubrics
Paper
•
2506.16507
•
Published
•
9
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with
TriMap Video Diffusion
Paper
•
2507.02813
•
Published
•
60
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Paper
•
2507.01953
•
Published
•
19
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective
Paper
•
2506.17930
•
Published
•
19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via
Multi-Agent Multi-Turn Reinforcement Learning
Paper
•
2506.24119
•
Published
•
50
katanemo/Arch-Router-1.5B
Text Generation
•
2B
•
Updated
•
2.45k
•
207
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs
More Realistic and Less Risky
Paper
•
2507.03336
•
Published
•
5
SingLoRA: Low Rank Adaptation Using a Single Matrix
Paper
•
2507.05566
•
Published
•
112
CriticLean: Critic-Guided Reinforcement Learning for Mathematical
Formalization
Paper
•
2507.06181
•
Published
•
43
AutoTriton: Automatic Triton Programming with Reinforcement Learning in
LLMs
Paper
•
2507.05687
•
Published
•
27
Coding Triangle: How Does Large Language Model Understand Code?
Paper
•
2507.06138
•
Published
•
21
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based
Reinforcement Learning
Paper
•
2507.05920
•
Published
•
11
RefineX: Learning to Refine Pre-training Data at Scale from
Expert-Guided Programs
Paper
•
2507.03253
•
Published
•
18
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Paper
•
2507.07996
•
Published
•
34
Lumos-1: On Autoregressive Video Generation from a Unified Model
Perspective
Paper
•
2507.08801
•
Published
•
30
A Survey of Context Engineering for Large Language Models
Paper
•
2507.13334
•
Published
•
257
WebShaper: Agentically Data Synthesizing via Information-Seeking
Formalization
Paper
•
2507.15061
•
Published
•
59
AnyCap Project: A Unified Framework, Dataset, and Benchmark for
Controllable Omni-modal Captioning
Paper
•
2507.12841
•
Published
•
41
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
•
2507.16784
•
Published
•
120
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
•
2507.14958
•
Published
•
46
Does More Inference-Time Compute Really Help Robustness?
Paper
•
2507.15974
•
Published
•
7
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement
Feedback
Paper
•
2507.15024
•
Published
•
14
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via
Gaussian Splatting
Paper
•
2507.15454
•
Published
•
7
Promptomatix: An Automatic Prompt Optimization Framework for Large
Language Models
Paper
•
2507.14241
•
Published
•
17
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive
Generation
Paper
•
2507.18537
•
Published
•
17
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human
Videos
Paper
•
2507.15597
•
Published
•
33
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Paper
•
2507.14295
•
Published
•
13
SeC: Advancing Complex Video Object Segmentation via Progressive Concept
Construction
Paper
•
2507.15852
•
Published
•
38
FLEXITOKENS: Flexible Tokenization for Evolving Language Models
Paper
•
2507.12720
•
Published
•
9
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA
Optimization
Paper
•
2507.12142
•
Published
•
36
Replacing thinking with tool usage enables reasoning in small language
models
Paper
•
2507.05065
•
Published
•
15
Lizard: An Efficient Linearization Framework for Large Language Models
Paper
•
2507.09025
•
Published
•
18
MemOS: A Memory OS for AI System
Paper
•
2507.03724
•
Published
•
153
Agentic Reinforced Policy Optimization
Paper
•
2507.19849
•
Published
•
156
Deep Researcher with Test-Time Diffusion
Paper
•
2507.16075
•
Published
•
64
SmallThinker: A Family of Efficient Large Language Models Natively
Trained for Local Deployment
Paper
•
2507.20984
•
Published
•
56
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI
Agents
Paper
•
2507.19478
•
Published
•
30
Geometric-Mean Policy Optimization
Paper
•
2507.20673
•
Published
•
31
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing
Large Language Models' Reasoning Abilities
Paper
•
2507.19766
•
Published
•
14
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced
Multimodal Reasoning
Paper
•
2507.22607
•
Published
•
46
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language
Models
Paper
•
2508.00819
•
Published
•
62
Beyond the Trade-off: Self-Supervised Reinforcement Learning for
Reasoning Models' Instruction Following
Paper
•
2508.02150
•
Published
•
36
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent
Foundation Models Training
Paper
•
2508.00414
•
Published
•
91
On the Expressiveness of Softmax Attention: A Recurrent Neural Network
Perspective
Paper
•
2507.23632
•
Published
•
6
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Paper
•
2507.23726
•
Published
•
112
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic
Association and Long Story Comprehension
Paper
•
2508.01959
•
Published
•
56
Tool-integrated Reinforcement Learning for Repo Deep Search
Paper
•
2508.03012
•
Published
•
20
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy
Optimization
Paper
•
2508.05731
•
Published
•
25
MeshLLM: Empowering Large Language Models to Progressively Understand
and Generate 3D Mesh
Paper
•
2508.01242
•
Published
•
10
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
•
2508.08221
•
Published
•
47
Reinforcement Learning in Vision: A Survey
Paper
•
2508.08189
•
Published
•
28
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with
Patch-level CLIP Latents
Paper
•
2508.05954
•
Published
•
6
Feedback-Driven Tool-Use Improvements in Large Language Models via
Automated Build Environments
Paper
•
2508.08791
•
Published
•
16
Training Long-Context, Multi-Turn Software Engineering Agents with
Reinforcement Learning
Paper
•
2508.03501
•
Published
•
56
Complex Logical Instruction Generation
Paper
•
2508.09125
•
Published
•
39
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
Paper
•
2508.08401
•
Published
•
42
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion
Forcing
Paper
•
2508.09192
•
Published
•
30
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision
Mapping
Paper
•
2508.12466
•
Published
•
8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study
Paper
•
2508.13142
•
Published
•
34
VertexRegen: Mesh Generation with Continuous Level of Detail
Paper
•
2508.09062
•
Published
•
37
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache
Rematerialization
Paper
•
2508.10395
•
Published
•
42
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Paper
•
2508.10893
•
Published
•
31
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Paper
•
2508.11987
•
Published
•
69
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper
•
2508.13186
•
Published
•
18
Pass@k Training for Adaptively Balancing Exploration and Exploitation of
Large Reasoning Models
Paper
•
2508.10751
•
Published
•
28
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Paper
•
2508.10833
•
Published
•
43
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Paper
•
2508.09968
•
Published
•
15
CRINN: Contrastive Reinforcement Learning for Approximate Nearest
Neighbor Search
Paper
•
2508.02091
•
Published
•
13
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on
Challenging Queries
Paper
•
2508.15760
•
Published
•
46
Deep Think with Confidence
Paper
•
2508.15260
•
Published
•
87
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference
Optimization
Paper
•
2508.14460
•
Published
•
82
Quantization Meets dLLMs: A Systematic Study of Post-training
Quantization for Diffusion LLMs
Paper
•
2508.14896
•
Published
•
22
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent
LLMs
Paper
•
2508.17188
•
Published
•
17
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement
Learning for General LLM Reasoning
Paper
•
2508.16949
•
Published
•
22
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
for Text-to-Image Generation
Paper
•
2508.18032
•
Published
•
41
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory
and Test-Time Compute Scaling
Paper
•
2508.16745
•
Published
•
28
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior
Long-Context Learning
Paper
•
2508.18756
•
Published
•
36
Do What? Teaching Vision-Language-Action Models to Reject the Impossible
Paper
•
2508.16292
•
Published
•
9
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian
Splatting
Paper
•
2508.17811
•
Published
•
6
FastMesh: Efficient Artistic Mesh Generation via Component Decoupling
Paper
•
2508.19188
•
Published
•
16
Spacer: Towards Engineered Scientific Inspiration
Paper
•
2508.17661
•
Published
•
32
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
•
2508.18773
•
Published
•
15
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D
Space
Paper
•
2508.19247
•
Published
•
41
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
•
2508.17445
•
Published
•
80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
Text-to-Image Reinforcement Learning
Paper
•
2508.20751
•
Published
•
89
Provable Benefits of In-Tool Learning for Large Language Models
Paper
•
2508.20755
•
Published
•
11
Think in Games: Learning to Reason in Games via Reinforcement Learning
with Large Language Models
Paper
•
2508.21365
•
Published
•
28
Efficient Code Embeddings from Code Generation Models
Paper
•
2508.21290
•
Published
•
18
CLIPSym: Delving into Symmetry Detection with CLIP
Paper
•
2508.14197
•
Published
•
8
Implicit Actor Critic Coupling via a Supervised Learning Framework for
RLVR
Paper
•
2509.02522
•
Published
•
25
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn
Tool-Integrated Reasoning
Paper
•
2509.02479
•
Published
•
83
Universal Deep Research: Bring Your Own Model and Strategy
Paper
•
2509.00244
•
Published
•
13
LMEnt: A Suite for Analyzing Knowledge in Language Models from
Pretraining Data to Representations
Paper
•
2509.03405
•
Published
•
23
Open Data Synthesis For Deep Research
Paper
•
2509.00375
•
Published
•
68
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow
Real Instructions?
Paper
•
2509.04292
•
Published
•
57
Towards a Unified View of Large Language Model Post-Training
Paper
•
2509.04419
•
Published
•
73
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex
Dynamic Environment? A Study on τ-bench
Paper
•
2508.20931
•
Published
•
15
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
•
2509.03059
•
Published
•
24
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware
Embeddings
Paper
•
2509.04011
•
Published
•
28
Symbolic Graphics Programming with Large Language Models
Paper
•
2509.05208
•
Published
•
45
Bootstrapping Task Spaces for Self-Improvement
Paper
•
2509.04575
•
Published
•
5
Behavioral Fingerprinting of Large Language Models
Paper
•
2509.04504
•
Published
•
5
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM
Step-Provers
Paper
•
2509.06493
•
Published
•
11
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
•
2509.06733
•
Published
•
31
Reconstruction Alignment Improves Unified Multimodal Models
Paper
•
2509.07295
•
Published
•
40
Visual Representation Alignment for Multimodal Large Language Models
Paper
•
2509.07979
•
Published
•
82
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
•
2509.06949
•
Published
•
56
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
•
2509.07980
•
Published
•
98
Towards General Agentic Intelligence via Environment Scaling
Paper
•
2509.13311
•
Published
•
70
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic
Data and Scalable Reinforcement Learning
Paper
•
2509.13305
•
Published
•
88
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based
Instruction Dataset Creation
Paper
•
2509.10708
•
Published
•
17
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented
Generation for Multi-hop Question Answering
Paper
•
2509.09713
•
Published
•
24
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper
•
2509.15207
•
Published
•
110
Single-stream Policy Optimization
Paper
•
2509.13232
•
Published
•
33
World Modeling with Probabilistic Structure Integration
Paper
•
2509.09737
•
Published
•
13
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via
Machine Unlearning
Paper
•
2509.13755
•
Published
•
19
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving
Reasoning
Paper
•
2507.16518
•
Published
•
2
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model
via Training-Free Guidance
Paper
•
2509.15130
•
Published
•
30
Evolving Language Models without Labels: Majority Drives Selection,
Novelty Promotes Variation
Paper
•
2509.15194
•
Published
•
33
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical
Reasoning
Paper
•
2509.13761
•
Published
•
16
A Vision-Language-Action-Critic Model for Robotic Real-World
Reinforcement Learning
Paper
•
2509.15937
•
Published
•
20
BaseReward: A Strong Baseline for Multimodal Reward Model
Paper
•
2509.16127
•
Published
•
21
MultiEdit: Advancing Instruction-based Image Editing on Diverse and
Challenging Tasks
Paper
•
2509.14638
•
Published
•
11
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided
Role-playing Agents
Paper
•
2509.15233
•
Published
•
2
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid
Vision Tokenizer
Paper
•
2509.16197
•
Published
•
52
Latent Zoning Network: A Unified Principle for Generative Modeling,
Representation Learning, and Classification
Paper
•
2509.15591
•
Published
•
45
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Paper
•
2509.15566
•
Published
•
14
Paper
•
2509.17336
•
Published
•
9
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
•
2501.10799
•
Published
•
15
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
•
2501.02152
•
Published
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
•
2412.09078
•
Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
Internalization with Self-Reflection
Paper
•
2412.08024
•
Published
•
1
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
•
2412.20372
•
Published
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
•
2501.11110
•
Published
•
4
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
•
2412.15797
•
Published
•
18
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented
Verification and Refinement
Paper
•
2412.12881
•
Published
•
2
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion
Transformer Models
Paper
•
2509.17627
•
Published
•
65
Hyper-Bagel: A Unified Acceleration Framework for Multimodal
Understanding and Generation
Paper
•
2509.18824
•
Published
•
22
Understanding the Thinking Process of Reasoning Models: A Perspective
from Schoenfeld's Episode Theory
Paper
•
2509.14662
•
Published
•
13
VCRL: Variance-based Curriculum Reinforcement Learning for Large
Language Models
Paper
•
2509.19803
•
Published
•
117
Tree Search for LLM Agent Reinforcement Learning
Paper
•
2509.21240
•
Published
•
87
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable
Sparse-Linear Attention
Paper
•
2509.24006
•
Published
•
114
Fine-tuning Done Right in Model Editing
Paper
•
2509.22072
•
Published
•
28
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM
Reinforcement Learning via Entropy-Guided Advantage Shaping
Paper
•
2509.21880
•
Published
•
44
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale
Diffusion Transformer
Paper
•
2509.22414
•
Published
•
21
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive
Exploration for Agentic Reinforcement Learning
Paper
•
2509.22601
•
Published
•
29
EPO: Entropy-regularized Policy Optimization for LLM Agents
Reinforcement Learning
Paper
•
2509.22576
•
Published
•
132
Variational Reasoning for Language Models
Paper
•
2509.22637
•
Published
•
68
AutoIntent: AutoML for Text Classification
Paper
•
2509.21138
•
Published
•
32
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Paper
•
2509.25760
•
Published
•
52
Attention as a Compass: Efficient Exploration for Process-Supervised RL
in Reasoning Models
Paper
•
2509.26628
•
Published
•
14
Sequential Diffusion Language Models
Paper
•
2509.24007
•
Published
•
42
ReviewScore: Misinformed Peer Review Detection with Large Language
Models
Paper
•
2509.21679
•
Published
•
63
ReviewRL: Towards Automated Scientific Review with RL
Paper
•
2508.10308
•
Published
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
Paper
•
2508.15804
•
Published
•
15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with
Verifiable Rewards via Monte Carlo Tree Search
Paper
•
2509.25454
•
Published
•
133
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget
Allocation
Paper
•
2509.25849
•
Published
•
46
BroRL: Scaling Reinforcement Learning via Broadened Exploration
Paper
•
2510.01180
•
Published
•
17
GEM: A Gym for Agentic LLMs
Paper
•
2510.01051
•
Published
•
86
Interactive Training: Feedback-Driven Neural Network Optimization
Paper
•
2510.02297
•
Published
•
41
More Thought, Less Accuracy? On the Dual Nature of Reasoning in
Vision-Language Models
Paper
•
2509.25848
•
Published
•
77
CLUE: Non-parametric Verification from Experience via Hidden-State
Clustering
Paper
•
2510.01591
•
Published
•
26
LongCodeZip: Compress Long Context for Code Language Models
Paper
•
2510.00446
•
Published
•
106
Efficient Multi-modal Large Language Models via Progressive Consistency
Distillation
Paper
•
2510.00515
•
Published
•
39
Reactive Transformer (RxT) -- Stateful Real-Time Processing for
Event-Driven Reactive Language Models
Paper
•
2510.03561
•
Published
•
23
Large Language Models as Optimizers
Paper
•
2309.03409
•
Published
•
77
Connecting Large Language Models with Evolutionary Algorithms Yields
Powerful Prompt Optimizers
Paper
•
2309.08532
•
Published
•
53
PanGu-Coder2: Boosting Large Language Models for Code with Ranking
Feedback
Paper
•
2307.14936
•
Published
•
41
Factuality Matters: When Image Generation and Editing Meet Structured
Visuals
Paper
•
2510.05091
•
Published
•
17
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM
Training
Paper
•
2510.04996
•
Published
•
15
Paper2Video: Automatic Video Generation from Scientific Papers
Paper
•
2510.05096
•
Published
•
106
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior
Reasoning LLMs
Paper
•
2510.05069
•
Published
•
12
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual
Information
Paper
•
2510.03632
•
Published
•
40
Large Reasoning Models Learn Better Alignment from Flawed Thinking
Paper
•
2510.00938
•
Published
•
56
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
445
Multi-Agent Tool-Integrated Policy Optimization
Paper
•
2510.04678
•
Published
•
30
Agent Learning via Early Experience
Paper
•
2510.08558
•
Published
•
241
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large
Multimodal Models
Paper
•
2510.05034
•
Published
•
45
Low-probability Tokens Sustain Exploration in Reinforcement Learning
with Verifiable Reward
Paper
•
2510.03222
•
Published
•
44
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning
for LLMs
Paper
•
2510.11696
•
Published
•
165
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Paper
•
2510.09507
•
Published
•
10
Agentic Context Engineering: Evolving Contexts for Self-Improving
Language Models
Paper
•
2510.04618
•
Published
•
105
Better Together: Leveraging Unpaired Multimodal Data for Stronger
Unimodal Models
Paper
•
2510.08492
•
Published
•
7
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
•
2510.09577
•
Published
•
6
BigCodeArena: Unveiling More Reliable Human Preferences in Code
Generation via Execution
Paper
•
2510.08697
•
Published
•
32
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for
MLLMs
Paper
•
2510.09201
•
Published
•
46
Diffusion Transformers with Representation Autoencoders
Paper
•
2510.11690
•
Published
•
157
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper
•
2510.13515
•
Published
•
11
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised
Pre-training
Paper
•
2510.12586
•
Published
•
106
Understanding DeepResearch via Reports
Paper
•
2510.07861
•
Published
•
6
RAG-Anything: All-in-One RAG Framework
Paper
•
2510.12323
•
Published
•
39
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
•
2510.13786
•
Published
•
30
Glyph: Scaling Context Windows via Visual-Text Compression
Paper
•
2510.17800
•
Published
•
57
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Paper
•
2510.19363
•
Published
•
54
Unified Reinforcement and Imitation Learning for Vision-Language Models
Paper
•
2510.19307
•
Published
•
22
Attention Is All You Need for KV Cache in Diffusion LLMs
Paper
•
2510.14973
•
Published
•
35
Information Gain-based Policy Optimization: A Simple and Effective
Approach for Multi-Turn LLM Agents
Paper
•
2510.14967
•
Published
•
32
Video Reasoning without Training
Paper
•
2510.17045
•
Published
•
6
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative
Decoders
Paper
•
2510.19779
•
Published
•
55