- VideoDeepResearch: Long Video Understanding With Agentic Tool Using
  Paper • 2506.10821 • Published • 19
- Jan-nano Technical Report
  Paper • 2506.22760 • Published • 9
- MMSearch-R1: Incentivizing LMMs to Search
  Paper • 2506.20670 • Published • 64
- WebSailor: Navigating Super-human Reasoning for Web Agent
  Paper • 2507.02592 • Published • 121
Collections including paper arxiv:2507.19849 
						
					
- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 261 • 97
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 122
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
  Paper • 2502.10391 • Published • 34
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 18
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
  Paper • 2502.08130 • Published • 9

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
  Paper • 2502.06060 • Published • 38
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 192
- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100

- DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
  Paper • 2505.19253 • Published • 31
- SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
  Paper • 2505.20411 • Published • 89
- Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
  Paper • 2505.21497 • Published • 108
- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 156

- Nuclear Norm Regularization for Deep Learning
  Paper • 2405.14544 • Published • 1
- Token embeddings violate the manifold hypothesis
  Paper • 2504.01002 • Published • 1
- Approximate Nullspace Augmented Finetuning for Robust Vision Transformers
  Paper • 2403.10476 • Published • 1
- ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning
  Paper • 2504.00254 • Published • 1

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 61
- LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
  Paper • 2502.07374 • Published • 40
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 153
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 104

- End-to-End Goal-Driven Web Navigation
  Paper • 1602.02261 • Published
- Learning Language Games through Interaction
  Paper • 1606.02447 • Published
- Naturalizing a Programming Language via Interactive Learning
  Paper • 1704.06956 • Published
- Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
  Paper • 1802.08802 • Published • 1