- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
  Paper • 2507.21809 • Published • 131
- OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
  Paper • 2507.06165 • Published • 58
- DINOv3
  Paper • 2508.10104 • Published • 274
- Qwen-Image Technical Report
  Paper • 2508.02324 • Published • 259

Collections
Collections including paper arxiv:2508.18032

- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 93
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
  Paper • 2508.03501 • Published • 56
- SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
  Paper • 2508.04700 • Published • 52
- RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
  Paper • 2508.01415 • Published • 7

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 11k • 1.21k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 284 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 62

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 18
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 261 • 97
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13

- ControlLLM: Augment Language Models with Tools by Searching on Graphs
  Paper • 2310.17796 • Published • 18
- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
  Paper • 2311.08263 • Published • 16
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Paper • 2501.12599 • Published • 123
- ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
  Paper • 2502.04689 • Published • 7