Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 • 2510.19600 • Published Oct 22 • 65
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods • 2510.07143 • Published Oct 8 • 12
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition • 2510.01068 • Published Oct 1 • 19
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation • 2510.00515 • Published Oct 1 • 39
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning • 2509.23873 • Published Sep 28 • 67
Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution • 2509.24726 • Published Sep 29 • 18
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing • 2509.22186 • Published Sep 26 • 125
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era • 2509.12989 • Published Sep 16 • 27
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts • 2508.07785 • Published Aug 11 • 28
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation • 2508.09987 • Published Aug 13 • 25
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning • 2507.17512 • Published Jul 23 • 36
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization • 2507.15061 • Published Jul 20 • 59
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs • 2507.11097 • Published Jul 15 • 64
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once • 2507.10541 • Published Jul 14 • 29
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion • 2507.02813 • Published Jul 3 • 60
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models • 2506.10100 • Published Jun 11 • 9