VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published 8 days ago • 19
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published 9 days ago • 114
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models Paper • 2502.14834 • Published Feb 20 • 24
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 66
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 285
view article Article ✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use By Ziyang and 1 other • Jan 3 • 19
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 29
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 70
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 50