Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations Paper • 2510.05571 • Published 26 days ago • 13
The Unreasonable Effectiveness of Scaling Agents for Computer Use Paper • 2510.02250 • Published about 1 month ago • 23
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published Jul 17 • 15
Hidden in Plain Sight: Probing Implicit Reasoning in Multimodal Language Models Paper • 2506.00258 • Published May 30 • 3
Agents of Change: Self-Evolving LLM Agents for Strategic Planning Paper • 2506.04651 • Published Jun 5 • 8
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models Paper • 2505.21523 • Published May 23 • 13
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning Paper • 2505.16186 • Published May 22 • 7
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published May 21 • 18
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models Paper • 2310.03903 • Published Oct 5, 2023 • 1
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models Paper • 2504.13367 • Published Apr 17 • 26
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published Apr 1 • 25
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published Feb 22 • 18
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 Paper • 2502.12659 • Published Feb 18 • 7
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10, 2024 • 26
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models Paper • 2407.12366 • Published Jul 17, 2024 • 4
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Paper • 2406.19263 • Published Jun 27, 2024 • 10