GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Paper • 2511.04307 • Published Nov 6, 2025 • 14
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27, 2025 • 96
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification Paper • 2509.15591 • Published Sep 19, 2025 • 45
GUI-G^2: Gaussian Reward Modeling for GUI Grounding Paper • 2507.15846 • Published Jul 21, 2025 • 133
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 42
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once Paper • 2507.10541 • Published Jul 14, 2025 • 29
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning Paper • 2506.16012 • Published Jun 19, 2025 • 22