PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published 7 days ago • 60
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published 27 days ago • 107
SWE-QA: Can Language Models Answer Repository-level Code Questions? Paper • 2509.14635 • Published Sep 18 • 36
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19 • 126
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8 • 78
MachineLearningLM: Continued Pretraining Language Models on Millions of Synthetic Tabular Prediction Tasks Scales In-Context ML Paper • 2509.06806 • Published Sep 8 • 63
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 201
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published May 29 • 55
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published May 28 • 50
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs Paper • 2504.17768 • Published Apr 24 • 14
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published Apr 25 • 47
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10 • 68
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models Paper • 2411.10867 • Published Nov 16, 2024 • 10
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 21