ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning Paper • 2606.14697 • Published 20 days ago • 8
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published May 28 • 116
SOD: Step-wise On-policy Distillation for Small Language Model Agents Paper • 2605.07725 • Published May 8 • 25
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published May 25 • 103
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 274
EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training Paper • 2604.20012 • Published Apr 21 • 5
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 244
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 509
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis Paper • 2603.29620 • Published Mar 31 • 49