CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion Paper • 2512.19535 • Published 5 days ago • 9
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding Paper • 2512.16229 • Published 9 days ago • 15
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents Paper • 2512.20092 • Published 4 days ago • 5
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 4 days ago • 48
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published 3 days ago • 2
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 10 days ago • 73
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published 8 days ago • 63
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published 9 days ago • 10
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers Paper • 2512.17351 • Published 8 days ago • 21
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published 15 days ago • 20
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 9 days ago • 71
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 9 days ago • 105
Hybrid Attribution Priors for Explainable and Robust Model Training Paper • 2512.14719 • Published 18 days ago • 2
In Pursuit of Pixel Supervision for Visual Pre-training Paper • 2512.15715 • Published 10 days ago • 8