RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning Paper • 2510.02240 • Published 25 days ago • 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published 4 days ago • 46
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published 13 days ago • 1
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Paper • 2510.02418 • Published 25 days ago • 2
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models Paper • 2510.06107 • Published 20 days ago • 2
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation Paper • 2509.22653 • Published about 1 month ago • 23
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24 • 39
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published Jun 3 • 7
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras Paper • 2507.17664 • Published Jul 23 • 1
aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists Paper • 2508.15126 • Published Aug 20 • 20
Grounding Multilingual Multimodal LLMs With Cultural Knowledge Paper • 2508.07414 • Published Aug 10 • 1
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback Paper • 2507.15024 • Published Jul 20 • 14
LBM: Latent Bridge Matching for Fast Image-to-Image Translation Paper • 2503.07535 • Published Mar 10 • 4
Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images Paper • 2506.13458 • Published Jun 16
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published Jun 1 • 14
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 185