VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction • Paper • 2509.19002 • Published Sep 23 • 2
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs • Paper • 2505.15075 • Published May 21 • 1
Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 547
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition • Paper • 2401.09759 • Published Jan 18, 2024 • 2