LLaVA-Video Collection • Models focus on video understanding (previously known as LLaVA-NeXT-Video). • 8 items • Updated Feb 21 • 63
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 83
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1 • 50
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Paper • 2507.15245 • Published Jul 21 • 11
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20 • 21
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 298
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published Mar 27 • 33
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Paper • 2503.09590 • Published Mar 12 • 3
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 7 • 65
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20 • 9
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 25
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? Paper • 2501.05510 • Published Jan 9 • 43
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 88
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 37