Sa2VA Model Zoo - a ByteDance Collection

Sa2VA Model Zoo

updated Nov 27, 2025

Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos, by ByteDance Seed CV Research

ByteDance/Sa2VA-4B

Image-Text-to-Text • 4B • Updated Sep 8, 2025 • 151k • 93
ByteDance/Sa2VA-8B

Image-Text-to-Text • 8B • Updated Sep 8, 2025 • 1.28k • 65
ByteDance/Sa2VA-1B

Image-Text-to-Text • 1B • Updated Sep 8, 2025 • 1.11k • 29
ByteDance/Sa2VA-26B

Image-Text-to-Text • 26B • Updated Sep 8, 2025 • 78 • 31
ByteDance/Sa2VA-InternVL3-2B

Image-Text-to-Text • 2B • Updated Oct 16, 2025 • 174 • 1
ByteDance/Sa2VA-InternVL3-8B

Image-Text-to-Text • 8B • Updated Oct 16, 2025 • 79 • 4
ByteDance/Sa2VA-InternVL3-14B

Image-Text-to-Text • 15B • Updated Oct 16, 2025 • 47 • 9
ByteDance/Sa2VA-Qwen2_5-VL-3B

Image-Text-to-Text • 4B • Updated Oct 16, 2025 • 134 • 2
ByteDance/Sa2VA-Qwen2_5-VL-7B

Image-Text-to-Text • 9B • Updated Oct 16, 2025 • 80 • 4
ByteDance/Sa2VA-Qwen3-VL-4B

Image-Text-to-Text • 5B • Updated Oct 21, 2025 • 1.26k • 14
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7, 2025 • 47

Note: Technical report for Sa2VA.
ByteDance/Sa2VA-Qwen3-VL-2B

Image-Text-to-Text • 3B • Updated Nov 27, 2025 • 35 • 14