Yehang Zhang

Buzz-lightyear

AI & ML interests

None yet

Recent Activity

upvoted a paper 10 days ago

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

liked a model 12 days ago

deepghs/manga109_yolo

published a model 3 months ago

Buzz-lightyear/levir_cot_sft_qwen_vl_7b

Organizations

None yet

upvoted a paper 10 days ago

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published 21 days ago • 10

liked a model 12 days ago

deepghs/manga109_yolo

Object Detection • Updated Feb 5 • 11

published a model 3 months ago

Buzz-lightyear/levir_cot_sft_qwen_vl_7b

Updated Jul 25

upvoted an article 5 months ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • May 21 • 225

upvoted a paper 5 months ago

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Paper • 2505.21497 • Published May 27 • 107

liked a Space 6 months ago

AgentReview

🎓

EMNLP 2024

upvoted a paper 6 months ago

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Paper • 2504.01014 • Published Apr 1 • 70

upvoted a paper 7 months ago

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published Jan 10 • 75

liked a Space 7 months ago

VideoMind 2B

💡

A Chain-of-LoRA Agent for Long Video Reasoning

liked a dataset 7 months ago

omni-research/DREAM-1K

Viewer • Updated Sep 14, 2024 • 1k • 346 • 32

liked a model 7 months ago

XLabs-AI/flux-ip-adapter-v2

Image-to-Image • Updated Oct 24, 2024 • 8.26k • 299

upvoted a paper 8 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13 • 9

commented on a paper 8 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13 • 9

authored a paper 8 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13 • 9

liked 2 Spaces 8 months ago

2.01k

Chat With Janus-Pro-7B

🌍

A unified multimodal understanding and generation model.

VLM R1 Referral Expression

💬

Mark regions in images based on text descriptions

liked a Space 10 months ago

883

MMAudio — generating synchronized audio from video/text

🔊

Generate audio from video or text prompts
