PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs • Paper • 2510.09507 • Published 21 days ago • 10
nanoVLM: The simplest repository to train your VLM in pure PyTorch • Article • May 21 • 225
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers • Paper • 2505.21497 • Published May 27 • 107
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction • Paper • 2504.01014 • Published Apr 1 • 70
VideoRAG: Retrieval-Augmented Generation over Video Corpus • Paper • 2501.05874 • Published Jan 10 • 75
Long-Video Audio Synthesis with Multi-Agent Collaboration • Paper • 2503.10719 • Published Mar 13 • 9
Chat With Janus-Pro-7B: A unified multimodal understanding and generation model. • Space • Running on Zero • 2.01k
VLM R1 Referral Expression: Mark regions in images based on text descriptions • Space • Runtime error • 72
MMAudio – generating synchronized audio from video/text: Generate audio from video or text prompts • Space • Running on Zero • 883