L&V Models
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper
• 2402.17177
• Published
• 88
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Paper
• 2403.13248
• Published
• 78
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper
• 2311.05437
• Published
• 51
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Paper
• 2409.20551
• Published
• 14
Visual Question Decomposition on Multimodal Large Language Models
Paper
• 2409.19339
• Published
• 8
Image Copy Detection for Diffusion Models
Paper
• 2409.19952
• Published
• 13
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper
• 2312.07537
• Published
• 27