PeppePasti's Collections: Multimodal LLMs
Building and better understanding vision-language models: insights and future directions
Paper
• 2408.12637
• Published • 133
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper
• 2408.11039
• Published • 63
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper
• 2408.16725
• Published • 53
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper
• 2408.15998
• Published • 86
MAVIS: Mathematical Visual Instruction Tuning
Paper
• 2407.08739
• Published • 32
Law of Vision Representation in MLLMs
Paper
• 2408.16357
• Published • 95
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper
• 2406.16860
• Published • 63
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published • 129
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper
• 2409.02889
• Published • 54
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
Paper
• 2409.03525
• Published • 12
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Paper
• 2409.07239
• Published • 15
One missing piece in Vision and Language: A Survey on Comics Understanding
Paper
• 2409.09502
• Published • 24
NVLM: Open Frontier-Class Multimodal LLMs
Paper
• 2409.11402
• Published • 74
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
• 2409.12191
• Published • 79
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Paper
• 2409.12959
• Published • 38
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper
• 2409.12568
• Published • 50
Phantom of Latent for Large Language and Vision Models
Paper
• 2409.14713
• Published • 29
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published • 121
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Paper
• 2409.18042
• Published • 39