dipta007's Collections

Small Multimodal Models
- Textbooks Are All You Need • Paper 2306.11644 • 146 upvotes
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model • Paper 2401.02330 • 18 upvotes
- Textbooks Are All You Need II: phi-1.5 technical report • Paper 2309.05463 • 88 upvotes
- Visual Instruction Tuning • Paper 2304.08485 • 20 upvotes
- Improved Baselines with Visual Instruction Tuning • Paper 2310.03744 • 39 upvotes
- CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models • Paper 2311.11567 • 8 upvotes
- Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs • Paper 2311.15759 • 1 upvote
- Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models • Paper 2311.08213
- Generative Multimodal Models are In-Context Learners • Paper 2312.13286 • 37 upvotes
- InfMLLM: A Unified Framework for Visual-Language Tasks • Paper 2311.06791 • 3 upvotes
- MM-LLMs: Recent Advances in MultiModal Large Language Models • Paper 2401.13601 • 48 upvotes
- Efficient Multimodal Learning from Data-centric Perspective • Paper 2402.11530 • 1 upvote
- Multi-modal preference alignment remedies regression of visual instruction tuning on language model • Paper 2402.10884
- LLaMA Pro: Progressive LLaMA with Block Expansion • Paper 2401.02415 • 53 upvotes
- MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices • Paper 2312.16886 • 22 upvotes
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model • Paper 2402.03766 • 15 upvotes
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT • Paper 2402.16840 • 26 upvotes
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models • Paper 2402.14289 • 21 upvotes
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • Paper 2402.14905 • 134 upvotes