matlok's Collections

Papers - Video - Understanding

 - Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
   Paper • 2403.09626 • Published • 16
 - VideoAgent: Long-form Video Understanding with Large Language Model as Agent
   Paper • 2403.10517 • Published • 37
 - VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
   Paper • 2403.13501 • Published • 9
 - LITA: Language Instructed Temporal-Localization Assistant
   Paper • 2403.19046 • Published • 19
 - MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
   Paper • 2404.03413 • Published • 28
 - Pegasus-v1 Technical Report
   Paper • 2404.14687 • Published • 33
 - MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
   Paper • 2406.08407 • Published • 28
 - InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
   Paper • 2407.03320 • Published • 95
 - LLaVA-OneVision: Easy Visual Task Transfer
   Paper • 2408.03326 • Published • 60
 - Apollo: An Exploration of Video Understanding in Large Multimodal Models
   Paper • 2412.10360 • Published • 147