fsommers's Collections
			 
		
			
		Misc papers
		
			
 
				
				
	
	
	
			
			Specialized Language Models with Cheap Inference from Limited Domain Data
		
			Paper
			
•
			2402.01093
			
•
			Published
				
			•
				
				47
			
 
	
	 
	
	
	
			
			Attention Heads of Large Language Models: A Survey
		
			Paper
			
•
			2409.03752
			
•
			Published
				
			•
				
				92
			
 
	
	 
	
	
	
			
			General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
		
			Paper
			
•
			2409.01704
			
•
			Published
				
			•
				
				83
			
 
	
	 
	
	
	
			
			jina-embeddings-v3: Multilingual Embeddings With Task LoRA
		
			Paper
			
•
			2409.10173
			
•
			Published
				
			•
				
				34
			
 
	
	 
	
	
	
			
			Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
		
			Paper
			
•
			2409.12191
			
•
			Published
				
			•
				
				78
			
 
	
	 
	
	
	
			
			From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
		
			Paper
			
•
			2410.06456
			
•
			Published
				
			•
				
				37
			
 
	
	 
	
	
	
			
			Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
		
			Paper
			
•
			2410.16153
			
•
			Published
				
			•
				
				44
			
 
	
	 
	
	
	
			
			Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
		
			Paper
			
•
			2410.21169
			
•
			Published
				
			•
				
				30
			
 
	
	 
	
	
	
			
			A Survey of Small Language Models
		
			Paper
			
•
			2410.20011
			
•
			Published
				
			•
				
				46
			
 
	
	 
	
	
	
			
			PaliGemma 2: A Family of Versatile VLMs for Transfer
		
			Paper
			
•
			2412.03555
			
•
			Published
				
			•
				
				133
			
 
	
	 
	
	
	
			
			Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
		
			Paper
			
•
			2412.04424
			
•
			Published
				
			•
				
				63
			
 
	
	 
	
	
	
			
			DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
		
			Paper
			
•
			2410.12628
			
•
			Published
				
			•
				
				41
			
 
	
	 
	
	
	
			
			Question Answering on Patient Medical Records with Private Fine-Tuned LLMs
		
			Paper
			
•
			2501.13687
			
•
			Published
				
			•
				
				9
			
 
	
	 
	
	
	
			
			ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
		
			Paper
			
•
			2502.18017
			
•
			Published
				
			•
				
				21
			
 
	
	 
	
	
	
			
			Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
		
			Paper
			
•
			2506.16035
			
•
			Published
				
			•
				
				88
			
 
	
	 
	
	
	
			
			Pixels, Patterns, but No Poetry: To See The World like Humans
		
			Paper
			
•
			2507.16863
			
•
			Published
				
			•
				
				68
			
 
	
	 
	
	
	
			
			Group Sequence Policy Optimization
		
			Paper
			
•
			2507.18071
			
•
			Published
				
			•
				
				306
			
 
	
	 
	
	
	
			
			AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
		
			Paper
			
•
			2508.16153
			
•
			Published
				
			•
				
				154