- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv 2408.07055 • 67 upvotes)
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos (arXiv 2408.10188 • 52 upvotes)
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models (arXiv 2408.15518 • 42 upvotes)
Collections
Collections including paper arxiv:2408.15518
- RLHF Workflow: From Reward Modeling to Online RLHF (arXiv 2405.07863 • 71 upvotes)
- Chameleon: Mixed-Modal Early-Fusion Foundation Models (arXiv 2405.09818 • 131 upvotes)
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models (arXiv 2405.15574 • 55 upvotes)
- An Introduction to Vision-Language Modeling (arXiv 2405.17247 • 90 upvotes)
- Chain-of-Verification Reduces Hallucination in Large Language Models (arXiv 2309.11495 • 39 upvotes)
- Adapting Large Language Models via Reading Comprehension (arXiv 2309.09530 • 81 upvotes)
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages (arXiv 2309.09400 • 85 upvotes)
- Language Modeling Is Compression (arXiv 2309.10668 • 83 upvotes)
- VILA^2: VILA Augmented VILA (arXiv 2407.17453 • 41 upvotes)
- Octopus v4: Graph of language models (arXiv 2404.19296 • 118 upvotes)
- Octo-planner: On-device Language Model for Planner-Action Agents (arXiv 2406.18082 • 48 upvotes)
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models (arXiv 2408.15518 • 42 upvotes)
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (arXiv 2404.15653 • 29 upvotes)
- MoDE: CLIP Data Experts via Clustering (arXiv 2404.16030 • 15 upvotes)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv 2405.12130 • 50 upvotes)
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (arXiv 2405.12981 • 33 upvotes)
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models (arXiv 2310.04406 • 10 upvotes)
- Chain-of-Thought Reasoning Without Prompting (arXiv 2402.10200 • 109 upvotes)
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization (arXiv 2402.09320 • 6 upvotes)
- Self-Discover: Large Language Models Self-Compose Reasoning Structures (arXiv 2402.03620 • 117 upvotes)
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv 2408.07055 • 67 upvotes)
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos (arXiv 2408.10188 • 52 upvotes)
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models (arXiv 2408.15518 • 42 upvotes)