pranay-j's Collections

LLM_architectures
- Nemotron-4 15B Technical Report (arXiv:2402.16819)
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
- RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
- Reformer: The Efficient Transformer (arXiv:2001.04451)
- Attention Is All You Need (arXiv:1706.03762)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (arXiv:2112.06905)
- UL2: Unifying Language Learning Paradigms (arXiv:2205.05131)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100)
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
- Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
- Textbooks Are All You Need (arXiv:2306.11644)
- Mistral 7B (arXiv:2310.06825)
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
- Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
- Mixtral of Experts (arXiv:2401.04088)
- The Falcon Series of Open Language Models (arXiv:2311.16867)
- Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
- Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
- ReALM: Reference Resolution As Language Modeling (arXiv:2403.20329)
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
- You Only Cache Once: Decoder-Decoder Architectures for Language Models (arXiv:2405.05254)
- TransformerFAM: Feedback attention is working memory (arXiv:2404.09173)
- ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation (arXiv:2303.08302)
- Kolmogorov-Arnold Transformer (arXiv:2409.10594)
- Fast Inference from Transformers via Speculative Decoding (arXiv:2211.17192)
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (arXiv:2502.05171)