Artificial Hippocampus Networks for Efficient Long-Context Modeling • Paper • 2510.07318 • Published 21 days ago • 27
Muon Outperforms Adam in Tail-End Associative Memory Learning • Paper • 2509.26030 • Published 30 days ago • 19
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • Paper • 2509.26507 • Published 29 days ago • 516
Textbooks Are All You Need II: phi-1.5 technical report • Paper • 2309.05463 • Published Sep 11, 2023 • 88
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? • Paper • 2305.07759 • Published May 12, 2023 • 36