The Smol Training Playbook 📚 • The secrets to building world-class LLMs
Predicting the Order of Upcoming Tokens Improves Language Modeling • Paper 2508.19228 • Published Aug 26, 2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability • Paper 2506.01789 • Published Jun 2, 2025
Crosslingual Reasoning through Test-Time Scaling • Paper 2505.05408 • Published May 8, 2025
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • Paper 2504.20966 • Published Apr 29, 2025
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines • Paper 2410.12705 • Published Oct 16, 2024