Post: My deep dive into DeepSeek V3.2-exp DSA: https://disruptedai.substack.com/p/deepseek-v32-exp-deep-dive Would love to know your opinion. Thanks!
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 26 days ago • 34
Post: Just posted https://huggingface.co/blog/jlopez-dl/hybrid-attention-game-changer for those interested in Hybrid Attention.
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26 • 75
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 246
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 97
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28, 2024 • 12
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 31
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 13