---
license: apache-2.0
tags:
- text-generation
- gpt2
- finetuned
datasets:
- karpathy/fineweb-edu-100b-shuffle
- HuggingFaceTB/smoltalk
language: en
library_name: transformers
---

# GPT-2 450M FineWebEdu & SmolTalk Training Progress

### FineWebEdu Pre-Training

| **Step** | **Train PPL** | **Avg Loss** | **Current Loss** | **Notes / Key Events** |
| :-------: | :-----------: | :----------: | :--------------: | :------------------- |
| ~0 – 10 k | ~100 – 150 | 4.6 – 5.0 | Varies | Initial warm-up, large fluctuations |
| 20 k | 70 | 4.25 | Varies | First solid convergence milestone |
| 24 k | 63 | 4.14 | Varies | Smooth training, steady drop |
| 26 k | 60 | 4.09 | Varies | Stable regime, lower variance |
| 37 k | 52.97 | 3.970 | 3.818 | Post-resume from 36 k checkpoint |
| 38 k | 64.96 | 4.174 | 4.119 | Spike (hard batch / buffer shuffle) |
| 39 k | 55.24 | 4.012 | 3.678 | Recovery from spike |
| 40 k | 53.61 | 3.982 | 4.230 | HF safetensors push checkpoint |
| 41 k | 49.98 | 3.912 | 4.163 | Broke below 50 PPL |
| 42 k | 54.33 | 3.995 | 4.313 | Slight fluctuation |
| 43 k | 51.27 | 3.937 | 3.925 | Stabilizing phase |
| 44 k | 50.74 | 3.927 | 3.894 | Smooth training |
| 45 k | 51.12 | 3.934 | 3.744 | Minor plateau |
| 46 k | 53.87 | 3.987 | 4.145 | Batch variance |
| 47 k | 52.39 | 3.959 | 4.092 | Mid-range phase |
| 48 k | **43.85** | **3.781** | 4.038 | 🏅 Best transient PPL drop so far |
| 49 k | 48.94 | 3.891 | 3.780 | Rebound stabilization |
| 50 k | **44.37** | **3.793** | 3.821 | ✅ HF milestone push (pre-resume) |

![Pre-Training Plot](./plot.jpg)

### SmolTalk Fine-Tuning (50% Conversation / 50% Instruction)

| **Step** | **Train PPL** | **Avg Loss** | **Current Loss** | **Notes / Key Events** |
| :-------: | :-----------: | :----------: | :--------------: | :------------------- |
| 0 – 0.5k | 13.38 | 2.5937 | 2.2737 | Initial SmolTalk fine-tuning; mix: Conv 46.6%, Instruct 53.4%; checkpoint saved at step 500 |
| 0.5k – 1k | 9.89 | 2.2916 | 2.2337 | Eval at step 1000: ✅ Eval PPL 9.31; mix: Conv 46.9%, Instruct 53.1%; checkpoint saved at step 1000 |
| 1k – 1.5k | 9.96 | 2.2989 | 1.8901 | Stable training; mix: Conv 47.8%, Instruct 52.2%; checkpoint saved at step 1500 |
| 1.5k – 2k | 8.32 | 2.1190 | 1.8197 | SmolTalk mix balanced; mix: Conv 48.9%, Instruct 51.1%; checkpoint saved at step 2000 |

### Eval at 50 k

| Step | Train PPL | Avg Loss | Current Loss | Eval PPL | HellaSwag | Notes / Key Events |
|:----:|:---------:|:--------:|:------------:|:--------:|:---------:|:-----------------|
| 50 k | 44.37 | 3.793 | 3.821 | 45.68 | 31 | ✅ HF milestone push / pre-resume evaluation |
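The PPL and loss columns above are internally consistent if the reported perplexity is the exponential of the average cross-entropy loss, which is the usual convention; the small discrepancies come from the losses being logged at limited precision. A minimal sketch, assuming that convention:

```python
import math

def ppl(avg_loss: float) -> float:
    """Perplexity as the exponential of the average cross-entropy loss."""
    return math.exp(avg_loss)

# Spot-check two rows from the tables above:
# pre-training, step 50 k: avg loss 3.793 -> ~44.4 (table reports 44.37)
print(round(ppl(3.793), 2))
# SmolTalk fine-tuning, steps 0.5k-1k: avg loss 2.2916 -> ~9.89 (table reports 9.89)
print(round(ppl(2.2916), 2))
```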