Text Generation
Transformers
English
gpt2
finetuned

GPT-2 450M FineWebEdu & SmolTalk Training Progress

FineWebEdu Pre-Training

| Step | Train PPL | Avg Loss | Current Loss | Notes / Key Events |
|---|---|---|---|---|
| ~0 – 10 k | ~100 – 150 | 4.6 – 5.0 | Varies | Initial warm-up, large fluctuations |
| 20 k | 70 | 4.25 | Varies | First solid convergence milestone |
| 24 k | 63 | 4.14 | Varies | Smooth training, steady drop |
| 26 k | 60 | 4.09 | Varies | Stable regime, lower variance |
| 37 k | 52.97 | 3.970 | 3.818 | Post-resume from 36 k checkpoint |
| 38 k | 64.96 | 4.174 | 4.119 | Spike (hard batch / buffer shuffle) |
| 39 k | 55.24 | 4.012 | 3.678 | Recovery from spike |
| 40 k | 53.61 | 3.982 | 4.230 | HF safetensors push checkpoint |
| 41 k | 49.98 | 3.912 | 4.163 | Broke below 50 PPL |
| 42 k | 54.33 | 3.995 | 4.313 | Slight fluctuation |
| 43 k | 51.27 | 3.937 | 3.925 | Stabilizing phase |
| 44 k | 50.74 | 3.927 | 3.894 | Smooth training |
| 45 k | 51.12 | 3.934 | 3.744 | Minor plateau |
| 46 k | 53.87 | 3.987 | 4.145 | Batch variance |
| 47 k | 52.39 | 3.959 | 4.092 | Mid-range phase |
| 48 k | 43.85 | 3.781 | 4.038 | 🏅 Best transient PPL drop so far |
| 49 k | 48.94 | 3.891 | 3.780 | Rebound stabilization |
| 50 k | 44.37 | 3.793 | 3.821 | ✅ HF milestone push (pre-resume) |
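
The Train PPL and Avg Loss columns appear to be related by the usual identity PPL = exp(loss), with the loss being a mean cross-entropy in nats. The model card does not state this explicitly, so treat it as an assumption. The sketch below checks it against a few rows copied from the table; the helper name `perplexity` is illustrative and not part of any training script.

```python
import math


def perplexity(avg_loss: float) -> float:
    """Perplexity implied by a mean cross-entropy loss measured in nats."""
    return math.exp(avg_loss)


# (step, reported Train PPL, reported Avg Loss), copied from the table above.
rows = [
    ("37 k", 52.97, 3.970),
    ("41 k", 49.98, 3.912),
    ("48 k", 43.85, 3.781),
    ("50 k", 44.37, 3.793),
]

for step, reported_ppl, avg_loss in rows:
    implied = perplexity(avg_loss)
    print(f"{step}: reported PPL {reported_ppl:.2f}, exp(avg loss) {implied:.2f}")
```

Small differences between the reported and implied values are expected, because the losses in the table are rounded to three decimals.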

Pre-Training Plot

SmolTalk Fine-Tuning (50% Conversation / 50% Instruction)

| Step | Train PPL | Avg Loss | Current Loss | Notes / Key Events |
|---|---|---|---|---|
| 0 – 0.5k | 13.38 | 2.5937 | 2.2737 | Initial SmolTalk fine-tuning; Mix: Conv 46.6%, Instruct 53.4%; checkpoint saved at step 500 |
| 0.5k – 1k | 9.89 | 2.2916 | 2.2337 | Eval at step 1000: ✅ Eval PPL 9.31; Mix: Conv 46.9%, Instruct 53.1%; checkpoint saved at step 1000 |
| 1k – 1.5k | 9.96 | 2.2989 | 1.8901 | Stable training; Mix: Conv 47.8%, Instruct 52.2%; checkpoint saved at step 1500 |
| 1.5k – 2k | 8.32 | 2.1190 | 1.8197 | SmolTalk mix balanced; Mix: Conv 48.9%, Instruct 51.1%; checkpoint saved at step 2000 |
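
The reported Mix percentages sit near the 50/50 target without matching it exactly. The card does not say how the two sources are interleaved. As a rough illustration only, the sketch below assumes each training example's source is drawn at random with equal probability and then reports the observed fractions; `sample_mix`, `p_conv`, and the source labels are hypothetical names, not taken from the actual training code.

```python
import random


def sample_mix(n_examples: int, p_conv: float = 0.5, seed: int = 0) -> dict:
    """Draw a source label per example and return the observed fractions.

    Assumption: sources are chosen independently per example with
    probability p_conv for conversation data, otherwise instruction data.
    """
    rng = random.Random(seed)
    counts = {"conv": 0, "instruct": 0}
    for _ in range(n_examples):
        source = "conv" if rng.random() < p_conv else "instruct"
        counts[source] += 1
    return {name: count / n_examples for name, count in counts.items()}


mix = sample_mix(2000)
print(f"Conv {mix['conv']:.1%}, Instruct {mix['instruct']:.1%}")
```

Under this assumption the observed split fluctuates around the target and tightens as more examples are drawn, which is at least consistent with the Mix values in the table moving toward 50/50 over the first 2k steps.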

Evaluation at 50k

| Step | Train PPL | Avg Loss | Current Loss | Eval PPL | HellaSwag | Notes / Key Events |
|---|---|---|---|---|---|---|
| 50k | 44.37 | 3.793 | 3.821 | 45.68 | 31 | ✅ HF milestone push / pre-resume evaluation |