training script #17
Opened by sugintama
In the example training script, the parameters are lr=0.001, betas=[0.9, 0.999], weight_decay=0.0001. But in the released NeMo file, they are lr=1.0e-05, betas=[0.9, 0.98], weight_decay=0.001, with the scheduler set to CosineAnnealing. Which set should I use?
We trained this model in two stages as mentioned here: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3#training
First stage: trained for 150000 steps using lr=0.001 with the CosineAnnealing scheduler and 15000 warmup steps.
Second stage: trained for 5000 steps using lr=1e-5 with the CosineAnnealing scheduler and no warmup (0 steps).
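To make the two stages concrete, here is a minimal sketch of a linear-warmup plus cosine-decay learning-rate curve evaluated with the numbers from this reply. It is not NeMo's CosineAnnealing implementation: the function name `cosine_annealing_lr`, the linear warmup shape, and the floor `min_lr=0.0` are assumptions for illustration, since the thread does not state the minimum learning rate or the exact warmup formula.

```python
import math


def cosine_annealing_lr(step, max_lr, max_steps, warmup_steps=0, min_lr=0.0):
    """Hypothetical helper: LR at `step` for linear warmup, then cosine decay.

    Assumption: warmup ramps linearly from 0 to max_lr, and the cosine phase
    decays from max_lr to min_lr over the remaining (max_steps - warmup_steps).
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup phase
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half-cosine from max_lr (progress=0) down to min_lr (progress=1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Stage 1: 150000 steps, peak lr=1e-3, 15000 warmup steps
stage1 = [cosine_annealing_lr(s, 1e-3, 150000, warmup_steps=15000) for s in (0, 7500, 15000, 150000)]
# Stage 2: 5000 steps, peak lr=1e-5, no warmup
stage2 = [cosine_annealing_lr(s, 1e-5, 5000, warmup_steps=0) for s in (0, 2500, 5000)]
print(stage1)
print(stage2)
```

Read this way, the lr=1.0e-05 stored in the released NeMo file would correspond to the short second (fine-tuning) stage, while lr=0.001 with 15000 warmup steps matches the first stage.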

