# llama8b-netlist-lora
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.7873
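Since this card lists both the adapter repository and the base model, a minimal loading sketch with PEFT might look like the following. The generation settings and the prompt are illustrative placeholders only, not the settings used for evaluation.

```python
# Hedged sketch: load the LoRA adapter on top of the base model listed on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
adapter_id = "alibrcn/llama8b-netlist-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "..."  # replace with a netlist-related prompt for this adapter
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```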
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
- mixed_precision_training: Native AMP
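A rough sketch of how the hyperparameters above map onto `transformers` `TrainingArguments` for a PEFT/LoRA run. The LoRA rank, alpha, dropout, and target modules, as well as the dataset, are not documented on this card, so the values marked as placeholders below are assumptions rather than the settings actually used.

```python
from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="llama8b-netlist-lora",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,   # effective train batch size = 1 * 16 = 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    fp16=True,                        # "Native AMP" mixed precision (could also be bf16)
    seed=42,
    eval_strategy="steps",
    eval_steps=50,                    # matches the 50-step evaluation cadence in the results table
    logging_steps=50,
)

# Placeholder LoRA configuration; r, lora_alpha, dropout, and target_modules are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```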
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.1418 | 0.1465 | 50 | 1.1014 |
| 0.9843 | 0.2929 | 100 | 0.9528 |
| 0.8627 | 0.4394 | 150 | 0.9061 |
| 0.8749 | 0.5859 | 200 | 0.8757 |
| 0.8136 | 0.7323 | 250 | 0.8529 |
| 0.8294 | 0.8788 | 300 | 0.8440 |
| 0.7829 | 1.0234 | 350 | 0.8361 |
| 0.747 | 1.1699 | 400 | 0.8230 |
| 0.7567 | 1.3164 | 450 | 0.8226 |
| 0.7579 | 1.4628 | 500 | 0.8138 |
| 0.7387 | 1.6093 | 550 | 0.8079 |
| 0.7744 | 1.7558 | 600 | 0.8008 |
| 0.7494 | 1.9022 | 650 | 0.7939 |
| 0.6829 | 2.0469 | 700 | 0.7967 |
| 0.7044 | 2.1933 | 750 | 0.7945 |
| 0.7144 | 2.3398 | 800 | 0.7925 |
| 0.6889 | 2.4863 | 850 | 0.7894 |
| 0.7095 | 2.6327 | 900 | 0.7882 |
| 0.7064 | 2.7792 | 950 | 0.7878 |
| 0.6854 | 2.9257 | 1000 | 0.7873 |
### Framework versions
- PEFT 0.16.0
- Transformers 4.57.1
- Pytorch 2.6.0+cu124
- Datasets 4.1.1
- Tokenizers 0.22.1