# train_wsc_42_1760355875
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset (a loading sketch follows the results below). It achieves the following results on the evaluation set:
- Loss: 0.3453
- Num Input Tokens Seen: 492304
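The card does not include a usage snippet. The following is a minimal inference sketch, assuming the PEFT adapter is hosted under this card's repo id (`rbelanec/train_wsc_42_1760355875`) on top of the base model; the expected prompt format is not documented, so the Winograd-style prompt below is purely illustrative.

```python
# Minimal inference sketch. Assumptions: the adapter lives at
# "rbelanec/train_wsc_42_1760355875"; the prompt format is illustrative,
# since the card does not document the training prompt template.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_42_1760355875"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the adapter
model.eval()

prompt = (
    "The trophy doesn't fit in the suitcase because it is too small. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```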
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
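If "wsc" refers to the SuperGLUE Winograd Schema Challenge task (an assumption; the card does not specify the source or any preprocessing), the raw data can be inspected with the datasets library:

```python
# Assumption: "wsc" is the SuperGLUE Winograd Schema Challenge config;
# the card itself does not document the dataset source or preprocessing.
from datasets import load_dataset

wsc = load_dataset("super_glue", "wsc")
print(wsc)               # train / validation / test splits
print(wsc["train"][0])   # fields include: text, span1_text, span2_text, label
```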
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
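These values map directly onto transformers' `TrainingArguments`. The sketch below is a hedged reconstruction of that mapping; the actual training script is not part of this card, and `output_dir` is illustrative.

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# The real training script is not included in this card; output_dir is
# illustrative only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760355875",  # illustrative
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```

A learning rate of 0.03 would be unusually high for full fine-tuning but is common for PEFT methods such as prompt tuning, which is consistent with the PEFT dependency listed under Framework versions.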
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.7994 | 0.504 | 63 | 0.5149 | 24288 |
| 0.3727 | 1.008 | 126 | 0.4319 | 49584 |
| 0.3525 | 1.512 | 189 | 0.3483 | 74512 |
| 0.3517 | 2.016 | 252 | 0.3508 | 99264 |
| 0.3456 | 2.520 | 315 | 0.3498 | 123360 |
| 0.3684 | 3.024 | 378 | 0.3755 | 149120 |
| 0.3371 | 3.528 | 441 | 0.3488 | 174208 |
| 0.3435 | 4.032 | 504 | 0.3463 | 198016 |
| 0.3496 | 4.536 | 567 | 0.3453 | 223296 |
| 0.3514 | 5.040 | 630 | 0.3488 | 247344 |
| 0.3370 | 5.544 | 693 | 0.3498 | 271856 |
| 0.3360 | 6.048 | 756 | 0.3467 | 297472 |
| 0.3464 | 6.552 | 819 | 0.3547 | 322272 |
| 0.3185 | 7.056 | 882 | 0.3476 | 347200 |
| 0.3416 | 7.560 | 945 | 0.3508 | 372576 |
| 0.3240 | 8.064 | 1008 | 0.3456 | 397008 |
| 0.3462 | 8.568 | 1071 | 0.3482 | 421904 |
| 0.3445 | 9.072 | 1134 | 0.3459 | 446720 |
| 0.3411 | 9.576 | 1197 | 0.3495 | 471168 |
## Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1