b16ddf88c9328200c2959e9e9d858e82

This model is a fine-tuned version of studio-ousia/luke-japanese-base on the contemmcm/cls_20newsgroups dataset. It achieves the following results on the evaluation set:

Loss: 0.7575
Data Size: 1.0
Epoch Runtime: 48.7416
Accuracy: 0.8538
F1 Macro: 0.8535
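The card does not say how Accuracy and F1 Macro were computed. The plain-Python sketch below shows the standard definitions. Accuracy is the share of exact matches. Macro F1 is the unweighted mean of per-class F1 scores, so every class counts equally regardless of how many examples it has. The function names are illustrative and are not taken from the training code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    labels = set(y_true) | set(y_pred)
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    # The denominator is never 0 here: each label occurs in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)
```

With `y_true=[0, 0, 1, 1, 2]` and `y_pred=[0, 1, 1, 1, 2]`, accuracy is 0.8 and macro F1 is about 0.822 (per-class F1 of 2/3, 0.8 and 1.0). This should agree with scikit-learn's `f1_score(..., average="macro")` when the label set is taken from the data.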

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
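The sketch below assumes the run used the Hugging Face `Trainer`. The card lists Transformers among its framework versions but does not include the training script. Under that assumption, the hyperparameters map onto `transformers.TrainingArguments` keyword arguments as shown. The `output_dir` value and `GRADIENT_ACCUMULATION_STEPS = 1` are assumptions that the card does not state. The code only builds a dict and checks that 8 examples per device across 4 devices gives the listed total batch size of 32.

```python
# Values copied from the card; only output_dir is a hypothetical placeholder.
training_kwargs = {
    "output_dir": "luke-japanese-20newsgroups",  # hypothetical path
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # from the card: distributed_type multi-GPU, num_devices 4
GRADIENT_ACCUMULATION_STEPS = 1  # assumption: not listed, so the default of 1


def total_batch_size(per_device, num_devices, accumulation_steps=1):
    """Examples consumed per optimizer step across all data-parallel devices."""
    return per_device * num_devices * accumulation_steps


train_total = total_batch_size(
    training_kwargs["per_device_train_batch_size"], NUM_DEVICES, GRADIENT_ACCUMULATION_STEPS
)
eval_total = total_batch_size(training_kwargs["per_device_eval_batch_size"], NUM_DEVICES)
print(train_total, eval_total)
```

If `transformers` is installed, `TrainingArguments(**training_kwargs)` accepts these keys. Launching the script with 4 processes, for example via `torchrun --nproc_per_node=4`, is one way to reproduce the multi-GPU setup. The `constant` scheduler keeps the learning rate at 5e-05 for the whole run.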

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro
No log	0	0	3.0167	0	3.9534	0.0436	0.0081
No log	1	499	3.0260	0.0078	4.6152	0.0517	0.0049
0.0301	2	998	3.0038	0.0156	5.0511	0.0512	0.0049
0.0543	3	1497	3.0172	0.0312	6.0237	0.0423	0.0041
0.1021	4	1996	2.9596	0.0625	7.3659	0.1001	0.0343
2.9625	5	2495	2.8732	0.125	10.1085	0.0786	0.0198
2.7083	6	2994	2.5794	0.25	15.5974	0.1452	0.0692
2.3978	7	3493	2.2822	0.5	26.2409	0.1948	0.1320
1.7382	8.0	3992	1.6128	1.0	49.2345	0.4723	0.4398
1.2808	9.0	4491	1.2417	1.0	49.0428	0.6116	0.5944
0.9815	10.0	4990	1.0719	1.0	48.5440	0.6673	0.6604
0.7689	11.0	5489	0.8721	1.0	48.7254	0.7240	0.7141
0.6496	12.0	5988	0.8674	1.0	49.1682	0.7518	0.7449
0.6087	13.0	6487	0.7625	1.0	47.2193	0.7901	0.7873
0.4475	14.0	6986	0.7612	1.0	48.0498	0.8092	0.8090
0.3914	15.0	7485	0.7046	1.0	48.2779	0.8259	0.8258
0.3933	16.0	7984	0.7010	1.0	48.3314	0.8271	0.8265
0.2843	17.0	8483	0.7255	1.0	49.7303	0.8319	0.8301
0.3222	18.0	8982	0.7237	1.0	47.5360	0.8327	0.8318
0.2325	19.0	9481	0.7072	1.0	47.5041	0.8395	0.8390
0.2313	20.0	9980	0.6788	1.0	47.7727	0.8374	0.8328
0.2347	21.0	10479	0.7167	1.0	50.5596	0.8488	0.8478
0.2437	22.0	10978	0.7177	1.0	49.1717	0.8465	0.8447
0.2261	23.0	11477	0.7627	1.0	48.5486	0.8468	0.8468
0.2093	24.0	11976	0.7575	1.0	48.7416	0.8538	0.8535
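The card does not say how the reported checkpoint was chosen. The headline metrics match the epoch-24 row, which has the highest accuracy and macro F1 of all logged epochs. The lowest validation loss (0.6788) appears earlier, at epoch 20. The sketch below checks this against the full-data rows (Data Size 1.0) copied from the table. The helper name is illustrative.

```python
# (epoch, validation_loss, accuracy, f1_macro), copied from the rows with Data Size 1.0.
ROWS = [
    (8, 1.6128, 0.4723, 0.4398),
    (9, 1.2417, 0.6116, 0.5944),
    (10, 1.0719, 0.6673, 0.6604),
    (11, 0.8721, 0.7240, 0.7141),
    (12, 0.8674, 0.7518, 0.7449),
    (13, 0.7625, 0.7901, 0.7873),
    (14, 0.7612, 0.8092, 0.8090),
    (15, 0.7046, 0.8259, 0.8258),
    (16, 0.7010, 0.8271, 0.8265),
    (17, 0.7255, 0.8319, 0.8301),
    (18, 0.7237, 0.8327, 0.8318),
    (19, 0.7072, 0.8395, 0.8390),
    (20, 0.6788, 0.8374, 0.8328),
    (21, 0.7167, 0.8488, 0.8478),
    (22, 0.7177, 0.8465, 0.8447),
    (23, 0.7627, 0.8468, 0.8468),
    (24, 0.7575, 0.8538, 0.8535),
]
VAL_LOSS, ACCURACY, F1_MACRO = 1, 2, 3  # column indices into each row


def best_row(rows, column, maximize):
    """Return the row with the largest (or smallest) value in the given column."""
    pick = max if maximize else min
    return pick(rows, key=lambda row: row[column])


best_accuracy = best_row(ROWS, ACCURACY, maximize=True)
best_f1 = best_row(ROWS, F1_MACRO, maximize=True)
lowest_loss = best_row(ROWS, VAL_LOSS, maximize=False)
print("highest accuracy:", best_accuracy)
print("lowest validation loss:", lowest_loss)
```

Selecting by validation loss instead of accuracy would have picked epoch 20 (accuracy 0.8374), so the choice of selection metric matters for the reported numbers.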

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1
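When reproducing the run, it may help to compare a local environment against these versions. The minimal sketch below uses only the standard library's `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`; PyTorch is published as `torch`.

```python
from importlib import metadata

# Versions listed under "Framework versions"; keys are PyPI distribution names.
EXPECTED = {
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.3.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected):
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report


if __name__ == "__main__":
    for name, (want, have) in compare_versions(EXPECTED).items():
        if have is None:
            status = "missing"
        else:
            status = "ok" if have == want else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```

The comparison is an exact string match, so a CPU-only or differently built PyTorch wheel (for example `2.8.0` without `+cu128`) is reported as differing.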


Model size: 0.3B params (Safetensors, tensor type F32)

Model tree: contemmcm/b16ddf88c9328200c2959e9e9d858e82 is fine-tuned from the base model studio-ousia/luke-japanese-base.
