b16ddf88c9328200c2959e9e9d858e82
This model is a fine-tuned version of studio-ousia/luke-japanese-base on the contemmcm/cls_20newsgroups dataset. It achieves the following results on the evaluation set:
- Loss: 0.7575
- Data Size: 1.0
- Epoch Runtime: 48.7416
- Accuracy: 0.8538
- F1 Macro: 0.8535
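For readers unfamiliar with the two reported metrics, the following is a minimal pure-Python sketch of how accuracy and macro-averaged F1 are conventionally defined. It is not the evaluation code used for this model. The function names and the toy labels are illustrative assumptions.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels (hypothetical, three classes), not taken from the dataset.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = f1_macro(y_true, y_pred)
print(f"accuracy={acc:.4f} f1_macro={f1:.4f}")
```

Because macro F1 weights every class equally, an accuracy and an F1 Macro that sit close together, as they do here, suggest performance is fairly even across classes.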
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
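As a rough sketch, the hyperparameters above can be written as a plain Python dictionary whose keys follow the keyword names of `transformers.TrainingArguments`. The mapping is an assumption, since the original training script is not part of this card. The sketch also assumes `gradient_accumulation_steps` is 1, which is not listed above but is consistent with the reported totals.

```python
# Keys mirror transformers.TrainingArguments keyword names (assumed mapping).
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

num_devices = 4                   # from the list above (multi-GPU)
gradient_accumulation_steps = 1   # assumption: not stated in the card

# Effective batch size = per-device batch size x devices x accumulation steps.
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * num_devices
    * gradient_accumulation_steps
)
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Under these assumptions, both totals work out to the 32 reported above.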
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro |
|---|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 3.0167 | 0 | 3.9534 | 0.0436 | 0.0081 |
| No log | 1.0 | 499 | 3.0260 | 0.0078 | 4.6152 | 0.0517 | 0.0049 |
| 0.0301 | 2.0 | 998 | 3.0038 | 0.0156 | 5.0511 | 0.0512 | 0.0049 |
| 0.0543 | 3.0 | 1497 | 3.0172 | 0.0312 | 6.0237 | 0.0423 | 0.0041 |
| 0.1021 | 4.0 | 1996 | 2.9596 | 0.0625 | 7.3659 | 0.1001 | 0.0343 |
| 2.9625 | 5.0 | 2495 | 2.8732 | 0.125 | 10.1085 | 0.0786 | 0.0198 |
| 2.7083 | 6.0 | 2994 | 2.5794 | 0.25 | 15.5974 | 0.1452 | 0.0692 |
| 2.3978 | 7.0 | 3493 | 2.2822 | 0.5 | 26.2409 | 0.1948 | 0.1320 |
| 1.7382 | 8.0 | 3992 | 1.6128 | 1.0 | 49.2345 | 0.4723 | 0.4398 |
| 1.2808 | 9.0 | 4491 | 1.2417 | 1.0 | 49.0428 | 0.6116 | 0.5944 |
| 0.9815 | 10.0 | 4990 | 1.0719 | 1.0 | 48.5440 | 0.6673 | 0.6604 |
| 0.7689 | 11.0 | 5489 | 0.8721 | 1.0 | 48.7254 | 0.7240 | 0.7141 |
| 0.6496 | 12.0 | 5988 | 0.8674 | 1.0 | 49.1682 | 0.7518 | 0.7449 |
| 0.6087 | 13.0 | 6487 | 0.7625 | 1.0 | 47.2193 | 0.7901 | 0.7873 |
| 0.4475 | 14.0 | 6986 | 0.7612 | 1.0 | 48.0498 | 0.8092 | 0.8090 |
| 0.3914 | 15.0 | 7485 | 0.7046 | 1.0 | 48.2779 | 0.8259 | 0.8258 |
| 0.3933 | 16.0 | 7984 | 0.7010 | 1.0 | 48.3314 | 0.8271 | 0.8265 |
| 0.2843 | 17.0 | 8483 | 0.7255 | 1.0 | 49.7303 | 0.8319 | 0.8301 |
| 0.3222 | 18.0 | 8982 | 0.7237 | 1.0 | 47.5360 | 0.8327 | 0.8318 |
| 0.2325 | 19.0 | 9481 | 0.7072 | 1.0 | 47.5041 | 0.8395 | 0.8390 |
| 0.2313 | 20.0 | 9980 | 0.6788 | 1.0 | 47.7727 | 0.8374 | 0.8328 |
| 0.2347 | 21.0 | 10479 | 0.7167 | 1.0 | 50.5596 | 0.8488 | 0.8478 |
| 0.2437 | 22.0 | 10978 | 0.7177 | 1.0 | 49.1717 | 0.8465 | 0.8447 |
| 0.2261 | 23.0 | 11477 | 0.7627 | 1.0 | 48.5486 | 0.8468 | 0.8468 |
| 0.2093 | 24.0 | 11976 | 0.7575 | 1.0 | 48.7416 | 0.8538 | 0.8535 |
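The table can also be used to compare checkpoints. The sketch below copies the rows for epochs 15 to 24 from the table (every earlier epoch has a higher validation loss) and looks up the epoch with the lowest validation loss and the one with the highest accuracy. The variable names are illustrative.

```python
# (epoch, validation_loss, accuracy, f1_macro), copied from the table above.
rows = [
    (15, 0.7046, 0.8259, 0.8258),
    (16, 0.7010, 0.8271, 0.8265),
    (17, 0.7255, 0.8319, 0.8301),
    (18, 0.7237, 0.8327, 0.8318),
    (19, 0.7072, 0.8395, 0.8390),
    (20, 0.6788, 0.8374, 0.8328),
    (21, 0.7167, 0.8488, 0.8478),
    (22, 0.7177, 0.8465, 0.8447),
    (23, 0.7627, 0.8468, 0.8468),
    (24, 0.7575, 0.8538, 0.8535),
]

best_by_loss = min(rows, key=lambda r: r[1])      # lowest validation loss
best_by_accuracy = max(rows, key=lambda r: r[2])  # highest accuracy

print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_accuracy)
```

The headline results at the top of this card match the final logged epoch (24), which has the highest accuracy and F1 Macro, while the lowest validation loss in the table occurs at epoch 20.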
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1