# 029ea5a9973ca7672d4f63af802a9614
This model is a fine-tuned version of studio-ousia/mluke-large-lite on the `stsb` subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:
- Loss: 0.4841
- Data Size: 1.0
- Epoch Runtime: 36.7842
- MSE: 0.4842
- MAE: 0.5275
- R2: 0.7834
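A minimal usage sketch, assuming the checkpoint loads as a standard Transformers sequence-regression model (STS-B fine-tunes predict a similarity score on a 0 to 5 scale) under the repo id listed at the bottom of this card; the example sentences are illustrative, not from the dataset:

```python
# Usage sketch (assumption: the checkpoint exposes the standard
# single-output regression head that STS-B fine-tunes use).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "contemmcm/029ea5a9973ca7672d4f63af802a9614"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

# STS-B inputs are sentence pairs; the model scores their similarity.
inputs = tokenizer(
    "A man is playing a guitar.",
    "A person is playing guitar.",
    return_tensors="pt",
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"predicted similarity: {score:.2f}")
```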
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
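The list above maps onto a `TrainingArguments` configuration roughly as follows (a sketch; the output directory is hypothetical, and the per-device batch size of 8 across 4 GPUs yields the total batch size of 32):

```python
from transformers import TrainingArguments

# Sketch of the configuration above. The multi-GPU launch (4 devices)
# is handled by the launcher (e.g. torchrun), not by these arguments.
args = TrainingArguments(
    output_dir="mluke-large-lite-stsb",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 devices = total 32
    per_device_eval_batch_size=8,    # x 4 devices = total 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```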
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE | MAE | R2 |
|---|---|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 8.0694 | 0 | 2.8893 | 8.0706 | 2.4163 | -2.6103 |
| No log | 1.0 | 179 | 4.0866 | 0.0078 | 3.7698 | 4.0877 | 1.6523 | -0.8286 |
| No log | 2.0 | 358 | 3.7891 | 0.0156 | 3.9160 | 3.7901 | 1.6106 | -0.6955 |
| No log | 3.0 | 537 | 2.8680 | 0.0312 | 5.5733 | 2.8685 | 1.4032 | -0.2832 |
| No log | 4.0 | 716 | 2.4632 | 0.0625 | 7.6556 | 2.4641 | 1.3432 | -0.1023 |
| No log | 5.0 | 895 | 2.2990 | 0.125 | 10.0617 | 2.2998 | 1.2733 | -0.0288 |
| 0.1899 | 6.0 | 1074 | 0.8564 | 0.25 | 14.8123 | 0.8569 | 0.7272 | 0.6167 |
| 0.7716 | 7.0 | 1253 | 0.6077 | 0.5 | 24.3844 | 0.6081 | 0.6279 | 0.7280 |
| 0.5351 | 8.0 | 1432 | 0.4998 | 1.0 | 40.3381 | 0.5001 | 0.5700 | 0.7763 |
| 0.3326 | 9.0 | 1611 | 0.4956 | 1.0 | 36.5045 | 0.4958 | 0.5464 | 0.7782 |
| 0.2897 | 10.0 | 1790 | 0.5737 | 1.0 | 37.7494 | 0.5737 | 0.5835 | 0.7434 |
| 0.224 | 11.0 | 1969 | 0.6626 | 1.0 | 37.5533 | 0.6625 | 0.6259 | 0.7036 |
| 0.1742 | 12.0 | 2148 | 0.6232 | 1.0 | 36.4729 | 0.6233 | 0.5975 | 0.7212 |
| 0.164 | 13.0 | 2327 | 0.4797 | 1.0 | 37.0413 | 0.4797 | 0.5247 | 0.7854 |
| 0.1235 | 14.0 | 2506 | 0.5450 | 1.0 | 36.2418 | 0.5450 | 0.5508 | 0.7562 |
| 0.1068 | 15.0 | 2685 | 0.5316 | 1.0 | 36.4645 | 0.5316 | 0.5460 | 0.7622 |
| 0.102 | 16.0 | 2864 | 0.4720 | 1.0 | 36.6696 | 0.4720 | 0.5192 | 0.7889 |
| 0.0962 | 17.0 | 3043 | 0.4488 | 1.0 | 36.5595 | 0.4489 | 0.5036 | 0.7992 |
| 0.0788 | 18.0 | 3222 | 0.4922 | 1.0 | 37.7005 | 0.4922 | 0.5357 | 0.7798 |
| 0.0755 | 19.0 | 3401 | 0.4559 | 1.0 | 36.4642 | 0.4560 | 0.5075 | 0.7960 |
| 0.0791 | 20.0 | 3580 | 0.4919 | 1.0 | 36.6052 | 0.4920 | 0.5331 | 0.7799 |
| 0.0752 | 21.0 | 3759 | 0.4841 | 1.0 | 36.7842 | 0.4842 | 0.5275 | 0.7834 |
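For reference, the three metric columns are the standard regression metrics; a small self-contained illustration with made-up values (not this model's predictions):

```python
# Illustrative MSE / MAE / R^2 computation with toy values
# (not this model's outputs).
def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def r2(y, p):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

labels = [0.0, 2.5, 5.0, 3.0]
preds = [0.5, 2.0, 4.5, 3.5]
print(mse(labels, preds), mae(labels, preds), r2(labels, preds))
```

Since STS-B is trained with a mean-squared-error objective, the Validation Loss and MSE columns track each other closely in the table above.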
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1
## Model tree for contemmcm/029ea5a9973ca7672d4f63af802a9614

Base model: studio-ousia/mluke-large-lite