# llama3-8b-mypo3_sim-full-beta15.0-lr4e-7
This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:
- Loss: 1.3909
- Rewards/chosen: 0.0727
- Rewards/rejected: -0.3539
- Rewards/accuracies: 0.7460
- Rewards/margins: 0.4266
- Logps/rejected: -1.5130
- Logps/chosen: -1.2663
- Logits/rejected: -1.0742
- Logits/chosen: -1.0477
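For reference, the reward metrics above follow the usual preference-optimization bookkeeping: the margin is the difference between the chosen and rejected rewards (0.0727 - (-0.3539) = 0.4266), and the accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. A minimal sketch of that bookkeeping, using made-up reward tensors purely for illustration:

```python
import torch

# Hypothetical per-example rewards for chosen and rejected responses;
# in DPO/SimPO-style trainers these come from (scaled) log-probabilities.
rewards_chosen = torch.tensor([0.10, -0.05, 0.20])
rewards_rejected = torch.tensor([-0.30, -0.40, 0.05])

margins = rewards_chosen - rewards_rejected                     # "Rewards/margins"
accuracy = (rewards_chosen > rewards_rejected).float().mean()   # "Rewards/accuracies"

print(margins.mean().item(), accuracy.item())
```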
## Model description
More information needed
## Intended uses & limitations
More information needed
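Pending fuller usage notes, below is a minimal loading sketch with the `transformers` version listed under Framework versions, assuming the repository id `aaaalongaa/llama3-8b-mypo3_sim-full-beta15.0-lr4e-7` and standard causal-LM generation (not an official usage recipe):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta15.0-lr4e-7"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```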
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
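As a non-authoritative reconstruction, these settings map onto a standard Hugging Face `TrainingArguments` configuration roughly as sketched below; the actual training script, the preference-loss beta of 15.0 implied by the model name, and any mixed-precision or DeepSpeed settings are not captured here:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters only; the preference-optimization loss
# (and its beta=15.0 from the model name) lives in the trainer, not here.
training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta15.0-lr4e-7",
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # 4 per GPU x 4 GPUs x 2 accumulation = 32 effective
    per_device_eval_batch_size=8,    # 8 per GPU x 4 GPUs = 32 effective
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```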
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.3855 | 0.0523 | 100 | 1.3827 | 0.0165 | -0.0619 | 0.6448 | 0.0784 | -1.4935 | -1.2700 | -1.0460 | -1.0142 |
| 1.4322 | 0.1047 | 200 | 1.3940 | -0.0734 | -0.2988 | 0.7004 | 0.2254 | -1.5093 | -1.2760 | -1.0357 | -1.0064 |
| 1.4033 | 0.1570 | 300 | 1.4074 | 0.0406 | -0.2379 | 0.7103 | 0.2785 | -1.5053 | -1.2684 | -1.0433 | -1.0153 |
| 1.4072 | 0.2094 | 400 | 1.4383 | 0.1589 | -0.1461 | 0.7222 | 0.3050 | -1.4992 | -1.2606 | -1.0630 | -1.0346 |
| 1.4887 | 0.2617 | 500 | 1.5038 | 0.3697 | 0.0045 | 0.7044 | 0.3652 | -1.4891 | -1.2465 | -1.0874 | -1.0585 |
| 1.4435 | 0.3141 | 600 | 1.4243 | -0.0291 | -0.4141 | 0.7282 | 0.3849 | -1.5170 | -1.2731 | -1.0806 | -1.0527 |
| 1.3754 | 0.3664 | 700 | 1.4125 | 0.0848 | -0.3254 | 0.7440 | 0.4102 | -1.5111 | -1.2655 | -1.0871 | -1.0583 |
| 1.4283 | 0.4187 | 800 | 1.4428 | -0.0817 | -0.5182 | 0.7262 | 0.4365 | -1.5240 | -1.2766 | -1.0950 | -1.0662 |
| 1.4394 | 0.4711 | 900 | 1.4416 | 0.1838 | -0.2361 | 0.7421 | 0.4199 | -1.5051 | -1.2589 | -1.0675 | -1.0399 |
| 1.3847 | 0.5234 | 1000 | 1.4147 | 0.1135 | -0.3291 | 0.7440 | 0.4426 | -1.5114 | -1.2636 | -1.0717 | -1.0453 |
| 1.4128 | 0.5758 | 1100 | 1.4050 | 0.1814 | -0.2344 | 0.7341 | 0.4158 | -1.5050 | -1.2591 | -1.0804 | -1.0533 |
| 1.4134 | 0.6281 | 1200 | 1.3930 | 0.0583 | -0.3878 | 0.7381 | 0.4461 | -1.5153 | -1.2673 | -1.0603 | -1.0341 |
| 1.3657 | 0.6805 | 1300 | 1.3927 | 0.0738 | -0.3600 | 0.7361 | 0.4338 | -1.5134 | -1.2662 | -1.0977 | -1.0688 |
| 1.3569 | 0.7328 | 1400 | 1.4012 | 0.1382 | -0.3017 | 0.7381 | 0.4399 | -1.5095 | -1.2619 | -1.0589 | -1.0332 |
| 1.4025 | 0.7851 | 1500 | 1.3905 | 0.0713 | -0.3643 | 0.7460 | 0.4356 | -1.5137 | -1.2664 | -1.0775 | -1.0504 |
| 1.4056 | 0.8375 | 1600 | 1.3950 | 0.1345 | -0.2974 | 0.7341 | 0.4319 | -1.5092 | -1.2622 | -1.0792 | -1.0522 |
| 1.3963 | 0.8898 | 1700 | 1.3916 | 0.0752 | -0.3533 | 0.7401 | 0.4285 | -1.5130 | -1.2661 | -1.0792 | -1.0522 |
| 1.3775 | 0.9422 | 1800 | 1.3900 | 0.0792 | -0.3462 | 0.7401 | 0.4255 | -1.5125 | -1.2659 | -1.0729 | -1.0464 |
| 1.3827 | 0.9945 | 1900 | 1.3904 | 0.0707 | -0.3510 | 0.7401 | 0.4217 | -1.5128 | -1.2664 | -1.0743 | -1.0477 |
### Framework versions
- Transformers 4.43.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1