# zephyr-7b-dpo-full-prometheus-high-curriculum
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full; the training dataset is not specified in this card. It achieves the following results on the evaluation set (a brief consistency check on the reward metrics follows the list):
- Loss: 0.4919
- Rewards/chosen: -1.0893
- Rewards/rejected: -1.9611
- Rewards/accuracies: 0.7457
- Rewards/margins: 0.8717
- Logps/rejected: -444.3828
- Logps/chosen: -368.8934
- Logits/rejected: 2.6386
- Logits/chosen: 2.0277
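As a quick sanity check (assuming the standard TRL DPO conventions, where the chosen/rejected rewards are beta-scaled log-probability differences against the reference model), the reported margin should simply be the gap between the chosen and rejected rewards:

```python
# Consistency check on the reported metrics (values copied from the list above):
# the reward margin should equal rewards/chosen - rewards/rejected up to rounding.
rewards_chosen = -1.0893
rewards_rejected = -1.9611
print(round(rewards_chosen - rewards_rejected, 4))  # 0.8718 vs. the reported 0.8717
```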
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
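For readers who want to reproduce this setup, the sketch below shows how these hyperparameters could be wired into TRL's `DPOTrainer`, which the alignment-handbook recipes commonly use. The dataset name, output directory, and `beta` value are placeholders rather than values taken from this card, and argument names may differ slightly across trl releases.

```python
# Hedged sketch: passing the hyperparameters above to TRL's DPOTrainer.
# The dataset, output_dir, and beta are placeholders, not values from this card.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-org/your-preference-dataset")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full",   # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,     # 8 GPUs x 8 per device x 2 accumulation = 128 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,
    beta=0.1,                          # not reported in this card; TRL's default is used here
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                    # with ref_model=None, TRL clones the policy as the reference
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```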
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6482 | 0.1143 | 50 | 0.6407 | -0.1062 | -0.2248 | 0.6853 | 0.1186 | -270.7546 | -270.5768 | -2.5123 | -2.5571 |
| 0.5804 | 0.2286 | 100 | 0.5786 | -0.5277 | -0.8872 | 0.6724 | 0.3595 | -336.9930 | -312.7311 | -2.4375 | -2.4881 |
| 0.5484 | 0.3429 | 150 | 0.5353 | -0.9089 | -1.5948 | 0.7241 | 0.6859 | -407.7577 | -350.8555 | 0.5418 | 0.1584 |
| 0.5259 | 0.4571 | 200 | 0.5121 | -1.0561 | -1.8604 | 0.7371 | 0.8043 | -434.3204 | -365.5715 | 1.3888 | 0.9506 |
| 0.5137 | 0.5714 | 250 | 0.5031 | -0.8771 | -1.6989 | 0.7155 | 0.8218 | -418.1702 | -347.6707 | 1.5929 | 1.0452 |
| 0.5034 | 0.6857 | 300 | 0.4974 | -1.1806 | -1.9828 | 0.7241 | 0.8022 | -446.5578 | -378.0252 | 2.7550 | 2.1828 |
| 0.498 | 0.8 | 350 | 0.4931 | -1.1358 | -1.9863 | 0.7414 | 0.8505 | -446.9080 | -373.5433 | 2.6879 | 2.0881 |
| 0.4898 | 0.9143 | 400 | 0.4919 | -1.0893 | -1.9611 | 0.7457 | 0.8717 | -444.3828 | -368.8934 | 2.6386 | 2.0277 |
### Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
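A minimal inference sketch, assuming the repository id listed in the model tree below and a Zephyr-style chat template inherited from the SFT base model:

```python
# Minimal inference sketch (the repo id matches the model tree below; the chat
# template is assumed to be inherited from the Zephyr SFT base model).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="sfulay/zephyr-7b-dpo-full-prometheus-high-curriculum",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```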
## Model tree for sfulay/zephyr-7b-dpo-full-prometheus-high-curriculum

- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full