---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-prometheus-high-curriculum
  results: []
---

# zephyr-7b-dpo-full-prometheus-high-curriculum

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4919
- Rewards/chosen: -1.0893
- Rewards/rejected: -1.9611
- Rewards/accuracies: 0.7457
- Rewards/margins: 0.8717
- Logps/rejected: -444.3828
- Logps/chosen: -368.8934
- Logits/rejected: 2.6386
- Logits/chosen: 2.0277

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6482        | 0.1143 | 50   | 0.6407          | -0.1062        | -0.2248          | 0.6853             | 0.1186          | -270.7546      | -270.5768    | -2.5123         | -2.5571       |
| 0.5804        | 0.2286 | 100  | 0.5786          | -0.5277        | -0.8872          | 0.6724             | 0.3595          | -336.9930      | -312.7311    | -2.4375         | -2.4881       |
| 0.5484        | 0.3429 | 150  | 0.5353          | -0.9089        | -1.5948          | 0.7241             | 0.6859          | -407.7577      | -350.8555    | 0.5418          | 0.1584        |
| 0.5259        | 0.4571 | 200  | 0.5121          | -1.0561        | -1.8604          | 0.7371             | 0.8043          | -434.3204      | -365.5715    | 1.3888          | 0.9506        |
| 0.5137        | 0.5714 | 250  | 0.5031          | -0.8771        | -1.6989          | 0.7155             | 0.8218          | -418.1702      | -347.6707    | 1.5929          | 1.0452        |
| 0.5034        | 0.6857 | 300  | 0.4974          | -1.1806        | -1.9828          | 0.7241             | 0.8022          | -446.5578      | -378.0252    | 2.7550          | 2.1828        |
| 0.498         | 0.8    | 350  | 0.4931          | -1.1358        | -1.9863          | 0.7414             | 0.8505          | -446.9080      | -373.5433    | 2.6879          | 2.0881        |
| 0.4898        | 0.9143 | 400  | 0.4919          | -1.0893        | -1.9611          | 0.7457             | 0.8717          | -444.3828      | -368.8934    | 2.6386          | 2.0277        |

### Framework versions

- Transformers 4.44.0.dev0
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
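For readers interpreting the reward columns in the tables above: these follow the standard DPO convention as logged by TRL. With policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, chosen response $y_w$, and rejected response $y_l$, the implicit reward is the scaled log-probability ratio, and the loss is the logistic loss on the reward margin:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)
$$

`Rewards/margins` is the mean of $r(x, y_w) - r(x, y_l)$ over the evaluation set, and `Rewards/accuracies` is the fraction of pairs for which that margin is positive. The $\beta$ value used for this run is not stated on the card.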
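As a rough sketch, the hyperparameters above correspond to a TRL `DPOTrainer` run along the following lines. This is not the authors' script: the dataset path, `beta`, precision flag, and evaluation cadence are assumptions (the card does not state them), and the `beta` keyword follows the older `DPOTrainer` signature (newer TRL releases move it into `DPOConfig`).

```python
# Sketch only: reconstructs the listed hyperparameters with TRL's DPOTrainer.
# Dataset, beta, and precision are assumptions, not documented facts.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference data: any split with prompt/chosen/rejected columns.
dataset = load_dataset("path/to/preference-dataset")  # placeholder name

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-prometheus-high-curriculum",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # 8 per device x 8 GPUs x 2 accum steps = 128 total
    per_device_eval_batch_size=8,    # 8 per device x 8 GPUs = 64 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,                       # assumption: precision is not stated on the card
    eval_strategy="steps",
    eval_steps=50,                   # matches the 50-step cadence in the results table
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # TRL clones the policy as the frozen reference
    args=args,
    beta=0.1,                        # assumption: the DPO beta is not stated on the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```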
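And a minimal inference sketch, assuming the checkpoint is published under the repository name above and inherits the Zephyr chat template from the base model:

```python
# Minimal inference sketch; adjust model_id to the actual Hub repo or local path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zephyr-7b-dpo-full-prometheus-high-curriculum"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```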