---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-prometheus-high-curriculum
  results: []
---

# zephyr-7b-dpo-full-prometheus-high-curriculum

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4919
- Rewards/chosen: -1.0893
- Rewards/rejected: -1.9611
- Rewards/accuracies: 0.7457
- Rewards/margins: 0.8717
- Logps/rejected: -444.3828
- Logps/chosen: -368.8934
- Logits/rejected: 2.6386
- Logits/chosen: 2.0277

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6482        | 0.1143 | 50   | 0.6407          | -0.1062        | -0.2248          | 0.6853             | 0.1186          | -270.7546      | -270.5768    | -2.5123         | -2.5571       |
| 0.5804        | 0.2286 | 100  | 0.5786          | -0.5277        | -0.8872          | 0.6724             | 0.3595          | -336.9930      | -312.7311    | -2.4375         | -2.4881       |
| 0.5484        | 0.3429 | 150  | 0.5353          | -0.9089        | -1.5948          | 0.7241             | 0.6859          | -407.7577      | -350.8555    | 0.5418          | 0.1584        |
| 0.5259        | 0.4571 | 200  | 0.5121          | -1.0561        | -1.8604          | 0.7371             | 0.8043          | -434.3204      | -365.5715    | 1.3888          | 0.9506        |
| 0.5137        | 0.5714 | 250  | 0.5031          | -0.8771        | -1.6989          | 0.7155             | 0.8218          | -418.1702      | -347.6707    | 1.5929          | 1.0452        |
| 0.5034        | 0.6857 | 300  | 0.4974          | -1.1806        | -1.9828          | 0.7241             | 0.8022          | -446.5578      | -378.0252    | 2.7550          | 2.1828        |
| 0.498         | 0.8    | 350  | 0.4931          | -1.1358        | -1.9863          | 0.7414             | 0.8505          | -446.9080      | -373.5433    | 2.6879          | 2.0881        |
| 0.4898        | 0.9143 | 400  | 0.4919          | -1.0893        | -1.9611          | 0.7457             | 0.8717          | -444.3828      | -368.8934    | 2.6386          | 2.0277        |

### Framework versions

- Transformers 4.44.0.dev0
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
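For readers interpreting the reward columns in the tables above: these follow the standard DPO convention as logged by TRL. With policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, chosen response $y_w$, and rejected response $y_l$, the implicit reward is the scaled log-probability ratio, and the loss is the logistic loss on the reward margin:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l)\bigr)
$$

`Rewards/margins` is the mean of $r(x, y_w) - r(x, y_l)$ over the evaluation set, and `Rewards/accuracies` is the fraction of pairs for which that margin is positive. The $\beta$ value used for this run is not stated on the card.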
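As a rough sketch, the hyperparameters above correspond to a TRL `DPOTrainer` run along the following lines. This is not the authors' script: the dataset path, `beta`, precision flag, and evaluation cadence are assumptions (the card does not state them), and the `beta` keyword follows the older `DPOTrainer` signature (newer TRL releases move it into `DPOConfig`).

```python
# Sketch only: reconstructs the listed hyperparameters with TRL's DPOTrainer.
# Dataset, beta, and precision are assumptions, not documented facts.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hypothetical preference data: any split with prompt/chosen/rejected columns.
dataset = load_dataset("path/to/preference-dataset")  # placeholder name

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-prometheus-high-curriculum",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # 8 per device x 8 GPUs x 2 accum steps = 128 total
    per_device_eval_batch_size=8,    # 8 per device x 8 GPUs = 64 total
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,                       # assumption: precision is not stated on the card
    eval_strategy="steps",
    eval_steps=50,                   # matches the 50-step cadence in the results table
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # TRL clones the policy as the frozen reference
    args=args,
    beta=0.1,                        # assumption: the DPO beta is not stated on the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```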
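And a minimal inference sketch, assuming the checkpoint is published under the repository name above and inherits the Zephyr chat template from the base model:

```python
# Minimal inference sketch; adjust model_id to the actual Hub repo or local path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zephyr-7b-dpo-full-prometheus-high-curriculum"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```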