Mistral 7B Zephyr DPO V2
The Zephyr DPO recipe applied on top of Mistral 7B (a new recipe using the ChatML chat format).
Model description
- Model type: A 7.2B-parameter GPT-like model fine-tuned on a mix of publicly available and synthetic datasets.
- Language(s) (NLP): Primarily English
- Finetuned from model: wandb/mistral-7b-zephyr-sft
 
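The ChatML format mentioned above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. In practice the tokenizer's built-in chat template handles this; the helper below is only an illustrative sketch of the convention, not the model's actual template code:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} messages in the ChatML
    convention, ending with the assistant header so the model
    continues with its reply."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Open an assistant turn for generation.
    return prompt + "<|im_start|>assistant\n"
```

With `transformers`, the equivalent is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which reads the template shipped with the model.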
Recipe
We trained using the alignment handbook recipe, logging to W&B.
Visit the W&B workspace here.
Compute provided by Lambda Labs - 8xA100 80GB node
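DPO fine-tunes the SFT model directly on preference pairs, with no separate reward model. As background, here is a minimal sketch of the per-pair DPO objective (the alignment handbook's trainer implements a batched version of this; the `beta=0.1` default below is illustrative, not this run's actual hyperparameter):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy being trained or the frozen
    reference (SFT) model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)): shrinks as the policy learns to favor
    # the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; widening the margin toward the chosen response drives the loss toward zero.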
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value | 
|---|---|
| Avg. | 63.22 | 
| AI2 Reasoning Challenge (25-Shot) | 63.05 | 
| HellaSwag (10-Shot) | 85.54 | 
| MMLU (5-Shot) | 61.88 | 
| TruthfulQA (0-shot) | 59.30 | 
| Winogrande (5-shot) | 78.53 | 
| GSM8k (5-shot) | 31.01 | 
Evaluation results
- AI2 Reasoning Challenge (25-shot, test set): 63.05 normalized accuracy
- HellaSwag (10-shot, validation set): 85.54 normalized accuracy
- MMLU (5-shot, test set): 61.88 accuracy
- TruthfulQA (0-shot, validation set): 59.30 mc2
- Winogrande (5-shot, validation set): 78.53 accuracy
- GSM8k (5-shot, test set): 31.01 accuracy

All metrics reported by the Open LLM Leaderboard.