pythia-helpful-1epoch Collection
Pythia-2.8b supervised finetuned and DPO finetuned with the helpful subset of the Anthropic hh-rlhf dataset for 1 epoch.
Pythia-410m supervised finetuned using the TRLx library with the helpful subset of the Anthropic hh-rlhf dataset for 1 epoch.
Checkpoints are also uploaded.
Fully reproducible finetuning code is available on GitHub.
See Pythia-410m for model details (paper).
See further details of these models in the paper Attributing Mode Collapse in the Fine-Tuning of Large Language Models.
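For example, the SFT checkpoint can be loaded with the Hugging Face transformers library. The sketch below is a minimal usage example; the "Human:/Assistant:" prompt format mirrors the Anthropic hh-rlhf data and is an assumption, not something prescribed by this card:

```python
# Minimal usage sketch (assumption: standard transformers loading works for this checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lomahony/pythia-410m-helpful-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# Prompt format follows the hh-rlhf conversations; whether the checkpoint
# expects it verbatim is an assumption.
prompt = "\n\nHuman: How do I make a cup of tea?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```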
If these models are helpful in your work, you can cite them as follows:
@inproceedings{o2024attributing,
title={Attributing Mode Collapse in the Fine-Tuning of Large Language Models},
author={O’Mahony, Laura and Grinsztajn, Leo and Schoelkopf, Hailey and Biderman, Stella},
booktitle={ICLR 2024, Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) workshop},
year={2024}
}
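The evaluation tables below were produced with EleutherAI's lm-evaluation-harness. The following is a hedged sketch of how the zero-shot run might be reproduced through the harness's Python API; the harness version (v0.4+) and the exact task list are assumptions inferred from the tables:

```python
# Sketch only: assumes lm-evaluation-harness v0.4+, which exposes
# lm_eval.simple_evaluate; the task list is inferred from the tables below.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=lomahony/pythia-410m-helpful-sft",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
        "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
    ],
    num_fewshot=0,   # set to 5 to reproduce the five-shot table
    batch_size=16,
)
print(results["results"])
```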
hf (pretrained=lomahony/pythia-410m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2355 | ± | 0.0124 |
| | | none | 0 | acc_norm | 0.2594 | ± | 0.0128 |
| arc_easy | 1 | none | 0 | acc | 0.5051 | ± | 0.0103 |
| | | none | 0 | acc_norm | 0.4478 | ± | 0.0102 |
| boolq | 2 | none | 0 | acc | 0.6113 | ± | 0.0085 |
| hellaswag | 1 | none | 0 | acc | 0.3372 | ± | 0.0047 |
| | | none | 0 | acc_norm | 0.4001 | ± | 0.0049 |
| lambada_openai | 1 | none | 0 | perplexity | 21.8172 | ± | 0.7736 |
| | | none | 0 | acc | 0.3755 | ± | 0.0067 |
| openbookqa | 1 | none | 0 | acc | 0.1940 | ± | 0.0177 |
| | | none | 0 | acc_norm | 0.2960 | ± | 0.0204 |
| piqa | 1 | none | 0 | acc | 0.6719 | ± | 0.0110 |
| | | none | 0 | acc_norm | 0.6687 | ± | 0.0110 |
| sciq | 1 | none | 0 | acc | 0.7700 | ± | 0.0133 |
| | | none | 0 | acc_norm | 0.6540 | ± | 0.0151 |
| wikitext | 2 | none | 0 | word_perplexity | 23.8136 | ± | N/A |
| | | none | 0 | byte_perplexity | 1.8091 | ± | N/A |
| | | none | 0 | bits_per_byte | 0.8553 | ± | N/A |
| winogrande | 1 | none | 0 | acc | 0.5320 | ± | 0.0140 |
hf (pretrained=lomahony/pythia-410m-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.2355 | ± | 0.0124 |
| | | none | 5 | acc_norm | 0.2790 | ± | 0.0131 |
| arc_easy | 1 | none | 5 | acc | 0.5274 | ± | 0.0102 |
| | | none | 5 | acc_norm | 0.5072 | ± | 0.0103 |
| boolq | 2 | none | 5 | acc | 0.5226 | ± | 0.0087 |
| hellaswag | 1 | none | 5 | acc | 0.3367 | ± | 0.0047 |
| | | none | 5 | acc_norm | 0.3991 | ± | 0.0049 |
| lambada_openai | 1 | none | 5 | perplexity | 37.4791 | ± | 1.3737 |
| | | none | 5 | acc | 0.3049 | ± | 0.0064 |
| openbookqa | 1 | none | 5 | acc | 0.1620 | ± | 0.0165 |
| | | none | 5 | acc_norm | 0.2900 | ± | 0.0203 |
| piqa | 1 | none | 5 | acc | 0.6708 | ± | 0.0110 |
| | | none | 5 | acc_norm | 0.6676 | ± | 0.0110 |
| sciq | 1 | none | 5 | acc | 0.8630 | ± | 0.0109 |
| | | none | 5 | acc_norm | 0.8430 | ± | 0.0115 |
| wikitext | 2 | none | 5 | word_perplexity | 23.8136 | ± | N/A |
| | | none | 5 | byte_perplexity | 1.8091 | ± | N/A |
| | | none | 5 | bits_per_byte | 0.8553 | ± | N/A |
| winogrande | 1 | none | 5 | acc | 0.5272 | ± | 0.0140 |