nvidia/retro-8b-instruct-4k

@@ -17,6 +17,16 @@ library_name: Megatron-LM
 # InstructRetro
 Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval-augmentation.
 Retro features practical scalability to support large-scale pretraining from scratch by retrieving from trillions of tokens.
 Pretraining with retrieval provides a more efficient storage mechanism of factual knowledge, when compared to storing factual knowledge implicitly within the network's parameters, thus largely reducing model parameters while achieving lower perplexity than standard GPT.
@@ -24,11 +34,7 @@ Retro also provides the flexibility to update the
 knowledge stored in LMs [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762)
 by updating the retrieval database without training LMs again.
-InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) further scales up the size of Retro to 48B, featuring the largest LLM pretrained with retrieval (as of December 2023).
-The obtained foundation model, Retro 48B, largely outperforms the GPT counterpart in terms of perplexity.
-With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on downstream tasks in the zero-shot setting. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from InstructRetro architecture and directly use the InstructRetro decoder backbone as GPT, while achieving comparable results.
-## Model Overview
 ### License

 # InstructRetro
+[Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro) &ensp; [Paper](https://arxiv.org/abs/2310.07713) &ensp; [Evaluation Data](https://drive.google.com/drive/folders/1xw-N0LJR_lIWnH6BKzHIb49quVCS_V72?usp=drive_link) &ensp; [Model Weights](https://huggingface.co/collections/nvidia/instructretro-65837ea76b60651e01faec8d)
+InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) scales up the size of Retro to 48B, making it the largest LLM pretrained with retrieval (as of December 2023).
+The obtained foundation model, Retro 48B, largely outperforms the GPT counterpart in terms of perplexity.
+With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and directly use the InstructRetro decoder backbone as GPT, while achieving comparable results.
+**For more information about InstructRetro, check the [Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro)!**
+## Background
 Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval-augmentation.
 Retro features practical scalability to support large-scale pretraining from scratch by retrieving from trillions of tokens.
 Pretraining with retrieval provides a more efficient storage mechanism of factual knowledge, when compared to storing factual knowledge implicitly within the network's parameters, thus largely reducing model parameters while achieving lower perplexity than standard GPT.
 knowledge stored in LMs [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762)
 by updating the retrieval database without training LMs again.
+## Overview
 ### License