Adding optimization section in model card (`README.md`).
datasets:
- databricks/databricks-dolly-15k
---

# dolly-v2-7b Olive Optimized Model Card

## Summary

Databricks’ `dolly-v2-7b`, an instruction-following large language model trained on the Databricks machine learning platform

**Owner**: Databricks, Inc.

## Olive Optimization

This repo hosts model files that may be loaded as an [`ORTModelForCausalLM`](https://github.com/huggingface/optimum/blob/a6951c17c3450e1dea99617aa842334f4e904392/optimum/onnxruntime/modeling_decoder.py#L623) when using Python with [🤗 Optimum](https://huggingface.co/docs/optimum/onnxruntime/overview). Alternatively, the ONNX models may be composed into a custom pipeline in any language that supports ONNX Runtime and DirectML. If you use ONNX Runtime and DirectML outside of Python, you will need to provide your own tokenizer implementation.

| Model | Implementation |
| ------------------------------- | ----------------------------------------------------------- |
| **dolly-v2-7b decoder merged with past** | **ONNX Model** |
| Tokenizer | `AutoTokenizer` (🤗 Transformers) |

The ONNX model above was processed with the [Olive](https://github.com/microsoft/olive) toolchain using the [Olive + Dolly V2 with DirectML Sample](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2). The Olive sample performs the following steps:

1. Run the [OptimumConversion Pass](https://microsoft.github.io/Olive/api/passes.html#optimumconversion) to export the model to ONNX.
2. Run the [OrtTransformersOptimization Pass](https://microsoft.github.io/Olive/api/passes.html#orttransformersoptimization), which leverages the [ONNX Runtime Transformer Model Optimization Tool](https://onnxruntime.ai/docs/performance/transformers-optimization.html). This step executes several time-consuming graph transformations, such as fusing subgraphs into LayerNorm.
3. Convert the optimized ONNX models from FLOAT32 to FLOAT16.
4. Run the [OptimumMerging Pass](https://microsoft.github.io/Olive/api/passes.html#optimummerging), which merges `decoder_model.onnx` and `decoder_with_past_model.onnx` into a single model to leverage past-key/value caching and reduce memory usage.
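Olive drives passes like these from a JSON configuration. The fragment below is an illustrative sketch of that shape only, not the sample's actual config file: the pass type names come from the Olive docs linked above, while the pass keys and the `model_type`/`float16` settings are assumptions.

```json
{
  "passes": {
    "convert": { "type": "OptimumConversion" },
    "optimize": {
      "type": "OrtTransformersOptimization",
      "config": { "model_type": "gpt_neox", "float16": true }
    },
    "merge": { "type": "OptimumMerging" }
  }
}
```

See the linked DirectML sample for the configuration actually used to produce the files in this repo.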

## Model Overview
`dolly-v2-7b` is a 6.9 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
[EleutherAI’s](https://www.eleuther.ai/) [Pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b) and fine-tuned

| databricks/dolly-v1-6b | 0.41 | 0.62963 | 0.643252 | 0.676758 | 0.384812 | 0.773667 | 0.687768 | 0.583431 |
| EleutherAI/gpt-neox-20b | 0.402 | 0.683923 | 0.656669 | 0.7142 | 0.408703 | 0.784004 | 0.695413 | 0.602236 |

# Happy Hacking!

This model is an optimized version of Databricks, Inc.’s [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b).