Adding optimization section in model card (`README.md`).
datasets:
- databricks/databricks-dolly-15k
---

# dolly-v2-7b Olive Optimized Model Card

## Summary

Databricks’ `dolly-v2-7b`, an instruction-following large language model trained on the Databricks machine learning platform

**Owner**: Databricks, Inc.

## Olive Optimization

This repo hosts model files that may be loaded as an [`ORTModelForCausalLM`](https://github.com/huggingface/optimum/blob/a6951c17c3450e1dea99617aa842334f4e904392/optimum/onnxruntime/modeling_decoder.py#L623) when using Python with [🤗 Optimum](https://huggingface.co/docs/optimum/onnxruntime/overview). Alternatively, the ONNX models may be composed into a custom pipeline in any language that supports ONNX Runtime and DirectML. If you use ONNX Runtime and DirectML outside of Python, you will need to provide your own tokenizer implementation.

| Model | Implementation |
| ------------------------------- | ----------------------------------------------------------- |
| **dolly-v2-7b decoder merged with past** | **ONNX Model** |
| Tokenizer | `AutoTokenizer` (🤗 Transformers) |

The ONNX model above was processed with the [Olive](https://github.com/microsoft/olive) toolchain using the [Olive + Dolly V2 with DirectML Sample](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2). The Olive sample performs the following steps:

1. Run the [OptimumConversion Pass](https://microsoft.github.io/Olive/api/passes.html#optimumconversion) to export the model to ONNX.
2. Run the [OrtTransformersOptimization Pass](https://microsoft.github.io/Olive/api/passes.html#orttransformersoptimization), which leverages the [ONNX Runtime Transformer Model Optimization Tool](https://onnxruntime.ai/docs/performance/transformers-optimization.html). This step executes several time-consuming graph transformations, such as fusing subgraphs into LayerNorm.
3. Convert the optimized ONNX models from FLOAT32 to FLOAT16.
4. Run the [OptimumMerging Pass](https://microsoft.github.io/Olive/api/passes.html#optimummerging), which merges `decoder_model.onnx` and `decoder_with_past_model.onnx` into a single model to leverage past-key/value caching and reduce memory usage.
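Olive drives passes like these from a JSON configuration. The fragment below is an illustrative sketch of that shape only, not the sample's actual config file: the pass type names come from the Olive docs linked above, while the pass keys and the `model_type`/`float16` settings are assumptions.

```json
{
  "passes": {
    "convert": { "type": "OptimumConversion" },
    "optimize": {
      "type": "OrtTransformersOptimization",
      "config": { "model_type": "gpt_neox", "float16": true }
    },
    "merge": { "type": "OptimumMerging" }
  }
}
```

See the linked DirectML sample for the configuration actually used to produce the files in this repo.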

## Model Overview
`dolly-v2-7b` is a 6.9 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
[EleutherAI’s](https://www.eleuther.ai/) [Pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b) and fine-tuned

| databricks/dolly-v1-6b | 0.41 | 0.62963 | 0.643252 | 0.676758 | 0.384812 | 0.773667 | 0.687768 | 0.583431 |
| EleutherAI/gpt-neox-20b | 0.402 | 0.683923 | 0.656669 | 0.7142 | 0.408703 | 0.784004 | 0.695413 | 0.602236 |

# Happy Hacking!

This model is an optimized version of Databricks, Inc.’s [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b).