TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ

@@ -11,8 +11,34 @@ license: apache-2.0
 model_creator: Mistral AI_
 model_name: Mixtral 8X7B Instruct v0.1
 model_type: mixtral
-prompt_template: '[INST] {prompt} [/INST]  '
 quantized_by: TheBloke
 ---
 <!-- markdownlint-disable MD041 -->
@@ -40,15 +66,12 @@ quantized_by: TheBloke
 <!-- description start -->
 # Description
-This repo contains **EXPERIMENTAL** GPTQ model files for [Mistral AI_'s Mixtral 8X7B Instruct v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
-## Requires AutoGPTQ PR + transformers 4.36.0
-These files were made with, and will currently only work with, this AutoGPTQ PR: https://github.com/LaaZa/AutoGPTQ/tree/Mixtral-fix
-To test, please build AutoGPTQ from source using that PR.  You also need Transformers version 4.36.0, released December 11th.
-Transformers support has just arrived also via two PRs - and is expected in main Transformers + Optimum tomorrow (Dec 12th).
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
@@ -56,7 +79,7 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 <!-- repositories-available start -->
 ## Repositories available
-* AWQ coming soon
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF)
 * [Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
@@ -67,9 +90,22 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 ```
 [INST] {prompt} [/INST]
 ```
 <!-- prompt-template end -->
 <!-- README_GPTQ.md-provided-files start -->
 ## Provided files, and GPTQ parameters
@@ -174,7 +210,11 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
-**WILL CURRENTLY ONLY WORK WITH AUTOGPTQ LOADER, WITH AUTOGPTQ COMPILED FROM PR LISTED ABOVE**
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
@@ -199,6 +239,87 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 <!-- README_GPTQ.md-text-generation-webui end -->
 <!-- footer start -->
 <!-- 200823 -->

 model_creator: Mistral AI_
 model_name: Mixtral 8X7B Instruct v0.1
 model_type: mixtral
+prompt_template: '[INST] {prompt} [/INST]
+  '
 quantized_by: TheBloke
+widget:
+- output:
+    text: 'Arr, shiver me timbers! Ye have a llama on yer lawn, ye say? Well, that
+      be a new one for me! Here''s what I''d suggest, arr:
+      1. Firstly, ensure yer safety. Llamas may look gentle, but they can be protective
+      if they feel threatened.
+      2. Try to make the area less appealing to the llama. Remove any food sources
+      or water that might be attracting it.
+      3. Contact local animal control or a wildlife rescue organization. They be the
+      experts and can provide humane ways to remove the llama from yer property.
+      4. If ye have any experience with animals, you could try to gently herd the
+      llama towards a nearby field or open space. But be careful, arr!
+      Remember, arr, it be important to treat the llama with respect and care. It
+      be a creature just trying to survive, like the rest of us.'
+  text: '[INST] You are a pirate chatbot who always responds with Arr and pirate speak!
+    There''s a llama on my lawn, how can I get rid of him? [/INST]'
 ---
 <!-- markdownlint-disable MD041 -->
 <!-- description start -->
 # Description
+This repo contains GPTQ model files for [Mistral AI_'s Mixtral 8X7B Instruct v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
+Mixtral GPTQs currently require:
+* Transformers 4.36.0 or later, and
+* either AutoGPTQ 0.6 compiled from source, or
+* Transformers 4.37.0.dev0 installed from GitHub with: `pip3 install git+https://github.com/huggingface/transformers`
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 <!-- repositories-available start -->
 ## Repositories available
+* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ)
 * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF)
 * [Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 ```
 [INST] {prompt} [/INST]
 ```
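The template above can also be filled in programmatically. The following is a minimal sketch, assuming a single-turn prompt; `build_prompt` is a hypothetical helper name used only for illustration and is not part of Transformers or AutoGPTQ.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the [INST] ... [/INST] template shown above."""
    # Surrounding whitespace is stripped so the template spacing stays consistent.
    return f"[INST] {user_message.strip()} [/INST]"


print(build_prompt("Write a story about llamas"))
```

The returned string can be passed to the tokenizer in place of a hand-written prompt.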
 <!-- prompt-template end -->
+<!-- README_GPTQ.md-compatible clients start -->
+## Known compatible clients / servers
+GPTQ models are currently supported on Linux (NVIDIA/AMD) and Windows (NVIDIA only). macOS users: please use GGUF models.
+Mixtral GPTQs currently have special requirements - see Description above.
+<!-- README_GPTQ.md-compatible clients end -->
 <!-- README_GPTQ.md-provided-files start -->
 ## Provided files, and GPTQ parameters
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+**NOTE**: Requires:
+* Transformers 4.36.0, or Transformers 4.37.0.dev0 from GitHub
+* Either AutoGPTQ 0.6 compiled from source, with `Loader: AutoGPTQ`,
+* or `Loader: Transformers`, if you installed Transformers from GitHub: `pip3 install git+https://github.com/huggingface/transformers`
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 <!-- README_GPTQ.md-text-generation-webui end -->
+<!-- README_GPTQ.md-use-from-tgi start -->
+## Serving this model from Text Generation Inference (TGI)
+Not currently supported for Mixtral models.
+<!-- README_GPTQ.md-use-from-tgi end -->
+<!-- README_GPTQ.md-use-from-python start -->
+## Python code example: inference from this GPTQ model
+### Install the necessary packages
+Requires: Transformers 4.37.0.dev0 from GitHub, Optimum 1.16.0 or later, and AutoGPTQ 0.5.1 or later.
+```shell
+pip3 install --upgrade "git+https://github.com/huggingface/transformers" optimum
+# If using PyTorch 2.1 + CUDA 12.x:
+pip3 install --upgrade auto-gptq
+# or, if using PyTorch 2.1 + CUDA 11.x:
+pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
+```
+If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source:
+```shell
+pip3 uninstall -y auto-gptq
+git clone https://github.com/PanQiWei/AutoGPTQ
+cd AutoGPTQ
+DISABLE_QIGEN=1 pip3 install .
+```
+### Example Python code
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
+# To use a different branch, change revision
+# For example: revision="gptq-4bit-128g-actorder_True"
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                             device_map="auto",
+                                             trust_remote_code=False,
+                                             revision="main")
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+prompt = "Write a story about llamas"
+# Wrap the prompt in the [INST] ... [/INST] template from the "Prompt template" section
+prompt_template = f'''[INST] {prompt} [/INST]
+'''
+print("\n\n*** Generate:")
+input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
+print(tokenizer.decode(output[0]))
+# Inference can also be done using transformers' pipeline
+print("*** Pipeline:")
+pipe = pipeline(
+    "text-generation",
+    model=model,
+    tokenizer=tokenizer,
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.95,
+    top_k=40,
+    repetition_penalty=1.1
+)
+print(pipe(prompt_template)[0]['generated_text'])
+```
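The decoded output of `model.generate` contains the prompt as well as the completion, since a causal LM returns the input tokens followed by the new ones. The sketch below shows one way to keep only the reply. It assumes the decoded text contains the `[/INST]` marker and that the end-of-sequence token is rendered as `</s>`; `extract_response` is a hypothetical helper, not a Transformers function.

```python
def extract_response(decoded: str) -> str:
    """Return the text after the last [/INST] marker, without the end-of-sequence token."""
    # rpartition returns the whole string as the tail when the marker is absent.
    _, _, tail = decoded.rpartition("[/INST]")
    # Assumption: the tokenizer renders end-of-sequence as "</s>".
    return tail.replace("</s>", "").strip()


example = "<s> [INST] Write a story about llamas [/INST] Once upon a time...</s>"
print(extract_response(example))
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is an alternative way to drop the special tokens before splitting.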
+<!-- README_GPTQ.md-use-from-python end -->
+<!-- README_GPTQ.md-compatibility start -->
+## Compatibility
+The files provided are tested to work with AutoGPTQ 0.6 (compiled from source) and Transformers 4.37.0.dev0 (installed from GitHub).
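To check whether an environment meets the versions listed in this README, something like the following could be used. It is a rough sketch, not an official check: it compares only the leading numeric part of each version string, so a build such as `4.37.0.dev0` is treated the same as `4.37.0`. The helper names are hypothetical, and the distribution names `transformers` and `auto-gptq` are taken from the `pip3 install` commands in this README.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.37.0.dev0' -> (4, 37, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the numeric part of `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


def report(package: str, minimum: str) -> str:
    """Describe whether `package` is installed at `minimum` or newer."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed (need >= {minimum})"
    status = "ok" if meets_minimum(installed, minimum) else f"too old, need >= {minimum}"
    return f"{package} {installed}: {status}"


if __name__ == "__main__":
    # Minimum versions as stated in the Description section of this README.
    for package, minimum in [("transformers", "4.36.0"), ("auto-gptq", "0.6")]:
        print(report(package, minimum))
```

This reports installed versions only. It cannot tell whether AutoGPTQ was compiled from source or installed from a pre-built wheel.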
+<!-- README_GPTQ.md-compatibility end -->
 <!-- footer start -->
 <!-- 200823 -->