Upload README.md with huggingface_hub
README.md
ADDED
---
license: llama3.1
tags:
- llmcompressor
- GPTQ
datasets:
- openerotica/erotiquant3
---

<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the [ArliAI Llama 3.1 70B model](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).

## Quantization Settings

| **Attribute** | **Value** |
|---------------------------------|------------------------------------------------------------------------------------|
| **Algorithm** | GPTQ |
| **Layers** | Linear |
| **Weight Scheme** | W4A16 |
| **Group Size** | 128 |
| **Calibration Dataset** | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096 |
| **Calibration Samples** | 512 |
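### Recipe Sketch

The table above corresponds to a standard llm-compressor GPTQ recipe. The snippet below is only an illustration of that mapping, not a copy of the repository's own quantization script (linked under "Quantization Process"); the `ignore=["lm_head"]` entry is a common default that the table does not confirm.

```python
# Illustrative only: how the settings above map onto an llm-compressor recipe.
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",    # quantize the Linear layers ("Layers" row)
    ignore=["lm_head"],  # commonly left unquantized (assumption, not stated in the table)
    scheme="W4A16",      # 4-bit weights, 16-bit activations; this preset uses group size 128
)
```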
### Dataset Preprocessing

The dataset was preprocessed with the following steps (a code sketch follows the list):

1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.
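#### Preprocessing Sketch

The scripts that were actually used are linked under "Quantization Process" below; the snippet here is only a sketch of the four steps, and the `conversations`/`role`/`content` field names are assumptions about the dataset's schema rather than something this card states.

```python
# Rough sketch of the calibration-data preprocessing; field names are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

ds = load_dataset("openerotica/erotiquant3", split="train")

def to_text(example):
    # Step 1: flatten each conversation into SYSTEM / USER / ASSISTANT blocks.
    turns = [f"{t['role'].upper()}: {t['content']}" for t in example["conversations"]]
    return {"text": "\n".join(turns)}

def tokenize(example):
    # Step 2: tokenize with the model's own tokenizer, capped at 4096 tokens.
    return tokenizer(example["text"], truncation=True, max_length=4096, add_special_tokens=False)

ds = ds.map(to_text)
ds = ds.map(tokenize, remove_columns=ds.column_names)

# Step 3: drop anything shorter than the 4096-token calibration window.
ds = ds.filter(lambda ex: len(ex["input_ids"]) >= 4096)

# Step 4: shuffle and keep 512 calibration samples.
ds = ds.shuffle(seed=42).select(range(512))
```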
## Quantization Process

The shell and Python scripts used to quantize this model are linked below; the core quantization call is also sketched at the end of this section.

Four A40 GPUs with 300 GB of RAM were rented on RunPod.

Quantization took approximately 11 hours, with a total of \$23.65 in compute costs. (And another \$70 from screwing up the quants about 10 times, but anyway...)

- [compress.sh](./compress.sh)
- [compress.py](./compress.py)
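### Quantization Call Sketch

This is not a copy of [compress.py](./compress.py); it is a minimal sketch of what the central llm-compressor call looks like when the recipe and calibration set sketched above are plugged in. The `oneshot` import path and keyword names follow llm-compressor's published examples and can differ between library versions, and `device_map="auto"` is an assumption about how the 70B weights were spread across the four A40s.

```python
# Minimal sketch of the central quantization step (not the repo's compress.py).
# Reuses `recipe`, `ds`, and `tokenizer` from the sketches above.
from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot  # newer releases: `from llmcompressor import oneshot`

model = AutoModelForCausalLM.from_pretrained(
    "ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3",
    device_map="auto",   # assumption: shard the 70B model across the 4x A40s
    torch_dtype="auto",
)

oneshot(
    model=model,
    dataset=ds,                   # 512 pre-tokenized calibration samples
    recipe=recipe,                # GPTQ W4A16 recipe
    max_seq_length=4096,
    num_calibration_samples=512,
)

# Write the compressed W4A16 checkpoint plus tokenizer.
model.save_pretrained("Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ", save_compressed=True)
tokenizer.save_pretrained("Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ")
```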
## Acknowledgments

- Base Model: [ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3](https://huggingface.co/ArliAI/Llama-3.1-70B-ArliAI-RPMax-v1.3)
- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)