Update README.md
README.md
CHANGED
@@ -1,3 +1,17 @@
----
-license: apache-2.0
-
+---
+license: apache-2.0
+base_model:
+- openai/gpt-oss-20b
+library_name: transformers
+---
+
+> [!IMPORTANT]
+> This repository is an **experimental** re-quantized version of the original model [`openai/gpt-oss-20b`](https://huggingface.co/openai/gpt-oss-20b).
+>
+> It requires development versions of `transformers` and `bitsandbytes`.
+
+
+# Quantization
+The MLP expert parameters have been dequantized from MXFP4 to BF16, and then requantized in the NF4 double-quantization format using an experimental `bnb_4bit_target_parameters` configuration option. The self-attention, routing, and embedding parameters are kept in BF16.
+
+
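As a rough aid for reading the Quantization text added in this change, the sketch below estimates the effective storage cost per parameter of the formats it names. The block sizes and scale widths are assumptions not stated in the README: MXFP4 is taken as 4-bit elements sharing one 8-bit scale per 32-element block, and NF4 with double quantization as 4-bit elements with one 8-bit quantized scale per 64-element block plus one 32-bit constant per 256 of those scales. The helper `bits_per_param` is hypothetical and exists only for this illustration; it does not reproduce the requantization itself.

```python
# Back-of-the-envelope storage estimate; all block sizes below are assumptions,
# not values taken from the README in this diff.

def bits_per_param(element_bits, block_size, scale_bits,
                   outer_block=None, outer_scale_bits=0):
    """Effective bits per parameter for block-wise quantization.

    element_bits:     bits used for each quantized value
    block_size:       number of values sharing one first-level scale
    scale_bits:       bits used to store each first-level scale
    outer_block:      number of first-level scales sharing one second-level
                      constant (None when there is no second level)
    outer_scale_bits: bits used to store each second-level constant
    """
    bits = element_bits + scale_bits / block_size
    if outer_block is not None:
        # Second-level constants are amortized over block_size * outer_block values.
        bits += outer_scale_bits / (block_size * outer_block)
    return bits


BF16_BITS = 16.0
# Assumed MXFP4 layout: 4-bit elements, 8-bit shared scale per 32 elements.
MXFP4_BITS = bits_per_param(4, 32, 8)
# Assumed NF4 double-quantization layout: 4-bit elements, 8-bit scale per
# 64 elements, one 32-bit constant per 256 scales.
NF4_DQ_BITS = bits_per_param(4, 64, 8, outer_block=256, outer_scale_bits=32)

if __name__ == "__main__":
    for name, bits in [("BF16", BF16_BITS),
                       ("MXFP4", MXFP4_BITS),
                       ("NF4 + double quant", NF4_DQ_BITS)]:
        print(f"{name}: {bits:.3f} bits per parameter")
```

Under these assumptions the two 4-bit formats have a similar footprint for the expert weights, while the parameters kept in BF16 cost 16 bits each. Actual checkpoint size also depends on details this estimate ignores, such as padding and metadata.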