This repository is an experimental re-quantized version of the original model openai/gpt-oss-20b. It requires development versions of transformers and bitsandbytes.
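A minimal loading sketch follows; it assumes development builds of transformers and bitsandbytes (plus accelerate for device_map) are installed, and that the NF4 quantization settings saved in this checkpoint are picked up automatically at load time.

```python
# Minimal loading sketch (assumes dev builds of transformers and bitsandbytes,
# e.g. installed from their GitHub main branches, and accelerate for device_map).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mdouglas/gpt-oss-20b-bnb-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The quantization config is stored with the checkpoint, so no extra
# quantization arguments should be needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```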
Quantization
The MLP expert parameters have been dequantized from MXFP4 to BF16 and then re-quantized to NF4 with double quantization, using an experimental bnb_4bit_target_parameters configuration option. The self-attention, routing, and embedding parameters are kept in BF16.
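A rough sketch of how such a configuration might be constructed is shown below. The bnb_4bit_target_parameters option is experimental, so the value passed to it here is a hypothetical pattern, and whether passing the config directly over the original MXFP4 checkpoint performs the dequantize-to-BF16 step on load is an assumption rather than a documented guarantee.

```python
# Rough sketch of a re-quantization setup (not the exact recipe used here).
# Assumptions: the experimental bnb_4bit_target_parameters option accepts a
# list of parameter-name patterns, and loading the MXFP4 base checkpoint with
# a BitsAndBytesConfig dequantizes to BF16 before applying NF4.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type
    bnb_4bit_use_double_quant=True,         # double-quantize the scales
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Experimental option described above; the pattern below is hypothetical.
    bnb_4bit_target_parameters=["mlp.experts.*"],
)

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,   # attention, routing, embeddings stay in BF16
    device_map="auto",
)
```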
Model: mdouglas/gpt-oss-20b-bnb-nf4
Base model: openai/gpt-oss-20b