This repository is an experimental re-quantized version of the original model openai/gpt-oss-20b. It requires development versions of transformers and bitsandbytes.
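A minimal loading sketch follows; it assumes development builds of transformers and bitsandbytes (plus accelerate for device_map) are installed, and that the NF4 quantization settings saved in this checkpoint are picked up automatically at load time.

```python
# Minimal loading sketch (assumes dev builds of transformers and bitsandbytes,
# e.g. installed from their GitHub main branches, and accelerate for device_map).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mdouglas/gpt-oss-20b-bnb-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# The quantization config is stored with the checkpoint, so no extra
# quantization arguments should be needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```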
Quantization
The MLP expert parameters have been dequantized from MXFP4 to BF16 and then re-quantized to NF4 with double quantization, using an experimental bnb_4bit_target_parameters configuration option. The self-attention, routing, and embedding parameters are kept in BF16.
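A rough sketch of how such a configuration might be constructed is shown below. The bnb_4bit_target_parameters option is experimental, so the value passed to it here is a hypothetical pattern, and whether passing the config directly over the original MXFP4 checkpoint performs the dequantize-to-BF16 step on load is an assumption rather than a documented guarantee.

```python
# Rough sketch of a re-quantization setup (not the exact recipe used here).
# Assumptions: the experimental bnb_4bit_target_parameters option accepts a
# list of parameter-name patterns, and loading the MXFP4 base checkpoint with
# a BitsAndBytesConfig dequantizes to BF16 before applying NF4.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type
    bnb_4bit_use_double_quant=True,         # double-quantize the scales
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Experimental option described above; the pattern below is hypothetical.
    bnb_4bit_target_parameters=["mlp.experts.*"],
)

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,   # attention, routing, embeddings stay in BF16
    device_map="auto",
)
```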
Model: mdouglas/gpt-oss-20b-bnb-nf4
Base model: openai/gpt-oss-20b