Huihui-gpt-oss-20b-BF16-abliterated - W8A16 Quantized Version

This is the W8A16 quantized version of the huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated model, produced with LLM Compressor using its MoE-specific quantization approach.

Model Details

  • Base model: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
  • Parameters: ~20B
  • Quantization scheme: W8A16 (8-bit weights, 16-bit activations), applied with LLM Compressor
  • Checkpoint format: compressed-tensors

Usage

This quantized model can be used with vLLM and other inference frameworks that support the compressed-tensors format.

# Example usage with vLLM (if supported)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
sampling_params = SamplingParams(max_tokens=64)
outputs = llm.generate("My name is", sampling_params)
print(outputs[0].outputs[0].text)
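
The checkpoint can also be queried over vLLM's OpenAI-compatible server using the standard openai client. The snippet below is a sketch that assumes a server has already been started for this model (for example with vllm serve) and is listening locally; the host, port, and API key are placeholder assumptions.

# Sketch: querying a locally running vLLM OpenAI-compatible server.
# Assumes the server is already serving this model on localhost:8000;
# host, port, and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16",
    prompt="My name is",
    max_tokens=64,
)
print(response.choices[0].text)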

Quantization Process

The model was quantized using the MoE-specific approach in LLM Compressor, which keeps the sensitive gate layers in full precision while quantizing the rest of the network to W8A16.
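
The exact recipe used for this checkpoint is not published here; the sketch below only illustrates what an MoE-aware W8A16 recipe typically looks like with LLM Compressor's oneshot API. The import paths, the scheme name, and the ignore pattern for the gate layers are assumptions, not the recipe actually used.

# Hypothetical LLM Compressor recipe sketch (not the published recipe):
# quantize linear layers to W8A16 while keeping gate layers and the LM head
# in full precision. Import paths and ignore patterns are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head", "re:.*gate"],  # illustrative pattern for sensitive layers
)

oneshot(
    model="huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated",
    recipe=recipe,
    output_dir="Huihui-gpt-oss-20b-BF16-abliterated-W8A16",
)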

Benefits

  • Reduced model size compared to the BF16 version (rough estimate below this list)
  • Maintains good performance despite quantization
  • Compatible with vLLM for efficient inference
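
As a rough estimate of the size reduction: at BF16 the ~20B parameters take about 2 bytes each, or roughly 40 GB of weights, while 8-bit weights need about 1 byte each, or roughly 20 GB, so W8A16 approximately halves the weight storage (the gate layers kept in full precision add a small overhead).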

License

This model is licensed under the Apache 2.0 license, the same license as the original model.
