Huihui-gpt-oss-20b-BF16-abliterated - W8A16 Quantized Version

This is the W8A16 quantized version of the huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated model, produced with LLM Compressor using its MoE-specific quantization approach.

Model Details

  • Base model: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated
  • Parameters: ~20B
  • Quantization scheme: W8A16 (8-bit weights, 16-bit activations), applied with LLM Compressor
  • Checkpoint format: compressed-tensors

Usage

This quantized model can be used with vLLM and other inference frameworks that support the compressed-tensors format.

# Example usage with vLLM (if supported)
from vllm import LLM, SamplingParams

llm = LLM("groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16")
sampling_params = SamplingParams(max_tokens=64)
outputs = llm.generate("My name is", sampling_params)
print(outputs[0].outputs[0].text)
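
The checkpoint can also be queried over vLLM's OpenAI-compatible server using the standard openai client. The snippet below is a sketch that assumes a server has already been started for this model (for example with vllm serve) and is listening locally; the host, port, and API key are placeholder assumptions.

# Sketch: querying a locally running vLLM OpenAI-compatible server.
# Assumes the server is already serving this model on localhost:8000;
# host, port, and API key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="groxaxo/Huihui-gpt-oss-20b-BF16-abliterated-W8A16",
    prompt="My name is",
    max_tokens=64,
)
print(response.choices[0].text)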

Quantization Process

The model was quantized using the MoE-specific approach in LLM Compressor, which keeps the sensitive gate layers in full precision while quantizing the rest of the network to W8A16.
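
The exact recipe used for this checkpoint is not published here; the sketch below only illustrates what an MoE-aware W8A16 recipe typically looks like with LLM Compressor's oneshot API. The import paths, the scheme name, and the ignore pattern for the gate layers are assumptions, not the recipe actually used.

# Hypothetical LLM Compressor recipe sketch (not the published recipe):
# quantize linear layers to W8A16 while keeping gate layers and the LM head
# in full precision. Import paths and ignore patterns are assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="W8A16",
    ignore=["lm_head", "re:.*gate"],  # illustrative pattern for sensitive layers
)

oneshot(
    model="huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated",
    recipe=recipe,
    output_dir="Huihui-gpt-oss-20b-BF16-abliterated-W8A16",
)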

Benefits

  • Reduced model size compared to the BF16 version (rough estimate below this list)
  • Maintains good performance despite quantization
  • Compatible with vLLM for efficient inference
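
As a rough estimate of the size reduction: at BF16 the ~20B parameters take about 2 bytes each, or roughly 40 GB of weights, while 8-bit weights need about 1 byte each, or roughly 20 GB, so W8A16 approximately halves the weight storage (the gate layers kept in full precision add a small overhead).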

License

This model is licensed under the Apache 2.0 license, the same license as the original model.
