L3.3-70B-Animus-V12.0 — Quantized (compressed-tensors for vLLM)
This repository hosts quantized runtime builds of
Darkhn/Magistral-2509-24B-Animus-V12.0, repackaged for vLLM using the compressed-tensors format.
TL;DR
• Quantized package intended for vLLM with `--quantization compressed-tensors`.
• Two published branches: W4A16 (INT4/A16) and W8A16 (INT8/A16).
• Weight-only quantization; see “Quantization recipe” below.
Revisions & Branches
The `main` branch is a landing page (model card + links). All runnable artifacts live under per-revision branches.
- main — placeholder / landing page
- W4A16 — 4-bit weights / 16-bit activations builds and runtime assets
- W8A16 — 8-bit weights / 16-bit activations builds
Quick links
- main: https://huggingface.co/TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors/tree/main
- W4A16: https://huggingface.co/TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors/tree/W4A16
- W8A16: https://huggingface.co/TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors/tree/W8A16
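To pull a specific build locally, fetch the branch by revision with `huggingface_hub`. A minimal sketch (the revision names match the branches above; the local path is just your cache location):

```python
from huggingface_hub import snapshot_download

# Download only the chosen branch; use revision="W8A16" for the INT8 build.
local_dir = snapshot_download(
    repo_id="TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors",
    revision="W4A16",
)
print(local_dir)  # path to the cached snapshot
```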
What’s inside (per revision)
- Sharded quantized weights (*.safetensors) + index (model.safetensors.index.json)
- `config.json` with compressed-tensors metadata (`weight_format`, `quantization`, `quantization_config`, etc.)
- Tokenizer artifacts (`tokenizer.json`, `tokenizer.model`, merges/vocab if applicable)
- Optional: `chat_template.jinja` (inherits the parent finetune’s chat style)
Exact files vary by branch; see the Files and versions tab.
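Since the quantization metadata lives in `config.json`, you can check what a given branch ships without downloading the weight shards. A small sketch using `huggingface_hub` (the `quantization_config` key follows the compressed-tensors convention noted above):

```python
import json
from huggingface_hub import hf_hub_download

# Grab only config.json from the W4A16 branch and print its quantization metadata.
cfg_path = hf_hub_download(
    repo_id="TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors",
    filename="config.json",
    revision="W4A16",
)
with open(cfg_path) as f:
    cfg = json.load(f)
print(json.dumps(cfg.get("quantization_config", {}), indent=2))
```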
Quantization recipe (reference)
Shared choices
- Format: compressed-tensors (vLLM runtime), weight-only quantization
- Group size: 128 (typical for W4A16)
- Ignored layers: `lm_head` kept in higher precision
- Calibration: prose-heavy text (WikiText-style), seq_len ≈ 512–1024, tokenized with the parent chat template
- Export: `save_compressed=True` to embed compressed-tensors metadata for vLLM
Branch specifics
- W4A16 — INT4 weights / 16-bit activations (BF16/FP16 at runtime)
- W8A16 — INT8 weights / 16-bit activations (higher memory, extra stability on some workloads)
Refer to each branch’s `config.json` for the exact parameters used; a hedged sketch of a comparable quantization script follows.
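The exact export pipeline isn’t published here; one common way to produce compressed-tensors weight-only checkpoints with these settings is a GPTQ pass via `llmcompressor` (recent release assumed). The sketch below is illustrative only for the W4A16 branch — the calibration dataset, sample count, and output path are assumptions, not the recipe actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "Darkhn/Magistral-2509-24B-Animus-V12.0"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Weight-only INT4 (the W4A16 scheme defaults to group size 128); lm_head stays in higher precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",      # stand-in for the prose-heavy calibration text described above
    recipe=recipe,
    max_seq_length=1024,
    num_calibration_samples=512,  # assumption; not documented in this card
)

# save_compressed=True embeds the compressed-tensors metadata vLLM expects.
model.save_pretrained("Animus-V12.0-W4A16", save_compressed=True)
tokenizer.save_pretrained("Animus-V12.0-W4A16")
```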
Quickstart — vLLM (compressed-tensors)
Install vLLM (recent version recommended):
```bash
pip install vllm
```
Serve (adjust to your hardware):
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
vllm serve TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors \
  --quantization compressed-tensors \
  --revision W4A16 \
  --tensor-parallel-size 4
```
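Once the server is up, it exposes vLLM’s OpenAI-compatible API (default port 8000). A minimal request against the defaults above:

```python
import requests

# Chat completion against the locally served model (assumes default host/port).
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```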
Model tree for TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors
- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
- Finetuned: mistralai/Magistral-Small-2509
- Finetuned: Darkhn/Magistral-2509-24B-Animus-V12.0
- Quantized (this repository): TheHouseOfTheDude/L3.3-70B-Animus-V12.0_Compressed-Tensors