
Gemma-4-12B-it-heretic-GGUF


🧠 Uncensored · Encoder-Free Multimodal · Vision + Audio

An uncensored variant of Google's Gemma 4 12B IT, produced with Heretic ARA+LoRA abliteration. Its encoder-free multimodal architecture natively accepts text, image, audio, and video inputs.

Base model: google/gemma-4-12B-it · Parameters: 11.95B · Layers: 48 · Context: 256K · Vocab: 262K

📢 Update (June 7, 2026)

Context length fix: all quantized files have been regenerated with correct 256K context metadata. Files downloaded before this update are capped at 128K context; re-download them if you need the full 256K.
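To check whether a local file carries the corrected metadata, the context length can be read from the GGUF header. The following is a minimal sketch of a header parser that uses only the Python standard library. It assumes GGUF version 2 or later (64-bit counts) and that the context length is stored under the conventional `<architecture>.context_length` key. The helper names `read_gguf_metadata` and `context_length` are illustrative and not part of any library.

```python
import struct
from typing import Any, BinaryIO, Dict

# GGUF scalar value types mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Parse the key/value metadata section at the start of a GGUF file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _version = _read(f, "<I")       # assumed to be 2 or later
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    meta: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta


def context_length(meta: Dict[str, Any]) -> int:
    """Look up the context length using the architecture-prefixed key (assumed convention)."""
    arch = meta["general.architecture"]
    return meta[f"{arch}.context_length"]

# Usage (path is an example):
#   with open("Gemma-4-12B-it-heretic-Q4_K_M.gguf", "rb") as f:
#       print(context_length(read_gguf_metadata(f)))
```

If 256K means 256 × 1024 tokens, a regenerated file should report 262144, and a file from before the fix should report a smaller value. Parsing walks the whole metadata section, including the tokenizer arrays, so it can take a few seconds on a real file.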

⚙️ Heretic ARA+LoRA Abliteration

Method: Arbitrary-Rank Ablation with LoRA (ARA+LoRA) — Heretic v2

Trial: 177 / 200 (optimized)

| Parameter | Value |
|---|---|
| Layer Range | 8 – 47 (40 of 48 layers) |
| preserve_good_behavior_weight | 0.3894 |
| steer_bad_behavior_weight | 0.0002 |
| overcorrect_relative_weight | 0.6769 |
| neighbor_count | 12 |
| KL Divergence | 0.0550 (low = minimal behavior drift) |
| Refusals | 15 / 100 (85% reduction from baseline) |
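The KL divergence figure measures how far the abliterated model's output distribution moves away from the original's. As a rough illustration of the quantity itself, the sketch below computes KL(P‖Q) between two next-token distributions derived from logits. How Heretic aggregates this over prompts and token positions is not stated here, so the direction (original model as P) and the function names are assumptions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for the same prompt from the original and modified models.
original = softmax([2.0, 1.0, 0.1])
modified = softmax([1.9, 1.1, 0.1])
print(f"KL(original || modified) = {kl_divergence(original, modified):.4f}")
```

Identical distributions give exactly 0, and the value grows as the modified model's predictions diverge, which is why a small figure is read as minimal behavior drift.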

📦 GGUF Quantization Files

| File | Type | BPW (bits per weight) | Size |
|---|---|---|---|
| Gemma-4-12B-it-heretic-Q4_K_M.gguf | Q4_K_M | 4.95 | 6.9 GB |
| Gemma-4-12B-it-heretic-Q6_K.gguf | Q6_K | 6.50 | 9.2 GB |
| Gemma-4-12B-it-heretic-Q8_0.gguf | Q8_0 | 8.50 | 12 GB |
| Gemma-4-12B-it-heretic-f16.gguf | F16 | 16.00 | 23 GB |

Quantized with llama.cpp b9535 (CUDA 13.3). Recommended: Q4_K_M for daily use, Q8_0 for quality-critical tasks.
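The BPW column relates directly to file size: parameters × bits per weight ÷ 8 gives an approximate byte count. The sketch below applies that arithmetic to the 11.95B parameter count and the BPW values listed above. It ignores metadata and any tensors stored at a different precision, so treat the results as estimates; the function name is illustrative.

```python
GIB = 1024 ** 3          # bytes per GiB
N_PARAMS = 11.95e9       # parameter count from the model card

# Bits per weight for each quantization, as listed in the table.
QUANTS = {"Q4_K_M": 4.95, "Q6_K": 6.50, "Q8_0": 8.50, "F16": 16.00}


def estimate_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GiB from parameter count and average bits per weight."""
    return n_params * bits_per_weight / 8 / GIB


for name, bpw in QUANTS.items():
    print(f"{name}: ~{estimate_size_gib(N_PARAMS, bpw):.1f} GiB")
```

The estimates come out slightly below the listed sizes and within a few percent of them, provided the listed figures are read as GiB (an assumption). The same function gives a quick lower bound on the memory needed to load the weights, before the KV cache is counted.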

🧠 Architecture

Gemma 4 12B uses a unified, encoder-free architecture — no separate vision or audio encoders. All modalities (text, image, audio, video) are processed directly by the decoder-only transformer via linear projection layers.

| Feature | Value |
|---|---|
| Architecture | gemma4 (Dense, decoder-only) |
| Parameters | 11.95B |
| Layers | 48 |
| Hidden Size | 3840 |
| Attention Heads | 16 (GQA with 8 KV heads) |
| Context Length | 256K tokens |
| Sliding Window | 1024 (hybrid local + global attention) |
| Vocabulary | 262,144 tokens |
| Modalities | Text, Image, Audio, Video |
| License | Apache 2.0 |
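Two rows of the table describe attention behavior that a short sketch can make concrete: grouped-query attention (16 query heads sharing 8 KV heads) and the 1024-token sliding window used by local layers. The code below is illustrative only. It assumes consecutive query heads share a KV head and that a window of 1024 includes the current token; which layers are local and which are global is not stated here, so that is left to the caller.

```python
from typing import Optional

N_HEADS = 16      # query heads, from the table
N_KV_HEADS = 8    # key/value heads, from the table
WINDOW = 1024     # sliding window size, from the table


def kv_head_for(query_head: int, n_heads: int = N_HEADS, n_kv_heads: int = N_KV_HEADS) -> int:
    """Map a query head to the KV head it shares under grouped-query attention."""
    group_size = n_heads // n_kv_heads   # 2 query heads per KV head here
    return query_head // group_size


def can_attend(q_pos: int, k_pos: int, window: Optional[int] = None) -> bool:
    """Causal attention check; pass a window for local layers, None for global layers."""
    if k_pos > q_pos:
        return False                     # never attend to future tokens
    if window is None:
        return True                      # global layer: full causal attention
    return q_pos - k_pos < window        # local layer: only the most recent tokens


print([kv_head_for(h) for h in range(N_HEADS)])
print(can_attend(2000, 977, WINDOW), can_attend(2000, 976, WINDOW), can_attend(2000, 0))
```

Sharing KV heads halves the KV cache relative to 16 independent heads, and the sliding window bounds the cache for local layers regardless of context length, which is what makes very long contexts practical.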

📊 Original Model Benchmarks (google/gemma-4-12B-it)

These are reference scores for the original model, as reported by Google DeepMind. Heretic abliteration may slightly affect benchmark performance.

| Benchmark | Gemma 4 12B | Gemma 3 27B |
|---|---|---|
| MMLU Pro | 77.2% | 67.6% |
| AIME 2026 (no tools) | 77.5% | 20.8% |
| LiveCodeBench v6 | 72.0% | 29.1% |
| Codeforces ELO | 1659 | 110 |
| GPQA Diamond | 78.8% | 42.4% |
| MMMU Pro (Vision) | 69.1% | 49.7% |
| MATH-Vision | 79.7% | 46.0% |

🚀 Recommended Sampling

| Parameter | Value |
|---|---|
| temperature | 1.0 |
| top_p | 0.95 |
| top_k | 64 |
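For readers unfamiliar with these three parameters, the sketch below shows one common way they combine: temperature scales the logits, top_k keeps the most likely tokens, and top_p then keeps the smallest prefix of those whose cumulative probability reaches the threshold. The order of operations differs between inference engines, so this ordering is an assumption, and the function names are illustrative.

```python
import math
import random
from typing import Dict, List


def filter_probs(logits: List[float], temperature: float = 1.0,
                 top_p: float = 0.95, top_k: int = 64) -> Dict[int, float]:
    """Return a renormalized {token_id: probability} map after temperature, top_k, and top_p."""
    # Temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top_k: keep the k most likely tokens, then renormalize over them.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample(logits: List[float], rng=random, **params) -> int:
    """Draw one token id from the filtered distribution."""
    dist = filter_probs(logits, **params)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Toy 4-token vocabulary with the recommended defaults.
toy_logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(filter_probs(toy_logits))
print(sample(toy_logits))
```

With a temperature of 1.0 the logits are left unscaled, so the recommended settings rely on top_k and top_p to trim the unlikely tail of the 262,144-token vocabulary.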

📝 Notes

• This is an abliterated (uncensored) model. It may produce content that the original model would refuse. Use responsibly.

• Original model: google/gemma-4-12B-it (Apache 2.0)

