Instructions to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF", dtype="auto")

llama-cpp-python

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF",
	filename="Gemma-4-Harmonia-31B-uncensored-heretic-BF16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Use Docker

docker model run hf.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Ollama:
```
ollama run hf.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
```

Unsloth Studio

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF to start chatting

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Docker Model Runner:
```
docker model run hf.co/llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M
```

Lemonade

How to use llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Gemma-4-Harmonia-31B-uncensored-heretic-GGUF-Q4_K_M

List all available models

lemonade list

🚨⚠️ I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT ⚠️🚨

I can no longer upload new models unless I can cover the cost of additional storage.
I host 70+ free models as an independent contributor and this work is unpaid.
Without your support, no more new models can be uploaded.

🎉 Patreon (Monthly) | ☕ Ko-fi (One-time)

Every contribution goes directly toward Hugging Face storage fees to keep models free for everyone.

91% fewer refusals (9/100 Uncensored vs 97/100 Original) while preserving model quality (0.0047 KL divergence).

❤️ Support My Work

Creating these models takes significant time, work and compute. If you find them useful consider supporting me:

Platform	Link	What you get
🎉 Patreon	Monthly support	Priority model requests
☕ Ko-fi	One-time tip	My eternal gratitude

Your help will motivate me and would go into further improving my workflow and coverings fees for storage, compute and may even help uncensoring bigger model with rental Cloud GPUs.

GGUF quantizations of llmfan46/Gemma-4-Harmonia-31B-it-uncensored-heretic.

This is a decensored version of virtuous7373/Gemma-4-Harmonia-31B, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method

Abliteration parameters

Parameter	Value
start_layer_index	14
end_layer_index	55
preserve_good_behavior_weight	0.7754
steer_bad_behavior_weight	0.0001
overcorrect_relative_weight	0.9765
neighbor_count	14

Targeted components

attn.o_proj

Performance

Metric	This model	Original model (Gemma-4-Harmonia-31B)
KL divergence	0.0047	0 (by definition)
Refusals	✅ 9/100	❌ 97/100

Lower refusals indicate fewer content restrictions, while lower KL divergence indicates more closeness to the original model's baseline. Higher refusals cause more rejections, objections, pushbacks, lecturing, censorship, softening and deflections.

MMLU test results:

Original:

============================================================

Total questions: 7021
Correct: 6014
Accuracy: 0.8566 (85.66%)
Parse failures: 22

============================================================

Tested subject scores:

professional_law: 0.7592 (596/785)
moral_scenarios: 0.8394 (371/442)
miscellaneous: 0.9243 (354/383)
professional_psychology: 0.8797 (278/316)
high_school_psychology: 0.9593 (259/270)
high_school_macroeconomics: 0.9137 (180/197)
elementary_mathematics: 0.9239 (170/184)
moral_disputes: 0.8678 (151/174)
prehistory: 0.9128 (157/172)
philosophy: 0.8553 (136/159)
high_school_biology: 0.9605 (146/152)
professional_accounting: 0.7902 (113/143)
clinical_knowledge: 0.8929 (125/140)
high_school_microeconomics: 0.9632 (131/136)
nutrition: 0.8815 (119/135)
professional_medicine: 0.9104 (122/134)
conceptual_physics: 0.9062 (116/128)
high_school_mathematics: 0.5669 (72/127)
human_aging: 0.8448 (98/116)
security_studies: 0.8571 (96/112)
high_school_statistics: 0.8649 (96/111)
marketing: 0.9725 (106/109)
high_school_world_history: 0.9528 (101/106)
sociology: 0.9223 (95/103)
high_school_government_and_politics: 0.9406 (95/101)
high_school_geography: 0.9596 (95/99)
high_school_chemistry: 0.7835 (76/97)
high_school_us_history: 0.9053 (86/95)
virology: 0.5056 (45/89)
college_medicine: 0.8636 (76/88)
world_religions: 0.9205 (81/88)
high_school_physics: 0.7619 (64/84)
electrical_engineering: 0.8395 (68/81)
astronomy: 0.9241 (73/79)
logical_fallacies: 0.8816 (67/76)
high_school_european_history: 0.8904 (65/73)
anatomy: 0.8732 (62/71)
college_biology: 0.9844 (63/64)
human_sexuality: 0.8750 (56/64)
formal_logic: 0.7031 (45/64)
public_relations: 0.7213 (44/61)
international_law: 0.8667 (52/60)
college_physics: 0.7193 (41/57)
college_mathematics: 0.7818 (43/55)
econometrics: 0.7407 (40/54)
jurisprudence: 0.8302 (44/53)
high_school_computer_science: 0.9808 (51/52)
machine_learning: 0.8462 (44/52)
medical_genetics: 0.9020 (46/51)
global_facts: 0.5686 (29/51)
management: 0.8800 (44/50)
us_foreign_policy: 0.9800 (49/50)
college_chemistry: 0.6170 (29/47)
abstract_algebra: 0.7447 (35/47)
business_ethics: 0.8478 (39/46)
college_computer_science: 0.9333 (42/45)
computer_security: 0.8605 (37/43)

Heretic:

============================================================

Total questions: 7021
Correct: 5936
Accuracy: 0.8455 (84.55%)
Parse failures: 17

============================================================

Tested subject scores:

professional_law: 0.7121 (559/785)
moral_scenarios: 0.8281 (366/442)
miscellaneous: 0.9191 (352/383)
professional_psychology: 0.8703 (275/316)
high_school_psychology: 0.9593 (259/270)
high_school_macroeconomics: 0.9188 (181/197)
elementary_mathematics: 0.9348 (172/184)
moral_disputes: 0.8448 (147/174)
prehistory: 0.9128 (157/172)
philosophy: 0.8113 (129/159)
high_school_biology: 0.9605 (146/152)
professional_accounting: 0.7902 (113/143)
clinical_knowledge: 0.8786 (123/140)
high_school_microeconomics: 0.9559 (130/136)
nutrition: 0.8815 (119/135)
professional_medicine: 0.9030 (121/134)
conceptual_physics: 0.8828 (113/128)
high_school_mathematics: 0.5433 (69/127)
human_aging: 0.8448 (98/116)
security_studies: 0.8571 (96/112)
high_school_statistics: 0.8559 (95/111)
marketing: 0.9817 (107/109)
high_school_world_history: 0.9528 (101/106)
sociology: 0.9223 (95/103)
high_school_government_and_politics: 0.9406 (95/101)
high_school_geography: 0.9596 (95/99)
high_school_chemistry: 0.7835 (76/97)
high_school_us_history: 0.8947 (85/95)
virology: 0.5056 (45/89)
college_medicine: 0.8295 (73/88)
world_religions: 0.9205 (81/88)
high_school_physics: 0.7619 (64/84)
electrical_engineering: 0.8148 (66/81)
astronomy: 0.9367 (74/79)
logical_fallacies: 0.8947 (68/76)
high_school_european_history: 0.8630 (63/73)
anatomy: 0.8873 (63/71)
college_biology: 0.9844 (63/64)
human_sexuality: 0.8750 (56/64)
formal_logic: 0.7031 (45/64)
public_relations: 0.6885 (42/61)
international_law: 0.8667 (52/60)
college_physics: 0.7193 (41/57)
college_mathematics: 0.7455 (41/55)
econometrics: 0.7407 (40/54)
jurisprudence: 0.8113 (43/53)
high_school_computer_science: 0.9808 (51/52)
machine_learning: 0.8077 (42/52)
medical_genetics: 0.9020 (46/51)
global_facts: 0.5686 (29/51)
management: 0.8800 (44/50)
us_foreign_policy: 0.9600 (48/50)
college_chemistry: 0.6383 (30/47)
abstract_algebra: 0.7447 (35/47)
business_ethics: 0.8478 (39/46)
college_computer_science: 0.9333 (42/45)
computer_security: 0.8372 (36/43)

MMLU - Massive Multitask Language Understanding, multiple-choice questions across 57 subjects (math, history, law, medicine, etc.).

Quantizations

For the K-quants below, selected Gemma 4 attention and FFN tensors are kept at higher precision where useful.

These GGUFs preserve key Gemma 4 attention projection tensors at higher precision.

Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S Q3_K_LandQ3_K_Mkeep the main attention projection tensors asQ8_0`:
- attn_q
- attn_k
- attn_v
- attn_output

This helps preserve Gemma 4’s attention path at higher precision, especially for lower-bit quants, while avoiding large file-size increases from unnecessarily up-quantizing the largest MoE expert tensors.

Filename	Quant	Description
Gemma-4-Harmonia-31B-uncensored-heretic-BF16.gguf	BF16	Full precision
Gemma-4-Harmonia-31B-uncensored-heretic-Q8_0.gguf	Q8_0	Near-lossless, recommended
Gemma-4-Harmonia-31B-uncensored-heretic-Q6_K.gguf	Q6_K	Excellent quality
Gemma-4-Harmonia-31B-uncensored-heretic-Q5_K_M.gguf	Q5_K_M	Good balance
Gemma-4-Harmonia-31B-uncensored-heretic-Q5_K_S.gguf	Q5_K_S	Smaller Q5
Gemma-4-Harmonia-31B-uncensored-heretic-Q4_K_M.gguf	Q4_K_M	Good for limited VRAM
Gemma-4-Harmonia-31B-uncensored-heretic-Q4_K_S.gguf	Q4_K_S	Smaller Q4
Gemma-4-Harmonia-31B-uncensored-heretic-Q3_K_L.gguf	Q3_K_L	Low VRAM, decent quality
Gemma-4-Harmonia-31B-uncensored-heretic-Q3_K_M.gguf	Q3_K_M	Low VRAM, smaller

Vision Projector

Filename	Quant	Description
Gemma-4-Harmonia-31B-uncensored-heretic-mmproj-BF16.gguf	BF16	Native precision

A Vision Projector File is Required for vision/multimodal capabilities. Use alongside any quantization above.

Usage

Works with llama.cpp, LM Studio, Ollama, and other GGUF-compatible tools.

HARMONIA

The Greek goddess of harmony and concord.

Gemini Word Salad Initialization

Harmonious Synthesis

Harmonia is a high-dimensional 31-billion parameter merge of Gemma 4. By executing a meticulous three-phase fusion of seven elite foundation and specialized models, Harmonia demonstrates a targeted approach to deep neural consolidation, minimizing regression while amplifying unique capability boundaries.

Instead of simple linear blending, which often degrades logical coherence and dilutes nuanced behavior, Harmonia was sculpted using a combination of mathematical projections, covariance activation matching, and surgical synaptic pruning. The model appears pretty solid so far.

Multi-Stage Fusion Protocol

The lineage of Harmonia is constructed systematically, passing through three isolated mathematical states to layer capabilities cleanly.

Phase I

Nullspace Coherence Mapping

To anchor base capabilities, the primary Gemma-4-31B-Base is combined with the analytically rigorous GarnetV2-31B. Utilizing low-rank Singular Value Decomposition (SVD), the specialized donor features are projected entirely onto the mathematical null-space of the base weights. This prevents the creative delta vectors from distorting essential core intelligence, producing the stable platform clever-basename.

> Method: Null-Space Filtering

> Core Integrity Protection (Base Protect): Active (True)

> Targeted Active Rank Limit: 256

Phase II

Surgical Synaptic Gating

Next, our newly anchored base is layered with the highly independent cognitive engines MeroMero-31B and Gembrain-31B. We apply Context-Aware Binary Selection (CABS) to execute structured, localized parameter gating. By enforcing precise structural pruning ratios (retaining optimal synapses in 16:32 and 11:33 ratios), we weave complex creative reasoning directly into the core matrix without causing neural interference. The result is the highly expressive clever-intname.

> Method: Context-Aware Binary Selection (CABS)

> Structural Masking Ratio (MeroMero): 16 : 32 (Weight: 0.6)

> Structural Masking Ratio (Gembrain): 11 : 33 (Weight: 0.4)

> Default Sparse Gating Step: 8 : 32

Phase III

Covariance Activation Matching

In the final harmonization phase, the expressive clever-intname is combined with the narrative mastery of Equinox-31B, the creative depth of Fabled-Gemma4, and our primary conversational core Ortenzya-The-Creative-Wordsmith. Using data-free covariance estimation via task vectors, ACTMat reconstructs layer-wise input activation properties, solving for optimal projection weights in activation space. This resolves semantic alignment anomalies and delivers the unified output model.

> Method: ACTMat Activation Matching

> Task Vector Blending Covariance Limit: 16,384

> Epsilon Solver Regularizer: 1e-06

> Output Precision Profile: bfloat16

Methodological Innovations

Nullspace Projection

Instead of destroying structural logic via linear interpolation, this method extracts the base model's essential singular values. It projects specialized donor features orthogonally, preventing core capability degradation.

Context-Aware Binary Selection

A dynamic, high-fidelity neural filter. Applying structured magnitude masking at customizable N:M fractions removes low-signal synaptic weights, seamlessly layering domain specialization into active logical paths.

Activation Covariance Matching

Using Gram matrices computed directly from task vectors, ACTMat aligns semantic representations in the activation space rather than the parameter space. It dynamically falls back to robust pseudo-inverse SVD solvers when numerical anomalies arise.

Model Lineage & Ingredients

We extend our gratitude to the creators of the ancestral paths that intersect within Harmonia:

Ortenzya-The-Creative-Wordsmith

llmfan46.

Equinox-31B

LatitudeGames.

Merge Blueprint

The entire orchestration sequence is structured via a multi-stage MergeKit pipeline. Expand the block below to view the structural YAML recipes.

Show MergeKit Configuration

name: clever-basename

merge_method: nullspace
base_model: ./gemma-4-31B-base

models:
  - model: ./Gemma4-GarnetV2-31B
    parameters:
      weight: 1.0

parameters:
  protect_base: true
  nr: 256

tokenizer:
  source: base
chat_template: auto

dtype: float32
out_dtype: bfloat16
---
name: clever-intname
merge_method: cabs

base_model: ./clever-basename

models:
  - model: ./clever-basename

  - model: ./G4-MeroMero-31B-uncensored-heretic
    parameters:
      weight: 0.6
      n_val: 16
      m_val: 32
  - model: ./Gemma-4-Gembrain-31B-heretic
    parameters:
      weight: 0.4
      n_val: 11
      m_val: 33

default_n_val: 8
default_m_val: 32

pruning_order:
  - ./G4-MeroMero-31B-uncensored-heretic
  - ./Gemma-4-Gembrain-31B-heretic

dtype: float32
out_dtype: bfloat16

tokenizer:
  source: union

chat_template: auto
---
name: Harmonia

merge_method: actmat

base_model: ./gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic

models:
  - model: ./gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic
  - model: ./LatitudeGames-Equinox-31B
    parameters:
      weight: 1
  - model: ./clever-intname
    parameters:
      weight: 1
  - model: ./Fabled-Gemma4-31B
    parameters:
      weight: 1

parameters:
  epsilon: 1e-6

tokenizer:
  source: "union"

dtype: bfloat16
out_dtype: bfloat16

chat_template: auto

Symphony Contributors

I am grateful to the following individuals for their models, inspiration, and other contributions.:

Lambent ConicCat llmfan46 Arcee AI zerofata p-e-w Naphula Nimbz Latitude Games Blazed-Forge

And of course, every wonderful person on:

LocalLLaMA

A big thanks to Gemini-3.5-flash for creating this README alongside the word salads found within it. A special acknowledgment is extended to Google DeepMind for their contribution of the Gemma-4 foundation family to the open-weight ecosystem, representing the structural cornerstone of this merge and its constituents.

Downloads last month: 10,761

GGUF

Model size

31B params

Architecture

gemma4

Hardware compatibility

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic-GGUF

Base model

virtuous7373/Gemma-4-Harmonia-31B

Finetuned

llmfan46/Gemma-4-Harmonia-31B-uncensored-heretic

Quantized

(4)

this model