Instructions to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP",
	filename="Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
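
The call above returns the completion instead of printing it. As a minimal sketch, assuming the OpenAI-style dictionary that `create_chat_completion` returns in non-streaming mode, the reply text can be extracted as shown here. `extract_reply` is a hypothetical helper name, and `sample_response` is a made-up stand-in so the snippet runs without loading the model.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Non-streaming responses carry the text under choices[0].message.content.
    return choices[0]["message"]["content"]


# Illustrative stand-in for the dict returned by llm.create_chat_completion(...).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

With a loaded model, the same helper would be applied to the value returned by `llm.create_chat_completion(...)`.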


llama.cpp

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Use Docker

docker model run hf.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M


vLLM

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Ollama
How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Ollama:
```
ollama run hf.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
```

Unsloth Studio

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP to start chatting

Pi

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Docker Model Runner:
```
docker model run hf.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M
```

Lemonade

How to use GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP-Q4_K_M

List all available models

lemonade list


Qwen3.6-35B-A3B-NSC-ACE-SABER GGUF MTP

This repository hosts the MTP-oriented GGUF builds of GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER for llama.cpp. The source checkpoint is the full safetensors model in that repository.

These files were rebuilt separately from the first GGUF release using a fresh MTP-aware conversion path. Use them when your runtime supports Qwen MTP (multi-token prediction) acceleration.

Image/video sidecars: This repository now includes the restored Qwen3.6 multimodal config, the processor/preprocessor files, the tokenizer and chat template, the safetensors index, and the model-vision-from-qwen3.6-base.safetensors visual tower sidecar. The existing .gguf binaries were not rewritten in this metadata-copy pass.

Current Status

Files are published only after the rebuilt F16 GGUF verifies as MTP/NextN-capable from its actual metadata/tensor layout. Qwen HF tensors named mtp.* are remapped by the MTP-aware llama.cpp converter into GGUF blk.*.nextn.* tensors plus the nextn_predict_layers metadata key. The upload worker refuses to publish quants until the verified F16 marker exists.
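
The verification described above can be sketched as a check over tensor names and metadata keys. This is an illustration under stated assumptions, not the uploader's actual gate: `looks_mtp_capable` is a hypothetical helper, the sample tensor names and the architecture prefix on the metadata key are made up, and the key is therefore matched by suffix. In practice the names could be read from a file with the `gguf` Python package's `GGUFReader`, which is not used here so that the snippet runs without a model file.

```python
from fnmatch import fnmatchcase


def looks_mtp_capable(tensor_names, metadata_keys) -> bool:
    """Heuristic: require NextN tensors AND the nextn_predict_layers key."""
    # The converter is described as emitting blk.*.nextn.* tensors.
    has_nextn_tensors = any(
        fnmatchcase(name, "blk.*.nextn.*") for name in tensor_names
    )
    # The full key usually carries an architecture prefix, so match the suffix.
    has_nextn_key = any(
        key.endswith("nextn_predict_layers") for key in metadata_keys
    )
    return has_nextn_tensors and has_nextn_key


# Illustrative inputs only; real names come from the GGUF file itself.
example_tensors = ["blk.0.attn_q.weight", "blk.40.nextn.eh_proj.weight"]
example_keys = ["general.architecture", "qwen35moe.nextn_predict_layers"]

print(looks_mtp_capable(example_tensors, example_keys))
```

Requiring both conditions mirrors the text: tensors alone or the metadata key alone would not count as verified.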

Release Snapshot

| Item | Value |
| --- | --- |
| Source checkpoint | `GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER` |
| Base model | `Qwen/Qwen3.6-35B-A3B` |
| Format | GGUF for llama.cpp-compatible runtimes |
| Conversion target | MTP-aware GGUF export from the llama.cpp MTP branch |
| Quantization range | F16, Q8_0, Q6_K, Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, Q3_K_L, Q3_K_M, Q3_K_S, Q2_K |
| Final source compliance | 98.33% on HarmBench-300 |
| Final source KLD | 0.025383937664711 |
| BFCL average plotted improvement | +2.87 percentage points |

Benchmark Plots

[Figure: 35B SABER release gate]

[Figure: 35B BFCL function calling]

BFCL Tool-Calling Check

The source safetensors checkpoint was compared against Qwen/Qwen3.6-35B-A3B on a 40-case BFCL subset: 20 simple and 20 multiple-function prompts. GGUF files inherit from that checkpoint, but individual quants should be rechecked if exact tool-call behavior matters.

| Metric | Base | NSC-ACE SABER source |
| --- | --- | --- |
| Tool-call rate | 92.50% | 95.00% |
| Function name accuracy | 92.50% | 95.00% |
| Required argument name accuracy | 90.00% | 93.12% |
| Required argument value accuracy | 79.79% | 83.54% |
| Exact required-call accuracy | 75.00% | 77.50% |
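
To put the table above in perspective on a 40-case subset, the short sketch below computes the per-metric difference in percentage points and converts a rate into a case count. The conversion assumes the metric is scored once per prompt over all 40 cases, which the card does not state explicitly; the helper names are hypothetical.

```python
SUBSET_SIZE = 40  # 20 simple + 20 multiple-function prompts

# (base %, NSC-ACE SABER source %) copied from the table above.
RESULTS = {
    "Tool-call rate": (92.50, 95.00),
    "Function name accuracy": (92.50, 95.00),
    "Required argument name accuracy": (90.00, 93.12),
    "Required argument value accuracy": (79.79, 83.54),
    "Exact required-call accuracy": (75.00, 77.50),
}


def deltas(results):
    """Difference in percentage points (tuned minus base) per metric."""
    return {name: round(tuned - base, 2) for name, (base, tuned) in results.items()}


def as_case_count(rate_percent, total=SUBSET_SIZE):
    """Number of cases a per-prompt rate corresponds to (assumption: per-prompt metric)."""
    return round(rate_percent / 100 * total)


for name, delta in deltas(RESULTS).items():
    print(f"{name}: {delta:+.2f} pp")

base_exact, tuned_exact = RESULTS["Exact required-call accuracy"]
print(as_case_count(base_exact), "->", as_case_count(tuned_exact), "of", SUBSET_SIZE)
```

Under that assumption, a 2.50-point change on this subset corresponds to a single prompt, which is one reason to recheck individual quants on a larger set when exact tool-call behavior matters.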

Available Files

| File | Status | Notes |
| --- | --- | --- |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-F16.gguf` | uploaded | Full GGUF conversion source / highest local fidelity |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q8_0.gguf` | uploaded | Near-full quality, large local file |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q6_K.gguf` | uploaded | High-quality local default if memory allows |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q5_K_M.gguf` | uploaded | Strong quality/size balance |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q5_K_S.gguf` | uploaded | Smaller Q5 option |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q4_K_M.gguf` | uploaded | Common balanced local target |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q4_K_S.gguf` | uploaded | Smaller Q4 option |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q3_K_L.gguf` | uploaded | Lower-memory Q3 option |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q3_K_M.gguf` | uploaded | Smaller Q3 balance |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q3_K_S.gguf` | uploaded | Small Q3 option |
| `Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q2_K.gguf` | uploaded | Minimum-size target; quality loss expected |

The uploader refreshes this card as each artifact finishes. Uploaded non-F16 files are deleted from the build pod after upload to stay under the pod volume quota.
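
The file names listed above follow one pattern, so a quant tag can be mapped to its file name programmatically. The sketch below does that and wraps `hf_hub_download` from the `huggingface_hub` package in a function for fetching a single file; `gguf_filename` and `download_quant` are hypothetical helper names, and the download function is defined but not called so the snippet runs offline.

```python
REPO_ID = "GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP"
QUANTS = (
    "F16", "Q8_0", "Q6_K", "Q5_K_M", "Q5_K_S", "Q4_K_M",
    "Q4_K_S", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q2_K",
)


def gguf_filename(quant: str) -> str:
    """Map a quant tag from the table to its file name in this repository."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-{quant}.gguf"


def download_quant(quant: str) -> str:
    """Download one GGUF file and return its local cache path."""
    # Imported lazily so the name helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))


print(gguf_filename("Q4_K_M"))
```

Calling `download_quant("Q4_K_M")` would fetch only that file rather than the whole repository.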

Which Quant Should I Use?

| Quant | Best fit |
| --- | --- |
| F16 | Maximum fidelity when disk/RAM are not a concern |
| Q8_0 | Very high fidelity local inference |
| Q6_K | Recommended high-quality local starting point |
| Q5_K_M | Strong balance for quality and size |
| Q4_K_M | Practical default for constrained machines |
| Q3_K_M / Q3_K_S | Low-memory experiments |
| Q2_K | Smallest target; use only when memory is the hard constraint |

For agentic/tool-calling workloads, prefer Q6_K, Q5_K_M, or Q4_K_M when possible. Very low quants can affect formatting, argument fidelity, and refusal calibration.
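
The guidance above amounts to "take the highest-fidelity quant that fits your memory". A minimal sketch of that rule follows. The preference order is taken from the table; everything else is an assumption: `pick_quant` is a hypothetical helper, the 4 GB headroom for context and runtime overhead is an arbitrary default, and `PLACEHOLDER_SIZES_GB` contains made-up numbers that are not the real file sizes. Substitute the sizes shown in the repository's file listing.

```python
# Highest fidelity first, following the table above.
PREFERENCE = ["F16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "Q3_K_S", "Q2_K"]

# PLACEHOLDER values for illustration only; NOT the actual file sizes.
PLACEHOLDER_SIZES_GB = {
    "F16": 70.0, "Q8_0": 38.0, "Q6_K": 30.0, "Q5_K_M": 26.0,
    "Q4_K_M": 22.0, "Q3_K_M": 17.0, "Q3_K_S": 16.0, "Q2_K": 13.0,
}


def pick_quant(available_gb, sizes_gb, headroom_gb=4.0):
    """Return the first quant in PREFERENCE whose file fits, or None."""
    # Reserve headroom for the KV cache and runtime buffers (assumed value).
    budget = available_gb - headroom_gb
    for quant in PREFERENCE:
        size = sizes_gb.get(quant)
        if size is not None and size <= budget:
            return quant
    return None


print(pick_quant(32.0, PLACEHOLDER_SIZES_GB))
```

For tool-calling workloads, a result below Q4_K_M is a signal to reduce context length or offload fewer layers before dropping to a lower quant.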

MTP Notes

These are separate MTP-oriented exports; do not assume the original GGUF repo exposes MTP behavior in runtimes that require MTP metadata/tensors.
MTP speedups depend on runtime support. Use a current llama.cpp build.
Quantized files quantize the body weights but keep the blk.*.nextn.* tensors at Q8_0 where supported, because draft-head quality affects speculative acceptance.
The source model's final release metrics are measured before quantization.
Quantized files should be re-evaluated if exact compliance/KLD behavior matters.
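
To illustrate why draft-head quality matters for speculative acceptance, the sketch below uses the standard speculative-decoding estimate of tokens produced per verification step. It assumes each drafted token is accepted independently with the same probability, which is a simplification and not a description of how llama.cpp implements MTP; the function name and the example acceptance rates are hypothetical.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_tokens: int) -> float:
    """Expected tokens per verification step: (1 - a^(k+1)) / (1 - a)."""
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be within [0, 1]")
    if draft_tokens < 0:
        raise ValueError("draft_tokens must be non-negative")
    if acceptance_rate == 1.0:
        # Every draft is accepted, plus the token from the verification pass.
        return float(draft_tokens + 1)
    return (1.0 - acceptance_rate ** (draft_tokens + 1)) / (1.0 - acceptance_rate)


# Compare two hypothetical acceptance rates for a single drafted token.
for rate in (0.6, 0.8):
    print(rate, round(expected_tokens_per_step(rate, draft_tokens=1), 3))
```

Under this model a lower acceptance rate directly reduces the tokens gained per step, which is consistent with keeping the draft tensors at higher precision.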

Running With llama.cpp

llama-cli \
  -m Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q5_K_M.gguf \
  -c 32768 \
  -ngl 999 \
  -p "Write a compact tool plan for indexing a Python repo."

For OpenAI-compatible local serving:

llama-server \
  -m Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-Q5_K_M.gguf \
  -c 32768 \
  -ngl 999 \
  --jinja
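
Once the server is running, it can be queried over its OpenAI-compatible API. The sketch below uses only the Python standard library and assumes the server listens on port 8080, as in the client configurations earlier on this page; adjust `BASE_URL` if you pass a different port. The helper names, the `"local"` model label, and the `max_tokens` value are illustrative assumptions. `chat` is defined but not called, so the snippet runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed port; change if you set another


def build_chat_request(prompt: str, model: str = "local") -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Write a compact tool plan for indexing a Python repo.")
print(request.full_url)
```

With the server up, `print(chat("..."))` sends the request and prints the model's answer.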

Related Repositories

Full safetensors checkpoint: GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER
Non-MTP GGUF release: GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF
Base model: Qwen/Qwen3.6-35B-A3B

GGUF

Model size: 36B params
Architecture: qwen35moe
