Qwen3-Next-80B-A3B-Thinking β€” MLX 6-bit (affine)

A 6-bit affine-quantized checkpoint of the base model Qwen/Qwen3-Next-80B-A3B-Thinking in Apple's MLX format, for local inference on Apple Silicon.

Key details

  • Format: MLX runtime, safetensors sharded weights
  • Quantization: affine int6, group_size=64 (see the sketch after this list)
  • Task: text generation / chat
  • Tokenizer: provided via tokenizer.json (BPE) with chat_template.jinja
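
For intuition, "affine int6, group_size=64" means each contiguous group of 64 weights is stored as 6-bit integer codes plus a per-group scale and offset; dequantization is code * scale + offset. Below is a minimal NumPy sketch of the idea, illustrative only: MLX packs the codes and group parameters in its own layout and dequantizes in fused kernels.

import numpy as np

def affine_quantize(w, bits=6, group_size=64):
    # Split the flat weight tensor into groups of `group_size` values.
    grouped = w.reshape(-1, group_size)
    lo = grouped.min(axis=1, keepdims=True)
    hi = grouped.max(axis=1, keepdims=True)
    levels = 2**bits - 1                       # 63 representable steps for int6
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((grouped - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo                    # codes + per-group scale/offset

def affine_dequantize(codes, scale, lo, shape):
    # Invert the mapping: code * scale + offset, then restore the shape.
    return (codes * scale + lo).reshape(shape)

w = np.random.randn(4, 128).astype(np.float32)
codes, scale, lo = affine_quantize(w)
w_hat = affine_dequantize(codes, scale, lo, w.shape)
print(np.abs(w - w_hat).max())  # rounding error, at most ~scale/2 per weight

The per-group scale/offset pair is what makes the scheme "affine" (rather than symmetric, scale-only quantization), and the small group size bounds how far a single outlier weight can stretch the quantization step for its neighbors.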

Usage (MLX)

Install the runtime:

pip install mlx-lm

Then, from Python:

from mlx_lm import load, generate

repo_id = "abnormalmapstudio/Qwen3-Next-80B-A3B-Thinking-6bit-mlx"
model, tokenizer = load(repo_id)
out = generate(model, tokenizer, "List 5 creative dinner ideas.", max_tokens=200)
print(out)
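
For chat-style use, the prompt should go through the bundled chat template rather than being passed raw. A hedged sketch, assuming mlx-lm's tokenizer wrapper exposes the standard Hugging Face apply_chat_template (it delegates to the underlying tokenizer in recent mlx-lm releases):

from mlx_lm import load, generate

model, tokenizer = load("abnormalmapstudio/Qwen3-Next-80B-A3B-Thinking-6bit-mlx")
messages = [{"role": "user", "content": "List 5 creative dinner ideas."}]
# Render the conversation with chat_template.jinja into a single prompt string.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
out = generate(model, tokenizer, prompt, max_tokens=2048)
print(out)

Thinking-style models emit their reasoning before the final answer, so budget a larger max_tokens than you would for a plain chat model. mlx-lm also ships a command-line entry point (python -m mlx_lm.generate --model <repo> --prompt "...") if you want to smoke-test the checkpoint without writing any Python.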

Benchmarks

  • Benchmarks will be added after the upload completes; see scripts/bench/qwen_mxfp4_vs_int4.py and scripts/bench/model_queue_eval.py.

License

  • Apache-2.0 for this packaging. See LICENSE.
  • Base model license and terms apply (Qwen/Qwen3-Next-80B-A3B-Thinking).