# Qwen3-Next-80B-A3B-Thinking – MLX 6-bit (affine)
A 6-bit affine-quantized checkpoint of the base model Qwen/Qwen3-Next-80B-A3B-Thinking in Apple's MLX format, for local inference on Apple Silicon.
## Key details
- Format: MLX runtime, sharded safetensors weights
- Quantization: affine int6, `group_size=64` (see the sketch after this list)
- Task: text generation / chat
- Tokenizer: provided via `tokenizer.json` (BPE) with `chat_template.jinja`
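What "affine int6, `group_size=64`" means in MLX: each group of 64 consecutive weights is stored as 6-bit integer codes plus a per-group scale and bias, so `w ≈ scales * w_q + biases`. The round-trip below is a minimal sketch using `mx.quantize` / `mx.dequantize`, assuming an MLX version recent enough to support 6-bit quantization:

```python
import mlx.core as mx

# Toy weight matrix; real layers are quantized the same way, group by group.
w = mx.random.normal((128, 256))

# Affine quantization: 6-bit codes plus a scale and bias per group of 64.
w_q, scales, biases = mx.quantize(w, group_size=64, bits=6)

# Dequantize and check the reconstruction error: w ≈ scales * w_q + biases.
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=6)
print("max abs error:", mx.abs(w - w_hat).max().item())
```

A checkpoint like this one can typically be produced from the base model with mlx-lm's converter (e.g. `mlx_lm.convert --hf-path Qwen/Qwen3-Next-80B-A3B-Thinking -q --q-bits 6 --q-group-size 64`); the exact pipeline used for this repo is not documented in this card.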
## Usage (MLX)
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

repo_id = "abnormalmapstudio/Qwen3-Next-80B-A3B-Thinking-6bit-mlx"

# Downloads the sharded weights on first use, then loads them for inference.
model, tokenizer = load(repo_id)

out = generate(model, tokenizer, "List 5 creative dinner ideas.", max_tokens=200)
print(out)
```
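For chat-style prompts, apply the bundled `chat_template.jinja` through the tokenizer, following mlx-lm's usual pattern. A minimal sketch; since this is a thinking model, the output typically includes the model's reasoning (a `<think>...</think>` block) before the final answer:

```python
from mlx_lm import load, generate

model, tokenizer = load("abnormalmapstudio/Qwen3-Next-80B-A3B-Thinking-6bit-mlx")

# Render the conversation with the bundled chat template so the model
# sees the role markers it was trained with.
messages = [{"role": "user", "content": "List 5 creative dinner ideas."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# A thinking model spends tokens on reasoning first, so allow extra headroom.
out = generate(model, tokenizer, prompt, max_tokens=512)
print(out)
```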
## Benchmarks
- Will be added after upload completes; see `scripts/bench/qwen_mxfp4_vs_int4.py` and `scripts/bench/model_queue_eval.py`.
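Until those land, a rough local throughput check can be done with `stream_generate`. This is a hypothetical sketch, not one of the bench scripts referenced above:

```python
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("abnormalmapstudio/Qwen3-Next-80B-A3B-Thinking-6bit-mlx")

# Wall-clock timing over the whole run; the first token's latency
# also includes prompt prefill.
start = time.perf_counter()
n_tokens = 0
for _ in stream_generate(model, tokenizer, "Explain KV caching briefly.", max_tokens=256):
    n_tokens += 1  # each iteration yields one generated token
elapsed = time.perf_counter() - start
print(f"{n_tokens} tokens in {elapsed:.1f}s ({n_tokens / elapsed:.1f} tok/s)")
```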
## License
- Apache-2.0 for this packaging; see `LICENSE`.
- The base model's license and terms apply (Qwen/Qwen3-Next-80B-A3B-Thinking).