This is an MXFP4_MOE quantization of the model Qwen3-VL-30B-A3B-Instruct.

Original model: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct

This GGUF quant was made possible by the excellent work of [yairpatch](https://huggingface.co/yairpatch) and [Thireus](https://huggingface.co/Thireus), and anyone else I forgot to mention.

As of 2025-10-22 this is still experimental and should be treated as such.
To run it, you need a custom build of llama.cpp, available here:
https://github.com/Thireus/llama.cpp/releases/tag/tr-qwen3-vl-6-b7106-495c611
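
If you prefer building that fork from source rather than using the release binaries, here is a minimal sketch, assuming the fork keeps mainline llama.cpp's CMake setup and its `llama-mtmd-cli` multimodal tool; the GGUF and mmproj filenames below are placeholders for the actual files in this repo:

```bash
# Build the custom fork (assumes the standard llama.cpp CMake workflow)
git clone https://github.com/Thireus/llama.cpp
cd llama.cpp
git checkout tr-qwen3-vl-6-b7106-495c611   # tag of the linked release
cmake -B build
cmake --build build --config Release

# Run the vision model with the multimodal CLI; filenames are placeholders
./build/bin/llama-mtmd-cli \
  -m Qwen3-VL-30B-A3B-Instruct-MXFP4_MOE.gguf \
  --mmproj mmproj-Qwen3-VL-30B-A3B-Instruct.gguf \
  --image input.png \
  -p "Describe this image."
```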

Model details: GGUF format, 31B params, `qwen3vlmoe` architecture, 4-bit (MXFP4) quantization.
