Osaurus AI

OsaurusAI/gemma-4-E2B-it-qat-MXFP4

MXFP4 MLX bundle converted from google/gemma-4-E2B-it-qat-q4_0-unquantized. Decoder linears are quantized with MLX mxfp4 at group size 32; embeddings, norms, and Gemma 4 early-fusion media embedders are preserved as fp16 passthrough.
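To make the quantization description concrete, here is a minimal pure-Python sketch of block-scaled FP4 at group size 32. It assumes the OCP Microscaling definition of MXFP4 (FP4 E2M1 elements sharing one 8-bit power-of-two scale per group); the rounding and bit packing used by the MLX kernels may differ, and the function names are illustrative rather than part of any MLX API.

```python
import math

# Magnitudes representable by one FP4 E2M1 element (the sign bit is separate).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2    # exponent of the largest grid value: 6.0 = 1.5 * 2**2
GROUP_SIZE = 32  # matches group_size=32 in this bundle


def quantize_group(values):
    """Return (shared_exponent, signed grid values) for one group of weights."""
    if len(values) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} values, got {len(values)}")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * GROUP_SIZE
    # Power-of-two scale that places the largest magnitude in the top grid octave.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest grid point; values above 6.0 saturate at 6.0.
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(math.copysign(nearest, v))
    return shared_exp, codes


def dequantize_group(shared_exp, codes):
    """Rebuild approximate weights from the shared exponent and grid values."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


def bits_per_weight(element_bits=4, scale_bits=8, group_size=GROUP_SIZE):
    """Average storage cost per weight, including the shared scale."""
    return element_bits + scale_bits / group_size
```

Under these assumptions each quantized weight costs 4 + 8/32 = 4.25 bits on average, which is why the tensors kept in fp16 account for a large share of the bundle size.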

Bundle

| Field | Value |
| --- | --- |
| Source | google/gemma-4-E2B-it-qat-q4_0-unquantized |
| Architecture | gemma4 / Gemma4ForConditionalGeneration |
| Text layers | 35 total (7 full attention, 28 sliding attention) |
| Hidden size | 1536 |
| Quantization | mxfp4, bits=4, group_size=32 |
| Quantized weights | 277 tensors with matching .scales sidecars |
| Shards | 4 safetensors shards |
| Indexed weight bytes | 3.73 GiB |
| Processor | processor_class=Gemma4Processor, image_seq_length=280, audio_seq_length=750, audio_ms_per_token=40, video_processor=present |
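The shard count, quantized tensor count, and indexed byte total can be cross-checked against model.safetensors.index.json. The sketch below assumes the standard Hugging Face index layout (a `weight_map` from tensor name to shard file, plus `metadata.total_size`) and assumes each quantized tensor is stored as `<prefix>.weight` next to `<prefix>.scales`; `summarize_index` is a hypothetical helper, not part of MLX.

```python
import json
from pathlib import Path


def summarize_index(index):
    """Summarize a parsed model.safetensors.index.json dictionary."""
    weight_map = index.get("weight_map", {})
    names = set(weight_map)
    scales = sorted(n for n in names if n.endswith(".scales"))
    # Assumption: a sidecar "<prefix>.scales" sits next to "<prefix>.weight".
    orphans = [s for s in scales if s[: -len(".scales")] + ".weight" not in names]
    total_bytes = index.get("metadata", {}).get("total_size", 0)
    return {
        "quantized_tensors": len(scales) - len(orphans),
        "orphan_scales": orphans,
        "shards": len(set(weight_map.values())),
        "total_gib": total_bytes / 2**30,
    }


def summarize_index_file(path):
    """Load the index from disk and summarize it."""
    return summarize_index(json.loads(Path(path).read_text()))
```

If the naming assumption holds, running `summarize_index_file` on a local copy of this bundle should report no orphan scales and agree with the values in the table.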

Modalities

| Path | Status |
| --- | --- |
| Text | Preserved |
| Vision | model_type=gemma4_vision, num_hidden_layers=16, hidden_size=768, patch_size=16 |
| Audio | model_type=gemma4_audio, num_hidden_layers=12, hidden_size=1024 |
| Video | no video_config |

Audio encoder/config is present and preserved.
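The processor values audio_seq_length=750 and audio_ms_per_token=40 suggest an audio budget of 750 × 40 ms = 30 s per clip. The sketch below works through that arithmetic; it assumes audio_seq_length is a hard cap on audio tokens and that longer clips are truncated, neither of which this card states, and both function names are hypothetical.

```python
import math

AUDIO_SEQ_LENGTH = 750   # audio tokens per clip, from processor_config.json
AUDIO_MS_PER_TOKEN = 40  # milliseconds of audio covered by one token


def max_audio_seconds(seq_length=AUDIO_SEQ_LENGTH, ms_per_token=AUDIO_MS_PER_TOKEN):
    """Longest clip, in seconds, that fits in the audio token budget."""
    return seq_length * ms_per_token / 1000


def audio_tokens_for(duration_seconds, seq_length=AUDIO_SEQ_LENGTH,
                     ms_per_token=AUDIO_MS_PER_TOKEN):
    """Estimated token count for a clip, capped at the budget (assumed truncation)."""
    tokens = math.ceil(duration_seconds * 1000 / ms_per_token)
    return min(tokens, seq_length)
```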

No video_config is present in the source config; the processor file includes a video processor block, but this card does not claim a verified video runtime path.

Tokenizer And Template

| Field | Value |
| --- | --- |
| BOS token/id | <bos> / 2 |
| EOS token/id | <eos> / [1, 106, 50] |
| PAD token/id | <pad> / 0 |
| Suppress tokens | null |
| Chat template | chat_template.jinja, also folded into tokenizer_config.json |
| Tool parser metadata | gemma4 |
| Reasoning parser metadata | gemma4 |
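Because the EOS entry lists three ids rather than one, a decoding loop has to stop on any of them. A minimal sketch, assuming the runtime exposes raw token ids; `truncate_at_eos` is a hypothetical helper, and MLX/vMLX runtimes are expected to handle this internally from generation_config.json.

```python
# EOS ids as listed in this card's tokenizer table.
EOS_TOKEN_IDS = frozenset({1, 106, 50})


def truncate_at_eos(token_ids, eos_ids=EOS_TOKEN_IDS):
    """Return the tokens that precede the first end-of-sequence id."""
    kept = []
    for token_id in token_ids:
        if token_id in eos_ids:
            break  # any listed id ends the turn
        kept.append(token_id)
    return kept
```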

The chat template keeps Gemma 4 turn/channel formatting and includes the required-tool-choice compatibility stanza used by vMLX/Osaurus runtimes. It no longer prefills an empty thought channel when thinking is disabled, so non-thinking turns begin directly with visible assistant content.

Files To Keep Together

  • config.json
  • jang_config.json
  • model.safetensors.index.json
  • all model-*.safetensors shards
  • tokenizer.json
  • tokenizer_config.json
  • processor_config.json
  • generation_config.json
  • chat_template.jinja
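A local copy can be checked against the list above before loading. This is a small sketch using only the standard library; `missing_bundle_files` is a hypothetical helper, and the shard pattern assumes the shards follow the `model-*.safetensors` naming given in the list.

```python
from pathlib import Path

# Fixed-name files from the "Files To Keep Together" list.
REQUIRED_FILES = (
    "config.json",
    "jang_config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "processor_config.json",
    "generation_config.json",
    "chat_template.jinja",
)


def missing_bundle_files(bundle_dir):
    """Return the names of required files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # At least one weight shard must be present alongside the index.
    if not any(root.glob("model-*.safetensors")):
        missing.append("model-*.safetensors")
    return missing
```

An empty return value means every listed file was found; it does not verify that all four shards named in the index are present.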

Loading

Use an MLX/vMLX runtime with Gemma 4 MXFP4 support. This bundle is not GGUF and should not be loaded with GGUF runtimes.

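# load() returns the quantized model and its processor
# (tokenizer plus image/audio preprocessing).
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E2B-it-qat-MXFP4")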

Notes

This is a quantized derivative of Google's Gemma 4 QAT release. License and use restrictions follow the upstream Gemma terms. Packaged for Apple Silicon MLX/vMLX use. Contact: eric@osaurus.ai.

Bundle Metadata

This bundle metadata is source-derived: text=true, vision=true, audio=true, video=false. No video runtime path is claimed unless video_config is present.
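The rule behind these flags can be sketched as a lookup on the parsed config.json. The key name `video_config` comes from this card; `vision_config` and `audio_config` are assumed by analogy and should be checked against the actual file. Text is treated as always present, and `modality_flags` is a hypothetical helper.

```python
def modality_flags(config):
    """Derive modality flags from a parsed config.json dictionary."""
    return {
        "text": True,  # assumption: a language-model bundle always has a text path
        "vision": config.get("vision_config") is not None,
        "audio": config.get("audio_config") is not None,
        # No video path is claimed unless video_config is present.
        "video": config.get("video_config") is not None,
    }
```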
