Instructions to use OsaurusAI/gemma-4-E2B-it-qat-MXFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OsaurusAI/gemma-4-E2B-it-qat-MXFP4 with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("OsaurusAI/gemma-4-E2B-it-qat-MXFP4") config = load_config("OsaurusAI/gemma-4-E2B-it-qat-MXFP4") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use OsaurusAI/gemma-4-E2B-it-qat-MXFP4 with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OsaurusAI/gemma-4-E2B-it-qat-MXFP4"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "OsaurusAI/gemma-4-E2B-it-qat-MXFP4" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use OsaurusAI/gemma-4-E2B-it-qat-MXFP4 with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OsaurusAI/gemma-4-E2B-it-qat-MXFP4"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default OsaurusAI/gemma-4-E2B-it-qat-MXFP4
Run Hermes
hermes
OsaurusAI/gemma-4-E2B-it-qat-MXFP4
MXFP4 MLX bundle converted from google/gemma-4-E2B-it-qat-q4_0-unquantized. Decoder linears are quantized with MLX mxfp4 at group size 32; embeddings, norms, and Gemma 4 early-fusion media embedders are preserved as fp16 passthrough.
Bundle
| Field | Value |
|---|---|
| Source | google/gemma-4-E2B-it-qat-q4_0-unquantized |
| Architecture | gemma4 / Gemma4ForConditionalGeneration |
| Text layers | 35 total (7 full attention, 28 sliding attention) |
| Hidden size | 1536 |
| Quantization | mxfp4, bits=4, group_size=32 |
| Quantized weights | 277 tensors with matching .scales sidecars |
| Shards | 4 safetensors shards |
| Indexed weight bytes | 3.73 GiB |
| Processor | processor_class=Gemma4Processor, image_seq_length=280, audio_seq_length=750, audio_ms_per_token=40, video_processor=present |
Modalities
| Path | Status |
|---|---|
| Text | Preserved |
| Vision | model_type=gemma4_vision, num_hidden_layers=16, hidden_size=768, patch_size=16 |
| Audio | model_type=gemma4_audio, num_hidden_layers=12, hidden_size=1024 |
| Video | no video_config |
Audio encoder/config is present and preserved.
No video_config is present in the source config; the processor file includes a video processor block, but this card does not claim a verified video runtime path.
Tokenizer And Template
| Field | Value |
|---|---|
| BOS token/id | <bos> / 2 |
| EOS token/id | <eos> / [1, 106, 50] |
| PAD token/id | <pad> / 0 |
| Suppress tokens | null |
| Chat template | chat_template.jinja, also folded into tokenizer_config.json |
| Tool parser metadata | gemma4 |
| Reasoning parser metadata | gemma4 |
The chat template keeps Gemma 4 turn/channel formatting and includes the required-tool-choice compatibility stanza used by vMLX/Osaurus runtimes. The empty no-thinking thought-channel prefill is removed so non-thinking turns start in visible assistant content.
Files To Keep Together
config.jsonjang_config.jsonmodel.safetensors.index.json- all
model-*.safetensorsshards tokenizer.jsontokenizer_config.jsonprocessor_config.jsongeneration_config.jsonchat_template.jinja
Loading
Use an MLX/vMLX runtime with Gemma 4 MXFP4 support. This bundle is not GGUF and should not be loaded with GGUF runtimes.
from mlx_vlm import load, generate
model, processor = load("OsaurusAI/gemma-4-E2B-it-qat-MXFP4")
Notes
This is a quantized derivative of Google's Gemma 4 QAT release. License and use restrictions follow the upstream Gemma terms. Packaged for Apple Silicon MLX/vMLX use. Contact: eric@osaurus.ai.
Bundle Metadata
This bundle metadata is source-derived: text=true, vision=true, audio=true, video=false. No video runtime path is claimed unless video_config is present.
- Downloads last month
- 66
Quantized
Model tree for OsaurusAI/gemma-4-E2B-it-qat-MXFP4
Base model
google/gemma-4-E2B