---
license: gemma
language:
- en
- it
tags:
- gemma
- gemma3
- quantized
- gguf
- raspberry-pi
- edge-ai
- bilingual
- ollama
- offline
model_type: text-generation
inference: false
---

# 🧠 Gemma3 Smart Q4 – Bilingual Offline Assistant for Raspberry Pi

**Gemma3 Smart Q4** is a quantized bilingual (Italian–English) variant of Google's Gemma 3 1B model, optimized for edge devices such as the **Raspberry Pi 4 and 5**. It runs **completely offline** with Ollama or llama.cpp, ensuring **privacy and speed** without external dependencies.

**Version**: v0.1.0
**Author**: Antonio (chill123)
**Base Model**: [Google Gemma 3 1B IT](https://huggingface.co/google/gemma-3-1b-it)

---

## 💻 Optimized for Raspberry Pi

> ✅ **Tested on Raspberry Pi 4 (4GB)** – average speed 3.56–3.67 tokens/s
> ✅ **Fully offline** – no external APIs, no internet required
> ✅ **Lightweight** – under 800 MB in Q4 quantization
> ✅ **Bilingual** – seamlessly switches between Italian and English

---

## 🔍 Key Features

- 🗣️ **Bilingual AI** – automatically detects and responds in Italian or English
- ⚡ **Edge-optimized** – tuned runtime parameters for low-power ARM devices
- 🔒 **Privacy-first** – all inference happens locally on your device
- 🧩 **Two quantizations available**:
  - **Q4_0** (≈687 MB) → default choice, about 3% faster
  - **Q4_K_M** (≈769 MB) → better quality for long conversations

---

## 📊 Benchmark Results

Tested on **Raspberry Pi 4 (4GB RAM)** with Ollama.

| Model | Avg Speed | Size | Recommended For |
|-------|-----------|------|-----------------|
| **Q4_0** ⭐ | **3.67 tokens/s** | 687 MB | Default (chat, voice assistants) |
| **Q4_K_M** | **3.56 tokens/s** | 769 MB | Long-form conversations |

**Individual test results**:

- Q4_0: 3.65, 3.67, 3.70 tokens/s
- Q4_K_M: 3.71, 3.58, 3.40 tokens/s

### Benchmark Methodology

Benchmarks were run on a **Raspberry Pi 4 (4GB)** using 3 bilingual prompts (mixed Italian/English).
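The reported averages follow directly from the per-run numbers. As a minimal sketch, the eval rates can be extracted from Ollama's `--verbose` output and averaged like this (the exact `eval rate:` line format is an assumption based on recent Ollama versions; note that `prompt eval rate:` lines must be excluded):

```python
import re

def average_eval_rate(log_text: str) -> float:
    """Mean of the generation `eval rate:` values (tokens/s) in Ollama
    --verbose output. The ^ anchor skips `prompt eval rate:` lines."""
    rates = [
        float(m)
        for m in re.findall(r"^eval rate:\s*([\d.]+)\s*tokens/s",
                            log_text, flags=re.MULTILINE)
    ]
    return round(sum(rates) / len(rates), 2)

# Example: the three Q4_0 runs from the table above
# (prompt eval rates here are illustrative placeholders)
sample_log = """\
prompt eval rate:     12.01 tokens/s
eval rate:            3.65 tokens/s
prompt eval rate:     11.87 tokens/s
eval rate:            3.67 tokens/s
prompt eval rate:     12.20 tokens/s
eval rate:            3.70 tokens/s
"""
print(average_eval_rate(sample_log))  # 3.67
```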
The average eval rate was calculated from `eval rate:` log lines only, excluding model load and warm-up time. Runtime: Ollama 0.x on Raspberry Pi OS (Debian Bookworm).

> **Recommendation**: Use **Q4_0** as the default (about 3% faster, 82 MB smaller, equivalent quality). Use **Q4_K_M** only if you need slightly better coherence in very long conversations (1000+ tokens).

---

## 🛠️ Quick Start with Ollama

### Option 1: Pull from Hugging Face

Create a `Modelfile`:

```bash
cat > Modelfile <<'MODELFILE'
FROM hf://chill123/antonio-gemma3-smart-q4/gemma3-1b-q4_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful. If a task requires internet access or external services, clearly state this and suggest local alternatives when possible.

Sei un assistente AI offline che opera su Raspberry Pi. Rileva automaticamente la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile. Se un compito richiede accesso a internet o servizi esterni, indicalo chiaramente e suggerisci alternative locali quando possibile.
"""
MODELFILE
```

Then run:

```bash
ollama create antonio-gemma3-smart-q4 -f Modelfile
ollama run antonio-gemma3-smart-q4 "Ciao! Chi sei?"
```

### Option 2: Download and Use Locally

```bash
# Download the model
wget https://huggingface.co/chill123/antonio-gemma3-smart-q4/resolve/main/gemma3-1b-q4_0.gguf

# Create a Modelfile pointing to the local file
cat > Modelfile <<'MODELFILE'
FROM ./gemma3-1b-q4_0.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 1024
PARAMETER num_thread 4
PARAMETER num_batch 32
PARAMETER repeat_penalty 1.05
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
SYSTEM """
You are an offline AI assistant running on a Raspberry Pi. Automatically detect the user's language (Italian or English) and respond in the same language. Be concise, practical, and helpful.

Sei un assistente AI offline su Raspberry Pi. Rileva automaticamente la lingua dell'utente (italiano o inglese) e rispondi nella stessa lingua. Sii conciso, pratico e utile.
"""
MODELFILE

# Create and run
ollama create antonio-gemma3-smart-q4 -f Modelfile
ollama run antonio-gemma3-smart-q4 "Hello! Introduce yourself."
```

---

## ⚙️ Recommended Settings (Raspberry Pi 4)

For **optimal performance** on a Raspberry Pi 4/5, use these parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `num_ctx` | `512`–`1024` | Context length (512 for faster responses, 1024 for longer conversations) |
| `num_thread` | `4` | Uses all 4 cores of a Raspberry Pi 4 |
| `num_batch` | `32` | Optimized for throughput on the Pi |
| `temperature` | `0.7`–`0.8` | Balances creativity and consistency |
| `top_p` | `0.9` | Nucleus sampling for diverse responses |
| `repeat_penalty` | `1.05` | Reduces repetitive output |

For **voice assistants** or **real-time chat**, reduce `num_ctx` to `512` for faster responses.

---

## 💬 Try These Prompts

Test the bilingual capabilities with these examples:

### 🇮🇹 Italian

```bash
ollama run antonio-gemma3-smart-q4 "Spiegami la differenza tra sensore IR e ultrasuoni in due frasi."
```

```bash
ollama run antonio-gemma3-smart-q4 "Come posso controllare un LED con GPIO su Raspberry Pi?"
```

### 🇬🇧 English

```bash
ollama run antonio-gemma3-smart-q4 "Outline a 5-step plan to control a servo with GPIO on Raspberry Pi."
```

```bash
ollama run antonio-gemma3-smart-q4 "What are the best uses for a Raspberry Pi in home automation?"
```

### 🌍 Code-switching

```bash
ollama run antonio-gemma3-smart-q4 "Explain in English how to install Ollama, poi spiega in italiano come testare il modello."
```

---

## 📦 Files Included

| File | SHA256 Checksum | Size | Description |
|------|-----------------|------|-------------|
| `gemma3-1b-q4_0.gguf` | `d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55` | 687 MB | Q4_0 quantization (recommended) |
| `gemma3-1b-q4_k_m.gguf` | `c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0` | 769 MB | Q4_K_M quantization (better quality) |

---

## 🚀 Use Cases

- **Privacy-focused personal assistant** – all data stays on your device
- **Offline home automation** – control IoT devices without cloud dependencies
- **Educational projects** – learn AI/ML without expensive hardware
- **Voice assistants** – fast enough for near-real-time speech interaction (3.67 tokens/s)
- **Embedded systems** – industrial applications requiring offline inference
- **Bilingual chatbots** – Italian/English customer support, offline documentation

---

## 🔖 License

This model is a **derivative work** of [Google's Gemma 3 1B](https://huggingface.co/google/gemma-3-1b-it).

**License**: Gemma License

Please review and comply with the [Gemma License Terms](https://ai.google.dev/gemma/terms) before using this model.

**Quantization, optimization, and bilingual configuration** by Antonio (chill123). For licensing questions regarding the base model, refer to Google's official Gemma documentation.
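Downloaded GGUF files can be checked against the SHA256 digests in the Files Included table above. A minimal sketch, assuming the files sit in the current directory under the names listed there:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MB chunks
    (avoids loading a ~700 MB GGUF into RAM at once)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digests from the Files Included table
EXPECTED = {
    "gemma3-1b-q4_0.gguf": "d1d037446a2836db7666aa6ced3ce460b0f7f2ba61c816494a098bb816f2ad55",
    "gemma3-1b-q4_k_m.gguf": "c02d2e6f68fd34e9e66dff6a31d3f95fccb6db51f2be0b51f26136a85f7ec1f0",
}

for name, digest in EXPECTED.items():
    try:
        status = "OK" if sha256_of(name) == digest else "MISMATCH"
    except FileNotFoundError:
        status = "missing"
    print(f"{name}: {status}")
```

On Linux the same check is a one-liner with `sha256sum <file>`; the Python version is handy where that tool is unavailable.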
---

## 🔗 Links

- **GitHub Repository**: [antconsales/gemma3-smart-q4](https://github.com/antconsales/gemma3-smart-q4) – code, demos, benchmark scripts
- **Original Model**: [Google Gemma 3 1B IT](https://huggingface.co/google/gemma-3-1b-it)
- **Ollama Library**: Coming soon (pending submission)

---

## 🛠️ Technical Details

**Base Model**: Google Gemma 3 1B (instruction-tuned)
**Quantization**: Q4_0 and Q4_K_M (llama.cpp)
**Context Length**: 1024 tokens (configurable)
**Vocabulary Size**: 262,144 tokens
**Architecture**: Gemma3ForCausalLM
**Supported Platforms**: Raspberry Pi 4/5, Mac M1/M2, Linux ARM64

---

## 📝 Version History

### v0.1.0 (2025-10-21)

- Initial release
- Two quantizations: Q4_0 (687 MB) and Q4_K_M (769 MB)
- Bilingual IT/EN support with automatic language detection
- Optimized for Raspberry Pi 4 (3.56–3.67 tokens/s)
- Tested on Raspberry Pi OS (Debian Bookworm) with Ollama

---

**Built with ❤️ by Antonio 🇮🇹**

*Empowering privacy and edge computing, one model at a time.*