🧪 Gemma 4B — Distilled from Gemma 2
This repository contains a 4B-parameter distilled Gemma model, trained using knowledge distillation from a larger Gemma-2-9B-Instruct teacher.
The objective was to create a smaller, faster model that preserves strong instruction-following behavior while being practical for edge deployment and low-latency inference.
🔥 Overview
- Teacher Model: google/gemma-2-9b-it
- Student Model: vishalkhot/gemma-4b-distilled
- Goal: Achieve near-teacher quality in a ~4B-parameter footprint
- Method: Logits-based distillation with temperature scaling, plus supervised tuning on curated instruction data
This model is ideal for:
- Chat-based assistants
- Reasoning over short/medium contexts
- On-device and single-GPU inference (a 4B model fits comfortably on one modern GPU; see the loading sketch after this list)
- Serverless or low-cost API deployments
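For quick local testing, the snippet below is a minimal sketch of loading the checkpoint with the standard Hugging Face transformers API and running a single chat turn. The model id comes from the overview above; everything else (dtype, device placement, the prompt) is illustrative and assumes the repository ships a regular causal-LM checkpoint with a chat template.

```python
# Minimal inference sketch (assumes a standard causal-LM checkpoint with a chat
# template; adjust dtype/device to your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vishalkhot/gemma-4b-distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 matches the training flag below
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize knowledge distillation in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```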
🧬 Distillation Process
Distillation was performed over a mixed instruction dataset, combining reasoning, multi-turn dialogue, tool-usage instructions, and diverse language tasks.
Training used a combination of the following objectives (a minimal loss sketch follows the list):
- 🔹 KL divergence on teacher/student logits
- 🔹 Cross-entropy loss on reference outputs
- 🔹 Temperature scaling (T = 2–4)
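Putting the pieces together, the combined objective is a weighted sum of the soft (KL) and hard (cross-entropy) terms. The sketch below is a minimal PyTorch version of that combination; the temperature and alpha arguments mirror the --temperature and --alpha flags in the command below, but the exact implementation inside distill.py may differ (for simplicity, this sketch does not mask padded positions in the KL term).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.5, alpha: float = 0.5):
    """Weighted sum of a soft (KL) term and a hard (cross-entropy) term.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    labels: (batch, seq_len) token ids, with -100 marking ignored positions
    """
    vocab = student_logits.size(-1)
    s = student_logits.view(-1, vocab)
    t = teacher_logits.view(-1, vocab)

    # Soft targets: KL divergence between temperature-scaled distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(s / temperature, dim=-1),
        F.softmax(t / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the reference tokens.
    ce = F.cross_entropy(s, labels.view(-1), ignore_index=-100)

    # alpha balances the two terms (0.5 weights them equally).
    return alpha * kl + (1.0 - alpha) * ce
```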
Training Command Example
```bash
python distill.py \
  --teacher-model google/gemma-2-9b-it \
  --student-model google/gemma-3-4b-it \
  --train-file data/train.jsonl \
  --val-file data/val.jsonl \
  --output-dir gemma_4b_distilled \
  --batch-size 8 \
  --gradient-accumulation-steps 2 \
  --num-epochs 1 \
  --learning-rate 1e-5 \
  --warmup-steps 300 \
  --max-length 2048 \
  --temperature 2.5 \
  --alpha 0.5 \
  --save-steps 1000 \
  --eval-steps 200 \
  --logging-steps 10 \
  --bf16
```
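With --batch-size 8 and --gradient-accumulation-steps 2, each optimizer step sees an effective batch of 16 sequences per device. --temperature 2.5 falls inside the T = 2–4 range noted above, and, assuming --alpha is the usual KL/cross-entropy mixing weight (as in the sketch above), 0.5 weights the two loss terms equally.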