---
language: en
tags:
- text-generation
- gemma
- tinystories
license: apache-2.0
datasets:
- roneneldan/TinyStories
---

# Gemma-3 270M Fine-tuned on TinyStories

This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.

## Model Details

- **Architecture**: Custom Gemma-3 with sliding-window attention
- **Parameters**: ~270M
- **Training Dataset**: TinyStories
- **Context Length**: 32,768 tokens
- **Sliding Window**: 512 tokens

## Usage

```python
# Note: This model requires the custom Gemma3Model class from the training notebook.
# You'll need to copy the model definition into your own code before loading this checkpoint.
```

A hedged loading sketch is included at the end of this card.

## Training Details

- Trained for 150,000 steps
- Final training loss: ~2.55
- Final validation loss: ~2.56
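
## Loading Sketch

Since the card states that the checkpoint requires the custom `Gemma3Model` class from the training notebook, the snippet below is only a minimal sketch of how loading might look. It assumes the weights are stored as a PyTorch state dict named `pytorch_model.bin`, that `your-username/gemma3-270m-tinystories` is a placeholder repo id, and that `Gemma3Model`/`Gemma3Config` (with hypothetical `context_length` and `sliding_window` fields) have been copied from the notebook into a local `my_gemma3` module. Adjust the names to match the actual repository contents.

```python
# Minimal loading sketch -- file name, repo id, and class/config names below
# are assumptions, not confirmed by this model card.
import torch
from huggingface_hub import hf_hub_download

from my_gemma3 import Gemma3Model, Gemma3Config  # copied from the training notebook

# Download the checkpoint from the Hub (filename is an assumption).
ckpt_path = hf_hub_download(
    repo_id="your-username/gemma3-270m-tinystories",  # placeholder repo id
    filename="pytorch_model.bin",
)

# Rebuild the model with the hyperparameters listed above and load the weights.
config = Gemma3Config(context_length=32_768, sliding_window=512)  # hypothetical config fields
model = Gemma3Model(config)
state_dict = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```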