---
language: en
tags:
- text-generation
- gemma
- tinystories
license: apache-2.0
datasets:
- roneneldan/TinyStories
---
# Gemma-3 270M Fine-tuned on TinyStories

This is a custom implementation of the Gemma-3 270M parameter model, fine-tuned on the TinyStories dataset.
## Model Details

- **Architecture**: Custom Gemma-3 with sliding window attention
- **Parameters**: ~270M
- **Training Dataset**: TinyStories
- **Context Length**: 32,768 tokens
- **Sliding Window**: 512 tokens (see the mask sketch below)
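
Sliding window attention limits each token to attending over the most recent 512 tokens rather than the full 32,768-token context, which bounds the per-token attention cost. The snippet below is a minimal illustrative sketch of such a mask; it is not the exact code used in the training notebook.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int = 512) -> torch.Tensor:
    """Boolean mask where True marks key positions a query token may attend to.

    Query position i attends to key positions j with i - window < j <= i,
    i.e. causal attention restricted to the last `window` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions,   shape (1, seq_len)
    return (j <= i) & (j > i - window)

# Example: with a 512-token window, query position 1000 attends to keys 489..1000.
mask = sliding_window_causal_mask(seq_len=2048, window=512)
print(mask.shape, mask[1000].sum().item())  # torch.Size([2048, 2048]) 512
```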
## Usage

```python
# Note: This model requires the custom Gemma3Model class from the training notebook.
# You'll need to copy the model definition to use this model.
```
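
Once the `Gemma3Model` definition has been copied alongside the checkpoint, loading and generation might look roughly like the sketch below. The module name `gemma3_model.py`, the checkpoint file `model.pt`, the tokenizer repo, and the forward signature are assumptions rather than part of this repository; adapt them to match the training notebook.

```python
import torch
from transformers import AutoTokenizer

# Assumption: the Gemma3Model class from the training notebook was saved to gemma3_model.py.
from gemma3_model import Gemma3Model

# Assumption: the model uses the standard Gemma tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m")

# Construct the model with the same config as in the notebook, then load the weights.
model = Gemma3Model()
state_dict = torch.load("model.pt", map_location="cpu")  # assumed checkpoint filename
model.load_state_dict(state_dict)
model.eval()

prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Simple greedy decoding loop; assumes forward signature (batch, seq) -> (batch, seq, vocab).
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```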
## Training Details

- Trained for 150,000 steps
- Final training loss: ~2.55
- Final validation loss: ~2.56