---
language: en
tags:
- text-generation
- gemma
- tinystories
license: apache-2.0
datasets:
- roneneldan/TinyStories
---

# Gemma-3 270M Fine-tuned on TinyStories

This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.

## Model Details

- **Architecture**: Custom Gemma-3 with sliding-window attention
- **Parameters**: ~270M
- **Training Dataset**: TinyStories
- **Context Length**: 32,768 tokens
- **Sliding Window**: 512 tokens

## Usage

```python
# Note: This model requires the custom Gemma3Model class from the training notebook.
# You'll need to copy the model definition into your own code before loading this checkpoint.
```

A hedged loading sketch is included at the end of this card.

## Training Details

- Trained for 150,000 steps
- Final training loss: ~2.55
- Final validation loss: ~2.56
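
## Loading Sketch

Since the card states that the checkpoint requires the custom `Gemma3Model` class from the training notebook, the snippet below is only a minimal sketch of how loading might look. It assumes the weights are stored as a PyTorch state dict named `pytorch_model.bin`, that `your-username/gemma3-270m-tinystories` is a placeholder repo id, and that `Gemma3Model`/`Gemma3Config` (with hypothetical `context_length` and `sliding_window` fields) have been copied from the notebook into a local `my_gemma3` module. Adjust the names to match the actual repository contents.

```python
# Minimal loading sketch -- file name, repo id, and class/config names below
# are assumptions, not confirmed by this model card.
import torch
from huggingface_hub import hf_hub_download

from my_gemma3 import Gemma3Model, Gemma3Config  # copied from the training notebook

# Download the checkpoint from the Hub (filename is an assumption).
ckpt_path = hf_hub_download(
    repo_id="your-username/gemma3-270m-tinystories",  # placeholder repo id
    filename="pytorch_model.bin",
)

# Rebuild the model with the hyperparameters listed above and load the weights.
config = Gemma3Config(context_length=32_768, sliding_window=512)  # hypothetical config fields
model = Gemma3Model(config)
state_dict = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```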