	---
	language: en
	tags:
	- text-generation
	- gemma
	- tinystories
	license: apache-2.0
	datasets:
	- roneneldan/TinyStories
	---

	# Gemma-3 270M Fine-tuned on TinyStories

	This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.

	## Model Details
	- Architecture: Custom Gemma-3 with sliding window attention
	- Parameters: ~270M
	- Training Dataset: TinyStories
	- Context Length: 32,768 tokens
	- Sliding Window: 512 tokens
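
To illustrate what the sliding-window setting above means, here is a minimal sketch of a causal sliding-window attention mask. It assumes the window counts the current token plus the tokens immediately before it; the training notebook may use a slightly different convention. `sliding_window_causal_mask` is a hypothetical helper written for this example and is not part of the model code.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True where query position i may attend to key position j.

    Assumption: position i sees itself and the (window - 1) positions before it.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Tiny example for readability; the values listed above would be window=512.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Under this convention, each token attends to at most `window` positions, so the per-token attention cost in a sliding-window layer stays bounded even as the context grows toward the full length.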

	## Usage

	```python
	# Note: This model requires the custom Gemma3Model class from the training notebook.
	# Copy that model definition into your own code before using this model.
	```

	## Training Details
	- Trained for 150,000 steps
	- Final training loss: ~2.55
	- Final validation loss: ~2.56
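
The losses above can be converted to perplexity for easier comparison with other language models. This sketch assumes the reported values are mean cross-entropy per token in nats (natural log), which the card does not state explicitly; if the loss was computed differently, the conversion would not apply.

```python
import math

# Loss values as reported above; assumed to be mean cross-entropy in nats per token.
final_train_loss = 2.55
final_val_loss = 2.56

# Perplexity is the exponential of the mean per-token cross-entropy.
train_ppl = math.exp(final_train_loss)
val_ppl = math.exp(final_val_loss)

print(f"train perplexity ~ {train_ppl:.1f}")
print(f"validation perplexity ~ {val_ppl:.1f}")
```

Under that assumption, both values come out at roughly 13, and the small gap between training and validation loss suggests little overfitting at the final step.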