---
language:
- en
- it
license: mit
tags:
- text-generation
- educational
- transformer
- pytorch
- safetensors
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---

# MiniTransformer v3

A small educational transformer model trained from scratch for text generation.

## Model Description

MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.

**This is an educational model.** It is designed to help you understand transformer architectures and training processes, not for production use.

## Architecture

- **Parameters:** 43.9M
- **Architecture:** Decoder-only transformer
- **Embedding Dimension:** 512
- **Attention Heads:** 4
- **Layers:** 4
- **Context Length:** 128 tokens
- **Vocabulary:** BERT tokenizer (30,522 tokens)

An illustrative PyTorch definition matching these numbers appears at the end of this card.

## Training Details

### Training Data

- Generic question-answer pairs with diverse system prompts
- Chunked with a sliding-window approach, stride 32 (see the windowing sketch below)
- Train/test split: 90/10

### Training Procedure

- **Optimizer:** AdamW (fused, learning rate 3e-4)
- **Batch Size:** 128
- **Epochs:** 50
- **Mixed Precision:** FP16 (AMP enabled)
- **Hardware:** NVIDIA A10 GPU
- **Final Train Loss:** 0.0024

A sketch showing how these settings fit together in a training step appears below.

### Framework

- PyTorch 2.0+ with `torch.compile()` optimization
- Hugging Face Transformers tokenizer

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer the model was trained with
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the model (download the checkpoint first)
# model = MiniTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
# model.eval()

# Encode a prompt
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generation: see the greedy-decoding sketch below
```
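For reference, here is one way the hyperparameters in the Architecture section could be realized. This is an illustrative sketch built on `torch.nn.TransformerEncoder` with a causal mask, **not** the actual `MiniTransformer` definition from the training code; the learned positional embeddings, pre-norm blocks, and untied output head are assumptions.

```python
import torch
import torch.nn as nn

class MiniTransformer(nn.Module):
    """Illustrative decoder-only transformer matching the card's hyperparameters.

    Assumptions (the released checkpoint may differ): learned positional
    embeddings, pre-norm blocks, and an untied output projection.
    """

    def __init__(self, vocab_size=30522, d_model=512, n_heads=4,
                 n_layers=4, context_length=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(context_length, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):
        # ids: (batch, seq) token IDs -> (batch, seq, vocab) logits
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask: each position attends only to itself and earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))
```

With an untied input embedding and output head, this configuration comes to roughly 43.9M parameters, consistent with the count listed above.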
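The Usage snippet stops at the generation step. Below is a minimal greedy-decoding loop; it assumes a loaded `model` whose forward pass maps `(batch, seq)` token IDs to `(batch, seq, vocab)` logits, as in the sketch above.

```python
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=50, context_length=128):
    """Greedy decoding: repeatedly append the most likely next token."""
    model.eval()
    for _ in range(max_new_tokens):
        # Crop to the model's 128-token context window
        ids = input_ids[:, -context_length:]
        logits = model(ids)                               # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids

# output_ids = generate(model, input_ids, max_new_tokens=50)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Temperature or top-k sampling can be substituted for the `argmax` line if more varied output is wanted.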
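The Training Data section mentions a sliding window with a stride of 32. The actual preprocessing script is not reproduced here, so the sketch below only illustrates the general idea, assuming windows of 128 tokens (the model's context length):

```python
def sliding_windows(token_ids, window=128, stride=32):
    """Split one long token sequence into overlapping fixed-size windows.

    Illustrative only: window=128 matches the context length and stride=32
    matches the card, but the real preprocessing may differ.
    """
    last_start = max(len(token_ids) - window, 0)
    return [token_ids[s:s + window] for s in range(0, last_start + 1, stride)]

# Example: a 200-token sequence yields 3 windows, starting at offsets 0, 32, 64.
```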
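Finally, the Training Procedure section combines fused AdamW, FP16 autocast with gradient scaling, and `torch.compile()`. The sketch below shows one plausible way those pieces fit together on a CUDA device, reusing the illustrative `MiniTransformer` class above; the real training loop is not published, and the names here are illustrative.

```python
import torch
import torch.nn.functional as F

model = torch.compile(MiniTransformer().cuda())       # PyTorch 2.0+ compilation
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)
scaler = torch.cuda.amp.GradScaler()                  # required for stable FP16

def train_step(batch):
    """One next-token-prediction step on a (batch, seq) tensor of token IDs."""
    inputs, targets = batch[:, :-1], batch[:, 1:]     # shift targets by one
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(inputs)                        # (batch, seq-1, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
    scaler.scale(loss).backward()                     # scale to avoid underflow
    scaler.step(optimizer)                            # unscales, then steps
    scaler.update()
    return loss.item()
```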