---
language:
- en
- it
license: mit
tags:
- text-generation
- educational
- transformer
- pytorch
- safetensors
pipeline_tag: text-generation
datasets:
- roneneldan/TinyStories
---

# MiniTransformer v3

A small educational transformer model trained from scratch for text generation.

## Model Description

MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.

**This is an educational model.** It is designed to help you understand transformer architectures and training processes, not for production use.

## Architecture

- **Parameters:** 43.9M
- **Architecture:** Decoder-only transformer
- **Embedding Dimension:** 512
- **Attention Heads:** 4
- **Layers:** 4
- **Context Length:** 128 tokens
- **Vocabulary:** BERT tokenizer (30,522 tokens)

An illustrative PyTorch definition matching these numbers appears at the end of this card.

## Training Details

### Training Data

- Generic question-answer pairs with diverse system prompts
- Chunked with a sliding-window approach, stride 32 (see the windowing sketch below)
- Train/test split: 90/10

### Training Procedure

- **Optimizer:** AdamW (fused, learning rate 3e-4)
- **Batch Size:** 128
- **Epochs:** 50
- **Mixed Precision:** FP16 (AMP enabled)
- **Hardware:** NVIDIA A10 GPU
- **Final Train Loss:** 0.0024

A sketch showing how these settings fit together in a training step appears below.

### Framework

- PyTorch 2.0+ with `torch.compile()` optimization
- Hugging Face Transformers tokenizer

## Usage

```python
import torch
from transformers import AutoTokenizer

# Load the tokenizer the model was trained with
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load the model (download the checkpoint first)
# model = MiniTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
# model.eval()

# Encode a prompt
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generation: see the greedy-decoding sketch below
```
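For reference, here is one way the hyperparameters in the Architecture section could be realized. This is an illustrative sketch built on `torch.nn.TransformerEncoder` with a causal mask, **not** the actual `MiniTransformer` definition from the training code; the learned positional embeddings, pre-norm blocks, and untied output head are assumptions.

```python
import torch
import torch.nn as nn

class MiniTransformer(nn.Module):
    """Illustrative decoder-only transformer matching the card's hyperparameters.

    Assumptions (the released checkpoint may differ): learned positional
    embeddings, pre-norm blocks, and an untied output projection.
    """

    def __init__(self, vocab_size=30522, d_model=512, n_heads=4,
                 n_layers=4, context_length=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(context_length, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, ids):
        # ids: (batch, seq) token IDs -> (batch, seq, vocab) logits
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask: each position attends only to itself and earlier positions
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))
```

With an untied input embedding and output head, this configuration comes to roughly 43.9M parameters, consistent with the count listed above.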
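The Usage snippet stops at the generation step. Below is a minimal greedy-decoding loop; it assumes a loaded `model` whose forward pass maps `(batch, seq)` token IDs to `(batch, seq, vocab)` logits, as in the sketch above.

```python
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=50, context_length=128):
    """Greedy decoding: repeatedly append the most likely next token."""
    model.eval()
    for _ in range(max_new_tokens):
        # Crop to the model's 128-token context window
        ids = input_ids[:, -context_length:]
        logits = model(ids)                               # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids

# output_ids = generate(model, input_ids, max_new_tokens=50)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Temperature or top-k sampling can be substituted for the `argmax` line if more varied output is wanted.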
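The Training Data section mentions a sliding window with a stride of 32. The actual preprocessing script is not reproduced here, so the sketch below only illustrates the general idea, assuming windows of 128 tokens (the model's context length):

```python
def sliding_windows(token_ids, window=128, stride=32):
    """Split one long token sequence into overlapping fixed-size windows.

    Illustrative only: window=128 matches the context length and stride=32
    matches the card, but the real preprocessing may differ.
    """
    last_start = max(len(token_ids) - window, 0)
    return [token_ids[s:s + window] for s in range(0, last_start + 1, stride)]

# Example: a 200-token sequence yields 3 windows, starting at offsets 0, 32, 64.
```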
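Finally, the Training Procedure section combines fused AdamW, FP16 autocast with gradient scaling, and `torch.compile()`. The sketch below shows one plausible way those pieces fit together on a CUDA device, reusing the illustrative `MiniTransformer` class above; the real training loop is not published, and the names here are illustrative.

```python
import torch
import torch.nn.functional as F

model = torch.compile(MiniTransformer().cuda())       # PyTorch 2.0+ compilation
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)
scaler = torch.cuda.amp.GradScaler()                  # required for stable FP16

def train_step(batch):
    """One next-token-prediction step on a (batch, seq) tensor of token IDs."""
    inputs, targets = batch[:, :-1], batch[:, 1:]     # shift targets by one
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(inputs)                        # (batch, seq-1, vocab)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
    scaler.scale(loss).backward()                     # scale to avoid underflow
    scaler.step(optimizer)                            # unscales, then steps
    scaler.update()
    return loss.item()
```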