MiniTransformer v3
A small educational transformer model trained from scratch for text generation tasks.
Model Description
MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.
This is an educational model: it is intended to help you understand transformer architectures and training processes, and is not suitable for production use.
Architecture
- Parameters: 43.9M
- Architecture: Decoder-only transformer
- Embedding Dimension: 512
- Attention Heads: 4
- Layers: 4
- Context Length: 128 tokens
- Vocabulary: BERT tokenizer (bert-base-uncased, 30,522 tokens)
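For orientation, here is a minimal sketch of how these hyperparameters might be grouped into a configuration object. The class and field names are illustrative and are not the model's actual configuration API.

```python
from dataclasses import dataclass

@dataclass
class MiniTransformerConfig:
    """Hypothetical config mirroring the numbers listed above."""
    vocab_size: int = 30522      # BERT uncased tokenizer vocabulary
    embed_dim: int = 512         # embedding dimension
    num_heads: int = 4           # attention heads
    num_layers: int = 4          # decoder-only transformer blocks
    context_length: int = 128    # maximum sequence length in tokens
```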
Training Details
Training Data
- Generic question-answer pairs with diverse system prompts
- Trained using a sliding-window approach with a stride of 32 tokens (see the chunking sketch after this list)
- Train/test split: 90/10
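A rough sketch of the sliding-window chunking described above, assuming a pre-tokenized sequence of token IDs, the 128-token context length, and a stride of 32. The helper name is illustrative and is not the model's actual data pipeline.

```python
def sliding_window_chunks(token_ids, context_length=128, stride=32):
    """Split a long token sequence into overlapping fixed-length windows."""
    chunks = []
    last_start = max(len(token_ids) - context_length, 0)
    for start in range(0, last_start + 1, stride):
        window = token_ids[start:start + context_length]
        if len(window) == context_length:   # drop a trailing short window
            chunks.append(window)
    return chunks

# Illustrative 90/10 train/test split over the resulting windows:
# split = int(0.9 * len(chunks))
# train_chunks, test_chunks = chunks[:split], chunks[split:]
```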
Training Procedure
- Optimizer: AdamW (fused, learning rate: 3e-4)
- Batch Size: 128
- Epochs: 50
- Mixed Precision: FP16 (AMP enabled; see the training-loop sketch after this list)
- Hardware: NVIDIA A10 GPU
- Final Train Loss: 0.0024
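The following is a minimal sketch of a training loop matching the settings above (fused AdamW, learning rate 3e-4, FP16 autocast with gradient scaling). It assumes `model` and `train_loader` already exist; it is not the actual training script.

```python
import torch
import torch.nn.functional as F

device = "cuda"
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)
scaler = torch.cuda.amp.GradScaler()

for epoch in range(50):
    for input_ids, targets in train_loader:
        input_ids, targets = input_ids.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)
        # FP16 autocast for the forward pass and loss computation
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            logits = model(input_ids)                       # (batch, seq, vocab)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                                   targets.view(-1))
        # Scaled backward pass and optimizer step
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```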
Framework
- PyTorch 2.0+ with torch.compile() optimization
- Transformers library tokenizer
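As a sketch, compiling the model with PyTorch 2.0's torch.compile looks like this (assuming `model` is an instantiated MiniTransformer):

```python
import torch

compiled_model = torch.compile(model)  # JIT-compiles the forward pass on first call
```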
Usage
import torch
from transformers import AutoTokenizer
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Load model (you'll need to download the checkpoint)
# model = MiniTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt"))
# Generate text
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# Generation: see the greedy-decoding sketch below
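To complete the example, here is a hedged greedy-decoding sketch. It assumes the loaded model maps a batch of token IDs to next-token logits of shape (batch, seq_len, vocab_size) and that the BERT [SEP] token serves as an end-of-sequence marker; the repository's own generation utilities may differ.

```python
# Greedy decoding sketch (illustrative; assumes model(input_ids) -> logits)
model.eval()
generated = input_ids
with torch.no_grad():
    for _ in range(50):                           # generate up to 50 new tokens
        context = generated[:, -128:]             # stay within the 128-token context
        logits = model(context)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=1)
        if next_token.item() == tokenizer.sep_token_id:   # stop on [SEP], if used as EOS
            break

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```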