MiniTransformer v3

A small educational transformer model trained from scratch for text generation tasks.

Model Description

MiniTransformer is a compact transformer architecture designed for educational purposes and experimentation. The model is trained on question-answer pairs with various system prompts to demonstrate fundamental transformer capabilities.

This is an educational model: it is designed to help you understand transformer architectures and training processes, not for production use.

Architecture

  • Parameters: 43.9M
  • Architecture: Decoder-only transformer
  • Embedding Dimension: 512
  • Attention Heads: 4
  • Layers: 4
  • Context Length: 128 tokens
  • Vocabulary: BERT tokenizer (30,522 tokens)
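
For reference, here is a minimal PyTorch sketch consistent with the hyperparameters above. The exact block design is not documented in this card, so the learned positional embeddings, pre-norm blocks, 4x feed-forward width, and untied output head are assumptions (they happen to add up to roughly 43.9M parameters).

import torch
import torch.nn as nn

class MiniTransformer(nn.Module):
    """Hypothetical sketch of the 4-layer decoder-only architecture described above."""

    def __init__(self, vocab_size=30522, d_model=512, n_heads=4, n_layers=4, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # assumption: learned positional embeddings
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)  # assumption: pre-norm blocks, 4x FFN width
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # assumption: untied output head

    def forward(self, input_ids):
        seq_len = input_ids.size(1)
        pos = torch.arange(seq_len, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=input_ids.device),
            diagonal=1)  # each position attends only to earlier positions
        x = self.blocks(x, mask=causal_mask)
        return self.lm_head(x)  # logits of shape (batch, seq_len, vocab_size)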

Training Details

Training Data

  • Generic question-answer pairs with diverse system prompts
  • Trained using a sliding-window approach with a stride of 32 (see the sketch after this list)
  • Train/test split: 90/10
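
The preprocessing code is not included in this card; the following sketch shows one plausible implementation, assuming the window length equals the 128-token context length (only the stride of 32 is stated above).

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def make_windows(text, window=128, stride=32):
    """Split one tokenized example into overlapping training windows."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    return [ids[start:start + window]
            for start in range(0, max(len(ids) - window, 0) + 1, stride)]

# Example: a formatted Q/A string becomes overlapping 128-token chunks
chunks = make_windows("System: ... Question: ... Answer: ...")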

Training Procedure

  • Optimizer: AdamW (fused, learning rate: 3e-4)
  • Batch Size: 128
  • Epochs: 50
  • Mixed Precision: FP16 (AMP enabled)
  • Hardware: NVIDIA A10 GPU
  • Final Train Loss: 0.0024
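
The training loop itself is not shown in the card. The sketch below wires the stated settings together (fused AdamW at 3e-4, batch size 128, 50 epochs, FP16 AMP, and torch.compile() from the Framework section); the MiniTransformer class from the architecture sketch, the dummy dataset, and the next-token cross-entropy objective are assumptions.

import torch
import torch.nn.functional as F

device = "cuda"  # trained on an NVIDIA A10
model = torch.compile(MiniTransformer().to(device))  # MiniTransformer as sketched above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, fused=True)
scaler = torch.cuda.amp.GradScaler()  # FP16 mixed precision (AMP)

# Dummy data standing in for the real sliding-window dataset
train_loader = torch.utils.data.DataLoader(
    torch.randint(0, 30522, (1024, 128)), batch_size=128, shuffle=True)

for epoch in range(50):
    for input_ids in train_loader:
        input_ids = input_ids.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            logits = model(input_ids[:, :-1])
            loss = F.cross_entropy(  # assumption: next-token prediction objective
                logits.reshape(-1, logits.size(-1)),
                input_ids[:, 1:].reshape(-1))
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()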

Framework

  • PyTorch 2.0+ with torch.compile() optimization
  • Transformers library tokenizer

Usage

import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Load model (you'll need to download the checkpoint)
# model = MiniTransformer(...)
# model.load_state_dict(torch.load("checkpoint.pt"))

# Generate text
input_text = "Your prompt here"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generation code: see the greedy-decoding sketch below
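
The generation step is left as a placeholder above. Here is a minimal greedy-decoding sketch, assuming the model returns logits of shape (batch, seq_len, vocab_size) as in the architecture sketch earlier in this card.

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=32):
    """Greedy decoding, truncated to the 128-token context window."""
    model.eval()
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -128:])                 # keep only the last 128 tokens
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)  # append the most likely token
    return input_ids

# output_ids = generate(model, input_ids)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))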