A newer version of this model is available: phmd/TinyStories-SRL-5M

TinyStories Word-Level LSTM (ONNX)

A compact 10.9 MB word-level LSTM language model trained on the TinyStories dataset.
Generates short, coherent children's stories with minimal compute.

Vocab size: 5,004 (word-level, includes <PAD>, <UNK>, <SOS>, <EOS>)
Architecture: 2-layer LSTM, 256 hidden units, 128-dim embeddings
Max sequence length: 50 tokens
Format: ONNX (compatible with ONNX Runtime on CPU/GPU)
Model size: ~11 MB
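As a rough consistency check, the parameter count implied by the listed dimensions can be worked out by hand. The sketch below assumes a standard embedding -> 2-layer LSTM -> linear output layout, PyTorch-style LSTM gates with two bias vectors each, an untied output projection, and float32 weights. None of these details are stated above, so treat the result as an estimate rather than the model's exact size.

```python
# Hypothetical layer layout (assumption): Embedding -> 2-layer LSTM -> Linear over the vocabulary.
VOCAB = 5004   # word-level vocabulary, including special tokens
EMBED = 128    # embedding dimension
HIDDEN = 256   # LSTM hidden units
LAYERS = 2


def lstm_layer_params(input_size: int, hidden: int) -> int:
    """Parameters of one LSTM layer: 4 gates, input and recurrent weights, two bias vectors (PyTorch convention)."""
    weights = 4 * hidden * input_size + 4 * hidden * hidden
    biases = 2 * 4 * hidden
    return weights + biases


def count_params() -> dict:
    """Per-layer parameter counts under the assumed layout."""
    counts = {"embedding": VOCAB * EMBED}
    for layer in range(LAYERS):
        input_size = EMBED if layer == 0 else HIDDEN
        counts[f"lstm_{layer}"] = lstm_layer_params(input_size, HIDDEN)
    counts["output"] = HIDDEN * VOCAB + VOCAB  # untied projection weights + bias
    return counts


if __name__ == "__main__":
    counts = count_params()
    total = sum(counts.values())
    size_mib = total * 4 / 2**20  # float32 = 4 bytes per parameter
    for name, n in counts.items():
        print(f"{name:>10}: {n:,}")
    print(f"{'total':>10}: {total:,} params, about {size_mib:.1f} MiB in float32")
```

Under these assumptions the total is 2,848,140 parameters, or about 10.9 MiB of float32 weights, in line with the model size listed above. The ONNX file also stores graph structure and metadata, so the real file size will differ slightly.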

Trained on 500k TinyStories samples in about 5 minutes on two NVIDIA T4 GPUs.

Usage

1. Install dependencies

pip install onnxruntime numpy

2. Download tinystories_lstm.onnx and vocab.txt from this repo
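One way to fetch both files programmatically is hf_hub_download from the huggingface_hub client, which downloads a single file from a Hub repo (or reuses a cached copy) and returns its local path. The repo id below is taken from this page (phmd/TinyStories-LSTM-5.5M) and the file names from the inference script; the helper fetch_model_files is hypothetical.

```python
# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

REPO_ID = "phmd/TinyStories-LSTM-5.5M"              # repo id as shown on this page
FILENAMES = ("tinystories_lstm.onnx", "vocab.txt")  # files the inference script opens


def fetch_model_files(repo_id: str = REPO_ID) -> dict:
    """Hypothetical helper: download each file (or reuse the cache) and map file name -> local path."""
    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in FILENAMES}


if __name__ == "__main__":
    paths = fetch_model_files()
    for name, path in paths.items():
        print(f"{name} -> {path}")
```

hf_hub_download stores files in the Hugging Face cache rather than the working directory, so either pass the returned paths to open(...) and ort.InferenceSession(...), or add local_dir="." to the call to place copies next to the script.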

3. Run inference

import numpy as np
import onnxruntime as ort

# --- Load vocabulary ---
# vocab.txt holds one token per line; the line index is the token id
with open("vocab.txt", "r", encoding="utf-8") as f:
    vocab = [line.strip() for line in f]
word2idx = {word: idx for idx, word in enumerate(vocab)}
idx2word = {idx: word for word, idx in word2idx.items()}

# Special tokens
SOS_IDX = word2idx["<SOS>"]
EOS_IDX = word2idx["<EOS>"]
PAD_IDX = word2idx["<PAD>"]
UNK_IDX = word2idx["<UNK>"]

# --- Tokenizer (simple word-level) ---
def tokenize(text):
    import re
    # Lowercase and split punctuation
    text = re.sub(r'([.,!?])', r' \1 ', text.lower())
    return text.split()

# --- Load ONNX model ---
ort_session = ort.InferenceSession("tinystories_lstm.onnx")

# --- Text generation function ---
def generate_text(prompt, max_new_tokens=30, temperature=0.8):
    # Tokenize prompt
    tokens = tokenize(prompt)
    input_ids = [SOS_IDX] + [
        word2idx.get(t, UNK_IDX) for t in tokens
    ]
    
    # The sequence grows by one token per step (autoregressive generation)
    current_seq = input_ids.copy()
    
    for _ in range(max_new_tokens):
        # Right-pad to 50 tokens (the model expects a fixed length);
        # if the sequence is already longer, keep only its last 50 tokens
        window = current_seq[-50:]
        padded = window + [PAD_IDX] * (50 - len(window))
        
        input_tensor = np.array([padded], dtype=np.int64)
        
        # Run model (input name is read from the graph rather than assumed to be "input")
        outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: input_tensor})
        logits = outputs[0]  # Shape: (1, 50, 5004)
        
        # Get logits for last non-pad token
        last_pos = min(len(current_seq) - 1, 49)
        next_token_logits = logits[0, last_pos, :] / temperature
        
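        # Numerically stable softmax (computed in float64), then sample one token id
        next_token_logits = next_token_logits.astype(np.float64)
        probs = np.exp(next_token_logits - np.max(next_token_logits))
        probs = probs / np.sum(probs)
        next_token = int(np.random.choice(len(probs), p=probs))  # plain Python int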
        
        if next_token == EOS_IDX:
            break
        current_seq.append(next_token)
    
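    # Decode (skip <SOS>, drop padding) and reattach punctuation to the previous word
    words = [idx2word[idx] for idx in current_seq[1:] if idx != PAD_IDX]
    text = " ".join(words)
    for mark in ".,!?":
        text = text.replace(" " + mark, mark)
    return text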

# --- Example usage ---
if __name__ == "__main__":
    prompt = "once upon a time"
    story = generate_text(prompt, max_new_tokens=40, temperature=0.7)
    print(f"Prompt: {prompt}")
    print(f"Story:  {story}")
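The two per-step operations inside generate_text, building the fixed 50-token input window and sampling from the temperature-scaled logits, can be pulled out as pure functions and checked without loading the model. This is a sketch under the same assumptions as the script above (right padding, 50-token input); the function names are hypothetical, and treating temperature <= 0 as greedy argmax decoding is an addition that the original script does not have.

```python
import numpy as np

MAX_LEN = 50  # fixed input length the script pads/truncates to


def make_window(seq, pad_idx, max_len=MAX_LEN):
    """Keep the last max_len ids, right-pad with pad_idx, and return (window, last_pos)."""
    window = list(seq[-max_len:])
    last_pos = len(window) - 1                      # position of the last real token
    window += [pad_idx] * (max_len - len(window))
    return window, last_pos


def sample_token(logits, temperature=0.8, rng=None):
    """Sample an id from softmax(logits / temperature); temperature <= 0 means greedy argmax."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature <= 0:
        return int(np.argmax(logits))
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())           # subtract max for numerical stability
    probs /= probs.sum()
    rng = rng if rng is not None else np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    window, last_pos = make_window([2, 10, 11, 12], pad_idx=0)
    print(len(window), last_pos, window[:6])        # 50 3 [2, 10, 11, 12, 0, 0]

    long_window, long_last = make_window(list(range(60)), pad_idx=0)
    print(long_window[0], long_last)                # 10 49

    logits = [2.0, 1.0, 0.1]
    print(sample_token(logits, temperature=0))      # 0 (greedy)
    rng = np.random.default_rng(0)
    draws = [sample_token(logits, 0.7, rng) for _ in range(1000)]
    print(np.bincount(draws, minlength=3))          # token 0 is drawn most often
```

Passing a seeded np.random.default_rng(seed) as rng makes generation reproducible, which the script's module-level np.random.choice only achieves if np.random.seed is set first.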

Example Output (sampled, so results will differ between runs)

Prompt: once upon a time
Story:  once upon a time there was a little girl named lily. she loved to play in the garden. one day she found a magic flower that could talk!

Training Details

Dataset: roneneldan/TinyStories (500k training samples)
Optimizer: Adam (lr=0.002)
Batch size: 128
Epochs: 2
Hardware: NVIDIA T4 (x2)
Training time: ~5 minutes
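For scale, the listed settings imply roughly how many optimizer steps the run took. This sketch assumes that 128 is the global batch size across both GPUs and that the last partial batch of each epoch is kept; neither detail is stated above.

```python
import math

SAMPLES = 500_000  # training samples
BATCH = 128        # assumed global batch size (not per GPU)
EPOCHS = 2
SECONDS = 5 * 60   # "~5 minutes"

steps_per_epoch = math.ceil(SAMPLES / BATCH)  # last partial batch kept
total_steps = steps_per_epoch * EPOCHS
steps_per_second = total_steps / SECONDS

print(steps_per_epoch, total_steps, round(steps_per_second, 1))  # 3907 7814 26.0
```

If 128 were instead the per-GPU batch size, the step count would roughly halve.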

Repository: phmd/TinyStories-LSTM-5.5M