A newer version of this model is available:
phmd/TinyStories-SRL-5M
TinyStories Word-Level LSTM (ONNX)
A compact 10.9 MB word-level LSTM language model trained on the TinyStories dataset.
Generates short, coherent children's stories with minimal compute.
- Vocab size: 5,004 (word-level, includes <PAD>, <UNK>, <SOS>, <EOS>)
- Architecture: 2-layer LSTM, 256 hidden units, 128-dim embeddings
- Max sequence length: 50 tokens
- Format: ONNX (compatible with ONNX Runtime on CPU/GPU)
- Model size: ~11 MB
Trained on 500k TinyStories samples in about 5 minutes on two NVIDIA T4 GPUs.
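As a rough sanity check on the reported file size, the parameter count can be estimated from the figures listed above. This is only a sketch: it assumes PyTorch-style LSTM layers with two bias vectors per layer, an output projection that is not tied to the embedding, and float32 weights. None of these details are confirmed by this card, and the exported ONNX graph may store the weights differently.

```python
# Rough parameter count for the architecture listed above.
# Assumptions (not confirmed by the model card): PyTorch-style LSTM layers
# with two bias vectors each, an untied output projection, float32 weights.
VOCAB, EMBED, HIDDEN, LAYERS = 5004, 128, 256, 2


def lstm_layer_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and two biases
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


embedding = VOCAB * EMBED
lstm = sum(
    lstm_layer_params(EMBED if layer == 0 else HIDDEN, HIDDEN)
    for layer in range(LAYERS)
)
output_proj = HIDDEN * VOCAB + VOCAB  # weights + bias

total_params = embedding + lstm + output_proj
size_mib = total_params * 4 / 2**20  # 4 bytes per float32 weight

print(f"{total_params:,} parameters, about {size_mib:.1f} MiB")
```

Under these assumptions the estimate comes to roughly 2.8 million parameters, which is in line with the file size reported above.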
Usage
1. Install dependencies
pip install onnxruntime numpy
2. Download files from this repo
3. Run inference
import re

import numpy as np
import onnxruntime as ort

MAX_LEN = 50  # the model expects a fixed input length of 50 tokens

# --- Load vocabulary (line order defines the token indices) ---
with open("vocab.txt", "r", encoding="utf-8") as f:
    vocab = [line.strip() for line in f]
word2idx = {word: idx for idx, word in enumerate(vocab)}
idx2word = {idx: word for word, idx in word2idx.items()}

# Special tokens
SOS_IDX = word2idx["<SOS>"]
EOS_IDX = word2idx["<EOS>"]
PAD_IDX = word2idx["<PAD>"]
UNK_IDX = word2idx["<UNK>"]

# --- Tokenizer (simple word-level) ---
def tokenize(text):
    # Lowercase, then put spaces around punctuation so each mark is its own token
    text = re.sub(r"([.,!?])", r" \1 ", text.lower())
    return text.split()

# --- Load ONNX model ---
ort_session = ort.InferenceSession("tinystories_lstm.onnx")

# --- Text generation function ---
def generate_text(prompt, max_new_tokens=30, temperature=0.8):
    # Encode the prompt: <SOS> followed by word ids (<UNK> for unknown words)
    tokens = tokenize(prompt)
    current_seq = [SOS_IDX] + [word2idx.get(t, UNK_IDX) for t in tokens]

    for _ in range(max_new_tokens):
        # Keep only the most recent MAX_LEN tokens, then right-pad to MAX_LEN
        window = current_seq[-MAX_LEN:]
        padded = window + [PAD_IDX] * (MAX_LEN - len(window))
        input_tensor = np.array([padded], dtype=np.int64)

        # Run model
        outputs = ort_session.run(None, {"input": input_tensor})
        logits = outputs[0]  # Shape: (1, 50, 5004)

        # Logits at the last non-pad position predict the next token
        last_pos = len(window) - 1
        next_token_logits = logits[0, last_pos, :].astype(np.float64) / temperature

        # Numerically stable softmax, then sample
        probs = np.exp(next_token_logits - np.max(next_token_logits))
        probs = probs / np.sum(probs)
        next_token = int(np.random.choice(len(probs), p=probs))

        if next_token == EOS_IDX:
            break
        current_seq.append(next_token)

    # Decode, skipping the leading <SOS> and any <PAD> tokens
    words = [idx2word[idx] for idx in current_seq[1:] if idx != PAD_IDX]
    # Re-attach punctuation that the tokenizer split off
    return re.sub(r"\s+([.,!?])", r"\1", " ".join(words))

# --- Example usage ---
if __name__ == "__main__":
    prompt = "once upon a time"
    story = generate_text(prompt, max_new_tokens=40, temperature=0.7)
    print(f"Prompt: {prompt}")
    print(f"Story: {story}")
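The `temperature` argument of `generate_text` divides the logits before the softmax, so lower values concentrate probability on the most likely words and higher values flatten the distribution. The sketch below illustrates that effect using only the standard library. The helper name `softmax_with_temperature` and the toy logits are illustrative assumptions, not part of this repo.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate words (illustrative values only)
toy_logits = [2.0, 1.0, 0.0]

for t in (0.5, 0.8, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

In practice, values below 1.0 tend to give more repetitive but more coherent stories, while values above 1.0 give more varied but less predictable text.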
Example Output
Prompt: once upon a time
Story: once upon a time there was a little girl named lily. she loved to play in the garden. one day she found a magic flower that could talk!
Training Details
- Dataset: roneneldan/TinyStories (500k training samples)
- Optimizer: Adam (lr=0.002)
- Batch size: 128
- Epochs: 2
- Hardware: NVIDIA T4 (x2)
- Training time: ~5 minutes
Limitations
- Word-level modeling: any word outside the 5,004-word vocabulary is mapped to <UNK>, so the model cannot read or reproduce it
- Fixed context window (50 tokens)
- No beam search; decoding uses temperature sampling only
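To see the out-of-vocabulary limitation in practice, it can help to check a prompt against the vocabulary before generating. The following is a minimal sketch: `find_oov` and the toy vocabulary are hypothetical names and values introduced for illustration, and the tokenizer simply mirrors the rule used in the inference script (lowercase, then split off `.`, `,`, `!` and `?`).

```python
import re


def tokenize(text):
    # Same rule as the inference script: lowercase, split off . , ! ?
    return re.sub(r"([.,!?])", r" \1 ", text.lower()).split()


def find_oov(prompt, word2idx):
    """Return the prompt tokens that are missing from the vocabulary."""
    return [tok for tok in tokenize(prompt) if tok not in word2idx]


# Hypothetical toy vocabulary; the real one is loaded from vocab.txt
toy_vocab = ["<PAD>", "<UNK>", "<SOS>", "<EOS>", "once", "upon", "a", "time", "."]
toy_word2idx = {w: i for i, w in enumerate(toy_vocab)}

missing = find_oov("Once upon a time, a dragon.", toy_word2idx)
print(missing)
```

With the real `word2idx`, every token this check reports would be replaced by `<UNK>` at inference time, so rephrasing the prompt with simpler words is likely to give better results.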