# olmostories-8m
A small language model using the OLMo 2 architecture, trained on the TinyStories dataset. Training took around 2.5 hours on a single A100 (80 GB) GPU.
```python
from transformers import Olmo2Config

config = Olmo2Config(
    vocab_size=5000,
    hidden_size=288,
    intermediate_size=720,
    num_hidden_layers=6,
    num_attention_heads=6,
    num_key_value_heads=6,
    max_position_embeddings=256,
    initializer_range=0.02,
    attention_dropout=0.1,
)
```
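A back-of-the-envelope count from this config shows where the "8m" in the name comes from. This is a sketch, not an exact figure: it assumes untied input/output embeddings and a SwiGLU-style MLP (gate, up, and down projections), and it ignores norm and bias parameters.

```python
# Rough parameter count for the config above.
# Assumptions: untied embeddings, SwiGLU-style MLP, norms/biases ignored.
vocab_size, hidden, intermediate, layers = 5000, 288, 720, 6

embeddings = vocab_size * hidden            # token embedding matrix
lm_head = vocab_size * hidden               # output projection (if untied)
attn_per_layer = 4 * hidden * hidden        # q, k, v, o projections
mlp_per_layer = 3 * hidden * intermediate   # gate, up, down projections

total_params = embeddings + lm_head + layers * (attn_per_layer + mlp_per_layer)
print(f"~{total_params / 1e6:.1f}M parameters")
```

This lands at roughly 8.6M parameters, consistent with the model name.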