🧠 LLaMA-50M Turkish news

Model Summary

| Property | Value |
|---|---|
| Architecture | LLaMA (decoder-only transformer) |
| Parameters | ~50M |
| Vocab size | 32,768 |
| Embedding dim | 256 |
| Hidden dim | 2048 |
| Layers | 20 |
| Attention heads | 128 |
| KV groups | 64 |
| Context length | 256 |
| Tokenizer | turkish_tokenizer (alibayram) |
| Dataset | habanoz/news-tr-1.8M |
| Tokens seen | 372,679,971 |
| Epochs | 2 |
| Batch size | 64 |
| Language | Turkish 🇹🇷 |
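
The hyperparameters above map fairly directly onto a Hugging Face `LlamaConfig`. The sketch below is a hypothetical reconstruction, not the published training config; in particular, mapping "Embedding dim" to `hidden_size` and "Hidden dim" to the feed-forward `intermediate_size` is an assumption.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical mapping of the card's hyperparameters onto LlamaConfig.
# The hidden_size / intermediate_size assignment is an assumption.
config = LlamaConfig(
    vocab_size=32_768,            # Vocab size
    hidden_size=256,              # Embedding dim (assumed)
    intermediate_size=2048,       # Hidden dim (assumed FFN size)
    num_hidden_layers=20,         # Layers
    num_attention_heads=128,      # Attention heads
    num_key_value_heads=64,       # KV groups (grouped-query attention)
    max_position_embeddings=256,  # Context length
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```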

Model Description

llama_50m_tr_tokenizer_news is a 50-million-parameter Turkish language model trained from scratch on the habanoz/news-tr-1.8M dataset.
It was developed as a lightweight experimental model to explore Turkish-specific tokenization and morphology-aware pretraining using the custom turkish_tokenizer.

The model follows a LLaMA-style causal transformer architecture and was trained with a context length of 256 tokens, seeing ~372M tokens in total over two epochs.
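
If the checkpoint is published in the standard transformers format, inference might look like the sketch below. Whether `AutoTokenizer` resolves the custom turkish_tokenizer without `trust_remote_code=True` is an assumption; the prompt is only an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AhmetSemih/llama_50m_pretrained_news_tr_tokenizer"

# Load the pretrained model and its Turkish tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Turkish news-style continuation.
prompt = "Ankara'da bugün"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```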


Training Environment

Hardware: NVIDIA A100 GPU (~20 hours of training)
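
For a rough sense of scale, the reported batch size, context length, and token count imply about 16K tokens per optimizer step and roughly 23K steps across the two epochs. The arithmetic below assumes no gradient accumulation, which is not stated on the card.

```python
# Back-of-the-envelope check of the training scale reported above.
tokens_seen = 372_679_971
epochs = 2
batch_size = 64
context_length = 256

tokens_per_step = batch_size * context_length  # 16,384 tokens per step (if no grad accumulation)
total_steps = tokens_seen / tokens_per_step    # ~22,746 steps over both epochs
tokens_per_epoch = tokens_seen / epochs        # ~186M tokens per pass over the corpus

print(f"~{total_steps:,.0f} steps, ~{tokens_per_epoch / 1e6:.0f}M tokens per epoch")
```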


Dataset used to train AhmetSemih/llama_50m_pretrained_news_tr_tokenizer: habanoz/news-tr-1.8M