GPT-2 Medium Somali – Text Generation Only
Short description: A Somali text generation model based on GPT-2 Medium (~345M parameters). Optimized for general Somali generation: headlines, news-style sentences, short stories, and assistant-like completions. This repository is intended only for generation use cases.
Suggested model ID:
FatihJimale/gpt2-medium-somali
Key facts
- Architecture: GPT-2 Medium (24 layers × 1024 hidden size × 16 heads, ~345M params)
- Objective: causal language modeling (next-token prediction)
- Context length: 1024 tokens
- Tokenizer: GPT-2 BPE (fast)
- Framework: 🤗 Transformers
- Precision: FP16/BF16 compatible at inference
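To confirm these values against the checkpoint itself, you can inspect the released config; a quick sketch using the standard Transformers API:

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("FatihJimale/gpt2-medium-somali")
# For GPT-2 Medium these should print 24 layers, 16 heads, 1024 hidden size, 1024 positions
print(cfg.n_layer, cfg.n_head, cfg.n_embd, cfg.n_positions)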
Intended use
- Somali text generation (stories, headlines, news-style sentences, prompts)
- Assistant-style completions in Somali
Limitations
- May generate inaccurate, offensive, or biased content.
- Not suitable for factual QA without verification.
- Avoid safety-critical usage.
Quick start (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FatihJimale/gpt2-medium-somali"
tok = AutoTokenizer.from_pretrained(model_id)
# FP16 inference; device_map="auto" places the model on GPU if available (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "qarax xoogan ayaa ka dhacay magaalada"  # "a powerful explosion occurred in the city"
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Sampled generation: temperature/top_p control diversity, repetition_penalty curbs loops
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
    repetition_penalty=1.08,
)
print(tok.decode(outputs[0], skip_special_tokens=True))
Inference tips
- If repetition appears, increase repetition_penalty to 1.1–1.2 or lower temperature (0.7–0.9).
- For more focused generations, reduce max_new_tokens and set top_p around 0.9.
- Deterministic output: do_sample=False, top_k=None, temperature=1.0. (See the sketch below.)
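A minimal sketch of the deterministic setting, reusing tok, model, and inputs from the quick start above:

# Greedy decoding: no sampling, so repeated runs give identical output
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=False,
    repetition_penalty=1.1,  # within the 1.1-1.2 range suggested above
)
print(tok.decode(outputs[0], skip_special_tokens=True))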
Model details
- Training steps: ~14,850 (completed at ~epoch 2.00)
- Epochs: 2
- Effective batch size: 64
- Learning rate & schedule: final logged LR ≈ 8.998e-10
- Optimizer: AdamW (β1=0.9, β2=0.999)
- Weight decay: 0.01
- Mixed precision: bf16
- Hardware: AWS ml.g5.24xlarge – 4× NVIDIA A10G (24 GB each), 96 vCPU, 384 GiB RAM; data-parallel across 4 GPUs
- Context length: 1024 tokens
- Tokenizer: GPT-2 BPE (fast); no custom Somali tokenizer in this version
- Train date: 2025-09-25
- Runtime: evaluation runtime ≈ 1652.22 s (~27.5 min); overall training wall-clock ≈ 1.337 days (≈ 32 h 05 m 17 s)
Note: Dataset specifics and cleaning steps are intentionally not disclosed here, per the author's request. This card focuses on model size, parameters, and usage.
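Within those disclosure limits, the effective batch size of 64 could be reached in several ways across the 4 data-parallel GPUs. The per-device settings are not published, so the values below are illustrative only (4 GPUs × per-device batch 8 × gradient accumulation 2 = 64):

from transformers import TrainingArguments

# Illustrative configuration only; hyperparameters beyond those listed above
# (e.g., peak learning rate, warmup) were not disclosed.
args = TrainingArguments(
    output_dir="gpt2-medium-somali-out",
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation steps = 64 effective
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    weight_decay=0.01,
    bf16=True,
)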
Evaluation
- Train loss (last logged): 1.8449 @ step 14,850 (~epoch 2.00)
- Eval/validation loss: 1.78604
- Perplexity (valid/test): 5.9658 (final recorded value @ 2025-09-25 09:06:42)
- Eval runtime: 1652.22 s, 72.272 samples/s, 9.035 steps/s
- Human eval notes: TBD (fluency, coherence)
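For reference, the reported perplexity matches the exponential of the eval loss:

import math
print(math.exp(1.78604))  # ≈ 5.9658, the reported perplexity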
Repo layout
config.json
pytorch_model.bin (or model.safetensors)
merges.txt
vocab.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json (if any)
README.md (this file)
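To check which of these files (and which weight format) the repo actually ships, you can list its contents; a small sketch using huggingface_hub:

from huggingface_hub import list_repo_files

# Lists the files hosted in the model repo, e.g. config.json, tokenizer files, weights
print(list_repo_files("FatihJimale/gpt2-medium-somali"))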
Citation
@software{gpt2_medium_somali_2025,
  title  = {GPT-2 Medium Somali},
  author = {Mohamed Abdirizak Ahmed},
  year   = {2025},
  url    = {https://huggingface.co/FatihJimale/gpt2-medium-somali}
}
Safety
This model can produce hallucinations and harmful content. Use with content filters and human review. Do not use for medical, legal, or financial advice.
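As one illustration of the kind of lightweight gate that can sit in front of human review (the blocklist terms here are placeholders, not a vetted list; production use should rely on a proper moderation service):

# Hypothetical post-generation gate; BLOCKLIST terms are placeholders only
BLOCKLIST = {"placeholder_term_1", "placeholder_term_2"}

def safe_display(text: str) -> str:
    # Withhold output containing blocked terms and route it to human review
    if any(term in text.lower() for term in BLOCKLIST):
        return "[withheld: flagged for human review]"
    return text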