
GPT-2 Medium Somali — Text Generation Only

Short description: A Somali text generation model based on GPT-2 Medium (≈345M parameters). Optimized for general Somali generation: headlines, news-style sentences, short stories, and assistant-like completions. This repository is intended only for generation use cases.

Suggested model ID: FatihJimale/gpt2-medium-somali

🔑 Key facts

  • Architecture: GPT-2 Medium (24 layers × 1024 hidden size × 16 heads, ~345M params)
  • Objective: causal language modeling (next-token prediction)
  • Context length: 1024 tokens
  • Tokenizer: GPT-2 BPE (fast)
  • Framework: 🤗 Transformers
  • Precision: FP16/BF16 compatible at inference

✅ Intended use

  • Somali text generation (stories, headlines, news-style sentences, prompts)
  • Assistant-style completions in Somali

⚠️ Limitations

  • May generate inaccurate, offensive, or biased content.
  • Not suitable for factual QA without verification.
  • Avoid safety-critical usage.

🚀 Quick start (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "FatihJimale/gpt2-medium-somali"
tok = AutoTokenizer.from_pretrained(model_id)
# Load in half precision; device_map="auto" places the model on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "qarax xoogan ayaa ka dhacay magaalada"  # "a powerful explosion occurred in the city"
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=80,        # length of the continuation
    do_sample=True,           # nucleus sampling rather than greedy decoding
    temperature=0.9,
    top_p=0.92,
    repetition_penalty=1.08,  # mild penalty to discourage loops
)
print(tok.decode(outputs[0], skip_special_tokens=True))
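
If you prefer a single high-level call, the same generation can go through the 🤗 pipeline API. A minimal sketch; the sampling values simply mirror the snippet above:

from transformers import pipeline

# The pipeline wraps the same model and tokenizer and forwards
# generation kwargs to model.generate().
generator = pipeline("text-generation", model="FatihJimale/gpt2-medium-somali")
out = generator(
    "qarax xoogan ayaa ka dhacay magaalada",
    max_new_tokens=80,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
)
print(out[0]["generated_text"])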

Inference tips

  • If repetition appears, increase repetition_penalty to 1.1–1.2 or lower temperature (0.7–0.9).
  • For more focused generations, reduce max_new_tokens and set top_p around 0.9.
  • Deterministic output: do_sample=False, tune top_k=None, temperature=1.0.

🔧 Model details

  • Training steps: ≈14,850 (completed at ~epoch 2.00)
  • Epochs: 2
  • Effective batch size: 64
  • Learning rate & schedule: final logged LR ≈ 8.998e-10 (end of decay; peak LR not disclosed)
  • Optimizer: AdamW (β1=0.9, β2=0.999)
  • Weight decay: 0.01
  • Mixed precision: bf16
  • Hardware: AWS ml.g5.24xlarge — 4× NVIDIA A10G (24 GB each), 96 vCPUs, 384 GiB RAM; data-parallel across 4 GPUs
  • Context length: 1024 tokens
  • Tokenizer: GPT-2 BPE (fast); no custom Somali tokenizer in this version
  • Train date: 2025-09-25
  • Runtime: evaluation ≈ 1652.22 s (~27.5 min); overall training wall-clock ≈ 1.337 days (≈ 32 h 05 m 17 s)

Note: Dataset specifics and cleaning steps are intentionally not disclosed here, per the author's request. This card focuses on model size, parameters, and usage.

📊 Evaluation

  • Train loss (last logged): 1.8449 @ step 14,850 (~epoch 2.00)
  • Eval/validation loss: 1.78604
  • Perplexity (valid/test): 5.9658 (final recorded value @ 2025-09-25 09:06:42)
  • Eval runtime: 1652.22 s, 72.272 samples/s, 9.035 steps/s
  • Human eval notes: TBD (fluency, coherence)

πŸ“ Repo layout

config.json
pytorch_model.bin  (or model.safetensors)
merges.txt
vocab.json
tokenizer.json
tokenizer_config.json
special_tokens_map.json (if any)
README.md (this file)
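
To confirm this layout against the hosted repository, the huggingface_hub client can list the files (a small sketch, assuming network access to the Hub):

from huggingface_hub import list_repo_files

# Print every file in the model repository.
for f in list_repo_files("FatihJimale/gpt2-medium-somali"):
    print(f)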

📣 Citation

@software{gpt2_medium_somali_2025,
  title        = {GPT-2 Medium Somali},
  author       = {Mohamed Abdirizak Ahmed},
  year         = {2025},
  url          = {https://huggingface.co/FatihJimale/gpt2-medium-somali}
}

πŸ” Safety

This model can produce hallucinations and harmful content. Use with content filters and human review. Do not use for medical, legal, or financial advice.
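
As one illustration of such a guardrail, below is a minimal, hypothetical post-generation filter; the blocklist contents and the review routing are placeholders, not a vetted safety system:

def filter_generation(text: str) -> str | None:
    """Drop generations containing blocked terms; return the text otherwise."""
    blocklist = {"example_blocked_term"}  # hypothetical placeholder terms
    lowered = text.lower()
    if any(term in lowered for term in blocklist):
        return None  # route to human review instead of showing the output
    return text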
