# nurcunal/BEDAI-2.4B
Fine-tuned Turkish instruct model for the law domain, based on nurcunal/BEDAI-2B, with QLoRA adapters merged into the base weights.
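For reference, a minimal sketch of how QLoRA adapters are typically merged back into a base model with `peft`. This assumes the adapters were trained with `peft`; the adapter path is hypothetical, and the actual training/merge configuration of this model is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model the adapters were trained on.
base = AutoModelForCausalLM.from_pretrained("nurcunal/BEDAI-2B", torch_dtype=torch.bfloat16)
# Attach the trained QLoRA adapters (hypothetical local path), then
# fold the LoRA deltas into the base weights and save a standalone model.
peft_model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")
merged = peft_model.merge_and_unload()
merged.save_pretrained("BEDAI-2.4B-merged")
```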
## Usage (Transformers)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nurcunal/BEDAI-2.4B"
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)
mdl = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)

# Fall back to the EOS token for padding if no dedicated pad token exists.
if tok.pad_token_id is None and tok.eos_token_id is not None:
    tok.pad_token_id = tok.eos_token_id

# Prompt (Turkish): system = "Give short, clear answers about Turkish law.",
# user = "What is a stay of execution in administrative jurisdiction?"
prompt = "<s>[SİSTEM]: Türk hukuku hakkında kısa ve net yanıt ver.\n[KULLANICI]: İdari yargıda yürütmenin durdurulması nedir?\n[ASİSTAN]:"
inputs = tok(prompt, return_tensors="pt").to(mdl.device)
# temperature/top_p only take effect when sampling is enabled via do_sample=True.
out = mdl.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```
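The example implies a simple tagged prompt format. A minimal hypothetical helper for assembling prompts in that format (the function name and the assumption that the format generalizes to arbitrary system/user pairs are ours, not from the card):

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a prompt in the [SİSTEM]/[KULLANICI]/[ASİSTAN] format shown above."""
    return f"<s>[SİSTEM]: {system}\n[KULLANICI]: {user}\n[ASİSTAN]:"
```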
```yaml
model-index:
- name: BEDAI-2.4B
  results:
  - task:
      type: multiple-choice
      name: Exams (TR)
    dataset:
      name: exams_tr
      type: exams_tr
      args:
        split: validation
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 32.31
  - task:
      type: question-answering-extractive
      name: TQuAD (TR)
    dataset:
      name: tquad
      type: tquad
      args:
        split: validation
    metrics:
    - name: f1
      type: f1
      value: 23.5035
  - task:
      type: question-answering-extractive
      name: XQuAD (TR)
    dataset:
      name: xquad_tr
      type: xquad_tr
      args:
        split: validation
    metrics:
    - name: f1
      type: f1
      value: 16.4439
  - task:
      type: text-classification
      name: Turkish PLU (overall)
    dataset:
      name: turkish_plu
      type: turkish_plu
      args:
        split: test
    metrics:
    - name: accuracy_norm
      type: accuracy
      value: 51.26
```
## Evaluation (CETVEL – Turkish subsets)

Scores for this run against the base model. Columns: MCQA = Exams (TR) accuracy_norm, QA = mean F1 over TQuAD (TR) and XQuAD (TR), TC = Turkish PLU accuracy_norm.
| Model | MCQA | QA | TC |
|---|---|---|---|
| BEDAI-2B | 25.70 | 17.97 | 51.58 |
| BEDAI-2.4B (this work) | 32.31 | 19.97 | 51.26 |
Setup: lm-evaluation-harness (CETVEL tasks), 1× H100 80GB, bf16, SDPA attention, batch size 128, full dataset (no `--limit`).
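For reproducibility, a hedged sketch of such a run via the lm-evaluation-harness Python API. The CETVEL task identifiers are placeholders (use the names registered by the CETVEL task suite), and the exact invocation used here is an assumption:

```python
import lm_eval

# Settings mirror the setup line above: bf16 weights, batch size 128,
# full dataset (no limit). Task IDs below are placeholders.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nurcunal/BEDAI-2.4B,dtype=bfloat16",
    tasks=["<cetvel_task_ids>"],  # placeholder: substitute registered CETVEL tasks
    batch_size=128,
)
```

For comparison, scores for other models on the same subsets: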
| Model | MCQA | QA | TC |
|---|---|---|---|
| CohereLabs/aya-expanse-32b | 52.47 | 20.48 | 50.67 |
| CohereLabs/aya-expanse-8b | 44.09 | 0.19 | 50.03 |
| google/gemma-2-9b-it | 48.20 | 4.46 | 45.38 |
| google/gemma-3-12b-it | 52.66 | 10.26 | 54.38 |
| google/gemma-3-27b-it | 55.40 | 10.56 | 53.65 |
| google/gemma-3-4b-it | 42.33 | 8.22 | 46.15 |
| Kumru-2B (full) | 19.59 | 10.00 | 31.62 |
| Llama-3.1-8B-Instruct | 45.77 | 38.99 | 46.51 |
| Llama-3.3-70B-Instruct | 60.70 | 23.97 | 63.73 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 45.66 | 4.37 | 47.88 |
| meta-llama/Llama-3.2-3B-Instruct | 37.00 | 7.52 | 39.00 |
| Qwen/Qwen2-72B-Instruct | 61.27 | 0.83 | 60.47 |
| Qwen/Qwen2-7B-Instruct | 49.66 | 1.53 | 52.52 |
| Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 0.17 | 54.06 |
| Trendyol/Trendyol-LLM-7B-chat-v4.1.0 | 54.94 | 0.34 | 52.12 |
| ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | 51.85 | 11.11 | 46.97 |
| ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 0.28 | 52.77 |
## Notes

- QA = mean F1 over TQuAD (TR) and XQuAD (TR) for this run.
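As a quick check, the reported QA score follows directly from the two F1 values in the model-index above:

```python
# QA column = arithmetic mean of the two extractive-QA F1 scores reported above.
tquad_f1, xquad_f1 = 23.5035, 16.4439
print(round((tquad_f1 + xquad_f1) / 2, 2))  # -> 19.97
```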