---
license: mit
datasets:
- newmindai/RAGTruth-TR
language:
- tr
- en
metrics:
- precision
- recall
- f1
- roc_auc
base_model:
- newmindai/TurkEmbed4STS
pipeline_tag: token-classification
---

# TurkEmbed4STS-HallucinationDetection

## Model Description

**TurkEmbed4STS-HallucinationDetection** is a Turkish hallucination detection model built on TurkEmbed4STS, a GTE-multilingual-base encoder optimized for semantic textual similarity, and fine-tuned here for token-level hallucination detection. It is part of the Turk-LettuceDetect suite, which targets hallucination detection in Turkish Retrieval-Augmented Generation (RAG) applications.

## Model Details

- **Model Type:** Token-level binary classifier for hallucination detection
- **Base Architecture:** GTE-multilingual-base (TurkEmbed4STS)
- **Language:** Turkish (tr)
- **Training Dataset:** Machine-translated RAGTruth dataset (17,790 training instances)
- **Context Length:** Up to 8,192 tokens
- **Model Size:** ~305M parameters

## Intended Use

### Primary Use Cases
- Hallucination detection in Turkish RAG systems
- Token-level classification of supported vs. hallucinated content
- Turkish text generation pipelines that need stable detection quality across task types
- Applications requiring a consistent precision-recall balance

### Supported Tasks
- Question Answering (QA) hallucination detection
- Data-to-text generation verification
- Text summarization fact-checking

## Performance

### Overall Performance (F1-Score)
- **Whole Dataset:** 0.7666
- **Question Answering:** 0.7420
- **Data-to-text Generation:** 0.7797
- **Summarization:** 0.6123
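
As a reminder of what these scores measure, the following minimal sketch computes precision, recall, and F1 for the hallucinated class from binary labels. The card does not state how the reported figures were aggregated, so the function name `token_prf1` and the toy labels are illustrative assumptions, not the evaluation script used for this model.

```python
from typing import Sequence


def token_prf1(gold: Sequence[int], pred: Sequence[int]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for the hallucinated class (label 1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical, not taken from RAGTruth-TR): 0=supported, 1=hallucinated
gold = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
precision, recall, f1 = token_prf1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints: precision=0.6667 recall=0.6667 f1=0.6667
```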

### Key Strengths
- Most consistent performance across the three task types among the compared models (see Evaluation)
- Stable behavior avoiding extreme precision-recall imbalances
- Good semantic understanding from Turkish fine-tuning

## Training Details

### Training Data
- **Dataset:** Machine-translated RAGTruth benchmark
- **Size:** 17,790 training instances, 2,700 test instances
- **Tasks:** Question answering (MS MARCO), data-to-text (Yelp), summarization (CNN/Daily Mail)
- **Translation Model:** Google Gemma-3-27b-it

### Training Configuration
- **Epochs:** 6
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Hardware:** NVIDIA A100 40GB GPU
- **Training Time:** ~2 hours
- **Optimization:** Cross-entropy loss with token masking
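
The "cross-entropy loss with token masking" entry can be illustrated with a small dependency-free sketch. It assumes that only answer tokens carry a 0/1 label and that all other tokens (context, question) are marked with an ignore value of `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The card does not confirm which tokens are masked, so treat this as one plausible reading rather than the actual training code.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # assumed ignore label (PyTorch CrossEntropyLoss default)


def masked_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over tokens whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for token_logits, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked tokens contribute nothing to the loss
        # Numerically stable log-sum-exp, then negative log-probability of the gold class
        m = max(token_logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in token_logits))
        total += log_norm - token_logits[label]
        count += 1
    return total / count if count else 0.0


# Hypothetical per-token logits for the two classes (0=supported, 1=hallucinated)
logits = [[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0], [1.5, 0.0]]
labels = [IGNORE_INDEX, IGNORE_INDEX, 1, 0]  # first two tokens are masked
loss = masked_cross_entropy(logits, labels)
print(f"masked loss: {loss:.4f}")
```

Because masked positions are skipped, changing the logits of the first two tokens leaves the loss unchanged.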

### Pre-training Background
- Built on GTE-multilingual-base architecture
- Fine-tuned for NLI and STS tasks
- Optimized for Turkish language understanding
- Fine-tuned specifically for hallucination detection

## Technical Specifications

### Architecture Features
- **Base Model:** GTE-multilingual encoder
- **Specialization:** Turkish semantic textual similarity
- **Maximum Sequence Length:** 8,192 tokens
- **Classification Head:** Binary token-level classifier
- **Embedding Dimension:** 768 (inherited from GTE-multilingual-base)

### Input Format
```
Input: [CONTEXT] [QUESTION] [GENERATED_ANSWER]
Output: Token-level binary labels (0=supported, 1=hallucinated)
```
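
Token-level labels are usually merged into character spans before being shown to a user. The sketch below shows one way this could be done; whitespace splitting stands in for the model's subword tokenizer, and both the helper `labels_to_spans` and the hand-written labels are assumptions for illustration, not model output or the library's internal logic.

```python
import re
from typing import Sequence


def labels_to_spans(offsets: Sequence[tuple[int, int]], labels: Sequence[int]) -> list[dict]:
    """Merge consecutive hallucinated tokens (label 1) into character spans."""
    spans: list[dict] = []
    current = None
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = {"start": start, "end": end}  # open a new span
            else:
                current["end"] = end  # extend the open span
        elif current is not None:
            spans.append(current)  # a supported token closes the span
            current = None
    if current is not None:
        spans.append(current)
    return spans


answer = "Nüfusu 16 milyondur ve kalabalıktır."
# Assumption: whitespace tokens replace the real subword tokenizer
offsets = [m.span() for m in re.finditer(r"\S+", answer)]
labels = [0, 1, 1, 0, 0]  # hypothetical labels: 0=supported, 1=hallucinated

spans = labels_to_spans(offsets, labels)
print([answer[s["start"]:s["end"]] for s in spans])
# prints: ['16 milyondur']
```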

## Limitations and Biases

### Known Limitations
- Lower performance on summarization (F1 0.6123) than on question answering (0.7420) and data-to-text generation (0.7797)
- Performance depends on the translation quality of the training data
- Smaller model size may limit complex reasoning capabilities
- Optimized for Turkish but built on a multilingual foundation

### Potential Biases
- Translation artifacts from machine-translated training data
- Bias toward semantic similarity patterns from STS pre-training
- May favor shorter, more structured text over longer abstracts

## Usage

### Installation
```bash
pip install lettucedetect
```

### Basic Usage
```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize the Turkish-specific hallucination detector
detector = HallucinationDetector(
    method="transformer", 
    model_path="newmindai/TurkEmbed4STS-HD"
)

# Turkish context, question, and answer
context = "İstanbul Türkiye'nin en büyük şehridir. Şehir 15 milyonluk nüfusla Avrupa'nın en kalabalık şehridir."
question = "İstanbul'un nüfusu nedir? İstanbul Avrupa'nın en kalabalık şehri midir?"
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."

# Get span-level predictions (start/end indices, confidence scores)
predictions = detector.predict(
    context=context, 
    question=question, 
    answer=answer, 
    output_format="spans"
)

print("Detected hallucinations:", predictions)
# Example output (illustrative; actual offsets and confidence scores will vary):
# [{'start': 34, 'end': 57, 'confidence': 0.92, 'text': 'yaklaşık 16 milyondur'}]
```
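
A possible post-processing step is to filter spans by confidence and mark them in the answer. This sketch assumes the span dictionaries have the keys shown in the example output above (`start`, `end`, `confidence`) and that the offsets index into the answer string. The `highlight` helper, the `[[...]]` markers, and the 0.5 threshold are hypothetical choices, not part of `lettucedetect`.

```python
def highlight(answer: str, spans: list[dict], min_confidence: float = 0.5) -> str:
    """Wrap spans at or above min_confidence in [[...]] markers."""
    kept = sorted(
        (s for s in spans if s.get("confidence", 1.0) >= min_confidence),
        key=lambda s: s["start"],
    )
    parts, cursor = [], 0
    for s in kept:
        if s["start"] < cursor:
            continue  # skip spans that overlap one already emitted
        parts.append(answer[cursor:s["start"]])
        parts.append("[[" + answer[s["start"]:s["end"]] + "]]")
        cursor = s["end"]
    parts.append(answer[cursor:])
    return "".join(parts)


answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."
flagged = "yaklaşık 16 milyondur"
start = answer.index(flagged)
# Hand-built span for illustration; in practice this comes from detector.predict(...)
spans = [{"start": start, "end": start + len(flagged), "confidence": 0.92, "text": flagged}]

print(highlight(answer, spans))
# prints: İstanbul'un nüfusu [[yaklaşık 16 milyondur]] ve Avrupa'nın en kalabalık şehridir.
```

Raising `min_confidence` trades recall for precision: with a threshold above 0.92, the span in this example would be left unmarked.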


## Evaluation

### Benchmark Results
Evaluated on the machine-translated Turkish RAGTruth test set, the model shows the most consistent behavior across all three task types, with a stable precision-recall balance.

**Example-level Results**

<img 
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/RejTWu3JNjH8t0teV1Txf.png" 
  width="1000" 
  style="object-fit: contain; margin: auto; display: block;"
/>

**Token-level Results**

<img 
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/ECyrfN5Jv8fZSM0svxLXq.png" 
  width="500" 
  style="object-fit: contain; margin: auto; display: block;"
/>

### Comparative Analysis
- Most stable performance across diverse tasks
- Consistent precision-recall balance (unlike models with extreme values)
- Suitable for applications prioritizing reliability over peak performance

## Citation

```bibtex
@inproceedings{turklettucedetect2025,
  title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications},
  author={NewMind AI Team},
  booktitle={9th International Artificial Intelligence and Data Processing Symposium (IDAP'25)},
  year={2025},
  address={Malatya, Turkey}
}
```

## Related Work

This model builds upon the TurkEmbed4STS model:
```bibtex
@article{turkembed4sts,
  title={TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task},
  author={Ezerceli, Ö. and Gümüşçekicci, G. and Erkoç, T. and Özenc, B.},
  journal={preprint},
  year={2024}
}
```

## Original LettuceDetect Framework

This model extends the LettuceDetect methodology:
```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}
```


## License

This model is released under the MIT license to support research and development in Turkish NLP applications.

## Contact

For questions about this model or other Turkish hallucination detection models, please refer to the original paper or contact the authors.

---