---
license: mit
datasets:
- newmindai/RAGTruth-TR
language:
- tr
- en
metrics:
- precision
- recall
- f1
- roc_auc
base_model:
- newmindai/TurkEmbed4STS
pipeline_tag: token-classification
---
# TurkEmbed4STS-HallucinationDetection

## Model Description

**TurkEmbed4STS-HallucinationDetection** is a Turkish hallucination detection model built on TurkEmbed4STS, a GTE-multilingual-based encoder optimized for semantic textual similarity, and fine-tuned for hallucination detection. It is part of the Turk-LettuceDetect suite, which targets hallucinations in Turkish Retrieval-Augmented Generation (RAG) applications.

## Model Details

- **Model Type:** Token-level binary classifier for hallucination detection
- **Base Architecture:** GTE-multilingual-base (TurkEmbed4STS)
- **Language:** Turkish (tr)
- **Training Dataset:** Machine-translated RAGTruth dataset (17,790 training instances)
- **Context Length:** Up to 8,192 tokens
- **Model Size:** ~135M parameters

## Intended Use

### Primary Use Cases

- Hallucination detection in Turkish RAG systems
- Token-level classification of supported vs. hallucinated content
- Turkish text generation pipelines that need stable performance across task types
- Applications that require a consistent balance between precision and recall

### Supported Tasks

- Question answering (QA) hallucination detection
- Data-to-text generation verification
- Text summarization fact-checking

## Performance

### F1-Score (Overall and by Task)

- **Whole Dataset:** 0.7666
- **Question Answering:** 0.7420
- **Data-to-text Generation:** 0.7797
- **Summarization:** 0.6123

### Key Strengths

- The most consistent performance across task types among the Turk-LettuceDetect models
- Stable behavior that avoids extreme precision-recall imbalances
- Good semantic understanding inherited from Turkish NLI/STS fine-tuning

## Training Details

### Training Data

- **Dataset:** Machine-translated RAGTruth benchmark
- **Size:** 17,790 training instances, 2,700 test instances
- **Tasks:** Question answering (MS MARCO), data-to-text (Yelp), summarization (CNN/Daily Mail)
- **Translation Model:** Google Gemma-3-27b-it

### Training Configuration

- **Epochs:** 6
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Hardware:** NVIDIA A100 40GB GPU
- **Training Time:** ~2 hours
- **Optimization:** Cross-entropy loss with token masking
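
The card names "cross-entropy loss with token masking" without further detail. The sketch below is a minimal, dependency-free illustration of that idea, assuming that context, question and special tokens receive the label `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) so that only answer tokens contribute to the loss. The function name `masked_token_cross_entropy` is hypothetical and not part of any library.

```python
import math

IGNORE_INDEX = -100  # assumed label for masked tokens (PyTorch's default ignore_index)


def masked_token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over the tokens whose label is not ignore_index.

    logits: one [score_supported, score_hallucinated] pair per token.
    labels: 0 (supported), 1 (hallucinated) or ignore_index (masked).
    """
    total, count = 0.0, 0
    for token_logits, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked token: excluded from the loss
        # Numerically stable form: -log p(label) = logsumexp(z) - z[label]
        m = max(token_logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in token_logits))
        total += log_sum_exp - token_logits[label]
        count += 1
    if count == 0:
        raise ValueError("every token is masked; the loss is undefined")
    return total / count


# Two masked prompt tokens followed by three labeled answer tokens.
logits = [[5.0, -5.0], [5.0, -5.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]]
labels = [IGNORE_INDEX, IGNORE_INDEX, 0, 1, 1]
print(round(masked_token_cross_entropy(logits, labels), 4))  # prints 0.2794
```

Averaging only over the unmasked tokens keeps a long context from diluting the training signal on the answer.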

### Pre-training Background

- Built on the GTE-multilingual-base architecture
- Fine-tuned on NLI and STS tasks and optimized for Turkish language understanding (TurkEmbed4STS)
- Further fine-tuned specifically for hallucination detection (this model)

## Technical Specifications

### Architecture Features

- **Base Model:** GTE-multilingual encoder
- **Specialization:** Turkish semantic textual similarity
- **Maximum Sequence Length:** 8,192 tokens
- **Classification Head:** Binary token-level classifier
- **Embedding Dimension:** 768 (inherited from GTE-multilingual-base)
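
The internals of the binary token-level head are not documented here. As a hedged sketch, assume a single linear layer that maps each token's hidden state to two logits (supported, hallucinated) followed by a softmax, which is the common layout of Hugging Face token-classification heads. The per-token probability of the hallucinated class is then a natural basis for a confidence score. The function names and the 0.5 threshold are illustrative assumptions.

```python
import math


def token_hallucination_probs(hidden_states, weight, bias):
    """Apply an assumed linear 2-class head to each token; return P(hallucinated).

    hidden_states: one hidden vector per token.
    weight: two rows (supported, hallucinated), each as long as a hidden vector.
    bias: two values, one per class.
    """
    probs = []
    for h in hidden_states:
        # Logit for class c: dot(weight[c], h) + bias[c]
        logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]
        # Softmax over the two classes, shifted by the max for numerical stability
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        probs.append(exps[1] / sum(exps))
    return probs


def labels_from_probs(probs, threshold=0.5):
    """1 (hallucinated) where P(hallucinated) exceeds the threshold, else 0."""
    return [int(p > threshold) for p in probs]


# Toy 3-dimensional hidden states (the real encoder's are much wider).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.0, 0.0]
hidden = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
probs = token_hallucination_probs(hidden, weight, bias)
print(labels_from_probs(probs))  # prints [0, 1, 0]
```

The third token sits exactly on the decision boundary (probability 0.5), so the strict `>` comparison labels it as supported.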

### Input Format

```
Input: [CONTEXT] [QUESTION] [GENERATED_ANSWER]
Output: Token-level binary labels for the generated answer (0=supported, 1=hallucinated)
```
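
The model labels individual tokens, while the `spans` output format used in the usage example reports character ranges of the answer. A minimal sketch of how consecutive hallucinated tokens can be merged into such spans, assuming per-token character offsets are available (with a Hugging Face fast tokenizer they come from `return_offsets_mapping=True`). `labels_to_spans` is a hypothetical helper; LettuceDetect's own aggregation, which also reports a confidence per span, may differ.

```python
import re


def labels_to_spans(answer, offsets, labels):
    """Merge runs of hallucinated tokens (label 1) into character spans of `answer`.

    offsets: (start, end) character offsets of each answer token.
    labels: 0 (supported) or 1 (hallucinated), one per token.
    """
    spans = []
    current = None  # [start, end] of the span being built, if any
    for (start, end), label in zip(offsets, labels):
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end  # extend the open span to this token
        elif current is not None:
            spans.append(current)  # a supported token closes the open span
            current = None
    if current is not None:
        spans.append(current)  # close a span that runs to the end of the answer
    return [{"start": s, "end": e, "text": answer[s:e]} for s, e in spans]


# Whitespace tokens stand in for the model's subword tokens here.
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."
offsets = [m.span() for m in re.finditer(r"\S+", answer)]
labels = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(labels_to_spans(answer, offsets, labels))
```

Because a span runs from the first flagged token's start to the last flagged token's end, the whitespace between merged tokens is included in the reported text.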

## Limitations and Biases

### Known Limitations

- Lower performance on summarization than on the more structured QA and data-to-text tasks
- Performance depends on the quality of the machine translation used to build the training data
- The relatively small model size may limit complex reasoning
- Optimized for Turkish; although the base model is multilingual, the evaluation reported here covers Turkish only

### Potential Biases

- Translation artifacts inherited from the machine-translated training data
- A bias toward semantic-similarity patterns learned during STS pre-training
- May favor shorter, more structured text over longer, abstractive text such as summaries

## Usage

### Installation

```bash
pip install lettucedetect
```

### Basic Usage

```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize the Turkish-specific hallucination detector
detector = HallucinationDetector(
    method="transformer",
    model_path="newmindai/TurkEmbed4STS-HD",
)

# Turkish context passages (a list of strings), question, and answer.
# Context: "Istanbul is Turkey's largest city. With a population of
# 15 million, it is the most populous city in Europe."
context = [
    "İstanbul Türkiye'nin en büyük şehridir. Şehir 15 milyonluk nüfusla Avrupa'nın en kalabalık şehridir."
]
# Question: "What is Istanbul's population? Is Istanbul the most populous city in Europe?"
question = "İstanbul'un nüfusu nedir? İstanbul Avrupa'nın en kalabalık şehri midir?"
# Answer: "Istanbul's population is about 16 million, and it is the most populous city in Europe."
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."

# Get span-level predictions: character offsets into `answer`, confidence score, and text
predictions = detector.predict(
    context=context,
    question=question,
    answer=answer,
    output_format="spans",
)
print("Detected hallucinations:", predictions)
# Sample output:
# [{'start': 19, 'end': 40, 'confidence': 0.92, 'text': 'yaklaşık 16 milyondur'}]
```
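
The span predictions can be post-processed for display or for an answer-level decision. A minimal sketch, assuming each prediction is a dict with `start`, `end` and `confidence` keys as in the sample output above, and that spans do not overlap. `highlight_spans`, `flag_answer` and the 0.5 cut-off are hypothetical choices, not part of `lettucedetect`.

```python
def highlight_spans(answer, spans, open_tag="[[", close_tag="]]"):
    """Wrap each predicted span of `answer` in markers (spans must not overlap)."""
    parts, cursor = [], 0
    for span in sorted(spans, key=lambda s: s["start"]):
        parts.append(answer[cursor:span["start"]])  # unflagged text before the span
        parts.append(open_tag + answer[span["start"]:span["end"]] + close_tag)
        cursor = span["end"]
    parts.append(answer[cursor:])  # remaining unflagged text
    return "".join(parts)


def flag_answer(spans, min_confidence=0.5):
    """Answer-level decision: True if any span reaches min_confidence."""
    return any(span["confidence"] >= min_confidence for span in spans)


answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."
start = answer.index("yaklaşık 16 milyondur")
predictions = [
    {"start": start, "end": start + len("yaklaşık 16 milyondur"),
     "confidence": 0.92, "text": "yaklaşık 16 milyondur"}
]
print(highlight_spans(answer, predictions))
# prints İstanbul'un nüfusu [[yaklaşık 16 milyondur]] ve Avrupa'nın en kalabalık şehridir.
print(flag_answer(predictions))  # prints True
```

In a real pipeline, `predictions` would be the list returned by `detector.predict(..., output_format="spans")`.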

## Evaluation

### Benchmark Results

Evaluated on the machine-translated Turkish RAGTruth test set, the model shows the most consistent behavior across all three task types, with a stable balance between precision and recall.

**Example-level Results**

<img
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/RejTWu3JNjH8t0teV1Txf.png"
  width="1000"
  style="object-fit: contain; margin: auto; display: block;"
/>

**Token-level Results**

<img
  src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/ECyrfN5Jv8fZSM0svxLXq.png"
  width="500"
  style="object-fit: contain; margin: auto; display: block;"
/>
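
The figures above report results at two granularities. The sketch below shows how the two relate, assuming the conventions commonly used for RAGTruth-style evaluation: token-level scores treat every answer token as one binary decision with "hallucinated" as the positive class, and an answer counts as hallucinated at the example level if at least one of its tokens is. The helper names are hypothetical, and the evaluation script behind the reported numbers may differ.

```python
def precision_recall_f1(gold, pred):
    """Binary precision, recall and F1 with 1 (hallucinated) as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def example_label(token_labels):
    """An answer is hallucinated if any of its tokens is labeled 1."""
    return int(any(token_labels))


# Gold and predicted token labels for three toy answers.
gold = [[0, 0, 1, 1], [0, 0, 0], [1, 0]]
pred = [[0, 1, 1, 0], [0, 0, 0], [0, 0]]

# Token level: pool the tokens of all answers.
token_scores = precision_recall_f1(
    [t for ex in gold for t in ex], [t for ex in pred for t in ex]
)
# Example level: one label per answer.
example_scores = precision_recall_f1(
    [example_label(ex) for ex in gold], [example_label(ex) for ex in pred]
)
print(token_scores)
print(example_scores)
```

In this toy case the example-level F1 (about 0.67) is higher than the token-level F1 (0.4): the first answer is flagged correctly even though its predicted span boundaries are off.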

### Comparative Analysis

- Most stable performance across diverse tasks
- Consistent precision-recall balance, unlike models that show extreme precision or recall values
- Suitable for applications that prioritize reliability over peak performance

## Citation

```bibtex
@inproceedings{turklettucedetect2025,
  title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications},
  author={Authors Hidden for Review},
  booktitle={9th International Artificial Intelligence and Data Processing Symposium (IDAP'25)},
  year={2025},
  address={Malatya, Turkey}
}
```

## Related Work

This model builds upon the TurkEmbed4STS model and the LettuceDetect framework:

```bibtex
@article{turkembed4sts,
  title={TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task},
  author={Ezerceli, Ö. and Gümüşçekicci, G. and Erkoç, T. and Özenc, B.},
  journal={preprint},
  year={2024}
}
```

```bibtex
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}
```

## License

This model is released under the MIT license to support research and development in Turkish NLP applications.

## Contact

For questions about this model or other Turkish hallucination detection models, please refer to the original paper or contact the authors.

---

**Note:** This model is optimized for stability and consistency across different Turkish RAG tasks, which makes it well suited to production environments where reliable performance matters more than peak metrics.