---
license: mit
datasets:
- newmindai/RAGTruth-TR
language:
- tr
- en
metrics:
- precision
- recall
- f1
- roc_auc
base_model:
- newmindai/TurkEmbed4STS
pipeline_tag: token-classification
---

# TurkEmbed4STS-HallucinationDetection

## Model Description

**TurkEmbed4STS-HallucinationDetection** is a Turkish hallucination detection model based on the GTE-multilingual architecture, optimized for semantic textual similarity and adapted for hallucination detection. This model is part of the Turk-LettuceDetect suite, specifically designed for detecting hallucinations in Turkish Retrieval-Augmented Generation (RAG) applications.

## Model Details

- **Model Type:** Token-level binary classifier for hallucination detection
- **Base Architecture:** GTE-multilingual-base (TurkEmbed4STS)
- **Language:** Turkish (tr)
- **Training Dataset:** Machine-translated RAGTruth dataset (17,790 training instances)
- **Context Length:** Up to 8,192 tokens
- **Model Size:** ~305M parameters

## Intended Use

### Primary Use Cases

- Hallucination detection in Turkish RAG systems
- Token-level classification of supported vs.
hallucinated content
- Stable performance across diverse Turkish text generation tasks
- Applications requiring consistent precision-recall balance

### Supported Tasks

- Question Answering (QA) hallucination detection
- Data-to-text generation verification
- Text summarization fact-checking

## Performance

### Overall Performance (F1-Score)

- **Whole Dataset:** 0.7666
- **Question Answering:** 0.7420
- **Data-to-text Generation:** 0.7797
- **Summarization:** 0.6123

### Key Strengths

- Most consistent performance across all task types
- Stable behavior avoiding extreme precision-recall imbalances
- Good semantic understanding from Turkish fine-tuning

## Training Details

### Training Data

- **Dataset:** Machine-translated RAGTruth benchmark
- **Size:** 17,790 training instances, 2,700 test instances
- **Tasks:** Question answering (MS MARCO), data-to-text (Yelp), summarization (CNN/Daily Mail)
- **Translation Model:** Google Gemma-3-27b-it

### Training Configuration

- **Epochs:** 6
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Hardware:** NVIDIA A100 40GB GPU
- **Training Time:** ~2 hours
- **Optimization:** Cross-entropy loss with token masking

### Pre-training Background

- Built on GTE-multilingual-base architecture
- Fine-tuned for NLI and STS tasks
- Optimized for Turkish language understanding
- Fine-tuned specifically for hallucination detection

## Technical Specifications

### Architecture Features

- **Base Model:** GTE-multilingual encoder
- **Specialization:** Turkish semantic textual similarity
- **Maximum Sequence Length:** 8,192 tokens
- **Classification Head:** Binary token-level classifier
- **Embedding Dimension:** Based on GTE-multilingual architecture

### Input Format

```
Input: [CONTEXT] [QUESTION] [GENERATED_ANSWER]
Output: Token-level binary labels (0=supported, 1=hallucinated)
```

## Limitations and Biases

### Known Limitations

- Lower performance on summarization (F1 0.6123) than on question answering (0.7420) and data-to-text generation (0.7797)
- Performance dependent on translation quality
of training data
- Smaller model size may limit complex reasoning capabilities
- Optimized for Turkish but built on multilingual foundation

### Potential Biases

- Translation artifacts from machine-translated training data
- Bias toward semantic similarity patterns from STS pre-training
- May favor shorter, more structured text over longer abstracts

## Usage

### Installation

```bash
pip install lettucedetect
```

### Basic Usage

```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize the Turkish-specific hallucination detector
detector = HallucinationDetector(
    method="transformer",
    model_path="newmindai/TurkEmbed4STS-HD"
)

# Turkish context, question, and answer.
# lettucedetect expects the context as a list of passages.
context = [
    "İstanbul Türkiye'nin en büyük şehridir. Şehir 15 milyonluk nüfusla Avrupa'nın en kalabalık şehridir."
]
question = "İstanbul'un nüfusu nedir? İstanbul Avrupa'nın en kalabalık şehri midir?"
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."

# Get span-level predictions (start/end indices, confidence scores)
predictions = detector.predict(
    context=context,
    question=question,
    answer=answer,
    output_format="spans"
)

print("Detected hallucinations:", predictions)
# Example output (values are illustrative):
# [{'start': 34, 'end': 57, 'confidence': 0.92, 'text': 'yaklaşık 16 milyondur'}]
```

## Evaluation

### Benchmark Results

Evaluated on the machine-translated Turkish RAGTruth test set, the model shows the most consistent behavior across all three task types, with a stable precision-recall balance.
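
To make the precision, recall, and F1 figures concrete, here is a minimal sketch of how such scores can be computed from binary labels (0 = supported, 1 = hallucinated). It is not the evaluation script behind the reported numbers. The helper `binary_prf` and the toy labels are hypothetical, and the example-level rule (an answer counts as hallucinated if any of its tokens is labeled 1) is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def binary_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (1 = hallucinated)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels for two answers (0 = supported, 1 = hallucinated).
gold_tokens = [[0, 0, 1, 1, 0], [0, 0, 0, 0]]
pred_tokens = [[0, 1, 1, 0, 0], [0, 0, 0, 0]]

# Token level: pool every token from every answer.
flat_gold = [label for seq in gold_tokens for label in seq]
flat_pred = [label for seq in pred_tokens for label in seq]
token_scores = binary_prf(flat_gold, flat_pred)

# Example level (assumption): an answer is hallucinated if any token is 1.
example_gold = [int(any(seq)) for seq in gold_tokens]
example_pred = [int(any(seq)) for seq in pred_tokens]
example_scores = binary_prf(example_gold, example_pred)

print("token-level P/R/F1:", token_scores)
print("example-level P/R/F1:", example_scores)
```

In this toy case the token-level scores are lower than the example-level ones: the prediction flags the right answer as hallucinated but misplaces one token, which only the token-level view penalizes.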

**Example-level Results**

**Token-level Results**

### Comparative Analysis

- Most stable performance across diverse tasks
- Consistent precision-recall balance (unlike models with extreme values)
- Suitable for applications prioritizing reliability over peak performance

## Citation

```bibtex
@inproceedings{turklettucedetect2025,
  title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications},
  author={NewMind AI Team},
  booktitle={9th International Artificial Intelligence and Data Processing Symposium (IDAP'25)},
  year={2025},
  address={Malatya, Turkey}
}
```

## Related Work

This model builds upon the TurkEmbed4STS model:

```bibtex
@article{turkembed4sts,
  title={TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task},
  author={Ezerceli, Ö. and Gümüşçekicci, G. and Erkoç, T. and Özenc, B.},
  journal={preprint},
  year={2024}
}
```

## Original LettuceDetect Framework

This model extends the LettuceDetect methodology:

```bibtex
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}
```

## License

This model is released under the MIT license to support research and development in Turkish NLP applications.

## Contact

For questions about this model or other Turkish hallucination detection models, please refer to the original paper or contact the authors.

---