Commit ad437cc (verified) by nmmursit · Parent: 920a27e

Added Model Card

Files changed (1): README.md (+217 -3)
 
---
license: mit
datasets:
- newmindai/RAGTruth-TR
language:
- tr
- en
metrics:
- precision
- recall
- f1
- roc_auc
base_model:
- newmindai/TurkEmbed4STS
pipeline_tag: token-classification
---

# TurkEmbed4STS-HallucinationDetection

## Model Description

**TurkEmbed4STS-HallucinationDetection** is a Turkish hallucination detection model built on TurkEmbed4STS, a GTE-multilingual encoder optimized for Turkish semantic textual similarity, and fine-tuned for token-level hallucination detection. It is part of the Turk-LettuceDetect suite, which is designed to detect hallucinations in Turkish Retrieval-Augmented Generation (RAG) applications.

## Model Details

- **Model Type:** Token-level binary classifier for hallucination detection
- **Base Architecture:** GTE-multilingual-base (TurkEmbed4STS)
- **Language:** Turkish (tr)
- **Training Dataset:** Machine-translated RAGTruth dataset (17,790 training instances)
- **Context Length:** Up to 8,192 tokens
- **Model Size:** ~135M parameters

## Intended Use

### Primary Use Cases
- Hallucination detection in Turkish RAG systems
- Token-level classification of generated content as supported or hallucinated
- Turkish text generation tasks where stable performance across task types matters
- Applications that require a consistent balance between precision and recall

### Supported Tasks
- Question answering (QA) hallucination detection
- Data-to-text generation verification
- Text summarization fact-checking

## Performance

### Overall Performance (F1 Score)
- **Whole Dataset:** 0.7666
- **Question Answering:** 0.7420
- **Data-to-text Generation:** 0.7797
- **Summarization:** 0.6123
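The F1 scores above combine precision and recall for the hallucinated class. As a minimal sketch, assuming gold and predicted token labels are available as parallel lists of 0/1 values (0 = supported, 1 = hallucinated, as in the Input Format section), token-level precision, recall, and F1 could be computed as follows. The helper `token_prf` is hypothetical and not part of `lettucedetect`, and this card does not state exactly how its reported scores are aggregated.

```python
from typing import Sequence, Tuple


def token_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the hallucinated class (label 1).

    Hypothetical helper for illustration; not part of lettucedetect.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 1, 0, 0, 1]
    pred = [0, 1, 1, 1, 0, 0, 0, 1]
    p, r, f = token_prf(gold, pred)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here one supported token is wrongly flagged and one hallucinated token is missed, so precision and recall are both 3/4.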

### Key Strengths
- Most consistent performance across task types among the compared models
- Stable behavior that avoids extreme precision-recall imbalances
- Semantic understanding inherited from Turkish STS fine-tuning

## Training Details

### Training Data
- **Dataset:** Machine-translated RAGTruth benchmark
- **Size:** 17,790 training instances, 2,700 test instances
- **Tasks:** Question answering (MS MARCO), data-to-text (Yelp), summarization (CNN/Daily Mail)
- **Translation Model:** Google Gemma-3-27b-it

### Training Configuration
- **Epochs:** 6
- **Learning Rate:** 1e-5
- **Batch Size:** 4
- **Hardware:** NVIDIA A100 40GB GPU
- **Training Time:** ~2 hours
- **Optimization:** Cross-entropy loss with token masking
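"Token masking" is not spelled out further in this card. A common reading, assumed in the sketch below, is that context and question tokens receive an ignore label (conventionally `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`) so that only answer tokens contribute to the loss. The sketch uses plain Python over per-token two-class logits; `masked_cross_entropy` and `IGNORE_INDEX` are illustrative names, not the actual training code.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # assumption: same convention as PyTorch's CrossEntropyLoss default


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one token's class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over tokens whose label is not IGNORE_INDEX.

    Returns 0.0 when every token is masked, for simplicity.
    """
    losses = [
        -log_softmax(token_logits)[label]
        for token_logits, label in zip(logits, labels)
        if label != IGNORE_INDEX
    ]
    if not losses:
        return 0.0
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical sequence: 3 masked context/question tokens + 2 answer tokens.
    labels = [IGNORE_INDEX, IGNORE_INDEX, IGNORE_INDEX, 0, 1]
    logits = [[5.0, -5.0], [0.0, 9.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
    print(round(masked_cross_entropy(logits, labels), 4))
```

Changing the logits of the masked tokens leaves the loss unchanged, which is the point of the mask: the model is only trained to label answer tokens.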

### Pre-training Background
- Built on the GTE-multilingual-base architecture
- Fine-tuned on Turkish NLI and STS tasks (as TurkEmbed4STS)
- Optimized for Turkish language understanding
- Further fine-tuned specifically for hallucination detection

## Technical Specifications

### Architecture Features
- **Base Model:** GTE-multilingual encoder
- **Specialization:** Turkish semantic textual similarity
- **Maximum Sequence Length:** 8,192 tokens
- **Classification Head:** Binary token-level classifier
- **Embedding Dimension:** Inherited from the GTE-multilingual-base encoder
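The card describes the head only as a binary token-level classifier. A common design, assumed here rather than taken from the model code, is a linear layer that maps each encoder hidden state to two logits, followed by a softmax whose second component is read as the probability that the token is hallucinated. The toy weights and 3-dimensional hidden states are illustrative; the real hidden size comes from the GTE-multilingual-base encoder.

```python
import math
from typing import List, Sequence


def token_head(
    hidden_states: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],  # shape [2][hidden_dim]
    bias: Sequence[float],              # shape [2]
) -> List[float]:
    """Return P(hallucinated) for each token via a linear layer + softmax.

    Illustrative sketch of a binary token-classification head; not the
    model's actual implementation.
    """
    probs = []
    for h in hidden_states:
        # One logit per class: 0 = supported, 1 = hallucinated.
        logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) + b for row, b in zip(weight, bias)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        probs.append(exps[1] / sum(exps))
    return probs


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarize probabilities: 1 = hallucinated, 0 = supported."""
    return [1 if p > threshold else 0 for p in probs]


if __name__ == "__main__":
    hidden = [[1.0, 0.0, 0.5], [0.0, 2.0, -1.0]]
    weight = [[0.5, -1.0, 0.0], [-0.5, 1.0, 0.0]]
    bias = [0.0, 0.0]
    probs = token_head(hidden, weight, bias)
    print([round(p, 3) for p in probs], to_labels(probs))
```

With two classes, thresholding P(hallucinated) at 0.5 gives the same labels as taking the argmax of the two logits (ties aside).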

### Input Format
```
Input: [CONTEXT] [QUESTION] [GENERATED_ANSWER]
Output: Token-level binary labels (0=supported, 1=hallucinated)
```
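The `output_format="spans"` option in the Basic Usage example returns character spans over the answer rather than raw per-token labels. One way to get from one to the other, sketched here as an assumption and not as the library's actual code, is to merge consecutive hallucinated answer tokens into a single span using each token's character offsets, keeping the highest token probability as the span confidence. `merge_token_spans` and the offset format are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def merge_token_spans(
    answer: str,
    offsets: Sequence[Tuple[int, int]],  # (start, end) character offsets per answer token
    probs: Sequence[float],              # P(hallucinated) per answer token
    threshold: float = 0.5,
) -> List[Dict]:
    """Merge runs of consecutive hallucinated tokens into character spans.

    Hypothetical post-processing sketch; lettucedetect's own logic may differ
    (e.g. in how it joins tokens or computes span confidence).
    """
    spans: List[Dict] = []
    current: Optional[Dict] = None
    for (start, end), p in zip(offsets, probs):
        if p > threshold:
            if current is None:
                current = {"start": start, "end": end, "confidence": p}
            else:
                current["end"] = end
                current["confidence"] = max(current["confidence"], p)
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = answer[span["start"]:span["end"]]
    return spans


if __name__ == "__main__":
    answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."
    # Word-level "tokens" for the first six words, for illustration only.
    offsets = [(0, 11), (12, 18), (19, 27), (28, 30), (31, 40), (41, 43)]
    probs = [0.02, 0.10, 0.85, 0.95, 0.90, 0.20]
    print(merge_token_spans(answer, offsets, probs))
```

With these made-up probabilities, the three flagged words merge into one span covering `yaklaşık 16 milyondur`.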

## Limitations and Biases

### Known Limitations
- Lower performance on summarization (F1 0.6123) than on QA and data-to-text tasks
- Performance depends on the translation quality of the training data
- Smaller model size may limit complex reasoning capabilities
- Optimized for Turkish, although built on a multilingual foundation

### Potential Biases
- Translation artifacts from machine-translated training data
- Bias toward semantic similarity patterns from STS pre-training
- May favor shorter, more structured text over longer abstracts

## Usage

### Installation
```bash
pip install lettucedetect
```

### Basic Usage
```python
from lettucedetect.models.inference import HallucinationDetector

# Initialize the Turkish-specific hallucination detector
detector = HallucinationDetector(
    method="transformer",
    model_path="newmindai/TurkEmbed4STS-HD"
)

# Turkish context, question, and answer.
# The context is passed as a list of passages.
# "Istanbul is Turkey's largest city. With a population of 15 million, it is Europe's most populous city."
context = ["İstanbul Türkiye'nin en büyük şehridir. Şehir 15 milyonluk nüfusla Avrupa'nın en kalabalık şehridir."]
# "What is Istanbul's population? Is Istanbul the most populous city in Europe?"
question = "İstanbul'un nüfusu nedir? İstanbul Avrupa'nın en kalabalık şehri midir?"
# "Istanbul's population is about 16 million and it is Europe's most populous city."
answer = "İstanbul'un nüfusu yaklaşık 16 milyondur ve Avrupa'nın en kalabalık şehridir."

# Get span-level predictions (character start/end offsets in the answer, confidence scores)
predictions = detector.predict(
    context=context,
    question=question,
    answer=answer,
    output_format="spans"
)

print("Detected hallucinations:", predictions)
# Illustrative output (exact spans and confidences may differ):
# [{'start': 19, 'end': 40, 'confidence': 0.92, 'text': 'yaklaşık 16 milyondur'}]
```
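The evaluation reports example-level as well as token-level results. A simple way to turn span predictions into a per-answer decision, assumed here for illustration, is to flag an answer as hallucinated when it contains at least one span whose confidence meets a threshold. `is_hallucinated` and its 0.5 default are hypothetical choices; the span dictionaries follow the shape of the example output above.

```python
from typing import Dict, List


def is_hallucinated(spans: List[Dict], threshold: float = 0.5) -> bool:
    """Example-level decision: True if any predicted span meets the threshold.

    Hypothetical helper; the threshold is an assumption to tune per application.
    """
    return any(span.get("confidence", 0.0) >= threshold for span in spans)


if __name__ == "__main__":
    predictions = [{"start": 19, "end": 40, "confidence": 0.92, "text": "yaklaşık 16 milyondur"}]
    print(is_hallucinated(predictions))        # True
    print(is_hallucinated([]))                 # False
    print(is_hallucinated(predictions, 0.95))  # False
```

Raising the threshold trades recall for precision at the example level, which is one knob for matching the precision-recall balance an application needs.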

## Evaluation

### Benchmark Results
Evaluated on the machine-translated Turkish RAGTruth test set, the model shows the most consistent behavior across all three task types, with a stable precision-recall balance.

**Example-level Results**

<img
src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/RejTWu3JNjH8t0teV1Txf.png"
width="1000"
style="object-fit: contain; margin: auto; display: block;"
/>

**Token-level Results**

<img
src="https://cdn-uploads.huggingface.co/production/uploads/683d4880e639f8d647355997/ECyrfN5Jv8fZSM0svxLXq.png"
width="500"
style="object-fit: contain; margin: auto; display: block;"
/>

### Comparative Analysis
- Most stable performance across diverse tasks
- Consistent precision-recall balance (unlike models with extreme values)
- Suitable for applications that prioritize reliability over peak performance
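Why balance matters for F1 follows from the harmonic mean: for a fixed average of precision and recall, F1 drops as the two drift apart. The precision/recall pairs below are made-up illustrations, not results for this or any other model.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical pairs, all with the same arithmetic mean (0.70).
    for p, r in [(0.70, 0.70), (0.90, 0.50), (0.98, 0.42)]:
        print(f"P={p:.2f} R={r:.2f} F1={f1(p, r):.3f}")
```

The balanced pair keeps F1 at 0.700, while the increasingly lopsided pairs fall to roughly 0.643 and 0.588.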

## Citation

```bibtex
@inproceedings{turklettucedetect2025,
  title={Turk-LettuceDetect: A Hallucination Detection Models for Turkish RAG Applications},
  author={Authors Hidden for Review},
  booktitle={9th International Artificial Intelligence and Data Processing Symposium (IDAP'25)},
  year={2025},
  address={Malatya, Turkey}
}
```

## Related Work

This model builds upon the TurkEmbed4STS model:
```bibtex
@article{turkembed4sts,
  title={TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task},
  author={Ezerceli, Ö. and Gümüşçekicci, G. and Erkoç, T. and Özenc, B.},
  journal={preprint},
  year={2024}
}
```

It also builds on the LettuceDetect framework:
```bibtex
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}
```

## License

This model is released under the MIT license to support research and development in Turkish NLP applications.

## Contact

For questions about this model or other Turkish hallucination detection models, please refer to the original paper or contact the authors.

---

**Note:** This model is optimized for stability and consistency across different Turkish RAG tasks, which makes it well suited to production environments where reliable performance matters more than peak metrics.