Pclanglais committed on
Commit a86e74f · verified · 1 Parent(s): b0f6ae3

Update README.md

Files changed (1): README.md +12 -1
README.md CHANGED
@@ -64,14 +64,25 @@ Pleias-RAG-350m is able to read and write in the main European languages: French
 
 To date, it is the only small language model with negligible loss of performance in leading European languages for RAG-related tasks. On a translated set of HotpotQA we observed significant performance drops, from 10% to 30-35%, in most sub-1B SLMs.
 
+<p align="center">
+<img width="80%" src="figures/language_benchmark.png">
+</p>
+
 We expect the results of any standard English evaluation on Pleias RAG models to be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.
 
 ## Training
 Pleias-RAG-350m is trained on a large synthetic dataset emulating retrieval over a wide variety of multilingual open sources from Common Corpus. It provides native support for citation and grounding with literal quotes. Following the latest trends in agentification, the model reintegrates multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking.
 
 ## Evaluation
-Pleias-RAG-350m has been evaluated on three standard RAG benchmarks, 2wiki, HotpotQA and MuSique. All the benchmarks only assess the "trivial" mode on questions requiring some form of multi-hop reasoning over sources (answer disseminated into different sources) as well as discrimination of distractor sources.
+Pleias-RAG-350m has been evaluated on three standard RAG benchmarks: 2wiki, HotpotQA, and MuSique.
 
 <p align="center">
 <img width="80%" src="figures/benchmark.png">
 </p>
+
+All the benchmarks assess only the "trivial" mode, with questions requiring some form of multi-hop reasoning over sources (the answer is spread across different sources) as well as discrimination of distractor sources.
+
+Pleias-RAG-350m is not simply a cost-effective version of larger models. We found it was able to correctly answer several hundred questions from HotpotQA that neither Llama-3-8b nor Qwen-2.5-7b could solve. Consequently, we encourage its use as part of multi-model RAG systems.
+
+## Deployment
+With only 350 million parameters, Pleias-RAG-350m is classified among the *phone-sized SLMs*, a niche with very few alternatives (SmolLM, Qwen-0.5) and none that currently works well for retrieval-augmented generation.
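
The README's recommendation to use the small model as part of multi-model RAG systems can be sketched as a simple confidence cascade: query the small model first and escalate to a larger model only when it abstains or reports low confidence. This is purely illustrative, not the Pleias API: the model calls are stubbed, and the names (`small_model`, `large_model`) and the 0.5 threshold are assumptions.

```python
from typing import Callable, Optional, Tuple

# Illustrative type: a model call returns (answer or None, confidence in [0, 1]).
# In a real system, small_model would wrap Pleias-RAG-350m and large_model
# a bigger fallback such as a 7-8B model.
AnswerFn = Callable[[str, list], Tuple[Optional[str], float]]

def cascade_answer(
    question: str,
    sources: list,
    small_model: AnswerFn,
    large_model: AnswerFn,
    threshold: float = 0.5,  # assumed cut-off, to be tuned on a dev set
) -> str:
    """Route to the small model first; escalate on abstention or low confidence."""
    answer, confidence = small_model(question, sources)
    if answer is not None and confidence >= threshold:
        return answer
    fallback, _ = large_model(question, sources)
    return fallback

# Toy usage with stubbed models:
small = lambda q, s: ("Paris", 0.9) if "capital" in q else (None, 0.0)
large = lambda q, s: ("42", 1.0)
print(cascade_answer("What is the capital of France?", [], small, large))  # Paris
print(cascade_answer("Meaning of life?", [], small, large))                # 42
```

In this pattern the small model handles the bulk of traffic cheaply, and the threshold trades cost against accuracy.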
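
To see why a 350-million-parameter model counts as phone-sized, a back-of-the-envelope estimate of the weight memory alone helps (the bytes-per-parameter figures are the standard ones for each precision; the helper function is purely illustrative):

```python
# Approximate memory footprint of the weights of a 350M-parameter model.
# Bytes per parameter: fp32 = 4, fp16/bf16 = 2, int8 = 1, int4 = 0.5.
N_PARAMS = 350_000_000

def weights_mb(n_params: int, bytes_per_param: float) -> float:
    """Size of the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6

for name, bpp in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weights_mb(N_PARAMS, bpp):.0f} MB")
# fp32: ~1400 MB, fp16: ~700 MB, int8: ~350 MB, int4: ~175 MB
```

At int8 or int4 quantization the weights fit comfortably in a modern phone's RAM, which is what puts the model in the phone-sized niche (actual runtime memory also includes activations and the KV cache).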