---
tags:
- sentence-transformers
- embeddings
- multilingual
- NLP
- Indic-languages
- semantic-search
- similarity
library_name: sentence-transformers
license: other
license_name: krutrim-community-license-agreement-version-1.0
license_link: LICENSE.md
---

# Vyakyarth: A Multilingual Sentence Embedding Model for Indic Languages

[![Static Badge](https://img.shields.io/badge/Huggingface-Vyakyarth-yellow?logo=huggingface)](https://huggingface.co/krutrim-ai-labs/vyakyarth)
[![Static Badge](https://img.shields.io/badge/Github-Vyakyarth-green?logo=github)](https://github.com/ola-krutrim/Vyakyarth)
[![Static Badge](https://img.shields.io/badge/Krutrim_Cloud-Vyakyarth-orange?logo=)](https://cloud.olakrutrim.com/console/inference-service?section=models&modelName=Krutrim&artifactName=Vyakyarth&artifactType=model)
[![Static Badge](https://img.shields.io/badge/Krutrim_AI_Labs-Vyakyarth-blue?logo=)](https://ai-labs.olakrutrim.com/models/Vyakyarth-1-Indic-Embedding)

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/stsb-xlm-r-multilingual](https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual). It maps sentences and short paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Inputs longer than 128 tokens are truncated.

[![Vyakyarth](https://img.youtube.com/vi/N1f8IlZCUi4/0.jpg)](https://www.youtube.com/watch?v=N1f8IlZCUi4)

## Usage

### Direct Usage (Sentence Transformers)

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Download from the 🤗 Hub
model = SentenceTransformer("krutrim-ai-labs/vyakyarth")

# Run inference
sentences = [
    "मैं अपने दोस्त से मिला",  # Hindi: "I met my friend"
    "I met my friend",
    "I love you",
]
embeddings = model.encode(sentences)  # NumPy array of shape (3, 768)

print(cosine_similarity(embeddings[0:1], embeddings[1:2])[0][0])  # Score: 0.9861017
print(cosine_similarity(embeddings[0:1], embeddings[2:3])[0][0])  # Score: 0.26329127
```

### Evaluation/Benchmarking

**Dataset:** Flores cross-lingual sentence retrieval task of the IndicXtreme benchmark

| **Language** | **MuRIL** | **IndicBERT** | **Vyakyarth** | **jina-embeddings-v3** |
|--------------|-----------|---------------|---------------|------------------------|
| **Bengali**  | 77.0 | 91.0 | **98.7** | 97.4 |
| **Gujarati** | 67.0 | 92.4 | **98.7** | 97.3 |
| **Hindi**    | 84.2 | 90.5 | **99.9** | 98.8 |
| **Kannada**  | 88.4 | 89.1 | **99.2** | 96.8 |
| **Malayalam**| 82.2 | 89.2 | **98.7** | 96.3 |
| **Marathi**  | 83.9 | 92.5 | **98.8** | 97.1 |
| **Sanskrit** | 36.4 | 30.4 | **90.1** | 84.1 |
| **Tamil**    | 79.4 | 90.0 | **97.9** | 95.8 |
| **Telugu**   | 43.5 | 88.6 | **97.5** | 97.3 |

```json
{
  "scale": 20.0,
  "similarity_fct": "cos_sim"
}
```

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## License

This code repository and the model weights are licensed under the [Krutrim Community License](LICENSE.md).

## Citation

```bibtex
@misc{vyakyarth2024,
  author       = {Pushkar Singh and Sandeep Kumar Pandey and Rajkiran Panuganti},
  title        = {Vyakyarth: A Multilingual Sentence Embedding Model for Indic Languages},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/ola-krutrim/Vyakyarth}}
}
```

## Contact

Contributions are welcome! If you have any improvements or suggestions, feel free to submit a pull request on GitHub.
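
## Example: Cross-Lingual Semantic Search

Semantic search is one of the use cases listed at the top of this card. The following is a minimal sketch, not an official API. It ranks a small corpus against a query by cosine similarity of Vyakyarth embeddings. The helpers `top_k_matches` and `search` are hypothetical names introduced here for illustration. `search` only assumes an object with an `encode` method that returns one vector per input string, as `SentenceTransformer.encode` does in the usage example.

```python
import numpy as np


def top_k_matches(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus_index, cosine_score) pairs for the k corpus rows most similar to the query."""
    # L2-normalize both sides so that a dot product equals cosine similarity.
    query = query_emb / np.linalg.norm(query_emb)
    corpus = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = corpus @ query  # one score per corpus row
    # Sort by descending score and keep at most k results.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


def search(model, query: str, corpus: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Hypothetical helper: embed query and corpus with `model.encode`, then rank the corpus."""
    corpus_embs = np.asarray(model.encode(corpus))
    query_emb = np.asarray(model.encode([query]))[0]
    return top_k_matches(query_emb, corpus_embs, k)


# Example usage (needs sentence-transformers and downloads the model on first run):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("krutrim-ai-labs/vyakyarth")
# corpus = ["मैं अपने दोस्त से मिला", "आज मौसम अच्छा है", "I love you"]
# for idx, score in search(model, "I met my friend", corpus, k=2):
#     print(f"{score:.3f}  {corpus[idx]}")
```

For larger corpora, `sentence_transformers.util.semantic_search` performs the same top-k cosine ranking in chunks. The plain-NumPy version above simply makes each step visible.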