Commit 750d816
Parent(s): 6b1473a
relevant model links
README.md CHANGED

@@ -7,6 +7,10 @@ language: bn
 This is a second attempt at a Bangla/Bengali language model trained with
 Google Research's [ELECTRA](https://github.com/google-research/electra).
 
+**As of 2022 I recommend Google's MuRIL model trained on English, Bangla, and other major Indian languages, both in their script and latinized script**: https://huggingface.co/google/muril-base-cased and https://huggingface.co/google/muril-large-cased
+
+**For causal language models, I would suggest https://huggingface.co/sberbank-ai/mGPT, though this is a large model**
+
 Tokenization and pre-training CoLab: https://colab.research.google.com/drive/1gpwHvXAnNQaqcu-YNx1kafEVxz07g2jL
 
 V1 - 120,000 steps; V2 - 190,000 steps