jhu-clsp/mmBERT-base
Fill-Mask
mmBERT is trained on 3T tokens from over 1,800 languages, achieving state-of-the-art benchmark scores and exceptional low-resource performance
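
A minimal usage sketch for the Fill-Mask task listed above, assuming a recent `transformers` release that supports the mmBERT architecture; the example sentence is illustrative only:

```python
from transformers import pipeline

# Masked-token prediction with mmBERT-base (model id from the card above).
unmasker = pipeline("fill-mask", model="jhu-clsp/mmBERT-base")

# Use the tokenizer's own mask token rather than hard-coding it.
mask = unmasker.tokenizer.mask_token

# Print the top predictions and their scores for the masked position.
for pred in unmasker(f"The capital of France is {mask}."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```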
Note Intermediate checkpoints for continued pre-training (MosaicML Composer format)
Note Pre-training Data
Note Randomized training data (not recommended unless you are using the same data mix)