 
jhu-clsp/mmBERT-base • Fill-Mask • 35.4k downloads • 154 likes
mmBERT is trained on 3T tokens covering over 1800 languages, achieving state-of-the-art benchmark scores and exceptional low-resource performance.
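
For reference, a minimal sketch of querying the checkpoint through the Hugging Face `transformers` fill-mask pipeline. The example sentence is illustrative, and the mask token is read from the tokenizer rather than hard-coded, since the exact token depends on the checkpoint's vocabulary:

```python
# Minimal fill-mask sketch for jhu-clsp/mmBERT-base via the
# `transformers` pipeline API; the input sentence is illustrative.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="jhu-clsp/mmBERT-base")

# Read the mask token from the tokenizer instead of hard-coding it.
mask = unmasker.tokenizer.mask_token

# Print the top predictions for the masked position with their scores.
for pred in unmasker(f"Paris is the {mask} of France."):
    print(f"{pred['token_str']!r}: score={pred['score']:.3f}")
```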
 
				 
Notes:
- Intermediate checkpoints for continued pre-training (MosaicML Composer format)
- Pre-training data
- Randomized data for training (not recommended unless you are using the same data mix)