Sea AI Lab

Papers
- Defeating the Training-Inference Mismatch via FP16
- Imperceptible Jailbreaking against Large Language Models
- Sailing in South-East Asia with Inclusive Multilingual LLMs

Collections

Sailing in South-East Asia with Inclusive Multilingual LLMs
- Sailor2 20B Chat (Space • 27 likes)
  Chat with Sailor2 for detailed answers in multiple languages
- Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
  Paper • 2502.12982 • Published • 19 upvotes
- sail/Sailor2-8B-Chat
  Text Generation • 9B • Updated • 436 downloads • 19 likes
- sail/Sailor2-1B-Chat
  Text Generation • 1.0B • Updated • 38 downloads • 16 likes
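
The Sailor2 chat checkpoints load like any Hugging Face causal LM. A minimal sketch, assuming the checkpoint ships a chat template (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/Sailor2-1B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# One user turn; the chat template formats it the way the model expects.
messages = [{"role": "user", "content": "Halo! Apa kabar?"}]  # Indonesian: "Hello! How are you?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```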

Increase your vocabulary size when you scale up your language model
- Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
  Paper • 2407.13623 • Published • 56 upvotes
- Scaling With Vocab Demo (Space • 11 likes)
  Predict optimal vocabulary size for models
- sail/scaling-vocab-3b-43k-overtrain
  Text Generation • 3B • Updated
- sail/scaling-vocab-3b-32k-overtrain
  Text Generation • 3B • Updated • 1
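
The demo Space predicts a compute-optimal vocabulary size from model scale. A toy sketch of the shape of such a predictor: the power-law form follows the paper's framing, but the coefficients below are placeholder assumptions, not the fitted values (use the Space for real predictions):

```python
# Toy sketch: compute-optimal vocabulary size modeled as a power law in the
# number of non-vocabulary parameters. k and gamma are PLACEHOLDER values,
# not the coefficients fitted in the paper.
def predict_vocab_size(non_vocab_params: float, k: float = 0.4, gamma: float = 0.5) -> int:
    """V_opt = k * N_nv ** gamma (illustrative form and coefficients)."""
    return int(k * non_vocab_params ** gamma)

for n_nv in (1e9, 3e9, 7e9):  # non-vocabulary parameter counts
    print(f"N_nv = {n_nv:.0e} -> vocab ~ {predict_vocab_size(n_nv):,}")
```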

Understanding R1-Zero-Like Training
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 56 upvotes
- sail/Qwen2.5-Math-7B-Oat-Zero
  Text Generation • 8B • Updated • 374 downloads • 6 likes
- sail/Qwen2.5-Math-1.5B-Oat-Zero
  Text Generation • 2B • Updated • 216 downloads • 4 likes
- sail/Llama-3.2-3B-Oat-Zero
  Text Generation • 3B • Updated • 6 downloads • 1 like
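
The Oat-Zero models accompany the paper above, which argues that GRPO's per-group standard-deviation normalization and per-response length normalization bias R1-Zero-like training, and proposes Dr. GRPO, which drops both. A schematic sketch of the advantage-term difference (not the authors' implementation):

```python
# Schematic contrast of the advantage computation: GRPO divides the
# mean-centered group rewards by their std; Dr. GRPO keeps only the
# mean-centering (the length-normalization fix lives in the loss, not shown).
import numpy as np

rewards = np.array([1.0, 0.0, 0.0, 1.0])  # scalar reward per sampled response

adv_grpo = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
adv_dr_grpo = rewards - rewards.mean()

print("GRPO:    ", adv_grpo)
print("Dr. GRPO:", adv_dr_grpo)
```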

Automatic data mixture method for large language model pre-training
- RegMix (Space • 6 likes)
  Generate predictions and visualize regression results from CSV data
- RegMix: Data Mixture as Regression for Language Model Pre-training
  Paper • 2407.01492 • Published • 40 upvotes
- sail/data-mixture-human-1b
  Text Generation • Updated • 1 download • 3 likes
- sail/data-mixture-pile-cc-1b
  Text Generation • Updated • 3 downloads • 3 likes
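
RegMix treats data-mixture selection as regression: train many small proxy models on random mixtures, fit a regressor from mixture weights to validation loss, and keep the mixture the regressor predicts to be best. A minimal sketch with synthetic data, using a linear regressor as a stand-in for the paper's fitted models:

```python
# RegMix-style selection sketch: regress proxy-run loss on mixture weights,
# then pick the candidate mixture with the lowest predicted loss.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_runs, n_domains = 64, 5

# Mixture weights from proxy runs (rows sum to 1) and their validation losses
# (synthetic here; in practice these come from small training runs).
W = rng.dirichlet(np.ones(n_domains), size=n_runs)
loss = W @ rng.uniform(2.0, 4.0, size=n_domains) + rng.normal(0, 0.01, n_runs)

reg = LinearRegression().fit(W, loss)

# Search random candidate mixtures and keep the predicted-best one.
candidates = rng.dirichlet(np.ones(n_domains), size=100_000)
best = candidates[reg.predict(candidates).argmin()]
print("predicted-best mixture:", np.round(best, 3))
```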

Self-alignment with DPO Implicit Rewards
- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 40 upvotes
- sail/Llama-3-Base-8B-DICE-Iter1
  Text Generation • 8B • Updated • 8 downloads • 2 likes
- sail/Llama-3-Base-8B-DICE-Iter2
  Text Generation • 8B • Updated • 4 downloads • 3 likes
- sail/Zephyr-7B-DICE-Iter1
  Text Generation • 7B • Updated
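
DICE builds on the fact that a DPO-trained policy defines an implicit reward, r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)), which can score the policy's own samples to build preference pairs for another DPO round. A minimal sketch of scoring one response this way (assumes tokenizing prompt + response splits cleanly at the boundary; not the authors' code):

```python
# DPO implicit reward: r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)).
import torch

def sequence_logprob(model, tokenizer, prompt: str, response: str) -> float:
    """Sum of token log-probs of `response` conditioned on `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    targets = ids[0, prompt_len:]                          # response token ids
    rows = torch.arange(prompt_len - 1, ids.shape[1] - 1)  # positions predicting them
    return logp[rows, targets].sum().item()

def implicit_reward(policy, ref, tokenizer, x: str, y: str, beta: float = 0.1) -> float:
    return beta * (sequence_logprob(policy, tokenizer, x, y)
                   - sequence_logprob(ref, tokenizer, x, y))
```

Roughly, the highest- and lowest-scoring responses for a prompt become the chosen/rejected pair for the next DPO iteration.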

Efficient Process Reward Model Training via Active Learning
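
Active learning here means spending the labeling budget only on the rollouts the current process reward model (PRM) is most uncertain about. A generic uncertainty-sampling sketch of that idea (illustrative, not the paper's exact selection rule):

```python
# Generic active-learning step for PRM training: rank unlabeled rollouts by
# the PRM's uncertainty over its step-correctness predictions and label the
# most uncertain ones first.
import numpy as np

def select_for_labeling(step_probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick `budget` samples whose predicted step-correctness probabilities
    are closest to 0.5 (highest binary entropy), i.e. most uncertain."""
    p = np.clip(step_probs, 1e-6, 1 - 1e-6)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    per_sample = entropy.mean(axis=1)        # average uncertainty over steps
    return np.argsort(per_sample)[-budget:]  # indices of most uncertain samples

# Example: PRM scores for 1000 rollouts x 8 steps; label only the top 100.
scores = np.random.default_rng(0).uniform(0, 1, size=(1000, 8))
to_label = select_for_labeling(scores, budget=100)
print(to_label[:10])
```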

Sailor: Open Language Models tailored for South-East Asia (SEA), released by Sea AI Lab