sabersalehk/Llama3-SimPO

---
license: mit
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- princeton-nlp/Llama-3-Base-8B-SFT
---

This model is an aligned version of princeton-nlp/Llama-3-Base-8B-SFT. It was fine-tuned on the UltraFeedback preference dataset (HuggingFaceH4/ultrafeedback_binarized) with the Simple Preference Optimization (SimPO) loss, for a single epoch.
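For readers unfamiliar with SimPO, the sketch below shows the per-example loss as described in the SimPO paper. SimPO is reference-free: the implicit reward of a response is its length-normalized (average per-token) log-likelihood under the policy, scaled by `beta`, and a target reward margin `gamma` is subtracted before the log-sigmoid. This card does not state the `beta` or `gamma` used for training, so the defaults below are placeholders, not this model's settings. The function names are hypothetical, and the inputs are plain lists of per-token log-probabilities standing in for values that would come from the policy model.

```python
import math
from typing import Sequence


def avg_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: mean of per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("response must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def simpo_loss(
    chosen_logprobs: Sequence[float],
    rejected_logprobs: Sequence[float],
    beta: float = 2.0,   # placeholder scale; the value used for this model is not stated
    gamma: float = 1.0,  # placeholder target reward margin; also not stated
) -> float:
    """SimPO loss for one (prompt, chosen, rejected) triple.

    loss = -log sigmoid(beta * avg_logprob(y_w) - beta * avg_logprob(y_l) - gamma)

    No reference model appears anywhere: the reward is the policy's own
    average log-likelihood, so longer responses are not favored by default.
    """
    margin = (
        beta * avg_logprob(chosen_logprobs)
        - beta * avg_logprob(rejected_logprobs)
        - gamma
    )
    return -log_sigmoid(margin)


# Usage: a short, likely chosen response vs. a longer, less likely rejected one.
loss = simpo_loss([-0.5, -0.5], [-1.0, -2.0, -3.0])
print(f"SimPO loss: {loss:.4f}")
```

Averaging over tokens (rather than summing, as DPO-style log-ratios do) is what keeps the reward from growing with response length; the margin `gamma` pushes the chosen reward to exceed the rejected one by at least that amount before the loss flattens out.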