# LFM2 2.6B comparIA-votes DPO
This repository contains a fine-tuned version of the LiquidAI/LFM2-2.6B model, aligned with French user preferences from the comparia-votes dataset using Direct Preference Optimization (DPO).
## Training Details
This model is a first experiment, so it was fine-tuned for a single epoch only. Key training metrics are as follows:
- Training Time: 2 hours, 31 minutes, 5 seconds
- Training Loss: 0.6179
- Validation Loss: 0.6095
- Rewards/Accuracies: 0.6660 (share of preference pairs where the chosen response received the higher implicit reward)
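For reference, the sketch below shows how a comparable single-epoch DPO run can be set up with trl's `DPOTrainer`. The dataset path, column mapping, and hyperparameters (e.g. `beta`) are illustrative assumptions, not the exact training script behind the metrics above.

```python
# Hedged sketch of a single-epoch DPO run with trl's DPOTrainer.
# Dataset path and hyperparameters are assumptions, not the card's recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "LiquidAI/LFM2-2.6B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# DPOTrainer expects preference pairs with "prompt", "chosen", and
# "rejected" columns; "comparia-votes" is a placeholder for the real path.
train_dataset = load_dataset("comparia-votes", split="train")

config = DPOConfig(
    output_dir="lfm2-2.6b-comparia-dpo",
    num_train_epochs=1,  # single epoch, as in the run described above
    beta=0.1,            # assumed DPO temperature
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```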
## How to Use
You can load and run this model with the Hugging Face transformers library or vLLM; for details, follow the running guide on the base LiquidAI/LFM2-2.6B model card.
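As a starting point, here is a minimal transformers sketch. It assumes a recent transformers release with LFM2 support; the dtype, device settings, and sample prompt are illustrative.

```python
# Minimal sketch using the standard transformers chat-template API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monsimas/LFM2-2.6B-French-ComparIA-Votes-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",  # needs accelerate; drop for plain CPU loading
)

messages = [{"role": "user", "content": "Explique-moi le DPO en quelques phrases."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```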