---
library_name: transformers
tags:
- trl
- dpo
datasets:
- ministere-culture/comparia-votes
language:
- fr
base_model:
- LiquidAI/LFM2-2.6B
---
# LFM2 2.6B comparIA-votes DPO
This repository contains a fine-tuned version of the LiquidAI/LFM2-2.6B model, aligned with French user preferences from the ministere-culture/comparia-votes dataset using Direct Preference Optimization (DPO).
## Training Details
This model was a first experiment and was therefore fine-tuned for a single epoch; a sketch of the training setup follows the metrics below. Key training metrics:
- Training Time: 2 hours, 31 minutes, 5 seconds
- Training Loss: 0.6179
- Validation Loss: 0.6095
- Rewards/Accuracies: 0.6660
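The exact training script is not included in this repository. The following is a minimal sketch of how such a run could look with TRL's `DPOTrainer`; the hyperparameters are illustrative rather than the ones actually used, and it assumes the comparia-votes pairs have been mapped to the `prompt`/`chosen`/`rejected` format that `DPOTrainer` expects.

```python
# Hypothetical DPO setup, assuming TRL; not the actual training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "LiquidAI/LFM2-2.6B"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Assumption: the dataset has already been converted to columns
# "prompt", "chosen", "rejected" as required by DPOTrainer.
dataset = load_dataset("ministere-culture/comparia-votes", split="train")

training_args = DPOConfig(
    output_dir="lfm2-2.6b-comparia-dpo",
    num_train_epochs=1,              # single epoch, as reported above
    per_device_train_batch_size=2,   # illustrative value
    learning_rate=5e-6,              # illustrative value
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```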
## How to Use
You can load and use this model with the Hugging Face transformers library or vLLM; for more details, follow the running guide of the base LiquidAI/LFM2-2.6B model.
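A minimal transformers example is sketched below. The repository id is a placeholder to replace with this model's actual id, and the French prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/lfm2-2.6b-comparia-dpo"  # placeholder: use this repo's id
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Illustrative French prompt, formatted with the model's chat template.
messages = [
    {"role": "user", "content": "Explique la différence entre une comète et un astéroïde."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```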