---
language:
  - en
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - instruction-tuning
  - supervised-fine-tuning
  - synthetic-qa
  - lora
  - axolotl
  - deepspeed
  - transformers
  - mistral
  - nemo
  - eu-hpc
datasets:
  - axolotl_deduplicated_synthetic_qa
metrics:
  - loss
library_name: transformers
framework: pytorch
base_model: mistralai/Mistral-Nemo-Instruct-2407
model_name: mistral-12b-sft
pipeline_tag: text-generation
task_categories:
  - text-generation
  - instruction-following
model_type: AutoModelForCausalLM
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
trained_on:
  - Leonardo EuroHPC
description: >-
  Supervised fine-tuning (SFT) of Mistral 12B Nemo Instruct on synthetic QA
  data using LoRA with Axolotl and DeepSpeed. Improves conversational
  reasoning and factual accuracy.
---
# Mistral 12B — SFT (Supervised Fine-Tuning on Synthetic QA)
**Model type:** Causal Language Model
**Base model:** [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
**License:** Apache 2.0
**Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
---
## Overview
`mistral-12b-sft` is a **supervised fine-tuned** variant of Mistral-12B trained on high-quality synthetic QA data.
This SFT phase improves instruction following, factual reasoning, and conversational ability while keeping training memory-efficient by attaching LoRA adapters to an 8-bit quantized base model.
Training was conducted on **Leonardo EuroHPC**.
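
The snippet below is a minimal inference sketch with 🤗 Transformers. The repo id `your-org/mistral-12b-sft` is a placeholder for wherever this checkpoint is hosted, and the generation parameters simply mirror the defaults declared in the card metadata (`max_new_tokens=512`, `temperature=0.7`, `top_p=0.9`).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/mistral-12b-sft"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision
    device_map="auto",
)

# Build a chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what supervised fine-tuning is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters taken from the card metadata; do_sample is required
# for temperature/top_p to take effect.
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```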
---
## Training Setup
**Objective:** Supervised fine-tuning (instruction-following QA)
**Adapter:** LoRA + 8-bit base
**Precision:** bfloat16
**Hardware:** 8 × 2 × A100 64 GB
**Framework:** Axolotl + DeepSpeed + PyTorch 2.5.1 + CUDA 12.1
**Runtime:** ~6 h
**Validation split:** 30 %
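
As a rough illustration of the "LoRA + 8-bit base" setup above, the sketch below loads the base model in 8-bit with bfloat16 compute using `transformers` and `peft`. The actual run was driven by Axolotl with DeepSpeed, so treat this as an approximation rather than the training script.

```python
import torch
from peft import prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_id = "mistralai/Mistral-Nemo-Instruct-2407"

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit base weights
    torch_dtype=torch.bfloat16,                                 # bf16 compute
    device_map="auto",
)

# Enables gradient checkpointing and prepares parameters for stable k-bit training.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
```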
---
## Dataset
| Dataset | Type | Description |
|----------|------|-------------|
| `axolotl_deduplicated_synthetic_qa.jsonl` | `alpaca_chat.load_qa` | Synthetic instruction–response pairs for QA and chat fine-tuning |
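
For reference, each line of the JSONL file is a single QA record. The field names below (`question`/`answer`) are an assumption based on how Axolotl's `alpaca_chat.load_qa` prompt strategy is commonly fed, not a specification of this exact dataset.

```python
import json

# Hypothetical record shape for one line of the QA JSONL file.
example = {
    "question": "What is supervised fine-tuning?",
    "answer": "Further training of a pretrained model on labelled prompt-response pairs.",
}

with open("axolotl_deduplicated_synthetic_qa.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```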
---
## Hyperparameters
| Parameter | Value |
|------------|-------|
| Sequence length | 2048 |
| Micro batch size | 2 |
| Gradient accumulation | 2 |
| Epochs | 1 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Auto-resume | ✅ |
| Loss watchdog | threshold 5.0, patience 3 |
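
For convenience, the LoRA rows of the table translate to the following `peft` configuration. The original run set these values through Axolotl's YAML config, so this is only an equivalent sketch.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# model = get_peft_model(model, lora_config)  # attach adapters to the 8-bit base
```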
---
## Tokenizer
**Tokenizer type:** `AutoTokenizer`
**Pad token:** `<|end_of_text|>`
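
A minimal sketch of reproducing the tokenizer setup; the repo id is again a placeholder.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/mistral-12b-sft")  # placeholder repo id

# Register the pad token used during SFT; this is a no-op if the token is
# already in the vocabulary. If it actually adds a new token, remember to
# call model.resize_token_embeddings(len(tokenizer)) on the model as well.
tokenizer.add_special_tokens({"pad_token": "<|end_of_text|>"})
```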