---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Qwen3-8B-abliterated-TIES
datasets:
- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- cot
---

# Xiaolong-Qwen3-8B
|
**Xiaolong** is a small, uncensored, reasoning-focused model finetuned using [ORPO and QLoRA](https://huggingface.co/blog/mlabonne/orpo-llama-3) on top of [Qwen3-8B-abliterated-TIES](https://huggingface.co/nbeerbower/Qwen3-8B-abliterated-TIES).
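
A minimal inference sketch with 🤗 Transformers; the repo id `nbeerbower/Xiaolong-Qwen3-8B` is assumed from the title above, and the thinking-toggle comment reflects typical Qwen3 behavior rather than anything guaranteed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Xiaolong-Qwen3-8B"  # assumed repo id, taken from the title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint config
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain, step by step, why 0.1 + 0.2 != 0.3 in floating point."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    # Qwen3-family chat templates usually accept enable_thinking=True/False
    # to toggle <think> traces; check the tokenizer config before relying on it.
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```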
|
## Finetuning Details

- **Method:** ORPO
- **Epochs:** 2
- **Learning Rate:** 5e-6, cosine decay with 5% warmup
- **Batch Size:** 1 per device x 32 gradient accumulation steps (32 effective)
- **Max Grad Norm:** 0.3
- **LoRA Rank:** 64
- **Hardware:** 1x NVIDIA RTX A6000
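
For reference, here is a minimal sketch of how these settings could map onto TRL's ORPO trainer with a QLoRA setup. TRL/PEFT usage follows the linked write-up rather than this card; the LoRA alpha/dropout, target modules, dataset split, and column names are assumptions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base = "nbeerbower/Qwen3-8B-abliterated-TIES"

# QLoRA: frozen 4-bit NF4 base weights with trainable LoRA adapters on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base)

# Stand-in for the real ~9,100-sample mix: a single source from the list below,
# with the usual prompt/chosen/rejected preference columns (assumed).
train_dataset = load_dataset("nbeerbower/GreatFirewall-DPO", split="train")

peft_config = LoraConfig(
    r=64,               # LoRA rank from the card
    lora_alpha=64,      # assumption: alpha is not stated on the card
    lora_dropout=0.05,  # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # common choice, assumed
    task_type="CAUSAL_LM",
)

args = ORPOConfig(
    output_dir="xiaolong-orpo",
    num_train_epochs=2,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # 1 x 32 = 32 effective
    max_grad_norm=0.3,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
    peft_config=peft_config,
)
trainer.train()
```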
|
## Dataset Composition

~9,100 samples total; 3,000 of these use Chain of Thought reasoning. A rough sampling sketch follows the lists below.
|
* [nbeerbower/GreatFirewall-DPO](https://huggingface.co/datasets/nbeerbower/GreatFirewall-DPO)
* [nbeerbower/Schule-DPO](https://huggingface.co/datasets/nbeerbower/Schule-DPO)
* [nbeerbower/Purpura-DPO](https://huggingface.co/datasets/nbeerbower/Purpura-DPO)
* [nbeerbower/Arkhaios-DPO](https://huggingface.co/datasets/nbeerbower/Arkhaios-DPO)
* [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
* [antiven0m/physical-reasoning-dpo](https://huggingface.co/datasets/antiven0m/physical-reasoning-dpo)
* [flammenai/Date-DPO-NoAsterisks](https://huggingface.co/datasets/flammenai/Date-DPO-NoAsterisks)
* [flammenai/Prude-Phi3-DPO](https://huggingface.co/datasets/flammenai/Prude-Phi3-DPO)
* [Atsunori/HelpSteer2-DPO](https://huggingface.co/datasets/Atsunori/HelpSteer2-DPO) (1,000 samples)
* [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
* [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo)
* [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo)
|
### Chain of Thought

* [GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K) (1,000 samples)
* [nvidia/OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (1,000 samples)
* [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) (1,000 samples)
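
For illustration only, the capped sources could be subsampled with the `datasets` library as below. The `train` split name, the shuffle seed, and the uniform 1,000-row cap are assumptions, not the author's exact recipe; the nvidia reasoning sets in particular ship multiple configs/splits and would need per-source arguments:

```python
from datasets import load_dataset

def take(repo_id, n=1000, seed=42, split="train"):
    """Shuffle one source and keep the first n rows (split and seed assumed)."""
    ds = load_dataset(repo_id, split=split)
    return ds.shuffle(seed=seed).select(range(min(n, len(ds))))

# Two of the capped sources, per the counts listed above; the remaining
# preference sets are used in full and simply concatenated into the mix.
helpsteer2 = take("Atsunori/HelpSteer2-DPO")
general_thought = take("GeneralReasoning/GeneralThought-430K")
```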