--- license: apache-2.0 datasets: - N8Programs/CreativeGPT base_model: - Qwen/Qwen3-14B --- # VellumMini-0.1-Qwen3-14B Just a sneak peek of what I'm cooking in a little project called Vellum. This model was made to evaluate the quality of the CreativeGPT dataset, and how well Qwen3 trains on it. This is just one of many datasets that the final model will be trained on (which will also be using a different base model). This got pretty good results compared to the regular instruct in my testing so thought I would share. I trained for 3 epochs, but both checkpoints at 2 epoch and 3 epoch were too overbaked. This checkpoint, at 1 epoch performed best. I'm pretty surprised how decent this came out since Qwen models aren't that great at writing, especially at this size. ### Usage Use with thinking/chain-of-thought disabled. Use ChatML prompt format. Qwen suggested sampler settings are recommended. Temperature: 0.7 Top_P: 0.8 Top_K: 20 Min_P: 0 ## Quants ### GGUFs #### iMatrix These are reccommended. - bartowski - https://huggingface.co/bartowski/lemon07r_VellumMini-0.1-Qwen3-14B-GGUF - mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-i1-GGUF #### Static - mradermacher - https://huggingface.co/mradermacher/VellumMini-0.1-Qwen3-14B-GGUF - Q4_K_M Only - https://huggingface.co/lemon07r/VellumMini-0.1-Qwen3-14B-Q4_K_M-GGUF ## Special Thanks Big thanks to everyone over at the KoboldAI discord. The members there have helped me a ton with various things over the long while I've been there. ## Training Details ### Parent Model https://huggingface.co/Qwen/Qwen3-14B ### Training Method Full fine-tune - SFT ### Dataset(s) https://huggingface.co/datasets/N8Programs/CreativeGPT ### Training Hyperparameters ``` Batch size 4 Learning rate 0.00001 Number of epochs 3 Warmup ratio 0.05 Weight decay 0.02 Max gradient norm 1 Packing No ``` ### Training Results ![Screenshot_20251005_020153](https://cdn-uploads.huggingface.co/production/uploads/65751ccd1488186315b841e6/TBtH-6CD7gnbZQVdlfRpW.webp)