Nanuq-R1 14B
A sequel! The new Nanuq series serves as a testing ground for my GRPO experiments. This model is a full post-train heal of Snwy's frankenmerge between Qwen3 235B and Qwen3 8B.
It was pretrained for 2 epochs on 1B tokens of creative-writing data, then SFT'd on a lot of my own data plus Pocketdoc's instruct dataset, and finally GRPO'd on the Claude-2.7K dataset with POLARS and Verifiers, in an attempt to align it to be more like Claude.
There are a lot of things I could do differently, as the reward almost flattens out as soon as training leaves warm-up, but the model turned out pretty decent (especially considering its starting place), so I decided to release it. Hope people enjoy it!
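For anyone unfamiliar with GRPO: instead of a learned value model, it samples a group of completions per prompt and normalizes each completion's reward against its own group. Below is a minimal sketch of that group-relative advantage step, purely to illustrate the algorithm; it is not the actual Verifiers/POLARS pipeline used for this model.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """For each prompt, normalize every sampled completion's reward
    against the mean/std of its own rollout group.
    rewards: shape (num_prompts, group_size)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: one prompt, four completions scored by some verifier reward.
rewards = torch.tensor([[0.0, 0.5, 0.5, 1.0]])
print(grpo_advantages(rewards))
# Completions above the group mean get a positive advantage (reinforced),
# those below get a negative one (discouraged).
```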
Quantized Versions
Available Downloads
- GGUF Format: for use with llama.cpp & forks (coming soon! see the sketch after this list)
- EXL2 Format: for use with TabbyAPI (coming soon!)
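Once the GGUF quants are up, running the model locally should look roughly like this with the llama-cpp-python bindings. This is a sketch under assumptions: the quant filename is hypothetical, and the GGUF is assumed to embed the ChatML template described below.

```python
from llama_cpp import Llama

# Hypothetical filename; point this at whichever quant you download.
llm = Llama(model_path="Nanuq-R1-14B-Q4_K_M.gguf", n_ctx=8192)

# create_chat_completion formats messages with the chat template
# stored in the GGUF metadata, which for this model should be ChatML.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hi there!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```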
Prompting
The model has been tuned with ChatML formatting. A typical input looks like this:
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""Training
Training
The full run was 2 epochs of pretraining, 2 epochs of SFT, and finally 500 steps of GRPO using Verifiers, all on 8x H200 GPUs.
Credits
Thank you to Intervitens, Cgato, Kubernetes Bad, Snwy, Auri, Will Brown, and most of all: Kalomaze
Model tree for Delta-Vector/Nanuq-R1-14B
Base model: Qwen/Qwen3-235B-A22B-Thinking-2507