Model Information

Nanuq-R1 14B

GRPO Experiment: Q3-235B-8B Merge/Heal. Competent assistant with decent writing!

A sequel! The new Nanuq series is meant to serve as a testing ground for my GRPO experiments. This model is a full post-train heal of Snwy's Frankenmerge between Q3 235B and Q3 8B.

It was pretrained for 2 epochs on 1B tokens of creative writing data, then SFT'd on a lot of my own data plus Pocketdoc's Instruct dataset, and finally GRPO'd with the Claude-2.7K dataset using POLARS and Verifiers, in an attempt to align it to behave more like Claude.
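For readers unfamiliar with GRPO (Group Relative Policy Optimization): it samples several completions for the same prompt, scores each one, and normalizes every reward against the mean and standard deviation of its own group to get an advantage. The sketch below illustrates only that normalization step, assuming the common mean/std formulation. The function name `group_relative_advantages` and the `eps` value are hypothetical; this is not the Verifiers library's API and not the code used to train this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its own group (same prompt).

    Hypothetical helper for illustration only; eps guards against division by zero.
    """
    mu = mean(rewards)
    # Population std; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions sampled from one prompt.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))

# If every completion in a group scores the same, all advantages are zero,
# so that prompt contributes no policy-gradient signal for this step.
print(group_relative_advantages([0.7, 0.7, 0.7, 0.7]))
```

Completions that beat their group's average get a positive advantage and are reinforced; those below it are pushed down. No separate value model is needed.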

There are a lot of things I could do differently, as the reward almost falls flat as soon as training gets out of warm-up, but this model turned out pretty decent (especially considering its starting point), so I decided to release it. Hope people enjoy it!

Quantized Versions

Available Downloads

GGUF Format: For use with llama.cpp & forks (Coming soon!)
EXL2 Format: For use with TabbyAPI (Coming soon!)

Prompting

The model has been tuned with the ChatML format. A typical input looks like this:

"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""

Training

Training consisted of 2 epochs of pretraining, 2 epochs of SFT, and finally 500 steps of GRPO using Verifiers, all run on 8 x H200 GPUs.