Good work.
I want to express my appreciation for your work and exploration into non-uniform quantization.
I can't provide informative feedback. I'm not qualified to evaluate the claims. But they sure sound sensible to me!
<3
Do MXFP4 GGUFs make any sense (vs Q4_*) on hardware without native FP4 support?
Alright, I'm running the 580MB version in a live chat with several people and it's very fun. Terse. 29 t/s generation on an old Ryzen 2 laptop. Vega 8 Vulkan and CPU speeds are almost identical.
thanks! amazing!
Sorry for the late response, the holidays had me busy!
As for your question on MXFP4 GGUFs vs Q4: native support for formats like FP4 is obviously fantastic, but most of us don't have it, since true native FP4 is currently limited to high-end Nvidia GPUs, I believe. If your GPU doesn't natively support FP4, the weights are still handled correctly; the values are just dequantized in the kernels rather than executed through native FP4 hardware instructions. That adds a small overhead, but I'd pretty confidently guess it never really matters for the vast majority of people. Plus, I subscribe to the philosophy of posting not just a model but the benchmarks, because I want to trust the data, not the vibes.
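To make the dequantization point concrete, here's a minimal sketch of decoding one MXFP4 block in plain Python. It assumes the OCP Microscaling layout (32 FP4 E2M1 values sharing one power-of-two scale); the function names are illustrative and are not llama.cpp's actual kernel API, and the scale is passed as a signed exponent rather than the biased E8M0 byte used in storage.

```python
# Sketch of software MXFP4 dequantization, as done on hardware without
# native FP4 instructions. Assumes the OCP Microscaling (MX) layout:
# 32 FP4 (E2M1) values per block plus one shared power-of-two scale.
# Names are illustrative, not llama.cpp's real kernel API.

# All 16 E2M1 nibble codes (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
            -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def dequant_mxfp4_block(nibbles, scale_exp):
    """Decode one 32-element MXFP4 block.

    nibbles:   32 ints in [0, 15], each an E2M1 code
    scale_exp: shared scale exponent (scale = 2**scale_exp); real E8M0
               storage uses a biased byte, simplified here to a signed int
    """
    scale = 2.0 ** scale_exp
    return [FP4_E2M1[n] * scale for n in nibbles]

# Example: nibble 0b0111 encodes +6.0; with a block scale of 2**-2
# every element dequantizes to 1.5.
vals = dequant_mxfp4_block([0b0111] * 32, -2)
```

On native-FP4 hardware the multiply happens inside the matmul units instead; this table-lookup-and-scale loop is the overhead the kernels pay elsewhere.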
I'll hopefully have updates for this model in the near future as well. This 350M model is really easy to work with when testing. I've got a whole new architecture brewing for MagicQuant. I'm not sure the new architecture will find anything better for this model specifically, but the new MagicQuant code should hopefully make even more fun mixes.
But I'm glad you enjoy the project and this model! I was hoping someone would appreciate the near-lossless 350M model, because it was a hard quant mix to find! These tiny models can't absorb a 1% to 5% PPL delta the way larger models can; they're way more sensitive.
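For anyone unfamiliar with the metric, the PPL delta % above is just the relative perplexity change of the quant vs the base model. A quick sketch with hypothetical numbers (not real benchmarks from this model):

```python
# Illustrative only: hypothetical perplexity values showing how the
# "PPL delta %" figure is computed. Not real benchmark numbers.
def ppl_delta_pct(ppl_base, ppl_quant):
    """Relative perplexity change of a quantized model vs its base, in %."""
    return (ppl_quant - ppl_base) / ppl_base * 100.0

# A jump from 20.0 to 20.8 PPL is a ~4% delta: often tolerable on a
# large model, but usually a visible quality hit on a 350M model.
delta = ppl_delta_pct(20.0, 20.8)
```

The same absolute loss in bits-per-weight tends to produce a much larger relative PPL shift on tiny models, which is why the near-lossless mix was hard to find.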
Thanks again, I'm glad you enjoy it!
I really enjoyed testing out your quants - when are new releases coming? I keep checking every few days but you haven't posted in 3 weeks ;(
I can only do this on the side, so I don't have a hard deadline yet. But I'm hoping for mid-to-late January, and at the latest sometime in February.