I would be interested in some more NVFP4 Quants

#2
by DigitalSpellcaster - opened

Thanks for the quant. I am interested in exploring more.

Are there any specific models you're interested in that don't currently have NVFP4 quants?

First off, thank you for even entertaining this request, you're an absolute legend. I appreciate you helping me unlock the power of this Blackwell silicon (5060 Ti 16GB).

To avoid quality loss, I’d like these to be quantized directly from the source weights. Here are the repos:

❖ 21B MoE: DavidAU/L3.2-8X4B-MOE-V2-Dark-Champion-Inst-21B-uncen-ablit

❖ 16.5B Doom: DavidAU/L3-DARKEST-PLANET-16.5B

❖ 16B Heretic: DavidAU/L3-Darkest-Planet-16B-HERETIC-Uncensored-Abliterated

For the technical side, I’m hoping for:
❖ Format: E2M1 (Standard NVFP4).
❖ Block Size: 16 (for that Blackwell sweet spot).
❖ Scaling: If you can do FP8 (E4M3) micro-scaling with an FP32 tensor scale, that would be the dream.
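
For anyone following along, here is a rough sketch of what those three settings mean numerically. It is a pure-Python simulation, not the code of any particular quantization tool. The function names (`quantize_nvfp4`, `dequantize_nvfp4`, `round_to_e4m3`, `round_to_e2m1`) are made up for illustration, and the scale convention (tensor scale = amax / (6 × 448)) is an assumption based on one common way of setting up NVFP4. Real toolchains pack the 4-bit codes and may handle ties and calibration differently.

```python
import math
import random

# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value


def round_to_e4m3(x):
    """Simulate rounding a non-negative float to the nearest FP8 E4M3 value."""
    x = min(max(x, 0.0), E4M3_MAX)
    if x == 0.0:
        return 0.0
    # floor(log2(x)), clamped to the smallest normal exponent (-6).
    exp = max(math.frexp(x)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 steps per power of two
    return round(x / step) * step


def round_to_e2m1(x):
    """Snap x to the nearest E2M1 value (ties go to the smaller magnitude here)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def quantize_nvfp4(weights, block_size=16):
    """Return (fp4 values, per-block E4M3 scales, FP32-style tensor scale)."""
    if len(weights) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    amax = max(abs(w) for w in weights)
    # Assumed convention: the largest block scale lands on the E4M3 maximum.
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    codes, block_scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        block_amax = max(abs(w) for w in block)
        scale = round_to_e4m3(block_amax / E2M1_MAX / tensor_scale)
        block_scales.append(scale)
        denom = scale * tensor_scale
        codes.extend(round_to_e2m1(w / denom) if denom > 0 else 0.0 for w in block)
    return codes, block_scales, tensor_scale


def dequantize_nvfp4(codes, block_scales, tensor_scale, block_size=16):
    """Reconstruct approximate weights from codes and the two scale levels."""
    return [c * block_scales[i // block_size] * tensor_scale
            for i, c in enumerate(codes)]


# Small demo on synthetic weights (not taken from any of the models above).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(64)]
codes, scales, tscale = quantize_nvfp4(weights)
restored = dequantize_nvfp4(codes, scales, tscale)
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The point of the two-level scaling is that each group of 16 weights gets its own cheap 8-bit scale, while the single tensor-wide scale keeps those block scales inside the range E4M3 can represent.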

Thanks in advance for anything you can do, and just for reading my request.

I ran the two that I could.

https://huggingface.co/Firworks/L3-Darkest-Planet-16B-HERETIC-Uncensored-Abliterated-nvfp4
https://huggingface.co/Firworks/L3-DARKEST-PLANET-16.5B-nvfp4

The model cards made them sound pretty specialized and strange. I wasn't sure of exactly the right way to run them, but I did test both in vLLM to verify that they respond with coherent text. I was getting ~20 tok/s on the DGX Spark. You should see faster speeds than that on the 5060 Ti.
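
As a back-of-the-envelope check on whether these fit on a 16GB card, the weight footprint can be estimated from the format itself: 4 bits per value plus one 8-bit scale shared by every 16 values. The helper name `nvfp4_weight_bytes` is hypothetical, the parameter counts are simply read off the model names, and the estimate assumes every parameter is quantized. In practice some layers (embeddings, output head) are often left in higher precision, and the KV cache and activations need memory on top of this.

```python
def nvfp4_weight_bytes(num_params, block_size=16, scale_bits=8, value_bits=4):
    """Approximate weight storage in bytes, ignoring the tiny per-tensor scales."""
    bits_per_weight = value_bits + scale_bits / block_size  # 4 + 8/16 = 4.5
    return num_params * bits_per_weight / 8


# Parameter counts assumed from the model names (16B and 16.5B).
for name, params in (("16B", 16e9), ("16.5B", 16.5e9)):
    print(f"{name}: ~{nvfp4_weight_bytes(params) / 1e9:.2f} GB")
# prints:
# 16B: ~9.00 GB
# 16.5B: ~9.28 GB
```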

I couldn't run DavidAU/L3.2-8X4B-MOE-V2-Dark-Champion-Inst-21B-uncen-ablit, as I don't see it in a non-GGUF format.

Thank you very much! Having such high-fidelity models in that specific format is going to sing on my Blackwell GPU. I really appreciate the work you've done. I understand if you couldn't find the final one; I'll do some digging.
