# Model Card for ealexeev/TheDrummer-Valkyrie-49B-v2.1-NVFP4
This is an NVFP4 quantization of TheDrummer/Valkyrie-49B-v2.1.
## Quantization Details

Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 1024 samples

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8
These were shuffled and mixed at a ratio of 3:2:3.
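A minimal sketch of how a 3:2:3 split over 1024 samples works out per source (the allocation helper below is illustrative, not the actual script's logic):

```python
def mix_counts(total, weights):
    """Split `total` samples across sources proportionally to integer `weights`."""
    denom = sum(weights.values())
    counts = {name: total * w // denom for name, w in weights.items()}
    # Hand any remainder from integer division to the heaviest source.
    remainder = total - sum(counts.values())
    counts[max(weights, key=weights.get)] += remainder
    return counts

weights = {
    "HuggingFaceH4/ultrachat_200k": 3,
    "allenai/c4_en": 2,
    "mrcedric98/fiction_books_v8": 3,
}
print(mix_counts(1024, weights))  # 384 / 256 / 384 samples
```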
## Procedure

```bash
python ./quantize_nvfp4.py --model TheDrummer/Valkyrie-49B-v2.1 --output ./TheDrummer/Valkyrie-49B-v2.1 --size 1024 --seed 42 --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran quants at 128, 256, 512, and 1024 samples; this 1024-sample version hit the sweet spot on the evals below.
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 0.596 | 0.582 | -2.300% |
| IFEval (Strict Instruction Following) | 0.724 | 0.717 | -1.000% |
| HellaSwag (Flow/Common Sense) | 0.633 | 0.645 | +1.900% |
| Lambada (Perplexity) | 2.986 | 2.955 | -1.000% |
| WikiText (Perplexity) | 8.704 | 8.2185 | -5.600% |
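The Delta column is the relative change of the quantized score versus the BF16 base, which can be rechecked from the raw numbers in the table:

```python
def rel_delta(base, quant):
    """Relative change of the quantized metric vs. the BF16 base, in percent."""
    return (quant - base) / base * 100

# Values taken from the eval table above.
print(round(rel_delta(0.596, 0.582), 1))   # ARC Challenge: -2.3
print(round(rel_delta(8.704, 8.2185), 1))  # WikiText perplexity: -5.6
```

Note that for the perplexity rows a negative delta is an improvement (lower perplexity is better).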
## Bias, Risks, and Limitations

This is already a creative fine-tune, and it was quantized with that use case in mind. It probably won't pass any leet-coder challenges.
## How To Use

```bash
# --tensor-parallel-size 1: run on a single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims nearly all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Valkyrie-49B-v2.1-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
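Once the server is up, it exposes an OpenAI-compatible API. A sketch of a chat-completion request body (the prompt and sampling settings are placeholders, not recommendations from the card):

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "ealexeev/TheDrummer-Valkyrie-49B-v2.1-NVFP4",
    "messages": [
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Open a scene aboard a storm-tossed airship."},
    ],
    "temperature": 0.8,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions (vLLM's default port).
```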
## Model Tree

- Base model: nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
- Fine-tuned as: TheDrummer/Valkyrie-49B-v2.1
- Quantized here as: ealexeev/TheDrummer-Valkyrie-49B-v2.1-NVFP4