# Model Card for ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4
This is an NVFP4 quantization of TheDrummer/Skyfall-31B-v4.1.
## Quantization Details

Quantized with the script at https://github.com/ealexeev/llm-quantization.
Calibration dataset size: 1024

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8

These were shuffled and mixed at a ratio of 3:2:3.
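A minimal sketch of what a 3:2:3 mix looks like, using toy stand-in lists for the three sources (the actual script may sample differently; function and variable names here are illustrative only):

```python
import random

def mix_and_shuffle(sources, ratios, total, seed=42):
    """Draw samples from each source proportionally to `ratios`, then shuffle.

    sources: dict of name -> list of samples; ratios: dict of name -> weight.
    """
    weight_sum = sum(ratios.values())
    mixed = []
    for name, samples in sources.items():
        n = round(total * ratios[name] / weight_sum)
        mixed.extend(samples[:n])
    rng = random.Random(seed)
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the three calibration sources.
sources = {
    "ultrachat": [f"chat-{i}" for i in range(600)],
    "c4_en": [f"c4-{i}" for i in range(600)],
    "fiction": [f"fic-{i}" for i in range(600)],
}
calib = mix_and_shuffle(sources, {"ultrachat": 3, "c4_en": 2, "fiction": 3}, total=1024)
print(len(calib))  # 1024
```

With a 3:2:3 ratio and 1024 total samples, this yields 384 / 256 / 384 samples from the three sources respectively.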
### Procedure

```bash
python ./quantize_nvfp4.py --model TheDrummer/Skyfall-31B-v4.1 \
  --output ./TheDrummer/Skyfall-31B-v4.1 \
  --size 1024 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran multiple quants with 128, 256, and 512 samples; this 1024-sample version hit the sweet spot on these particular evals.
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 0.6766 | 0.6561 | -3.03% |
| IFEval (Strict Instruction Following) | 0.4780 | 0.4732 | -1.00% |
| HellaSwag (Flow/Common Sense) | 0.8396 | 0.8298 | -1.17% |
| Winogrande (Ambiguity Resolution) | 2.759 | 2.881 | +4.42% |
| Lambada (Perplexity) | 7.553 | 8.365 | +10.75% |

Note: the Winogrande and Lambada rows report perplexity-style scores where lower is better, so their positive deltas likewise indicate mild degradation.
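The deltas above are plain percentage changes of the quantized score relative to the BF16 baseline. A quick sketch of the computation, with the values copied from the table:

```python
# Percentage change of each quantized score relative to the BF16 baseline.
# (base, quantized) pairs copied from the eval table above.
scores = {
    "ARC Challenge": (0.6766, 0.6561),
    "IFEval": (0.4780, 0.4732),
    "HellaSwag": (0.8396, 0.8298),
    "Winogrande": (2.759, 2.881),
    "Lambada": (7.553, 8.365),
}

for name, (base, quant) in scores.items():
    delta = (quant - base) / base * 100
    print(f"{name}: {delta:+.2f}%")  # e.g. "ARC Challenge: -3.03%"
```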
## Bias, Risks, and Limitations

This is already a creative fine-tune, and it was quantized with that use case in mind. It's probably not going to pass any leet-coder challenges.
## How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims all remaining VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
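Once the server is up, it exposes an OpenAI-compatible API (by default on port 8000). A minimal sketch of a chat-completion request body you could send to it; the prompt and sampling parameters are illustrative only:

```python
import json

# JSON body for a chat completion request to the vLLM OpenAI-compatible
# endpoint (default: http://localhost:8000/v1/chat/completions).
payload = {
    "model": "ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4",
    "messages": [
        {"role": "user", "content": "Write the opening line of a storm-at-sea scene."}
    ],
    "max_tokens": 128,
    "temperature": 0.8,
}
print(json.dumps(payload, indent=2))
```

Pipe the printed JSON to `curl -d @- http://localhost:8000/v1/chat/completions -H "Content-Type: application/json"`, or use any OpenAI-compatible client library pointed at the same base URL.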
## Model Tree

Model tree for ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4:

- mistralai/Mistral-Small-3.1-24B-Base-2503 (base model)
- mistralai/Magistral-Small-2509 (finetuned)
- TheDrummer/Skyfall-31B-v4.1 (finetuned)