# Model Card for ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4
This is an NVFP4 quantization of TheDrummer/Skyfall-31B-v4.1.
## Quantization Details

Quantized with the script at https://github.com/ealexeev/llm-quantization.
Calibration dataset size: 1024

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8

These were shuffled and mixed at a ratio of 3:2:3.
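A minimal sketch of what a 3:2:3 mix looks like, using toy stand-in lists for the three sources (the actual script may sample differently; function and variable names here are illustrative only):

```python
import random

def mix_and_shuffle(sources, ratios, total, seed=42):
    """Draw samples from each source proportionally to `ratios`, then shuffle.

    sources: dict of name -> list of samples; ratios: dict of name -> weight.
    """
    weight_sum = sum(ratios.values())
    mixed = []
    for name, samples in sources.items():
        n = round(total * ratios[name] / weight_sum)
        mixed.extend(samples[:n])
    rng = random.Random(seed)
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the three calibration sources.
sources = {
    "ultrachat": [f"chat-{i}" for i in range(600)],
    "c4_en": [f"c4-{i}" for i in range(600)],
    "fiction": [f"fic-{i}" for i in range(600)],
}
calib = mix_and_shuffle(sources, {"ultrachat": 3, "c4_en": 2, "fiction": 3}, total=1024)
print(len(calib))  # 1024
```

With a 3:2:3 ratio and 1024 total samples, this yields 384 / 256 / 384 samples from the three sources respectively.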
### Procedure

```bash
python ./quantize_nvfp4.py --model TheDrummer/Skyfall-31B-v4.1 \
  --output ./TheDrummer/Skyfall-31B-v4.1 \
  --size 1024 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran multiple quants with 128, 256, and 512 samples; this 1024-sample version hit the sweet spot on these particular evals.
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 0.6766 | 0.6561 | -3.03% |
| IFEval (Strict Instruction Following) | 0.4780 | 0.4732 | -1.00% |
| HellaSwag (Flow/Common Sense) | 0.8396 | 0.8298 | -1.17% |
| Winogrande (Ambiguity Resolution) | 2.759 | 2.881 | +4.42% |
| Lambada (Perplexity) | 7.553 | 8.365 | +10.75% |

Note: the Winogrande and Lambada rows report perplexity-style scores where lower is better, so their positive deltas likewise indicate mild degradation.
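The deltas above are plain percentage changes of the quantized score relative to the BF16 baseline. A quick sketch of the computation, with the values copied from the table:

```python
# Percentage change of each quantized score relative to the BF16 baseline.
# (base, quantized) pairs copied from the eval table above.
scores = {
    "ARC Challenge": (0.6766, 0.6561),
    "IFEval": (0.4780, 0.4732),
    "HellaSwag": (0.8396, 0.8298),
    "Winogrande": (2.759, 2.881),
    "Lambada": (7.553, 8.365),
}

for name, (base, quant) in scores.items():
    delta = (quant - base) / base * 100
    print(f"{name}: {delta:+.2f}%")  # e.g. "ARC Challenge: -3.03%"
```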
## Bias, Risks, and Limitations

This is already a creative fine-tune, and it was quantized with that use case in mind. It's probably not going to pass any leet-coder challenges.
## How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims all remaining VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
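Once the server is up, it exposes an OpenAI-compatible API (by default on port 8000). A minimal sketch of a chat-completion request body you could send to it; the prompt and sampling parameters are illustrative only:

```python
import json

# JSON body for a chat completion request to the vLLM OpenAI-compatible
# endpoint (default: http://localhost:8000/v1/chat/completions).
payload = {
    "model": "ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4",
    "messages": [
        {"role": "user", "content": "Write the opening line of a storm-at-sea scene."}
    ],
    "max_tokens": 128,
    "temperature": 0.8,
}
print(json.dumps(payload, indent=2))
```

Pipe the printed JSON to `curl -d @- http://localhost:8000/v1/chat/completions -H "Content-Type: application/json"`, or use any OpenAI-compatible client library pointed at the same base URL.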
## Model Tree

Model tree for ealexeev/TheDrummer-Skyfall-31B-v4.1-NVFP4:

- mistralai/Mistral-Small-3.1-24B-Base-2503 (base model)
- mistralai/Magistral-Small-2509 (finetuned)
- TheDrummer/Skyfall-31B-v4.1 (finetuned)