# Model Card for ealexeev/TheDrummer-Cydonia-24B-v4.3-NVFP4
This is an NVFP4 quantization of TheDrummer/Cydonia-24B-v4.3.
## Quantization Details
Quantized with the script from https://github.com/ealexeev/llm-quantization.
Calibration dataset size: 512

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8

These datasets were shuffled and mixed at a ratio of 3:2:3.
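The mixing step can be sketched as a weighted, seeded draw followed by a shuffle. This is a minimal illustration, not the actual selection logic in the quantization script; the toy stand-in datasets and the `mix_calibration` helper are hypothetical:

```python
import random

def mix_calibration(sources: dict, weights: dict, total: int, seed: int = 42) -> list:
    """Draw `total` samples from `sources` in proportion to `weights`, then shuffle."""
    rng = random.Random(seed)
    weight_sum = sum(weights.values())
    mixed = []
    for name, samples in sources.items():
        share = round(total * weights[name] / weight_sum)  # e.g. 512 * 3/8 = 192
        mixed.extend(rng.sample(samples, share))
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the three real datasets:
sources = {
    "ultrachat": [f"chat-{i}" for i in range(1000)],
    "c4_en": [f"web-{i}" for i in range(1000)],
    "fiction_v8": [f"book-{i}" for i in range(1000)],
}
calib = mix_calibration(sources, {"ultrachat": 3, "c4_en": 2, "fiction_v8": 3}, total=512)
```

With a 3:2:3 weighting over 512 samples, this yields 192/128/192 samples per source.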
### Procedure
```shell
python ./quantize_nvfp4.py \
  --model TheDrummer/Cydonia-24B-v4.3 \
  --output ./TheDrummer/Cydonia-24B-v4.3-NVFP4 \
  --size 512 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
I ran a grid search over calibration sample counts (32, 64, 128, 256, 512, 1024, 4096). While lower counts (128/256) improved instruction following, they significantly degraded the model's handling of nuance (Winogrande). The 512-sample version was the only one that fully recovered the base model's ambiguity-resolution capability, making it the best choice for this creative/roleplay fine-tune.
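The sweep amounts to re-running the quantization command once per calibration size. A sketch of the invocations it generates, reusing the flags from the procedure above (the per-size output directory suffix is hypothetical):

```python
SIZES = [32, 64, 128, 256, 512, 1024, 4096]

def sweep_commands(model: str = "TheDrummer/Cydonia-24B-v4.3") -> list:
    """Build one quantize_nvfp4.py invocation per calibration sample count."""
    cmds = []
    for n in SIZES:
        cmds.append([
            "python", "./quantize_nvfp4.py",
            "--model", model,
            "--output", f"./TheDrummer/Cydonia-24B-v4.3-NVFP4-{n}",
            "--size", str(n), "--seed", "42",
            "--ultra_chat", "3", "--c4_en", "2", "--fiction_v8", "3",
        ])
    return cmds

for cmd in sweep_commands():
    print(" ".join(cmd))  # run with subprocess.run(cmd, check=True) when ready
```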
## Quantization Evals
| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| Winogrande (Ambiguity Resolution) | 77.19% | 77.27% | +0.08% |
| HellaSwag (Flow/Common Sense) | 84.19% | 83.16% | -1.03% |
| Lambada (Perplexity) | 2.78 | 2.88 | +0.10 |
| IFEval (Strict Instruction Following) | 53.97% | 53.23% | -0.74% |
| ARC Challenge (Logic/Reasoning) | 68.43% | 65.78% | -2.65% |
Note: the quantized model nearly matches the base model on nuance and perplexity (the "vibes" metrics), trading off a small amount of raw logic.
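As a quick sanity check, the Delta column follows directly from the two measurement columns (values copied from the table above; percentage deltas are in percentage points):

```python
# (metric, base BF16, NVFP4 quantized)
ROWS = [
    ("Winogrande", 77.19, 77.27),
    ("HellaSwag", 84.19, 83.16),
    ("Lambada (perplexity)", 2.78, 2.88),
    ("IFEval", 53.97, 53.23),
    ("ARC Challenge", 68.43, 65.78),
]

# Delta = quantized minus base, rounded to two decimals as in the table.
deltas = {name: round(quant - base, 2) for name, base, quant in ROWS}
```

Note that for Lambada, lower is better, so the +0.10 delta is a slight regression like the others.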
## Bias, Risks, and Limitations
This is a quantization of a creative/roleplay fine-tune. It is optimized for fiction, dialogue, and similar generation; it will not follow instructions to the letter or win your coding competition for you.
## How To Use
```shell
# Fits on a single 24GB+ GPU; 0.8 utilization leaves headroom for the KV cache.
vllm serve ealexeev/TheDrummer-Cydonia-24B-v4.3-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
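Once vLLM is serving, the model speaks the OpenAI-compatible API. A minimal client sketch; the port assumes vLLM's default of 8000, and the prompt and sampling parameters are illustrative:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Assemble a chat-completions request body for the served model."""
    return {
        "model": "ealexeev/TheDrummer-Cydonia-24B-v4.3-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.8,  # creative model; a warmer default suits its use case
    }

def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send one request to the vLLM OpenAI-compatible server and return the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Describe a rainy harbor town in two sentences."))
```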
## Base Model
mistralai/Mistral-Small-3.1-24B-Base-2503