# MN-Violet-Lotus-12B-AWQ
AWQ 4-bit quantized version of FallenMerick/MN-Violet-Lotus-12B — a Mistral-based 12B finetune for creative writing and emotionally rich roleplay.
| Spec | Value |
|---|---|
| Architecture | Mistral 12B |
| Parameters | 12.2B (not 3B — the Hugging Face parameter badge is inferred from the AWQ-compressed checkpoint and under-reports the true count) |
| Quantization | AWQ W4A16 (4-bit weights, 16-bit activations) |
| Format | compressed-tensors / safetensors |
| Base model | FallenMerick/MN-Violet-Lotus-12B |
| VRAM required | ~8GB (fits on 1× RTX 3080) |
## Serving with vLLM

```shell
pip install vllm compressed-tensors

python3 -m vllm.entrypoints.openai.api_server \
    --model meganovaai/MN-Violet-Lotus-12B-AWQ \
    --served-model-name MN-Violet-Lotus-12B \
    --gpu-memory-utilization 0.8 \
    --dtype half \
    --quantization compressed-tensors \
    --port 8000
```
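Once the server is up, it exposes the standard OpenAI-compatible chat-completions endpoint. A minimal stdlib-only client sketch, assuming the server from the command above is running on `localhost:8000` (the sampling parameters are illustrative, not recommendations):

```python
import json
import urllib.request

# Chat-completions payload; "model" must match --served-model-name.
payload = {
    "model": "MN-Violet-Lotus-12B",
    "messages": [
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write the opening line of a gothic short story."},
    ],
    "max_tokens": 256,
    "temperature": 0.8,
}

def chat(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the request to the vLLM server and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# reply = chat(payload)
# print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the official `openai` Python package pointed at `base_url="http://localhost:8000/v1"`) works the same way.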
## Quantization

Quantization was applied via `llmcompressor` using the AWQ W4A16 scheme, calibrated on 64 samples from ultrachat_200k; `lm_head` was excluded from quantization and kept in full precision.
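A reproduction sketch of that one-shot AWQ flow with `llmcompressor`. The dataset name, calibration sample count, and `lm_head` exclusion mirror the card; the exact `llmcompressor` API surface can vary by version, and running this requires a GPU plus a full download of the base model:

```python
# Parameters taken from the model card above.
NUM_CALIBRATION_SAMPLES = 64
SCHEME = "W4A16"          # 4-bit weights, 16-bit activations
IGNORE = ["lm_head"]      # keep the output head unquantized

def quantize():
    # Heavy imports deferred: llmcompressor pulls in torch/transformers.
    from llmcompressor import oneshot
    from llmcompressor.modifiers.awq import AWQModifier

    # One-shot AWQ over all Linear layers except the ignored modules.
    recipe = AWQModifier(targets="Linear", scheme=SCHEME, ignore=IGNORE)
    oneshot(
        model="FallenMerick/MN-Violet-Lotus-12B",
        dataset="ultrachat_200k",
        recipe=recipe,
        num_calibration_samples=NUM_CALIBRATION_SAMPLES,
        max_seq_length=2048,  # assumed calibration sequence length
        output_dir="MN-Violet-Lotus-12B-AWQ",
    )

if __name__ == "__main__":
    quantize()
```

The resulting checkpoint is saved in compressed-tensors format, which is what the `--quantization compressed-tensors` flag in the serving command expects.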
## Credits
- Original model: FallenMerick/MN-Violet-Lotus-12B
- Quantized by: Meganova AI