# MN-Violet-Lotus-12B-AWQ

AWQ 4-bit quantized version of FallenMerick/MN-Violet-Lotus-12B — a Mistral-based 12B finetune for creative writing and emotionally rich roleplay.

| Spec | Value |
|---|---|
| Architecture | Mistral 12B |
| Parameters | 12.2B (not 3B; the HF size badge misreads the AWQ compressed size) |
| Quantization | AWQ W4A16 (4-bit weights, 16-bit activations) |
| Format | compressed-tensors / safetensors |
| Base model | FallenMerick/MN-Violet-Lotus-12B |
| VRAM required | ~8 GB (fits on 1× RTX 3080) |
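The ~8 GB figure follows from simple arithmetic on the numbers above (a back-of-envelope sketch; the overhead estimate is an assumption, not a measurement):

```python
# Rough VRAM estimate for a 12.2B-parameter model with 4-bit (W4A16) weights.
params = 12.2e9
bits_per_weight = 4

weight_gib = params * bits_per_weight / 8 / 2**30
print(f"weights alone: {weight_gib:.1f} GiB")  # ~5.7 GiB

# 16-bit activations, the KV cache, and the CUDA context add a couple of
# GiB on top, which is why the card budgets ~8 GB and the vLLM command
# below caps usage with --gpu-memory-utilization 0.8.
```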

## Serving with vLLM

```shell
pip install vllm compressed-tensors

python3 -m vllm.entrypoints.openai.api_server \
    --model meganovaai/MN-Violet-Lotus-12B-AWQ \
    --served-model-name MN-Violet-Lotus-12B \
    --gpu-memory-utilization 0.8 \
    --dtype half \
    --quantization compressed-tensors \
    --port 8000
```
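Once the server is up it speaks the OpenAI chat-completions protocol. A minimal stdlib-only client sketch (the prompt and sampling settings are illustrative, not recommendations from the card):

```python
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local vLLM server."""
    payload = {
        "model": "MN-Violet-Lotus-12B",  # matches --served-model-name above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,               # illustrative sampling settings
        "temperature": 0.8,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server launched above to be running):
# print(chat("Write a two-line poem about a violet lotus."))
```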

## Quantization

Applied via llmcompressor with the AWQ W4A16 scheme, calibrated on 64 samples from ultrachat_200k; lm_head is excluded from quantization.
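A sketch of how such a quantization is typically produced with llmcompressor, following its documented one-shot AWQ workflow. The scheme, calibration dataset, sample count, and `lm_head` exclusion come from the card; the sequence length, output path, and other arguments are assumptions and may differ from what was actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "FallenMerick/MN-Violet-Lotus-12B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 4-bit weights, 16-bit activations; skip the output projection.
recipe = AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="ultrachat_200k",      # calibration data named in the card
    recipe=recipe,
    max_seq_length=2048,           # assumed, not stated in the card
    num_calibration_samples=64,
)

model.save_pretrained("MN-Violet-Lotus-12B-AWQ", save_compressed=True)
tokenizer.save_pretrained("MN-Violet-Lotus-12B-AWQ")
```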

## Credits

- Base model: FallenMerick/MN-Violet-Lotus-12B
