# MN-Violet-Lotus-12B-AWQ

AWQ 4-bit quantized version of FallenMerick/MN-Violet-Lotus-12B — a Mistral-based 12B finetune for creative writing and emotionally rich roleplay.

| Spec | Value |
|---|---|
| Architecture | Mistral 12B |
| Parameters | 12.2B (not 3B; the HF size badge misreads the AWQ compressed size) |
| Quantization | AWQ W4A16 (4-bit weights, 16-bit activations) |
| Format | compressed-tensors / safetensors |
| Base model | FallenMerick/MN-Violet-Lotus-12B |
| VRAM required | ~8 GB (fits on 1× RTX 3080) |
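The ~8 GB figure follows from simple arithmetic on the numbers above (a back-of-envelope sketch; the overhead estimate is an assumption, not a measurement):

```python
# Rough VRAM estimate for a 12.2B-parameter model with 4-bit (W4A16) weights.
params = 12.2e9
bits_per_weight = 4

weight_gib = params * bits_per_weight / 8 / 2**30
print(f"weights alone: {weight_gib:.1f} GiB")  # ~5.7 GiB

# 16-bit activations, the KV cache, and the CUDA context add a couple of
# GiB on top, which is why the card budgets ~8 GB and the vLLM command
# below caps usage with --gpu-memory-utilization 0.8.
```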

## Serving with vLLM

```shell
pip install vllm compressed-tensors

python3 -m vllm.entrypoints.openai.api_server \
    --model meganovaai/MN-Violet-Lotus-12B-AWQ \
    --served-model-name MN-Violet-Lotus-12B \
    --gpu-memory-utilization 0.8 \
    --dtype half \
    --quantization compressed-tensors \
    --port 8000
```
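Once the server is up it speaks the OpenAI chat-completions protocol. A minimal stdlib-only client sketch (the prompt and sampling settings are illustrative, not recommendations from the card):

```python
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local vLLM server."""
    payload = {
        "model": "MN-Violet-Lotus-12B",  # matches --served-model-name above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,               # illustrative sampling settings
        "temperature": 0.8,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server launched above to be running):
# print(chat("Write a two-line poem about a violet lotus."))
```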

## Quantization

Applied via llmcompressor with the AWQ W4A16 scheme, calibrated on 64 samples from ultrachat_200k; lm_head is excluded from quantization.
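A sketch of how such a quantization is typically produced with llmcompressor, following its documented one-shot AWQ workflow. The scheme, calibration dataset, sample count, and `lm_head` exclusion come from the card; the sequence length, output path, and other arguments are assumptions and may differ from what was actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "FallenMerick/MN-Violet-Lotus-12B"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 4-bit weights, 16-bit activations; skip the output projection.
recipe = AWQModifier(targets=["Linear"], scheme="W4A16", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="ultrachat_200k",      # calibration data named in the card
    recipe=recipe,
    max_seq_length=2048,           # assumed, not stated in the card
    num_calibration_samples=64,
)

model.save_pretrained("MN-Violet-Lotus-12B-AWQ", save_compressed=True)
tokenizer.save_pretrained("MN-Violet-Lotus-12B-AWQ")
```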

## Credits

- Base model: FallenMerick/MN-Violet-Lotus-12B
