# L3.3-70B-Euryale-v2.3-AWQ

AWQ 4-bit quantized version of Sao10K/L3.3-70B-Euryale-v2.3 — a Llama 3.3 70B finetune for structured, immersive long-form roleplay.

| Spec | Value |
|---|---|
| Architecture | Llama 3.3 70B |
| Parameters | 70.6B (not 11B; Hugging Face misreads the AWQ-compressed size) |
| Quantization | AWQ W4A16 (4-bit weights, 16-bit activations) |
| Format | compressed-tensors / safetensors |
| Base model | meta-llama/Llama-3.3-70B-Instruct → Sao10K/L3.3-70B-Euryale-v2.3 |
| VRAM required | ~38GB (fits on 4× RTX 3080 or 2× RTX 3090) |
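The ~38GB figure can be sanity-checked with back-of-the-envelope arithmetic (a sketch; the gap between raw weight size and practical VRAM is an assumption about overhead, not a measured breakdown):

```python
params = 70.6e9          # parameter count from the spec table
bits_per_weight = 4      # AWQ W4A16 stores weights in 4 bits

weights_gb = params * bits_per_weight / 8 / 1e9
print(f"raw 4-bit weights: {weights_gb:.1f} GB")  # ≈ 35.3 GB

# Quantization scales/zero-points, the unquantized lm_head, and runtime
# buffers (KV cache, activations) account for the rest of the ~38 GB.
```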

## Serving with vLLM

```shell
pip install vllm compressed-tensors
```

```shell
python3 -m vllm.entrypoints.openai.api_server \
    --model meganovaai/L3.3-70B-Euryale-v2.3-AWQ \
    --served-model-name L3.3-70B-Euryale-v2.3 \
    --gpu-memory-utilization 0.9 \
    --dtype half \
    --quantization compressed-tensors \
    --tensor-parallel-size 4 \
    --port 8000
```
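Once the server is up, it exposes vLLM's OpenAI-compatible chat API. A minimal request sketch using only the standard library (the prompt and sampling values are illustrative, not recommendations):

```python
import json
from urllib import request

# Chat-completion payload; "model" must match --served-model-name above.
payload = {
    "model": "L3.3-70B-Euryale-v2.3",
    "messages": [{"role": "user", "content": "Introduce yourself in character."}],
    "max_tokens": 256,    # illustrative sampling settings
    "temperature": 0.8,
}

req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# With the server running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```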

## Quantization

Quantization was applied via llmcompressor using the AWQ W4A16 scheme, calibrated on 64 samples from ultrachat_200k. The lm_head layer is excluded from quantization.

## Credits

Original finetune by Sao10K; base model meta-llama/Llama-3.3-70B-Instruct by Meta.
