# L3.3-70B-Euryale-v2.3-AWQ
AWQ 4-bit quantized version of Sao10K/L3.3-70B-Euryale-v2.3 — a Llama 3.3 70B finetune for structured, immersive long-form roleplay.
| Spec | Value |
|---|---|
| Architecture | Llama 3.3 70B |
| Parameters | 70.6B (not 11B; the HF size widget misreads the AWQ-compressed weights) |
| Quantization | AWQ W4A16 (4-bit weights, 16-bit activations) |
| Format | compressed-tensors / safetensors |
| Base model | meta-llama/Llama-3.3-70B-Instruct → Sao10K/L3.3-70B-Euryale-v2.3 |
| VRAM required | ~38GB (fits on 4× RTX 3080 or 2× RTX 3090) |
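The ~38GB figure is consistent with a back-of-envelope estimate: 70.6B parameters at 4 bits each come to about 33 GiB of weights before KV cache and runtime overhead. A quick check (the overhead beyond the weights is not broken down in this card):

```python
# Rough VRAM estimate for the 4-bit weights alone; KV cache, activations,
# and the unquantized lm_head add the rest of the ~38GB total.
params = 70.6e9
weight_gib = params * 4 / 8 / 2**30   # 4 bits per parameter -> bytes -> GiB
print(f"{weight_gib:.1f} GiB")        # ~32.9 GiB for weights alone
```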
## Serving with vLLM

```shell
pip install vllm compressed-tensors

python3 -m vllm.entrypoints.openai.api_server \
  --model meganovaai/L3.3-70B-Euryale-v2.3-AWQ \
  --served-model-name L3.3-70B-Euryale-v2.3 \
  --gpu-memory-utilization 0.9 \
  --dtype half \
  --quantization compressed-tensors \
  --tensor-parallel-size 4 \
  --port 8000
```
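Once the server is up, it exposes the standard OpenAI-compatible chat endpoint. A minimal stdlib-only request sketch; the prompt and sampler values below are placeholders, not Sao10K's recommended settings:

```python
# Build a chat-completion request against the vLLM server started above.
# Model name and port match the serving command; sampler values are placeholders.
import json
from urllib import request

payload = {
    "model": "L3.3-70B-Euryale-v2.3",
    "messages": [
        {"role": "system", "content": "You are a narrator in a long-form roleplay."},
        {"role": "user", "content": "Describe the tavern as I step inside."},
    ],
    "max_tokens": 256,
    "temperature": 0.9,  # placeholder, not a tuned recommendation
}

req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```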
## Quantization
Applied via llmcompressor with AWQ W4A16 scheme, calibrated on 64 samples from ultrachat_200k. lm_head excluded from quantization.
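A oneshot run matching that description looks roughly like the sketch below. This is a recipe sketch, not a turnkey script: llmcompressor import paths vary between versions, the `max_seq_length` value is an assumption not stated in this card, and actually running it requires the full bf16 checkpoint and substantial GPU memory.

```python
# Sketch of the AWQ W4A16 oneshot quantization described above.
# Assumptions: llmcompressor's AWQModifier API; exact imports vary by version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "Sao10K/L3.3-70B-Euryale-v2.3"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = AWQModifier(
    scheme="W4A16",      # 4-bit weights, 16-bit activations
    targets=["Linear"],
    ignore=["lm_head"],  # lm_head left unquantized, as noted above
)

oneshot(
    model=model,
    dataset="ultrachat_200k",    # calibration set named in this card
    recipe=recipe,
    num_calibration_samples=64,  # as stated above
    max_seq_length=2048,         # assumption; not stated in this card
)

model.save_pretrained("L3.3-70B-Euryale-v2.3-AWQ", save_compressed=True)
tokenizer.save_pretrained("L3.3-70B-Euryale-v2.3-AWQ")
```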
## Credits
- Original model: Sao10K/L3.3-70B-Euryale-v2.3
- Quantized by: Meganova AI
## Model lineage

meta-llama/Llama-3.1-70B → meta-llama/Llama-3.3-70B-Instruct (finetune) → Sao10K/L3.3-70B-Euryale-v2.3 (finetune) → meganovaai/L3.3-70B-Euryale-v2.3-AWQ (this quantization)