Quantized EXAONE4
Collection of quantized checkpoints for the EXAONE4 series (5 items).
2-bit (EXPERIMENTAL): quantized with mse=1.5, group_size=32, and SmoothMSE(96, 0.65); embed_tokens and lm_head are skipped during quantization (34% of parameters preserved in FP16).
Expected: 84-90% of baseline quality at a 4.0-6.0x speedup. ⚠️ 2-bit quantization is lossy and experimental, so benchmark on your own tasks before using.
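The FP16/2-bit split above implies a rough checkpoint size. Here is a back-of-the-envelope sketch; the ~1.2B total parameter count is inferred from the model name, and the per-group FP16 scale plus zero-point overhead is an assumption about the GPTQ packing, not a figure from this card:

```python
# Rough size estimate for the W2A16 checkpoint.
# Assumptions (not from the card): 1.2e9 total parameters, and one FP16
# scale plus one FP16 zero-point stored per group of quantized weights.
TOTAL_PARAMS = 1.2e9
FP16_FRACTION = 0.34   # embed_tokens + lm_head kept in FP16 (from the card)
GROUP_SIZE = 32        # group_size=32 (from the card)

fp16_bytes = TOTAL_PARAMS * FP16_FRACTION * 2             # 2 bytes per FP16 weight
quant_bytes = TOTAL_PARAMS * (1 - FP16_FRACTION) * 2 / 8  # 2 bits per quantized weight
meta_bytes = TOTAL_PARAMS * (1 - FP16_FRACTION) / GROUP_SIZE * 4  # FP16 scale + zero per group

total_gib = (fp16_bytes + quant_bytes + meta_bytes) / 2**30
print(f"estimated checkpoint size: ~{total_gib:.2f} GiB")
```

Under these assumptions the quantized checkpoint lands near 1 GiB, versus roughly 2.2 GiB for the same 1.2B parameters entirely in FP16, so most of the savings come from the 66% of weights packed at 2 bits.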
Load with GPTQModel:

```python
from gptqmodel import GPTQModel

# Load the 2-bit (W2A16) quantized checkpoint onto the first GPU
model = GPTQModel.from_quantized("namgyu-youn/EXAONE-4.0-1.2B-GPTQ-W2A16", device="cuda:0")
```
Or serve with vLLM:

```python
from vllm import LLM, SamplingParams

# vLLM loads GPTQ checkpoints directly; the GPTQ kernels run in float16
llm = LLM(model="namgyu-youn/EXAONE-4.0-1.2B-GPTQ-W2A16", dtype="float16")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
```
Base model
LGAI-EXAONE/EXAONE-4.0-1.2B