# EXAONE-4.0-1.2B GPTQ W2

**2-bit quantization (EXPERIMENTAL):** GPTQ with `mse=1.5`, `group_size=32`, and SmoothMSE(96, 0.65). The `embed_tokens` and `lm_head` layers are skipped (kept in FP16), leaving roughly 34% of parameters at full precision.

Expected: 84–90% of baseline quality at a 4.0–6.0× speedup. ⚠️ This configuration is experimental; benchmark on your own task before deploying.
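For reference, the recipe above might be expressed with gptqmodel's `QuantizeConfig` roughly as below. This is a sketch, not the exact script used to produce this checkpoint; the base model ID is assumed, and the SmoothMSE step is described only in prose here, so it appears as a comment rather than a verified API call.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Sketch of the 2-bit recipe described above (assumed, not the original script).
quant_config = QuantizeConfig(
    bits=2,         # W2: 2-bit weights
    group_size=32,  # small groups to limit 2-bit quantization error
    mse=1.5,        # MSE-based clip search instead of plain min/max
)
# SmoothMSE(96, 0.65) and the FP16 skip of embed_tokens/lm_head are part of
# the recipe described in prose; they are not shown as API calls here.

model = GPTQModel.load("LGAI-EXAONE/EXAONE-4.0-1.2B", quant_config)  # assumed base model ID
# model.quantize(calibration_dataset)  # requires calibration data (omitted)
```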

## Usage

### GPTQModel

```python
from gptqmodel import GPTQModel

# Load the pre-quantized 2-bit checkpoint directly from the Hub
model = GPTQModel.from_quantized("namgyu-youn/EXAONE-4.0-1.2B-GPTQ-W2A16", device="cuda:0")
```
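A minimal generation call, assuming the tokenizer ships with this checkpoint and the model exposes the usual `transformers`-style `generate` interface:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("namgyu-youn/EXAONE-4.0-1.2B-GPTQ-W2A16")
inputs = tokenizer("2-bit quantization trades accuracy for", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```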

### vLLM

```python
from vllm import LLM

# vLLM loads GPTQ checkpoints natively; fp16 activations match the W2A16 scheme
llm = LLM(model="namgyu-youn/EXAONE-4.0-1.2B-GPTQ-W2A16", dtype="float16")
```
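And a short generation example using vLLM's standard `SamplingParams` API (the prompt is illustrative):

```python
from vllm import SamplingParams

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain 2-bit weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```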