# EXAONE-4.0-1.2B QQQ W4

A 4-bit QQQ quantization of LGAI-EXAONE/EXAONE-4.0-1.2B.

Config: QQQ format | 4-bit weights | group size = 128

W4A8: 4-bit weights and 8-bit activations, served by a Marlin-based kernel.
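
For context, checkpoints like this one are typically produced with GPTQModel's quantize API. The sketch below is illustrative, not the author's actual recipe: the calibration text and output path are placeholders, and selecting the QQQ output format may require an additional `QuantizeConfig` flag depending on your gptqmodel version (check its docs).

```python
# Illustrative 4-bit, group-size-128 quantization sketch (not the author's script).
from gptqmodel import GPTQModel, QuantizeConfig

# 4-bit weights; each group of 128 weight columns shares one quantization scale.
# NOTE: emitting the QQQ format may need an extra config flag; see the gptqmodel docs.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Tiny placeholder calibration set; real runs use a few hundred representative samples.
calibration = ["EXAONE is a family of large language models developed by LG AI Research."]

model = GPTQModel.load("LGAI-EXAONE/EXAONE-4.0-1.2B", quant_config)
model.quantize(calibration)
model.save("exaone-1.2b-qqq-w4a8")
```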

## Usage

```python
from gptqmodel import GPTQModel

model = GPTQModel.from_quantized("namgyu-youn/exaone-1.2b-qqq-w4a8", device="cuda:0")
```
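
Once loaded, the checkpoint behaves like a regular causal LM. A minimal generation example, assuming the base model's tokenizer from transformers:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-4.0-1.2B")

prompt = "Explain W4A8 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```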