# EXAONE-4.0-1.2B AWQ (MLP) + GPTQ (Attention) W4A16
## 01. Quick Start

```python
from vllm import LLM

llm = LLM(
    model="namgyu-youn/EXAONE-4.0-1.2B-LLMC-AWQ-W4",
    dtype="bfloat16",
    kv_cache_dtype="fp8",
)
```
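As a rough sanity check on what W4A16 buys you, here is a back-of-the-envelope weight-memory estimate. This is a sketch, not a measurement: it assumes ~1.2B parameters and ignores the group-wise scale/zero-point overhead that AWQ/GPTQ add on top of the packed 4-bit weights.

```python
# Approximate weight memory at different per-weight bit widths.
# Assumption (not from the model card): exactly 1.2e9 parameters,
# no quantization metadata (scales/zeros) counted.

PARAMS = 1.2e9

def weight_gib(bits_per_weight: float) -> float:
    """Weight memory in GiB for a given per-weight bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

bf16 = weight_gib(16)  # unquantized bfloat16 weights
w4 = weight_gib(4)     # packed 4-bit weights (AWQ/GPTQ)

print(f"bf16 weights: {bf16:.2f} GiB")
print(f"W4 weights:   {w4:.2f} GiB ({bf16 / w4:.0f}x smaller)")
```

The fp8 KV cache in the Quick Start snippet similarly halves KV-cache memory relative to a 16-bit cache, which is where the extra context headroom comes from.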
## 02. Benchmark

Accuracy (gsm8k exact match):
|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |0.0098|± |0.0044|
| | |strict-match | 5|exact_match|↑ |0.0000|± |0.0000|
Throughput: 5.01 requests/s, 5774.56 total tokens/s, 641.62 output tokens/s
Repro:

```shell
lm_eval --model vllm \
  --model_args pretrained=namgyu-youn/EXAONE-4.0-1.2B-LLMC-AWQ-W4,dtype=float16,gpu_memory_utilization=0.85,enable_thinking=False,max_gen_toks=2048,max_model_len=8192,enforce_eager=True \
  --tasks gsm8k \
  --limit 512 \
  --output_path results \
  --apply_chat_template \
  --batch_size auto
```
```shell
# For gptqmodel checkpoints (native GPTQ format), omit --quantization
# (vLLM auto-selects gptq_marlin)
vllm bench throughput \
  --input-len 256 \
  --output-len 256 \
  --model namgyu-youn/EXAONE-4.0-1.2B-LLMC-AWQ-W4 \
  --num-prompts 100 \
  --max-model-len 4096 \
  --enforce-eager
```
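For serving rather than offline benchmarking, the same checkpoint should load with `vllm serve`. This is a sketch mirroring the Quick Start settings, not a tested command; verify the flags against `vllm serve --help` for your vLLM version.

```shell
# Assumed invocation: OpenAI-compatible server with fp8 KV cache,
# matching the dtype/kv-cache settings used in the Quick Start.
vllm serve namgyu-youn/EXAONE-4.0-1.2B-LLMC-AWQ-W4 \
  --dtype bfloat16 \
  --kv-cache-dtype fp8 \
  --max-model-len 4096
```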
Base model: LGAI-EXAONE/EXAONE-4.0-1.2B