GovOn-EXAONE-AWQ-v2
Introduction
GovOn-EXAONE-AWQ-v2 is an optimized 4-bit quantized version of GovOn-EXAONE-Merged-v2, designed for on-device, low-latency deployment in civil-service environments.
By applying AWQ (Activation-aware Weight Quantization) with a W4A16g128 configuration, we reduced the model size by 66.1% (from 14.56 GB to 4.94 GB) while preserving domain-specific performance. This enables high-quality Korean civil-complaint assistance on consumer-grade GPUs with as little as 8 GB of VRAM.
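To build intuition for the W4A16g128 scheme, here is a minimal NumPy sketch of asymmetric 4-bit group quantization with a zero point. This shows the generic per-group scheme only; AWQ additionally rescales salient weight channels using activation statistics, which is not shown here.

```python
import numpy as np

def quantize_group(w, n_bits=4):
    # Asymmetric (zero-point) quantization of one group of weights.
    # Maps [w.min(), w.max()] onto the integer range [0, 2**n_bits - 1].
    qmax = 2**n_bits - 1
    scale = (w.max() - w.min()) / qmax
    zero = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero), 0, qmax)
    return q, scale, zero

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)  # one group of 128 weights

q, scale, zero = quantize_group(w)
w_hat = (q - zero) * scale       # dequantize back to float
err = np.abs(w - w_hat).max()    # worst-case error is on the order of the scale
```

Each group of 128 weights shares one scale and one zero point, so the reconstruction error stays bounded by the group's dynamic range divided by 15 levels.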
Quickstart
We recommend using vLLM or AutoAWQ for optimized inference.
Using AutoAWQ
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_id = "umyunsang/GovOn-EXAONE-AWQ-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True, trust_remote_code=True)
# Example inference (sketch; the prompt and generation arguments are illustrative)
messages = [{"role": "user", "content": "How do I check the status of my civil complaint?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
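Using vLLM

Alternatively, the quantized checkpoint can be served with vLLM's OpenAI-compatible server. The command below is a minimal sketch; the context length is illustrative, not a requirement of this model.

```shell
pip install vllm
# Serve the AWQ checkpoint behind an OpenAI-compatible API
vllm serve umyunsang/GovOn-EXAONE-AWQ-v2 \
  --quantization awq \
  --trust-remote-code \
  --max-model-len 4096
```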
Specifications
Model Details
- Source Model: umyunsang/GovOn-EXAONE-Merged-v2
- Quantization Method: AWQ (Weight-only 4-bit)
- Config: W4A16, Group Size 128, Zero Point True
- Model Size: 4.94 GB (BF16 Original: 14.56 GB)
- VRAM Required: ~6.5 GB (Inference)
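As a sanity check on the size figures above, the effective storage cost of W4A16g128 can be estimated. Assuming a common packing where each group of 128 weights stores 128 four-bit values plus one FP16 scale and one 4-bit zero point (exact overhead varies by implementation):

```python
GROUP_SIZE = 128

# 4-bit weights, plus a per-group FP16 scale (16 bits) and 4-bit zero point
bits_per_weight = 4 + (16 + 4) / GROUP_SIZE
print(bits_per_weight)  # 4.15625 effective bits per quantized weight
```

At roughly 4.16 bits per weight, the quantized layers of a ~7.8B-parameter model come to about 4 GB; unquantized components (e.g. embeddings kept in higher precision) plausibly account for the remainder of the 4.94 GB.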
Efficiency
- Compression Ratio: 2.95x
- Size Reduction: 66.1%
- Calibration: 512 domain-specific civil complaint samples
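The compression figures above follow directly from the two model sizes; a quick check:

```python
bf16_gb, awq_gb = 14.56, 4.94

compression_ratio = bf16_gb / awq_gb           # ~2.95x
size_reduction = (1 - awq_gb / bf16_gb) * 100  # ~66.1%
print(round(compression_ratio, 2), round(size_reduction, 1))  # 2.95 66.1
```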
Limitation and Usage
- Quantization Loss: While AWQ minimizes performance degradation, slight deviations in chain-of-thought (<thought>) output or nuanced reasoning may occur compared to the BF16 version.
- Infrastructure: Optimized for NVIDIA GPUs (Ampere architecture or newer recommended).
License
This model is licensed under the Apache License 2.0. However, users must also comply with the EXAONE AI Model License Agreement of the base model.
Model tree for umyunsang/GovOn-EXAONE-AWQ-v2
- Base models: LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct (finetuned), LGAI-EXAONE/EXAONE-Deep-7.8B (finetuned)
- Quantized from: umyunsang/GovOn-EXAONE-Merged-v2