civil-complaint-exaone-awq

This is the AWQ W4A16g128 4-bit quantized version of umyunsang/civil-complaint-exaone-merged, optimized for on-device AI deployment.

Model Tree

LGAI-EXAONE/EXAONE-Deep-7.8B  (base model)
        |
        |  + umyunsang/civil-complaint-exaone-lora  (QLoRA adapter, rank=16)
        v
umyunsang/civil-complaint-exaone-merged  (BF16, 14.56 GB)
        |
        |  AWQ W4A16g128 quantization (AutoAWQ)
        v
umyunsang/civil-complaint-exaone-awq  (4-bit, 4.94 GB)  <- this model

Model Description

Item                 Value
Base model           umyunsang/civil-complaint-exaone-merged
Quantization method  AWQ (Activation-aware Weight Quantization)
Quantization config  W4A16g128 (4-bit weights, 16-bit activations, group_size=128)
Model size           4.94 GB (safetensors)
Compression ratio    2.95x (BF16 14.56 GB → 4-bit 4.94 GB)
Size reduction       66.1%
GPU VRAM             ~5-7 GB (at inference)
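The compression figures above follow directly from the two checkpoint sizes; a quick sanity check:

```python
# Checkpoint sizes quoted in the table above.
bf16_gb = 14.56  # merged BF16 checkpoint
awq_gb = 4.94    # AWQ 4-bit checkpoint

ratio = bf16_gb / awq_gb                  # compression ratio
reduction = (1 - awq_gb / bf16_gb) * 100  # size reduction in percent

print(f"{ratio:.2f}x, {reduction:.1f}%")  # -> 2.95x, 66.1%
```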

Intended Use

Supports Korean civil-complaint handling workflows (for on-device/edge AI deployment):

  • Complaint classification: environment, traffic, facilities, civil services, welfare, culture, economy, education, safety, other
  • Response drafting: polite, clear answers that follow the standard complaint-response format
  • Lightweight deployment: runs even on consumer GPUs (8 GB VRAM)
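The usage examples below feed complaints to the model using a `[Category: ...]` text convention; this can be wrapped in a small helper (`build_complaint_prompt` is a hypothetical name, not part of the model repo):

```python
def build_complaint_prompt(category: str, content: str) -> str:
    """Format a complaint using the [Category: ...] convention from the usage examples."""
    return f"[Category: {category}]\nComplaint Content: {content}"

# Example: a traffic complaint about a pothole on a neighborhood road.
print(build_complaint_prompt("traffic", "우리 동넀 λ„λ‘œμ— ν¬νŠΈν™€μ΄ μƒκ²ΌμŠ΅λ‹ˆλ‹€."))
```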

Usage

Using AutoAWQ

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_id = "umyunsang/civil-complaint-exaone-awq"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Load the 4-bit AWQ checkpoint; fuse_layers enables AutoAWQ's fused attention/MLP kernels.
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    fuse_layers=True,
    trust_remote_code=True,
    safetensors=True,
)

# Instruction (Korean): "Analyze the following complaint step by step and draft a polite,
# clear response in the standard format."
instruction = "λ‹€μŒ 민원에 λŒ€ν•΄ λ‹¨κ³„μ μœΌλ‘œ λΆ„μ„ν•˜κ³ , ν‘œμ€€ μ„œμ‹μ— 맞좰 κ³΅μ†ν•˜κ³  λͺ…ν™•ν•œ 닡변을 μž‘μ„±ν•˜μ„Έμš”."
# Complaint (Korean): "A pothole has formed on our neighborhood road and endangers traffic.
# Please take prompt action."
complaint = "[Category: traffic]\nComplaint Content: 우리 동넀 λ„λ‘œμ— ν¬νŠΈν™€μ΄ μƒκ²¨μ„œ μ°¨λŸ‰ 톡행에 μœ„ν—˜ν•©λ‹ˆλ‹€. λΉ λ₯Έ 쑰치 λΆ€νƒλ“œλ¦½λ‹ˆλ‹€."

messages = [{"role": "user", "content": f"{instruction}\n\n{complaint}"}]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")  # AutoAWQ places the quantized weights on the GPU

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        eos_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Using vLLM (recommended)

from vllm import LLM, SamplingParams

llm = LLM(
    model="umyunsang/civil-complaint-exaone-awq",
    quantization="awq",
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)

instruction = "λ‹€μŒ 민원에 λŒ€ν•΄ λ‹¨κ³„μ μœΌλ‘œ λΆ„μ„ν•˜κ³ , ν‘œμ€€ μ„œμ‹μ— 맞좰 κ³΅μ†ν•˜κ³  λͺ…ν™•ν•œ 닡변을 μž‘μ„±ν•˜μ„Έμš”."
complaint = "[Category: traffic]\nComplaint Content: 우리 동넀 λ„λ‘œμ— ν¬νŠΈν™€μ΄ μƒκ²¨μ„œ μ°¨λŸ‰ 톡행에 μœ„ν—˜ν•©λ‹ˆλ‹€."

# EXAONE chat template written out manually; the trailing "<thought>\n" primes the reasoning block.
prompt = f"[|system|]You are a helpful assistant.[|endofturn|]\n[|user|]{instruction}\n\n{complaint}[|endofturn|]\n[|assistant|]<thought>\n"

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
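For deployment, the same checkpoint can also be served through vLLM's OpenAI-compatible server; the command below is a sketch (the port is illustrative):

```shell
vllm serve umyunsang/civil-complaint-exaone-awq \
  --quantization awq \
  --trust-remote-code \
  --port 8000
```

Clients can then call the standard /v1/chat/completions endpoint.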

Quantization Details

Setting             Value
Algorithm           AWQ (Activation-aware Weight Quantization)
Weight bits         4-bit
Activation bits     16-bit (FP16)
Group size          128
Zero-point          True
Kernel version      GEMM
Calibration data    512 samples (civil-complaint domain training data)

Evaluation Results

Metric                  BF16 (merged)   AWQ 4-bit    Change
Perplexity              -               3.20         -
BLEU                    -               17.32        -
ROUGE-L                 -               18.28        -
Avg. inference latency  -               9.29 s       -
Throughput              -               13.8 tok/s   -
GPU VRAM                ~30 GB          ~5-7 GB      -76%
Model size              14.56 GB        4.94 GB      -66%

Hardware Requirements

Item        Minimum             Recommended
GPU VRAM    6 GB                8 GB
RAM         8 GB                16 GB
GPU         RTX 3060 or newer   RTX 3080 or newer
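The minimum-VRAM figure is consistent with a back-of-the-envelope estimate of 4-bit weights plus KV cache and runtime overhead (the overhead numbers below are rough assumptions, not measurements):

```python
weights_gb = 4.94   # quantized checkpoint size (from the tables above)
runtime_gb = 1.0    # CUDA context + dequantization buffers (rough assumption)
kv_cache_gb = 0.5   # FP16 KV cache at short context lengths (rough assumption)

total = weights_gb + runtime_gb + kv_cache_gb
print(f"~{total:.1f} GB")  # falls inside the quoted ~5-7 GB range
```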

Limitations

  • This is a reasoning model that uses EXAONE-Deep-7.8B's reasoning tags (<thought>...</thought>)
  • 4-bit quantization may cause a slight quality drop relative to the BF16 original
  • The model is specialized for Korean civil-complaint handling; performance on other domains is not guaranteed
  • Generated responses must always be reviewed by a human officer before use

License

EXAONE λͺ¨λΈμ€ EXAONE AI Model License Agreement 1.1을 λ”°λ¦…λ‹ˆλ‹€.
