Tri-21B AWQ (SRT NLU)

trillionlabs/Tri-21B ์„ AWQ W4A16 ๋กœ ์–‘์žํ™”ํ•œ ๋ชจ๋ธ. ํ•œ๊ตญ์–ด SRT ๊ธฐ์ฐจ ์˜ˆ๋งค NLU ํƒœ์Šคํฌ์—์„œ calibration ์ˆ˜ํ–‰.

์–‘์žํ™” ์„ค์ •

  • Scheme: AWQ W4A16 (4-bit weights, 16-bit activations)
  • Calibration: 512 SRT utterances (sttTranscription + label, rendered through the chat template)
  • Recipe: duo_scaling, n_grid=20, lm_head ignored
  • Library: llmcompressor 0.10, vLLM-compatible

Usage

from vllm import LLM, SamplingParams

# Load the AWQ checkpoint; vLLM unpacks the 4-bit weights at load time.
llm = LLM(model="saeha/Tri-21B-AWQ-SRT", quantization="awq", dtype="bfloat16")
outputs = llm.generate(["안녕"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)

์›๋ณธ ๋ชจ๋ธ
