Tri-21B AWQ (SRT NLU)
A version of trillionlabs/Tri-21B quantized with AWQ W4A16.
Calibration was performed on a Korean SRT train-ticket-booking NLU task.
Quantization settings
- Scheme: AWQ W4A16 (4-bit weights, 16-bit activations)
- Calibration: 512 SRT utterances (sttTranscription + label, chat template)
- Recipe: duo_scaling, n_grid=20, lm_head ignored
- Library: llmcompressor 0.10, vLLM-compatible
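As a rough illustration of what the W4A16 scheme above means, here is a minimal, self-contained sketch of group-wise 4-bit weight quantization with activations kept in higher precision. This is not llmcompressor's actual algorithm, just the basic idea: each weight group is mapped to 4-bit integer codes with a per-group scale and zero-point, then dequantized for the matmul.

```python
# Illustrative sketch only, not llmcompressor's implementation.
# Weights are quantized to 4-bit codes (0..15) per group, one scale and
# zero-point per group; activations stay in full precision (the "A16" side).

GROUP_SIZE = 4  # real AWQ configs typically use 64 or 128

def quantize_group(group):
    """Asymmetric 4-bit quantization of one group of weights."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 levels
    codes = [round((w - lo) / scale) for w in group]
    return codes, scale, lo                # codes lie in [0, 15]

def dequantize_group(codes, scale, zero):
    return [c * scale + zero for c in codes]

def quantize_weights(weights):
    """Quantize a flat weight list group by group."""
    return [quantize_group(weights[i:i + GROUP_SIZE])
            for i in range(0, len(weights), GROUP_SIZE)]

def dequantize_weights(qgroups):
    out = []
    for codes, scale, zero in qgroups:
        out.extend(dequantize_group(codes, scale, zero))
    return out

w = [0.12, -0.5, 0.33, 0.0, 1.2, -1.1, 0.7, 0.05]
q = quantize_weights(w)
w_hat = dequantize_weights(q)
```

The round-trip error is bounded by half the per-group scale, which is why grouping matters: smaller groups give tighter scales and lower error at the cost of more stored metadata.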
Usage
from vllm import LLM, SamplingParams

llm = LLM(model="saeha/Tri-21B-AWQ-SRT", quantization="awq", dtype="bfloat16")
outputs = llm.generate(["안녕"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
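For reference, the calibration samples described in the settings (STT transcription plus label, rendered through a chat template) could be assembled along these lines. The field names `sttTranscription` and `label` come from this card; the message layout, the example utterance, and the toy template below are assumptions, not the actual calibration script.

```python
# Hypothetical sketch of turning SRT NLU records into chat-format calibration
# samples. The record fields (sttTranscription + label) follow the card; the
# prompt layout is an assumption. In the real pipeline the messages would be
# rendered with tokenizer.apply_chat_template rather than this toy function.

def to_messages(record):
    """Build a chat-style message list from one SRT utterance record."""
    return [
        {"role": "user", "content": record["sttTranscription"]},
        {"role": "assistant", "content": record["label"]},
    ]

def render_chat(messages):
    """Toy stand-in for a real chat template (illustrative only)."""
    return "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in messages)

# Hypothetical record: "book me a ticket from Suseo to Busan"
record = {"sttTranscription": "수서에서 부산 가는 표 예매해줘",
          "label": "book_ticket"}
sample = render_chat(to_messages(record))
```

Rendering calibration text through the same chat template used at inference time keeps the token distribution seen during AWQ scale search close to what the quantized model will actually serve.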
Original model
- Base: trillionlabs/Tri-21B
- License: apache-2.0 (inherited from the base model)