Tri-21B AWQ (SRT NLU)

trillionlabs/Tri-21B ์„ AWQ W4A16 ๋กœ ์–‘์žํ™”ํ•œ ๋ชจ๋ธ. ํ•œ๊ตญ์–ด SRT ๊ธฐ์ฐจ ์˜ˆ๋งค NLU ํƒœ์Šคํฌ์—์„œ calibration ์ˆ˜ํ–‰.

์–‘์žํ™” ์„ค์ •

  • Scheme: AWQ W4A16 (4-bit weights, 16-bit activations)
  • Calibration: 512 SRT utterances (sttTranscription + label, rendered through the chat template)
  • Recipe: duo_scaling, n_grid=20, lm_head ignored
  • Library: llmcompressor 0.10, vLLM-compatible

Usage

from vllm import LLM, SamplingParams

# Load the AWQ checkpoint; vLLM unpacks the 4-bit weights at load time.
llm = LLM(model="saeha/Tri-21B-AWQ-SRT", quantization="awq", dtype="bfloat16")
outputs = llm.generate(["안녕"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)

์›๋ณธ ๋ชจ๋ธ
