Yi-1.5-34B-Chat-HQQ-INT4

INT4 weight-only quantization of 01-ai/Yi-1.5-34B-Chat.

Yi-1.5-34B-Chat quantized to INT4 with HQQ (Half-Quadratic Quantization), a calibration-free alternative to AWQ. Apache 2.0.

Property              Value
Base model            01-ai/Yi-1.5-34B-Chat
Quantization          INT4 weight-only (HQQ)
Approx. on-disk size  ~20.7 GB
License               Apache License, Version 2.0
Languages             English

Load (vLLM)

Serve over HTTP:

vllm serve drawais/Yi-1.5-34B-Chat-HQQ-INT4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94

Or load in-process with the Python API:

from vllm import LLM, SamplingParams

llm = LLM(model="drawais/Yi-1.5-34B-Chat-HQQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
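Once `vllm serve` is running, the model answers on vLLM's OpenAI-compatible endpoint (by default `http://localhost:8000/v1`). A minimal stdlib-only client sketch, assuming the default host and port:

```python
import json
import urllib.request

# Chat-completions request against the local vLLM server
# (assumes the default bind address http://localhost:8000).
payload = {
    "model": "drawais/Yi-1.5-34B-Chat-HQQ-INT4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at this base URL) works the same way.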

Footprint

~20.7 GB on disk. Plan VRAM as weights (~20.7 GB) plus KV cache and activation overhead: at the full 32,768-token context this adds several GB, so a single 40-48 GB GPU is a comfortable fit, while 24 GB cards need a smaller --max-model-len.
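The KV-cache headroom can be estimated from the model's shape. A back-of-the-envelope sketch, assuming Yi-1.5-34B's published config (60 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache:

```python
# KV-cache size estimate; layer/head counts are assumed from the base
# model's config.json, and the cache is assumed to be fp16 (2 bytes).
layers, kv_heads, head_dim, dtype_bytes = 60, 8, 128, 2

# Two tensors (K and V) per layer, per token
bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
print(f"{bytes_per_token / 1024:.0f} KiB per token")  # 240 KiB

# One sequence at the full 32k context
context = 32768
print(f"{bytes_per_token * context / 2**30:.1f} GiB at {context} tokens")  # 7.5 GiB
```

Under these assumptions a full-length sequence costs roughly 7.5 GiB of cache on top of the weights, which is why ~40 GB of VRAM is a comfortable target.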

License & attribution

This artifact is a derivative work of 01-ai/Yi-1.5-34B-Chat, released by its original authors under the Apache License, Version 2.0.

This artifact is distributed under the same license. The full license text is included in LICENSE, and required attribution is in NOTICE.

License text: https://www.apache.org/licenses/LICENSE-2.0 Source model: https://huggingface.co/01-ai/Yi-1.5-34B-Chat

