# 🧠 TinyLlama 1.1B Chat – GPTQ Quantized (4-bit)

- **Repo:** `kivoai/tinyllama-1.1b-chat-gptq`
- **Base model:** `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- **Quantization:** GPTQ 4-bit (group size 128)
- **Tokenizer:** same as the base model (BPE)
## 📌 Purpose

This is a 4-bit GPTQ-quantized version of TinyLlama-1.1B-Chat-v1.0, intended for lightweight inference and for deployment in decentralized GPU mining environments. It is currently used for text-generation mining as part of the Neural Subnet protocol.
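A rough back-of-the-envelope calculation shows why 4-bit weights matter for lightweight deployment. This is only a sketch: the real on-disk size also includes per-group scales/zero-points and any layers left unquantized.

```python
params = 1.1e9  # approximate TinyLlama parameter count

fp16_bytes = params * 2    # 16 bits (2 bytes) per weight
int4_bytes = params * 0.5  # 4 bits (half a byte) per weight, packed

print(f"fp16: {fp16_bytes / 1e9:.2f} GB")  # fp16: 2.20 GB
print(f"int4: {int4_bytes / 1e9:.2f} GB")  # int4: 0.55 GB
```

Roughly a 4x reduction in weight storage, which is what makes the model practical on small or shared GPUs.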
## 🧰 Technical Details

- **Quantized with:** AutoGPTQ
- **Quantization type:** 4-bit (group size: 128)
- **Precision:** int4
- **Safetensors format:** ✅
- **Chat template:** included (`chat_template.jinja`)
- **Max sequence length:** 2048
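To make the "group size: 128" entry concrete, here is an illustrative sketch of asymmetric round-to-nearest 4-bit group quantization in plain Python. Each group of weights (128 per group in this model; a tiny toy group below) shares one scale and zero-point. This is not the actual AutoGPTQ algorithm, which additionally uses Hessian-based error compensation, but it shows what int4 storage with shared group parameters means.

```python
def quantize_group(weights, bits=4):
    """Map one group of float weights to unsigned ints sharing a scale/zero-point."""
    qmax = (1 << bits) - 1                   # 15 for 4-bit
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / qmax or 1.0      # guard against all-equal groups
    zero = round(-wmin / scale)
    q = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Recover approximate float weights from the stored integers."""
    return [(qi - zero) * scale for qi in q]

# Toy "group" (a real GPTQ group here would hold 128 weights)
group = [0.12, -0.07, 0.31, -0.25, 0.02]
q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(q)        # integers in [0, 15], storable in 4 bits each
print(max_err)  # reconstruction error is bounded by the group's scale
```

A smaller group size (e.g. 128 vs. quantizing a whole row at once) means each scale covers a narrower range of values, which lowers quantization error at the cost of storing more scales.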
## 🪄 Example Usage
```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Load the quantized model onto the GPU so it matches the inputs below
model = AutoGPTQForCausalLM.from_quantized(
    "kivoai/tinyllama-1.1b-chat-gptq",
    device="cuda:0",
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained("kivoai/tinyllama-1.1b-chat-gptq")

prompt = "What is the meaning of intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
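Since the repo ships a `chat_template.jinja`, chat prompts should normally be built with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, ...)` rather than raw strings. As a plain-string illustration of roughly what that template produces, assuming the base model's Zephyr-style format (verify against the shipped template file):

```python
def build_zephyr_prompt(messages):
    """Approximate a Zephyr-style chat prompt (a sketch of chat_template.jinja,
    not a substitute for tokenizer.apply_chat_template)."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    # Trailing assistant tag cues the model to generate its reply
    return "".join(parts) + "<|assistant|>\n"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the meaning of intelligence?"},
]
print(build_zephyr_prompt(messages))
```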