# Llama-3.1-Nemotron-Nano-4B-v1.1 - GPTQ 4-bit Quantized
This is a 4-bit GPTQ-quantized version of [nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1), produced with `auto-gptq`.
## How to use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "avinashhm/Llama-3.1-Nemotron-Nano-4B-v1.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
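A minimal inference sketch building on the loading snippet above. It assumes a GPTQ-capable backend (e.g. `auto-gptq` or `optimum`) is installed so `transformers` can dequantize the weights, and that the base model's chat template is available; the prompt text is illustrative.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "avinashhm/Llama-3.1-Nemotron-Nano-4B-v1.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Explain GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Generation parameters (sampling temperature, `max_new_tokens`) are placeholders; tune them for your use case.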