QLoRA: Efficient Finetuning of Quantized LLMs
Paper: arXiv:2305.14314
This repo contains a low-rank adapter for Falcon-7b, fine-tuned on the Arabic version of the Stanford Alpaca dataset, Yasbok/Alpaca_arabic_instruct.
The model was fine-tuned in 8-bit precision using 🤗 peft adapters, transformers, and bitsandbytes, following the QLoRA method introduced in the paper above. The run took approximately 3 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory.
June 10, 2023
We recommend that users of this model develop guardrails and take appropriate precautions before any production use.
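For reference, below is a minimal sketch of how an 8-bit LoRA fine-tune like this one is typically set up with peft and bitsandbytes. The LoRA hyperparameters and target modules are illustrative assumptions; the card does not publish the exact training script.

# Illustrative training setup (not the original script)
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    load_in_8bit=True,       # quantized base weights stay frozen
    device_map="auto",
    trust_remote_code=True,
)
base_model = prepare_model_for_kbit_training(base_model)

# Hypothetical hyperparameters; the card does not list the values used
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights train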
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
This requires a GPU with at least 12 GB of memory.
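Before loading, you can sanity-check that a suitable GPU is visible. This small snippet is illustrative and not part of the original card:

# Optional: verify a CUDA GPU with enough memory is available
import torch
assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB total")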
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized base model and attach the LoRA adapter
peft_model_id = "Ali-C137/falcon-7b-chat-alpaca-arabic"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    device_map={"": 0},      # place the full model on GPU 0
    trust_remote_code=True,  # Falcon ships custom modeling code
    load_in_8bit=True,       # 8-bit quantization via bitsandbytes
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token  # Falcon has no dedicated pad token
model = PeftModel.from_pretrained(model, peft_model_id)
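Once the adapter is attached, generation works like any other transformers causal LM. A usage sketch follows; the Alpaca-style prompt template is an assumption based on the instruction format of the training data.

# Example inference with an assumed Alpaca-style prompt template
prompt = """### Instruction:
اكتب قصة قصيرة عن القاهرة

### Response:
"""  # the instruction asks for a short story about Cairo

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))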
Package versions:
torch: 2.0.1+cu118
transformers: 4.30.0.dev0
peft: 0.4.0.dev0
accelerate: 0.19.0
bitsandbytes: 0.39.0
einops: 0.6.1