QEFT: Quantization for Efficient Fine-Tuning of LLMs
Paper • 2410.08661 • Published
This model has sensitivity-based channel sorting applied to its weights; no quantization has been applied (sorting only).
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-2-13b-hf |
| Method | Sensitivity-based full channel sorting |
| Sorting order | Ascending (low sensitivity → high) |
| Quantization | None (sorting only) |
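The ordering described above can be sketched as follows. This is an illustrative example, not the paper's exact method: the sensitivity metric here is a stand-in proxy (mean squared calibration activation per channel, an assumption), and the tensor shapes are toy-sized.

```python
import torch

# Illustrative sketch of sensitivity-based channel sorting (assumed proxy
# metric, not QEFT's exact one): score each channel, then order channels
# ascending so low-sensitivity channels come first.
torch.manual_seed(0)
acts = torch.randn(128, 8)          # (tokens, channels) calibration activations
sensitivity = acts.pow(2).mean(0)   # per-channel proxy sensitivity score
order = torch.argsort(sensitivity)  # ascending: low sensitivity -> high
```

`order` is the kind of per-layer permutation that would then be applied to the model's weight channels.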
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jsyeom/sensitivity_sorted_models/llama-2-13b-hf-sensitivity-sorted")
tokenizer = AutoTokenizer.from_pretrained("jsyeom/sensitivity_sorted_models/llama-2-13b-hf-sensitivity-sorted")
```
- `dst_ids.pt` contains the full permutation indices for reference.
- `sort_config.json` contains metadata about the sorting configuration.
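As a sketch of how such permutation indices reorder a layer's channels (illustrative only; inspect `dst_ids.pt` to confirm its exact structure, which is not documented here):

```python
import torch

# Toy example: a 4x4 weight matrix and a permutation of its input channels.
# In the real repo, dst_ids.pt would hold index tensors like `perm`
# (an assumption about the file's contents).
weight = torch.arange(16.0).reshape(4, 4)
perm = torch.tensor([2, 0, 3, 1])   # hypothetical sensitivity-ascending order

# Reorder columns (input channels) according to the permutation.
sorted_weight = weight[:, perm]

# The permutation is invertible, so the original layout can be recovered.
inv = torch.argsort(perm)
assert torch.equal(sorted_weight[:, inv], weight)
```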