QEFT: Quantization for Efficient Fine-Tuning of LLMs
Paper • 2410.08661 • Published
This model has sensitivity-based channel sorting applied to its weights; no quantization has been applied (sorting only).
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-2-13b-hf |
| Method | Sensitivity-based full channel sorting |
| Sorting order | Ascending (low sensitivity → high) |
| Quantization | None (sorting only) |
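The ordering described above can be sketched as follows. This is an illustrative example, not the paper's exact method: the sensitivity metric here is a stand-in proxy (mean squared calibration activation per channel, an assumption), and the tensor shapes are toy-sized.

```python
import torch

# Illustrative sketch of sensitivity-based channel sorting (assumed proxy
# metric, not QEFT's exact one): score each channel, then order channels
# ascending so low-sensitivity channels come first.
torch.manual_seed(0)
acts = torch.randn(128, 8)          # (tokens, channels) calibration activations
sensitivity = acts.pow(2).mean(0)   # per-channel proxy sensitivity score
order = torch.argsort(sensitivity)  # ascending: low sensitivity -> high
```

`order` is the kind of per-layer permutation that would then be applied to the model's weight channels.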
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jsyeom/sensitivity_sorted_models/llama-2-13b-hf-sensitivity-sorted")
tokenizer = AutoTokenizer.from_pretrained("jsyeom/sensitivity_sorted_models/llama-2-13b-hf-sensitivity-sorted")
```
- `dst_ids.pt` contains the full permutation indices for reference.
- `sort_config.json` contains metadata about the sorting configuration.
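As a sketch of how such permutation indices reorder a layer's channels (illustrative only; inspect `dst_ids.pt` to confirm its exact structure, which is not documented here):

```python
import torch

# Toy example: a 4x4 weight matrix and a permutation of its input channels.
# In the real repo, dst_ids.pt would hold index tensors like `perm`
# (an assumption about the file's contents).
weight = torch.arange(16.0).reshape(4, 4)
perm = torch.tensor([2, 0, 3, 1])   # hypothetical sensitivity-ascending order

# Reorder columns (input channels) according to the permutation.
sorted_weight = weight[:, perm]

# The permutation is invertible, so the original layout can be recovered.
inv = torch.argsort(perm)
assert torch.equal(sorted_weight[:, inv], weight)
```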