EdgeRazor for Lightweight LLMs

GitHub EdgeRazor

Qwen3-1.7B-EdgeRazor-2.79bit

Model Overview

Model Bit-Widths

| Mixed-Precision Recipe | Bit-Width | This Repo |
| --- | --- | --- |
| 100% 4-bit + 0% 1.58-bit | 4 | |
| 50% 4-bit + 50% 1.58-bit | 2.79 | ✔️ |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88 | |
| 0% 4-bit + 100% 1.58-bit | 1.58 | |
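
The bit-widths listed above are consistent with a weighted average of the two precisions, with each percentage read as the share of weights stored at that precision. A minimal sketch of that arithmetic, assuming this interpretation (the function name `average_bit_width` is illustrative and not part of EdgeRazor):

```python
def average_bit_width(frac_4bit, high_bits=4.0, low_bits=1.58):
    """Weighted-average bit-width of a two-precision mix (assumed interpretation)."""
    if not 0.0 <= frac_4bit <= 1.0:
        raise ValueError("frac_4bit must be in [0, 1]")
    return frac_4bit * high_bits + (1.0 - frac_4bit) * low_bits


# Shares of 4-bit weights for the four recipes in the table.
for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:.1%} 4-bit -> {average_bit_width(frac):.2f} bits")
```

Rounded to two decimals, the four recipes give 4, 2.79, 1.88, and 1.58 bits, matching the table.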

Model Performance

| Models | W-A-KV | ARC-e | ARC-c | HellaS. | BoolQ | PIQA | WinoG. | SIQA | OBQA | Tr.QA2 | Ethics | MMLU | IFEval | GSM8K | HumanE. | Average (↑) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3-1.7B | 16-16-16 | 69.87 | 42.83 | 60.40 | 77.77 | 72.58 | 60.85 | 45.19 | 37.40 | 45.97 | 49.63 | 55.49 | 67.10 | 68.76 | 67.07 | 58.64 |
| EdgeRazor | 4-16-16 | 70.66 | 44.80 | 57.51 | 80.09 | 72.31 | 60.14 | 44.06 | 38.40 | 48.41 | 64.02 | 54.70 | 58.96 | 68.39 | 57.32 | 58.56 |
| EdgeRazor | 2.79-16-16 | 63.47 | 38.57 | 49.48 | 78.78 | 68.23 | 55.64 | 43.91 | 33.40 | 45.42 | 60.81 | 46.25 | 54.71 | 54.28 | 53.66 | 53.33 |
| EdgeRazor | 1.88-16-16 | 59.60 | 34.04 | 40.94 | 72.11 | 65.23 | 54.38 | 41.76 | 29.80 | 46.09 | 57.30 | 38.93 | 43.81 | 36.39 | 39.63 | 47.14 |
| EdgeRazor | 1.58-16-16 | 55.60 | 31.06 | 39.53 | 70.95 | 63.60 | 53.28 | 41.97 | 31.60 | 40.16 | 55.89 | 35.00 | 32.72 | 29.49 | 33.54 | 43.89 |
| EdgeRazor | 4-8-8 | 70.16 | 44.45 | 57.52 | 79.82 | 72.58 | 59.67 | 43.45 | 38.20 | 48.37 | 63.56 | 54.29 | 60.26 | 68.54 | 59.15 | 58.57 |
| EdgeRazor | 2.79-8-8 | 62.79 | 38.31 | 49.53 | 78.38 | 68.72 | 56.04 | 43.65 | 33.40 | 45.57 | 60.72 | 46.27 | 54.34 | 53.68 | 50.61 | 53.00 |
| EdgeRazor | 1.88-8-8 | 59.09 | 33.53 | 40.85 | 72.14 | 65.18 | 53.99 | 41.76 | 29.00 | 46.18 | 57.33 | 39.03 | 41.96 | 37.53 | 40.85 | 47.03 |
| EdgeRazor | 1.58-8-8 | 55.64 | 31.48 | 39.68 | 70.70 | 64.25 | 53.91 | 41.76 | 31.60 | 40.15 | 56.26 | 35.07 | 32.35 | 28.96 | 32.93 | 43.91 |
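
The Average column appears to be the unweighted mean of the 14 benchmark scores. The sketch below checks that reading against two rows of the table (Qwen3-1.7B at 16-16-16 and EdgeRazor at 2.79-16-16) and derives how much of the baseline average the 2.79-bit model retains. The variable and function names are illustrative only.

```python
from statistics import mean

# Scores copied from the table above, ARC-e through HumanE. (14 tasks).
baseline_bf16 = [69.87, 42.83, 60.40, 77.77, 72.58, 60.85, 45.19,
                 37.40, 45.97, 49.63, 55.49, 67.10, 68.76, 67.07]
edgerazor_279 = [63.47, 38.57, 49.48, 78.78, 68.23, 55.64, 43.91,
                 33.40, 45.42, 60.81, 46.25, 54.71, 54.28, 53.66]


def average_score(scores):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(mean(scores), 2)


baseline_avg = average_score(baseline_bf16)
quantized_avg = average_score(edgerazor_279)
retention = quantized_avg / baseline_avg

print(baseline_avg, quantized_avg, f"{retention:.1%}")
```

Both computed averages match the reported 58.64 and 53.33, so with weights at 2.79 bits the model keeps roughly 91% of the full-precision average on these benchmarks.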

Quickstart

Install EdgeRazor in advance if you need weight-activation quantization. The provided weights are already quantized (quantized_weights*scaling_bf16); to enable activation and KV cache quantization, pass trust_remote_code=True when loading the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "zhangsq-nju/Qwen3-1.7B-EdgeRazor-2.79bit"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # EdgeRazor-nbit models are trained only in instruct (non-thinking) mode.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split off any thinking content; with enable_thinking=False this part is normally empty
try:
    # index just past the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
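
The note at the start of this section describes the released weights as quantized_weights*scaling_bf16, that is, low-bit values multiplied by a scale. The sketch below illustrates that idea for ternary (1.58-bit) values using absmean scaling, a scheme borrowed from BitNet b1.58. It is an assumption for illustration only: EdgeRazor's actual quantizer, grouping, and scale computation are not described in this card, the BF16 rounding of the scale is not modeled, and the names `quantize_ternary` and `dequantize` are hypothetical.

```python
def quantize_ternary(weights):
    """Map floats to {-1, 0, 1} plus one shared scale (absmean scaling, assumed)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate weights as quantized_value * scale."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, -1.1, 0.4]
quantized, scale = quantize_ternary(weights)
restored = dequantize(quantized, scale)

print(quantized)
print([round(w, 4) for w in restored])
```

Each restored weight is one of three values (-scale, 0, +scale), which is why a checkpoint stored in BF16 can still carry only about 1.58 bits of information per ternary weight.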

Citation

If you find our project useful in your research, please consider citing our paper ✏️:

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}