
EdgeRazor for Lightweight LLMs


# MobileLLM-350M-EdgeRazor-4bit

## Contents

- Model Overview
- Quickstart
- Citation

## Model Overview

### Model Bit-Widths

| Mixed-Precision Recipe | Bit-Width | This Repo |
|---|---|---|
| 100% 4-bit + 0% 1.58-bit | 4 | ✔️ |
| 50% 4-bit + 50% 1.58-bit | 2.79 | |
| 12.5% 4-bit + 87.5% 1.58-bit | 1.88 | |
| 0% 4-bit + 100% 1.58-bit | 1.58 | |
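The bit-widths in the table are just the precision-weighted average of the two formats; a small illustrative calculation (not part of the EdgeRazor codebase):

```python
def effective_bits(frac_4bit: float) -> float:
    """Average bit-width when frac_4bit of the weights are 4-bit
    and the remainder are 1.58-bit (ternary)."""
    return frac_4bit * 4.0 + (1.0 - frac_4bit) * 1.58

# Reproduce the recipes from the table above.
for frac in (1.0, 0.5, 0.125, 0.0):
    print(f"{frac:6.1%} 4-bit -> {effective_bits(frac):.2f} bits")
```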

### Model Performance

W-A-KV denotes the weight, activation, and KV-cache bit-widths, respectively.

| Models | W-A-KV | ARC-e | ARC-c | HellaS. | BoolQ | PIQA | WinoG. | SIQA | OBQA | Tr.QA2 | Ethics | MMLU | GSM8K | HumanE. | Average (↑) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MobileLLM-350M | 16-16-16 | 64.94 | 35.49 | 52.87 | 58.96 | 70.84 | 56.35 | 40.79 | 40.20 | 37.44 | 53.98 | 23.52 | 0.00 | 0.00 | 41.18 |
| EdgeRazor | 4-16-16 | 69.19 | 36.26 | 51.91 | 62.26 | 70.40 | 56.20 | 40.74 | 37.40 | 37.96 | 57.41 | 25.00 | 0.53 | 0.00 | 41.94 |
| EdgeRazor | 2.79-16-16 | 65.87 | 32.68 | 45.98 | 61.71 | 68.82 | 56.27 | 40.02 | 35.00 | 38.97 | 56.53 | 24.27 | 0.76 | 0.00 | 40.53 |
| EdgeRazor | 1.88-16-16 | 61.20 | 28.75 | 40.76 | 58.23 | 66.59 | 55.01 | 39.51 | 33.00 | 40.98 | 56.22 | 25.03 | 0.53 | 0.00 | 38.91 |
| EdgeRazor | 1.58-16-16 | 58.63 | 26.19 | 38.95 | 58.07 | 65.29 | 53.04 | 39.30 | 32.20 | 41.97 | 56.26 | 24.12 | 0.53 | 0.00 | 38.04 |
| EdgeRazor | 4-8-8 | 69.11 | 35.84 | 51.82 | 62.60 | 70.35 | 56.20 | 40.58 | 37.40 | 37.90 | 57.21 | 24.66 | 0.45 | 0.00 | 41.86 |
| EdgeRazor | 2.79-8-8 | 65.99 | 32.68 | 45.99 | 62.11 | 68.55 | 56.51 | 40.07 | 35.20 | 39.05 | 56.51 | 24.41 | 0.99 | 0.00 | 40.62 |
| EdgeRazor | 1.88-8-8 | 61.36 | 29.18 | 40.86 | 58.23 | 66.92 | 55.49 | 39.56 | 33.20 | 40.95 | 56.13 | 24.97 | 0.38 | 0.00 | 39.02 |
| EdgeRazor | 1.58-8-8 | 58.67 | 26.19 | 38.92 | 58.04 | 65.23 | 53.83 | 39.25 | 32.00 | 42.03 | 56.33 | 24.19 | 0.83 | 0.00 | 38.12 |

## Quickstart

Make sure EdgeRazor is installed in advance if you want weight-activation quantization. The weights in this repo are already quantized (stored as quantized_weights together with scaling_bf16 scaling factors); to enable activation and KV-cache quantization, pass trust_remote_code=True when loading the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-4bit",
    use_fast=False
)
model = AutoModelForCausalLM.from_pretrained(
    "zhangsq-nju/MobileLLM-ParetoQ-350M-BF16-EdgeRazor-4bit",
    trust_remote_code=True
)
```
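To illustrate the quantized_weights * scaling_bf16 layout mentioned above, here is a minimal dequantization sketch. NumPy has no bf16 dtype, so float32 stands in for the scales, and the per-row scale grouping is an assumption made purely for illustration:

```python
import numpy as np

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate full-precision weights as W ≈ q * scale."""
    return q.astype(np.float32) * scale

# Signed 4-bit values lie in [-8, 7]; here, one scale per output row
# (the actual group size used by the repo may differ).
q = np.array([[-8,  3, 7],
              [ 1, -2, 4]], dtype=np.int8)
scale = np.array([[0.05],
                  [0.10]], dtype=np.float32)

w = dequantize(q, scale)
print(w)
```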

Note that the default tokenizer does not define special tokens; you can add them, for example:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```
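With the model and tokenizer loaded as above, text generation goes through the standard transformers API. A minimal helper sketch; the prompt and decoding settings below are illustrative, not prescribed by the repo:

```python
# Mirrors the special tokens added above.
SPECIAL_TOKENS = {"eos_token": "</s>", "bos_token": "<s>", "unk_token": "<unk>"}

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy decoding with the quantized model; model and tokenizer
    come from the from_pretrained calls shown in the quickstart."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example usage (requires the downloaded checkpoint):
# print(generate(model, tokenizer, "The capital of France is"))
```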

## Citation

If you find our project useful in your research, please consider citing our paper ✏️:

```bibtex
@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}
```