EdgeRazor for Lightweight LLMs

MobileLLM-350M-EdgeRazor-4bit

Contents
Model Overview
Model Bit-Widths
Model Performance
Quickstart
Citation

Model Overview

Base Model: facebook/MobileLLM-ParetoQ-350M-BF16
Training: zhangsq-nju/EdgeRazor
Quantization: 4-bit weights for the embedding, all decoder layers, and the lm_head

Model Bit-Widths

Mixed-Precision Recipe	Average Bit-Width	This Repo
100% 4-bit + 0% 1.58-bit	4	✔️
50% 4-bit + 50% 1.58-bit	2.79
12.5% 4-bit + 87.5% 1.58-bit	1.88
0% 4-bit + 100% 1.58-bit	1.58
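The Bit-Width column matches a weighted average of the two precisions. The sketch below reproduces it. It assumes the percentages are shares of the quantized weights and ignores the storage cost of scaling factors. The helper `average_bit_width` is hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def average_bit_width(recipe):
    """Weighted average bit-width of a mixed-precision recipe (hypothetical helper).

    `recipe` maps a bit-width to the fraction of weights stored at that
    bit-width. The fractions must sum to 1. Scaling-factor overhead is ignored.
    """
    total = sum(recipe.values())
    if total != 1:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(bits * share for bits, share in recipe.items())


FOUR = Fraction(4)
TERNARY = Fraction("1.58")  # "1.58-bit" ~ log2(3) bits per ternary weight {-1, 0, +1}

recipes = {
    "100% 4-bit": {FOUR: Fraction(1), TERNARY: Fraction(0)},
    "50% 4-bit + 50% 1.58-bit": {FOUR: Fraction("0.5"), TERNARY: Fraction("0.5")},
    "12.5% 4-bit + 87.5% 1.58-bit": {FOUR: Fraction("0.125"), TERNARY: Fraction("0.875")},
    "100% 1.58-bit": {FOUR: Fraction(0), TERNARY: Fraction(1)},
}

for name, recipe in recipes.items():
    print(f"{name:30s} {float(average_bit_width(recipe)):.4f}")
```

The 12.5%/87.5% recipe gives exactly 1.8825 bits, which the table reports as 1.88. A ternary weight carries log2(3) ≈ 1.585 bits of information, and that is the origin of the "1.58-bit" label.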

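Model Performance

W-A-KV gives the bit-widths of weights, activations, and KV cache, respectively. A value of 16 means unquantized 16-bit.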

Models	W-A-KV	ARC-e	ARC-c	HellaS.	BoolQ	PIQA	WinoG.	SIQA	OBQA	Tr.QA2	Ethics	MMLU	GSM8K	Average (↑)
MobileLLM-350M	16-16-16	64.94	35.49	52.87	58.96	70.84	56.35	40.79	40.20	37.44	53.98	23.52	0.00	41.18
EdgeRazor	4-16-16	69.19	36.26	51.91	62.26	70.40	56.20	40.74	37.40	37.96	57.41	25.00	0.53	41.94
EdgeRazor	2.79-16-16	65.87	32.68	45.98	61.71	68.82	56.27	40.02	35.00	38.97	56.53	24.27	0.76	40.53
EdgeRazor	1.88-16-16	61.20	28.75	40.76	58.23	66.59	55.01	39.51	33.00	40.98	56.22	25.03	0.53	38.91
EdgeRazor	1.58-16-16	58.63	26.19	38.95	58.07	65.29	53.04	39.30	32.20	41.97	56.26	24.12	0.53	38.04
EdgeRazor	4-8-8	69.11	35.84	51.82	62.60	70.35	56.20	40.58	37.40	37.90	57.21	24.66	0.45	41.86
EdgeRazor	2.79-8-8	65.99	32.68	45.99	62.11	68.55	56.51	40.07	35.20	39.05	56.51	24.41	0.99	40.62
EdgeRazor	1.88-8-8	61.36	29.18	40.86	58.23	66.92	55.49	39.56	33.20	40.95	56.13	24.97	0.38	39.02
EdgeRazor	1.58-8-8	58.67	26.19	38.92	58.04	65.23	53.83	39.25	32.00	42.03	56.33	24.19	0.83	38.12
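In the x-8-8 rows, activations and the KV cache are also quantized to 8 bits. The card does not describe EdgeRazor's activation or KV-cache quantizer. The sketch below therefore assumes a generic symmetric per-token absmax scheme. Each token's vector gets its own scale, max|x| / 127, and its values are rounded to int8 codes and immediately dequantized. The function name `fake_quantize_int8_per_token` is hypothetical.

```python
def fake_quantize_int8_per_token(rows):
    """Symmetric per-token 8-bit fake quantization (hypothetical sketch).

    Each row is one token's activation (or key/value) vector. The row gets
    its own scale s = max|x| / 127. Values are rounded to integer codes in
    [-127, 127] and then dequantized back to floats as code * s.
    """
    out = []
    for row in rows:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
        # round() rounds half to even; the clamp guards against float drift
        codes = [max(-127, min(127, round(v / scale))) for v in row]
        out.append([c * scale for c in codes])
    return out


activations = [[0.5, -1.27, 0.01], [3.0, 0.0, -1.5]]
for x, x_hat in zip(activations, fake_quantize_int8_per_token(activations)):
    err = max(abs(a - b) for a, b in zip(x, x_hat))
    print(f"max |x - x_hat| = {err:.2e}")
```

With per-token scales, a single outlier token cannot enlarge the quantization step for every other token. Within a token, the rounding error per element is at most half a step (s/2).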

Quickstart

Install EdgeRazor beforehand if you want weight-activation quantization. The released weights are already quantized and are stored in dequantized form (quantized_weights*scaling_bf16). To also enable activation and KV-cache quantization, pass trust_remote_code=True when calling from_pretrained.
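The form quantized_weights*scaling_bf16 means that each stored weight is an integer code multiplied by a scale. The sketch below illustrates that form with a generic symmetric per-output-channel absmax quantizer. This is an assumption, because the card does not state which quantizer, granularity, or scale-learning rule EdgeRazor uses. The name `fake_quantize_weights` is hypothetical, and plain Python floats stand in for BF16.

```python
def fake_quantize_weights(weight, bits=4):
    """Per-output-channel symmetric fake quantization (hypothetical sketch).

    Returns (codes, scales, dequantized), where dequantized[i][j] equals
    codes[i][j] * scales[i], i.e. "quantized weights times scale".
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit: symmetric codes in [-7, 7]
    codes, scales, dequantized = [], [], []
    for row in weight:  # one row per output channel
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # all-zero row: any scale works
        q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
        codes.append(q)
        scales.append(scale)
        dequantized.append([qi * scale for qi in q])
    return codes, scales, dequantized


W = [[0.70, -0.35, 0.10, 0.0], [-1.4, 0.2, 0.9, -0.05]]
codes, scales, W_hat = fake_quantize_weights(W)
for q_row, s in zip(codes, scales):
    print(q_row, f"scale={s:.4f}")
```

Passing bits=2 gives qmax = 1, which yields the ternary codes {-1, 0, +1} used by 1.58-bit layers. The scale here is still absmax, whereas BitNet b1.58, for example, uses the mean absolute weight instead.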

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "zhangsq-nju/MobileLLM-350M-EdgeRazor-4bit"

# use_fast=False loads the slow (Python) tokenizer implementation
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)

# trust_remote_code=True runs the custom modeling code shipped with the
# repo, which enables activation and KV-cache quantization
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

Note that the default tokenizer does not define special tokens. You can add them yourself, for example:

tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)

Citation

If you find our project useful in your research, please consider citing our paper ✏️:

@article{zhangsh-edgerazor,
  title={{EdgeRazor}: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation},
  author={Shu-Hao Zhang and Le-Tong Huang and Xiang-Sheng Deng and Xin-Yi Zou and Chen Wu and Nan Li and Shao-Qun Zhang},
  year={2026},
}

Safetensors · Model size: 0.4B params · Tensor type: BF16

Model tree for zhangsq-nju/MobileLLM-350M-EdgeRazor-4bit: base model facebook/MobileLLM-ParetoQ-350M-BF16 (finetuned); included in the EdgeRazor-Nbit collection.