Gemma 3 1B Instruct for RK3588
This version of Gemma 3 has been converted with rkllm-toolkit v1.2.1 to run on the RK3588 NPU, using mixed w8a8 and w8a8_g128 quantisation.
Compatible with RKLLM runtime version: 1.2.x
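A deployment script may want to reject a mismatched runtime before loading the model. The helper below is a minimal sketch of such a check on a plain version string. `is_compatible` is a hypothetical name, not part of RKLLM, and obtaining the version string (for example from the runtime's start-up log) is left out.

```python
def is_compatible(runtime_version, required_series="1.2"):
    """Return True when runtime_version belongs to the required major.minor series.

    Hypothetical helper, not an RKLLM API: it only compares version strings.
    """
    # Accept both "1.2.1" and "v1.2.1"
    parts = runtime_version.strip().lstrip("v").split(".")
    # Compare whole components so that "1.20.0" does not match "1.2"
    return parts[:2] == required_series.split(".")


if __name__ == "__main__":
    for version in ("1.2.0", "v1.2.1", "1.1.4"):
        print(version, is_compatible(version))
```

Comparing whole components rather than using a string prefix test avoids treating a version such as 1.20.0 as part of the 1.2.x series.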
Useful links:
Pretty much anything by these folks: marty1885 and happyme531
Conversion Python script
Based on instructions from airockchip/rknn-llm #240
gemma-3-conversion.py
from rkllm.api import RKLLM
from transformers import Gemma3ForCausalLM, AutoTokenizer
import safetensors
import torch
# Unsloth version
modelpath = 'unsloth/gemma-3-1b-it'
model = Gemma3ForCausalLM.from_pretrained(modelpath, device_map='cpu', torch_dtype=torch.bfloat16).eval()
tokenizer = AutoTokenizer.from_pretrained(modelpath)
model.save_pretrained('llm')
tokenizer.save_pretrained('llm')
# Drop the in-memory copies before conversion; RKLLM reloads the model from the 'llm' directory
del model, tokenizer
modelpath = 'llm'
savepath = 'llm/gemma-3-1b-it-w8a8.rkllm'
llm = RKLLM()
ret = llm.load_huggingface(model=modelpath, device='cpu')
if ret != 0:
    print('Load model failed!')
    exit(ret)
ret = llm.build(
    do_quantization=True,
    optimization_level=0,
    quantized_dtype='w8a8',
    # hybrid ratio of 25% gives a good balance
    hybrid_rate=0.25,
    max_context=4096 * 4,
    quantized_algorithm='normal',
    target_platform='rk3588',
    num_npu_core=3,
    extra_qparams=None,
    dataset=None
)
if ret != 0:
    print('Build model failed!')
    exit(ret)
ret = llm.export_rkllm(savepath)
if ret != 0:
    print('Export model failed!')
    exit(ret)
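For intuition about why mixing w8a8 with w8a8_g128 can help, here is a small pure-Python sketch of symmetric int8 weight quantisation with and without groups. It assumes that `g128` denotes groups of 128 weights sharing one scale, and that `hybrid_rate` presumably sets the share of layers using the grouped scheme. It is not rkllm-toolkit's actual algorithm, it covers weights only (not the 8-bit activations), and all function names are illustrative.

```python
import random


def quantize_group(values):
    """Symmetric int8 quantisation of one group: returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def quantize_row(row, group_size=None):
    """Quantise a weight row with one scale per group.

    group_size=None uses a single scale for the whole row.
    """
    size = group_size or len(row)
    return [quantize_group(row[i:i + size]) for i in range(0, len(row), size)]


def dequantize_row(groups):
    """Map int8 codes back to floats using each group's scale."""
    return [code * scale for codes, scale in groups for code in codes]


def mean_abs_error(row, group_size=None):
    """Average absolute difference between the row and its round trip."""
    restored = dequantize_row(quantize_row(row, group_size))
    return sum(abs(a - b) for a, b in zip(row, restored)) / len(row)


# Synthetic weights: small values plus one outlier that inflates the scale
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(1024)]
row[5] = 1.5

if __name__ == "__main__":
    print("single scale :", mean_abs_error(row))
    print("groups of 128:", mean_abs_error(row, 128))
```

With a single scale, one outlier coarsens the step size for the whole row. With groups, only the group containing the outlier is affected, at the cost of storing one extra scale per group.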