gemma-3-4b-novision-RK3588-1.2.1

This version of gemma-3-4b-novision has been converted to run on the RK3588 NPU using w8a8 quantization.

This model has been optimized with the following LoRA: N/A

This model supports a maximum context length of 16,384 tokens.

Compatible with RKLLM version: 1.2.1

Recommended rkllm parameters

This model runs well in limited testing with the following rkllm library parameters:

  • n_keep = -1
  • top_k = 64
  • top_p = 0.95
  • temperature = 0.7
  • repeat_penalty = 1.0
  • frequency_penalty = 1.0
  • presence_penalty = 0.0
  • mirostat = 0
  • mirostat_tau = 5.0
  • mirostat_eta = 0.1
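For reference, these settings correspond to fields of the RKLLMParam struct exposed by the rkllm library. A minimal Python sketch collecting them (field names are assumed to mirror the rkllm.h C API; verify against your RKLLM 1.2.1 headers):

```python
# Recommended sampling settings for this model, keyed by the RKLLMParam
# field names from rkllm.h (names assumed; check your RKLLM 1.2.1 release).
RKLLM_PARAMS = {
    "n_keep": -1,             # keep the whole prompt on context shift
    "top_k": 64,
    "top_p": 0.95,
    "temperature": 0.7,
    "repeat_penalty": 1.0,    # 1.0 disables the repeat penalty
    "frequency_penalty": 1.0,
    "presence_penalty": 0.0,
    "mirostat": 0,            # 0 disables Mirostat sampling
    "mirostat_tau": 5.0,
    "mirostat_eta": 0.1,
}

def apply_params(param_struct, settings=RKLLM_PARAMS):
    """Copy each setting onto an RKLLMParam-like object.
    (Hypothetical helper for use with ctypes bindings.)"""
    for name, value in settings.items():
        setattr(param_struct, name, value)
    return param_struct
```

With mirostat = 0, the mirostat_tau and mirostat_eta values are inert; sampling is governed by top_k/top_p/temperature.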

Useful links:

Official RKLLM GitHub

RockchipNPU Reddit

EZRKNN-LLM

Pretty much anything by these folks: marty1885 and happyme531

Converted using https://github.com/c0zaut/ez-er-rkllm-toolkit

Original Model Card for base model, gemma-3-4b-novision, below:

Gemma-3-4b Text-Only

This model is a text-only version of google/gemma-3-4b-it, converted from the multimodal Gemma3ForConditionalGeneration architecture to the text-only Gemma3ForCausalLM architecture.

Model Description

  • Original Model: The original Gemma-3-4b-it is a multimodal model released by Google that can process both text and images
  • This Version: This version has been modified to use the same architecture as the text-only 1b model, with the vision components removed
  • Parameters: 4 billion parameters
  • Conversion Process: Vision-related components were stripped while maintaining the text generation capabilities
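The stripping step can be sketched as a state-dict filter: drop the vision tower and projector weights and re-root the language-model weights at the top level. The prefix names below are assumptions based on the Gemma3ForConditionalGeneration submodule layout; verify against the actual checkpoint:

```python
# Hypothetical sketch of the vision-stripping step. The key prefixes are
# assumptions about how Gemma3ForConditionalGeneration names its submodules.
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")
LM_PREFIX = "language_model."

def strip_vision(state_dict):
    """Return a text-only state dict: vision weights dropped,
    language-model weights promoted to the top level."""
    text_only = {}
    for name, tensor in state_dict.items():
        if name.startswith(VISION_PREFIXES):
            continue  # discard vision components
        if name.startswith(LM_PREFIX):
            name = name[len(LM_PREFIX):]  # re-root under Gemma3ForCausalLM
        text_only[name] = tensor
    return text_only
```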

Usage

You can load and use this model the same way you would use the text-only google/gemma-3-1b-it version:

from transformers import AutoTokenizer, BitsAndBytesConfig, Gemma3ForCausalLM
import torch

model_id = "gghfez/gemma-3-4b-novision"

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = Gemma3ForCausalLM.from_pretrained(
    model_id, quantization_config=quantization_config, device_map="auto"
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"},]
        },
    ],
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # token ids are integers; do not cast them to bfloat16


with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=64)

outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
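Note that generate returns each prompt's token ids followed by the completion. If you only want the newly generated text, slice each output row by the prompt length before decoding; with a padded batch that length is inputs["input_ids"].shape[1] for every row. A plain-Python sketch of the indexing:

```python
# Sketch: generate() output rows begin with the prompt ids. Slicing by the
# prompt length keeps only the newly generated tokens (shown with plain lists).
def new_tokens_only(output_rows, prompt_length):
    return [row[prompt_length:] for row in output_rows]
```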
