---
base_model: unsloth/gemma-3-4b-it-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3
- trl
license: apache-2.0
language:
- en
---
# Uploaded Model
## Model Details
- **Developer:** zoeeyys
- **License:** Apache-2.0
- **Finetuned from:** unsloth/gemma-3-4b-it-unsloth-bnb-4bit
This Gemma 3 model was fine-tuned **2x faster** using [Unsloth](https://github.com/unslothai/unsloth).
---
## Installation
```bash
pip install unsloth transformers==4.56.2 trl==0.22.2
```

If running on Colab, use:

```python
import os, re, torch

# Pick the xformers wheel matching the installed torch major.minor version.
v = re.match(r"[\d]+\.[\d]+", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf datasets==4.3.0 "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft triton unsloth
!pip install transformers==4.56.2 trl==0.22.2
```
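The version-matching logic in the Colab snippet can be factored into a small helper for reuse or testing. This is a sketch of the same lookup; the `xformers_pin` name and `TORCH_TO_XFORMERS` table are ours, not part of any library:

```python
import re

# torch major.minor -> compatible xformers pin, mirroring the snippet above.
# Unknown torch versions fall back to the latest listed wheel.
TORCH_TO_XFORMERS = {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}

def xformers_pin(torch_version: str) -> str:
    """Return a pip requirement string like 'xformers==0.0.34'."""
    major_minor = re.match(r"\d+\.\d+", torch_version).group(0)
    return "xformers==" + TORCH_TO_XFORMERS.get(major_minor, "0.0.34")
```

Note that local version suffixes such as `+cu121` are ignored, since only the leading `major.minor` is matched.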
## Usage

### Load Model

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="zoeeyys/gemma3-4b-it-unsloth-fulldata-qk",
    max_seq_length=2048,
)
model.eval()
```
### Streaming Inference Example

```python
from transformers import TextStreamer

prompt = "I want to switch careers into data science. Where should I start?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
)

model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,   # required for temperature/top_p/top_k to take effect
    temperature=1.0,  # recommended by Unsloth
    top_p=0.95,
    top_k=64,
    streamer=streamer,
)
```
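Gemma 3 instruct checkpoints are trained on a chat format, so in practice prompts should go through `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` rather than raw text. As a rough sketch of what that template produces (the real template may also prepend special tokens such as `<bos>`; this hand-rolled version is for illustration only):

```python
# Hand-rolled sketch of Gemma 3's turn markers; prefer
# tokenizer.apply_chat_template in real code, which also
# handles special tokens and multi-turn histories.
def gemma_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = gemma_prompt("I want to switch careers into data science. Where should I start?")
```

A prompt built this way can stand in for the raw string in the streaming example above, so the model sees input in the same format it was fine-tuned on.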
## Notes
- Optimized using Unsloth
- Supports token streaming