# Gemma 3 Phishing Classifier (LoRA Adapter)

This repository contains a LoRA adapter fine-tuned for binary email phishing classification (`phishing` vs. `safe`) on top of `google/gemma-3-4b-it`.
## Model Type

- Type: PEFT LoRA adapter (not full model weights)
- Base model required: `google/gemma-3-4b-it`
- Output labels: `phishing` or `safe` (single-word target)
## What This Model Is For
This model is intended to help triage suspicious email-like text in security workflows. It is designed as an assistive classifier, not a standalone security control.
## Quickstart (Transformers + PEFT)
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "google/gemma-3-4b-it"
adapter_repo = "briankkogi/gemma3-phishing-main-v1"

tok = AutoTokenizer.from_pretrained(base_model)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_repo).eval()

prompt = (
    'Email body: """Your account will be suspended unless you verify now."""\n\n'
    "Task: Is this phishing or safe? Reply with only one word: phishing or safe."
)

inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)

# Decode only the newly generated tokens and normalize to a label.
txt = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip().lower()
pred = "phishing" if "phishing" in txt else "safe"
print(pred, "| raw:", txt)
```
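For batch triage, the prompt construction and label parsing from the quickstart can be factored into small pure helpers (a sketch; the function names are illustrative, not part of this repo):

```python
def build_prompt(body: str) -> str:
    """Wrap an email body in the same instruction used in the quickstart."""
    return (
        f'Email body: """{body}"""\n\n'
        "Task: Is this phishing or safe? Reply with only one word: phishing or safe."
    )


def parse_label(generated_text: str) -> str:
    """Normalize raw decoded output to one of the two labels (default: safe)."""
    return "phishing" if "phishing" in generated_text.strip().lower() else "safe"
```

Pass `build_prompt(body)` through the chat template and `model.generate` exactly as above, then call `parse_label` on the decoded continuation.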
## Training Summary
- Base model: `google/gemma-3-4b-it`
- Run name: `main-v1`
- Hardware: B200 GPU
- Training approach: LoRA fine-tuning
- Core config:
  - epochs: 2
  - max_seq_len: 2048
  - learning_rate: 1e-4
  - batch_size: 2
  - grad_accum: 8
  - LoRA: r=16, alpha=32, dropout=0.05
  - target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
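As a reference, the hyperparameters above map onto a PEFT `LoraConfig` roughly like this (a sketch; the actual training script is not included in this repository):

```python
from peft import LoraConfig

# LoRA settings matching the core config listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```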
## Evaluation Results

Full validation/test evaluation from the training run:

| Split | Accuracy | Phishing precision | Phishing recall | Phishing F1 | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|
| Validation | 0.993718 | 0.996904 | 0.986217 | 0.991532 | 644 | 1096 | 2 | 9 |
| Test | 0.991443 | 0.989297 | 0.987786 | 0.988541 | 647 | 1091 | 7 | 8 |
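The reported rates follow directly from the confusion counts; for example, the validation metrics can be re-derived from TP/TN/FP/FN:

```python
# Re-derive the validation metrics from the confusion-matrix counts above.
tp, tn, fp, fn = 644, 1096, 2, 9

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)                          # phishing precision
recall = tp / (tp + fn)                             # phishing recall
f1 = 2 * precision * recall / (precision + recall)  # phishing F1

print(round(accuracy, 6), round(precision, 6), round(recall, 6), round(f1, 6))
# 0.993718 0.996904 0.986217 0.991532
```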
### Base vs. fine-tuned comparison (200-example slice)

| Model | Accuracy | Phishing F1 | False positives |
|---|---|---|---|
| Base `gemma-3-4b-it` | 0.7350 | 0.7104 | 52 |
| This adapter | 0.9950 | 0.9924 | 0 |
## Limitations and Risks
- This is not guaranteed to catch all phishing attempts.
- False negatives are possible, especially with novel/obfuscated content.
- Performance can shift with distribution changes (language/domain/time).
- Should be used with additional controls (URL reputation, auth checks, sandboxing, analyst review).
## Intended Use
- Email triage, SOC copilots, risk scoring pipelines, analyst assist tools.
## Out of Scope / Not Recommended
- Fully autonomous blocking without guardrails.
- Legal/compliance-only decisioning without human oversight.
- Use on data domains very different from training data without re-validation.
## License and Usage Terms

This adapter is derived from `google/gemma-3-4b-it` and inherits applicable upstream usage restrictions and terms.
Please review and comply with Gemma and any dataset-specific license/policy requirements before production use.
## Version

v1 (`main-v1`)