# Gemma 3 Phishing Classifier (LoRA Adapter)

This repository contains a LoRA adapter fine-tuned for binary email phishing classification (`phishing` vs. `safe`) on top of `google/gemma-3-4b-it`.
## Model Type

- Type: PEFT LoRA adapter (not full model weights)
- Base model required: `google/gemma-3-4b-it`
- Output labels: `phishing` or `safe` (single-word target)
## What This Model Is For
This model is intended to help triage suspicious email-like text in security workflows. It is designed as an assistive classifier, not a standalone security control.
## Quickstart (Transformers + PEFT)
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "google/gemma-3-4b-it"
adapter_repo = "briankkogi/gemma3-phishing-main-v1"

tok = AutoTokenizer.from_pretrained(base_model)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_repo).eval()

prompt = (
    'Email body: """Your account will be suspended unless you verify now."""\n\n'
    "Task: Is this phishing or safe? Reply with only one word: phishing or safe."
)

inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)

# Decode only the newly generated tokens and normalize to a label.
txt = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip().lower()
pred = "phishing" if "phishing" in txt else "safe"
print(pred, "| raw:", txt)
```
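For batch triage, the prompt construction and label parsing from the quickstart can be factored into small pure helpers (a sketch; the function names are illustrative, not part of this repo):

```python
def build_prompt(body: str) -> str:
    """Wrap an email body in the same instruction used in the quickstart."""
    return (
        f'Email body: """{body}"""\n\n'
        "Task: Is this phishing or safe? Reply with only one word: phishing or safe."
    )


def parse_label(generated_text: str) -> str:
    """Normalize raw decoded output to one of the two labels (default: safe)."""
    return "phishing" if "phishing" in generated_text.strip().lower() else "safe"
```

Pass `build_prompt(body)` through the chat template and `model.generate` exactly as above, then call `parse_label` on the decoded continuation.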
## Training Summary
- Base model: `google/gemma-3-4b-it`
- Run name: `main-v1`
- Hardware: B200 GPU
- Training approach: LoRA fine-tuning
- Core config:
  - epochs: 2
  - max_seq_len: 2048
  - learning_rate: 1e-4
  - batch_size: 2
  - grad_accum: 8
  - LoRA: r=16, alpha=32, dropout=0.05
  - target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
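As a reference, the hyperparameters above map onto a PEFT `LoraConfig` roughly like this (a sketch; the actual training script is not included in this repository):

```python
from peft import LoraConfig

# LoRA settings matching the core config listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```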
## Evaluation Results

Full validation/test evaluation from the training run:

| Split | Accuracy | Phishing precision | Phishing recall | Phishing F1 | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|
| Validation | 0.993718 | 0.996904 | 0.986217 | 0.991532 | 644 | 1096 | 2 | 9 |
| Test | 0.991443 | 0.989297 | 0.987786 | 0.988541 | 647 | 1091 | 7 | 8 |
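The reported rates follow directly from the confusion counts; for example, the validation metrics can be re-derived from TP/TN/FP/FN:

```python
# Re-derive the validation metrics from the confusion-matrix counts above.
tp, tn, fp, fn = 644, 1096, 2, 9

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)                          # phishing precision
recall = tp / (tp + fn)                             # phishing recall
f1 = 2 * precision * recall / (precision + recall)  # phishing F1

print(round(accuracy, 6), round(precision, 6), round(recall, 6), round(f1, 6))
# 0.993718 0.996904 0.986217 0.991532
```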
### Base vs. fine-tuned comparison (200-example slice)

| Model | Accuracy | Phishing F1 | False positives |
|---|---|---|---|
| Base `gemma-3-4b-it` | 0.7350 | 0.7104 | 52 |
| This adapter | 0.9950 | 0.9924 | 0 |
## Limitations and Risks
- This is not guaranteed to catch all phishing attempts.
- False negatives are possible, especially with novel/obfuscated content.
- Performance can shift with distribution changes (language/domain/time).
- Should be used with additional controls (URL reputation, auth checks, sandboxing, analyst review).
## Intended Use
- Email triage, SOC copilots, risk scoring pipelines, analyst assist tools.
## Out of Scope / Not Recommended
- Fully autonomous blocking without guardrails.
- Legal/compliance-only decisioning without human oversight.
- Use on data domains very different from training data without re-validation.
## License and Usage Terms

This adapter is derived from `google/gemma-3-4b-it` and inherits applicable upstream usage restrictions and terms.
Please review and comply with Gemma and any dataset-specific license/policy requirements before production use.
## Version

v1 (`main-v1`)