Gemma 3 Phishing Classifier (LoRA Adapter)

This repository contains a LoRA adapter fine-tuned for binary email phishing classification (phishing vs safe) on top of google/gemma-3-4b-it.

Model Type

  • Type: PEFT LoRA adapter (not full model weights)
  • Base model required: google/gemma-3-4b-it
  • Output labels: phishing or safe (single-word target)

What This Model Is For

This model is intended to help triage suspicious email-like text in security workflows. It is designed as an assistive classifier, not a standalone security control.

Quickstart (Transformers + PEFT)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "google/gemma-3-4b-it"
adapter_repo = "briankkogi/gemma3-phishing-main-v1"

# Load the tokenizer; ensure a pad token is set for generation.
tok = AutoTokenizer.from_pretrained(base_model)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Load the base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_repo).eval()

prompt = 'Email body: """Your account will be suspended unless you verify now."""\n\nTask: Is this phishing or safe? Reply with only one word: phishing or safe.'
inputs = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; two new tokens are enough for a one-word label.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2, do_sample=False)

# Decode only the newly generated tokens and map them to a label.
txt = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip().lower()
pred = "phishing" if "phishing" in txt else "safe"
print(pred, "| raw:", txt)
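The quickstart maps the raw completion to a label with a simple substring check. A slightly more defensive version of that mapping can be factored out as below; `parse_label` is a hypothetical helper (not part of this repo) that normalizes case and whitespace and falls back to "safe" when neither label appears.

```python
def parse_label(raw: str, default: str = "safe") -> str:
    """Map raw model output to one of the two labels.

    Hypothetical helper: lowercases and strips the completion, then
    checks for either label token, returning `default` when neither
    appears (e.g. if the model emits an unexpected word).
    """
    text = raw.strip().lower()
    for label in ("phishing", "safe"):
        if label in text:
            return label
    return default

print(parse_label("Phishing."))  # phishing
print(parse_label(" safe\n"))    # safe
print(parse_label("unsure"))     # safe (fallback)
```

Checking "phishing" first matches the quickstart's behavior when both words happen to appear in the output.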

Training Summary

  • Base model: google/gemma-3-4b-it
  • Run name: main-v1
  • Hardware: B200 GPU
  • Training approach: LoRA fine-tuning
  • Core config:
    • epochs: 2
    • max_seq_len: 2048
    • learning_rate: 1e-4
    • batch_size: 2
    • grad_accum: 8
    • LoRA: r=16, alpha=32, dropout=0.05
    • target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
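The LoRA hyperparameters above correspond roughly to a PEFT LoraConfig like the following. The actual training script is not published in this repo, so treat this as an assumption about how the listed values map onto peft's config fields:

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters listed above.
# Assumption: the training run used these standard peft field names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```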

Evaluation Results

Full validation/test evaluation from the training run:

  • Validation:
    • accuracy: 0.993718
    • phishing precision: 0.996904
    • phishing recall: 0.986217
    • phishing F1: 0.991532
    • confusion: TP=644, TN=1096, FP=2, FN=9
  • Test:
    • accuracy: 0.991443
    • phishing precision: 0.989297
    • phishing recall: 0.987786
    • phishing F1: 0.988541
    • confusion: TP=647, TN=1091, FP=7, FN=8
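The reported metrics can be re-derived directly from the confusion counts, which is a useful sanity check when re-evaluating on your own data. A minimal sketch (pure Python, no dependencies):

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive accuracy, phishing precision/recall/F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

val = metrics(tp=644, tn=1096, fp=2, fn=9)    # validation split
tst = metrics(tp=647, tn=1091, fp=7, fn=8)    # test split
print(round(val["accuracy"], 6))  # 0.993718
print(round(tst["f1"], 6))        # 0.988541
```

Both splits reproduce the numbers above to six decimal places.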

Base vs fine-tuned comparison (200-example slice)

  • Base Gemma-3-4b-it:
    • accuracy: 0.7350
    • phishing F1: 0.7104
    • FP: 52
  • This adapter:
    • accuracy: 0.9950
    • phishing F1: 0.9924
    • FP: 0

Limitations and Risks

  • The classifier is not guaranteed to catch all phishing attempts.
  • False negatives are possible, especially with novel/obfuscated content.
  • Performance can shift with distribution changes (language/domain/time).
  • Should be used with additional controls (URL reputation, auth checks, sandboxing, analyst review).
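The layered use recommended above can be sketched as a policy that treats the model's verdict as one signal among several. All signal names here are hypothetical placeholders for real controls (URL reputation feeds, SPF/DKIM/DMARC results, sandboxing), not part of this repo:

```python
def triage(model_label: str, url_reputation_bad: bool, auth_checks_pass: bool) -> str:
    """Combine the classifier verdict with independent signals.

    Hypothetical policy sketch: quarantine only when two independent
    signals agree; a single signal escalates to an analyst instead of
    blocking autonomously.
    """
    if model_label == "phishing" and url_reputation_bad:
        return "quarantine"       # two independent signals agree
    if model_label == "phishing" or not auth_checks_pass:
        return "analyst_review"   # single signal: escalate, don't block
    return "deliver"

print(triage("phishing", url_reputation_bad=True, auth_checks_pass=True))   # quarantine
print(triage("safe", url_reputation_bad=False, auth_checks_pass=True))      # deliver
```

The key design point is that the model alone never triggers an autonomous block, consistent with the out-of-scope guidance below.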

Intended Use

  • Email triage, SOC copilots, risk scoring pipelines, analyst assist tools.

Out of Scope / Not Recommended

  • Fully autonomous blocking without guardrails.
  • Legal/compliance-only decisioning without human oversight.
  • Use on data domains very different from training data without re-validation.

License and Usage Terms

This adapter is derived from google/gemma-3-4b-it and inherits applicable upstream usage restrictions/terms. Please review and comply with Gemma and any dataset-specific license/policy requirements before production use.

Version

  • v1 (main-v1)