Phishing Email Detector (DeBERTa-v3-large)

A DeBERTa-v3-large model fine-tuned for phishing email detection.

Model Description

This model is fine-tuned from microsoft/deberta-v3-large to classify emails as either phishing or safe.

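🔒 100% Recall Achieved

With the decision threshold set to 0.0007, the model flagged every phishing email in the reported evaluation (100% recall), at the cost of additional false positives.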

Performance

Default Setting (Threshold 0.5)

| Metric | Value |
|---|---|
| Accuracy | 97.59% |
| F1-score | 96.99% |
| Precision | 95.01% |
| Recall | 99.04% |

Maximum Security Setting (Threshold 0.0007) - Recall 100%

| Metric | Value |
|---|---|
| Accuracy | 95.23% |
| F1-score | 94.26% |
| Precision | 89.15% |
| Recall | 100.00% |
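
The two tables can be cross-checked with a little arithmetic. The sketch below recomputes each metric from confusion-matrix counts. Only the false-positive counts (38 and 89) and the split sizes appear in this card; the other counts are inferred here, assuming the evaluation used the 1,864-sample test split. They are one set of integers that reproduces the reported percentages, not figures published by the author.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 as percentages (2 decimals)."""
    total = tp + fp + fn + tn
    values = {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
    return {name: round(100 * v, 2) for name, v in values.items()}

# ASSUMPTION: inferred counts, consistent with the reported numbers
# if the 1,864 test emails contain 731 phishing emails.
default_counts = dict(tp=724, fp=38, fn=7, tn=1095)      # threshold 0.5
max_security_counts = dict(tp=731, fp=89, fn=0, tn=1044)  # threshold 0.0007

print(metrics(**default_counts))
print(metrics(**max_security_counts))
```

Under these assumed counts, lowering the threshold recovers the 7 phishing emails missed at 0.5 while flagging 51 more safe emails.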

Usage

Basic Usage (Default Threshold)

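from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="takumi123xxx/phishing-email-detector-deberta-v3")

# Returns a list containing the top label and its score for the input text
result = classifier("Your email text here")
print(result)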

Maximum Security (100% Recall)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("takumi123xxx/phishing-email-detector-deberta-v3")
tokenizer = AutoTokenizer.from_pretrained("takumi123xxx/phishing-email-detector-deberta-v3")
model.eval()  # Inference mode (disables dropout)

THRESHOLD = 0.0007  # Phishing-probability cutoff reported to give 100% recall

def detect_phishing(text):
    # Tokenize the email; inputs longer than 512 tokens are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
        # Softmax over the two classes; index 1 is "Phishing Email"
        probs = torch.softmax(outputs.logits, dim=-1)
        phishing_prob = probs[0][1].item()

    is_phishing = phishing_prob >= THRESHOLD
    return {
        "is_phishing": is_phishing,
        "phishing_probability": phishing_prob,
        "label": "Phishing Email" if is_phishing else "Safe Email"
    }

# Example
result = detect_phishing("Congratulations! You've won $1,000,000. Click here to claim your prize!")
print(result)
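
For a two-class softmax, thresholding the phishing probability is equivalent to thresholding the difference between the two logits, which gives a feel for how permissive 0.0007 is. The sketch below uses only the standard library; `softmax_phishing_prob` and `logit_margin` are hypothetical helper names introduced for illustration, not part of the model's API.

```python
import math

THRESHOLD = 0.0007

def softmax_phishing_prob(logit_safe, logit_phishing):
    """Two-class softmax probability of class 1 (phishing)."""
    m = max(logit_safe, logit_phishing)  # subtract the max for numerical stability
    e_safe = math.exp(logit_safe - m)
    e_phish = math.exp(logit_phishing - m)
    return e_phish / (e_safe + e_phish)

def logit_margin(threshold):
    """Smallest (logit_phishing - logit_safe) whose probability reaches `threshold`."""
    return math.log(threshold / (1.0 - threshold))

margin = logit_margin(THRESHOLD)
print(f"margin at {THRESHOLD}: {margin:.2f}")        # roughly -7.26
print(f"margin at 0.5: {logit_margin(0.5):.2f}")
```

Put differently, at this threshold an email is labeled safe only when the safe logit exceeds the phishing logit by more than about 7.26, whereas the default threshold of 0.5 simply picks the larger logit.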

Training Details

  • Base Model: microsoft/deberta-v3-large
  • Dataset: zefang-liu/phishing-email-dataset
  • Training Samples: 14,904
  • Validation Samples: 1,863
  • Test Samples: 1,864
  • Epochs: 2.15 (Early Stopping)
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Max Length: 512
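
The figures above imply an approximately 80/10/10 split and on the order of 2,000 optimizer steps. The sketch below does that arithmetic, assuming the batch size of 16 is the effective batch size (single device, no gradient accumulation), which the card does not state.

```python
import math

# Values taken from the Training Details list
train, val, test = 14_904, 1_863, 1_864
batch_size = 16
epochs = 2.15  # reported value, itself rounded

total = train + val + test
shares = [round(100 * n / total, 1) for n in (train, val, test)]

# ASSUMPTION: one optimizer step per batch of 16
steps_per_epoch = math.ceil(train / batch_size)
approx_steps = round(epochs * steps_per_epoch)

print(f"total samples: {total}")
print(f"train/val/test shares (%): {shares}")
print(f"steps per epoch: {steps_per_epoch}, approx. total steps: {approx_steps}")
```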

Labels

  • 0: Safe Email
  • 1: Phishing Email

Threshold Recommendation

| Use Case | Threshold | Recall | False Positives |
|---|---|---|---|
| Balanced | 0.5 | 99.04% | 38 |
| High Security | 0.0007 | 100.00% | 89 |
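
A full-recall threshold like the one above is commonly found by taking the lowest score the model assigns to any true phishing email in a held-out set. The sketch below shows that procedure on a handful of made-up scores; the data and the function names are hypothetical, and this is not necessarily how the author arrived at 0.0007.

```python
def max_recall_threshold(probs, labels):
    """Largest threshold that still flags every phishing email (label 1)."""
    return min(p for p, y in zip(probs, labels) if y == 1)

def recall_at(probs, labels, threshold):
    """Fraction of phishing emails with probability >= threshold."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    return sum(p >= threshold for p in positives) / len(positives)

def false_positives_at(probs, labels, threshold):
    """Number of safe emails (label 0) flagged at this threshold."""
    return sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)

# Hypothetical phishing probabilities and true labels (1 = phishing, 0 = safe)
probs = [0.98, 0.91, 0.004, 0.62, 0.0002, 0.03, 0.0009]
labels = [1, 1, 1, 0, 0, 0, 0]

threshold = max_recall_threshold(probs, labels)
for t in (0.5, threshold):
    print(t, recall_at(probs, labels, t), false_positives_at(probs, labels, t))
```

A threshold chosen this way guarantees full recall only on the data it was chosen from, so it is worth re-checking on a separate sample of real mail before relying on it.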

Limitations

  • Trained on English emails only
  • May not detect novel phishing techniques not present in training data
  • False positives increase when using lower thresholds

License

MIT License
