Phishing Email Detector (DeBERTa-v3-large)

A DeBERTa-v3-large model fine-tuned for phishing email detection.

Model Description

This model is based on microsoft/deberta-v3-large and has been fine-tuned to classify emails as either phishing or safe.

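🔒 100% Recall Achieved

Setting the classification threshold to 0.0007 yields 100% recall on phishing emails in the evaluation reported under Performance, at the cost of additional false positives.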

Performance

Default Setting (Threshold 0.5)

Metric	Value
Accuracy	97.59%
F1-score	96.99%
Precision	95.01%
Recall	99.04%

Maximum Security Setting (Threshold 0.0007) - 100% Recall

Metric	Value
Accuracy	95.23%
F1-score	94.26%
Precision	89.15%
Recall	100.00%
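
All four metrics in the tables above derive from confusion-matrix counts at a given threshold. As a minimal sketch of how they relate, the helper below computes them from raw counts. The function name `classification_metrics` and the example counts are illustrative assumptions, not the author's evaluation code or data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    Phishing is the positive class:
      tp = phishing emails flagged, fp = safe emails flagged,
      fn = phishing emails missed,  tn = safe emails passed.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (hypothetical, not this model's results)
metrics = classification_metrics(tp=90, fp=10, fn=0, tn=100)
print(metrics)
```

With no missed phishing emails (`fn = 0`), recall is 1.0 while precision still drops with every false positive, which is the trade-off visible between the two tables.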

Usage

Basic Usage (Default Threshold)

from transformers import pipeline

classifier = pipeline("text-classification", model="takumi123xxx/phishing-email-detector-deberta-v3")
result = classifier("Your email text here")
print(result)

Maximum Security (100% Recall)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "takumi123xxx/phishing-email-detector-deberta-v3"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

THRESHOLD = 0.0007  # Threshold reported to give 100% recall

def detect_phishing(text):
    # Truncate to the 512-token maximum length used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Index 1 is the "Phishing Email" class (see Labels)
    probs = torch.softmax(outputs.logits, dim=-1)
    phishing_prob = probs[0][1].item()

    # Flag the email whenever the phishing probability reaches the threshold
    is_phishing = phishing_prob >= THRESHOLD
    return {
        "is_phishing": is_phishing,
        "phishing_probability": phishing_prob,
        "label": "Phishing Email" if is_phishing else "Safe Email"
    }

# Example
result = detect_phishing("Congratulations! You've won $1,000,000. Click here to claim your prize!")
print(result)

Training Details

Base Model: microsoft/deberta-v3-large
Dataset: zefang-liu/phishing-email-dataset
Training Samples: 14,904
Validation Samples: 1,863
Test Samples: 1,864
Epochs: 2.15 (Early Stopping)
Batch Size: 16
Learning Rate: 2e-5
Max Length: 512

Labels

0: Safe Email
1: Phishing Email

Threshold Recommendation

Use Case	Threshold	Recall	False Positives
Balanced	0.5	99.04%	38
High Security	0.0007	100.00%	89
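
One plausible way to arrive at a high-recall threshold like the one in the table is to take the lowest phishing probability the model assigns to any true phishing email in a held-out set: any threshold at or below that value flags all of them. The sketch below assumes you already have per-email phishing probabilities and labels. The names `threshold_for_full_recall` and `false_positives_at` and the sample scores are hypothetical, and this is not necessarily how the 0.0007 value was chosen.

```python
def threshold_for_full_recall(probs, labels):
    """Return the largest threshold that still flags every phishing email.

    probs:  phishing probabilities, one per email
    labels: ground truth, 1 = phishing, 0 = safe
    """
    phishing_probs = [p for p, y in zip(probs, labels) if y == 1]
    if not phishing_probs:
        raise ValueError("need at least one phishing example")
    return min(phishing_probs)


def false_positives_at(probs, labels, threshold):
    """Count safe emails that would be flagged at the given threshold."""
    return sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)


# Hypothetical held-out scores (not taken from this model)
probs = [0.98, 0.91, 0.002, 0.60, 0.0001, 0.03]
labels = [1, 1, 1, 0, 0, 0]

threshold = threshold_for_full_recall(probs, labels)
fp_default = false_positives_at(probs, labels, 0.5)
fp_full_recall = false_positives_at(probs, labels, threshold)
print(threshold, fp_default, fp_full_recall)
```

A threshold tuned this way on one dataset may not hold its recall on new mail, so it is worth re-checking against emails from your own environment.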

Limitations

Trained on English emails only
May not detect novel phishing techniques not present in training data
False positives increase when using lower thresholds

License

MIT License

Model Files

Format: Safetensors
Model size: 0.4B params
Tensor type: F32
