Phishing Email Detector (DeBERTa-v3-large)
A DeBERTa-v3-large model fine-tuned to detect phishing emails
Model Description
This model is microsoft/deberta-v3-large fine-tuned to classify emails as phishing or safe.
🔒 100% Recall Achieved
Setting the decision threshold to 0.0007 caught every phishing email in the test set (100% recall).
Performance
Default setting (threshold 0.5)
| Metric | Value |
|---|---|
| Accuracy | 97.59% |
| F1-score | 96.99% |
| Precision | 95.01% |
| Recall | 99.04% |
Maximum security setting (threshold 0.0007): 100% recall
| Metric | Value |
|---|---|
| Accuracy | 95.23% |
| F1-score | 94.26% |
| Precision | 89.15% |
| Recall | 100.00% |
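As a cross-check, the percentages in both tables are consistent with the confusion counts below. These counts are not published in this card: they are back-solved (an assumption) from the 1,864 test emails and the 38 and 89 false positives listed under Threshold Recommendation, which together imply 731 phishing emails in the test set. The `metrics` helper is a hypothetical illustration using the standard definitions, with phishing (label 1) as the positive class.

```python
# Confusion counts back-solved from the reported percentages
# (assumption: 1,864 test emails, 731 of them phishing); not published in the card.
COUNTS = {
    "0.5": {"tp": 724, "fp": 38, "fn": 7, "tn": 1095},
    "0.0007": {"tp": 731, "fp": 89, "fn": 0, "tn": 1044},
}


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics with phishing (label 1) as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


for threshold, counts in COUNTS.items():
    m = metrics(**counts)
    print(threshold, {name: f"{value:.2%}" for name, value in m.items()})
```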
Usage
Basic Usage (Default Threshold)
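```python
from transformers import pipeline

classifier = pipeline("text-classification", model="takumi123xxx/phishing-email-detector-deberta-v3")
# Returns the top-scoring label and its score; with two classes this is
# equivalent to flagging phishing when its probability is >= 0.5
result = classifier("Your email text here")
print(result)
```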
Maximum Security (100% Recall)
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "takumi123xxx/phishing-email-detector-deberta-v3"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

THRESHOLD = 0.0007  # 100% recall on the test set


def detect_phishing(text):
    # Truncate to 512 tokens, the max length used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over [safe, phishing]; index 1 is the phishing probability
    probs = torch.softmax(outputs.logits, dim=-1)
    phishing_prob = probs[0][1].item()
    is_phishing = phishing_prob >= THRESHOLD
    return {
        "is_phishing": is_phishing,
        "phishing_probability": phishing_prob,
        "label": "Phishing Email" if is_phishing else "Safe Email",
    }


# Example
result = detect_phishing("Congratulations! You've won $1,000,000. Click here to claim your prize!")
print(result)
```
Training Details
- Base Model: microsoft/deberta-v3-large
- Dataset: zefang-liu/phishing-email-dataset
- Training Samples: 14,904
- Validation Samples: 1,863
- Test Samples: 1,864
- Epochs: 2.15 (Early Stopping)
- Batch Size: 16
- Learning Rate: 2e-5
- Max Length: 512
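The three split sizes add up to 18,631 emails and match an 80/10/10 split in which the held-out portions are rounded up. The card does not say how the split was made, so the sketch below is only an assumption; `split_sizes` is a hypothetical helper that mimics the rounding scikit-learn's `train_test_split` applies to a fractional `test_size` (the held-out count is `ceil(test_size * n)`).

```python
import math


def split_sizes(n: int, held_out: float = 0.2) -> tuple[int, int, int]:
    """Hypothetical 80/10/10 split: carve off `held_out`, then halve it into
    validation and test. Held-out counts are rounded up with ceil, mirroring
    scikit-learn's rule for a float test_size."""
    n_rest = math.ceil(n * held_out)
    n_train = n - n_rest
    n_test = math.ceil(n_rest * 0.5)
    n_val = n_rest - n_test
    return n_train, n_val, n_test


# Total of the three reported splits: 14,904 + 1,863 + 1,864 = 18,631
print(split_sizes(14_904 + 1_863 + 1_864))
```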
Labels
- 0: Safe Email
- 1: Phishing Email
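The model outputs two logits ordered by these label ids, and the phishing probability is the softmax entry at index 1. With two classes that equals the sigmoid of the logit difference, so a probability threshold t is the same as requiring logit_1 - logit_0 >= log(t / (1 - t)); for t = 0.0007 that cut-off is roughly -7.26. The minimal pure-Python sketch below illustrates this; the logit values are made up, and `phishing_probability` and `classify` are hypothetical helpers, not part of the model's API.

```python
import math

LABELS = {0: "Safe Email", 1: "Phishing Email"}


def phishing_probability(logits: list[float]) -> float:
    """Softmax probability of index 1 (phishing) for two logits,
    computed as a numerically stable sigmoid of the logit difference."""
    margin = logits[1] - logits[0]
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


def classify(logits: list[float], threshold: float = 0.5) -> str:
    return LABELS[1] if phishing_probability(logits) >= threshold else LABELS[0]


# Equivalent cut-off on the logit margin: p >= t  <=>  margin >= log(t / (1 - t))
print(math.log(0.0007 / (1 - 0.0007)))  # roughly -7.26

# Hypothetical logits [safe, phishing]: margins of -7 and -8
for logits in ([3.0, -4.0], [3.0, -5.0]):
    print(logits, classify(logits), classify(logits, threshold=0.0007))
```

The first example is labeled safe at 0.5 but flagged at 0.0007, which is exactly the trade the low threshold makes: emails the model considers very likely safe still get flagged if their phishing probability is not negligible.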
Threshold Recommendation
| Use Case | Threshold | Recall | False Positives (test set) |
|---|---|---|---|
| Balanced | 0.5 | 99.04% | 38 |
| High Security | 0.0007 | 100.00% | 89 |
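The card does not describe how 0.0007 was chosen. One simple approach, sketched below under that assumption, is to take the lowest phishing probability the model assigns to any known phishing email in a labeled held-out set: any threshold at or below that value flags every phishing email in the set, and the false-positive count shows the cost. The helper names and the toy scores are hypothetical.

```python
def threshold_for_full_recall(scores: list[float], labels: list[int]) -> float:
    """Largest threshold (with a >= comparison) that still flags every
    phishing email (label 1) in the labeled set."""
    return min(s for s, y in zip(scores, labels) if y == 1)


def recall(scores: list[float], labels: list[int], threshold: float) -> float:
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / sum(labels)


def false_positives(scores: list[float], labels: list[int], threshold: float) -> int:
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)


# Toy phishing probabilities from a hypothetical validation set (1 = phishing)
scores = [0.98, 0.61, 0.002, 0.0009, 0.3, 0.0001]
labels = [1, 1, 1, 0, 0, 0]

t = threshold_for_full_recall(scores, labels)
for threshold in (0.5, t):
    print(threshold, recall(scores, labels, threshold), false_positives(scores, labels, threshold))
```

Choosing the threshold on the validation split and then confirming it on the test split gives an honest recall estimate, since a threshold tuned directly on the test set reaches 100% recall on that set by construction.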
Limitations
- Trained on English emails only
- May miss novel phishing techniques that are absent from the training data
- Lowering the threshold increases false positives (from 38 at 0.5 to 89 at 0.0007 on the test set)
License
MIT License