# Llama Email Fraud Detector (AWQ 4-bit)
AWQ 4-bit quantized version of cunxin/llama-email-fraud-detector. It provides the same fine-tuned email fraud detection capability at roughly one-third the size, with ~1-1.5% accuracy loss.
This model is optimized for low-VRAM GPUs (6-8 GB), such as the RTX 3050, laptop RTX 3060, and similar consumer cards.
## Model Details
| Field | Value |
|---|---|
| Architecture | LlamaForCausalLM (decoder-only Transformer) |
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-Tuning | LoRA (r=16, alpha=32), merged, then AWQ-quantized |
| Quantization | AWQ 4-bit, group_size=128, GEMM kernel, zero_point=True |
| Parameters | 3.2B (quantized) |
| Precision | 4-bit weights (int4) |
| Model Size | 2.2 GB (vs. 6.4 GB for bf16) |
| Compression Ratio | ~2.9x |
| Accuracy Loss vs. bf16 | ~1-1.5% |
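The 2.2 GB figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes the embedding matrix is left in fp16 and that each 128-weight group stores one fp16 scale plus one int4 zero point; these storage details are our assumptions about the checkpoint layout, not facts taken from it.

```python
# Back-of-envelope size estimate for the AWQ checkpoint.
# Assumptions (ours, not from the model card): embeddings stay in fp16,
# and each 128-weight group adds one fp16 scale and one int4 zero point.
params_total = 3.2e9            # total parameters
vocab, hidden = 128_256, 3072   # Llama-3.2-3B embedding shape
embed_params = vocab * hidden   # assumed kept in fp16 (2 bytes each)

quant_params = params_total - embed_params
bits_per_weight = 4 + (16 + 4) / 128   # int4 weight + amortized scale/zero

size_gb = (quant_params * bits_per_weight / 8 + embed_params * 2) / 1e9
print(f"~{size_gb:.1f} GB")    # lands near the reported 2.2 GB
```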
## Lineage
```
meta-llama/Llama-3.2-3B-Instruct
│
├── LoRA fine-tuning (email fraud detection)
│
├── Merge LoRA into base weights (6.4 GB, bf16)
│   └─► cunxin/llama-email-fraud-detector
│
└── AWQ 4-bit quantization (2.2 GB)
    └─► cunxin/llama-email-fraud-detector-awq (this model)
```
## Output Format
Same structured JSON output as the bf16 version:
```json
{
  "is_fraud": true,
  "risk_score": 95,
  "confidence_level": 0.97,
  "detected_threats": ["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"],
  "reason": "The sender domain 'amaz0n-verify.com' typosquats amazon.com...",
  "suggestion": "Do not click any links. Report this email as phishing."
}
```
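The model is trained to emit this JSON directly, but generative models occasionally wrap their answer in extra prose. A hedged extraction helper (the function name and fallback behavior are our own convention, not part of the model's API) makes parsing robust:

```python
import json
import re

def extract_verdict(text: str) -> dict:
    """Pull the first JSON object out of a model completion.

    Hypothetical helper, not part of the repo's API: finds the
    outermost {...} span and parses it, raising if none is present.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

raw = 'Sure, here is the analysis:\n{"is_fraud": true, "risk_score": 95}'
verdict = extract_verdict(raw)
print(verdict["is_fraud"], verdict["risk_score"])  # True 95
```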
11 threat types with point-based scoring: CREDENTIAL_REQUEST (35), DOMAIN_MISMATCH (30), URL_DISCREPANCY (30), TOO_GOOD_TO_BE_TRUE (30), PROMPT_INJECTION (30), URGENCY_FEAR (15), REPLY_TO_MISMATCH (15), GENERIC_SALUTATION (8), ANOMALOUS_TIMING (8), MISSING_SIGNATURE (8), GRAMMAR_ANOMALY (5).
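One plausible reading of these point values is an additive tally capped at 100; note this is our assumption about how `risk_score` is derived, and the example output above (95 for three threats totaling 80 points) suggests the model's actual scoring is not a simple sum:

```python
# Point values copied from the list above. The additive scoring and the
# cap at 100 are our assumptions, not the model's documented behavior.
THREAT_POINTS = {
    "CREDENTIAL_REQUEST": 35,
    "DOMAIN_MISMATCH": 30,
    "URL_DISCREPANCY": 30,
    "TOO_GOOD_TO_BE_TRUE": 30,
    "PROMPT_INJECTION": 30,
    "URGENCY_FEAR": 15,
    "REPLY_TO_MISMATCH": 15,
    "GENERIC_SALUTATION": 8,
    "ANOMALOUS_TIMING": 8,
    "MISSING_SIGNATURE": 8,
    "GRAMMAR_ANOMALY": 5,
}

def risk_score(detected: list[str]) -> int:
    """Sum the points for each detected threat, capped at 100."""
    return min(100, sum(THREAT_POINTS[t] for t in detected))

print(risk_score(["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"]))  # 80
```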
See cunxin/llama-email-fraud-detector for full documentation of threat types, dual-model pipeline, and training details.
## Usage

### With vLLM (Recommended)
```shell
# Set in .env
MODEL_PATH=cunxin/llama-email-fraud-detector-awq
QUANTIZATION=awq

# Start the service
docker compose --profile gpu up -d
```
### With Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "cunxin/llama-email-fraud-detector-awq"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # AWQ kernels compute in fp16
    device_map="auto",
)
```
## VRAM Requirements
| Configuration | VRAM | Example GPU |
|---|---|---|
| AWQ model only | ~2.5 GB | Any 4GB+ GPU |
| AWQ + RoBERTa | ~3.0 GB | RTX 3050 (8GB) |
| AWQ + RoBERTa + KV Cache (4096 ctx) | ~3.5 GB | RTX 3060 Laptop (6GB) |
| AWQ + RoBERTa + KV Cache + TurboQuant | ~2.7 GB | RTX 3050 (4GB laptop) |
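The ~0.5 GB jump from adding a 4096-token KV cache can be estimated from the architecture. The sketch below uses Llama-3.2-3B's published config values (28 layers, 8 KV heads, head dimension 128) and assumes an fp16 cache; check the model's config.json to confirm:

```python
# Per-token KV-cache estimate for Llama-3.2-3B (config values assumed
# from the published architecture; fp16 cache assumed).
layers, kv_heads, head_dim = 28, 8, 128
bytes_per_elem = 2              # fp16
ctx = 4096

per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
cache_gb = per_token * ctx / 1e9
print(f"~{cache_gb:.2f} GB")    # close to the ~0.5 GB step in the table
```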
## Comparison with bf16
| | bf16 | AWQ 4-bit (this model) |
|---|---|---|
| Model size | 6.4 GB | 2.2 GB |
| Min VRAM | 12 GB | 6 GB |
| Max concurrent requests (24GB GPU) | ~21 | ~50+ |
| Accuracy loss | baseline | ~1-1.5% |
| Inference speed | baseline | comparable |
## Quantization Details
AWQ (Activation-Aware Weight Quantization) identifies the most important 1% of weights by observing activation magnitudes, then scales those channels up before uniform 4-bit quantization. This preserves model quality far better than naive quantization.
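The intuition can be shown with a toy example in pure Python (made-up numbers, not the AutoAWQ implementation): scaling a salient channel's weight up before group-wise 4-bit rounding, and dividing the scale back out of its activation, shrinks the output error contributed by that channel.

```python
# Toy AWQ illustration with invented numbers. Real AWQ derives per-channel
# scales from activation statistics; here we hand-pick one scale to show
# the mechanism.

def fake_quant_4bit(ws):
    """Round one weight group to a 16-level asymmetric grid (zero_point=True)."""
    lo, hi = min(ws), max(ws)
    step = (hi - lo) / 15
    return [lo + round((w - lo) / step) * step for w in ws]

weights = [0.5, -0.4, 0.06, 0.3]
acts    = [0.1,  0.1, 10.0, 0.1]   # channel 2 is "salient": huge activation
out_true = sum(a * w for a, w in zip(acts, weights))

# Naive quantization: channel 2's rounding error meets a huge activation.
out_plain = sum(a * w for a, w in zip(acts, fake_quant_4bit(weights)))

# AWQ-style: scale channel 2's weight up 8x, its activation down 8x.
s = [1, 1, 8, 1]
scaled = fake_quant_4bit([w * si for w, si in zip(weights, s)])
out_awq = sum((a / si) * w for a, si, w in zip(acts, s, scaled))

print(abs(out_plain - out_true), abs(out_awq - out_true))  # AWQ error is far smaller
```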
| Field | Value |
|---|---|
| Method | AWQ (Activation-Aware Weight Quantization) |
| Bit Width | 4-bit (int4) |
| Group Size | 128 |
| Kernel | GEMM |
| Zero Point | Yes |
| Calibration Data | Pile validation set (214K samples) |
| Quantization Time | ~4 minutes on RTX 4090 |
## Related Models
| Model | Type | Size | Speed | Use Case |
|---|---|---|---|---|
| cunxin/roberta-email-fraud-detector | Discriminative | 475 MB | <50 ms | Fast binary pre-screen |
| cunxin/llama-email-fraud-detector | Generative (bf16) | 6.4 GB | ~1-3 s | Detailed threat analysis |
| cunxin/llama-email-fraud-detector-awq (this model) | Generative (4-bit) | 2.2 GB | ~1-3 s | Same as above, for low-VRAM GPUs |
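The dual-model pipeline referenced above can be sketched as a simple router: the fast RoBERTa classifier pre-screens every email, and only suspicious ones are escalated to the Llama model. The helper names and the 0.8 escalation threshold below are hypothetical, not the repo's actual API:

```python
# Hedged sketch of the dual-model routing; function names and the 0.8
# threshold are our invention for illustration.
def screen_email(email, roberta_score, llama_analyze):
    score = roberta_score(email)          # fast discriminative pre-screen
    if score < 0.8:                       # confidently benign: stop here
        return {"is_fraud": False, "risk_score": int(score * 100)}
    return llama_analyze(email)           # escalate for detailed analysis

# Stub models stand in for the real checkpoints:
verdict = screen_email(
    "Verify your account now!",
    roberta_score=lambda e: 0.95,
    llama_analyze=lambda e: {"is_fraud": True, "risk_score": 95},
)
print(verdict["is_fraud"])  # True
```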
## Citation
```bibtex
@misc{cunxin2025llama-email-fraud-awq,
  title={Llama Email Fraud Detector (AWQ 4-bit)},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/llama-email-fraud-detector-awq}
}
```