Llama Email Fraud Detector (AWQ 4-bit)

AWQ 4-bit quantized version of cunxin/llama-email-fraud-detector: the same fine-tuned email fraud detection capability at about one third the size, with ~1-1.5% accuracy loss.

This model is optimized for low VRAM GPUs (6-8 GB) such as RTX 3050, RTX 3060 laptop, and similar consumer GPUs.


Model Details

| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (decoder-only Transformer) |
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Fine-tuning | LoRA (r=16, alpha=32), merged, then AWQ quantized |
| Quantization | AWQ 4-bit, group_size=128, GEMM kernel, zero_point=True |
| Parameters | 3.2B (quantized) |
| Precision | 4-bit weights (int4) |
| Model size | 2.2 GB (vs 6.4 GB for bf16) |
| Compression ratio | ~2.9x |
| Accuracy loss vs bf16 | ~1-1.5% |

Lineage

meta-llama/Llama-3.2-3B-Instruct
    │
    ├── LoRA fine-tuning (email fraud detection)
    │
    ├── Merge LoRA into base weights (6.4 GB, bf16)
    │     └─► cunxin/llama-email-fraud-detector
    │
    └── AWQ 4-bit quantization (2.2 GB)
          └─► cunxin/llama-email-fraud-detector-awq (this model)

Output Format

Same structured JSON output as the bf16 version:


{
  "is_fraud": true,
  "risk_score": 95,
  "confidence_level": 0.97,
  "detected_threats": ["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"],
  "reason": "The sender domain 'amaz0n-verify.com' typosquats amazon.com...",
  "suggestion": "Do not click any links. Report this email as phishing."
}

11 threat types with point-based scoring: CREDENTIAL_REQUEST (35), DOMAIN_MISMATCH (30), URL_DISCREPANCY (30), TOO_GOOD_TO_BE_TRUE (30), PROMPT_INJECTION (30), URGENCY_FEAR (15), REPLY_TO_MISMATCH (15), GENERIC_SALUTATION (8), ANOMALOUS_TIMING (8), MISSING_SIGNATURE (8), GRAMMAR_ANOMALY (5).
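The per-threat points suggest an additive scheme, though the exact aggregation the model learned is not documented here (note the JSON example above reports a risk score of 95 for three threats whose points sum to 80). As a rough external sanity check, a capped sum can be sketched as follows; `risk_score` is a hypothetical helper, not part of the model:

```python
# Hypothetical point-based aggregation of detected threat types.
# The model generates risk_score itself; this capped sum is only one
# plausible reading of the published per-threat points.
THREAT_POINTS = {
    "CREDENTIAL_REQUEST": 35, "DOMAIN_MISMATCH": 30, "URL_DISCREPANCY": 30,
    "TOO_GOOD_TO_BE_TRUE": 30, "PROMPT_INJECTION": 30, "URGENCY_FEAR": 15,
    "REPLY_TO_MISMATCH": 15, "GENERIC_SALUTATION": 8, "ANOMALOUS_TIMING": 8,
    "MISSING_SIGNATURE": 8, "GRAMMAR_ANOMALY": 5,
}

def risk_score(detected: list[str]) -> int:
    """Sum the points of the detected threats, capped at 100."""
    return min(100, sum(THREAT_POINTS[t] for t in detected))

print(risk_score(["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"]))  # 80
```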

See cunxin/llama-email-fraud-detector for full documentation of threat types, dual-model pipeline, and training details.


Usage

With vLLM (Recommended)

# Set in .env
MODEL_PATH=cunxin/llama-email-fraud-detector-awq
QUANTIZATION=awq

# Start service
docker compose --profile gpu up -d
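Once the container is up, the service can be queried over HTTP. The sketch below assumes vLLM's standard OpenAI-compatible API on `localhost:8000`; the actual port and route depend on the project's `docker-compose.yml`, so treat the URL as a placeholder:

```python
import json
import urllib.request

# Request payload for vLLM's OpenAI-compatible /v1/chat/completions route
# (assumed endpoint; check the project's compose file for the real port).
payload = {
    "model": "cunxin/llama-email-fraud-detector-awq",
    "messages": [{"role": "user", "content": "Analyze this email: ..."}],
    "temperature": 0.0,   # deterministic decoding helps keep the JSON stable
    "max_tokens": 512,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the service is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```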

With Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

# AWQ checkpoints require the autoawq package: pip install autoawq
model_name = "cunxin/llama-email-fraud-detector-awq"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# See the base model card for the exact prompt format
messages = [{"role": "user", "content": "Analyze this email for fraud: ..."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

VRAM Requirements

| Configuration | VRAM | Example GPU |
|---|---|---|
| AWQ model only | ~2.5 GB | Any 4 GB+ GPU |
| AWQ + RoBERTa | ~3.0 GB | RTX 3050 (8 GB) |
| AWQ + RoBERTa + KV cache (4096 ctx) | ~3.5 GB | RTX 3060 Laptop (6 GB) |
| AWQ + RoBERTa + KV cache + TurboQuant | ~2.7 GB | RTX 3050 (4 GB laptop) |

Comparison with bf16

| | bf16 | AWQ 4-bit (this) |
|---|---|---|
| Model size | 6.4 GB | 2.2 GB |
| Min VRAM | 12 GB | 6 GB |
| Max concurrent requests (24 GB GPU) | ~21 | ~50+ |
| Accuracy loss | baseline | ~1-1.5% |
| Inference speed | baseline | comparable |

Quantization Details

AWQ (Activation-Aware Weight Quantization) identifies the most important 1% of weights by observing activation magnitudes, then scales those channels up before uniform 4-bit quantization. This preserves model quality far better than naive quantization.

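The effect is easy to see in miniature. In the toy below (pure Python, not the real AWQ algorithm, which derives scales from activation statistics and folds them into adjacent layers), weights in a group share one quantization scale, so a small salient weight next to an outlier loses most of its precision; scaling it up before quantization and dividing afterwards recovers it:

```python
def quant_dequant(ws, bits=4):
    """Round-to-nearest symmetric quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for int4
    scale = max(abs(w) for w in ws) / qmax
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in ws]

group = [0.4, 7.0, -3.0, 2.0]                      # 0.4 is the "salient" weight
err_plain = abs(quant_dequant(group)[0] - group[0])

s = 2.0                                            # AWQ-style per-channel scale-up
scaled = [group[0] * s] + group[1:]
err_awq = abs(quant_dequant(scaled)[0] / s - group[0])

print(err_plain, err_awq)  # the scaled salient weight quantizes more accurately
```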

| Property | Value |
|---|---|
| Method | AWQ (Activation-Aware Weight Quantization) |
| Bit width | 4-bit (int4) |
| Group size | 128 |
| Kernel | GEMM |
| Zero point | Yes |
| Calibration data | Pile validation set (214K samples) |
| Quantization time | ~4 minutes on RTX 4090 |
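A back-of-envelope check on the 2.2 GB figure: with group_size=128, each int4 weight also carries an amortized share of one fp16 scale and one int4 zero point per group, about 4.16 bits per weight. The parameter split and the bf16 embedding below are assumptions for illustration, not published details:

```python
# Rough storage estimate for AWQ int4, group_size=128, with zero points.
# Parameter split and bf16 embedding size are illustrative assumptions.
params_quantized = 3.0e9                    # weights in quantized linear layers
bits_per_weight = 4 + (16 + 4) / 128        # int4 + amortized fp16 scale + int4 zero
quantized_gb = params_quantized * bits_per_weight / 8 / 1e9
embed_gb = 128_256 * 3072 * 2 / 1e9         # Llama 3.2 vocab x hidden, bf16
print(round(quantized_gb + embed_gb, 2))    # ~2.3-2.4 GB, near the reported 2.2 GB
```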

Related Models

| Model | Type | Size | Speed | Use Case |
|---|---|---|---|---|
| cunxin/roberta-email-fraud-detector | Discriminative | 475 MB | <50 ms | Fast binary pre-screen |
| cunxin/llama-email-fraud-detector | Generative (bf16) | 6.4 GB | ~1-3 s | Detailed threat analysis |
| cunxin/llama-email-fraud-detector-awq (this) | Generative (4-bit) | 2.2 GB | ~1-3 s | Same as above, for low-VRAM GPUs |
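These models form the dual-model pipeline documented in the base model card: the cheap RoBERTa classifier screens every email, and only suspicious ones reach this generative model. A control-flow sketch, with `roberta_score` and `llama_analyze` as hypothetical stand-ins for the real inference calls:

```python
def roberta_score(email: str) -> float:
    """Placeholder for the fast RoBERTa fraud probability (<50 ms per email)."""
    return 0.9 if "verify your account" in email.lower() else 0.1

def llama_analyze(email: str) -> dict:
    """Placeholder for this model's detailed structured JSON analysis."""
    return {"is_fraud": True, "risk_score": 95, "detected_threats": []}

def screen_email(email: str, threshold: float = 0.5) -> dict:
    """Escalate to the slow generative model only when the pre-screen flags it."""
    score = roberta_score(email)
    if score < threshold:
        return {"is_fraud": False, "risk_score": int(score * 100)}
    return llama_analyze(email)
```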

Citation

@misc{cunxin2025llama-email-fraud-awq,
  title={Llama Email Fraud Detector (AWQ 4-bit)},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/llama-email-fraud-detector-awq}
}