CyberPuppy v2.2 — Chinese Cyberbullying Detection (LoRA Adapter + Multi-task Heads)
中文網路霸凌偵測 · Qwen3-8B + LoRA + 4 個分類頭 + 對抗訓練
Status: production-ready research artefact. License: CC BY-NC-SA 4.0. Maintainer: thc1006 · hctsai1006@cs.nctu.edu.tw
What this is
PEFT LoRA adapter for Qwen/Qwen3-8B-Base (Apache 2.0) plus 4 custom
classification heads (heads.pt) for simultaneous prediction of:
| Task | Classes |
|---|---|
| toxicity | none / toxic / severe |
| bullying | none / harassment / threat |
| role | none / perpetrator / victim / bystander |
| emotion | pos / neu / neg |
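For decoding head outputs downstream, the class orders in the table map to indices as follows. This is a sketch inferred from the table above; the actual index order should be verified against the label config shipped in the repo before relying on it.

```python
# Hypothetical index -> label order, taken from the task table above.
# Verify against the repo's label config before production use.
ID2LABEL = {
    "toxicity": ["none", "toxic", "severe"],
    "bullying": ["none", "harassment", "threat"],
    "role": ["none", "perpetrator", "victim", "bystander"],
    "emotion": ["pos", "neu", "neg"],
}
```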
Designed for Traditional and Simplified Chinese moderation use-cases (LINE bots, school safety modules, social-platform pre-moderation).
Performance (2026-04-16, RTX 5090 bf16)
| Eval set | Toxicity F1_w | Bullying F1_w | Notes |
|---|---|---|---|
| COLD test (5,320) | 0.8378 | 0.8365 | Apples-to-apples vs MacBERT baseline 0.8247 |
| Multisource test (10,382) | 0.8085 | 0.8431 | COLD+SCCD+STATE-ToxiCN merged |
| 6 Traditional Chinese threats | 6/6 | — | Includes "我打死你", "希望你去死" |
Adversarial robustness (ToxiCloakCN held-out, 906 pairs):
| Attack | F1_w drop |
|---|---|
| Emoji substitution | −0.37% ✅ |
| Homophone substitution | −8.51% ⚠️ (next version aims for ≤ −5%) |
Latency (bf16 batch=1, RTX 5090): p95 17 ms (short) / 22 ms (medium) / 34 ms (long).
How to load
```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base = "Qwen/Qwen3-8B-Base"
adapter = "thc1006/cyberpuppy-v2.2-adapter"

tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.padding_side = "left"  # so position -1 is always a real token

backbone = AutoModel.from_pretrained(
    base, torch_dtype=torch.bfloat16, attn_implementation="sdpa"
)
backbone = PeftModel.from_pretrained(backbone, adapter)

# Multi-task heads (saved separately as heads.pt)
HEAD_DIMS = {"toxicity": 3, "bullying": 3, "role": 4, "emotion": 3}
heads = nn.ModuleDict({
    name: nn.Linear(backbone.config.hidden_size, dim)
    for name, dim in HEAD_DIMS.items()
})

heads_path = hf_hub_download(repo_id=adapter, filename="heads.pt")
state = torch.load(heads_path, map_location="cpu", weights_only=False)
heads.load_state_dict(state["heads"])
heads.to(dtype=torch.bfloat16)

# Inference: last-layer hidden_states[:, -1] -> heads -> argmax per task
```
For full inference and serving code, see the source repo (`api/v2_2_app.py`, `src/cyberpuppy/models/qwen3_multihead.py`).
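The pooling step mentioned in the final comment can be sketched with toy tensors. This is a minimal illustration of last-token pooling with left padding, assuming the heads defined above and Qwen3-8B's 4096-dim hidden states; it is not the repo's serving code.

```python
import torch
import torch.nn as nn

HEAD_DIMS = {"toxicity": 3, "bullying": 3, "role": 4, "emotion": 3}
HIDDEN = 4096  # Qwen3-8B hidden size

heads = nn.ModuleDict({n: nn.Linear(HIDDEN, d) for n, d in HEAD_DIMS.items()})

# Stand-in for backbone(**enc, output_hidden_states=True).hidden_states[-1]
hidden_states = torch.randn(2, 8, HIDDEN)  # (batch, seq_len, hidden)

# Left padding guarantees position -1 holds a real token, so last-token
# pooling is a plain slice rather than a gather over the attention mask.
pooled = hidden_states[:, -1]  # (batch, hidden)
preds = {task: head(pooled).argmax(dim=-1) for task, head in heads.items()}
```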
Training data
Heterogeneous multi-source data (~60K base + ~11K adversarial ≈ 70.8K samples). All upstream datasets are research artefacts; this model card propagates the most restrictive upstream license (NonCommercial) by way of CC BY-NC-SA.
| Dataset | Upstream license | Used for |
|---|---|---|
| COLD (Deng et al. EMNLP 2022) | Apache 2.0 | Training |
| SCCD (Yang et al. COLING 2025) | "academic research only" | Training |
| STATE-ToxiCN (Bai et al. ACL Findings 2025) | CC BY-NC 4.0 | Training |
| ToxiCloakCN (Xiao et al. EMNLP 2024) | derivative of ToxiCN (CC BY-NC-ND 4.0) | Training (adversarial) + held-out eval |
| CHNCI (Zhu et al. arXiv 2505.20654) | not declared | (eval only, planned for v2.3) |
Method (high level)
- Backbone: Qwen/Qwen3-8B-Base, frozen
- Adapter: LoRA r=32, α=64, targets `q_proj k_proj v_proj o_proj gate_proj up_proj down_proj`
- Heads: 4 × `nn.Linear(4096, K)` on the last-token pooled hidden state
- Loss: uncertainty-weighted multi-task (Kendall et al. 2018) + focal γ=2.5 + adversarial consistency λ=0.1
- Adversarial trick: each training batch holds all 3 variants of a ToxiCloakCN pair (base / homophone / emoji); a `consistency_loss` forces their toxicity logits to agree (a HuggingFace-style `LengthGroupedSampler` extended into a `CloakAwareBatchSampler`)
- Optimizer: AdamW 8-bit (bitsandbytes), cosine LR with peak 3e-5, warmup ratio 0.1
- Hardware: RTX 5090 32 GB, bf16, gradient checkpointing on, 75 min for 3 epochs
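The adversarial consistency term described above can be sketched as follows. The exact penalty used in training is not published in this card; this sketch assumes a symmetric KL divergence between each perturbed variant's toxicity logits and the base variant's.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_base, logits_homo, logits_emoji):
    """Pull the toxicity logits of all three cloaked variants together.

    Assumed form: mean symmetric KL between each perturbed variant
    (homophone, emoji) and the base text. Not the exact training code.
    """
    p = F.log_softmax(logits_base, dim=-1)
    total = logits_base.new_zeros(())
    for variant in (logits_homo, logits_emoji):
        q = F.log_softmax(variant, dim=-1)
        total = total + F.kl_div(q, p, log_target=True, reduction="batchmean")
        total = total + F.kl_div(p, q, log_target=True, reduction="batchmean")
    return total / 2
```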
Full ADR with rationale, alternatives considered, and DoD: `docs/adr/0001-cyberpuppy-2026-upgrade.md`.
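The uncertainty-weighted multi-task loss named in the method list follows Kendall et al. (2018): each task loss is scaled by a learned precision term with a log-variance regularizer. A minimal sketch of that weighting scheme, not the repo's implementation:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Homoscedastic task weighting (Kendall et al. 2018).

    total = sum_t exp(-s_t) * L_t + s_t, where s_t = log(sigma_t^2)
    is learned per task. Sketch only; the training code may differ.
    """

    def __init__(self, tasks):
        super().__init__()
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in tasks}
        )

    def forward(self, task_losses):
        # With s_t initialized to 0, this starts as a plain sum of losses.
        return sum(
            torch.exp(-self.log_vars[t]) * loss + self.log_vars[t]
            for t, loss in task_losses.items()
        )

mtl = UncertaintyWeightedLoss(["toxicity", "bullying", "role", "emotion"])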
Intended use
- ✅ Academic research on Chinese hate-speech / cyberbullying
- ✅ Educational demos
- ✅ Pre-screen module in school-safety dashboards (with human-in-the-loop)
- ✅ Non-commercial harm-reduction tools
Out-of-scope / NOT intended
- ❌ Standalone automatic disciplinary or legal action (always require human review)
- ❌ Commercial / for-profit deployment without a relicensed commercial variant
- ❌ Any setting where false positives could harm users (e.g. unsupervised account suspension)
Limitations
- Homophone-attack robustness still shows an 8.51% F1 drop; for adversarial production use, pair with the upstream Qwen3Guard sentinel (see ADR §3, dual-layer architecture)
- role / emotion labels in the current multisource training data are silver-standard (perpetrator-only, neg-only), derived from heuristics rather than gold annotation
- English-only inputs are out of distribution; this is a Chinese model
- Threat class is very rare (~0.2% in training) — high-confidence "threat" predictions should be hand-reviewed
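Given how rare the threat class is, a simple post-filter can enforce the hand-review guidance above. The function and queue names here are hypothetical, and the threat index assumes the label order from the task table:

```python
import torch

THREAT_IDX = 2  # index of "threat" in the assumed bullying label order

def route(bullying_logits, review_queue, auto_queue):
    """Hypothetical triage: every 'threat' argmax goes to human review;
    everything else proceeds to the automatic moderation pipeline."""
    probs = torch.softmax(bullying_logits, dim=-1)
    for i, pred in enumerate(probs.argmax(dim=-1).tolist()):
        if pred == THREAT_IDX:
            review_queue.append((i, float(probs[i, THREAT_IDX])))
        else:
            auto_queue.append(i)
```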
License
This adapter and its multi-task heads are licensed under CC BY-NC-SA 4.0. The non-commercial restriction is inherited in good faith from the most restrictive upstream training datasets. For commercial use, contact the maintainer about a separately trained variant built on Apache-2.0 sources.
Citation
```bibtex
@misc{cyberpuppy_v2_2_2026,
  author    = {Tsai, H.-C.},
  title     = {CyberPuppy v2.2: Chinese Cyberbullying Detection with Multi-task LoRA on Qwen3-8B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/thc1006/cyberpuppy-v2.2-adapter}
}
```
Please also cite the upstream datasets: COLD (Deng et al. 2022), ToxiCN (Lu et al. 2023), STATE-ToxiCN (Bai et al. 2025), ToxiCloakCN (Xiao et al. 2024), SCCD (Yang et al. 2025), and Qwen3 (Yang et al. 2025).
Takedown / dispute
If you are an author of any upstream dataset and object to this release, open a discussion on this repo or email hctsai1006@cs.nctu.edu.tw — the artefact will be removed within 7 days pending resolution.