CyberPuppy v2.2 — Chinese Cyberbullying Detection (LoRA Adapter + Multi-task Heads)
中文網路霸凌偵測 · Qwen3-8B + LoRA + 4 個分類頭 + 對抗訓練
Status: production-ready research artefact. License: CC BY-NC-SA 4.0. Maintainer: thc1006 · hctsai1006@cs.nctu.edu.tw
What this is
PEFT LoRA adapter for Qwen/Qwen3-8B-Base (Apache 2.0) plus 4 custom
classification heads (heads.pt) for simultaneous prediction of:
| Task | Classes |
|---|---|
| toxicity | none / toxic / severe |
| bullying | none / harassment / threat |
| role | none / perpetrator / victim / bystander |
| emotion | pos / neu / neg |
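For decoding head outputs downstream, the class orders in the table map to indices as follows. This is a sketch inferred from the table above; the actual index order should be verified against the label config shipped in the repo before relying on it.

```python
# Hypothetical index -> label order, taken from the task table above.
# Verify against the repo's label config before production use.
ID2LABEL = {
    "toxicity": ["none", "toxic", "severe"],
    "bullying": ["none", "harassment", "threat"],
    "role": ["none", "perpetrator", "victim", "bystander"],
    "emotion": ["pos", "neu", "neg"],
}
```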
Designed for Traditional and Simplified Chinese moderation use-cases (LINE bots, school safety modules, social-platform pre-moderation).
Performance (2026-04-16, RTX 5090 bf16)
| Eval set | Toxicity F1_w | Bullying F1_w | Notes |
|---|---|---|---|
| COLD test (5,320) | 0.8378 | 0.8365 | Apples-to-apples vs MacBERT baseline 0.8247 |
| Multisource test (10,382) | 0.8085 | 0.8431 | COLD+SCCD+STATE-ToxiCN merged |
| 6 Traditional Chinese threats | 6/6 | — | Includes "我打死你", "希望你去死" |
Adversarial robustness (ToxiCloakCN held-out, 906 pairs):
| Attack | F1_w drop |
|---|---|
| Emoji substitution | −0.37% ✅ |
| Homophone substitution | −8.51% ⚠️ (next version aims for ≤ −5%) |
Latency (bf16 batch=1, RTX 5090): p95 17 ms (short) / 22 ms (medium) / 34 ms (long).
How to load
```python
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

base = "Qwen/Qwen3-8B-Base"
adapter = "thc1006/cyberpuppy-v2.2-adapter"

tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.padding_side = "left"  # so position -1 is always a real token

backbone = AutoModel.from_pretrained(
    base, torch_dtype=torch.bfloat16, attn_implementation="sdpa"
)
backbone = PeftModel.from_pretrained(backbone, adapter)

# Multi-task heads (saved separately as heads.pt)
HEAD_DIMS = {"toxicity": 3, "bullying": 3, "role": 4, "emotion": 3}
heads = nn.ModuleDict({
    name: nn.Linear(backbone.config.hidden_size, dim)
    for name, dim in HEAD_DIMS.items()
})

heads_path = hf_hub_download(repo_id=adapter, filename="heads.pt")
state = torch.load(heads_path, map_location="cpu", weights_only=False)
heads.load_state_dict(state["heads"])
heads.to(dtype=torch.bfloat16)

# Inference: last-layer hidden_states[:, -1] -> heads -> argmax per task
```
For full inference and serving code, see the source repo (`api/v2_2_app.py`, `src/cyberpuppy/models/qwen3_multihead.py`).
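The pooling step mentioned in the final comment can be sketched with toy tensors. This is a minimal illustration of last-token pooling with left padding, assuming the heads defined above and Qwen3-8B's 4096-dim hidden states; it is not the repo's serving code.

```python
import torch
import torch.nn as nn

HEAD_DIMS = {"toxicity": 3, "bullying": 3, "role": 4, "emotion": 3}
HIDDEN = 4096  # Qwen3-8B hidden size

heads = nn.ModuleDict({n: nn.Linear(HIDDEN, d) for n, d in HEAD_DIMS.items()})

# Stand-in for backbone(**enc, output_hidden_states=True).hidden_states[-1]
hidden_states = torch.randn(2, 8, HIDDEN)  # (batch, seq_len, hidden)

# Left padding guarantees position -1 holds a real token, so last-token
# pooling is a plain slice rather than a gather over the attention mask.
pooled = hidden_states[:, -1]  # (batch, hidden)
preds = {task: head(pooled).argmax(dim=-1) for task, head in heads.items()}
```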
Training data
Heterogeneous multi-source data (~60K base + ~11K adversarial ≈ 70.8K samples). All upstream datasets are research artefacts; this model card propagates the most restrictive upstream license (NonCommercial) by way of CC BY-NC-SA.
| Dataset | Upstream license | Used for |
|---|---|---|
| COLD (Deng et al. EMNLP 2022) | Apache 2.0 | Training |
| SCCD (Yang et al. COLING 2025) | "academic research only" | Training |
| STATE-ToxiCN (Bai et al. ACL Findings 2025) | CC BY-NC 4.0 | Training |
| ToxiCloakCN (Xiao et al. EMNLP 2024) | derivative of ToxiCN (CC BY-NC-ND 4.0) | Training (adversarial) + held-out eval |
| CHNCI (Zhu et al. arXiv 2505.20654) | not declared | (eval only, planned for v2.3) |
Method (high level)
- Backbone: Qwen/Qwen3-8B-Base, frozen
- Adapter: LoRA r=32, α=64, targets `q_proj k_proj v_proj o_proj gate_proj up_proj down_proj`
- Heads: 4 × `nn.Linear(4096, K)` on the last-token pooled hidden state
- Loss: uncertainty-weighted multi-task (Kendall et al. 2018) + focal γ=2.5 + adversarial consistency λ=0.1
- Adversarial trick: each training batch holds all 3 variants of a ToxiCloakCN pair (base / homophone / emoji); a `consistency_loss` forces their toxicity logits to agree (a HuggingFace-style `LengthGroupedSampler` extended into a `CloakAwareBatchSampler`)
- Optimizer: AdamW 8-bit (bitsandbytes), cosine LR with peak 3e-5, warmup ratio 0.1
- Hardware: RTX 5090 32 GB, bf16, gradient checkpointing on, 75 min for 3 epochs
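The adversarial consistency term described above can be sketched as follows. The exact penalty used in training is not published in this card; this sketch assumes a symmetric KL divergence between each perturbed variant's toxicity logits and the base variant's.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_base, logits_homo, logits_emoji):
    """Pull the toxicity logits of all three cloaked variants together.

    Assumed form: mean symmetric KL between each perturbed variant
    (homophone, emoji) and the base text. Not the exact training code.
    """
    p = F.log_softmax(logits_base, dim=-1)
    total = logits_base.new_zeros(())
    for variant in (logits_homo, logits_emoji):
        q = F.log_softmax(variant, dim=-1)
        total = total + F.kl_div(q, p, log_target=True, reduction="batchmean")
        total = total + F.kl_div(p, q, log_target=True, reduction="batchmean")
    return total / 2
```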
Full ADR with rationale, alternatives considered, and DoD: `docs/adr/0001-cyberpuppy-2026-upgrade.md`.
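The uncertainty-weighted multi-task loss named in the method list follows Kendall et al. (2018): each task loss is scaled by a learned precision term with a log-variance regularizer. A minimal sketch of that weighting scheme, not the repo's implementation:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Homoscedastic task weighting (Kendall et al. 2018).

    total = sum_t exp(-s_t) * L_t + s_t, where s_t = log(sigma_t^2)
    is learned per task. Sketch only; the training code may differ.
    """

    def __init__(self, tasks):
        super().__init__()
        self.log_vars = nn.ParameterDict(
            {t: nn.Parameter(torch.zeros(())) for t in tasks}
        )

    def forward(self, task_losses):
        # With s_t initialized to 0, this starts as a plain sum of losses.
        return sum(
            torch.exp(-self.log_vars[t]) * loss + self.log_vars[t]
            for t, loss in task_losses.items()
        )

mtl = UncertaintyWeightedLoss(["toxicity", "bullying", "role", "emotion"])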
Intended use
- ✅ Academic research on Chinese hate-speech / cyberbullying
- ✅ Educational demos
- ✅ Pre-screen module in school-safety dashboards (with human-in-the-loop)
- ✅ Non-commercial harm-reduction tools
Out-of-scope / NOT intended
- ❌ Standalone automatic disciplinary or legal action (always require human review)
- ❌ Commercial / for-profit deployment without a relicensed commercial variant
- ❌ Any setting where false positives could harm users (e.g. unsupervised account suspension)
Limitations
- Homophone-attack robustness still shows an 8.51% F1 drop; for adversarial production use, pair with the upstream Qwen3Guard sentinel (see ADR §3, dual-layer architecture)
- role / emotion labels in the current multisource training data are silver-standard (perpetrator-only, neg-only), derived from heuristics rather than gold annotation
- English-only inputs are out of distribution; this is a Chinese model
- Threat class is very rare (~0.2% in training) — high-confidence "threat" predictions should be hand-reviewed
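Given how rare the threat class is, a simple post-filter can enforce the hand-review guidance above. The function and queue names here are hypothetical, and the threat index assumes the label order from the task table:

```python
import torch

THREAT_IDX = 2  # index of "threat" in the assumed bullying label order

def route(bullying_logits, review_queue, auto_queue):
    """Hypothetical triage: every 'threat' argmax goes to human review;
    everything else proceeds to the automatic moderation pipeline."""
    probs = torch.softmax(bullying_logits, dim=-1)
    for i, pred in enumerate(probs.argmax(dim=-1).tolist()):
        if pred == THREAT_IDX:
            review_queue.append((i, float(probs[i, THREAT_IDX])))
        else:
            auto_queue.append(i)
```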
License
This adapter and its multi-task heads are licensed under CC BY-NC-SA 4.0. The non-commercial restriction is inherited in good faith from the most restrictive upstream training datasets. For commercial use, contact the maintainer about a separately trained variant built on Apache-2.0 sources.
Citation
```bibtex
@misc{cyberpuppy_v2_2_2026,
  author    = {Tsai, H.-C.},
  title     = {CyberPuppy v2.2: Chinese Cyberbullying Detection with Multi-task LoRA on Qwen3-8B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/thc1006/cyberpuppy-v2.2-adapter}
}
```
Please also cite the upstream datasets: COLD (Deng et al. 2022), ToxiCN (Lu et al. 2023), STATE-ToxiCN (Bai et al. 2025), ToxiCloakCN (Xiao et al. 2024), SCCD (Yang et al. 2025), and Qwen3 (Yang et al. 2025).
Takedown / dispute
If you are an author of any upstream dataset and object to this release, open a discussion on this repo or email hctsai1006@cs.nctu.edu.tw — the artefact will be removed within 7 days pending resolution.