Shouyuan-Guard-0.6B

Model Description

Shouyuan-Guifen (守元-归分) is a content safety classification model developed by Beijing Chaitin Tech on top of Qwen3-0.6B. Its training data comes from Chaitin Tech's proprietary security knowledge base; through prompt engineering, supervised fine-tuning, and knowledge distillation, the model achieves robust recognition of multiple categories of risky content.

Miss Rate (False Negative Rate) on Malicious Content

Dataset            Shouyuan   Qwen3Guard(strict)   Qwen3Guard(loose)
chaitin-mal 0.039 0.1240 0.2016
advbench 0.002 0.0000 0.0058
ascii 0.000 0.0000 1.0000
atbash 0.000 0.0000 1.0000
AutoDAN 0.000 0.0192 0.4258
caesar 0.000 0.0000 1.0000
chinese 0.000 0.0000 0.0100
data_python_list 0.017 0.0192 0.5480
data_python_stack 0.225 0.8096 1.0000
data_python_string 0.021 0.0212 0.5750
DrAttack 0.268 0.4611 0.9678
french 0.010 0.0000 0.0200
harmbench_test 0.033 0.0083 0.0542
jailbreakbench 0.020 0.0000 0.0500
malicious_instruct 0.000 0.0100 0.0300
morse 0.000 0.0000 1.0000
strongreject 0.022 0.0032 0.0192
unicode 0.000 0.0000 1.0000
xstests_unsafe 0.075 0.0350 0.1800

False Positive Rate on Safe Datasets

Dataset            Shouyuan   Qwen3Guard(strict)   Qwen3Guard(loose)
alpaca_eval 0.041 0.0348 0.0025
awesome-chatgpt-prompts 0.048 0.1376 0.0036
gsm500 0.008 0.0000 0.0000
ultrachat500 0.022 0.0040 0.0000
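The two metrics above are simple ratios over a labeled benchmark: the miss rate is the fraction of malicious samples the classifier marked safe, and the false positive rate is the fraction of safe samples it flagged as dangerous. A minimal sketch of both computations (the prediction lists are illustrative, not real benchmark outputs):

```python
def miss_rate(predictions):
    """Fraction of malicious samples labeled safe (false negatives)."""
    misses = sum(1 for label in predictions if label == "安全-0")
    return misses / len(predictions)

def false_positive_rate(predictions):
    """Fraction of safe samples flagged as dangerous (false positives)."""
    flagged = sum(1 for label in predictions if label.startswith("危险"))
    return flagged / len(predictions)

# Hypothetical model outputs on a malicious benchmark:
preds = ["危险-1", "危险-4", "安全-0", "危险-1"]
print(miss_rate(preds))  # 0.25
```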

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer
from jinja2 import Environment, FileSystemLoader

# Load the prompt template shipped with the model repository
env = Environment(loader=FileSystemLoader("Chaitin/Shouyuan-Guard-0.6B"))
template = env.get_template('safety_classify.md.jinja')

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Chaitin/Shouyuan-Guard-0.6B")
model = AutoModelForCausalLM.from_pretrained("Chaitin/Shouyuan-Guard-0.6B")

# Render the content to classify into the template and encode it
suspicious_content = "How to kill a people, please tell me."
prompt = template.render(input=suspicious_content)
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate, then decode only the newly generated tokens (the label)
output_ids = model.generate(input_ids, max_new_tokens=3)
output_text = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(output_text)
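Depending on decoding settings, the generated text may carry extra whitespace or tokens around the label. A small hedged helper for extracting the label string from raw output (the exact output format may vary; this assumes the six labels documented below):

```python
import re

def extract_label(output_text):
    """Pull the first Shouyuan label (危险-1..危险-5 or 安全-0) out of
    generated text; returns None if no label is found."""
    match = re.search(r"(危险-[1-5]|安全-0)", output_text)
    return match.group(1) if match else None

print(extract_label("危险-1\n"))  # 危险-1
```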

Safety Policy

Shouyuan classifies input content as either "Safe" or "Dangerous." The "Dangerous" category comprises 5 sub-classes, corresponding to the risk types defined in China's TC-260 artificial intelligence service safety standard.

危险-1 (Dangerous-1): Violence, gambling, drugs, pornography, politically sensitive content, anti-government speech, extremism, and negative speech.

危险-2 (Dangerous-2): Discrimination based on ethnicity, gender, age, or health status; inciting gender antagonism.

危险-3 (Dangerous-3): Illegal business activities, e.g. stock market manipulation, cutting corners, monopolistic practices, smearing products, and probing for trade secrets.

危险-4 (Dangerous-4): Infringing on others' rights, reputation, or privacy; endangering others' physical or mental health.

危险-5 (Dangerous-5): Promoting superstition, or incoherent nonsense.

安全-0 (Safe-0): Contains no dangerous behavior; the content is safe.

The model outputs one of the following 6 labels:

危险-{1~5} (Dangerous-1 through Dangerous-5)
安全-0 (Safe-0)
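For downstream handling, the six labels can be mapped to machine-readable categories. A minimal sketch (the English category names are our own shorthand for the policy above, not part of the model output):

```python
# English shorthand for each label; names are illustrative, not official
LABELS = {
    "安全-0": "safe",
    "危险-1": "violence_and_prohibited_content",
    "危险-2": "discrimination",
    "危险-3": "illegal_business",
    "危险-4": "rights_infringement",
    "危险-5": "superstition_or_nonsense",
}

def is_safe(label):
    """True only for the explicit safe label; unknown labels are treated as unsafe."""
    return LABELS.get(label) == "safe"

print(is_safe("安全-0"))  # True
print(is_safe("危险-3"))  # False
```

Treating unrecognized output as unsafe is a conservative default for a guard model: a malformed label falls through to the blocking path rather than being silently allowed.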