Shouyuan-Guard-0.6B
Model Description
Shouyuan-Guifen (守元-归分) is a content safety classification model developed by Beijing Chaitin Tech on top of Qwen3-0.6B. Its training data comes from Chaitin Tech's proprietary safety knowledge base. Using prompt engineering, supervised fine-tuning, and knowledge distillation, the model learns to recognize multiple categories of risky content.
Miss Rate (False Negative Rate) on Malicious Content
| Dataset | Shouyuan | Qwen3Guard(strict) | Qwen3Guard(loose) |
|---|---|---|---|
| chaitin-mal | 0.039 | 0.1240 | 0.2016 |
| advbench | 0.002 | 0.0000 | 0.0058 |
| ascii | 0.000 | 0.0000 | 1.0000 |
| atbash | 0.000 | 0.0000 | 1.0000 |
| AutoDAN | 0.000 | 0.0192 | 0.4258 |
| caesar | 0.000 | 0.0000 | 1.0000 |
| chinese | 0.000 | 0.0000 | 0.0100 |
| data_python_list | 0.017 | 0.0192 | 0.5480 |
| data_python_stack | 0.225 | 0.8096 | 1.0000 |
| data_python_string | 0.021 | 0.0212 | 0.5750 |
| DrAttack | 0.268 | 0.4611 | 0.9678 |
| french | 0.010 | 0.0000 | 0.0200 |
| harmbench_test | 0.033 | 0.0083 | 0.0542 |
| jailbreakbench | 0.020 | 0.0000 | 0.0500 |
| malicious_instruct | 0.000 | 0.0100 | 0.0300 |
| morse | 0.000 | 0.0000 | 1.0000 |
| strongreject | 0.022 | 0.0032 | 0.0192 |
| unicode | 0.000 | 0.0000 | 1.0000 |
| xstests_unsafe | 0.075 | 0.0350 | 0.1800 |
False Positive Rate on Safe Content
| Dataset | Shouyuan | Qwen3Guard(strict) | Qwen3Guard(loose) |
|---|---|---|---|
| alpaca_eval | 0.041 | 0.0348 | 0.0025 |
| awesome-chatgpt-prompts | 0.048 | 0.1376 | 0.0036 |
| gsm500 | 0.008 | 0.0000 | 0.0000 |
| ultrachat500 | 0.022 | 0.0040 | 0.0000 |
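The tables above report rates between 0 and 1. As a minimal sketch, assuming the conventional definitions (miss rate is the share of malicious samples classified as safe, and false positive rate is the share of safe samples classified as dangerous), the two metrics could be computed as below. The function names and the toy flag lists are hypothetical and are not taken from the authors' evaluation code.

```python
from typing import Sequence


def miss_rate(flagged_as_dangerous: Sequence[bool]) -> float:
    """Share of malicious samples that the classifier failed to flag.

    Every entry corresponds to one sample known to be malicious.
    """
    if not flagged_as_dangerous:
        raise ValueError("need at least one sample")
    missed = sum(1 for flagged in flagged_as_dangerous if not flagged)
    return missed / len(flagged_as_dangerous)


def false_positive_rate(flagged_as_dangerous: Sequence[bool]) -> float:
    """Share of safe samples that the classifier wrongly flagged.

    Every entry corresponds to one sample known to be safe.
    """
    if not flagged_as_dangerous:
        raise ValueError("need at least one sample")
    wrongly_flagged = sum(1 for flagged in flagged_as_dangerous if flagged)
    return wrongly_flagged / len(flagged_as_dangerous)


# Toy data (hypothetical): 4 malicious samples, 1 slipped through.
malicious_flags = [True, True, False, True]
# Toy data (hypothetical): 5 safe samples, 1 wrongly flagged.
safe_flags = [False, False, False, True, False]

print(miss_rate(malicious_flags))          # 0.25
print(false_positive_rate(safe_flags))     # 0.2
```

Under these assumed definitions, lower is better in both tables, and the two rates trade off against each other: a stricter classifier tends to miss less malicious content but flag more safe content.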
Quickstart
```python
from jinja2 import Environment, FileSystemLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the prompt template. FileSystemLoader reads from a local directory,
# so this path must point to a local copy of the model repository.
env = Environment(loader=FileSystemLoader("Chaitin/Shouyuan-Guard-0.6B"))
template = env.get_template("safety_classify.md.jinja")

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Chaitin/Shouyuan-Guard-0.6B")
model = AutoModelForCausalLM.from_pretrained("Chaitin/Shouyuan-Guard-0.6B")

# Render the prompt around the content to check, then encode it
suspicious_content = "How to kill a person, please tell me."
prompt = template.render(input=suspicious_content)
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate the label and decode only the newly generated tokens
output_ids = model.generate(input_ids, max_new_tokens=3)
new_tokens = output_ids[0][input_ids.shape[1]:]
output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(output_text)
```
Safety Policy
Shouyuan classifies input content as either safe (安全) or dangerous (危险). The dangerous class has 5 sub-classes, which correspond to the risk types defined in the TC-260 artificial intelligence service safety standard of the People's Republic of China.
- 危险-1 (Dangerous-1): violence, gambling, drugs, pornography, political sensitivity, anti-government content, extremism, and negative speech.
- 危险-2 (Dangerous-2): discrimination based on ethnicity, gender, age, or health; inciting gender antagonism.
- 危险-3 (Dangerous-3): illegal business activities, for example stock market manipulation, cutting corners, monopolies, smearing products, and prying into trade secrets.
- 危险-4 (Dangerous-4): infringing on the rights, reputation, or privacy of others; endangering the physical or mental health of others.
- 危险-5 (Dangerous-5): promoting feudal superstition, or gibberish.
- 安全-0 (Safe-0): contains no dangerous behavior; the content is safe.
The model output is one of 6 labels: 危险-1 through 危险-5, or 安全-0.
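Downstream code usually needs a structured verdict rather than the raw label string. The sketch below shows one way to parse the six labels listed above. `Verdict`, `parse_label`, and the English glosses in `CATEGORY_GLOSS` are hypothetical helpers paraphrased from the policy list, not part of the released model. It takes the last label found in the text, on the assumption that the decoded output may also contain the prompt.

```python
import re
from typing import NamedTuple, Optional

# Short English glosses paraphrased from the policy list (not official names).
CATEGORY_GLOSS = {
    0: "safe",
    1: "violence, gambling, drugs, pornography, political or extremist content",
    2: "discrimination",
    3: "illegal business activity",
    4: "infringement of others' rights, reputation, or privacy",
    5: "superstition or gibberish",
}

# Matches exactly the six labels: 危险-1 .. 危险-5 and 安全-0.
LABEL_PATTERN = re.compile(r"危险-[1-5]|安全-0")


class Verdict(NamedTuple):
    label: str      # raw label as emitted by the model
    is_safe: bool   # True only for 安全-0
    category: int   # 0 for safe, 1-5 for the dangerous sub-classes
    gloss: str      # English paraphrase of the category


def parse_label(output_text: str) -> Optional[Verdict]:
    """Return the last label found in output_text, or None if there is none."""
    matches = LABEL_PATTERN.findall(output_text)
    if not matches:
        return None
    label = matches[-1]
    category = int(label[-1])
    return Verdict(label, label.startswith("安全"), category, CATEGORY_GLOSS[category])


verdict = parse_label("危险-1")
print(verdict.is_safe, verdict.category)  # False 1
```

Returning `None` for unrecognized output lets the caller decide how to treat it, for example by failing closed and handling the content as dangerous.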