PUPPET
This model is a fine-tuned version of Qwen/Qwen3-8B trained with PUPPET (GitHub), a framework that jointly optimizes LLM output detectability and task performance via DPO. It is a research artifact released to accompany the paper "LLM Output Detectability and Task Performance Can be Jointly Optimized" (Saito et al., arXiv, 2026).
This model is based on Qwen3-8B [Apache 2.0] by Qwen Team / Alibaba Cloud. This fine-tuned model is released under the Apache 2.0 License.
The training data (Hello-SimpleAI/HC3) is licensed under CC BY-SA 4.0. See: https://huggingface.co/datasets/Hello-SimpleAI/HC3
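Since this is a standard causal-LM checkpoint, it can be loaded with Hugging Face transformers. A minimal inference sketch follows; the repository id below is a placeholder (the exact Hub id of this fine-tuned model is not stated here), so substitute the actual checkpoint name.

```python
# Minimal inference sketch for a Qwen3-8B-based chat checkpoint.
# NOTE: "Qwen/Qwen3-8B" is a placeholder; replace it with the id of
# the PUPPET fine-tuned model once published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Briefly explain photosynthesis."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Loading the 8B model requires roughly 16 GB of GPU memory in bf16; quantized loading (e.g. via `load_in_4bit` with bitsandbytes) is an option on smaller hardware.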
If you find our code or work helpful, please cite:
@misc{Saito:PUPPET:2026,
  title = {{LLM} Output Detectability and Task Performance Can be Jointly Optimized},
  author = {Koshiro Saito and Ryuto Koike and Masahiro Kaneko and Naoaki Okazaki},
  year = {2026},
  eprint = {2605.01350},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2605.01350},
}

@misc{qwen3,
  title = {Qwen3 Technical Report},
  author = {{Qwen Team}},
  year = {2025},
  eprint = {2505.09388},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2505.09388},
}

@misc{hc3,
  title = {How Close is {C}hat{GPT} to Human Experts? Comparison Corpus, Evaluation, and Detection},
  author = {Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  year = {2023},
  eprint = {2301.07597},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2301.07597},
}

@misc{openai_detector,
  title = {Release strategies and the social impacts of language models},
  author = {Irene Solaiman and Miles Brundage and Jack Clark and Amanda Askell and Ariel Herbert-Voss and Jeff Wu and Alec Radford and Gretchen Krueger and Jong Wook Kim and Sarah Kreps and others},
  year = {2019},
  eprint = {1908.09203},
  primaryClass = {cs.CL},
  howpublished = {arXiv:1908.09203},
}