This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained with PUPPET (GitHub), a framework that jointly optimizes LLM output detectability and task performance via Direct Preference Optimization (DPO).
This model is a research artifact released to accompany the paper, "LLM Output Detectability and Task Performance Can be Jointly Optimized" (Saito et al., arXiv, 2026).
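For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO loss, not the PUPPET training code. The function names, the scalar log-probability inputs, and the value `beta=0.1` are assumptions made for illustration. How PUPPET builds its preference pairs from detectability and task performance is described in the paper and is not reproduced here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a response
    under the trainable policy or the frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy prefers the chosen response more
    # strongly than the reference model does.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy equals the reference model, both margins are zero and the loss is log 2. It drops below that once the policy shifts probability toward the chosen response.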
This model is based on Meta Llama 3 and is distributed under the Llama 3 Community License.
The training data (Hello-SimpleAI/HC3) is licensed under CC BY-SA 4.0. See: https://huggingface.co/datasets/Hello-SimpleAI/HC3
If you find our code or work helpful, please cite:
@misc{Saito:PUPPET:2026,
author = {Koshiro Saito and Ryuto Koike and Masahiro Kaneko and Naoaki Okazaki},
title = {{LLM} Output Detectability and Task Performance Can be Jointly Optimized},
eprint = {2605.01350},
howpublished = {arXiv:2605.01350},
primaryClass = {cs.CL},
year = {2026},
}
@misc{llama3,
title = {The {L}lama 3 Herd of Models},
author = {Aaron Grattafiori and Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Alex Vaughan and others},
year = {2024},
eprint = {2407.21783},
primaryClass = {cs.CL},
howpublished = {arXiv:2407.21783},
}
@misc{hc3,
title = {How Close is {C}hat{GPT} to Human Experts? Comparison Corpus, Evaluation, and Detection},
author = {Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
primaryClass = {cs.CL},
eprint = {2301.07597},
howpublished = {arXiv:2301.07597},
year = {2023},
}
@misc{openai_detector,
title={Release strategies and the social impacts of language models},
author={Irene Solaiman and Miles Brundage and Jack Clark and Amanda Askell and Ariel Herbert-Voss and Jeff Wu and Alec Radford and Gretchen Krueger and Jong Wook Kim and Sarah Kreps and others},
primaryClass={cs.CL},
howpublished={arXiv:1908.09203},
year={2019},
eprint={1908.09203},
}