PUPPET
This model is a fine-tuned version of Qwen/Qwen3-8B trained with PUPPET (GitHub), a framework that jointly optimizes LLM output detectability and task performance via DPO. It is a research artifact released to accompany the paper "LLM Output Detectability and Task Performance Can be Jointly Optimized" (Saito et al., arXiv, 2026).
This model is based on Qwen3-8B [Apache 2.0] by Qwen Team / Alibaba Cloud. This fine-tuned model is released under the Apache 2.0 License.
The training data (Hello-SimpleAI/HC3) is licensed under CC BY-SA 4.0. See: https://huggingface.co/datasets/Hello-SimpleAI/HC3
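Since this is a standard causal-LM checkpoint, it can be loaded with Hugging Face transformers. A minimal inference sketch follows; the repository id below is a placeholder (the exact Hub id of this fine-tuned model is not stated here), so substitute the actual checkpoint name.

```python
# Minimal inference sketch for a Qwen3-8B-based chat checkpoint.
# NOTE: "Qwen/Qwen3-8B" is a placeholder; replace it with the id of
# the PUPPET fine-tuned model once published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt using the tokenizer's chat template.
messages = [{"role": "user", "content": "Briefly explain photosynthesis."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Loading the 8B model requires roughly 16 GB of GPU memory in bf16; quantized loading (e.g. via `load_in_4bit` with bitsandbytes) is an option on smaller hardware.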
If you find our code or work helpful, please cite:
@misc{Saito:PUPPET:2026,
  title = {{LLM} Output Detectability and Task Performance Can be Jointly Optimized},
  author = {Koshiro Saito and Ryuto Koike and Masahiro Kaneko and Naoaki Okazaki},
  year = {2026},
  eprint = {2605.01350},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2605.01350},
}

@misc{qwen3,
  title = {Qwen3 Technical Report},
  author = {{Qwen Team}},
  year = {2025},
  eprint = {2505.09388},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2505.09388},
}

@misc{hc3,
  title = {How Close is {C}hat{GPT} to Human Experts? Comparison Corpus, Evaluation, and Detection},
  author = {Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  year = {2023},
  eprint = {2301.07597},
  primaryClass = {cs.CL},
  howpublished = {arXiv:2301.07597},
}

@misc{openai_detector,
  title = {Release strategies and the social impacts of language models},
  author = {Irene Solaiman and Miles Brundage and Jack Clark and Amanda Askell and Ariel Herbert-Voss and Jeff Wu and Alec Radford and Gretchen Krueger and Jong Wook Kim and Sarah Kreps and others},
  year = {2019},
  eprint = {1908.09203},
  primaryClass = {cs.CL},
  howpublished = {arXiv:1908.09203},
}