PUPPET


Llama-3-PUPPET-8B-Instruct

Overview

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct trained using PUPPET (GitHub) — a framework that jointly optimizes LLM output detectability and task performance via DPO.

This model is a research artifact released to accompany the paper "LLM Output Detectability and Task Performance Can be Jointly Optimized" (Saito et al., arXiv, 2026).
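For reference, below is a minimal inference sketch using the standard transformers chat-template API. The model id and BF16 dtype come from this card; the prompt and generation settings are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aru-pakapaka/Llama-3-PUPPET-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama 3 Instruct models expect the chat template to be applied before generation.
messages = [{"role": "user", "content": "Explain like I'm five: why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))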

Training Details

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct [Llama 3 Community License]
  • Training Data: Hello-SimpleAI/HC3 / reddit_eli5 split [CC BY-SA 4.0]
  • Training Method: DPO (Direct Preference Optimization)
  • Preference Labeling: For each prompt, the sampled responses are ranked by the sum of the detectability and task-performance scores; the best sample is labeled "chosen" and the worst "rejected" (a sketch of this step follows the list)
  • Training Environment: 2× NVIDIA RTX A6000 (48 GB VRAM each), TRL 0.24.0, Transformers 4.57.1
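Below is a minimal sketch of this training stage, assuming TRL's DPOTrainer / DPOConfig API (as shipped in TRL 0.24.0). The build_pair helper, the toy scores, and all hyperparameters are illustrative placeholders, not the settings reported in the paper.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def build_pair(prompt, samples, detect_scores, task_scores):
    # Hypothetical helper: rank candidate generations by the sum of a
    # detectability score and a task-performance score, then label the
    # best one "chosen" and the worst one "rejected".
    totals = [d + t for d, t in zip(detect_scores, task_scores)]
    best = max(range(len(samples)), key=totals.__getitem__)
    worst = min(range(len(samples)), key=totals.__getitem__)
    return {"prompt": prompt, "chosen": samples[best], "rejected": samples[worst]}

# Toy example; in practice one pair is built per HC3 reddit_eli5 prompt.
pairs = [
    build_pair(
        "Why is the sky blue?",
        ["sampled response A ...", "sampled response B ..."],
        detect_scores=[0.9, 0.4],
        task_scores=[0.8, 0.7],
    )
]
train_dataset = Dataset.from_list(pairs)

args = DPOConfig(
    output_dir="llama3-puppet-8b",
    beta=0.1,                         # illustrative DPO temperature
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                      # the frozen reference model is created internally
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()

In the actual framework, the detectability and task-performance scores come from PUPPET's own scoring pipeline; see the linked GitHub repository for the full implementation.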

Warnings & Disclaimer

  • This model is released for research purposes only.
  • No guarantee is made regarding the accuracy or appropriateness of model outputs.
  • Usage must comply with the Llama 3 Community License and applicable laws.

License

This model is based on Meta Llama 3 and is distributed under the Llama 3 Community License.

The training data (Hello-SimpleAI/HC3) is licensed under CC BY-SA 4.0. See: https://huggingface.co/datasets/Hello-SimpleAI/HC3

How to Cite

If you find our code or work helpful, please cite:

@misc{Saito:PUPPET:2026,
  author        = {Koshiro Saito and Ryuto Koike and Masahiro Kaneko and Naoaki Okazaki},
  title         = {{LLM} Output Detectability and Task Performance Can be Jointly Optimized},
  eprint        = {2605.01350},
  howpublished  = {arXiv:2605.01350},
  primaryClass  = {cs.CL},
  year          = {2026},
}

References

Citations for the base model, training data, and detector that this work builds on:

@misc{llama3,
  author        = {Aaron Grattafiori and Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Alex Vaughan and others},
  title         = {The {L}lama 3 Herd of Models},
  eprint        = {2407.21783},
  howpublished  = {arXiv:2407.21783},
  primaryClass  = {cs.CL},
  year          = {2024},
}

@misc{hc3,
  author        = {Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  title         = {How Close is {C}hat{GPT} to Human Experts? Comparison Corpus, Evaluation, and Detection},
  eprint        = {2301.07597},
  howpublished  = {arXiv:2301.07597},
  primaryClass  = {cs.CL},
  year          = {2023},
}

@misc{openai_detector,
  author        = {Irene Solaiman and Miles Brundage and Jack Clark and Amanda Askell and Ariel Herbert-Voss and Jeff Wu and Alec Radford and Gretchen Krueger and Jong Wook Kim and Sarah Kreps and others},
  title         = {Release strategies and the social impacts of language models},
  eprint        = {1908.09203},
  howpublished  = {arXiv:1908.09203},
  primaryClass  = {cs.CL},
  year          = {2019},
}