PUPPET


Llama-3-PUPPET-8B-Instruct

Overview

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct trained using PUPPET (GitHub) — a framework that jointly optimizes LLM output detectability and task performance via DPO.

This model is a research artifact released to accompany the paper "LLM Output Detectability and Task Performance Can be Jointly Optimized" (Saito et al., arXiv, 2026).
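For reference, below is a minimal inference sketch using the standard transformers chat-template API. The model id and BF16 dtype come from this card; the prompt and generation settings are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aru-pakapaka/Llama-3-PUPPET-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama 3 Instruct models expect the chat template to be applied before generation.
messages = [{"role": "user", "content": "Explain like I'm five: why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))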

Training Details

  • Base Model: meta-llama/Meta-Llama-3-8B-Instruct [Llama 3 Community License]
  • Training Data: Hello-SimpleAI/HC3 / reddit_eli5 split [CC BY-SA 4.0]
  • Training Method: DPO (Direct Preference Optimization)
  • Preference Labeling: For each prompt, the sampled responses are ranked by the sum of the detectability and task-performance scores; the best sample is labeled "chosen" and the worst "rejected" (a sketch of this step follows the list)
  • Training Environment: 2× NVIDIA RTX A6000 (48 GB VRAM each), TRL 0.24.0, Transformers 4.57.1
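Below is a minimal sketch of this training stage, assuming TRL's DPOTrainer / DPOConfig API (as shipped in TRL 0.24.0). The build_pair helper, the toy scores, and all hyperparameters are illustrative placeholders, not the settings reported in the paper.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def build_pair(prompt, samples, detect_scores, task_scores):
    # Hypothetical helper: rank candidate generations by the sum of a
    # detectability score and a task-performance score, then label the
    # best one "chosen" and the worst one "rejected".
    totals = [d + t for d, t in zip(detect_scores, task_scores)]
    best = max(range(len(samples)), key=totals.__getitem__)
    worst = min(range(len(samples)), key=totals.__getitem__)
    return {"prompt": prompt, "chosen": samples[best], "rejected": samples[worst]}

# Toy example; in practice one pair is built per HC3 reddit_eli5 prompt.
pairs = [
    build_pair(
        "Why is the sky blue?",
        ["sampled response A ...", "sampled response B ..."],
        detect_scores=[0.9, 0.4],
        task_scores=[0.8, 0.7],
    )
]
train_dataset = Dataset.from_list(pairs)

args = DPOConfig(
    output_dir="llama3-puppet-8b",
    beta=0.1,                         # illustrative DPO temperature
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                      # the frozen reference model is created internally
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()

In the actual framework, the detectability and task-performance scores come from PUPPET's own scoring pipeline; see the linked GitHub repository for the full implementation.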

Warnings & Disclaimer

  • This model is released for research purposes only.
  • No guarantee is made regarding the accuracy or appropriateness of model outputs.
  • Usage must comply with the Llama 3 Community License and applicable laws.

License

This model is based on Meta Llama 3 and is distributed under the Llama 3 Community License.

The training data (Hello-SimpleAI/HC3) is licensed under CC BY-SA 4.0. See: https://huggingface.co/datasets/Hello-SimpleAI/HC3

How to Cite

If you find our code or work helpful, please cite:

@misc{Saito:PUPPET:2026,
  author        = {Koshiro Saito and Ryuto Koike and Masahiro Kaneko and Naoaki Okazaki},
  title         = {{LLM} Output Detectability and Task Performance Can be Jointly Optimized},
  eprint        = {2605.01350},
  howpublished  = {arXiv:2605.01350},
  primaryClass  = {cs.CL},
  year          = {2026},
}

References

Citations for the base model, training data, and detector that this work builds on:

@misc{llama3,
  author        = {Aaron Grattafiori and Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Alex Vaughan and others},
  title         = {The {L}lama 3 Herd of Models},
  eprint        = {2407.21783},
  howpublished  = {arXiv:2407.21783},
  primaryClass  = {cs.CL},
  year          = {2024},
}

@misc{hc3,
  author        = {Biyang Guo and Xin Zhang and Ziyuan Wang and Minqi Jiang and Jinran Nie and Yuxuan Ding and Jianwei Yue and Yupeng Wu},
  title         = {How Close is {C}hat{GPT} to Human Experts? Comparison Corpus, Evaluation, and Detection},
  eprint        = {2301.07597},
  howpublished  = {arXiv:2301.07597},
  primaryClass  = {cs.CL},
  year          = {2023},
}

@misc{openai_detector,
  author        = {Irene Solaiman and Miles Brundage and Jack Clark and Amanda Askell and Ariel Herbert-Voss and Jeff Wu and Alec Radford and Gretchen Krueger and Jong Wook Kim and Sarah Kreps and others},
  title         = {Release strategies and the social impacts of language models},
  eprint        = {1908.09203},
  howpublished  = {arXiv:1908.09203},
  primaryClass  = {cs.CL},
  year          = {2019},
}