metadata
license: apache-2.0
base_model:
  - Qwen/Qwen3-VL-8B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
  - human-simulation
  - role-play
  - social-intelligence

Ditto-8B

Ditto-8B is an 8B open-weight model for human behavior simulation, covering theory of mind, character role-play, social skills, learner simulation, user simulation, and persona simulation.

Method

Ditto-8B is trained with DITTO, a reinforcement learning method that uses verbal feedback as the learning signal. After each output, the model receives descriptive feedback and produces an improved version; the initial output and its revision are then jointly optimized with GRPO (Group Relative Policy Optimization). This distills the verbal guidance into the policy, so no feedback is needed at inference time.
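
To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage that GRPO-style methods compute: each sampled output's reward is normalized against the mean and standard deviation of its group. The reward values, the function name `group_relative_advantages`, and the choice to pool initial and revised outputs into one group are assumptions for illustration only; the card does not specify how DITTO forms groups or scores outputs.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt: two initial attempts, followed by
# two revisions written after verbal feedback.
initial_rewards = [0.2, 0.4]
revised_rewards = [0.6, 0.8]

# Assumption: initial and revised outputs share one group, so revisions
# that improve on the first attempts receive positive advantages.
advantages = group_relative_advantages(initial_rewards + revised_rewards)
print(advantages)
```

Under this reading, outputs that score above the group average are reinforced and those below are penalized, which is one way a policy could absorb the feedback-driven improvements without seeing feedback at inference time.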

Results

Primary metric for each benchmark (higher is better).

| Dim | Benchmark | GPT 5.5 | Gemini 3.1 Pro | Claude Opus 4.7 | Qwen 3.6 Plus | Others* | Qwen3 8B Inst | Ditto-8B |
|---|---|---|---|---|---|---|---|---|
| CONV | UserLLM | 65.3 | 67.7 | 57.6 | 72.1 | 44.6 | 46.0 | 91.5 |
| CONV | MirrorBench | 56.7 | 48.3 | 63.7 | 48.0 | 45.4 | 54.0 | 73.4 |
| CONV | Humanual-Chat | 28.2 | 21.0 | 22.6 | 22.2 | 25.8 | 24.7 | 21.0 |
| CONV | SimArena-Doc | 83.4 | 83.0 | 83.5 | 82.4 | 83.5 | 83.6 | 84.4 |
| SS | Sotopia-Hard | 31.9 | 27.8 | 32.4 | 28.3 | 31.7 | 27.7 | 45.8 |
| COG | Fantom | 93.0 | 93.0 | 80.0 | 89.0 | 70.0 | 23.0 | 92.0 |
| COG | Hitom | 82.0 | 86.0 | 93.0 | 73.0 | 56.0 | 62.0 | 79.0 |
| COG | Paratomi | 99.0 | 97.0 | 90.0 | 94.0 | 75.0 | 67.0 | 95.0 |
| COG | Social-R1 | 69.0 | 79.0 | 67.0 | 67.0 | 47.0 | 54.0 | 50.0 |
| ROLE | Coser | 66.2 | 62.1 | 66.5 | 55.9 | 30.3 | 43.5 | 64.4 |
| ROLE | Lifechoices | 91.0 | 84.0 | 92.0 | 79.0 | 67.0 | 70.0 | 70.0 |
| ROLE | Twinvoice | 74.0 | 86.0 | 83.0 | 71.0 | 40.0 | 42.0 | 71.0 |
| ROLE | BehaviorChain | 95.0 | 92.0 | 96.0 | 85.0 | 36.0 | 41.0 | 44.0 |
| ROLE | SimArena-Math | 68.5 | 71.5 | 68.7 | 70.9 | 70.5 | 68.9 | 69.6 |
| ROLE | Mistakes | 72.0 | 73.0 | 74.0 | 67.0 | 56.0 | 27.0 | 36.0 |
| ROLE | Humanual-Email | 50.1 | 46.9 | 50.4 | 47.9 | 42.8 | 43.7 | 40.8 |
| ROLE | Humanual-News | 40.2 | 42.3 | 41.3 | 41.8 | 33.1 | 32.5 | 27.5 |
| ROLE | Humanual-Politics | 42.0 | 32.5 | 43.5 | 31.6 | 34.2 | 33.2 | 29.7 |
| EVAL | AlignX | 71.2 | 73.4 | 71.6 | 69.8 | 66.8 | 68.6 | 67.4 |
| EVAL | Humanllm | 45.7 | 46.9 | 44.2 | 42.7 | 35.2 | 34.1 | 33.1 |
| EVAL | Socsci210 | 77.2 | 78.0 | 77.2 | 74.5 | 75.2 | 73.6 | 72.5 |
| EVAL | Humanual-Book | 57.6 | 62.4 | 61.4 | 58.4 | 50.2 | 53.6 | 53.4 |
| EVAL | Humanual-Opinion | 39.8 | 36.0 | 46.2 | 34.2 | 37.4 | 37.2 | 30.3 |

* Others: best result among other specialized human-simulation models (HumanLM-8B, Sotopia-RL-7B, UserLM-8B, Coser-8B).

Note. The released Ditto-8B is a single generalist distilled from a set of task-specific DITTO experts via rejection sampling on the training set.
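
As a rough illustration of distillation via rejection sampling, the sketch below samples several responses per training prompt from a task-specific expert and keeps only those whose score clears a threshold. Everything here is hypothetical: the helper names, the `experts` mapping, the scoring function, and the threshold are stand-ins, and the card does not describe the actual acceptance criterion or sampling budget.

```python
import itertools


def rejection_sample(candidates, score_fn, threshold):
    """Keep only the candidates whose score meets the threshold."""
    return [c for c in candidates if score_fn(c) >= threshold]


def build_distillation_set(prompts, experts, score_fn, threshold, samples_per_prompt=4):
    """Collect accepted (prompt, response) pairs from task-specific experts."""
    dataset = []
    for task, prompt in prompts:
        expert = experts[task]  # hypothetical: one sampler callable per task
        candidates = [expert(prompt) for _ in range(samples_per_prompt)]
        accepted = rejection_sample(candidates, lambda r: score_fn(prompt, r), threshold)
        dataset.extend({"prompt": prompt, "response": r} for r in accepted)
    return dataset


# Toy stand-ins: an "expert" that alternates between two canned replies,
# and a scorer that simply counts words in the response.
_canned = itertools.cycle(["ok", "a much more detailed reply"])
experts = {"role": lambda prompt: next(_canned)}
score_fn = lambda prompt, response: len(response.split())

data = build_distillation_set(
    [("role", "Play a pirate.")], experts, score_fn, threshold=3
)
print(len(data))
```

The accepted pairs would then serve as supervised training data for the single generalist model.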

Usage

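from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
model_name = "sunweiwei/Ditto-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Render the conversation with the model's chat template and append the generation prompt.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate, then decode only the new tokens (everything after the prompt).
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))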
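
The card does not document a prompt format for simulation tasks. As a purely hypothetical illustration, a persona could be supplied as a system message ahead of the conversation history; the helper name `build_persona_messages` and the instruction wording below are assumptions, not the format used in training.

```python
def build_persona_messages(persona, history):
    """Prepend a persona description as a system message (assumed format)."""
    system = {
        "role": "system",
        "content": f"You are simulating the following person. Stay in character.\n\n{persona}",
    }
    return [system, *history]


messages = build_persona_messages(
    persona="A first-year student who confuses correlation with causation.",
    history=[{"role": "user", "content": "Why do ice cream sales track drownings?"}],
)
print(messages)
```

The resulting `messages` list has the same shape as the one in the snippet above, so it can be passed to `tokenizer.apply_chat_template` unchanged.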

Citation

@article{sun2026ditto,
  title         = {Reinforcing Human Behavior Simulation via Verbal Feedback},
  author        = {Sun, Weiwei and Zhou, Xuhui and Liu, Jiarui and Du, Weihua and Sun, Haojia and Xie, Yiqing and Ma, Qianou and Chen, Sihao and Wan, Mengting and Yang, Longqi and Zhou, Pei and Wu, Sherry and Welleck, Sean and Neubig, Graham and Yang, Yiming and Sap, Maarten},
  year          = {2026},
  eprint        = {2605.20506},
  archivePrefix = {arXiv},
  url           = {http://arxiv.org/abs/2605.20506}
}