How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sunweiwei/Ditto-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sunweiwei/Ditto-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
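The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable at http://localhost:30000 and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are hypothetical and not part of SGLang.

```python
import json
import urllib.request


def build_chat_request(model, user_message, base_url="http://localhost:30000"):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, user_message, base_url="http://localhost:30000", timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    request = build_chat_request(model, user_message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("sunweiwei/Ditto-8B", "What is the capital of France?")` should return the model's reply as a string.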
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sunweiwei/Ditto-8B" \
        --host 0.0.0.0 \
        --port 30000

Ditto-8B

Ditto-8B is an 8B open-weight model for human behavior simulation, covering theory of mind, character role-play, social skills, learner simulation, user simulation, and persona simulation.

Method

Ditto-8B is trained with DITTO, a reinforcement learning method that uses verbal feedback as the learning signal. After each output, the model receives descriptive feedback and produces an improved version; the original output and the improved version are jointly optimized with GRPO. This distills the verbal guidance into the policy, so no feedback is needed at inference time.
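To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage that GRPO uses: each sampled output's reward is normalized by the mean and standard deviation of its group. Pooling the initial and feedback-revised outputs into one group is an assumption made for illustration; the model card does not specify how DITTO forms groups or assigns rewards, and the reward values below are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt (illustrative numbers only).
initial_rewards = [0.2, 0.4]   # outputs sampled before verbal feedback
revised_rewards = [0.6, 0.8]   # outputs regenerated after verbal feedback

# Assumption: both kinds of output share a single group for normalization.
advantages = group_relative_advantages(initial_rewards + revised_rewards)
```

Under this assumption, outputs that score above the group mean get positive advantages and are reinforced, which is one way the improvement prompted by feedback could be pushed back into the policy.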

Results

Primary metric for each benchmark (higher is better).

| Dim | Benchmark | GPT 5.5 | Gemini 3.1 Pro | Claude Opus 4.7 | Qwen 3.6 Plus | Others* | Qwen3 8B Inst | Ditto-8B |
|---|---|---|---|---|---|---|---|---|
| CONV | UserLLM | 65.3 | 67.7 | 57.6 | 72.1 | 44.6 | 46.0 | 91.5 |
| CONV | MirrorBench | 56.7 | 48.3 | 63.7 | 48.0 | 45.4 | 54.0 | 73.4 |
| CONV | Humanual-Chat | 28.2 | 21.0 | 22.6 | 22.2 | 25.8 | 24.7 | 21.0 |
| CONV | SimArena-Doc | 83.4 | 83.0 | 83.5 | 82.4 | 83.5 | 83.6 | 84.4 |
| SS | Sotopia-Hard | 31.9 | 27.8 | 32.4 | 28.3 | 31.7 | 27.7 | 45.8 |
| COG | Fantom | 93.0 | 93.0 | 80.0 | 89.0 | 70.0 | 23.0 | 92.0 |
| COG | Hitom | 82.0 | 86.0 | 93.0 | 73.0 | 56.0 | 62.0 | 79.0 |
| COG | Paratomi | 99.0 | 97.0 | 90.0 | 94.0 | 75.0 | 67.0 | 95.0 |
| COG | Social-R1 | 69.0 | 79.0 | 67.0 | 67.0 | 47.0 | 54.0 | 50.0 |
| ROLE | Coser | 66.2 | 62.1 | 66.5 | 55.9 | 30.3 | 43.5 | 64.4 |
| ROLE | Lifechoices | 91.0 | 84.0 | 92.0 | 79.0 | 67.0 | 70.0 | 70.0 |
| ROLE | Twinvoice | 74.0 | 86.0 | 83.0 | 71.0 | 40.0 | 42.0 | 71.0 |
| ROLE | BehaviorChain | 95.0 | 92.0 | 96.0 | 85.0 | 36.0 | 41.0 | 44.0 |
| ROLE | SimArena-Math | 68.5 | 71.5 | 68.7 | 70.9 | 70.5 | 68.9 | 69.6 |
| ROLE | Mistakes | 72.0 | 73.0 | 74.0 | 67.0 | 56.0 | 27.0 | 36.0 |
| ROLE | Humanual-Email | 50.1 | 46.9 | 50.4 | 47.9 | 42.8 | 43.7 | 40.8 |
| ROLE | Humanual-News | 40.2 | 42.3 | 41.3 | 41.8 | 33.1 | 32.5 | 27.5 |
| ROLE | Humanual-Politics | 42.0 | 32.5 | 43.5 | 31.6 | 34.2 | 33.2 | 29.7 |
| EVAL | AlignX | 71.2 | 73.4 | 71.6 | 69.8 | 66.8 | 68.6 | 67.4 |
| EVAL | Humanllm | 45.7 | 46.9 | 44.2 | 42.7 | 35.2 | 34.1 | 33.1 |
| EVAL | Socsci210 | 77.2 | 78.0 | 77.2 | 74.5 | 75.2 | 73.6 | 72.5 |
| EVAL | Humanual-Book | 57.6 | 62.4 | 61.4 | 58.4 | 50.2 | 53.6 | 53.4 |
| EVAL | Humanual-Opinion | 39.8 | 36.0 | 46.2 | 34.2 | 37.4 | 37.2 | 30.3 |

* Others: best result among other specialized human-simulation models (HumanLM-8B, Sotopia-RL-7B, UserLM-8B, Coser-8B).

Note. The released Ditto-8B is a single generalist distilled from a set of task-specific DITTO experts via rejection sampling on the training set.
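As a rough illustration of distillation by rejection sampling, the sketch below samples several candidate responses per training prompt from an expert, scores them, and keeps only the best candidate when it clears a threshold. The kept pairs would then serve as fine-tuning data for the generalist. The function names, the best-of-n selection rule, the threshold, and the toy expert and scorer are all assumptions; the model card does not describe the actual acceptance criterion.

```python
from itertools import count


def rejection_sample(prompts, generate, score, num_samples=4, threshold=0.5):
    """Keep the best of num_samples candidates per prompt if it meets the threshold."""
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(num_samples)]
        best = max(candidates, key=lambda c: score(prompt, c))
        if score(prompt, best) >= threshold:
            kept.append({"prompt": prompt, "response": best})
    return kept


# Toy stand-ins for a task-specific expert and a task scorer (hypothetical).
_draft_ids = count()


def toy_expert(prompt):
    """Pretend expert: returns a numbered draft for the prompt."""
    return f"{prompt}-draft{next(_draft_ids)}"


def toy_score(prompt, response):
    """Pretend reward: the draft number divided by 10."""
    return int(response.rsplit("draft", 1)[1]) / 10


dataset = rejection_sample(
    ["p1", "p2"], toy_expert, toy_score, num_samples=3, threshold=0.3
)
```

In this toy run, every draft for `p1` scores below the threshold and is rejected, while the best draft for `p2` is kept.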

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
model_name = "sunweiwei/Ditto-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a reply, then decode only the newly generated tokens (skipping the prompt).
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Citation

@article{sun2026ditto,
  title         = {Reinforcing Human Behavior Simulation via Verbal Feedback},
  author        = {Sun, Weiwei and Zhou, Xuhui and Liu, Jiarui and Du, Weihua and Sun, Haojia and Xie, Yiqing and Ma, Qianou and Chen, Sihao and Wan, Mengting and Yang, Longqi and Zhou, Pei and Wu, Sherry and Welleck, Sean and Neubig, Graham and Yang, Yiming and Sap, Maarten},
  year          = {2026},
  eprint        = {2605.20506},
  archivePrefix = {arXiv},
  url           = {http://arxiv.org/abs/2605.20506}
}

Model size: 9B params (Safetensors)
Tensor type: BF16