PsychAgent-Qwen3-32B

PsychAgent-Qwen3-32B is a psychological counseling model built on top of Qwen/Qwen3-32B. It is the 32B instantiation of PsychAgent, an experience-driven lifelong learning framework for AI psychological counseling.

This checkpoint is trained from Qwen/Qwen3-32B and is released in two training variants:

  • rft_explicit_skill_0218: the system prompt retains the skill candidate pool.
  • rft_implicit_skill_0218: the system prompt removes the skill candidate pool to encourage stronger skill internalization.

Model description

PsychAgent is designed for multi-session psychological counseling. Unlike static SFT-only counseling models, it improves longitudinal consistency and counseling quality through a closed-loop framework with three components:

  • Memory-Augmented Planning Engine (MAPE): maintains an evolving client profile and session summaries, then performs session-level planning for longitudinal continuity.
  • Skill Evolution Engine (SEE): extracts and organizes practice-grounded therapeutic skills into a hierarchical skill tree.
  • Reinforced Internalization Engine (RIE): internalizes successful counseling trajectories via rejection fine-tuning so that useful strategies become more endogenous to the model.
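
The interplay of the three engines can be sketched as follows. This is a minimal, hypothetical illustration of the closed loop; all class and method names are invented for exposition and do not come from the released code.

```python
# Hypothetical sketch of PsychAgent's closed-loop design (illustrative only).

class MemoryAugmentedPlanner:  # MAPE: profile + summaries -> session plan
    def __init__(self):
        self.profile, self.summaries = {}, []

    def plan_session(self, history):
        # Summarize the last session, then plan with the accumulated memory.
        self.summaries.append(f"summary of {len(history)} turns")
        return {"profile": self.profile, "summaries": list(self.summaries)}


class SkillEvolutionEngine:  # SEE: organize skills into a hierarchical tree
    def __init__(self):
        self.skill_tree = {"root": []}

    def extract_skills(self, trajectory):
        self.skill_tree["root"].append(f"skill from {len(trajectory)} turns")


class ReinforcedInternalizer:  # RIE: rejection fine-tuning on good rollouts
    def select_for_rft(self, rollouts, reward_fn):
        # Keep only high-reward trajectories for fine-tuning.
        return [r for r in rollouts if reward_fn(r) > 0.5]
```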

According to the paper, the 32B model is trained with a maximum context length of 32,768 tokens, using DeepSpeed ZeRO-3, bf16 precision, history masking, and a rollout number of 8.

Key features

  • Longitudinal multi-session counseling with memory-augmented planning.
  • Experience-driven skill evolution from historical counseling trajectories.
  • Reinforced internalization of high-quality trajectories through rejection fine-tuning.
  • Strong benchmark performance on PsychEval across both counselor-side and client-side dimensions.

Intended uses

This model is intended for:

  • research on AI psychological counseling and longitudinal dialogue agents;
  • experiments on memory, planning, skill evolution, and lifelong learning for counseling agents;
  • benchmarking on multi-session counseling settings similar to PsychEval.

Out-of-scope use and limitations

This model is not a licensed mental health professional and should not be used as a substitute for clinical care.

It should not be relied on in emergencies, crisis intervention, suicide risk handling, or any high-stakes clinical scenario requiring qualified professionals.

The paper evaluates the model on benchmarked multi-session counseling tasks rather than real-world clinical deployment. The authors also note that future work is needed for more realistic counseling settings and for stronger safety and privacy protections. Some observed improvements should therefore be interpreted as benchmark trends rather than direct clinical evidence.

Training and evaluation data

The experiments are built on PsychEval. Following the benchmark protocol, the authors first perform supervised fine-tuning on the released multi-session counseling corpus, and then reuse a pool of 2,000+ client profiles for rollout training and evaluation.

For each therapeutic school, 140 client profiles are sampled, with 120 used for training and 20 for evaluation.
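
The per-school split can be reproduced with a sketch like the following; the sampling procedure and seed are illustrative, as the paper does not specify them.

```python
import random

def split_profiles(profiles, n_sample=140, n_train=120, seed=0):
    """Illustrative per-school split: sample 140 profiles from the shared
    pool, then use 120 for training and the remaining 20 for evaluation."""
    rng = random.Random(seed)
    sampled = rng.sample(profiles, n_sample)
    return sampled[:n_train], sampled[n_train:]

pool = list(range(2000))  # stand-in for the 2,000+ client-profile pool
train_ids, eval_ids = split_profiles(pool)
```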

Reported metrics include shared counseling metrics such as:

  • PANAS
  • RRO
  • SRS
  • CUSTOM_DIM
  • HTAIS
  • WAI
  • DIALOGUE_PLANNING

The benchmark also includes school-specific metrics such as:

  • Behavioral Therapy: MITI, STAI
  • Cognitive Behavioral Therapy: CTRS, BDI_II
  • Postmodernist Therapy: EFT_TFS, SFBT
  • Humanistic-Existential Therapy: TES, CCT
  • Psychodynamic Therapy: PSC, IPO

Performance

In the paper, PsychAgent outperforms the compared general-purpose and psychology-specific baselines on all four aggregated PsychEval dimensions.

| Model | Counselor Shared | Counselor Specific | Client Shared | Client Specific |
|---|---|---|---|---|
| GPT-5.4 | 5.54 | 7.41 | 5.07 | 7.72 |
| Gemini-3 | 5.34 | 7.04 | 4.97 | 7.52 |
| Qwen3-Max | 5.88 | 7.74 | 5.41 | 7.81 |
| DeepSeek-V3.2 | 5.54 | 7.12 | 5.06 | 7.70 |
| PsyLLM | 5.30 | 4.67 | 5.63 | 7.93 |
| PsyDTLLM | 6.10 | 5.43 | 5.27 | 7.42 |
| CPsyCounX | 4.21 | 2.51 | 4.73 | 7.27 |
| TheraMind | 6.25 | 6.94 | 5.48 | 7.83 |
| PsychAgent† (8B) | 7.35 | 7.78 | 5.94 | 8.19 |
| PsychAgent (32B) | 7.32 | 7.91 | 5.92 | 8.24 |

The paper also reports human evaluation on 522 matched multi-session dialogues rated by two human annotators and one LLM rater (Gemini-3) across four dimensions:

  • Ethics
  • Interaction
  • Intervention
  • Perception

PsychAgent ranks first in all three rater columns, ahead of Qwen3-Max and TheraMind. The paper further reports moderate-to-strong inter-rater agreement, with:

  • human-human QWK = 0.675
  • LLM-human QWK = 0.770 / 0.877
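
QWK here is quadratic weighted Cohen's kappa, which penalizes disagreements by the squared distance between ratings. It can be computed with scikit-learn; the toy ratings below are illustrative, not the paper's actual annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Toy ratings from two annotators on a 1-5 scale (illustrative data only).
rater_a = [4, 3, 5, 2, 4, 4, 3, 5]
rater_b = [4, 3, 4, 2, 5, 4, 3, 5]

# Quadratic weighted kappa: off-by-one disagreements cost far less
# than large ones, so ordinal rating scales are handled sensibly.
qwk = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```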

Ablation summary

The paper shows that removing any of the three main modules—MAPE, SEE, or RIE—degrades performance.

Among them, removing SEE causes the largest drop in the reported ablation, suggesting that skill evolution is especially important under the reported setting.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch_fused
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
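
The reported effective batch size is consistent with the per-device settings:

```python
# Sanity check: total_train_batch_size follows from
# per-device batch size x number of GPUs x gradient accumulation steps.
train_batch_size = 1
num_devices = 16
gradient_accumulation_steps = 8

total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 128
```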

The paper additionally states that training used:

  • bf16 precision
  • DeepSpeed ZeRO-3
  • maximum context length 32,768
  • rollout number N = 8
  • 10% warmup
  • two servers with 8 NVIDIA H200 GPUs each

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ecnu-icalk/PsychAgent-Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are a supportive and cautious psychological counseling assistant. Do not claim to be a licensed clinician.",
    },
    {
        "role": "user",
        "content": "I've been feeling increasingly anxious about job hunting and sleeping poorly for two nights. Can we talk through it step by step?",
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
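
Since the model targets multi-session counseling, one minimal way to carry context across turns is to append each reply back into messages. The helper below is an illustrative sketch, not part of the released code; MAPE-style memory and summarization are not included.

```python
# Minimal multi-turn loop: append each assistant reply to the running
# message list so later turns see the full session history.
def chat_turn(model, tokenizer, messages, user_text, max_new_tokens=512):
    messages.append({"role": "user", "content": user_text})
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    return reply
```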

Citation

If you use this model, please cite the PsychAgent paper.

@article{yang2026psychagent,
  title={PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor},
  author={Yang, Yutao and Li, Junsong and Pan, Qianjun and Zhou, Jie and Chen, Kai and Chen, Qin and Zhao, Jingyuan and Zhou, Ningning and Li, Xin and He, Liang},
  journal={arXiv preprint},
  year={2026}
}