PsychAgent-Qwen3-32B

PsychAgent-Qwen3-32B is a psychological counseling model built on top of Qwen/Qwen3-32B. It is the 32B instantiation of PsychAgent, an experience-driven lifelong learning framework for AI psychological counseling.

This checkpoint is trained from Qwen/Qwen3-32B and is released in two training variants:

  • rft_explicit_skill_0218: the system prompt retains the skill candidate pool.
  • rft_implicit_skill_0218: the system prompt removes the skill candidate pool to encourage stronger skill internalization.

Model description

PsychAgent is designed for multi-session psychological counseling. Unlike static SFT-only counseling models, it improves longitudinal consistency and counseling quality through a closed-loop framework with three components:

  • Memory-Augmented Planning Engine (MAPE): maintains an evolving client profile and session summaries, then performs session-level planning for longitudinal continuity.
  • Skill Evolution Engine (SEE): extracts and organizes practice-grounded therapeutic skills into a hierarchical skill tree.
  • Reinforced Internalization Engine (RIE): internalizes successful counseling trajectories via rejection fine-tuning so that useful strategies become more endogenous to the model.
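
The interplay of the three engines can be sketched as follows. This is a minimal, hypothetical illustration of the closed loop; all class and method names are invented for exposition and do not come from the released code.

```python
# Hypothetical sketch of PsychAgent's closed-loop design (illustrative only).

class MemoryAugmentedPlanner:  # MAPE: profile + summaries -> session plan
    def __init__(self):
        self.profile, self.summaries = {}, []

    def plan_session(self, history):
        # Summarize the last session, then plan with the accumulated memory.
        self.summaries.append(f"summary of {len(history)} turns")
        return {"profile": self.profile, "summaries": list(self.summaries)}


class SkillEvolutionEngine:  # SEE: organize skills into a hierarchical tree
    def __init__(self):
        self.skill_tree = {"root": []}

    def extract_skills(self, trajectory):
        self.skill_tree["root"].append(f"skill from {len(trajectory)} turns")


class ReinforcedInternalizer:  # RIE: rejection fine-tuning on good rollouts
    def select_for_rft(self, rollouts, reward_fn):
        # Keep only high-reward trajectories for fine-tuning.
        return [r for r in rollouts if reward_fn(r) > 0.5]
```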

According to the paper, the 32B model is trained with a maximum context length of 32,768 tokens, using DeepSpeed ZeRO-3, bf16 precision, history masking, and a rollout number of 8.

Key features

  • Longitudinal multi-session counseling with memory-augmented planning.
  • Experience-driven skill evolution from historical counseling trajectories.
  • Reinforced internalization of high-quality trajectories through rejection fine-tuning.
  • Strong benchmark performance on PsychEval across both counselor-side and client-side dimensions.

Intended uses

This model is intended for:

  • research on AI psychological counseling and longitudinal dialogue agents;
  • experiments on memory, planning, skill evolution, and lifelong learning for counseling agents;
  • benchmarking on multi-session counseling settings similar to PsychEval.

Out-of-scope use and limitations

This model is not a licensed mental health professional and should not be used as a substitute for clinical care.

It should not be relied on in emergencies, crisis intervention, suicide risk handling, or any high-stakes clinical scenario requiring qualified professionals.

The paper evaluates the model on benchmarked multi-session counseling tasks rather than real-world clinical deployment. The authors also note that future work is needed for more realistic counseling settings and for stronger safety and privacy protections. Some observed improvements should therefore be interpreted as benchmark trends rather than direct clinical evidence.

Training and evaluation data

The experiments are built on PsychEval. Following the benchmark protocol, the authors first perform supervised fine-tuning on the released multi-session counseling corpus, and then reuse a pool of 2,000+ client profiles for rollout training and evaluation.

For each therapeutic school, 140 client profiles are sampled, with 120 used for training and 20 for evaluation.
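
The per-school split can be reproduced with a sketch like the following; the sampling procedure and seed are illustrative, as the paper does not specify them.

```python
import random

def split_profiles(profiles, n_sample=140, n_train=120, seed=0):
    """Illustrative per-school split: sample 140 profiles from the shared
    pool, then use 120 for training and the remaining 20 for evaluation."""
    rng = random.Random(seed)
    sampled = rng.sample(profiles, n_sample)
    return sampled[:n_train], sampled[n_train:]

pool = list(range(2000))  # stand-in for the 2,000+ client-profile pool
train_ids, eval_ids = split_profiles(pool)
```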

Reported metrics include shared counseling metrics such as:

  • PANAS
  • RRO
  • SRS
  • CUSTOM_DIM
  • HTAIS
  • WAI
  • DIALOGUE_PLANNING

The benchmark also includes school-specific metrics such as:

  • Behavioral Therapy: MITI, STAI
  • Cognitive Behavioral Therapy: CTRS, BDI_II
  • Postmodernist Therapy: EFT_TFS, SFBT
  • Humanistic-Existential Therapy: TES, CCT
  • Psychodynamic Therapy: PSC, IPO

Performance

In the paper, PsychAgent outperforms the compared general-purpose and psychology-specific baselines on all four aggregated PsychEval dimensions.

| Model | Counselor Shared | Counselor Specific | Client Shared | Client Specific |
|---|---|---|---|---|
| GPT-5.4 | 5.54 | 7.41 | 5.07 | 7.72 |
| Gemini-3 | 5.34 | 7.04 | 4.97 | 7.52 |
| Qwen3-Max | 5.88 | 7.74 | 5.41 | 7.81 |
| DeepSeek-V3.2 | 5.54 | 7.12 | 5.06 | 7.70 |
| PsyLLM | 5.30 | 4.67 | 5.63 | 7.93 |
| PsyDTLLM | 6.10 | 5.43 | 5.27 | 7.42 |
| CPsyCounX | 4.21 | 2.51 | 4.73 | 7.27 |
| TheraMind | 6.25 | 6.94 | 5.48 | 7.83 |
| PsychAgent† (8B) | 7.35 | 7.78 | 5.94 | 8.19 |
| PsychAgent (32B) | 7.32 | 7.91 | 5.92 | 8.24 |

The paper also reports human evaluation on 522 matched multi-session dialogues rated by two human annotators and one LLM rater (Gemini-3) across four dimensions:

  • Ethics
  • Interaction
  • Intervention
  • Perception

PsychAgent ranks first in all three rater columns, ahead of Qwen3-Max and TheraMind. The paper further reports moderate-to-strong inter-rater agreement, with:

  • human-human QWK = 0.675
  • LLM-human QWK = 0.770 / 0.877
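
QWK here is quadratic weighted Cohen's kappa, which penalizes disagreements by the squared distance between ratings. It can be computed with scikit-learn; the toy ratings below are illustrative, not the paper's actual annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Toy ratings from two annotators on a 1-5 scale (illustrative data only).
rater_a = [4, 3, 5, 2, 4, 4, 3, 5]
rater_b = [4, 3, 4, 2, 5, 4, 3, 5]

# Quadratic weighted kappa: off-by-one disagreements cost far less
# than large ones, so ordinal rating scales are handled sensibly.
qwk = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```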

Ablation summary

The paper shows that removing any of the three main modules—MAPE, SEE, or RIE—degrades performance.

Among them, removing SEE causes the largest drop in the reported ablation, suggesting that skill evolution is especially important under the reported setting.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch_fused
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
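
The reported effective batch size is consistent with the per-device settings:

```python
# Sanity check: total_train_batch_size follows from
# per-device batch size x number of GPUs x gradient accumulation steps.
train_batch_size = 1
num_devices = 16
gradient_accumulation_steps = 8

total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 128
```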

The paper additionally states that training used:

  • bf16 precision
  • DeepSpeed ZeRO-3
  • maximum context length 32,768
  • rollout number N = 8
  • 10% warmup
  • two servers with 8 NVIDIA H200 GPUs each

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ecnu-icalk/PsychAgent-Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are a supportive and cautious psychological counseling assistant. Do not claim to be a licensed clinician.",
    },
    {
        "role": "user",
        "content": "I've been feeling increasingly anxious about job hunting and sleeping poorly for two nights. Can we talk through it step by step?",
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
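
Since the model targets multi-session counseling, one minimal way to carry context across turns is to append each reply back into messages. The helper below is an illustrative sketch, not part of the released code; MAPE-style memory and summarization are not included.

```python
# Minimal multi-turn loop: append each assistant reply to the running
# message list so later turns see the full session history.
def chat_turn(model, tokenizer, messages, user_text, max_new_tokens=512):
    messages.append({"role": "user", "content": user_text})
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    reply = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    messages.append({"role": "assistant", "content": reply})
    return reply
```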

Citation

If you use this model, please cite the PsychAgent paper.

@article{yang2026psychagent,
  title={PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor},
  author={Yang, Yutao and Li, Junsong and Pan, Qianjun and Zhou, Jie and Chen, Kai and Chen, Qin and Zhao, Jingyuan and Zhou, Ningning and Li, Xin and He, Liang},
  journal={arXiv preprint},
  year={2026}
}