Safetensors
qwen3

Perovskite-RL

Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

Perovskite-RL is one component of a closed-loop discovery workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

The workflow is available at: https://github.com/WD928/LEAP

Model Details

  • Base model: Qwen3-32B
  • Training pipeline: supervised fine-tuning followed by GRPO reinforcement learning
  • Training framework: ms-swift / Transformers / PEFT
  • Primary domain: perovskite photovoltaics and molecular additive design

Training Data

Perovskite-RL was trained using curated perovskite-additive reasoning data.

  • SFT training set: 90,749 examples
  • SFT validation set: 1,000 examples
  • GRPO dataset: 5,800 examples

The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

Training Procedure

SFT

The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

Key settings:

  • LoRA fine-tuning
  • Learning rate: 3e-5
  • Epochs: 2
  • Batch size per device: 1
  • Gradient accumulation: 16
  • Scheduler: cosine
  • Seed: 42

GRPO

The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.

Key settings:

  • GRPO
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Learning rate: 2e-5
  • Epochs: 1
  • Number of generations: 8
  • Max length: 8192
  • Reward focus: answer correctness, format compliance, content recall, and reasoning quality

Evaluation

On the mechanism-consistency benchmark:

Model Accuracy
Perovskite-RL 25 / 32, 78.1%

The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

Intended Use

Perovskite-RL is intended for research use in:

  • perovskite additive mechanism analysis
  • molecular additive hypothesis generation
  • mechanistic descriptor generation
  • literature-based reasoning for perovskite photovoltaics
  • assisting computational screening workflows

Limitations

  • The model is not a substitute for experimental validation.
  • Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
  • The model may overstate mechanistic confidence when evidence is incomplete.
  • Use outputs as hypotheses, not final scientific conclusions.

Citation

Please cite the associated arXiv preprint if you use this model:

https://arxiv.org/abs/2605.20242

Downloads last month
41
Safetensors
Model size
33B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train JH976/Perovskite-RL

Paper for JH976/Perovskite-RL