Perovskite-RL

Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

Perovskite-RL is one component of a closed-loop discovery workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

The workflow is available at: https://github.com/WD928/LEAP

Model Details

Base model: Qwen3-32B
Training pipeline: supervised fine-tuning followed by GRPO reinforcement learning
Training framework: ms-swift / Transformers / PEFT
Primary domain: perovskite photovoltaics and molecular additive design

Training Data

Perovskite-RL was trained using curated perovskite-additive reasoning data.

SFT training set: 90,749 examples
SFT validation set: 1,000 examples
GRPO dataset: 5,800 examples

The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

Training Procedure

SFT

The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

Key settings:

LoRA fine-tuning
Learning rate: 3e-5
Epochs: 2
Batch size per device: 1
Gradient accumulation: 16
Scheduler: cosine
Seed: 42

GRPO

The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.

Key settings:

GRPO
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 2e-5
Epochs: 1
Number of generations: 8
Max length: 8192
Reward focus: answer correctness, format compliance, content recall, and reasoning quality

Evaluation

On the mechanism-consistency benchmark:

Model	Accuracy
Perovskite-RL	25 / 32, 78.1%

The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

Intended Use

Perovskite-RL is intended for research use in:

perovskite additive mechanism analysis
molecular additive hypothesis generation
mechanistic descriptor generation
literature-based reasoning for perovskite photovoltaics
assisting computational screening workflows

Limitations

The model is not a substitute for experimental validation.
Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
The model may overstate mechanistic confidence when evidence is incomplete.
Use outputs as hypotheses, not final scientific conclusions.

Citation

Please cite the associated arXiv preprint if you use this model:

https://arxiv.org/abs/2605.20242

Downloads last month: 41

Safetensors

Model size

33B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train JH976/Perovskite-RL

Paper for JH976/Perovskite-RL

LEAP: A closed-loop framework for perovskite precursor additive discovery

Paper • 2605.20242 • Published 4 days ago