Perovskite-RL
Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.
Perovskite-RL is one component of a closed-loop workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.
The workflow is available at: https://github.com/WD928/LEAP
Model Details
- Base model: Qwen3-32B
- Training pipeline: supervised fine-tuning followed by GRPO reinforcement learning
- Training framework: ms-swift / Transformers / PEFT
- Primary domain: perovskite photovoltaics and molecular additive design
Training Data
Perovskite-RL was trained using curated perovskite-additive reasoning data.
- SFT training set: 90,749 examples
- SFT validation set: 1,000 examples
- GRPO dataset: 5,800 examples
The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.
Training Procedure
SFT
The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.
Key settings:
- LoRA fine-tuning
- Learning rate: 3e-5
- Epochs: 2
- Batch size per device: 1
- Gradient accumulation: 16
- Scheduler: cosine
- Seed: 42
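As a rough illustration of what the SFT settings above imply, the sketch below works out the effective batch size, the number of optimizer steps, and the learning rate under a plain cosine decay. The device count, the absence of warmup, and decay to zero are assumptions made here for illustration; this card does not state them, and the training framework may count steps slightly differently.

```python
import math

# Values taken from the SFT settings and dataset sizes in this card.
TRAIN_EXAMPLES = 90_749
EPOCHS = 2
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 16
PEAK_LR = 3e-5

# Assumption (not stated in this card): training on a single device.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def total_optimizer_steps(n_examples: int, batch: int, epochs: int) -> int:
    """Optimizer steps, counting a final partial batch in each epoch."""
    return math.ceil(n_examples / batch) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Plain cosine decay from peak_lr at step 0 to 0 at total_steps.

    Assumption: no warmup phase and a minimum learning rate of zero.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_DEVICES)
steps = total_optimizer_steps(TRAIN_EXAMPLES, batch, EPOCHS)
print(batch, steps)                           # 16 11344
print(cosine_lr(steps // 2, steps, PEAK_LR))  # about 1.5e-05 at the midpoint
```

With more devices the effective batch size scales linearly and the step count shrinks accordingly, so `NUM_DEVICES` should be adjusted to match the actual hardware.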
GRPO
The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.
Key settings:
- GRPO
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Learning rate: 2e-5
- Epochs: 1
- Number of generations: 8
- Max length: 8192
- Reward focus: answer correctness, format compliance, content recall, and reasoning quality
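To make the role of the "number of generations" setting concrete, the sketch below shows the group-relative advantage at the core of GRPO: each prompt gets a group of sampled completions, and every completion's reward is normalized against the mean and standard deviation of its own group. The equal reward weights, the component scores, and the use of the population standard deviation are assumptions for illustration only; this card names the four reward components but not how they are scored or combined.

```python
from statistics import mean, pstdev

NUM_GENERATIONS = 8  # completions sampled per prompt, from the GRPO settings

# Hypothetical weights: the actual combination of reward terms is not given here.
REWARD_WEIGHTS = {
    "answer_correctness": 1.0,
    "format_compliance": 1.0,
    "content_recall": 1.0,
    "reasoning_quality": 1.0,
}


def total_reward(components: dict[str, float]) -> float:
    """Weighted sum of the per-component scores for one completion."""
    return sum(REWARD_WEIGHTS[name] * components[name] for name in REWARD_WEIGHTS)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of completions.

    eps keeps the division finite when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative scores for a single completion (made-up numbers).
example = {
    "answer_correctness": 1.0,
    "format_compliance": 1.0,
    "content_recall": 0.5,
    "reasoning_quality": 0.5,
}
print(total_reward(example))  # 3.0

# Illustrative total rewards for one group of 8 completions (made-up numbers).
group = [3.0, 2.5, 2.5, 2.0, 2.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(group)
print([round(a, 2) for a in advantages])
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.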
Evaluation
On the mechanism-consistency benchmark:
| Model | Accuracy |
|---|---|
| Perovskite-RL | 25/32 (78.1%) |
The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.
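Because the benchmark has only 32 items, the reported accuracy carries noticeable sampling uncertainty. The sketch below reproduces the 78.1% figure and, as one possible way to express that uncertainty, computes a 95% Wilson score interval. The interval is an illustration added here, not a result reported for this model.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of benchmark items answered correctly."""
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


acc = accuracy(25, 32)
low, high = wilson_interval(25, 32)
print(f"{acc:.1%}")               # 78.1%
print(f"{low:.2f} to {high:.2f}")  # roughly 0.61 to 0.89
```

An interval this wide suggests that small differences between models on this benchmark should be read with caution.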
Intended Use
Perovskite-RL is intended for research use in:
- perovskite additive mechanism analysis
- molecular additive hypothesis generation
- mechanistic descriptor generation
- literature-based reasoning for perovskite photovoltaics
- assisting computational screening workflows
Limitations
- The model is not a substitute for experimental validation.
- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
- The model may overstate mechanistic confidence when evidence is incomplete.
- Use outputs as hypotheses, not final scientific conclusions.
Citation
Please cite the associated arXiv preprint if you use this model: