---
license: apache-2.0
datasets:
- JH976/Perovskite-RL
---

# Perovskite-RL

Perovskite-RL is a domain-adapted large language model for additive engineering in perovskite solar cells. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

Perovskite-RL is one component of a closed-loop workflow for discovering perovskite precursor additives. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

The workflow is available at: https://github.com/WD928/LEAP

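The closed loop can be pictured as a propose, score, and refine cycle. The following is a minimal, self-contained Python sketch of that control flow only. It is not the LEAP implementation, and it assumes that descriptor extraction and feedback evaluation can be collapsed into a single scoring function. The names `closed_loop`, `toy_propose`, and `toy_score` are hypothetical stand-ins; in a real workflow, an LLM such as Perovskite-RL could fill the proposer role and a descriptor-based evaluator the scoring role.

```python
from typing import Callable, Sequence

Scored = list[tuple[str, float]]


def closed_loop(
    propose: Callable[[Scored], Sequence[str]],
    score: Callable[[str], float],
    n_rounds: int = 3,
    keep_top: int = 5,
) -> Scored:
    """Generic propose -> score -> refine loop (illustrative sketch only).

    `propose` receives the best (candidate, score) pairs found so far and
    returns new candidate identifiers. `score` stands in for descriptor
    extraction plus feedback evaluation.
    """
    best: Scored = []
    seen: set[str] = set()
    for _ in range(n_rounds):
        # Generation: ask the proposer for new candidates, skipping repeats.
        new = [c for c in propose(best) if c not in seen]
        seen.update(new)
        # Evaluation: score every new candidate once.
        scored = [(c, score(c)) for c in new]
        # Refinement: keep the top-k across all rounds; this pool is the
        # feedback handed to the proposer in the next round.
        best = sorted(best + scored, key=lambda cs: cs[1], reverse=True)[:keep_top]
    return best


# Hypothetical toy stand-ins: candidates are integers written as strings,
# and the score peaks at 42.
def toy_propose(best: Scored) -> list[str]:
    center = int(best[0][0]) if best else 0
    return [str(center + d) for d in (-10, -1, 1, 10)]


def toy_score(candidate: str) -> float:
    return -abs(int(candidate) - 42)


if __name__ == "__main__":
    print(closed_loop(toy_propose, toy_score, n_rounds=10, keep_top=3))
```

Keeping a top-k pool across rounds means a candidate is only displaced by a better one. The `seen` set stops the loop from re-scoring candidates it has already evaluated, which matters when each evaluation (for example, descriptor calculation) is expensive. With the toy functions, the loop converges on `"42"` within a few rounds.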
## Model Details

- **Base model:** Qwen3-32B
- **Training pipeline:** supervised fine-tuning (SFT) followed by GRPO reinforcement learning
- **Training framework:** ms-swift / Transformers / PEFT
- **Primary domain:** perovskite photovoltaics and molecular additive design

## Training Data

Perovskite-RL was trained using curated perovskite-additive reasoning data.

- **SFT training set:** 90,749 examples
- **SFT validation set:** 1,000 examples
- **GRPO dataset:** 5,800 examples

The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

## Training Procedure

### SFT

The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

Key settings:

- Method: LoRA fine-tuning
- Learning rate: `3e-5`
- Epochs: `2`
- Per-device batch size: `1`
- Gradient accumulation steps: `16`
- Learning-rate scheduler: cosine
- Seed: `42`

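These settings give an effective batch of 1 × 16 = 16 sequences per optimizer step on each device. The card does not state the number of GPUs or whether warmup was used, so the Python sketch below treats `num_devices` and `warmup_steps` as assumptions. It estimates the total number of optimizer steps for the 90,749-example SFT set and evaluates a cosine decay to zero, the same shape as Hugging Face's `get_cosine_schedule_with_warmup` with its default single half-cycle.

```python
import math


def optimizer_steps(n_examples: int, per_device_bs: int, grad_accum: int,
                    num_devices: int, epochs: int) -> int:
    """Approximate optimizer steps for data-parallel training."""
    # Each device processes ceil(N / (bs * devices)) micro-batches per epoch.
    micro_batches = math.ceil(n_examples / (per_device_bs * num_devices))
    # One optimizer step per `grad_accum` micro-batches; a final partial group
    # is rounded up here (frameworks differ slightly in how they handle it).
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    return steps_per_epoch * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (optional), then cosine decay from peak_lr to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # SFT values from the card; num_devices=8 is a hypothetical choice.
    total = optimizer_steps(90_749, per_device_bs=1, grad_accum=16, num_devices=8, epochs=2)
    print("global batch:", 1 * 16 * 8, "| total optimizer steps:", total)
    for s in (0, total // 2, total):
        print(f"step {s}: lr = {cosine_lr(s, total, 3e-5):.2e}")
```

With a hypothetical 8 GPUs, this gives a global batch of 128 and 1,418 optimizer steps, with the learning rate at half its peak (1.5e-5) midway through training. Treat the step count as approximate.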
### GRPO

The SFT model was further optimized with GRPO (Group Relative Policy Optimization), using reward signals designed for mechanism-aware additive selection.

Key settings:

- Method: GRPO with LoRA
- LoRA rank: `16`
- LoRA alpha: `32`
- LoRA dropout: `0.05`
- Learning rate: `2e-5`
- Epochs: `1`
- Generations per prompt (group size): `8`
- Max length: `8192` tokens
- Reward components: answer correctness, format compliance, content recall, and reasoning quality

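In GRPO, the model samples a group of completions for each prompt (8 here), and each completion's reward is standardized against the rest of its group, so the group mean serves as the baseline in place of a learned value model. The Python sketch below shows only that advantage computation. The card names four reward components but not how they are scored or weighted, so the [0, 1] component scores and the equal weights in `WEIGHTS` are hypothetical. The sketch also uses the population standard deviation plus a small epsilon; some implementations use the sample standard deviation instead.

```python
import statistics

# Hypothetical equal weights over the four reward components named in the card.
WEIGHTS = {"correctness": 0.25, "format": 0.25, "recall": 0.25, "reasoning": 0.25}


def composite_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Standardize rewards within one prompt's group of completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt with 8 sampled completions; the component scores are made up.
    rows = [
        (1, 1, 0.9, 0.8), (1, 1, 0.6, 0.7), (0, 1, 0.5, 0.6), (0, 0, 0.2, 0.3),
        (1, 0, 0.7, 0.5), (0, 1, 0.4, 0.4), (1, 1, 0.8, 0.9), (0, 1, 0.3, 0.2),
    ]
    group = [dict(zip(WEIGHTS, row)) for row in rows]
    rewards = [composite_reward(s) for s in group]
    for r, a in zip(rewards, group_advantages(rewards)):
        print(f"reward={r:.3f}  advantage={a:+.2f}")
```

If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no policy-gradient signal, which is one reason reward components that discriminate between completions matter.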
## Evaluation

Results on the mechanism-consistency benchmark:

| Model | Correct | Accuracy |
|---|---:|---:|
| Perovskite-RL | 25 / 32 | 78.1% |

The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

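With 32 questions, each question moves accuracy by 1/32 = 3.125 percentage points, so the headline figure carries noticeable sampling uncertainty. As a rough gauge, the Python sketch below computes a 95% Wilson score interval for 25 correct out of 32. It assumes the questions behave like independent Bernoulli trials, which is a simplification, and the interval is an illustrative calculation rather than a reported result.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(25, 32)
    print(f"accuracy = {25 / 32:.1%}, 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

For 25 of 32, the interval spans roughly 61% to 89%, so changes of a few questions on this benchmark fall well within sampling noise.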
## Intended Use

Perovskite-RL is intended for research use in:

- perovskite additive mechanism analysis
- molecular additive hypothesis generation
- mechanistic descriptor generation
- literature-based reasoning for perovskite photovoltaics
- assisting computational screening workflows

## Limitations

- The model is not a substitute for experimental validation.
- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
- The model may overstate mechanistic confidence when evidence is incomplete.
- Use outputs as hypotheses, not final scientific conclusions.

## Citation

Please cite the associated arXiv preprint if you use this model:

https://arxiv.org/abs/2605.20242