---
license: apache-2.0
datasets:
- JH976/Perovskite-RL
---

# Perovskite-RL

Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

Perovskite-RL is one component of a closed-loop discovery workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

The workflow is available at: https://github.com/WD928/LEAP

## Model Details

- **Base model:** Qwen3-32B
- **Training pipeline:** supervised fine-tuning followed by GRPO reinforcement learning
- **Training framework:** ms-swift / Transformers / PEFT
- **Primary domain:** perovskite photovoltaics and molecular additive design

## Training Data

Perovskite-RL was trained using curated perovskite-additive reasoning data.

- **SFT training set:** 90,749 examples
- **SFT validation set:** 1,000 examples
- **GRPO dataset:** 5,800 examples

The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

## Training Procedure

### SFT

The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

Key settings:

- LoRA fine-tuning
- Learning rate: `3e-5`
- Epochs: `2`
- Batch size per device: `1`
- Gradient accumulation: `16`
- Scheduler: cosine
- Seed: `42`

### GRPO

The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.
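
To make the group-relative idea behind GRPO concrete, here is a minimal sketch in plain Python. It assumes the commonly described formulation, in which several completions are sampled per prompt and each reward is normalized by the mean and standard deviation of its own group. The function names, the component weights, and the sample scores are hypothetical and only for illustration; this card does not state how the four reward components were actually weighted or normalized during training.

```python
from statistics import mean, pstdev

# Hypothetical weights for the reward components named in this card
# (answer correctness, format compliance, content recall, reasoning quality).
# The real weighting used in training is not given here.
WEIGHTS = {"correctness": 1.0, "format": 0.5, "recall": 0.5, "reasoning": 0.5}


def combined_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-component scores; missing components count as 0."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is only rewarded relative to its siblings for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up component scores for a group of 8 sampled completions to one prompt.
group_scores = [
    {"correctness": 1.0, "format": 1.0, "recall": 0.8, "reasoning": 0.7},
    {"correctness": 0.0, "format": 1.0, "recall": 0.4, "reasoning": 0.5},
    {"correctness": 1.0, "format": 0.0, "recall": 0.6, "reasoning": 0.6},
    {"correctness": 0.0, "format": 0.0, "recall": 0.1, "reasoning": 0.2},
    {"correctness": 1.0, "format": 1.0, "recall": 0.5, "reasoning": 0.9},
    {"correctness": 0.0, "format": 1.0, "recall": 0.3, "reasoning": 0.3},
    {"correctness": 1.0, "format": 1.0, "recall": 0.9, "reasoning": 0.8},
    {"correctness": 0.0, "format": 1.0, "recall": 0.2, "reasoning": 0.4},
]

rewards = [combined_reward(s) for s in group_scores]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive positive advantages and are reinforced, while those below the mean are discouraged. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.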

Key settings:

- GRPO
- LoRA rank: `16`
- LoRA alpha: `32`
- LoRA dropout: `0.05`
- Learning rate: `2e-5`
- Epochs: `1`
- Number of generations: `8`
- Max length: `8192`
- Reward focus: answer correctness, format compliance, content recall, and reasoning quality

## Evaluation

On the mechanism-consistency benchmark:

| Model | Accuracy |
|---|---:|
| Perovskite-RL | 25 / 32, 78.1% |

The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

## Intended Use

Perovskite-RL is intended for research use in:

- perovskite additive mechanism analysis
- molecular additive hypothesis generation
- mechanistic descriptor generation
- literature-based reasoning for perovskite photovoltaics
- assisting computational screening workflows

## Limitations

- The model is not a substitute for experimental validation.
- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
- The model may overstate mechanistic confidence when evidence is incomplete.
- Use outputs as hypotheses, not final scientific conclusions.

## Citation

Please cite the associated arXiv preprint if you use this model:

https://arxiv.org/abs/2605.20242