Update README.md
Browse files---
license: other
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-32B
tags:
- perovskite
- materials-science
- solar-cells
- additive-engineering
- sft
- grpo
- qwen3
---
# Perovskite-RL
Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.
## Model Details
- **Base model:** Qwen3-32B
- **Training pipeline:** supervised fine-tuning followed by GRPO reinforcement learning
- **Training framework:** ms-swift / Transformers / PEFT
- **Primary domain:** perovskite photovoltaics and molecular additive design
## Training Data
Perovskite-RL was trained using curated perovskite-additive reasoning data.
- **SFT training set:** 90,749 examples
- **SFT validation set:** 1,000 examples
- **GRPO dataset:** 5,800 examples
The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.
## Training Procedure
### SFT
The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.
Key settings:
- LoRA fine-tuning
- Learning rate: `3e-5`
- Epochs: `2`
- Batch size per device: `1`
- Gradient accumulation: `16`
- Scheduler: cosine
- Seed: `42`
### GRPO
The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.
Key settings:
- GRPO
- LoRA rank: `16`
- LoRA alpha: `32`
- LoRA dropout: `0.05`
- Learning rate: `2e-5`
- Epochs: `1`
- Number of generations: `8`
- Max length: `8192`
- Reward focus: answer correctness, format compliance, content recall, and reasoning quality
## Evaluation
On the mechanism-consistency benchmark:
| Model | Accuracy |
|---|---:|
| Perovskite-RL | 25 / 32, 78.1% |
The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.
## Intended Use
Perovskite-RL is intended for research use in:
- perovskite additive mechanism analysis
- molecular additive hypothesis generation
- mechanistic descriptor generation
- literature-based reasoning for perovskite photovoltaics
- assisting computational screening workflows
## Limitations
- The model is not a substitute for experimental validation.
- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
- The model may overstate mechanistic confidence when evidence is incomplete.
- Use outputs as hypotheses, not final scientific conclusions.
## Citation
Citation information will be added after the manuscript is publicly available.