Safetensors
qwen3
File size: 3,028 Bytes
824e5e9
 
 
 
033742d
 
 
 
 
 
 
548d826
 
d617cef
548d826
033742d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
548d826
 
033742d
 
3e55cfa
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
---
license: apache-2.0
datasets:
- JH976/Perovskite-RL
---


# Perovskite-RL

Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

Perovskite-RL is one component of a closed-loop discovery workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

The workflow is available at: https://github.com/WD928/LEAP

## Model Details

- **Base model:** Qwen3-32B
- **Training pipeline:** supervised fine-tuning followed by GRPO reinforcement learning
- **Training framework:** ms-swift / Transformers / PEFT
- **Primary domain:** perovskite photovoltaics and molecular additive design

## Training Data

Perovskite-RL was trained using curated perovskite-additive reasoning data.

- **SFT training set:** 90,749 examples
- **SFT validation set:** 1,000 examples
- **GRPO dataset:** 5,800 examples

The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

## Training Procedure

### SFT

The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

Key settings:

- LoRA fine-tuning
- Learning rate: `3e-5`
- Epochs: `2`
- Batch size per device: `1`
- Gradient accumulation: `16`
- Scheduler: cosine
- Seed: `42`

### GRPO

The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.

Key settings:

- GRPO
- LoRA rank: `16`
- LoRA alpha: `32`
- LoRA dropout: `0.05`
- Learning rate: `2e-5`
- Epochs: `1`
- Number of generations: `8`
- Max length: `8192`
- Reward focus: answer correctness, format compliance, content recall, and reasoning quality

## Evaluation

On the mechanism-consistency benchmark:

| Model | Accuracy |
|---|---:|
| Perovskite-RL | 25 / 32, 78.1% |

The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

## Intended Use

Perovskite-RL is intended for research use in:

- perovskite additive mechanism analysis
- molecular additive hypothesis generation
- mechanistic descriptor generation
- literature-based reasoning for perovskite photovoltaics
- assisting computational screening workflows

## Limitations

- The model is not a substitute for experimental validation.
- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
- The model may overstate mechanistic confidence when evidence is incomplete.
- Use outputs as hypotheses, not final scientific conclusions.



## Citation

Please cite the associated arXiv preprint if you use this model:

https://arxiv.org/abs/2605.20242