Safetensors
qwen3
JH976 commited on
Commit
033742d
·
verified ·
1 Parent(s): 824e5e9

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +102 -1
README.md CHANGED
@@ -2,4 +2,105 @@
2
  license: apache-2.0
3
  datasets:
4
  - JH976/Perovskite-RL
5
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
  license: apache-2.0
3
  datasets:
4
  - JH976/Perovskite-RL
5
+ ---
6
+
7
+ ---
8
+ license: other
9
+ language:
10
+ - en
11
+ library_name: transformers
12
+ pipeline_tag: text-generation
13
+ base_model: Qwen/Qwen3-32B
14
+ tags:
15
+ - perovskite
16
+ - materials-science
17
+ - solar-cells
18
+ - additive-engineering
19
+ - sft
20
+ - grpo
21
+ - qwen3
22
+ ---
23
+
24
+ # Perovskite-RL
25
+
26
+ Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.
27
+
28
+ ## Model Details
29
+
30
+ - **Base model:** Qwen3-32B
31
+ - **Training pipeline:** supervised fine-tuning followed by GRPO reinforcement learning
32
+ - **Training framework:** ms-swift / Transformers / PEFT
33
+ - **Primary domain:** perovskite photovoltaics and molecular additive design
34
+
35
+ ## Training Data
36
+
37
+ Perovskite-RL was trained using curated perovskite-additive reasoning data.
38
+
39
+ - **SFT training set:** 90,749 examples
40
+ - **SFT validation set:** 1,000 examples
41
+ - **GRPO dataset:** 5,800 examples
42
+
43
+ The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.
44
+
45
+ ## Training Procedure
46
+
47
+ ### SFT
48
+
49
+ The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.
50
+
51
+ Key settings:
52
+
53
+ - LoRA fine-tuning
54
+ - Learning rate: `3e-5`
55
+ - Epochs: `2`
56
+ - Batch size per device: `1`
57
+ - Gradient accumulation: `16`
58
+ - Scheduler: cosine
59
+ - Seed: `42`
60
+
61
+ ### GRPO
62
+
63
+ The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.
64
+
65
+ Key settings:
66
+
67
+ - GRPO
68
+ - LoRA rank: `16`
69
+ - LoRA alpha: `32`
70
+ - LoRA dropout: `0.05`
71
+ - Learning rate: `2e-5`
72
+ - Epochs: `1`
73
+ - Number of generations: `8`
74
+ - Max length: `8192`
75
+ - Reward focus: answer correctness, format compliance, content recall, and reasoning quality
76
+
77
+ ## Evaluation
78
+
79
+ On the mechanism-consistency benchmark:
80
+
81
+ | Model | Accuracy |
82
+ |---|---:|
83
+ | Perovskite-RL | 25 / 32, 78.1% |
84
+
85
+ The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.
86
+
87
+ ## Intended Use
88
+
89
+ Perovskite-RL is intended for research use in:
90
+
91
+ - perovskite additive mechanism analysis
92
+ - molecular additive hypothesis generation
93
+ - mechanistic descriptor generation
94
+ - literature-based reasoning for perovskite photovoltaics
95
+ - assisting computational screening workflows
96
+
97
+ ## Limitations
98
+
99
+ - The model is not a substitute for experimental validation.
100
+ - Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
101
+ - The model may overstate mechanistic confidence when evidence is incomplete.
102
+ - Use outputs as hypotheses, not final scientific conclusions.
103
+
104
+ ## Citation
105
+
106
+ Citation information will be added after the manuscript is publicly available.