JH976
/

Perovskite-RL

Model card Files Files and versions

JH976 commited on 14 days ago

Commit

033742d

·

verified ·

1 Parent(s): 824e5e9

Update README.md

Files changed (1) hide show

README.md +102 -1

README.md CHANGED Viewed

@@ -2,4 +2,105 @@
 license: apache-2.0
 datasets:
 - JH976/Perovskite-RL
----

 license: apache-2.0
 datasets:
 - JH976/Perovskite-RL
+---
+---
+license: other
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+base_model: Qwen/Qwen3-32B
+tags:
+- perovskite
+- materials-science
+- solar-cells
+- additive-engineering
+- sft
+- grpo
+- qwen3
+---
+# Perovskite-RL
+Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.
+## Model Details
+- **Base model:** Qwen3-32B
+- **Training pipeline:** supervised fine-tuning followed by GRPO reinforcement learning
+- **Training framework:** ms-swift / Transformers / PEFT
+- **Primary domain:** perovskite photovoltaics and molecular additive design
+## Training Data
+Perovskite-RL was trained using curated perovskite-additive reasoning data.
+- **SFT training set:** 90,749 examples
+- **SFT validation set:** 1,000 examples
+- **GRPO dataset:** 5,800 examples
+The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.
+## Training Procedure
+### SFT
+The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.
+Key settings:
+- LoRA fine-tuning
+- Learning rate: `3e-5`
+- Epochs: `2`
+- Batch size per device: `1`
+- Gradient accumulation: `16`
+- Scheduler: cosine
+- Seed: `42`
+### GRPO
+The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.
+Key settings:
+- GRPO
+- LoRA rank: `16`
+- LoRA alpha: `32`
+- LoRA dropout: `0.05`
+- Learning rate: `2e-5`
+- Epochs: `1`
+- Number of generations: `8`
+- Max length: `8192`
+- Reward focus: answer correctness, format compliance, content recall, and reasoning quality
+## Evaluation
+On the mechanism-consistency benchmark:
+| Model | Accuracy |
+|---|---:|
+| Perovskite-RL | 25 / 32, 78.1% |
+The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.
+## Intended Use
+Perovskite-RL is intended for research use in:
+- perovskite additive mechanism analysis
+- molecular additive hypothesis generation
+- mechanistic descriptor generation
+- literature-based reasoning for perovskite photovoltaics
+- assisting computational screening workflows
+## Limitations
+- The model is not a substitute for experimental validation.
+- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
+- The model may overstate mechanistic confidence when evidence is incomplete.
+- Use outputs as hypotheses, not final scientific conclusions.
+## Citation
+Citation information will be added after the manuscript is publicly available.