JH976
/

Perovskite-RL

Model card Files Files and versions

Perovskite-RL / README.md

JH976's picture

Update README.md

3e55cfa verified 1 day ago

|

history blame contribute delete

3.03 kB

	---
	license: apache-2.0
	datasets:
	- JH976/Perovskite-RL
	---


	# Perovskite-RL

	Perovskite-RL is a domain-adapted large language model for perovskite solar-cell additive engineering. It is trained to reason about additive molecules, defect passivation, crystallization modulation, interfacial protection, ion migration, electronic effects, and stability-related mechanisms.

	Perovskite-RL is one component of a closed-loop discovery workflow for perovskite precursor additive discovery. The workflow connects literature-derived mechanism reasoning, additive candidate generation, descriptor extraction, feedback evaluation, and iterative refinement.

	The workflow is available at: https://github.com/WD928/LEAP

	## Model Details

	- Base model: Qwen3-32B
	- Training pipeline: supervised fine-tuning followed by GRPO reinforcement learning
	- Training framework: ms-swift / Transformers / PEFT
	- Primary domain: perovskite photovoltaics and molecular additive design

	## Training Data

	Perovskite-RL was trained using curated perovskite-additive reasoning data.

	- SFT training set: 90,749 examples
	- SFT validation set: 1,000 examples
	- GRPO dataset: 5,800 examples

	The data include literature-derived mechanism reasoning, molecular-property reasoning, and additive-selection tasks.

	## Training Procedure

	### SFT

	The base model was first fine-tuned with LoRA on instruction-response examples for perovskite additive reasoning.

	Key settings:

	- LoRA fine-tuning
	- Learning rate: `3e-5`
	- Epochs: `2`
	- Batch size per device: `1`
	- Gradient accumulation: `16`
	- Scheduler: cosine
	- Seed: `42`

	### GRPO

	The SFT model was further optimized with GRPO using reward signals designed for mechanism-aware additive selection.

	Key settings:

	- GRPO
	- LoRA rank: `16`
	- LoRA alpha: `32`
	- LoRA dropout: `0.05`
	- Learning rate: `2e-5`
	- Epochs: `1`
	- Number of generations: `8`
	- Max length: `8192`
	- Reward focus: answer correctness, format compliance, content recall, and reasoning quality

	## Evaluation

	On the mechanism-consistency benchmark:

	\| Model \| Accuracy \|
	\|---\|---:\|
	\| Perovskite-RL \| 25 / 32, 78.1% \|

	The benchmark tests whether a model can identify paper-specific mechanistic explanations rather than relying only on generic materials-science priors.

	## Intended Use

	Perovskite-RL is intended for research use in:

	- perovskite additive mechanism analysis
	- molecular additive hypothesis generation
	- mechanistic descriptor generation
	- literature-based reasoning for perovskite photovoltaics
	- assisting computational screening workflows

	## Limitations

	- The model is not a substitute for experimental validation.
	- Generated additive suggestions may be chemically invalid, commercially unavailable, or experimentally unsuitable.
	- The model may overstate mechanistic confidence when evidence is incomplete.
	- Use outputs as hypotheses, not final scientific conclusions.



	## Citation

	Please cite the associated arXiv preprint if you use this model:

	https://arxiv.org/abs/2605.20242