---
{}
---

## References to read

[1] Prein, T., Pan, E., Doerr, T., Olivetti, E., & Rupp, J. L. M. 2024. **MTEncoder: A Transformer-Based Framework for Materials Representation Learning.** *Materials Today*. Available at https://openreview.net/pdf?id=wug7i3O7y1

[2] Schmidt, J., Wang, H.-C., Cerqueira, T. F. T., Botti, S., Romero, A. H., & Marques, M. A. L. 2024. **Improving Machine-Learning Models in Materials Science through Large Datasets.** *Journal Name (To be determined)*. Available at https://www.sciencedirect.com/science/article/pii/S2542529324002360

# MTEncoder (SyntMTE)

## Overview

MTEncoder is a transformer-based model that encodes a material's elemental composition into a dense vector representation. Each material is tokenized into:

- Individual element tokens (e.g., Na, Fe, O)
- A special compound token, `[CPD]`, that aggregates information across all elements
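
The tokenization step can be sketched in plain Python. This is a minimal illustration, not MTEncoder's actual tokenizer: the helper `tokenize_composition`, the simple formula parser (no parentheses or hydrates), placing `[CPD]` first, and representing stoichiometry as normalized fractions are all assumptions made for this example.

```python
import re

CPD_TOKEN = "[CPD]"  # compound token named in this README

# Hypothetical parser for simple formulas such as "NaFeO2" or "Fe0.5Ni0.5";
# parentheses, hydrates, and charges are not supported in this sketch.
_ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
_FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")


def tokenize_composition(formula: str) -> tuple[list[str], list[float]]:
    """Return ([CPD] + element tokens, matching fractional amounts).

    Assumption: [CPD] gets a placeholder amount of 1.0 and element amounts
    are normalized so that they sum to 1.
    """
    if not _FORMULA_RE.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: dict[str, float] = {}  # insertion order follows the formula
    for element, amount in _ELEMENT_RE.findall(formula):
        # A missing amount (empty string) means one atom of that element.
        counts[element] = counts.get(element, 0.0) + float(amount or 1)
    total = sum(counts.values())
    tokens = [CPD_TOKEN] + list(counts)
    fractions = [1.0] + [n / total for n in counts.values()]
    return tokens, fractions


tokens, fractions = tokenize_composition("NaFeO2")
# tokens    == ["[CPD]", "Na", "Fe", "O"]
# fractions == [1.0, 0.25, 0.25, 0.5]
```

Keeping `[CPD]` at a fixed position makes it easy to pick out its output embedding later, in the same spirit as a BERT-style `[CLS]` token.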

These tokens are fed into a transformer encoder, which produces contextualized embeddings. The output embedding of the `[CPD]` token serves as the learned representation of the whole material and is passed through an MLP head to predict material properties [1].
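
The forward pass described above can be illustrated with a toy, dependency-free sketch: one self-attention layer with a residual connection, followed by an MLP head applied to the `[CPD]` output. Everything here is hypothetical (`TinyCompositionEncoder`, the random untrained weights, the widths, a single head and layer, no layer normalization or bias terms, and ignoring stoichiometric amounts); see [1] for the actual architecture.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    """Gaussian-initialized weight matrix (hypothetical initialization)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class TinyCompositionEncoder:
    """Hypothetical toy model: one attention layer + an MLP head on the [CPD] token."""

    def __init__(self, vocab, d_model=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.embed = {tok: [rng.gauss(0.0, 1.0) for _ in range(d_model)] for tok in vocab}
        self.w_q = rand_matrix(d_model, d_model, rng)
        self.w_k = rand_matrix(d_model, d_model, rng)
        self.w_v = rand_matrix(d_model, d_model, rng)
        self.w_hidden = rand_matrix(hidden, d_model, rng)
        self.w_out = rand_matrix(1, hidden, rng)

    def encode(self, tokens):
        """Return one contextualized embedding per input token (single-head attention)."""
        x = [self.embed[t] for t in tokens]
        q = [matvec(self.w_q, xi) for xi in x]
        k = [matvec(self.w_k, xi) for xi in x]
        v = [matvec(self.w_v, xi) for xi in x]
        out = []
        for qi, xi in zip(q, x):
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.d_model) for kj in k]
            weights = softmax(scores)
            attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.d_model)]
            out.append([a + b for a, b in zip(xi, attended)])  # residual connection
        return out

    def predict(self, tokens):
        """Take the [CPD] embedding (position 0) and map it to one scalar property."""
        if not tokens or tokens[0] != "[CPD]":
            raise ValueError("expected [CPD] as the first token")
        cpd = self.encode(tokens)[0]
        hidden = [max(0.0, h) for h in matvec(self.w_hidden, cpd)]  # ReLU
        return matvec(self.w_out, hidden)[0]


model = TinyCompositionEncoder(["[CPD]", "Na", "Fe", "O"])
prediction = model.predict(["[CPD]", "Na", "Fe", "O"])  # untrained, so an arbitrary number
```

Because this sketch uses no positional encodings, the `[CPD]` output does not depend on the order in which element tokens are listed, which fits compositions naturally; whether MTEncoder handles token order the same way is not stated in this README.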

## Pretraining Tasks

MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:

| Pretraining Objective                           |
|-------------------------------------------------|
| Stress                                          |
| Band Gap (Direct)                               |
| Band Gap (Indirect)                             |
| Density of States at Fermi Level                |
| Energy Above Hull                               |
| Formation Energy                                |
| Corrected Total Energy                          |
| Phase Separation Energy                         |
| Number of Atomic Sites                          |
| Total Magnetic Moment                           |
| Crystal Space Group                             |
| Masked Element Reconstruction (Self-Supervised) |

*Table: Pretraining objectives for MTEncoder (drawn from the Alexandria materials dataset).*
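
The objectives in the table can be combined into a single training loss. The sketch below is an assumption-laden illustration, not the published training code: it masks element tokens for the self-supervised objective (the `[MASK]` token name, the 15% mask rate, and always masking at least one element are hypothetical), uses squared error for scalar targets and cross-entropy for categorical ones such as space group or the masked element, and skips any task whose label is missing for a given material. Task names such as `formation_energy` and the equal default weights are also hypothetical.

```python
import math
import random

MASK_TOKEN = "[MASK]"  # hypothetical name for the mask token


def mask_elements(tokens, mask_prob=0.15, rng=None):
    """Mask element tokens (never [CPD] at index 0) for masked element reconstruction.

    Returns the masked sequence and a {position: original element} target map.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    targets = {}
    for i in range(1, len(tokens)):
        if rng.random() < mask_prob:
            masked[i] = MASK_TOKEN
            targets[i] = tokens[i]
    if not targets and len(tokens) > 1:
        # Assumption: always mask at least one element so every sample has a target.
        i = rng.randrange(1, len(tokens))
        masked[i] = MASK_TOKEN
        targets[i] = tokens[i]
    return masked, targets


def cross_entropy(logits, target_index):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target_index]


def multitask_loss(outputs, labels, task_types, weights=None):
    """Sum weighted per-task losses, skipping tasks with no label for this material."""
    weights = weights or {}
    total = 0.0
    for task, output in outputs.items():
        target = labels.get(task)
        if target is None:
            continue
        if task_types[task] == "regression":
            loss = (output - target) ** 2  # output and target are floats
        else:
            loss = cross_entropy(output, target)  # output: logits, target: class index
        total += weights.get(task, 1.0) * loss
    return total


masked, targets = mask_elements(["[CPD]", "Na", "Fe", "O"], rng=random.Random(0))
loss = multitask_loss(
    outputs={"formation_energy": -1.2, "space_group": [0.1, 2.0, -0.5]},
    labels={"formation_energy": -1.0, "space_group": 1},
    task_types={"formation_energy": "regression", "space_group": "classification"},
)
```

In practice the per-task losses often live on very different scales (for example, total energy versus band gap), so a real setup may need normalized targets or tuned `weights`; the README does not say how MTEncoder balances them.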