thor1 committed on
Commit 350b5c6 · verified · 1 Parent(s): afe1414

Update README.md

Files changed (1)
  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: mit
- ---
+ ---
+ {}
+ ---
+
+ ## References to read
+
+ [1] Prein, T., Pan, E., Doerr, T., Olivetti, E., & Rupp, J. L. M. 2024. **MTEncoder: A Transformer-Based Framework for Materials Representation Learning.** *Materials Today*. Available at https://openreview.net/pdf?id=wug7i3O7y1
+ <br>
+ [2] Schmidt, J., Wang, H.-C., Cerqueira, T. F. T., Botti, S., Romero, A. H., & Marques, M. A. L. 2024. **Improving Machine-Learning Models in Materials Science through Large Datasets.** Journal Name (To be determined). Available at https://www.sciencedirect.com/science/article/pii/S2542529324002360
+
+ # MTEncoder (SyntMTE)
+
+ ## Overview
+
+ MTEncoder is a transformer-based model that encodes a material's elemental composition into a dense vector representation. Each material is tokenized into:
+
+ - Individual element tokens (e.g., Na, Fe, O)
+ - A special compound token (`[CPD]`) that aggregates elemental information
+
+ These tokens are fed into a transformer encoder, which produces contextual embeddings. The embedding of the `[CPD]` token serves as the learned representation of the material and is passed through an MLP head to predict material properties [1].
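The tokenization described above can be illustrated with a minimal sketch. It is not taken from the MTEncoder code base: the helper name `tokenize_composition`, the choice to put `[CPD]` first, and the normalized element fractions returned alongside the tokens are all assumptions made for illustration. It only handles flat formulas such as `NaFeO2` (no parentheses or hydrates).

```python
import re

CPD_TOKEN = "[CPD]"  # special compound token named in the README

# An element symbol followed by an optional integer or decimal amount.
_ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def tokenize_composition(formula):
    """Hypothetical helper: split a flat formula into [CPD] + element tokens.

    Returns (tokens, fractions), where fractions are the normalized
    amounts of each element in order of first appearance. Whether the
    real model consumes fractions this way is an assumption.
    """
    amounts = {}
    for symbol, amount in _ELEMENT_RE.findall(formula):
        # A missing amount means the element appears once.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    if not amounts:
        raise ValueError(f"no element symbols found in {formula!r}")
    total = sum(amounts.values())
    tokens = [CPD_TOKEN] + list(amounts)  # assumed ordering: [CPD] first
    fractions = [amounts[symbol] / total for symbol in amounts]
    return tokens, fractions


tokens, fractions = tokenize_composition("NaFeO2")
print(tokens)
print(fractions)
```

In a full model, each token would be mapped to an embedding vector before entering the transformer encoder, and the output at the `[CPD]` position would be read off as the material representation.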
+
+ ## Pretraining Tasks
+
+ MTEncoder is pretrained on the Alexandria dataset [2] across 12 tasks:
+
+ | Pretraining Objective                           |
+ |-------------------------------------------------|
+ | Stress                                          |
+ | Band Gap (Direct)                               |
+ | Band Gap (Indirect)                             |
+ | Density of States at Fermi Level                |
+ | Energy Above Hull                               |
+ | Formation Energy                                |
+ | Corrected Total Energy                          |
+ | Phase Separation Energy                         |
+ | Number of Atomic Sites                          |
+ | Total Magnetic Moment                           |
+ | Crystal Space Group                             |
+ | Masked Element Reconstruction (Self-Supervised) |
+
+ *Table: Pretraining objectives for MTEncoder (drawn from the Alexandria materials dataset).*