---
license: bsd-3-clause
language:
- en
tags:
- pytorch
- materials-science
- crystallography
- x-ray-diffraction
- pxrd
- convnext
- arxiv:2603.23367
datasets:
- materials-project
metrics:
- accuracy
- mae
pipeline_tag: other
---

# AlphaDiffract – Open Weights

[arXiv](https://arxiv.org/abs/2603.23367) | [GitHub](https://github.com/AdvancedPhotonSource/AlphaDiffract)

**Automated crystallographic analysis of powder X-ray diffraction data.**

AlphaDiffract is a multi-task 1D ConvNeXt model that takes a powder X-ray diffraction (PXRD) pattern and simultaneously predicts:

| Output | Description |
|---|---|
| **Crystal system** | 7-class classification (Triclinic → Cubic) |
| **Space group** | 230-class classification |
| **Lattice parameters** | 6 values: a, b, c (Å), α, β, γ (°) |

This release contains a **single model** trained exclusively on
[Materials Project](https://next-gen.materialsproject.org/) structures
(publicly available data). It is *not* the 10-model ensemble reported in
the paper – see [Performance](#performance) for details.

## Quick Start

```bash
pip install torch safetensors numpy huggingface-hub
```

```python
import sys
import torch
import numpy as np
from huggingface_hub import snapshot_download

# Download model files
model_dir = snapshot_download("linked-liszt/OpenAlphaDiffract")

# Load model
sys.path.insert(0, model_dir)
from model import AlphaDiffract
model = AlphaDiffract.from_pretrained(model_dir, device="cpu")

# 8192-point intensity pattern: floor negatives at zero, rescale to [0, 100]
pattern = np.load("my_pattern.npy").astype(np.float32)
pattern = np.clip(pattern, 0.0, None)
pattern = (pattern - pattern.min()) / (pattern.max() - pattern.min() + 1e-10) * 100.0
x = torch.from_numpy(pattern).unsqueeze(0)

with torch.no_grad():
    out = model(x)

cs_probs = torch.softmax(out["cs_logits"], dim=-1)
sg_probs = torch.softmax(out["sg_logits"], dim=-1)
lp = out["lp"]  # [a, b, c, alpha, beta, gamma]

print("Crystal system:", AlphaDiffract.CRYSTAL_SYSTEMS[cs_probs.argmax().item()])
print("Space group:   #", sg_probs.argmax().item() + 1)
print("Lattice params:", lp[0].tolist())
```

See `example_inference.py` for a complete runnable example.

## Files

| File | Description |
|---|---|
| `model.safetensors` | Model weights (safetensors format, ~35 MB) |
| `model.py` | Standalone model definition (pure PyTorch, no Lightning) |
| `config.json` | Architecture and training hyperparameters |
| `maxsub.json` | Space-group subgroup graph (230×230, used as a registered buffer) |
| `example_inference.py` | End-to-end inference example |
| `LICENSE` | BSD 3-Clause |


## Input Format

- **Length:** 8192 equally-spaced intensity values
- **2θ range:** 5–20° (monochromatic, 20 keV)
- **Preprocessing:** floor negatives at zero, then rescale to [0, 100]
- **Shape:** `(batch, 8192)` or `(batch, 1, 8192)`
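
The preprocessing steps above can be sketched as a small helper. `preprocess_pattern` is a hypothetical name, not part of the released code; the linear resampling step is an assumption for patterns collected on a different grid.

```python
import numpy as np

def preprocess_pattern(intensities, n_points: int = 8192) -> np.ndarray:
    """Resample to 8192 points, floor negatives at zero, rescale to [0, 100].

    Hypothetical helper illustrating the documented input format.
    """
    y = np.asarray(intensities, dtype=np.float32)
    # Resample onto equally spaced points if the input grid differs
    if y.size != n_points:
        x_old = np.linspace(0.0, 1.0, y.size)
        x_new = np.linspace(0.0, 1.0, n_points)
        y = np.interp(x_new, x_old, y).astype(np.float32)
    # Floor negative intensities (background subtraction can undershoot)
    y = np.clip(y, 0.0, None)
    # Rescale to [0, 100]; the epsilon guards all-zero patterns
    y = (y - y.min()) / (y.max() - y.min() + 1e-10) * 100.0
    return y
```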

## Architecture

1D ConvNeXt backbone adapted from [Liu et al. (2022)](https://arxiv.org/abs/2201.03545):

```
Input (8192) → [ConvNeXt Block × 3 with AvgPool] → Flatten (560-d)
  ├─ CS head:  MLP 560→2300→1150→7    (crystal system)
  ├─ SG head:  MLP 560→2300→1150→230  (space group)
  └─ LP head:  MLP 560→512→256→6      (lattice parameters, sigmoid-bounded)
```

- **Parameters:** 8,734,989
- **Activation:** GELU
- **Stochastic depth:** 0.3
- **Head dropout:** 0.5
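
For readers unfamiliar with ConvNeXt, a generic 1D adaptation of the Liu et al. block looks roughly like this: a depthwise convolution, LayerNorm, then an inverted-bottleneck MLP with GELU and a residual connection. This is an illustrative sketch with assumed kernel size and expansion ratio; the exact block (including stochastic depth) lives in `model.py`.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock1d(nn.Module):
    """Illustrative 1D ConvNeXt block; see `model.py` for the released version."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        # Depthwise conv mixes along the pattern axis, one filter per channel
        self.dwconv = nn.Conv1d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        # Inverted bottleneck: expand, GELU, project back
        self.pwconv1 = nn.Linear(dim, expansion * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        residual = x
        x = self.dwconv(x)
        x = x.transpose(1, 2)  # (batch, length, channels) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.transpose(1, 2)
        return residual + x
```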

## Performance

This is a **single model** trained on Materials Project data only (no ICSD).
Metrics on the best validation checkpoint (epoch 11):

| Metric | Simulated Val | RRUFF (experimental) |
|---|---|---|
| Crystal system accuracy | 74.88% | 60.35% |
| Space group accuracy | 57.31% | 38.28% |
| Lattice parameter MAE | 2.71 | – |

The paper reports higher numbers from a 10-model ensemble trained on
Materials Project + ICSD combined data. This open-weights release covers
only publicly available training data.

## Training Details

| | |
|---|---|
| **Data** | ~146k Materials Project structures, 100 GSAS-II simulations each |
| **Augmentation** | Poisson + Gaussian noise, rescaled to [0, 100] |
| **Optimizer** | AdamW (lr=2e-4, weight_decay=0.01) |
| **Scheduler** | CyclicLR (triangular2, 6-epoch half-cycles) |
| **Loss** | CE (crystal system) + CE + GEMD (space group) + MSE (lattice params) |
| **Hardware** | 1× NVIDIA H100, float32 |
| **Batch size** | 64 |
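
The loss row above can be read as a plain sum over the three heads. Below is a minimal sketch of that combination, assuming equal weighting (the released training code may weight the terms differently). The GEMD term, which uses the `maxsub.json` subgroup graph, is not reproduced here and is left as a caller-supplied callable.

```python
import torch
import torch.nn.functional as F

def multitask_loss(out, cs_target, sg_target, lp_target, gemd_fn=None):
    """Illustrative multi-task loss; equal weights are an assumption."""
    loss = F.cross_entropy(out["cs_logits"], cs_target)          # crystal system CE
    loss = loss + F.cross_entropy(out["sg_logits"], sg_target)   # space group CE
    if gemd_fn is not None:
        # Graph earth-mover's distance over the space-group subgroup graph
        loss = loss + gemd_fn(out["sg_logits"], sg_target)
    loss = loss + F.mse_loss(out["lp"], lp_target)               # lattice params MSE
    return loss
```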

## Citation

```bibtex
@article{andrejevic2026alphadiffract,
  title   = {AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
  author  = {Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
  year    = {2026},
  eprint  = {2603.23367},
  archivePrefix = {arXiv},
  primaryClass  = {cond-mat.mtrl-sci},
  doi     = {10.48550/arXiv.2603.23367},
  url     = {https://arxiv.org/abs/2603.23367}
}
```

## License

BSD 3-Clause – Copyright 2026 UChicago Argonne, LLC.

## Links

- [arXiv: 2603.23367](https://arxiv.org/abs/2603.23367)
- [GitHub: OpenAlphaDiffract](https://github.com/AdvancedPhotonSource/OpenAlphaDiffract)
- [GitHub: AlphaDiffract](https://github.com/AdvancedPhotonSource/AlphaDiffract)