Daily Papers

by AK and the research community

May 13

MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures

Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computations, for efficient atomistic simulations at first-principles level and accurate prediction of broad material properties across the periodic table, spanning temperatures from 0 to 5000 K and pressures up to 1000 GPa. Out of the box, the model serves as a machine learning force field, and shows remarkable capabilities not only in predicting ground-state material structures and energetics, but also in simulating their behavior under realistic temperatures and pressures, signifying an up to ten-fold enhancement in precision compared to the prior best-in-class. This enables MatterSim to compute materials' lattice dynamics, mechanical and thermodynamic properties, and beyond, to an accuracy comparable with first-principles methods. Specifically, MatterSim predicts Gibbs free energies for a wide range of inorganic solids with near-first-principles accuracy and achieves a 15 meV/atom resolution for temperatures up to 1000 K compared with experiments. This opens an opportunity to predict experimental phase diagrams of materials at minimal computational cost. Moreover, MatterSim also serves as a platform for continuous learning and customization by integrating domain-specific data. The model can be fine-tuned for atomistic simulations at a desired level of theory or for direct structure-to-property predictions, achieving high data efficiency with a reduction in data requirements by up to 97%.

  • 22 authors
·
May 9, 2024

Crystal Diffusion Variational Autoencoder for Periodic Material Generation

Generating the periodic structure of stable materials is a long-standing challenge for the material design community. This task is difficult because stable materials only exist in a low-dimensional subspace of all possible periodic arrangements of atoms: 1) the coordinates must lie in the local energy minimum defined by quantum mechanics, and 2) global stability also requires the structure to follow the complex, yet specific bonding preferences between different atom types. Existing methods fail to incorporate these factors and often lack proper invariances. We propose a Crystal Diffusion Variational Autoencoder (CDVAE) that captures the physical inductive bias of material stability. By learning from the data distribution of stable materials, the decoder generates materials in a diffusion process that moves atomic coordinates towards a lower energy state and updates atom types to satisfy bonding preferences between neighbors. Our model also explicitly encodes interactions across periodic boundaries and respects permutation, translation, rotation, and periodic invariances. We significantly outperform past methods in three tasks: 1) reconstructing the input structure, 2) generating valid, diverse, and realistic materials, and 3) generating materials that optimize a specific property. We also provide several standard datasets and evaluation metrics for the broader machine learning community.

  • 5 authors
·
Oct 12, 2021

Accelerating the Search for Superconductors Using Machine Learning

Prediction of the critical temperature (T_c) of a superconductor remains a significant challenge in condensed matter physics. While the BCS theory explains superconductivity in conventional superconductors, there is no framework to predict T_c of unconventional, higher-T_c superconductors. Quantum Structure Diagrams (QSD) were successful in establishing structure-property relationships for superconductors, quasicrystals, and ferroelectric materials starting from chemical composition. Building on the QSD ideas, we demonstrate that the principal component analysis of superconductivity data uncovers the clustering of various classes of superconductors. We use machine learning analysis and cleaned databases of superconductors to develop predictive models of T_c of a superconductor using its chemical composition. Earlier studies relied on datasets with inconsistencies, leading to suboptimal predictions. To address this, we introduce a data-cleaning workflow to enhance the statistical quality of superconducting databases by eliminating redundancies and resolving inconsistencies. With this improved database, we apply a supervised machine learning framework and develop a Random Forest model to predict superconductivity and T_c as a function of descriptors motivated by Quantum Structure Diagrams. We demonstrate that this model generalizes effectively, giving reasonably accurate predictions of T_c for compounds outside the database. We further employ our model to systematically screen materials across materials databases as well as various chemically plausible combinations of elements and predict Tl_{5}Ba_{6}Ca_{6}Cu_{9}O_{29} to exhibit superconductivity with a T_c ~ 105 K. Being based on the descriptors used in QSDs, our model bypasses structural information and predicts T_c merely from the chemical composition.

  • 2 authors
·
May 17, 2025
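
The superconductor abstract above predicts T_c from chemical composition alone. As a rough illustration of the first step such a pipeline needs, the sketch below turns a flat formula string into atomic fractions and a composition-weighted average of a per-element property. The function names and the idea of a caller-supplied property table are assumptions made for this example; they are not the descriptors or code used by the authors.

```python
import re
from collections import Counter

# An element symbol followed by an optional (possibly fractional) amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula):
    """Parse a flat formula such as 'Tl5Ba6Ca6Cu9O29' (no parentheses)."""
    counts = Counter()
    for symbol, amount in _TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    return dict(counts)


def atomic_fractions(formula):
    """Return the fraction of atoms contributed by each element."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def composition_average(formula, element_property):
    """Composition-weighted mean of a per-element property.

    `element_property` is a caller-supplied dict (hypothetical here);
    any real descriptor table must come from a trusted source.
    """
    fractions = atomic_fractions(formula)
    return sum(frac * element_property[el] for el, frac in fractions.items())


if __name__ == "__main__":
    for element, frac in atomic_fractions("Tl5Ba6Ca6Cu9O29").items():
        print(f"{element}: {frac:.3f}")
```

Feature vectors built this way could then be passed to any regressor, such as the Random Forest the abstract mentions; that step is omitted here to keep the sketch dependency-free.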

MatterGen: a generative model for inorganic materials design

The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. Despite recent progress, current generative models have low success rate in proposing stable crystals, or can only satisfy a very limited set of property constraints. Here, we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. To enable this, we introduce a new diffusion-based generative process that produces crystalline structures by gradually refining atom types, coordinates, and the periodic lattice. We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset. Compared to prior generative models, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, novel materials with desired chemistry, symmetry, as well as mechanical, electronic and magnetic properties. Finally, we demonstrate multi-property materials design capabilities by proposing structures that have both high magnetic density and a chemical composition with low supply-chain risk. We believe that the quality of generated materials and the breadth of MatterGen's capabilities represent a major advancement towards creating a universal generative model for materials design.

  • 21 authors
·
Dec 6, 2023

CLOUD: A Scalable and Physics-Informed Foundation Model for Crystal Representation Learning

The prediction of crystal properties is essential for understanding structure-property relationships and accelerating the discovery of functional materials. However, conventional approaches relying on experimental measurements or density functional theory (DFT) calculations are often resource-intensive, limiting their scalability. Machine learning (ML) models offer a promising alternative by learning complex structure-property relationships from data, enabling faster predictions. Yet, existing ML models often rely on labeled data, adopt representations that poorly capture essential structural characteristics, and lack integration with physical principles--factors that limit their generalizability and interpretability. Here, we introduce CLOUD (Crystal Language mOdel for Unified and Differentiable materials modeling), a transformer-based framework trained on a novel Symmetry-Consistent Ordered Parameter Encoding (SCOPE) that encodes crystal symmetry, Wyckoff positions, and composition in a compact, coordinate-free string representation. Pre-trained on over six million crystal structures, CLOUD is fine-tuned on multiple downstream tasks and achieves competitive performance in predicting a wide range of material properties, demonstrating strong scaling performance. Furthermore, as proof of concept of differentiable materials modeling, CLOUD is applied to predict the phonon internal energy and heat capacity, which integrates the Debye model to preserve thermodynamic consistency. The CLOUD-DEBYE framework enforces thermodynamic consistency and enables temperature-dependent property prediction without requiring additional data. These results demonstrate the potential of CLOUD as a scalable and physics-informed foundation model for crystalline materials, unifying symmetry-consistent representations with physically grounded learning for property prediction and materials discovery.

  • 3 authors
·
Jun 18, 2025
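
The CLOUD abstract above mentions integrating the Debye model to obtain the phonon internal energy and heat capacity. The sketch below evaluates the standard textbook Debye expressions numerically, assuming the Debye temperature is already known (for instance, predicted by some model). It is only an illustration of the physics; it is not the CLOUD-DEBYE implementation, and all function names are chosen for this example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def _simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number of intervals n."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def _cv_integrand(x):
    # x^4 e^x / (e^x - 1)^2, which tends to 0 as x -> 0.
    if x == 0.0:
        return 0.0
    return x**4 * math.exp(x) / math.expm1(x) ** 2


def _u_integrand(x):
    # x^3 / (e^x - 1), which tends to 0 as x -> 0.
    if x == 0.0:
        return 0.0
    return x**3 / math.expm1(x)


def debye_heat_capacity(T, theta_D, n_atoms=1):
    """Debye C_V in J/K for n_atoms atoms at temperature T (kelvin)."""
    x_max = theta_D / T
    prefactor = 9 * n_atoms * K_B * (T / theta_D) ** 3
    return prefactor * _simpson(_cv_integrand, 0.0, x_max)


def debye_internal_energy(T, theta_D, n_atoms=1):
    """Debye thermal energy in J (zero-point energy excluded)."""
    x_max = theta_D / T
    prefactor = 9 * n_atoms * K_B * T * (T / theta_D) ** 3
    return prefactor * _simpson(_u_integrand, 0.0, x_max)


if __name__ == "__main__":
    theta = 400.0  # illustrative Debye temperature in K, not from the paper
    for T in (50.0, 300.0, 2000.0):
        ratio = debye_heat_capacity(T, theta) / (3 * K_B)
        print(f"T = {T:6.1f} K  C_V / (3 k_B) = {ratio:.4f}")
```

Because both quantities come from the same closed-form model, C_V equals dU/dT by construction, which is one sense in which a Debye-based output layer can keep predictions thermodynamically consistent. For very low temperatures (theta_D / T above roughly 700) `math.exp` would overflow, so a production version would need an asymptotic branch.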

Machine learning for materials discovery: two-dimensional topological insulators

One of the main goals and challenges of materials discovery is to find the best candidates for each property or application of interest. Machine learning rises in this context to efficiently optimize this search, exploring the immense materials space, consisting of simultaneously the atomic, compositional, and structural spaces. Topological insulators, presenting symmetry-protected metallic edge states, are a promising class of materials for different applications. However, further development is limited by the scarcity of viable candidates. Here we present and discuss machine learning-accelerated strategies for searching the materials space for two-dimensional topological materials. We show the importance of detailed investigations of each machine learning component, leading to different results. Using recently created databases containing thousands of ab initio calculations of 2D materials, we train machine learning models capable of determining the electronic topology of materials, with an accuracy of over 90%. We can then generate and screen thousands of novel materials, efficiently predicting their topological character without the need for a priori structural knowledge. We discover 56 non-trivial materials, of which 17 are novel insulating candidates for further investigation, for which we corroborate their topological properties with density functional theory calculations. This strategy is 10 times more efficient than the trial-and-error approach while being a few orders of magnitude faster, and is a proof of concept for guiding improved materials discovery search strategies.

  • 3 authors
·
Jul 14, 2021

2D Theoretically Twistable Material Database

The study of twisted two-dimensional (2D) materials, where twisting layers create moiré superlattices, has opened new opportunities for investigating topological phases and strongly correlated physics. While systems such as twisted bilayer graphene (TBG) and twisted transition metal dichalcogenides (TMDs) have been extensively studied, the broader potential of a seemingly infinite set of other twistable 2D materials remains largely unexplored. In this paper, we define "theoretically twistable materials" as single- or multi-layer structures that allow for the construction of simple continuum models of their moiré structures. This excludes, for example, materials with a "spaghetti" of bands or those with numerous crossing points at the Fermi level, for which theoretical moiré modeling is unfeasible. We present a high-throughput algorithm that systematically searches for theoretically twistable semimetals and insulators based on the Topological 2D Materials Database. By analyzing key electronic properties, we identify thousands of new candidate materials that could host rich topological and strongly correlated phenomena when twisted. We propose representative twistable materials for realizing different types of moiré systems, including materials with different Bravais lattices, valleys, and strength of spin-orbital coupling. We provide examples of crystal growth for several of these materials and showcase twisted bilayer band structures along with simplified twisted continuum models. Our results significantly broaden the scope of moiré heterostructures and provide a valuable resource for future experimental and theoretical studies on novel moiré systems.

  • 25 authors
·
Nov 14, 2024

MatterGPT: A Generative Transformer for Multi-Property Inverse Design of Solid-State Materials

Inverse design of solid-state materials with desired properties represents a formidable challenge in materials science. Although recent generative models have demonstrated potential, their adoption has been hindered by limitations such as inefficiency, architectural constraints and restricted open-source availability. The representation of crystal structures using the SLICES (Simplified Line-Input Crystal-Encoding System) notation as a string of characters enables the use of state-of-the-art natural language processing models, such as Transformers, for crystal design. Drawing inspiration from the success of GPT models in generating coherent text, we trained a generative Transformer on the next-token prediction task to generate solid-state materials with targeted properties. We demonstrate MatterGPT's capability to generate de novo crystal structures with targeted single properties, including both lattice-insensitive (formation energy) and lattice-sensitive (band gap) properties. Furthermore, we extend MatterGPT to simultaneously target multiple properties, addressing the complex challenge of multi-objective inverse design of crystals. Our approach showcases high validity, uniqueness, and novelty in generated structures, as well as the ability to generate materials with properties beyond the training data distribution. This work represents a significant step forward in computational materials discovery, offering a powerful and open tool for designing materials with tailored properties for various applications in energy, electronics, and beyond.

  • 8 authors
·
Aug 14, 2024

Ground State Preparation via Dynamical Cooling

Quantum algorithms for probing ground-state properties of quantum systems require good initial states. Projection-based methods such as eigenvalue filtering rely on inputs that have a significant overlap with the low-energy subspace, which can be challenging for large, strongly-correlated systems. This issue has motivated the study of physically-inspired dynamical approaches such as thermodynamic cooling. In this work, we introduce a ground-state preparation algorithm based on the simulation of quantum dynamics. Our main insight is to transform the Hamiltonian by a shifted sign function via quantum signal processing, effectively mapping eigenvalues into positive and negative subspaces separated by a large gap. This automatically ensures that all states within each subspace conserve energy with respect to the transformed Hamiltonian. Subsequent time-evolution with a perturbed Hamiltonian induces transitions to lower-energy states while preventing unwanted jumps to higher energy states. The approach does not rely on a priori knowledge of energy gaps and requires no additional qubits to model a bath. Furthermore, it makes O(d^{3/2}/ε) queries to the time-evolution operator of the system and O(d^{3/2}) queries to a block-encoding of the perturbation, for d cooling steps and an ε-accurate energy resolution. Our results provide a framework for combining quantum signal processing and Hamiltonian simulation to design heuristic quantum algorithms for ground-state preparation.

  • 4 authors
·
Apr 8, 2024

A Graph Neural Network for the Era of Large Atomistic Models

Foundation models, or large atomistic models (LAMs), aim to universally represent the ground-state potential energy surface (PES) of atomistic systems as defined by density functional theory (DFT). The scaling law is pivotal in the development of large models, suggesting that their generalizability in downstream tasks consistently improves with increased model size, expanded training datasets, and larger computational budgets. In this study, we present DPA3, a multi-layer graph neural network founded on line graph series (LiGS), designed explicitly for the era of LAMs. We demonstrate that the generalization error of the DPA3 model adheres to the scaling law. The scalability in the number of model parameters is attained by stacking additional layers within DPA3. Additionally, the model employs a dataset encoding mechanism that decouples the scaling of training data size from the model size within its multi-task training framework. When trained as problem-oriented potential energy models, the DPA3 model exhibits superior accuracy in the majority of benchmark cases, encompassing systems with diverse features, including molecules, bulk materials, surface and cluster catalysts, two-dimensional materials, and battery materials. When trained as a LAM on the OpenLAM-v1 dataset, the DPA-3.1-3M model exhibits state-of-the-art performance in the LAMBench benchmark suite for LAMs, demonstrating lowest overall zero-shot generalization error across 17 downstream tasks from a broad spectrum of research domains. This performance suggests superior accuracy as an out-of-the-box potential model, requiring minimal fine-tuning data for downstream scientific applications.

  • 14 authors
·
Jun 2, 2025

L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as metal-organic frameworks (MOFs), which are critical for a range of impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. L2M3OF employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.

  • 7 authors
·
Oct 23, 2025

Generative AI for Discovering Porous Oxide Materials for Next-Generation Energy Storage

The key challenge in advancing multivalent-ion batteries lies in finding suitable intercalation hosts. Open-tunnel oxides, featuring one-dimensional channels or nanopores, show promise for enabling effective ion transport. However, the vast range of compositional possibilities renders traditional experimental and quantum-based methods impractical for large-scale studies. This work presents a generative AI framework that uses the Crystal Diffusion Variational Autoencoder (CDVAE) and a fine-tuned Large Language Model (LLM) to expedite the discovery of stable open-tunneled oxide materials for multivalent-ion batteries. By combining machine learning with data mining techniques, five promising transition metal oxide (TMO) structures are generated. These structures, known for forming open-tunnel oxide frameworks, are structurally validated through Density Functional Theory (DFT). The results show that the generated structures have lower formation energies compared to similar compositions in the Materials Project (MP) database, indicating improved thermodynamic stability. Additionally, the graph-based M3GNet model is employed to relax further generated structures, providing a more computationally efficient alternative to DFT. Machine learning-based predictions of formation energy, band gap, and energy above the hull refine the selection process, leading to the identification of materials with significant potential for real-world battery applications. This research demonstrates the power of generative AI in rapidly exploring the vast chemical space of TMOs, offering a new approach to discovering stable open-tunnel oxides for multivalent-ion batteries. The results highlight the potential of this approach to contribute to more sustainable energy storage technologies, addressing the growing concerns surrounding the scarcity of lithium.

  • 4 authors
·
Oct 8, 2024

Scalable Diffusion for Materials Generation

Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in crystals), but generating structures can be difficult to scale to large and complex systems. Another challenge in generating materials is the mismatch between standard generative modeling metrics and downstream applications. For instance, common metrics such as the reconstruction error do not correlate well with the downstream goal of discovering stable materials. In this work, we tackle the scalability challenge by developing a unified crystal representation that can represent any crystal structure (UniMat), followed by training a diffusion probabilistic model on these UniMat representations. Our empirical results suggest that despite the lack of explicit structure modeling, UniMat can generate high fidelity crystal structures from larger and more complex chemical systems, outperforming previous graph-based approaches under various generative modeling metrics. To better connect the generation quality of materials to downstream applications, such as discovering novel stable materials, we propose additional metrics for evaluating generative models of materials, including per-composition formation energy and stability with respect to convex hulls through decomposition energy from Density Functional Theory (DFT). Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

  • 7 authors
·
Oct 18, 2023
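
The UniMat abstract above evaluates generated materials by their stability with respect to a convex hull of formation energies. The sketch below shows that idea for the simplest case, a binary A-B system, where the hull is a lower convex envelope in the (composition, formation energy) plane. The numbers are toy values made up for illustration, the function names are assumptions, and real workflows handle multi-component systems with dedicated phase-diagram tools.

```python
def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(entries):
    """Lower convex hull of (x_B, formation_energy) reference points."""
    best = {}
    for x, energy in entries:
        # Keep only the lowest-energy phase at each composition.
        best[x] = min(energy, best.get(x, energy))
    hull = []
    for point in sorted(best.items()):
        # Pop points that would make the envelope non-convex from below.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], point) <= 0:
            hull.pop()
        hull.append(point)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return e0 + t * (e1 - e0)
    raise ValueError("composition outside the range covered by the hull")


def energy_above_hull(reference_entries, x, formation_energy):
    """Candidate energy minus hull energy; negative means below the hull."""
    return formation_energy - hull_energy(lower_hull(reference_entries), x)


if __name__ == "__main__":
    # Toy reference phases: pure A, pure B, and a stable compound AB (eV/atom).
    references = [(0.0, 0.0), (1.0, 0.0), (0.5, -1.0)]
    print(energy_above_hull(references, 0.25, -0.3))
```

A positive result means the candidate would lower its energy by decomposing into the neighboring hull phases, while a negative result means it would itself become a new hull vertex.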

Generative Hierarchical Materials Search

Generative models trained at scale can now produce text, video, and more recently, scientific data such as crystal structures. In applications of generative approaches to materials science, and in particular to crystal structures, the guidance from the domain expert in the form of high-level instructions can be essential for an automated system to output candidate crystals that are viable for downstream research. In this work, we formulate end-to-end language-to-structure generation as a multi-objective optimization problem, and propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures. GenMS consists of (1) a language model that takes high-level natural language as input and generates intermediate textual information about a crystal (e.g., chemical formulae), and (2) a diffusion model that takes intermediate information as input and generates low-level continuous value crystal structures. GenMS additionally uses a graph neural network to predict properties (e.g., formation energy) from the generated crystal structures. During inference, GenMS leverages all three components to conduct a forward tree search over the space of possible structures. Experiments show that GenMS outperforms other alternatives of directly using language models to generate structures both in satisfying user requests and in generating low-energy structures. We confirm that GenMS is able to generate common crystal structures such as double perovskites, or spinels, solely from natural language input, and hence can form the foundation for more complex structure generation in the near future.

  • 10 authors
·
Sep 10, 2024

First Order Quantum Phase Transition in the Hybrid Metal-Mott Insulator Transition Metal Dichalcogenide 4Hb-TaS2

Coupling together distinct correlated and topologically non-trivial electronic phases of matter can potentially induce novel electronic orders and phase transitions among them. Transition metal dichalcogenide compounds serve as a bedrock for exploration of such hybrid systems. They host a variety of exotic electronic phases and their van der Waals nature allows them to be admixed, either by exfoliation and stacking or by stoichiometric growth, thereby inducing novel correlated complexes. Here we investigate the compound 4Hb-TaS_2 that interleaves the Mott-insulating state of 1T-TaS_2 and the putative spin liquid it hosts together with the metallic state of 2H-TaS_2 and the low temperature superconducting phase it harbors. We reveal a thermodynamic phase diagram that hosts a first order quantum phase transition between a correlated Kondo cluster state and a flat band state in which the Kondo cluster becomes depleted. We demonstrate that this intrinsic transition can be induced by an electric field and temperature as well as by manipulation of the interlayer coupling with the probe tip, hence allowing reversible toggling between the Kondo cluster and the flat band states. The phase transition is manifested by a discontinuous change of the complete electronic spectrum accompanied by hysteresis and low frequency noise. We find that the shape of the transition line in the phase diagram is determined by the local compressibility and the entropy of the two electronic states. Our findings set such heterogeneous structures as an exciting platform for systematic investigation and manipulation of Mott-metal transitions and strongly correlated phases and quantum phase transitions therein.

  • 11 authors
·
Mar 2, 2023

Unified Micromechanics Theory of Composites

We consider the matrix composite materials (CM) of either random (statistically homogeneous or inhomogeneous), periodic, or deterministic (neither random nor periodic) structures. CMs exhibit linear or nonlinear behavior, coupled or uncoupled multi-physical phenomena, locally elastic, weakly nonlocal (strain gradient and stress gradient), or strongly nonlocal (strain-type and displacement-type, peridynamics) phase properties. A modified Computational Analytical Micromechanics (CAM) approach introduces an exact Additive General Integral Equation (AGIE) for CMs of any structure and phase properties mentioned above. The unified iteration solution of static AGIEs is adapted to the body force with compact support serving as a fundamentally new universal training parameter. The approach also establishes a critical threshold for filtering out unsuitable sub-datasets of effective parameters through a novel Representative Volume Element (RVE) concept, which extends Hill's classical framework. This RVE concept eliminates sample size, boundary layer, and edge effects, making it applicable to CMs of any structure and phase properties, regardless of local or nonlocal, linear or nonlinear. Incorporating this new RVE concept into machine learning and neural network techniques enables the construction of any unpredefined surrogate nonlocal operators. The methodology is structured as a modular, block-based framework, allowing independent development and refinement of software components. This flexible, robust AGIE-CAM framework integrates data-driven, multi-scale, and multi-physics modeling, accelerating research in CM of any microtopology and phase properties considered. The AGIE-CAM framework represents a groundbreaking paradigm shift in the micromechanics of composites, redefining the very philosophy that underpins our understanding of their behavior at the microscopic level.

  • 1 author
·
Mar 15, 2025

Automated Extraction of Material Properties using LLM-based AI Agents

The rapid discovery of materials is constrained by the lack of large, machine-readable datasets that couple performance metrics with structural context. Existing databases are either small, manually curated, or biased toward first principles results, leaving experimental literature underexploited. We present an agentic, large language model (LLM)-driven workflow that autonomously extracts thermoelectric and structural properties from about 10,000 full-text scientific articles. The pipeline integrates dynamic token allocation, zero-shot multi-agent extraction, and conditional table parsing to balance accuracy against computational cost. Benchmarking on 50 curated papers shows that GPT-4.1 achieves the highest accuracy (F1 = 0.91 for thermoelectric properties and 0.82 for structural fields), while GPT-4.1 Mini delivers nearly comparable performance (F1 = 0.89 and 0.81) at a fraction of the cost, enabling practical large-scale deployment. Applying this workflow, we curated 27,822 temperature-resolved property records with normalized units, spanning figure of merit (ZT), Seebeck coefficient, conductivity, resistivity, power factor, and thermal conductivity, together with structural attributes such as crystal class, space group, and doping strategy. Dataset analysis reproduces known thermoelectric trends, such as the superior performance of alloys over oxides and the advantage of p-type doping, while also surfacing broader structure-property correlations. To facilitate community access, we release an interactive web explorer with semantic filters, numeric queries, and CSV export. This study delivers the largest LLM-curated thermoelectric dataset to date, provides a reproducible and cost-profiled extraction pipeline, and establishes a foundation for scalable, data-driven materials discovery beyond thermoelectrics.

  • 2 authors
·
Sep 23, 2025
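
The extraction abstract above reports F1 scores for records pulled from papers. As a reminder of what that metric measures, the sketch below scores a set of extracted records against a hand-curated gold set using exact matching. The record layout (material, property, temperature) and the exact-match criterion are assumptions for illustration; the paper's own matching rules are not described in the abstract.

```python
def precision_recall_f1(extracted, gold):
    """Score extracted records against gold records by exact match."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical records: (material, property, temperature in K).
    gold = {("Bi2Te3", "ZT", 300), ("PbTe", "ZT", 700), ("SnSe", "ZT", 900)}
    extracted = {("Bi2Te3", "ZT", 300), ("PbTe", "ZT", 700), ("PbTe", "ZT", 300)}
    p, r, f1 = precision_recall_f1(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it penalizes a pipeline that extracts many spurious records as much as one that misses many true ones.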

Strong correlation behavior and Strong coupling superconductivity in (Ti1/4Hf1/4Nb1/4Ta1/4)1-xNix with the rich magnetic element Ni

Searching for new superconductors, especially unconventional superconductors, has been studied extensively for decades but remains one of the major outstanding challenges in condensed matter physics. Medium/high-entropy alloys (MEAs-HEAs) are new fertile soils of unconventional superconductors and generate widespread interest and questions on the existence of superconductivity in highly disordered materials. Here, we report on the effect of Ni doping on the crystal structure and superconducting properties of strongly coupled TiHfNbTa MEA. XRD results indicate that the maximum solid solution of (Ti1/4Hf1/4Nb1/4Ta1/4)1-xNix is about 7.7%. Resistivity, magnetic susceptibility, and specific heat measurements demonstrated that (Ti1/4Hf1/4Nb1/4Ta1/4)1-xNix HEAs are all bulk type-II superconductors and that Tc increases with increasing Ni doping content. The specific heat jumps of all (Ti1/4Hf1/4Nb1/4Ta1/4)1-xNix are much larger than the BCS value of 1.43, suggesting all these HEAs are strongly coupled superconductors. Additionally, large Kadowaki-Woods ratio values suggest that there is a strong electron correlation effect in this system. The (Ti1/4Hf1/4Nb1/4Ta1/4)1-xNix HEA system is a new ideal material platform for the study of strong correlation behavior and strongly coupled superconductivity, which provides an insight into the physics of high-temperature superconductors or other unconventional superconductors.

  • 11 authors
·
Jul 29, 2025

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

Reducing hallucination of Large Language Models (LLMs) is imperative for use in the sciences where reproducibility is crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of multiple data-aware reasoning-and-acting (ReAct) agents that dynamically interact with computational and experimental data on Materials Project (MP). Without fine-tuning, LLaMP demonstrates an ability to comprehend and integrate various modalities of materials science concepts, fetch relevant data stores on the fly, process higher-order data (such as crystal structures and elastic tensors), and summarize multi-step procedures for solid-state synthesis. We show that LLaMP effectively corrects errors in GPT-3.5's intrinsic knowledge, reducing a 5.21% MAPE on frequently-documented bandgaps and a significant 1103.54% MAPE on formation energies -- errors that GPT-3.5 seems to derive from mixed data sources. Additionally, LLaMP substantially reduces the hallucinated volumetric strain in a diamond cubic silicon structure from 66.3% to 0. The proposed framework offers an intuitive and nearly hallucination-free approach to exploring materials informatics and establishes a pathway for knowledge distillation and fine-tuning other language models. We envision the framework as a valuable component for scientific hypotheses and a foundation for future autonomous laboratories where multiple LLM agents communicate and cooperate with robotics to drive material synthesis and chemical reactions without hard-coded human logic and intervention.

  • 3 authors
·
Jan 30, 2024

Crystal Transformer: Self-learning neural language model for Generative and Tinkering Design of Materials

Self-supervised neural language models have recently achieved unprecedented success, from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in the generation, structure classification, and functional predictions for proteins and molecules with learned representations. However, most of the masking-based pre-trained language models are not designed for generative design, and their black-box nature makes it difficult to interpret their design logic. Here we propose BLMM Crystal Transformer, a neural network based probabilistic generative model for generative and tinkering design of inorganic materials. Our model is built on the blank filling language model for text generation and has demonstrated unique advantages in learning the "materials grammars" together with high-quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity, which are more than 4 and 8 times higher compared to a pseudo-random sampling baseline. The probabilistic generation process of BLMM allows it to recommend tinkering operations based on learned materials chemistry and makes it useful for materials doping. Combined with the TCSP crystal structure prediction algorithm, we have applied our model to discover a set of new materials as validated using DFT calculations. Our work thus brings the unsupervised transformer language models based generative artificial intelligence to inorganic materials. A user-friendly web app has been developed for computational materials doping and can be accessed freely at www.materialsatlas.org/blmtinker.

  • 7 authors
·
Apr 25, 2022

Crystal Structure Generation with Autoregressive Large Language Modeling

The generation of plausible crystal structures is often the first step in predicting the structure and properties of a material from its chemical composition. Quickly generating and predicting inorganic crystal structures is important for the discovery of new materials, which can target applications such as energy or electronic devices. However, most current methods for crystal structure prediction are computationally expensive, slowing the pace of innovation. Seeding structure prediction algorithms with quality generated candidates can overcome a major bottleneck. Here, we introduce CrystaLLM, a methodology for the versatile generation of crystal structures, based on the autoregressive large language modeling (LLM) of the Crystallographic Information File (CIF) format. Trained on millions of CIF files, CrystaLLM focuses on modeling crystal structures through text. CrystaLLM can produce plausible crystal structures for a wide range of inorganic compounds unseen in training, as demonstrated by ab initio simulations. The integration with predictors of formation energy permits the use of a Monte Carlo Tree Search algorithm to improve the generation of meaningful structures. Our approach challenges conventional representations of crystals, and demonstrates the potential of LLMs for learning effective 'world models' of crystal chemistry, which will lead to accelerated discovery and innovation in materials science.

  • 3 authors
·
Jul 10, 2023

All that structure matches does not glitter

Generative models for materials, especially inorganic crystals, hold potential to transform the theoretical prediction of novel compounds and structures. Advancement in this field depends critically on robust benchmarks and minimal, information-rich datasets that enable meaningful model evaluation. This paper critically examines common datasets and reported metrics for a crystal structure prediction task: generating the most likely structures given the chemical composition of a material. We focus on three key issues: First, materials datasets should contain unique crystal structures; for example, we show that the widely-utilized carbon-24 dataset only contains ≈40% unique structures. Second, materials datasets should not be split randomly if polymorphs of many different compositions are numerous, which we find to be the case for the perov-5 dataset. Third, benchmarks can mislead if used uncritically, e.g., reporting a match rate metric without considering the structural variety exhibited by identical building blocks. To address these oft-overlooked issues, we introduce several fixes. We provide revised versions of the carbon-24 dataset: one with duplicates removed, one deduplicated and split by number of atoms N, and two containing only identical structures but with different unit cells. We also propose a new split for the perov-5 dataset which ensures polymorphs are grouped within each split subset, setting a more sensible standard for benchmarking model performance. Finally, we present METRe and cRMSE, new model evaluation metrics that can correct existing issues with the match rate metric.

  • 10 authors
·
Sep 15, 2025

Φeat: Physically-Grounded Feature Representation

Foundation models have emerged as effective backbones for many vision tasks. However, current self-supervised features entangle high-level semantics with low-level physical factors, such as geometry and illumination, hindering their use in tasks requiring explicit physical reasoning. In this paper, we introduce Φeat, a novel physically-grounded visual backbone that encourages a representation sensitive to material identity, including reflectance cues and geometric mesostructure. Our key idea is to employ a pretraining strategy that contrasts spatial crops and physical augmentations of the same material under varying shapes and lighting conditions. While similar data have been used in high-end supervised tasks such as intrinsic decomposition or material estimation, we demonstrate that a pure self-supervised training strategy, without explicit labels, already provides a strong prior for tasks requiring robust features invariant to external physical factors. We evaluate the learned representations through feature similarity analysis and material selection, showing that Φeat captures physically-grounded structure beyond semantic grouping. These findings highlight the promise of unsupervised physical feature learning as a foundation for physics-aware perception in vision and graphics.

Adobe
·
Nov 14, 2025

Roadmap: 2D Materials for Quantum Technologies

Two-dimensional (2D) materials have emerged as a versatile and powerful platform for quantum technologies, offering atomic-scale control, strong quantum confinement, and seamless integration into heterogeneous device architectures. Their reduced dimensionality enables unique quantum phenomena, including optically addressable spin defects, tunable single-photon emitters, low-dimensional magnetism, gate-controlled superconductivity, and correlated states in Moiré superlattices. This Roadmap provides a comprehensive overview of recent progress and future directions in exploiting 2D materials for quantum sensing, computation, communication, and simulation. We survey advances spanning spin defects and quantum sensing, quantum emitters and nonlinear photonics, computational theory and data-driven discovery of quantum defects, spintronic and magnonic devices, cavity-engineered quantum materials, superconducting and hybrid quantum circuits, quantum dots, Moiré quantum simulators, and quantum communication platforms. Across these themes, we identify common challenges in defect control, coherence preservation, interfacial engineering, and scalable integration, alongside emerging opportunities driven by machine-learning-assisted design and integrated experiment-theory feedback loops. By connecting microscopic quantum states to mesoscopic excitations and macroscopic device architectures, this Roadmap outlines a materials-centric framework for integrating coherent quantum functionalities and positions 2D materials as foundational building blocks for next-generation quantum technologies.

  • 32 authors
·
Dec 16, 2025

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery

The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. Here we present ElementsClaw, an agentic framework for materials discovery that synergizes Large Atomic Models (LAMs) with Large Language Models (LLMs). In response to varied human queries, ElementsClaw orchestrates a suite of LAM tools fine-tuned from our proposed 1-billion-parameter model Elements for atomic-scale numerical computation, while leveraging LLMs for high-level semantic reasoning. This shift moves AI-driven materials science from isolated processes toward integrated and human-interactive discovery. Applied to superconductors, ElementsClaw screens 2.4 million crystals in just 28 GPU hours to identify 68,000 high-confidence candidates (the complete dataset of screened superconductors is available at https://developer.damo-academy.com/material), expanding known superconducting space by orders of magnitude compared to datasets curated over decades. Critically, ElementsClaw achieves a high success rate in identifying superconductors hidden in literature and discovers four novel experimentally verified superconductors, exemplified by Zr3ScRe8 with a transition temperature of 6.8 K and HfZrRe4 at 6.7 K. Together, our results establish a knowledge-integrated, autonomously orchestrated, and experimentally grounded paradigm for materials discovery.

  • 19 authors
·
Apr 28

JARVIS-Leaderboard: A Large Scale Benchmark of Materials Design Methods

Lack of rigorous reproducibility and validation are major hurdles for scientific development across many fields. Materials science in particular encompasses a variety of experimental and theoretical approaches that require careful benchmarking. Leaderboard efforts have been developed previously to mitigate these issues. However, a comprehensive comparison and benchmarking on an integrated platform with multiple data modalities with both perfect and defect materials data is still lacking. This work introduces JARVIS-Leaderboard, an open-source and community-driven platform that facilitates benchmarking and enhances reproducibility. The platform allows users to set up benchmarks with custom tasks and enables contributions in the form of dataset, code, and meta-data submissions. We cover the following materials design categories: Artificial Intelligence (AI), Electronic Structure (ES), Force-fields (FF), Quantum Computation (QC) and Experiments (EXP). For AI, we cover several types of input data, including atomic structures, atomistic images, spectra, and text. For ES, we consider multiple ES approaches, software packages, pseudopotentials, materials, and properties, comparing results to experiment. For FF, we compare multiple approaches for material property predictions. For QC, we benchmark Hamiltonian simulations using various quantum algorithms and circuits. Finally, for experiments, we use the inter-laboratory approach to establish benchmarks. There are 1281 contributions to 274 benchmarks using 152 methods with more than 8 million data-points, and the leaderboard is continuously expanding. The JARVIS-Leaderboard is available at the website: https://pages.nist.gov/jarvis_leaderboard

  • 38 authors
·
Jun 20, 2023

AQVolt26: High-Temperature r²SCAN Halide Dataset for Universal ML Potentials and Solid-State Batteries

The demand for safe, high-energy-density batteries has spotlighted halide solid-state electrolytes, which offer the potential for enhanced ionic mobility, electrochemical stability, and interfacial deformability. Accelerating their discovery requires extensive molecular dynamics, which has been increasingly enabled by universal machine learning interatomic potentials trained on foundational datasets. However, the dynamic softness of halides poses a stringent test of whether general-purpose models can reliably replace first-principles calculations under the highly distorted, elevated-temperature regimes necessary to probe ion transport. Here, we present AQVolt26, a dataset of 322,656 r²SCAN single-point calculations for lithium halides, generated via high-temperature configurational sampling across ~5K structures. We demonstrate that foundational datasets provide a strong baseline for stable halide chemistries and transfer local forces well; however, absolute energy predictions degrade in distorted higher-temperature regimes. Co-training with AQVolt26 resolves this blind spot. Furthermore, incorporating Materials Project relaxation data improves near-equilibrium performance but degrades extreme-strain robustness without enhancing high-temperature force accuracy. These results demonstrate that domain-specific configurational sampling is essential for the reliable dynamic screening of halide electrolytes. Furthermore, our findings suggest that while foundational models provide a robust base, they are most effective for dynamically soft solid-state chemistries when augmented with targeted, high-temperature data. Finally, we show that near-equilibrium relaxation data serves as a task-specific complement rather than a universally beneficial addition.

  • 9 authors
·
Apr 1

Rise and Fall of Anderson Localization by Lattice Vibrations: A Time-Dependent Machine Learning Approach

The intricate relationship between electrons and the crystal lattice is a linchpin in condensed matter, traditionally described by the Fröhlich model encompassing the lowest-order lattice-electron coupling. Recently developed quantum acoustics, emphasizing the wave nature of lattice vibrations, has enabled the exploration of previously uncharted territories of electron-lattice interaction not accessible with conventional tools such as perturbation theory. In this context, our agenda here is two-fold. First, we showcase the application of machine learning methods to categorize various interaction regimes within the subtle interplay of electrons and the dynamical lattice landscape. Second, we shed light on a nebulous region of electron dynamics identified by the machine learning approach and then attribute it to transient localization, where strong lattice vibrations result in a momentary Anderson prison for electronic wavepackets, which are later released by the evolution of the lattice. Overall, our research illuminates the spectrum of dynamics within the Fröhlich model, such as transient localization, which has been suggested as a pivotal factor contributing to the mysteries surrounding strange metals. Furthermore, this paves the way for utilizing time-dependent perspectives in machine learning techniques for designing materials with tailored electron-lattice properties.

  • 4 authors
·
May 27, 2024

S2SNet: A Pretrained Neural Network for Superconductivity Discovery

Superconductivity allows electrical current to flow without any energy loss, and thus making solids superconducting is a grand goal of physics, materials science, and electrical engineering. More than 16 Nobel Laureates have been recognized for their contributions to superconductivity research. Superconductors are valuable for sustainable development goals (SDGs), such as climate change mitigation, affordable and clean energy, and industry, innovation and infrastructure. However, a unified physical theory explaining all superconductivity mechanisms is still unknown. It is believed that superconductivity arises microscopically not only from molecular composition but also from the geometric crystal structure. Hence a new dataset, S2S, containing both crystal structures and superconducting critical temperatures, is built upon SuperCon and the Materials Project. Based on this new dataset, we propose a novel model, S2SNet, which utilizes the attention mechanism for superconductivity prediction. To overcome the shortage of data, S2SNet is pre-trained on the whole Materials Project dataset with Masked-Language Modeling (MLM). S2SNet sets a new state of the art, with out-of-sample accuracy of 92% and Area Under Curve (AUC) of 0.92. To the best of our knowledge, S2SNet is the first work to predict superconductivity using only crystal structure information. This work is beneficial to superconductivity discovery and further SDGs. Code and datasets are available at https://github.com/zjuKeLiu/S2SNet

  • 4 authors
·
Jun 28, 2023

Observing the Rosensweig instability of a quantum ferrofluid

Ferrofluids show unusual hydrodynamic effects due to the magnetic nature of their constituents. For increasing magnetization a classical ferrofluid undergoes a Rosensweig instability and creates self-organized ordered surface structures or droplet crystals. A Bose-Einstein condensate with strong dipolar interactions is a quantum ferrofluid that also shows superfluidity. The field of dipolar quantum gases is motivated by the search for new phases that break continuous symmetries. The simultaneous breaking of continuous symmetries like the phase invariance for the superfluid state and the translational symmetry for a crystal provides the basis of novel states of matter. However, interaction-induced crystallization in a superfluid has not been observed. Here we use in situ imaging to directly observe the spontaneous transition from an unstructured superfluid to an ordered arrangement of droplets in an atomic dysprosium Bose-Einstein condensate. By utilizing a Feshbach resonance to control the interparticle interactions, we induce a finite-wavelength instability and observe discrete droplets in a triangular structure, growing with increasing atom number. We find that these states are surprisingly long-lived and measure a hysteretic behaviour, which is typical for a crystallization process and in close analogy to the Rosensweig instability. Our system can show both superfluidity and, as shown here, spontaneous translational symmetry breaking. The presented observations do not probe superfluidity in the structured states, but if the droplets establish a common phase via weak links, this system is a very good candidate for a supersolid ground state.

  • 7 authors
·
Aug 20, 2015

Multi-property directed generative design of inorganic materials through Wyckoff-augmented transfer learning

Accelerated materials discovery is an urgent demand to drive advancements in fields such as energy conversion, storage, and catalysis. Property-directed generative design has emerged as a transformative approach for rapidly discovering new functional inorganic materials with multiple desired properties within vast and complex search spaces. However, this approach faces two primary challenges: data scarcity for functional properties and the multi-objective optimization required to balance competing tasks. Here, we present a multi-property-directed generative framework designed to overcome these limitations and enhance site symmetry-compliant crystal generation beyond P1 (translational) symmetry. By incorporating Wyckoff-position-based data augmentation and transfer learning, our framework effectively handles sparse and small functional datasets, enabling the generation of new stable materials simultaneously conditioned on targeted space group, band gap, and formation energy. Using this approach, we identified previously unknown thermodynamically and lattice-dynamically stable semiconductors in tetragonal, trigonal, and cubic systems, with bandgaps ranging from 0.13 to 2.20 eV, as validated by density functional theory (DFT) calculations. Additionally, we assessed their thermoelectric descriptors using DFT, indicating their potential suitability for thermoelectric applications. We believe our integrated framework represents a significant step forward in generative design of inorganic materials.

  • 6 authors
·
Mar 20, 2025

Discovery and recovery of crystalline materials with property-conditioned transformers

Generative models have recently shown great promise for accelerating the design and discovery of new functional materials. Conditional generation enhances this capacity by allowing inverse design, where specific desired properties can be requested during the generation process. However, conditioning of transformer-based approaches, in particular, is constrained by discrete tokenisation schemes and the risk of catastrophic forgetting during fine-tuning. This work introduces CrystaLLM-π (property injection), a conditional autoregressive framework that integrates continuous property representations directly into the transformer's attention mechanism. Two architectures, Property-Key-Value (PKV) Prefix attention and PKV Residual attention, are presented. These methods bypass inefficient sequence-level tokenisation and preserve foundational knowledge from unsupervised pre-training on Crystallographic Information Files (CIFs) as textual input. We establish the efficacy of these mechanisms through systematic robustness studies and evaluate the framework's versatility across two distinct tasks. First, for structure recovery, the model processes high-dimensional, heterogeneous X-ray diffraction patterns, achieving structural accuracy competitive with specialised models and demonstrating applications to experimental structure recovery and polymorph differentiation. Second, for materials discovery, the model is fine-tuned on a specialised photovoltaic dataset to generate novel, stable candidates validated by Density Functional Theory (DFT). It implicitly learns to target optimal band gap regions for high photovoltaic efficiency, demonstrating a capability to map complex structure-property relationships. CrystaLLM-π provides a unified, flexible, and computationally efficient framework for inverse materials design.

  • 8 authors
·
Nov 26, 2025

Materials Expert-Artificial Intelligence for Materials Discovery

The advent of material databases provides an unprecedented opportunity to uncover predictive descriptors for emergent material properties from vast data space. However, common reliance on high-throughput ab initio data necessarily inherits limitations of such data: mismatch with experiments. On the other hand, experimental decisions are often guided by an expert's intuition honed from experiences that are rarely articulated. We propose using machine learning to "bottle" such operational intuition into quantifiable descriptors using expertly curated measurement-based data. We introduce "Materials Expert-Artificial Intelligence" (ME-AI) to encapsulate and articulate this human intuition. As a first step towards such a program, we focus on the topological semimetal (TSM) among square-net materials as the property inspired by the expert-identified descriptor based on structural information: the tolerance factor. We start by curating a dataset encompassing 12 primary features of 879 square-net materials, using experimental data whenever possible. We then use Dirichlet-based Gaussian process regression using a specialized kernel to reveal composite descriptors for square-net topological semimetals. The ME-AI learned descriptors independently reproduce expert intuition and expand upon it. Specifically, new descriptors point to hypervalency as a critical chemical feature predicting TSM within square-net compounds. Our success with a carefully defined problem points to the "machine bottling human insight" approach as promising for machine learning-aided material discovery.

  • 8 authors
·
Dec 5, 2023

3D Multiphase Heterogeneous Microstructure Generation Using Conditional Latent Diffusion Models

The ability to generate 3D multiphase microstructures on-demand with targeted attributes can greatly accelerate the design of advanced materials. Here, we present a conditional latent diffusion model (LDM) framework that rapidly synthesizes high-fidelity 3D multiphase microstructures tailored to user specifications. Using this approach, we generate diverse two-phase and three-phase microstructures at high resolution (volumes of 128 × 128 × 64 voxels, representing >10^6 voxels each) within seconds, overcoming the scalability and time limitations of traditional simulation-based methods. Key design features, such as desired volume fractions and tortuosities, are incorporated as controllable inputs to guide the generative process, ensuring that the output structures meet prescribed statistical and topological targets. Moreover, the framework predicts corresponding manufacturing (processing) parameters for each generated microstructure, helping to bridge the gap between digital microstructure design and experimental fabrication. While demonstrated on organic photovoltaic (OPV) active-layer morphologies, the flexible architecture of our approach makes it readily adaptable to other material systems and microstructure datasets. By combining computational efficiency, adaptability, and experimental relevance, this framework addresses major limitations of existing methods and offers a powerful tool for accelerated materials discovery.

  • 6 authors
·
Mar 12, 2025

DiffCrysGen: A Score-Based Diffusion Model for Design of Diverse Inorganic Crystalline Materials

Crystal structure generation is a foundational challenge in materials discovery, particularly in designing functional inorganic crystalline materials with desired properties. Most existing diffusion-based generative models for crystals rely on complex, hand-crafted priors and modular architectures to separately model atom types, atomic positions, and lattice parameters. These methods often require customized diffusion processes and conditional denoising, which can introduce additional model complexities and inconsistencies. Here we introduce DiffCrysGen, a fully data-driven, score-based diffusion model that jointly learns the distribution of all structural components in crystalline materials. With crystal structure representation as unified 2D matrices, DiffCrysGen bypasses the need for task-specific priors or decoupled modules, enabling end-to-end generation of atom types, fractional coordinates, and lattice parameters within a single framework. Our model learns crystallographic symmetry and chemical validity directly from large-scale datasets, allowing it to scale to complex materials discovery tasks. As a demonstration, we applied DiffCrysGen to the design of rare-earth-free magnetic materials with high saturation magnetization, showing its effectiveness in generating stable, diverse, and property-aligned candidates for sustainable magnet applications.

  • 3 authors
·
May 12, 2025

MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction

Developing accurate, transferable and computationally inexpensive machine learning models can rapidly accelerate the discovery and development of new materials. Two of the major challenges involved in developing such models are (i) the limited availability of materials data compared to other fields and (ii) the lack of a universal descriptor of materials for predicting their various properties. The limited availability of materials data can be addressed through transfer learning, while the generic representation was recently addressed by Xie and Grossman [1], who developed a crystal graph convolutional neural network (CGCNN) that provides a unified representation of crystals. In this work, we develop a new model (MT-CGCNN) by integrating CGCNN with transfer learning based on multi-task (MT) learning. We demonstrate the effectiveness of MT-CGCNN by simultaneous prediction of various material properties such as Formation Energy, Band Gap and Fermi Energy for a wide range of inorganic crystals (46774 materials). MT-CGCNN is able to reduce the test error when employed on correlated properties by up to 8%. The model prediction has lower test error compared to CGCNN, even when the training data is reduced by 10%. We also demonstrate our model's better performance through prediction of an end-user scenario related to metal/non-metal classification. These results encourage further development of machine learning approaches which leverage multi-task learning to address the aforementioned challenges in the discovery of new materials. We make MT-CGCNN's source code available to encourage reproducible research.

  • 7 authors
·
Nov 14, 2018

An inorganic ABX3 perovskite materials dataset for target property prediction and classification using machine learning

The reliability of Machine Learning (ML) techniques in novel materials discovery often depends on the quality of the dataset, in addition to the relevant features used in describing the material. In this regard, the current study presents and validates a newly processed materials dataset that can be utilized for benchmark ML analysis, as it relates to the prediction and classification of deterministic target properties. Originally, the dataset was extracted from the Open Quantum Materials Database (OQMD) and contains a robust 16,323 samples of ABX3 inorganic perovskite structures. The dataset is tabular in form and is preprocessed to include sixty-one generalized input features that broadly describe the physicochemical, stability/geometrical, and Density Functional Theory (DFT) target properties associated with the elemental ionic sites in a three-dimensional ABX3 polyhedron. For validation, four different ML models are employed to predict three distinctive target properties, namely: formation energy, energy band gap, and crystal system. On experimentation, the best accuracy measurements are reported at 0.013 eV/atom MAE, 0.216 eV MAE, and 85% F1, corresponding to the formation energy prediction, band gap prediction and crystal system multi-classification, respectively. Moreover, the realized results are compared with previous literature and, as such, affirm the resourcefulness of the current dataset for future benchmark materials analysis via ML techniques. The preprocessed dataset and source codes are openly available to download from github.com/chenebuah/ML_abx3_dataset.

  • 2 authors
·
Dec 18, 2023

PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation

In this work, we introduce PhononBench, the first large-scale benchmark for dynamical stability in AI-generated crystals. Leveraging the recently developed MatterSim interatomic potential, which achieves DFT-level accuracy in phonon predictions across more than 10,000 materials, PhononBench enables efficient large-scale phonon calculations and dynamical-stability analysis for 108,843 crystal structures generated by six leading crystal generation models. PhononBench reveals a widespread limitation of current generative models in ensuring dynamical stability: the average dynamical-stability rate across all generated structures is only 25.83%, with the top-performing model, MatterGen, reaching just 41.0%. Further case studies show that in property-targeted generation (illustrated here by band-gap conditioning with MatterGen), the dynamical-stability rate remains as low as 23.5% even at the optimal band-gap condition of 0.5 eV. In space-group-controlled generation, higher-symmetry crystals exhibit better stability (e.g., cubic systems achieve rates up to 49.2%), yet the average stability across all controlled generations is still only 34.4%. An important additional outcome of this study is the identification of 28,119 crystal structures that are phonon-stable across the entire Brillouin zone, providing a substantial pool of reliable candidates for future materials exploration. By establishing the first large-scale dynamical-stability benchmark, this work systematically highlights the current limitations of crystal generation models and offers essential evaluation criteria and guidance for their future development toward the design and discovery of physically viable materials. All model-generated crystal structures, phonon calculation results, and the high-throughput evaluation workflows developed in PhononBench will be openly released at https://github.com/xqh19970407/PhononBench
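As a rough illustration of what a dynamical-stability rate measures, the sketch below treats a structure as stable when none of its phonon frequencies is significantly imaginary (imaginary modes are written as negative numbers, a common convention). The tolerance value, the function names, and the toy frequencies are all assumptions made for this example; the abstract does not state the exact criterion PhononBench uses.

```python
# Minimal sketch, not the PhononBench workflow.
# Assumption: imaginary phonon modes are encoded as negative frequencies (THz).
# Assumption: `tol` absorbs small numerical noise in acoustic modes near Gamma.

def is_dynamically_stable(frequencies_thz, tol=0.05):
    """Return True if no frequency lies below -tol."""
    return min(frequencies_thz) >= -tol


def stability_rate(phonons_by_structure, tol=0.05):
    """Fraction of structures with no significant imaginary modes."""
    if not phonons_by_structure:
        return 0.0
    n_stable = sum(
        is_dynamically_stable(freqs, tol)
        for freqs in phonons_by_structure.values()
    )
    return n_stable / len(phonons_by_structure)


# Hypothetical frequencies sampled over a q-point mesh for four structures.
toy_phonons = {
    "structure_a": [-0.01, 0.0, 0.0, 3.2, 5.1],   # noise only: stable
    "structure_b": [-1.80, -0.4, 0.0, 2.9, 4.4],  # soft modes: unstable
    "structure_c": [0.0, 0.0, 0.0, 6.7, 9.3],     # stable
    "structure_d": [-0.30, 0.0, 0.0, 1.1, 2.0],   # unstable
}

rate = stability_rate(toy_phonons)
print(f"dynamical-stability rate: {rate:.1%}")
```

A stricter or looser tolerance changes which borderline structures count as stable, so any reported rate should be read together with the threshold used.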

Weyl, Dirac and high-fold chiral fermions in topological quantum materials

Quantum materials hosting Weyl fermions have opened a new era of research in condensed matter physics. First proposed in 1929 in particle physics, Weyl fermions have yet to be observed as elementary particles. In 2015, Weyl fermions were detected as collective electronic excitations in the strong spin-orbit coupled material tantalum arsenide, TaAs. This discovery was followed by a flurry of experimental and theoretical explorations of Weyl phenomena in materials. Weyl materials naturally lend themselves to the exploration of the topological index associated with Weyl fermions and their divergent Berry curvature field, as well as the topological bulk-boundary correspondence giving rise to protected conducting surface states. Here, we review the broader class of Weyl topological phenomena in materials, starting with the observation of emergent Weyl fermions in the bulk and of Fermi arc states on the surface of the TaAs family of crystals by photoemission spectroscopy. We then discuss some of the exotic optical and magnetic responses observed in these materials, as well as the progress in developing some of the related chiral materials. We discuss the conceptual development of high-fold chiral fermions, which generalize Weyl fermions, and we review the observation of high-fold chiral fermion phases by taking the rhodium silicide, RhSi, family of crystals as a prime example. Lastly, we discuss recent advances in Weyl-line phases in magnetic topological materials. With this Review, we aim to provide an introduction to the basic concepts underlying Weyl physics in condensed matter, and to representative materials and their electronic structures and topology as revealed by spectroscopic studies. We hope this work serves as a guide for future theoretical and experimental explorations of chiral fermions and related topological quantum systems with potentially enhanced functionalities.

  • 6 authors
·
Mar 2, 2021

Matbench Discovery -- An evaluation framework for machine learning crystal stability prediction

Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.
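The gap between regression and classification metrics described above can be made concrete with a small sketch. It assumes a material is labeled stable when its energy above the convex hull is at or below 0 eV/atom, and that the discovery acceleration factor is the precision divided by the fraction of truly stable materials in the test set; both the function name and the toy numbers are hypothetical and not taken from the benchmark's package.

```python
# Minimal sketch under stated assumptions; not the Matbench Discovery package API.

def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Compute MAE, precision, recall, F1 and DAF from hull distances (eV/atom)."""
    tp = fp = fn = tn = 0
    for true_val, pred_val in zip(e_hull_true, e_hull_pred):
        true_stable = true_val <= threshold
        pred_stable = pred_val <= threshold
        if true_stable and pred_stable:
            tp += 1
        elif pred_stable:
            fp += 1
        elif true_stable:
            fn += 1
        else:
            tn += 1

    n = tp + fp + fn + tn
    mae = sum(abs(t - p) for t, p in zip(e_hull_true, e_hull_pred)) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    prevalence = (tp + fn) / n
    # Assumed definition: how much better than random selection the model is.
    daf = precision / prevalence if prevalence else 0.0
    return {"mae": mae, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "daf": daf}


# Toy data: every prediction is within 0.05 eV/atom of the truth, but most
# materials sit close to the 0 eV/atom decision boundary.
true_e = [-0.02, 0.01, 0.03, -0.01, 0.20, 0.02]
pred_e = [-0.01, -0.01, -0.02, 0.01, 0.18, 0.03]

metrics = stability_metrics(true_e, pred_e)
print(metrics)
```

In this toy case the mean absolute error is small, yet two of the three predicted-stable materials are false positives, which is the kind of disconnect the abstract points to.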

  • 6 authors
·
Aug 28, 2023

AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use

Machine learning-based interatomic potentials and force fields depend critically on accurate atomic structures, yet such data are scarce due to the limited availability of experimentally resolved crystals. Although atomic-resolution electron microscopy offers a potential source of structural data, converting these images into simulation-ready formats remains labor-intensive and error-prone, creating a bottleneck for model training and validation. We introduce AutoMat, an end-to-end, agent-assisted pipeline that automatically transforms scanning transmission electron microscopy (STEM) images into atomic crystal structures and predicts their physical properties. AutoMat combines pattern-adaptive denoising, physics-guided template retrieval, symmetry-aware atomic reconstruction, fast relaxation and property prediction via MatterSim, and coordinated orchestration across all stages. We propose the first dedicated STEM2Mat-Bench for this task and evaluate performance using lattice RMSD, formation energy MAE, and structure-matching success rate. By orchestrating external tool calls, AutoMat enables a text-only LLM to outperform vision-language models in this domain, achieving closed-loop reasoning throughout the pipeline. In large-scale experiments over 450 structure samples, AutoMat substantially outperforms existing multimodal large language models and tools. These results validate both AutoMat and STEM2Mat-Bench, marking a key step toward bridging microscopy and atomistic simulation in materials science. The code and dataset are publicly available at https://github.com/yyt-2378/AutoMat and https://huggingface.co/datasets/yaotianvector/STEM2Mat.

  • 17 authors
·
May 18, 2025

MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems

We report a flexible multi-modal mechanics language model, MeLM, applied to solve various nonlinear forward and inverse problems, that can deal with a set of instructions, numbers and microstructure data. The framework is applied to various examples including bio-inspired hierarchical honeycomb design, carbon nanotube mechanics, and protein unfolding. In spite of the flexible nature of the model (which allows us to easily incorporate diverse materials, scales, and mechanical features), it performs well across disparate forward and inverse tasks. Based on an autoregressive attention-model, MeLM effectively represents a large multi-particle system consisting of hundreds of millions of neurons, where the interaction potentials are discovered through graph-forming self-attention mechanisms that are then used to identify relationships from emergent structures, while taking advantage of synergies discovered in the training data. We show that the model can solve complex degenerate mechanics design problems and determine novel material architectures across a range of hierarchical levels, providing an avenue for materials discovery and analysis. Looking beyond the demonstrations reported in this paper, we discuss other opportunities in applied mechanics and general considerations about the use of large language models in modeling, design, and analysis that can span a broad spectrum of material properties from mechanical, thermal, optical, to electronic.

  • 1 author
·
Jun 30, 2023

Emergence of a new band and the Lifshitz transition in kagome metal ScV_6Sn_6 with charge density wave

Topological kagome systems have been a topic of great interest in condensed matter physics due to their unique electronic properties. The vanadium-based kagome materials are particularly intriguing since they exhibit exotic phenomena such as charge density wave (CDW) and unconventional superconductivity. The origin of these electronic instabilities is not fully understood, and the recent discovery of a charge density wave in ScV6Sn6 provides a new avenue for investigation. In this work, we investigate the electronic structure of the novel kagome metal ScV6Sn6 using angle resolved photoemission spectroscopy (ARPES), scanning tunneling microscopy (STM), and first-principles density functional theory calculations. Our analysis reveals for the first time the temperature-dependent band changes of ScV6Sn6 and identifies a new band that exhibits a strong signature of a structure with CDW below the critical temperature. Further analysis revealed that this new band is due to the surface kagome layer of the CDW structure. In addition, a Lifshitz transition is identified in the ARPES spectra that is related to the saddle point moving across the Fermi level at the critical temperature for the CDW formation. This result shows the CDW behavior may also be related to nesting of the saddle point, similar to related materials. However, no energy gap is observed at the Fermi level and thus the CDW is not a typical Fermi surface nesting scenario. These results provide new insights into the underlying physics of the CDW in the kagome materials and could have implications for the development of materials with new functionality.

  • 13 authors
·
Feb 27, 2023

TOMATOES: Topology and Material Optimization for Latent Heat Thermal Energy Storage Devices

Latent heat thermal energy storage (LHTES) systems are compelling candidates for energy storage, primarily owing to their high storage density. Improving their performance is crucial for developing the next generation of efficient and cost-effective devices. Topology optimization (TO) has emerged as a powerful computational tool to design LHTES systems by optimally distributing a high-conductivity material (HCM) and a phase change material (PCM). However, conventional TO is typically limited to optimizing the geometry for fixed, pre-selected materials. This approach does not leverage the large and expanding databases of novel materials. Consequently, the co-design of material and geometry for LHTES remains challenging and unexplored. To address this limitation, we present an automated design framework for the concurrent optimization of material choice and topology. A key challenge is the discrete nature of material selection, which is incompatible with the gradient-based methods used for TO. We overcome this by using a data-driven variational autoencoder (VAE) to project discrete material databases for both the HCM and PCM onto continuous and differentiable latent spaces. These continuous material representations are integrated into an end-to-end differentiable, transient nonlinear finite-element solver that accounts for phase change. We demonstrate this framework on a problem aimed at maximizing the discharged energy within a specified time, subject to cost constraints. The effectiveness of the proposed method is validated through several illustrative examples.

  • 3 authors
·
Oct 8, 2025

Disentangling lattice and electronic contributions to the metal-insulator transition from bulk vs. layer confined RNiO_3

In complex oxide materials, changes in electronic properties are often associated with changes in crystal structure, raising the question of the relative roles of the electronic and lattice effects in driving the metal-insulator transition. This paper presents a combined theoretical and experimental analysis of the dependence of the metal-insulator transition of NdNiO_3 on crystal structure, specifically comparing properties of bulk materials to one- and two-layer samples of NdNiO_3 grown between multiple electronically inert NdAlO_3 counterlayers in a superlattice. The comparison amplifies and validates a theoretical approach developed in previous papers and disentangles the electronic and lattice contributions through an independent variation of each. In bulk NdNiO_3 the correlations are not strong enough to drive a metal-insulator transition by themselves: a lattice distortion is required. Ultra-thin films exhibit two additional electronic effects and one lattice-related effect. The electronic effects are quantum confinement, leading to dimensional reduction of the electronic Hamiltonian, and an increase in electronic bandwidth due to counterlayer-induced bond angle changes. We find that the confinement effect is much more important. The lattice effect is an increase in stiffness due to the cost of propagation of the lattice disproportionation into the confining material.

  • 5 authors
·
Sep 30, 2018

Strain-Balanced Low-Temperature-Grown Beryllium-Doped InGaAs/InAlAs Superlattices for High-Performance Terahertz Photoconductors under 1550 nm Laser Excitation

This study systematically investigates the photoconductive properties of low-temperature-grown Beryllium (Be)-doped InGaAs/InAlAs strain-balanced superlattices (SLs) grown by molecular beam epitaxy under stationary growth conditions on semi-insulating InP:Fe substrates. The stationary growth approach enabled precise control over lateral gradients in layer strain, composition, and thickness across a single wafer, while strain-balancing facilitated pseudomorphic growth to explore a wide range of structural parameters, providing a robust platform to study their influence on photoconductive performance. Structural characterization confirmed high crystalline quality and smooth surface morphology in all samples. Time-resolved pump-probe spectroscopy revealed subpicosecond carrier lifetimes, validating the effectiveness of strain balancing and Be doping in tuning ultrafast recombination dynamics. Hall effect measurements supported by 8-band k.p modeling revealed enhanced carrier mobility in strain-balanced SLs compared to lattice-matched structures, primarily due to reduced electron and hole effective masses and stronger quantum confinement. Additionally, optical absorption under 1550 nm excitation showed improved absorption coefficients for the strain-balanced structure, consistent with the reduction in bandgap energy predicted by theoretical modeling, thereby enhancing photon-to-carrier conversion efficiency. Furthermore, transmission electron microscopy provided first-time evidence of significant Be-induced interdiffusion at the strained SL interfaces, an important factor influencing carrier transport and dynamics. These findings position low-temperature-grown Be-doped InGaAs/InAlAs strain-balanced SLs as promising materials for high-performance broadband THz photoconductive detectors operating at telecom-compatible wavelengths.

  • 6 authors
·
May 3, 2025

AIMS-EREA -- A framework for AI-accelerated Innovation of Materials for Sustainability -- for Environmental Remediation and Energy Applications

Many environmental remediation and energy applications (conversion and storage) for sustainability need the design and development of novel green materials. Discovery of such materials is time-consuming and cumbersome due to the large number of possible combinations and permutations of material structures. Often, theoretical studies based on Density Functional Theory (DFT) and other theories, coupled with simulations, are conducted to narrow down the sample space of candidate materials before laboratory-based synthesis and analysis. With the emergence of artificial intelligence (AI), AI techniques are also being tried in this process to reduce simulation time and cost. However, making use of the tremendous value of previously published research from various parts of the world is still left to labor-intensive manual effort and the discretion of individual researchers, and is prone to human omissions. AIMS-EREA is our novel framework that blends best-of-breed Material Science theory with the power of Generative AI to enable the smoothest and quickest discovery of materials for sustainability. It also helps to eliminate the possibility of producing hazardous residues and by-products of the reactions. AIMS-EREA uses all available resources: Predictive and Analytical AI on a large collection of chemical databases, along with automated intelligent assimilation of deep materials knowledge from previously published research works through Generative AI. We demonstrate the use of our framework with an example showing how it can be successfully applied to the development of thermoelectric material for waste heat conversion.

  • 3 authors
·
Nov 18, 2023

Accurate generation of chemical reaction transition states by conditional flow matching

Transition state (TS) structures define the critical geometries and energy barriers underlying chemical reactivity, yet their fleeting nature renders them experimentally elusive and drives the reliance on costly, high-throughput density functional theory (DFT) calculations. Here, we introduce TS-GEN, a conditional flow-matching generative model that maps samples from a simple Gaussian prior directly to transition-state saddle-point geometries in a single, deterministic pass. By embedding both reactant and product conformations as conditioning information, TS-GEN learns to transport latent noise to true TS structures via an optimal-transport path, effectively replacing the iterative optimization common in nudged-elastic band or string-method algorithms. TS-GEN delivers unprecedented accuracy, achieving a root-mean-square deviation of 0.004 Å (vs. 0.103 Å for prior state-of-the-art) and a mean barrier-height error of 1.019 kcal/mol (vs. 2.864 kcal/mol), while requiring only 0.06 s GPU time per inference. Over 87% of generated TSs meet chemical-accuracy criteria (<1.58 kcal/mol error), substantially outpacing existing methods. TS-GEN also exhibits strong transferability to out-of-distribution reactions from a larger database. By uniting sub-angstrom precision, sub-second speed, and broad applicability, TS-GEN will be highly useful for high-throughput exploration of complex reaction networks, paving the way to the exploration of novel chemical reaction mechanisms.

  • 3 authors
·
Jul 14, 2025

Orbital Graph Convolutional Neural Network for Material Property Prediction

Material representations that are compatible with machine learning models play a key role in developing models that exhibit high accuracy for property prediction. Atomic orbital interactions are one of the important factors that govern the properties of crystalline materials, from which the local chemical environments of atoms are inferred. Therefore, to develop robust machine learning models for material properties prediction, it is imperative to include features representing such chemical attributes. Here, we propose the Orbital Graph Convolutional Neural Network (OGCNN), a crystal graph convolutional neural network framework that includes atomic orbital interaction features and learns material properties in a robust way. In addition, we embedded an encoder-decoder network into the OGCNN, enabling it to learn important features among basic atomic (elemental features), orbital-orbital interactions, and topological features. We examined the performance of this model on a broad range of crystalline material data to predict different properties. We benchmarked the performance of the OGCNN model with that of: 1) the crystal graph convolutional neural network (CGCNN), 2) other state-of-the-art descriptors for material representations including Many-body Tensor Representation (MBTR) and the Smooth Overlap of Atomic Positions (SOAP), and 3) other conventional regression machine learning algorithms where different crystal featurization methods have been used. We find that OGCNN significantly outperforms them. The OGCNN model with high predictive accuracy can be used to discover new materials among the immense phase and compound spaces of materials.

  • 6 authors
·
Aug 14, 2020

Achieving the quantum field theory limit in far-from-equilibrium quantum link models

Realizations of gauge theories in setups of quantum synthetic matter open up the possibility of probing salient exotic phenomena in condensed matter and high-energy physics, along with potential applications in quantum information and science technologies. In light of the impressive ongoing efforts to achieve such realizations, a fundamental question regarding quantum link model regularizations of lattice gauge theories is how faithfully they capture the quantum field theory limit of gauge theories. Recent work [Zache, Van Damme, Halimeh, Hauke, and Banerjee, at https://journals.aps.org/prd/abstract/10.1103/PhysRevD.106.L091502] has shown through analytic derivations, exact diagonalization, and infinite matrix product state calculations that the low-energy physics of 1+1D U(1) quantum link models approaches the quantum field theory limit already at small link spin length S. Here, we show that the approach to this limit also lends itself to the far-from-equilibrium quench dynamics of lattice gauge theories, as demonstrated by our numerical simulations of the Loschmidt return rate and the chiral condensate in infinite matrix product states, which work directly in the thermodynamic limit. Similar to our findings in equilibrium that show a distinct behavior between half-integer and integer link spin lengths, we find that criticality emerging in the Loschmidt return rate is fundamentally different between half-integer and integer spin quantum link models in the regime of strong electric-field coupling. Our results further affirm that state-of-the-art finite-size ultracold-atom and NISQ-device implementations of quantum link lattice gauge theories have the real potential to simulate their quantum field theory limit even in the far-from-equilibrium regime.

  • 5 authors
·
Dec 8, 2021

Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.

  • 9 authors
·
Oct 16, 2024

High-throughput calculations of magnetic topological materials

The discoveries of intrinsically magnetic topological materials, including semimetals with a large anomalous Hall effect and axion insulators, have directed fundamental research in solid-state materials. Topological quantum chemistry has enabled the understanding of and the search for paramagnetic topological materials. Using magnetic topological indices obtained from magnetic topological quantum chemistry (MTQC), here we perform a high-throughput search for magnetic topological materials based on first-principles calculations. We use as our starting point the Magnetic Materials Database on the Bilbao Crystallographic Server, which contains more than 549 magnetic compounds with magnetic structures deduced from neutron-scattering experiments, and identify 130 enforced semimetals (for which the band crossings are implied by symmetry eigenvalues), and topological insulators. For each compound, we perform complete electronic structure calculations, which include complete topological phase diagrams using different values of the Hubbard potential. Using a custom code to find the magnetic co-representations of all bands in all magnetic space groups, we generate data to be fed into the algorithm of MTQC to determine the topology of each magnetic material. Several of these materials display previously unknown topological phases, including symmetry-indicated magnetic semimetals, three-dimensional anomalous Hall insulators and higher-order magnetic semimetals. We analyse topological trends in the materials under varying interactions: 60 per cent of the 130 topological materials have topologies sensitive to interactions, and the others have stable topologies under varying interactions. We provide a materials database for future experimental studies and open-source code for diagnosing topologies of magnetic materials.

  • 9 authors
·
Feb 28, 2020

PhysX: Physical-Grounded 3D Asset Generation

3D modeling is moving from virtual to physical. Existing 3D generation primarily emphasizes geometries and textures while neglecting physical-grounded modeling. Consequently, despite the rapid development of 3D generative models, the synthesized 3D assets often overlook rich and important physical properties, hampering their real-world application in physical domains like simulation and embodied AI. As an initial attempt to address this challenge, we propose PhysX, an end-to-end paradigm for physical-grounded 3D asset generation. 1) To bridge the critical gap in physics-annotated 3D datasets, we present PhysXNet, the first physics-grounded 3D dataset systematically annotated across five foundational dimensions: absolute scale, material, affordance, kinematics, and function description. In particular, we devise a scalable human-in-the-loop annotation pipeline based on vision-language models, which enables efficient creation of physics-first assets from raw 3D assets. 2) Furthermore, we propose PhysXGen, a feed-forward framework for physics-grounded image-to-3D asset generation, injecting physical knowledge into the pre-trained 3D structural space. Specifically, PhysXGen employs a dual-branch architecture to explicitly model the latent correlations between 3D structures and physical properties, thereby producing 3D assets with plausible physical predictions while preserving the native geometry quality. Extensive experiments validate the superior performance and promising generalization capability of our framework. All the code, data, and models will be released to facilitate future research in generative physical AI.

  • 4 authors
·
Jul 16, 2025