new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 21

Accurate Chemistry Collection: Coupled cluster atomization energies for broad chemical space

Accurate thermochemical data with sub-chemical accuracy (i.e., within pm1 kcal mol^{-1} from sufficiently accurate experimental or theoretical reference data) is essential for the development and improvement of computational chemistry methods. Challenging thermochemical properties such as heats of formation and total atomization energies (TAEs) are of particular interest because they rigorously test the ability of computational chemistry methods to accurately describe complex chemical transformations involving multiple bond rearrangements. Yet, existing thermochemical datasets that confidently reach this level of accuracy are limited in either size or scope. Datasets with highly accurate reference values include a small number of data points, and larger datasets provide less accurate data or only cover a narrow portion of the chemical space. The existing datasets are therefore insufficient for developing data-driven methods with predictive accuracy over a large chemical space. The Microsoft Research Accurate Chemistry Collection (MSR-ACC) will address this challenge. Here, it offers the MSR-ACC/TAE25 dataset of 76,879 total atomization energies obtained at the CCSD(T)/CBS level via the W1-F12 thermochemical protocol. The dataset is constructed to exhaustively cover chemical space for all elements up to argon by enumerating and sampling chemical graphs, thus avoiding bias towards any particular subspace of the chemical space (such as drug-like, organic, or experimentally observed molecules). With this first dataset in MSR-ACC, we enable data-driven approaches for developing predictive computational chemistry methods with unprecedented accuracy and scope.

microsoft Microsoft
·
Jun 17, 2025

Canonical and DLPNO-based G4(MP2)XK-inspired composite wavefunction methods parametrized against large and chemically diverse training sets: Are they more accurate and/or robust than double hybrid DFT?

The large and chemically diverse GMTKN55 benchmark was used as a training set for parametrizing composite wave function thermochemistry protocols akin to G4(MP2)XK theory (Chan et al, JCTC 2019, 15, 4478-4484). Even after reparametrization, the GMTKN55 WTMAD2 (weighted mean absolute deviation, type 2) for G4(MP2)-XK is actually inferior to that of the best rung-4 DFT functional, wB97M-V. By increasing the basis set for the MP2 part to def2-QZVPPD, we were able to substantially improve performance at modest cost (if an RI-MP2 approximation is made), with WTMAD2 for this G4(MP2)-XK-D method now comparable to the better rung-5 functionals (albeit at greater cost). A three-tier approach with a scaled MP3/def2-TZVPP intermediate step, however, leads to a G4(MP3)-D method that is markedly superior to even the best double hybrids wB97M(2) and revDSD-PBEP86-D4. Evaluating the CCSD(T) component with a triple-zeta, rather than split-valence, basis set yields only a modest further improvement that is incommensurate with the drastic increase in computational cost. G4(MP3)-D and G4(MP2)- XK-D have about 40% better WTMAD2, at similar or lower computational cost, than their counterparts G4 and G4(MP2), respectively: detailed comparison reveals that the difference lies in larger molecules due to basis set incompleteness error. An E2/ {T,Q} extrapolation and a CCSD(T)/def2-TZVP step provided the G4-T method of high accuracy and with just three fitted parameters. Using KS orbitals in MP2 leads to the G4(MP3|KS)-D method, which entirely eliminates the CCSD(T) step and has no steps costlier than scaled MP3; this shows a path forward to further improvements in double-hybrid density functional methods. G4-T-DLPNO, a variant in which post-MP2 corrections are evaluated at the DLPNO- CCSD(T) level, achieves nearly the accuracy of G4-T but is applicable to much larger systems.

  • 2 authors
·
Jun 8, 2020

Nuclear Quadrupole Hyperfine Structure in HC14N/H14NC and DC15N/D15NC Isomerization: A Diagnostic Tool for Characterizing Vibrational Localization

Large-amplitude molecular motions which occur during isomerization can cause significant changes in electronic structure. These variations in electronic properties can be used to identify vibrationally-excited eigenstates which are localized along the potential energy surface. This work demonstrates that nuclear quadrupole hyperfine interactions can be used as a diagnostic marker of progress along the isomerization path in both the HC14N/H14NC and DC15N/D15NC chemical systems. Ab initio calculations at the CCSD(T)/cc-pCVQZ level indicate that the hyperfine interaction is extremely sensitive to the chemical bonding of the quadrupolar 14N nucleus and can therefore be used to determine in which potential well the vibrational wavefunction is localized. A natural bonding orbital analysis along the isomerization path further demonstrates that hyperfine interactions arise from the asphericity of the electron density at the quadrupolar nucleus. Using the CCSD(T) potential surface, the quadrupole coupling constants of highly-excited vibrational states are computed from a one-dimensional internal coordinate path Hamiltonian. The excellent agreement between ab initio calculations and recent measurements demonstrates that nuclear quadrupole hyperfine structure can be used as a diagnostic tool for characterizing localized HCN and HNC vibrational states.

  • 1 authors
·
Dec 20, 2010

OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy

We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 million DFT calculations on molecules and geometries. This dataset covers the most common elements in bio- and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I) as well as charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformers benchmark set, OrbNet Denali has a median correlation coefficient of R^2=0.90 compared to the reference DLPNO-CCSD(T) calculation, and R^2=0.97 compared to the method used to generate the training data (wB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of wB97X-D3/def2-TZVP with an average MAE of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.

  • 11 authors
·
Jul 1, 2021