new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 16

Φeat: Physically-Grounded Feature Representation

Foundation models have emerged as effective backbones for many vision tasks. However, current self-supervised features entangle high-level semantics with low-level physical factors, such as geometry and illumination, hindering their use in tasks requiring explicit physical reasoning. In this paper, we introduce Φeat, a novel physically-grounded visual backbone that encourages a representation sensitive to material identity, including reflectance cues and geometric mesostructure. Our key idea is to employ a pretraining strategy that contrasts spatial crops and physical augmentations of the same material under varying shapes and lighting conditions. While similar data have been used in high-end supervised tasks such as intrinsic decomposition or material estimation, we demonstrate that a pure self-supervised training strategy, without explicit labels, already provides a strong prior for tasks requiring robust features invariant to external physical factors. We evaluate the learned representations through feature similarity analysis and material selection, showing that Φeat captures physically-grounded structure beyond semantic grouping. These findings highlight the promise of unsupervised physical feature learning as a foundation for physics-aware perception in vision and graphics. These findings highlight the promise of unsupervised physical feature learning as a foundation for physics-aware perception in vision and graphics.

adobe Adobe
·
Nov 14, 2025 2

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

In this report, we introduce UltraShape 1.0, a scalable 3D diffusion framework for high-fidelity 3D geometry generation. The proposed approach adopts a two-stage generation pipeline: a coarse global structure is first synthesized and then refined to produce detailed, high-quality geometry. To support reliable 3D generation, we develop a comprehensive data processing pipeline that includes a novel watertight processing method and high-quality data filtering. This pipeline improves the geometric quality of publicly available 3D datasets by removing low-quality samples, filling holes, and thickening thin structures, while preserving fine-grained geometric details. To enable fine-grained geometry refinement, we decouple spatial localization from geometric detail synthesis in the diffusion process. We achieve this by performing voxel-based refinement at fixed spatial locations, where voxel queries derived from coarse geometry provide explicit positional anchors encoded via RoPE, allowing the diffusion model to focus on synthesizing local geometric details within a reduced, structured solution space. Our model is trained exclusively on publicly available 3D datasets, achieving strong geometric quality despite limited training resources. Extensive evaluations demonstrate that UltraShape 1.0 performs competitively with existing open-source methods in both data processing quality and geometry generation. All code and trained models will be released to support future research.

  • 13 authors
·
Dec 24, 2025 4

Mem4D: Decoupling Static and Dynamic Memory for Dynamic Scene Reconstruction

Reconstructing dense geometry for dynamic scenes from a monocular video is a critical yet challenging task. Recent memory-based methods enable efficient online reconstruction, but they fundamentally suffer from a Memory Demand Dilemma: The memory representation faces an inherent conflict between the long-term stability required for static structures and the rapid, high-fidelity detail retention needed for dynamic motion. This conflict forces existing methods into a compromise, leading to either geometric drift in static structures or blurred, inaccurate reconstructions of dynamic objects. To address this dilemma, we propose Mem4D, a novel framework that decouples the modeling of static geometry and dynamic motion. Guided by this insight, we design a dual-memory architecture: 1) The Transient Dynamics Memory (TDM) focuses on capturing high-frequency motion details from recent frames, enabling accurate and fine-grained modeling of dynamic content; 2) The Persistent Structure Memory (PSM) compresses and preserves long-term spatial information, ensuring global consistency and drift-free reconstruction for static elements. By alternating queries to these specialized memories, Mem4D simultaneously maintains static geometry with global consistency and reconstructs dynamic elements with high fidelity. Experiments on challenging benchmarks demonstrate that our method achieves state-of-the-art or competitive performance while maintaining high efficiency. Codes will be publicly available.

  • 10 authors
·
Aug 11, 2025

General teleparallel geometric theory of defects

We revisit the geometric theory of defects. In the differential-geometric models of defects that have been adopted since the 1950s, dislocations have been associated with torsion, disclinations with the full curvature, and point defects with the first kind trace of non-metricity. The mainstream formulation exhibits several conceptual and technical shortcomings, most notably a hierarchy inconsistency, the non-exictence of a genuine metric formulation, and the potential emergence of Ostrogradsky-type instabilities. These issues have motivated us to develop a new framework, namely a generalized teleparallel geometric theory of defects. In our model, dislocations are identified with the trace of torsion, disclinations with the second kind trace of the non-metricity, and point defects with the first kind trace of the non-metricity. In addition, we retain the scalar part torsion as a free parameter for describing some possible unknown degrees of freedom in the theory of defects. The proposed geometric theory of defects is free from all of the aforementioned drawbacks and is therefore worthy of further investigation. To ensure the coherence and completeness of the discussion, we begin our analysis with elastic deformations, then summarize the existing metric-affine geometric theory of defects, and finally proceed to our original contribution, namely the new theory introduced here. We formulate the entire theory in Eulerian coordinates. Naturally, all results can be reformulated in Lagrangian coordinates as well. All analyses and formulae are expressed in the language of exterior algebra and are carried out in coordinate-independent orthonormal frames.

  • 3 authors
·
Feb 1

2D Theoretically Twistable Material Database

The study of twisted two-dimensional (2D) materials, where twisting layers create moiré superlattices, has opened new opportunities for investigating topological phases and strongly correlated physics. While systems such as twisted bilayer graphene (TBG) and twisted transition metal dichalcogenides (TMDs) have been extensively studied, the broader potential of a seemingly infinite set of other twistable 2D materials remains largely unexplored. In this paper, we define "theoretically twistable materials" as single- or multi-layer structures that allow for the construction of simple continuum models of their moiré structures. This excludes, for example, materials with a "spaghetti" of bands or those with numerous crossing points at the Fermi level, for which theoretical moiré modeling is unfeasible. We present a high-throughput algorithm that systematically searches for theoretically twistable semimetals and insulators based on the Topological 2D Materials Database. By analyzing key electronic properties, we identify thousands of new candidate materials that could host rich topological and strongly correlated phenomena when twisted. We propose representative twistable materials for realizing different types of moiré systems, including materials with different Bravais lattices, valleys, and strength of spin-orbital coupling. We provide examples of crystal growth for several of these materials and showcase twisted bilayer band structures along with simplified twisted continuum models. Our results significantly broaden the scope of moiré heterostructures and provide a valuable resource for future experimental and theoretical studies on novel moiré systems.

  • 25 authors
·
Nov 14, 2024

GraphShaper: Geometry-aware Alignment for Improving Transfer Learning in Text-Attributed Graphs

Graph foundation models represent a transformative paradigm for learning transferable representations across diverse graph domains. Recent methods leverage large language models to unify graph and text modalities into a shared representation space using contrastive learning. However, systematic evaluations reveal significant performance degradation at structural boundaries where distinct topological patterns converge, with accuracy losses exceeding 20 percentage points. This issue arises from a key limitation: current methods assume all graph structures can be encoded within a single Euclidean space. In reality, tree structures require hyperbolic geometry to preserve hierarchical branching, while cyclic patterns depend on spherical geometry for closure properties. At structural boundaries, nodes experience conflicting geometric constraints that uniform encoding spaces cannot resolve. This raises a crucial challenge: Can alignment frameworks be designed to respect the intrinsic geometric diversity of graph structures? We introduce GraphShaper, a geometry-aware framework that enhances graph encoding through multi-geometric specialization. Our approach employs expert networks tailored to different geometric spaces, dynamically computing fusion weights to adaptively integrate geometric properties based on local structural characteristics. This adaptive fusion preserves structural integrity before alignment with text embeddings. Extensive experiments demonstrate that GraphShaper achieves 9.47\% accuracy improvements on citation networks and 7.63\% on social networks in zero-shot settings.

  • 9 authors
·
Oct 13, 2025

Multicell-Fold: geometric learning in folding multicellular life

During developmental processes such as embryogenesis, how a group of cells fold into specific structures, is a central question in biology that defines how living organisms form. Establishing tissue-level morphology critically relies on how every single cell decides to position itself relative to its neighboring cells. Despite its importance, it remains a major challenge to understand and predict the behavior of every cell within the living tissue over time during such intricate processes. To tackle this question, we propose a geometric deep learning model that can predict multicellular folding and embryogenesis, accurately capturing the highly convoluted spatial interactions among cells. We demonstrate that multicellular data can be represented with both granular and foam-like physical pictures through a unified graph data structure, considering both cellular interactions and cell junction networks. We successfully use our model to achieve two important tasks, interpretable 4-D morphological sequence alignment, and predicting local cell rearrangements before they occur at single-cell resolution. Furthermore, using an activation map and ablation studies, we demonstrate that cell geometries and cell junction networks together regulate local cell rearrangement which is critical for embryo morphogenesis. This approach provides a novel paradigm to study morphogenesis, highlighting a unified data structure and harnessing the power of geometric deep learning to accurately model the mechanisms and behaviors of cells during development. It offers a pathway toward creating a unified dynamic morphological atlas for a variety of developmental processes such as embryogenesis.

  • 5 authors
·
Jul 9, 2024

Pruning-based Topology Refinement of 3D Mesh using a 2D Alpha Mask

Image-based 3D reconstruction has increasingly stunning results over the past few years with the latest improvements in computer vision and graphics. Geometry and topology are two fundamental concepts when dealing with 3D mesh structures. But the latest often remains a side issue in the 3D mesh-based reconstruction literature. Indeed, performing per-vertex elementary displacements over a 3D sphere mesh only impacts its geometry and leaves the topological structure unchanged and fixed. Whereas few attempts propose to update the geometry and the topology, all need to lean on costly 3D ground-truth to determine the faces/edges to prune. We present in this work a method that aims to refine the topology of any 3D mesh through a face-pruning strategy that extensively relies upon 2D alpha masks and camera pose information. Our solution leverages a differentiable renderer that renders each face as a 2D soft map. Its pixel intensity reflects the probability of being covered during the rendering process by such a face. Based on the 2D soft-masks available, our method is thus able to quickly highlight all the incorrectly rendered faces for a given viewpoint. Because our module is agnostic to the network that produces the 3D mesh, it can be easily plugged into any self-supervised image-based (either synthetic or natural) 3D reconstruction pipeline to get complex meshes with a non-spherical topology.

  • 2 authors
·
Oct 17, 2022

Topological Materials for Near-Field Radiative Heat Transfer

Topological materials provide a platform that utilizes the geometric characteristics of structured materials to control the flow of waves, enabling unidirectional and protected transmission that is immune to defects or impurities. The topologically designed photonic materials can carry quantum states and electromagnetic energy, benefiting nanolasers or quantum photonic systems. This article reviews recent advances in the topological applications of photonic materials for radiative heat transfer, especially in the near field. When the separation distance between media is considerably smaller than the thermal wavelength, the heat transfer exhibits super-Planckian behavior that surpasses Planck's blackbody predictions. Near-field thermal radiation in subwavelength systems supporting surface modes has various applications, including nanoscale thermal management and energy conversion. Photonic materials and structures that support topological surface states show immense potential for enhancing or suppressing near-field thermal radiation. We present various topological effects, such as periodic and quasi-periodic nanoparticle arrays, Dirac and Weyl semimetal-based materials, structures with broken global symmetries, and other topological insulators, on near-field heat transfer. Also, the possibility of realizing near-field thermal radiation in such topological materials for alternative thermal management and heat flux guiding in nano-scale systems is discussed based on the existing technology.

  • 7 authors
·
Jun 6, 2024

MMGP: a Mesh Morphing Gaussian Process-based machine learning method for regression of physical problems under non-parameterized geometrical variability

When learning simulations for modeling physical phenomena in industrial designs, geometrical variabilities are of prime interest. While classical regression techniques prove effective for parameterized geometries, practical scenarios often involve the absence of shape parametrization during the inference stage, leaving us with only mesh discretizations as available data. Learning simulations from such mesh-based representations poses significant challenges, with recent advances relying heavily on deep graph neural networks to overcome the limitations of conventional machine learning approaches. Despite their promising results, graph neural networks exhibit certain drawbacks, including their dependency on extensive datasets and limitations in providing built-in predictive uncertainties or handling large meshes. In this work, we propose a machine learning method that do not rely on graph neural networks. Complex geometrical shapes and variations with fixed topology are dealt with using well-known mesh morphing onto a common support, combined with classical dimensionality reduction techniques and Gaussian processes. The proposed methodology can easily deal with large meshes without the need for explicit shape parameterization and provides crucial predictive uncertainties, which are essential for informed decision-making. In the considered numerical experiments, the proposed method is competitive with respect to existing graph neural networks, regarding training efficiency and accuracy of the predictions.

  • 3 authors
·
May 22, 2023

Geometric Trajectory Diffusion Models

Generative models have shown great promise in generating 3D geometric systems, which is a fundamental problem in many natural science domains such as molecule and protein design. However, existing approaches only operate on static structures, neglecting the fact that physical systems are always dynamic in nature. In this work, we propose geometric trajectory diffusion models (GeoTDM), the first diffusion model for modeling the temporal distribution of 3D geometric trajectories. Modeling such distribution is challenging as it requires capturing both the complex spatial interactions with physical symmetries and temporal correspondence encapsulated in the dynamics. We theoretically justify that diffusion models with equivariant temporal kernels can lead to density with desired symmetry, and develop a novel transition kernel leveraging SE(3)-equivariant spatial convolution and temporal attention. Furthermore, to induce an expressive trajectory distribution for conditional generation, we introduce a generalized learnable geometric prior into the forward diffusion process to enhance temporal conditioning. We conduct extensive experiments on both unconditional and conditional generation in various scenarios, including physical simulation, molecular dynamics, and pedestrian motion. Empirical results on a wide suite of metrics demonstrate that GeoTDM can generate realistic geometric trajectories with significantly higher quality.

  • 5 authors
·
Oct 16, 2024

Proposing and solving olympiad geometry with guided tree search

Mathematics olympiads are prestigious competitions, with problem proposing and solving highly honored. Building artificial intelligence that proposes and solves olympiads presents an unresolved challenge in automated theorem discovery and proving, especially in geometry for its combination of numerical and spatial elements. We introduce TongGeometry, a Euclidean geometry system supporting tree-search-based guided problem proposing and solving. The efficient geometry system establishes the most extensive repository of geometry theorems to date: within the same computational budget as the existing state-of-the-art, TongGeometry discovers 6.7 billion geometry theorems requiring auxiliary constructions, including 4.1 billion exhibiting geometric symmetry. Among them, 10 theorems were proposed to regional mathematical olympiads with 3 of TongGeometry's proposals selected in real competitions, earning spots in a national team qualifying exam or a top civil olympiad in China and the US. Guided by fine-tuned large language models, TongGeometry solved all International Mathematical Olympiad geometry in IMO-AG-30, outperforming gold medalists for the first time. It also surpasses the existing state-of-the-art across a broader spectrum of olympiad-level problems. The full capabilities of the system can be utilized on a consumer-grade machine, making the model more accessible and fostering widespread democratization of its use. By analogy, unlike existing systems that merely solve problems like students, TongGeometry acts like a geometry coach, discovering, presenting, and proving theorems.

  • 8 authors
·
Dec 13, 2024

Primal-Dual Mesh Convolutional Neural Networks

Recent works in geometric deep learning have introduced neural networks that allow performing inference tasks on three-dimensional geometric data by defining convolution, and sometimes pooling, operations on triangle meshes. These methods, however, either consider the input mesh as a graph, and do not exploit specific geometric properties of meshes for feature aggregation and downsampling, or are specialized for meshes, but rely on a rigid definition of convolution that does not properly capture the local topology of the mesh. We propose a method that combines the advantages of both types of approaches, while addressing their limitations: we extend a primal-dual framework drawn from the graph-neural-network literature to triangle meshes, and define convolutions on two types of graphs constructed from an input mesh. Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them using an attention mechanism. At the same time, we introduce a pooling operation with a precise geometric interpretation, that allows handling variations in the mesh connectivity by clustering mesh faces in a task-driven fashion. We provide theoretical insights of our approach using tools from the mesh-simplification literature. In addition, we validate experimentally our method in the tasks of shape classification and shape segmentation, where we obtain comparable or superior performance to the state of the art.

  • 5 authors
·
Oct 23, 2020

Riemannian Flow Matching for Disentangled Graph Domain Adaptation

Graph Domain Adaptation (GDA) typically uses adversarial learning to align graph embeddings in Euclidean space. However, this paradigm suffers from two critical challenges: Structural Degeneration, where hierarchical and semantic representations are entangled, and Optimization Instability, which arises from oscillatory dynamics of minimax adversarial training. To tackle these issues, we propose DisRFM, a geometry-aware GDA framework that unifies Riemannian embedding and flow-based transport. First, to overcome structural degeneration, we embed graphs into a Riemannian manifold. By adopting polar coordinates, we explicitly disentangle structure (radius) from semantics (angle). Then, we enforce topology preservation through radial Wasserstein alignment and semantic discrimination via angular clustering, thereby preventing feature entanglement and collapse. Second, we address the instability of adversarial alignment by using Riemannian flow matching. This method learns a smooth vector field to guide source features toward the target along geodesic paths, guaranteeing stable convergence. The geometric constraints further guide the flow to maintain the disentangled structure during transport. Theoretically, we prove the asymptotic stability of the flow matching and derive a tighter bound for the target risk. Extensive experiments demonstrate that DisRFM consistently outperforms state-of-the-art methods.

  • 5 authors
·
Jan 31

Roadmap: 2D Materials for Quantum Technologies

Two-dimensional (2D) materials have emerged as a versatile and powerful platform for quantum technologies, offering atomic-scale control, strong quantum confinement, and seamless integration into heterogeneous device architectures. Their reduced dimensionality enables unique quantum phenomena, including optically addressable spin defects, tunable single-photon emitters, low-dimensional magnetism, gate-controlled superconductivity, and correlated states in Moiré superlattices. This Roadmap provides a comprehensive overview of recent progress and future directions in exploiting 2D materials for quantum sensing, computation, communication, and simulation. We survey advances spanning spin defects and quantum sensing, quantum emitters and nonlinear photonics, computational theory and data-driven discovery of quantum defects, spintronic and magnonic devices, cavity-engineered quantum materials, superconducting and hybrid quantum circuits, quantum dots, Moiré quantum simulators, and quantum communication platforms. Across these themes, we identify common challenges in defect control, coherence preservation, interfacial engineering, and scalable integration, alongside emerging opportunities driven by machine-learning-assisted design and integrated experiment-theory feedback loops. By connecting microscopic quantum states to mesoscopic excitations and macroscopic device architectures, this Roadmap outlines a materials-centric framework for integrating coherent quantum functionalities and positions 2D materials as foundational building blocks for next-generation quantum technologies.

  • 32 authors
·
Dec 16, 2025

Ghost on the Shell: An Expressive Representation of General 3D Shapes

The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.

  • 7 authors
·
Oct 23, 2023

SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry

Geometry is a fundamental branch of mathematics and plays a crucial role in evaluating the reasoning capabilities of multimodal large language models (MLLMs). However, existing multimodal mathematics benchmarks mainly focus on plane geometry and largely ignore solid geometry, which requires spatial reasoning and is more challenging than plane geometry. To address this critical gap, we introduce SolidGeo, the first large-scale benchmark specifically designed to evaluate the performance of MLLMs on mathematical reasoning tasks in solid geometry. SolidGeo consists of 3,113 real-world K-12 and competition-level problems, each paired with visual context and annotated with difficulty levels and fine-grained solid geometry categories. Our benchmark covers a wide range of 3D reasoning subjects such as projection, unfolding, spatial measurement, and spatial vector, offering a rigorous testbed for assessing solid geometry. Through extensive experiments, we observe that MLLMs encounter substantial challenges in solid geometry math tasks, with a considerable performance gap relative to human capabilities on SolidGeo. Moreover, we analyze the performance, inference efficiency and error patterns of various models, offering insights into the solid geometric mathematical reasoning capabilities of MLLMs. We hope SolidGeo serves as a catalyst for advancing MLLMs toward deeper geometric reasoning and spatial intelligence.

  • 9 authors
·
May 27, 2025

Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model

Mechanical design and manufacturing workflows conventionally begin with conceptual design, followed by the creation of a computer-aided design (CAD) model and fabrication through material-extrusion (MEX) printing. This process requires converting CAD geometry into machine-readable G-code through slicing and path planning. While each step is well established, dependence on CAD modeling remains a major bottleneck: constructing object-specific 3D geometry is slow and poorly suited to rapid prototyping. Even minor design variations typically necessitate manual updates in CAD software, making iteration time-consuming and difficult to scale. To address this limitation, we introduce Image2Gcode, an end-to-end data-driven framework that bypasses the CAD stage and generates printer-ready G-code directly from images and part drawings. Instead of relying on an explicit 3D model, a hand-drawn or captured 2D image serves as the sole input. The framework first extracts slice-wise structural cues from the image and then employs a denoising diffusion probabilistic model (DDPM) over G-code sequences. Through iterative denoising, the model transforms Gaussian noise into executable print-move trajectories with corresponding extrusion parameters, establishing a direct mapping from visual input to native toolpaths. By producing structured G-code directly from 2D imagery, Image2Gcode eliminates the need for CAD or STL intermediates, lowering the entry barrier for additive manufacturing and accelerating the design-to-fabrication cycle. This approach supports on-demand prototyping from simple sketches or visual references and integrates with upstream 2D-to-3D reconstruction modules to enable an automated pipeline from concept to physical artifact. The result is a flexible, computationally efficient framework that advances accessibility in design iteration, repair workflows, and distributed manufacturing.

Procedural Generation of Grain Orientations using the Wave Function Collapse Algorithm

Statistics of grain sizes and orientations in metals correlate to the material's mechanical properties. Reproducing representative volume elements for further analysis of deformation and failure in metals, like 316L stainless steel, is particularly important due to their wide use in manufacturing goods today. Two approaches, initially created for video games, were considered for the procedural generation of representative grain microstructures. The first is the Wave Function Collapse (WFC) algorithm, and the second is constraint propagation and probabilistic inference through Markov Junior, a free and open-source software. This study aimed to investigate these two algorithms' effectiveness in using reference electron backscatter diffraction (EBSD) maps and recreating a statistically similar one that could be used in further research. It utilized two stainless steel EBSD maps as references to test both algorithms. First, the WFC algorithm was too constricting and, thus, incapable of producing images that resembled EBSDs. The second, MarkovJunior, was much more effective in creating a Voronoi tessellation that could be used to create an EBSD map in Python. When comparing the results between the reference and the generated EBSD, we discovered that the orientation and volume fractions were extremely similar. With the study, it was concluded that MarkovJunior is an effective machine learning tool that can reproduce representative grain microstructures.

  • 3 authors
·
Nov 20, 2023

Deep sequence models tend to memorize geometrically; it is unclear why

Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of storage of atomic facts that we term as geometric memory. Here, the model has synthesized embeddings encoding novel global relationships between all entities, including ones that do not co-occur in training. Such storage is powerful: for instance, we show how it transforms a hard reasoning task involving an ell-fold composition into an easy-to-learn 1-step navigation task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, as against a lookup of local associations, cannot be straightforwardly attributed to typical supervisory, architectural, or optimizational pressures. Counterintuitively, a geometry is learned even when it is more complex than the brute-force lookup. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points out to practitioners a visible headroom to make Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery, and unlearning.

google Google
·
Oct 30, 2025

GVGEN: Text-to-3D Generation with Volumetric Representation

In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques:(1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (sim7 seconds), effectively striking a balance between quality and efficiency.

  • 9 authors
·
Mar 19, 2024 1

GeoMVD: Geometry-Enhanced Multi-View Generation Model Based on Geometric Information Extraction

Multi-view image generation holds significant application value in computer vision, particularly in domains like 3D reconstruction, virtual reality, and augmented reality. Most existing methods, which rely on extending single images, face notable computational challenges in maintaining cross-view consistency and generating high-resolution outputs. To address these issues, we propose the Geometry-guided Multi-View Diffusion Model, which incorporates mechanisms for extracting multi-view geometric information and adjusting the intensity of geometric features to generate images that are both consistent across views and rich in detail. Specifically, we design a multi-view geometry information extraction module that leverages depth maps, normal maps, and foreground segmentation masks to construct a shared geometric structure, ensuring shape and structural consistency across different views. To enhance consistency and detail restoration during generation, we develop a decoupled geometry-enhanced attention mechanism that strengthens feature focus on key geometric details, thereby improving overall image quality and detail preservation. Furthermore, we apply an adaptive learning strategy that fine-tunes the model to better capture spatial relationships and visual coherence between the generated views, ensuring realistic results. Our model also incorporates an iterative refinement process that progressively improves the output quality through multiple stages of image generation. Finally, a dynamic geometry information intensity adjustment mechanism is proposed to adaptively regulate the influence of geometric data, optimizing overall quality while ensuring the naturalness of generated images. More details can be found on the project page: https://sobeymil.github.io/GeoMVD.com.

  • 3 authors
·
Nov 15, 2025

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite the significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mesh topologies, noisy surfaces, and difficulties in accommodating user edits, consequently impeding their widespread adoption and implementation in 3D modeling software. Our work is inspired by the craftsman, who usually roughs out the holistic figure of the work first and elaborates the surface details subsequently. Specifically, we employ a 3D native diffusion model, which operates on latent space learned from latent set-based 3D representations, to generate coarse geometries with regular mesh topology in seconds. In particular, this process takes as input a text prompt or a reference image and leverages a powerful multi-view (MV) diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model for generating the 3D geometry, significantly improving robustness and generalizability. Following that, a normal-based geometry refiner is used to significantly enhance the surface details. This refinement can be performed automatically, or interactively with user-supplied edits. Extensive experiments demonstrate that our method achieves high efficacy in producing superior-quality 3D assets compared to existing methods. HomePage: https://craftsman3d.github.io/, Code: https://github.com/wyysf-98/CraftsMan

  • 7 authors
·
May 23, 2024 2

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

Text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models has shown great promise but still suffers from inconsistent 3D geometric structures (Janus problems) and severe artifacts. The aforementioned problems mainly stem from 2D diffusion models lacking 3D awareness during the lifting. In this work, we present GeoDream, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors to enhance the capability of obtaining unambiguous 3D consistent geometric structures without sacrificing diversity or fidelity. Specifically, we first utilize a multi-view diffusion model to generate posed images and then construct cost volume from the predicted image, which serves as native 3D geometric priors, ensuring spatial consistency in 3D space. Subsequently, we further propose to harness 3D geometric priors to unlock the great potential of 3D awareness in 2D diffusion priors via a disentangled design. Notably, disentangling 2D and 3D priors allows us to refine 3D geometric priors further. We justify that the refined 3D geometric priors aid in the 3D-aware capability of 2D diffusion priors, which in turn provides superior guidance for the refinement of 3D geometric priors. Our numerical and visual comparisons demonstrate that GeoDream generates more 3D consistent textured meshes with high-resolution realistic renderings (i.e., 1024 times 1024) and adheres more closely to semantic coherence.

  • 6 authors
·
Nov 29, 2023 1

Segmentation of 3D pore space from CT images using curvilinear skeleton: application to numerical simulation of microbial decomposition

Recent advances in 3D X-ray Computed Tomographic (CT) sensors have stimulated research efforts to unveil the extremely complex micro-scale processes that control the activity of soil microorganisms. Voxel-based description (up to hundreds millions voxels) of the pore space can be extracted, from grey level 3D CT scanner images, by means of simple image processing tools. Classical methods for numerical simulation of biological dynamics using mesh of voxels, such as Lattice Boltzmann Model (LBM), are too much time consuming. Thus, the use of more compact and reliable geometrical representations of pore space can drastically decrease the computational cost of the simulations. Several recent works propose basic analytic volume primitives (e.g. spheres, generalized cylinders, ellipsoids) to define a piece-wise approximation of pore space for numerical simulation of draining, diffusion and microbial decomposition. Such approaches work well but the drawback is that it generates approximation errors. In the present work, we study another alternative where pore space is described by means of geometrically relevant connected subsets of voxels (regions) computed from the curvilinear skeleton. Indeed, many works use the curvilinear skeleton (3D medial axis) for analyzing and partitioning 3D shapes within various domains (medicine, material sciences, petroleum engineering, etc.) but only a few ones in soil sciences. Within the context of soil sciences, most studies dealing with 3D medial axis focus on the determination of pore throats. Here, we segment pore space using curvilinear skeleton in order to achieve numerical simulation of microbial decomposition (including diffusion processes). We validate simulation outputs by comparison with other methods using different pore space geometrical representations (balls, voxels).

  • 6 authors
·
Sep 4, 2023

3D Multiphase Heterogeneous Microstructure Generation Using Conditional Latent Diffusion Models

The ability to generate 3D multiphase microstructures on-demand with targeted attributes can greatly accelerate the design of advanced materials. Here, we present a conditional latent diffusion model (LDM) framework that rapidly synthesizes high-fidelity 3D multiphase microstructures tailored to user specifications. Using this approach, we generate diverse two-phase and three-phase microstructures at high resolution (volumes of 128 times 128 times 64 voxels, representing >10^6 voxels each) within seconds, overcoming the scalability and time limitations of traditional simulation-based methods. Key design features, such as desired volume fractions and tortuosities, are incorporated as controllable inputs to guide the generative process, ensuring that the output structures meet prescribed statistical and topological targets. Moreover, the framework predicts corresponding manufacturing (processing) parameters for each generated microstructure, helping to bridge the gap between digital microstructure design and experimental fabrication. While demonstrated on organic photovoltaic (OPV) active-layer morphologies, the flexible architecture of our approach makes it readily adaptable to other material systems and microstructure datasets. By combining computational efficiency, adaptability, and experimental relevance, this framework addresses major limitations of existing methods and offers a powerful tool for accelerated materials discovery.

  • 6 authors
·
Mar 12, 2025

DeepMesh: Differentiable Iso-Surface Extraction

Geometric Deep Learning has recently made striking progress with the advent of continuous deep implicit fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field. In this work, we remove this limitation and introduce a differentiable way to produce explicit surface mesh representations from Deep Implicit Fields. Our key insight is that by reasoning on how implicit field perturbations impact local surface geometry, one can ultimately differentiate the 3D location of surface samples with respect to the underlying deep implicit field. We exploit this to define DeepMesh - an end-to-end differentiable mesh representation that can vary its topology. We validate our theoretical insight through several applications: Single view 3D Reconstruction via Differentiable Rendering, Physically-Driven Shape Optimization, Full Scene 3D Reconstruction from Scans and End-to-End Training. In all cases our end-to-end differentiable parameterization gives us an edge over state-of-the-art algorithms.

  • 7 authors
·
Jun 20, 2021

Equivariant Graph Attention Networks with Structural Motifs for Predicting Cell Line-Specific Synergistic Drug Combinations

Cancer is the second leading cause of death, with chemotherapy as one of the primary forms of treatment. As a result, researchers are turning to drug combination therapy to decrease drug resistance and increase efficacy. Current methods of drug combination screening, such as in vivo and in vitro, are inefficient due to stark time and monetary costs. In silico methods have become increasingly important for screening drugs, but current methods are inaccurate and generalize poorly to unseen anticancer drugs. In this paper, I employ a geometric deep-learning model utilizing a graph attention network that is equivariant to 3D rotations, translations, and reflections with structural motifs. Additionally, the gene expression of cancer cell lines is utilized to classify synergistic drug combinations specific to each cell line. I compared the proposed geometric deep learning framework to current state-of-the-art (SOTA) methods, and the proposed model architecture achieved greater performance on all 12 benchmark tasks performed on the DrugComb dataset. Specifically, the proposed framework outperformed other SOTA methods by an accuracy difference greater than 28%. Based on these results, I believe that the equivariant graph attention network's capability of learning geometric data accounts for the large performance improvements. The model's ability to generalize to foreign drugs is thought to be due to the structural motifs providing a better representation of the molecule. Overall, I believe that the proposed equivariant geometric deep learning framework serves as an effective tool for virtually screening anticancer drug combinations for further validation in a wet lab environment. The code for this work is made available online at: https://github.com/WeToTheMoon/EGAT_DrugSynergy.

  • 1 authors
·
Nov 7, 2024

Self-limiting stacks of curvature-frustrated colloidal plates: Roles of intra-particle versus inter-particle deformations

In geometrically frustrated assemblies local inter-subunit misfits propagate to intra-assembly strain gradients, giving rise to anomalous self-limiting assembly thermodynamics. Here, we use theory and coarse-grained simulation to study a recently developed class of ``curvamer'' particles, flexible shell-like particles that exhibit self-limiting assembly due to the build up of curvature deformation in cohesive stacks. To address a generic, yet poorly understood aspect of frustrated assembly, we introduce a model of curvamer assembly that incorporates both {\it intra-particle} shape deformation as well as compliance of {\it inter-particle} cohesive gaps, an effect we can attribute to a {\it finite range of attraction} between particles. We show that the ratio of intra-particle (bending elasticity) to inter-particle stiffness not only controls the regimes of self-limitation but also the nature of frustration propagation through curvamer stacks. We find a transition from uniformly-bound, curvature-focusing stacks at small size to gap-opened, uniformly curved stacks at large size is controlled by a dimensionless measure of inter- versus intra-curvamer stiffness. The finite range of inter-particle attraction determines range of cohesion in stacks are self-limiting, a prediction which is in strong agreement with numerical studies of our coarse-grained colloidal model. These predictions provide critical guidance for experimental realizations of frustrated particle systems designed to exhibit self-limitation at especially large multi-particle scales.

  • 3 authors
·
Mar 3, 2024

A Geometric Theory of Cosmological Structure via Entropic Curvature in Wasserstein Space

We construct a geometric framework for cosmological large-scale structure based on optimal transport theory and Wasserstein geometry. In this framework, Ricci curvature on the probability measure space P_2(M) is characterized by the geodesic convexity of entropy and is formulated as the response of probability distributions to optimal transport. We introduce effective Ricci curvatures K_{eff}^{(infty)} and K_{eff}^{(N)} associated with Kullback--Leibler-type and Rényi-type entropies, corresponding respectively to the curvature-dimension conditions CD(K,infty) and CD(K,N). By localizing these curvatures to finite scales using local and reference measures, we construct curvature indicators applicable to observational data. Under a local quadratic approximation, the effective curvature reduces to the Hessian of the log-density, showing that conventional Hessian-based structure classifications arise as a limiting case of the present framework. We further show that effective curvature depends on observational scale and formulate this dependence as a scale flow, distinct from Ricci flow because it describes a change of resolution rather than a time evolution of geometry. Treating curvature as a random field then extends the statistical description of density fields: curvature statistics are given by higher-order weighted integrals of the power spectrum and by spatial derivatives of the correlation function, emphasizing geometric rather than amplitude information. This framework provides a unified connection between optimal transport geometry and cosmological structure analysis, and offers a new perspective on multiscale structure and nonlinear statistics.

  • 1 authors
·
Mar 31

Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing

Spider webs are incredible biological structures, comprising thin but strong silk filament and arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogenous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned based on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) An analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation, 2) a discrete diffusion model with full neighbor representation, and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking, and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.

  • 3 authors
·
Apr 11, 2023

CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design

Computer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturing applications like automobile and aviation. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.

  • 5 authors
·
Jul 13, 2025

Cylindric plane partitions, Lambda determinants, Commutants in semicircular systems

This thesis is divided into three parts. The first part deals with cylindric plane partitions. The second with lambda-determinants and the third with commutators in semi-circular systems. For more detailed abstract please see inside. Cylindric plane partitions may be thought of as a natural generalization of reverse plane partitions. A generating series for the enumeration of cylindric plane partitions was recently given by Borodin. The first result of section one is a new bijective proof of Borodin's identity which makes use of Fomin's growth diagram framework for generalized RSK correspondences. The second result is a (q,t)-analog of Borodin's identity which extends previous work by Okada in the reverse plane partition case. The third result is an explicit combinatorial interpretation of the Macdonald weight occurring in the (q,t)-analog using the non-intersecting lattice path model for cylindric plane partitions. Alternating sign matrices were discovered by Robbins and Rumsey whilst studying λ-determinants. In the second part of this thesis we prove a multi-parameter generalization of the λ-determinant, generalizing a recent result by di Francesco. Like the original λ-determinant, our formula exhibits the Laurent phenomenon. Semicircular systems were first introduced by Voiculescu as a part of his study of von Neumann algebras. In the third part of this thesis we study certain commutator subalgebras of the semicircular system. We find a projection matrix with an interesting self-similar structure. Making use of our projection formula we given an alternative, elementary proof that the semicircular system is a factor.

  • 1 authors
·
Oct 25, 2021

Diagnosing Generalization Failures from Representational Geometry Markers

Generalization, the ability to perform well beyond the training context, is a hallmark of biological and artificial intelligence, yet anticipating unseen failures remains a central challenge. Conventional approaches often take a ``bottom-up'' mechanistic route by reverse-engineering interpretable features or circuits to build explanatory models. While insightful, these methods often struggle to provide the high-level, predictive signals for anticipating failure in real-world deployment. Here, we propose using a ``top-down'' approach to studying generalization failures inspired by medical biomarkers: identifying system-level measurements that serve as robust indicators of a model's future performance. Rather than mapping out detailed internal mechanisms, we systematically design and test network markers to probe structure, function links, identify prognostic indicators, and validate predictions in real-world settings. In image classification, we find that task-relevant geometric properties of in-distribution (ID) object manifolds consistently forecast poor out-of-distribution (OOD) generalization. In particular, reductions in two geometric measures, effective manifold dimensionality and utility, predict weaker OOD performance across diverse architectures, optimizers, and datasets. We apply this finding to transfer learning with ImageNet-pretrained models. We consistently find that the same geometric patterns predict OOD transfer performance more reliably than ID accuracy. This work demonstrates that representational geometry can expose hidden vulnerabilities, offering more robust guidance for model selection and AI interpretability.

  • 4 authors
·
Mar 2

GASP: Gaussian Splatting for Physic-Based Simulations

Physics simulation is paramount for modeling and utilizing 3D scenes in various real-world applications. However, integrating with state-of-the-art 3D scene rendering techniques such as Gaussian Splatting (GS) remains challenging. Existing models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes. Alternatively, we can modify the physics-grounded Newtonian dynamics to align with 3D Gaussian components. Current models take the first-order approximation of a deformation map, which locally approximates the dynamics by linear transformations. In contrast, our GS for Physics-Based Simulations (GASP) pipeline uses parametrized flat Gaussian distributions. Consequently, the problem of modeling Gaussian components using the physics engine is reduced to working with 3D points. In our work, we present additional rules for manipulating Gaussians, demonstrating how to adapt the pipeline to incorporate meshes, control Gaussian sizes during simulations, and enhance simulation efficiency. This is achieved through the Gaussian grouping strategy, which implements hierarchical structuring and enables simulations to be performed exclusively on selected Gaussians. The resulting solution can be integrated into any physics engine that can be treated as a black box. As demonstrated in our studies, the proposed pipeline exhibits superior performance on a diverse range of benchmark datasets designed for 3D object rendering. The project webpage, which includes additional visualizations, can be found at https://waczjoan.github.io/GASP.

  • 6 authors
·
Sep 9, 2024

Bulk Modulus along Jamming Transition Lines of Bidisperse Granular Packings

We present 3D DEM simulations of bidisperse granular packings to investigate their jamming densities, phi_J, and dimensionless bulk moduli, K, as a function of the size ratio, delta, and the concentration of small particles, X_{mathrm S}. We determine the partial and total bulk moduli for each packing and report the jamming transition diagram, i.e., the density or volume fraction marking both the first and second transitions of the system. At a large enough size difference, e.g., delta le 0.22, X^{*}_{mathrm S} divides the diagram with most small particles either non-jammed or jammed jointly with large ones. We find that the bulk modulus K jumps at X^{*}_{mathrm S}(delta = 0.15) approx 0.21, at the maximum jamming density, where both particle species mix most efficiently, while for X_{mathrm S} < X^{*}_{mathrm S} K is decoupled in two scenarios as a result of the first and second jamming transition. Along the second transition, K rises relative to the values found at the first transition, however, is still small compared to K at X^{*}_{mathrm S}. While the first transition is sharp, the second is smooth, carried by small-large interactions, while the small-small contacts display a transition. This demonstrates that for low enough delta and X_{mathrm S}, the jamming of small particles indeed impacts the internal resistance of the system. Our new results will allow tuning the bulk modulus K or other properties, such as the wave speed, by choosing specific sizes and concentrations based on a better understanding of whether small particles contribute to the jammed structure or not, and how the micromechanical structure behaves at either transition.

  • 4 authors
·
Mar 3, 2021

Make Geometry Matter for Spatial Reasoning

Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation models into VLMs. Nevertheless, we observe that naive token fusion followed by standard fine-tuning in this line of work often leaves such geometric cues underutilized for spatial reasoning, as VLMs tend to rely heavily on 2D visual cues. In this paper, we propose GeoSR, a framework designed to make geometry matter by encouraging VLMs to actively reason with geometry tokens. GeoSR introduces two key components: (1) Geometry-Unleashing Masking, which strategically masks portions of 2D vision tokens during training to weaken non-geometric shortcuts and force the model to consult geometry tokens for spatial reasoning; and (2) Geometry-Guided Fusion, a gated routing mechanism that adaptively amplifies geometry token contributions in regions where geometric evidence is critical. Together, these designs unleash the potential of geometry tokens for spatial reasoning tasks. Extensive experiments on both static and dynamic spatial reasoning benchmarks demonstrate that GeoSR consistently outperforms prior methods and establishes new state-of-the-art performance by effectively leveraging geometric information. The project page is available at https://suhzhang.github.io/GeoSR/.

SAGS: Structure-Aware 3D Gaussian Splatting

Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inherent 3D structure of the scene, thereby restricting the expressivity and the quality of the representation, resulting in various floating points and artifacts. In this work, we propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene, which reflects to state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets. SAGS is founded on a local-global graph representation that facilitates the learning of complex scenes and enforces meaningful point displacements that preserve the scene's geometry. Additionally, we introduce a lightweight version of SAGS, using a simple yet effective mid-point interpolation scheme, which showcases a compact representation of the scene with up to 24times size reduction without the reliance on any compression strategies. Extensive experiments across multiple benchmark datasets demonstrate the superiority of SAGS compared to state-of-the-art 3D-GS methods under both rendering quality and model size. Besides, we demonstrate that our structure-aware method can effectively mitigate floating artifacts and irregular distortions of previous methods while obtaining precise depth maps. Project page https://eververas.github.io/SAGS/.

  • 5 authors
·
Apr 29, 2024 1

Geometric Stability: The Missing Axis of Representations

Analysis of learned representations has a blind spot: it focuses on similarity, measuring how closely embeddings align with external references, but similarity reveals only what is represented, not whether that structure is robust. We introduce geometric stability, a distinct dimension that quantifies how reliably representational geometry holds under perturbation, and present Shesha, a framework for measuring it. Across 2,463 configurations in seven domains, we show that stability and similarity are empirically uncorrelated (ρapprox 0.01) and mechanistically distinct: similarity metrics collapse after removing the top principal components, while stability retains sensitivity to fine-grained manifold structure. This distinction yields actionable insights: for safety monitoring, stability acts as a functional geometric canary, detecting structural drift nearly 2times more sensitively than CKA while filtering out the non-functional noise that triggers false alarms in rigid distance metrics; for controllability, supervised stability predicts linear steerability (ρ= 0.89-0.96); for model selection, stability dissociates from transferability, revealing a geometric tax that transfer optimization incurs. Beyond machine learning, stability predicts CRISPR perturbation coherence and neural-behavioral coupling. By quantifying how reliably systems maintain structure, geometric stability provides a necessary complement to similarity for auditing representations across biological and computational systems.

  • 1 authors
·
Jan 14 2

Volume Rendering of Neural Implicit Surfaces

Neural volume rendering became increasingly popular recently due to its success in synthesizing novel views of a scene from a sparse set of input images. So far, the geometry learned by neural volume rendering techniques was modeled using a generic density function. Furthermore, the geometry itself was extracted using an arbitrary level set of the density function leading to a noisy, often low fidelity reconstruction. The goal of this paper is to improve geometry representation and reconstruction in neural volume rendering. We achieve that by modeling the volume density as a function of the geometry. This is in contrast to previous work modeling the geometry as a function of the volume density. In more detail, we define the volume density function as Laplace's cumulative distribution function (CDF) applied to a signed distance function (SDF) representation. This simple density representation has three benefits: (i) it provides a useful inductive bias to the geometry learned in the neural volume rendering process; (ii) it facilitates a bound on the opacity approximation error, leading to an accurate sampling of the viewing ray. Accurate sampling is important to provide a precise coupling of geometry and radiance; and (iii) it allows efficient unsupervised disentanglement of shape and appearance in volume rendering. Applying this new density representation to challenging scene multiview datasets produced high quality geometry reconstructions, outperforming relevant baselines. Furthermore, switching shape and appearance between scenes is possible due to the disentanglement of the two.

  • 4 authors
·
Jun 22, 2021

GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field

Plane Geometry Diagram Synthesis has been a crucial task in computer graphics, with applications ranging from educational tools to AI-driven mathematical reasoning. Traditionally, we rely on manual tools (e.g., Matplotlib and GeoGebra) to generate precise diagrams, but this usually requires huge, complicated calculations. Recently, researchers start to work on model-based methods (e.g., Stable Diffusion and GPT5) to automatically generate diagrams, saving operational cost but usually suffering from limited realism and insufficient accuracy. In this paper, we propose a novel framework GeoSDF, to automatically generate diagrams efficiently and accurately with Signed Distance Field (SDF). Specifically, we first represent geometric elements (e.g., points, segments, and circles) in the SDF, then construct a series of constraint functions to represent geometric relationships. Next, we optimize those constructed constraint functions to get an optimized field of both elements and constraints. Finally, by rendering the optimized field, we can obtain the synthesized diagram. In our GeoSDF, we define a symbolic language to represent geometric elements and constraints, and our synthesized geometry diagrams can be self-verified in the SDF, ensuring both mathematical accuracy and visual plausibility. In experiments, through both qualitative and quantitative analysis, GeoSDF synthesized both normal high-school level and IMO-level geometry diagrams. We achieve 88.67\% synthesis accuracy by human evaluation in the IMO problem set. Furthermore, we obtain a very high accuracy of solving geometry problems (over 95\% while the current SOTA accuracy is around 75%) by leveraging our self-verification property. All of these demonstrate the advantage of GeoSDF, paving the way for more sophisticated, accurate, and flexible generation of geometric diagrams for a wide array of applications.

  • 7 authors
·
Jun 16, 2025

CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic text or image inputs as well as 3D-aware controls from diverse primitives (multi-view images, voxels, bounding boxes, point clouds, implicit representations, etc). At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT), to extract rich 3D priors directly from a diverse range of 3D geometries. Specifically, it adopts neural fields to represent continuous and complete surfaces and uses a geometry generative module with pure transformer blocks in latent space. We present a progressive training scheme to train CLAY on an ultra large 3D model dataset obtained through a carefully designed processing pipeline, resulting in a 3D native geometry generator with 1.5 billion parameters. For appearance generation, CLAY sets out to produce physically-based rendering (PBR) textures by employing a multi-view material diffusion model that can generate 2K resolution textures with diffuse, roughness, and metallic modalities. We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production ready assets with intricate details. Even first time users can easily use CLAY to bring their vivid 3D imaginations to life, unleashing unlimited creativity.

  • 9 authors
·
May 30, 2024 2

Geometric and Dynamic Scaling in Deep Transformers

Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultimately collapse. Existing explanations largely attribute this phenomenon to optimization instability or vanishing gradients, yet such accounts fail to explain why collapse persists even under modern normalization and initialization schemes. In this paper, we argue that the collapse of deep Transformers is fundamentally a geometric problem. Standard residual updates implicitly assume that feature accumulation is always beneficial, but offer no mechanism to constrain update directions or to erase outdated information. As depth increases, this leads to systematic drift off the semantic manifold and monotonic feature accumulation, causing representational degeneracy. We propose a unified geometric framework that addresses these failures through two orthogonal principles. First, manifold-constrained hyper-connections restrict residual updates to valid local tangent directions, preventing uncontrolled manifold drift. Second, deep delta learning introduces data-dependent, non-monotonic updates that enable reflection and erasure of redundant features rather than their unconditional accumulation. Together, these mechanisms decouple the direction and sign of feature updates, yielding a stable geometric evolution across depth. We term the resulting architecture the Manifold-Geometric Transformer (MGT). Our analysis predicts that enforcing geometric validity while allowing dynamic erasure is essential for avoiding rank collapse in ultra-deep networks. We outline an evaluation protocol for Transformers exceeding 100 layers to test the hypothesis that geometry, rather than depth itself, is the key limiting factor in deep representation learning.

  • 2 authors
·
Jan 2

CAT: Curvature-Adaptive Transformers for Geometry-Aware Learning

Transformers achieve strong performance across diverse domains but implicitly assume Euclidean geometry in their attention mechanisms, limiting their effectiveness on data with non-Euclidean structure. While recent extensions to hyperbolic and spherical spaces show promise for hierarchical and cyclical patterns, respectively, they require committing to a single geometry a priori, reducing flexibility when data exhibits mixed geometric properties. We introduce the Curvature-Adaptive Transformer (CAT), a novel architecture that dynamically learns per-token routing across three geometric attention branches through a lightweight, differentiable gating mechanism. Unlike fixed-geometry approaches, CAT enables adaptive geometric specialization, routing tokens to the appropriate curvature based on their local relational structure. The routing network provides interpretable curvature preferences while each branch employs geometry-specific operations optimized for its respective manifold. On knowledge graph completion benchmarks (FB15k-237, WN18RR), CAT achieves approximately 10% improvements in MRR and Hits@10 over fixed-geometry baselines with minimal overhead (5% parameter increase, comparable inference time). These results demonstrate that learned geometric adaptation outperforms any single fixed geometry for complex relational reasoning, establishing CAT as a scalable and interpretable foundation for mixture-of-geometry architectures across language, vision, and multimodal domains.

  • 3 authors
·
Oct 1, 2025

Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation

Generative models have achieved success in producing semantically plausible 2D images, but it remains challenging in 3D generation due to the absence of spatial geometry constraints. Typically, existing methods utilize geometric features as conditions to enhance spatial awareness. However, these methods can only model relative relationships and are prone to scale inconsistency of absolute geometry. Thus, we argue that semantic information and absolute geometry empower 3D cognition, thereby enabling controllable 3D generation for the physical world. In this work, we propose Cog2Gen3D, a 3D cognition-guided diffusion framework for 3D generation. Our model is guided by three key designs: 1) Cognitive Feature Embeddings. We encode different modalities into semantic and geometric representations and further extract logical representations. 2) 3D Latent Cognition Graph. We structure different representations into dual-stream semantic-geometric graphs and fuse them via common-based cross-attention to obtain a 3D cognition graph. 3) Cognition-Guided Latent Diffusion. We leverage the fused 3D cognition graph as the condition to guide the latent diffusion process for 3D Gaussian generation. Under this unified framework, the 3D cognition graph ensures the physical plausibility and structural rationality of 3D generation. Moreover, we construct a validation subset based on the Marble World Labs. Extensive experiments demonstrate that our Cog2Gen3D significantly outperforms existing methods in both semantic fidelity and geometric plausibility.

  • 5 authors
·
Mar 5

Learn2Fold: Structured Origami Generation with World Model Planning

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.

  • 7 authors
·
Feb 2 1

Microscale stress-geometry interactions in an additively manufactured NiTi cardiovascular stent: A synchrotron dual imaging tomography and diffraction study

This study explores cardiovascular stents fabricated using laser powder bed fusion (LPBF); an emerging method to offer patient-specific customisable parts. Here, the shape memory alloy NiTi, in a near equiatomic composition, was investigated to deconvolve the material response from macroscopic component effects. Specifically, stress-geometry interactions were revealed, in-situ, for a minaturised cardiovascular stent subjected to an externally applied cylindrical stress whilst acquiring synchrotron X-ray imaging and diffraction data. The approach enabled the collection of spatially resolved micromechanical deformation data; the formation of stress-induced martensite and R-phase was evident, occurring in locations near junctions between stent ligaments where stress concentrations exist. In the as-fabricated condition, hardness maps were obtained through nanoindentation, demonstrating that the localised deformation and deformation patterning is further controlled by porosity and microstructural heterogeneity. Electron backscatter diffraction (EBSD) supported these observations, showing a finer grain structure near stent junctions with higher associated lattice curvature. These features, combined with stress concentrations when loaded will initiate localised phase transformations. If the stent was subjected to repeated loading, representing in-vivo conditions, these regions would be susceptible to cyclic damage through transformation memory loss, leading to premature component failure. This study highlights the challenges that must be addressed for the post-processing treatment of LABF-processed stents for healthcare-related applications.

  • 11 authors
·
Dec 12, 2023

Low-Dimensional Execution Manifolds in Transformer Learning Dynamics: Evidence from Modular Arithmetic Tasks

We investigate the geometric structure of learning dynamics in overparameterized transformer models through carefully controlled modular arithmetic tasks. Our primary finding is that despite operating in high-dimensional parameter spaces (d=128), transformer training trajectories rapidly collapse onto low-dimensional execution manifolds of dimension 3--4. This dimensional collapse is robust across random seeds and moderate task difficulties, though the orientation of the manifold in parameter space varies between runs. We demonstrate that this geometric structure underlies several empirically observed phenomena: (1) sharp attention concentration emerges as saturation along routing coordinates within the execution manifold, (2) SGD commutators are preferentially aligned with the execution subspace (up to 10times random baseline) early in training, with >92% of non-commutativity confined to orthogonal staging directions and this alignment decreasing as training converges, and (3) sparse autoencoders capture auxiliary routing structure but fail to isolate execution itself, which remains distributed across the low-dimensional manifold. Our results suggest a unifying geometric framework for understanding transformer learning, where the vast majority of parameters serve to absorb optimization interference while core computation occurs in a dramatically reduced subspace. These findings have implications for interpretability, training curriculum design, and understanding the role of overparameterization in neural network learning.

  • 1 authors
·
Feb 10

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

Understanding how Large Language Models (LLMs) perform complex reasoning and their failure mechanisms is a challenge in interpretability research. To provide a measurable geometric analysis perspective, we define the concept of the Reasoning Manifold, a latent low-dimensional geometric structure formed by the internal representations corresponding to all correctly reasoned generations. This structure can be conceptualized as the embodiment of the effective thinking paths that the model has learned to successfully solve a given task. Based on this concept, we build REMA, a framework that explains the origins of failures by quantitatively comparing the spatial relationships of internal model representations corresponding to both erroneous and correct reasoning samples. Specifically, REMA first quantifies the geometric deviation of each erroneous representation by calculating its k-nearest neighbors distance to the approximated manifold formed by correct representations, thereby providing a unified failure signal. It then localizes the divergence points where these deviations first become significant by tracking this deviation metric across the model's layers and comparing it against a baseline of internal fluctuations from correct representations, thus identifying where the reasoning chain begins to go off-track. Our extensive experiments on diverse language and multimodal models and tasks demonstrate the low-dimensional nature of the reasoning manifold and the high separability between erroneous and correct reasoning representations. The results also validate the effectiveness of the REMA framework in analyzing the origins of reasoning failures. This research connects abstract reasoning failures to measurable geometric deviations in representations, providing new avenues for in-depth understanding and diagnosis of the internal computational processes of black-box models.

  • 8 authors
·
Sep 26, 2025 2

Unified Micromechanics Theory of Composites

We consider the matrix composite materials (CM) of either random (statistically homogeneous or inhomogeneous), periodic, or deterministic (neither random nor periodic) structures. CMs exhibit linear or nonlinear behavior, coupled or uncoupled multi-physical phenomena, locally elastic, weakly nonlocal (strain gradient and stress gradient), or strongly nonlocal (strain-type and displacement-type, peridynamics) phase properties. A modified Computational Analytical Micromechanics (CAM) approach introduces an exact Additive General Integral Equation (AGIE) for CMs of any structure and phase properties mentioned above. The unified iteration solution of static AGIEs is adapted to the body force with compact support serving as a fundamentally new universal training parameter. The approach also establishes a critical threshold for filtering out unsuitable sub-datasets of effective parameters through a novel Representative Volume Element (RVE) concept, which extends Hill's classical framework. This RVE concept eliminates sample size, boundary layer, and edge effects, making it applicable to CMs of any structure and phase properties, regardless of local or nonlocal, linear or nonlinear. Incorporating this new RVE concept into machine learning and neural network techniques enables the construction of any unpredefined surrogate nonlocal operators. The methodology is structured as a modular, block-based framework, allowing independent development and refinement of software components. This flexible, robust AGIE-CAM framework integrates data-driven, multi-scale, and multi-physics modeling, accelerating research in CM of any microtopology and phase properties considered. The AGIE-CAM framework represents a groundbreaking paradigm shift in the micromechanics of composites, redefining the very philosophy that underpins our understanding of their behavior at the microscopic level.

  • 1 authors
·
Mar 15, 2025

SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

Creating high-fidelity 3D meshes with arbitrary topology, including open surfaces and complex interiors, remains a significant challenge. Existing implicit field methods often require costly and detail-degrading watertight conversion, while other approaches struggle with high resolutions. This paper introduces SparseFlex, a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to 1024^3 directly from rendering losses. SparseFlex combines the accuracy of Flexicubes with a sparse voxel structure, focusing computation on surface-adjacent regions and efficiently handling open surfaces. Crucially, we introduce a frustum-aware sectional voxel training strategy that activates only relevant voxels during rendering, dramatically reducing memory consumption and enabling high-resolution training. This also allows, for the first time, the reconstruction of mesh interiors using only rendering supervision. Building upon this, we demonstrate a complete shape modeling pipeline by training a variational autoencoder (VAE) and a rectified flow transformer for high-quality 3D shape generation. Our experiments show state-of-the-art reconstruction accuracy, with a ~82% reduction in Chamfer Distance and a ~88% increase in F-score compared to previous methods, and demonstrate the generation of high-resolution, detailed 3D shapes with arbitrary topology. By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state-of-the-art in 3D shape representation and modeling.

  • 9 authors
·
Mar 27, 2025 2

Microstructural Geometry Revealed by NMR Lineshape Analysis

We introduce a technique for extracting microstructural geometry from NMR lineshape analysis in porous materials at angstrom-scale resolution with the use of weak magnetic field gradients. Diverging from the generally held view of FID signals undergoing simple exponential decay, we show that a detailed analysis of the line shape can unravel structural geometry on much smaller scales than previously thought. While the original q-space PFG NMR relies on strong magnetic field gradients in order to achieve high spatial resolution, our current approach reaches comparable or higher resolution using much weaker gradients. As a model system, we simulated gas diffusion for xenon confined within carbon nanotubes over a range of temperatures and nanotube diameters in order to unveil manifestations of confinement in the diffusion behavior. We report a multiscale scheme that couples the above MD simulations with the generalized Langevin equation to estimate the transport properties of interest for this problem, such as diffusivity coefficients and NMR lineshapes, using the Green-Kubo correlation function to correctly evaluate time-dependent diffusion. Our results highlight how NMR methodologies can be adapted as effective means towards structural investigation at very small scales when dealing with complicated geometries. This method is expected to find applications in materials science, catalysis, biomedicine and other areas.

  • 2 authors
·
Oct 12, 2024

Visual Diffusion Models are Geometric Solvers

In this paper we show that visual diffusion models can serve as effective geometric solvers: they can directly reason about geometric problems by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry that asks whether every Jordan curve contains four points forming a square. We then extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem. Our method treats each problem instance as an image and trains a standard visual diffusion model that transforms Gaussian noise into an image representing a valid approximate solution that closely matches the exact one. The model learns to transform noisy geometric structures into correct configurations, effectively recasting geometric reasoning as image generation. Unlike prior work that necessitates specialized architectures and domain-specific adaptations when applying diffusion to parametric geometric representations, we employ a standard visual diffusion model that operates on the visual representation of the problem. This simplicity highlights a surprising bridge between generative modeling and geometric problem solving. Beyond the specific problems studied here, our results point toward a broader paradigm: operating in image space provides a general and practical framework for approximating notoriously hard problems, and opens the door to tackling a far wider class of challenging geometric tasks.

  • 6 authors
·
Oct 24, 2025 1

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Triangle meshes are fundamental to 3D applications, enabling efficient modification and rasterization while maintaining compatibility with standard rendering pipelines. However, current automatic mesh generation methods typically rely on intermediate representations that lack the continuous surface quality inherent to meshes. Converting these representations into meshes produces dense, suboptimal outputs. Although recent autoregressive approaches demonstrate promise in directly modeling mesh vertices and faces, they are constrained by the limitation in face count, scalability, and structural fidelity. To address these challenges, we propose Nautilus, a locality-aware autoencoder for artist-like mesh generation that leverages the local properties of manifold meshes to achieve structural fidelity and efficient representation. Our approach introduces a novel tokenization algorithm that preserves face proximity relationships and compresses sequence length through locally shared vertices and edges, enabling the generation of meshes with an unprecedented scale of up to 5,000 faces. Furthermore, we develop a Dual-stream Point Conditioner that provides multi-scale geometric guidance, ensuring global consistency and local structural fidelity by capturing fine-grained geometric features. Extensive experiments demonstrate that Nautilus significantly outperforms state-of-the-art methods in both fidelity and scalability. The project page is at https://nautilusmeshgen.github.io.

  • 9 authors
·
Jan 24, 2025

Homogenization framework for rigid and non-rigid foldable origami metamaterials

Origami metamaterials typically consist of folded sheets with periodic patterns, conferring them with remarkable mechanical properties. In the context of Continuum Mechanics, the majority of existing predictive methods are mechanism analogs which favor rigid folding and panel bending. While effective in predicting primary deformation modes, existing methods fall short in capturing the full spectrum of deformation of non-rigid foldable origami, such as the emergence of curvature along straight creases, local strain at vertices and warpage in panels. To fully capture the entire deformation spectrum and enhance the accuracy of existing methods, this paper introduces a homogenization framework for origami metamaterials where the faces are modeled as plate elements. Both asymptotic and energy-based homogenization methods are formulated and implemented. As a representative crease pattern, we examine the Miura origami sheet homogenized as an equivalent Kirchhoff-Love plate. The results reveal that certain effective elastic properties are nonlinearly related to both the initial fold angle and the crease stiffness. When benchmarked with results from fully resolved simulations, our framework yields errors up to 12.9\%, while existing models, including the bar-and-hinge model and the rigid-panel model, show up to 161\% error. The differences in errors are associated with the complex modes of crease and panel deformation in non-rigid origami, unexplored by the existing models. This work demonstrates a precise and efficient continuum framework for origami metamaterials as an effective strategy for predicting their elastic properties, understanding their mechanics, and designing their functionalities.

  • 4 authors
·
Aug 22, 2025

From Density to Geometry: YOLOv8 Instance Segmentation for Reverse Engineering of Optimized Structures

This paper introduces YOLOv8-TO, a novel approach for reverse engineering of topology-optimized structures into interpretable geometric parameters using the YOLOv8 instance segmentation model. Density-based topology optimization methods require post-processing to convert the optimal density distribution into a parametric representation for design exploration and integration with CAD tools. Traditional methods such as skeletonization struggle with complex geometries and require manual intervention. YOLOv8-TO addresses these challenges by training a custom YOLOv8 model to automatically detect and reconstruct structural components from binary density distributions. The model is trained on a diverse dataset of both optimized and random structures generated using the Moving Morphable Components method. A custom reconstruction loss function based on the dice coefficient of the predicted geometry is used to train the new regression head of the model via self-supervised learning. The method is evaluated on test sets generated from different topology optimization methods, including out-of-distribution samples, and compared against a skeletonization approach. Results show that YOLOv8-TO significantly outperforms skeletonization in reconstructing visually and structurally similar designs. The method showcases an average improvement of 13.84% in the Dice coefficient, with peak enhancements reaching 20.78%. The method demonstrates good generalization to complex geometries and fast inference times, making it suitable for integration into design workflows using regular workstations. Limitations include the sensitivity to non-max suppression thresholds. YOLOv8-TO represents a significant advancement in topology optimization post-processing, enabling efficient and accurate reverse engineering of optimized structures for design exploration and manufacturing.

  • 4 authors
·
Apr 29, 2024

MatText: Do Language Models Need More than Text & Scale for Materials Modeling?

Effectively representing materials as text has the potential to leverage the vast advancements of large language models (LLMs) for discovering new materials. While LLMs have shown remarkable success in various domains, their application to materials science remains underexplored. A fundamental challenge is the lack of understanding of how to best utilize text-based representations for materials modeling. This challenge is further compounded by the absence of a comprehensive benchmark to rigorously evaluate the capabilities and limitations of these text representations in capturing the complexity of material systems. To address this gap, we propose MatText, a suite of benchmarking tools and datasets designed to systematically evaluate the performance of language models in modeling materials. MatText encompasses nine distinct text-based representations for material systems, including several novel representations. Each representation incorporates unique inductive biases that capture relevant information and integrate prior physical knowledge about materials. Additionally, MatText provides essential tools for training and benchmarking the performance of language models in the context of materials science. These tools include standardized dataset splits for each representation, probes for evaluating sensitivity to geometric factors, and tools for seamlessly converting crystal structures into text. Using MatText, we conduct an extensive analysis of the capabilities of language models in modeling materials. Our findings reveal that current language models consistently struggle to capture the geometric information crucial for materials modeling across all representations. Instead, these models tend to leverage local information, which is emphasized in some of our novel representations. Our analysis underscores MatText's ability to reveal shortcomings of text-based methods for materials design.

  • 3 authors
·
Jun 25, 2024