Daily Papers

by AK and the research community

Apr 14

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a system that operates robustly in real-time, in small and large, indoor and outdoor environments, and is 2 to 5 times more accurate than previous approaches. The second main novelty is a multiple map system that relies on a new place recognition method with improved recall. Thanks to it, ORB-SLAM3 is able to survive long periods of poor visual information: when it gets lost, it starts a new map that will be seamlessly merged with previous maps when revisiting mapped areas. Compared with visual odometry systems that only use information from the last few seconds, ORB-SLAM3 is the first system able to reuse all previous information in all the algorithm stages. This allows bundle adjustment to include co-visible keyframes that provide high-parallax observations, boosting accuracy, even if they are widely separated in time or come from a previous mapping session. Our experiments show that, in all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature, and significantly more accurate. Notably, our stereo-inertial SLAM achieves an average accuracy of 3.6 cm on the EuRoC drone and 9 mm under quick hand-held motions in the room of the TUM-VI dataset, a setting representative of AR/VR scenarios. For the benefit of the community we make the source code public.

  • 5 authors
·
Jul 23, 2020

InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Large Language Models (LLMs) have shown substantial advances through reinforcement learning (RL), particularly in domains where rewards can be programmatically verified, such as mathematics and code. In these areas, models benefit from a well-defined operational base guided by explicit rule-based objectives. However, this progress reveals a significant limitation: in open-ended domains where rewards are ambiguous, subjective, or context-dependent, such as creative writing, scientific reasoning, and notably medical consultation, robust reward functions are lacking, making these areas challenging for current RL strategies. To bridge this gap, we introduce ORBIT, an open-ended rubric-based incremental training framework specifically designed for high-stakes medical dialogue. ORBIT integrates synthetic dialogue generation with the dynamic creation of rubrics, employing these rubrics to direct an incremental RL process. In particular, this approach does not depend on external medical knowledge or manual rules, instead utilizing rubric-guided feedback to shape learning. When implemented on the Qwen3-4B-Instruct model, our method can greatly enhance its performance on the HealthBench-Hard benchmark from 7.0 to 27.2 using only 2k samples, thus achieving state-of-the-art results for models of this scale. Our analysis confirms that rubric-driven RL fosters consistent performance gains across diverse consultation scenarios, going beyond simple numerical improvements. These findings underscore rubric-based feedback as a scalable strategy for advancing LLMs in intricate, open-ended tasks.

  • 6 authors
·
Oct 17, 2025

Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory

We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over time in response to an external excitation, enabling first-principles predictions of physical properties such as optical absorption, electron dynamics, and high-order response. However, conventional real-time TDDFT relies on time-consuming propagation of all occupied states with fine time steps. In this work, we propose OrbEvo, which is based on an equivariant graph transformer architecture and learns to evolve the full electronic wavefunction coefficients across time steps. First, to account for external field, we design an equivariant conditioning to encode both strength and direction of external electric field and break the symmetry from SO(3) to SO(2). Furthermore, we design two OrbEvo models, OrbEvo-WF and OrbEvo-DM, using wavefunction pooling and density matrix as interaction method, respectively. Motivated by the central role of the density functional in TDDFT, OrbEvo-DM encodes the density matrix aggregated from all occupied electronic states into feature vectors via tensor contraction, providing a more intuitive approach to learn the time evolution operator. We adopt a training strategy specifically tailored to limit the error accumulation of time-dependent wavefunctions over autoregressive rollout. To evaluate our approach, we generate TDDFT datasets consisting of 5,000 different molecules in the QM9 dataset and 1,500 molecular configurations of the malonaldehyde molecule in the MD17 dataset. Results show that our OrbEvo model accurately captures quantum dynamics of excited states under external field, including time-dependent wavefunctions, time-dependent dipole moment, and optical absorption spectra.

  • 6 authors
·
Mar 3

ORBIT: An Object Property Reasoning Benchmark for Visual Inference Tasks

While vision-language models (VLMs) have made remarkable progress on many popular visual question answering (VQA) benchmarks, it remains unclear whether they abstract and reason over depicted objects. Inspired by human object categorisation, object property reasoning involves identifying and recognising low-level details and higher-level abstractions. While current VQA benchmarks consider a limited set of object property attributes like size, they typically blend perception and reasoning, and lack representativeness in terms of reasoning and image categories. To this end, we introduce a systematic evaluation framework with images of three representative types, three reasoning levels of increasing complexity, and four object property dimensions driven by prior work on commonsense reasoning. We develop a procedure to instantiate this benchmark into ORBIT, a multi-level reasoning VQA benchmark for object properties comprising 360 images paired with a total of 1,080 count-based questions. Experiments with 12 state-of-the-art VLMs in zero-shot settings reveal significant limitations compared to humans, with the best-performing model only reaching 40% accuracy. VLMs struggle particularly with realistic (photographic) images, counterfactual reasoning about physical and functional properties, and higher counts. ORBIT points to the need to develop methods for scalable benchmarking, generalize annotation guidelines, and explore additional reasoning VLMs. We make the ORBIT benchmark and the experimental code available to support such endeavors.

  • 5 authors
·
Aug 14, 2025

KineticNet: Deep learning a transferable kinetic energy functional for orbital-free density functional theory

Orbital-free density functional theory (OF-DFT) holds the promise to compute ground state molecular properties at minimal cost. However, it has been held back by our inability to compute the kinetic energy as a functional of the electron density only. We here set out to learn the kinetic energy functional from ground truth provided by the more expensive Kohn-Sham density functional theory. Such learning is confronted with two key challenges: Giving the model sufficient expressivity and spatial context while limiting the memory footprint to afford computations on a GPU; and creating a sufficiently broad distribution of training data to enable iterative density optimization even when starting from a poor initial guess. In response, we introduce KineticNet, an equivariant deep neural network architecture based on point convolutions adapted to the prediction of quantities on molecular quadrature grids. Important contributions include convolution filters with sufficient spatial resolution in the vicinity of the nuclear cusp, an atom-centric sparse but expressive architecture that relays information across multiple bond lengths; and a new strategy to generate varied training data by finding ground state densities in the face of perturbations by a random external potential. KineticNet achieves, for the first time, chemical accuracy of the learned functionals across input densities and geometries of tiny molecules. For two electron systems, we additionally demonstrate OF-DFT density optimization with chemical accuracy.

  • 5 authors
·
May 8, 2023

OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy

We present OrbNet Denali, a machine learning model for electronic structure that is designed as a drop-in replacement for ground-state density functional theory (DFT) energy calculations. The model is a message-passing neural network that uses symmetry-adapted atomic orbital features from a low-cost quantum calculation to predict the energy of a molecule. OrbNet Denali is trained on a vast dataset of 2.3 million DFT calculations on molecules and geometries. This dataset covers the most common elements in bio- and organic chemistry (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I) as well as charged molecules. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chemical problems that are not present in the training set, OrbNet Denali produces a mean absolute error comparable to those of DFT methods. For the Hutchison conformers benchmark set, OrbNet Denali has a median correlation coefficient of R^2=0.90 compared to the reference DLPNO-CCSD(T) calculation, and R^2=0.97 compared to the method used to generate the training data (wB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chemical accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of wB97X-D3/def2-TZVP with an average MAE of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset.

  • 11 authors
·
Jul 1, 2021

Orbital Graph Convolutional Neural Network for Material Property Prediction

Material representations that are compatible with machine learning models play a key role in developing models that exhibit high accuracy for property prediction. Atomic orbital interactions are one of the important factors that govern the properties of crystalline materials, from which the local chemical environments of atoms are inferred. Therefore, to develop robust machine learning models for material property prediction, it is imperative to include features representing such chemical attributes. Here, we propose the Orbital Graph Convolutional Neural Network (OGCNN), a crystal graph convolutional neural network framework that includes atomic orbital interaction features and learns material properties in a robust way. In addition, we embedded an encoder-decoder network into the OGCNN, enabling it to learn important features among basic atomic (elemental features), orbital-orbital interactions, and topological features. We examined the performance of this model on a broad range of crystalline material data to predict different properties. We benchmarked the performance of the OGCNN model against that of: 1) the crystal graph convolutional neural network (CGCNN), 2) other state-of-the-art descriptors for material representations including Many-body Tensor Representation (MBTR) and the Smooth Overlap of Atomic Positions (SOAP), and 3) other conventional regression machine learning algorithms where different crystal featurization methods have been used. We find that OGCNN significantly outperforms them. The OGCNN model with high predictive accuracy can be used to discover new materials among the immense phase and compound spaces of materials.

  • 6 authors
·
Aug 14, 2020

ORBSLAM-Atlas: a robust and accurate multi-map system

We propose ORBSLAM-Atlas, a system able to handle an unlimited number of disconnected sub-maps, which includes a robust map merging algorithm able to detect sub-maps with common regions and seamlessly fuse them. The outstanding robustness and accuracy of ORBSLAM are due to its ability to detect wide-baseline matches between keyframes and to exploit them by means of non-linear optimization; however, it can only handle a single map. ORBSLAM-Atlas brings the wide-baseline matching detection and exploitation to the multiple map arena. The result is a SLAM system significantly more general and robust, able to perform multi-session mapping. If tracking is lost during exploration, instead of freezing the map, a new sub-map is launched, and it can be fused with the previous map when common parts are visited. Our criteria to declare the camera lost contrast with previous approaches that simply count the number of tracked points: we propose to also discard inaccurately estimated camera poses due to bad geometrical conditioning. As a result, the map is split into more accurate sub-maps that are eventually merged into a more accurate global map, thanks to the multi-mapping capabilities. We provide extensive experimental validation in the EuRoC datasets, where ORBSLAM-Atlas obtains accurate monocular and stereo results in the difficult sequences where ORBSLAM failed. We also build global maps after multiple sessions in the same room, obtaining the best results to date, between 2 and 3 times more accurate than competing multi-map approaches. We also show the robustness and capability of our system to deal with dynamic scenes, quantitatively in the EuRoC datasets and qualitatively in a densely populated corridor where camera occlusions and tracking losses are frequent.

  • 3 authors
·
Aug 30, 2019

From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images

City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation, where our goal is to synthesize ground-level novel views from sparse orbital images with minimal parallax. This requires inferring nearly $90^\circ$ viewpoint gaps from image sources with severely foreshortened facades and flawed textures, causing state-of-the-art reconstruction engines such as NeRF and 3DGS to fail. To address this problem, we propose two design choices tailored for city structures and satellite inputs. First, we model city geometry as a 2.5D height map, implemented as a Z-monotonic signed distance field (SDF) that matches urban building layouts from top-down viewpoints. This stabilizes geometry optimization under sparse, off-nadir satellite views and yields a watertight mesh with crisp roofs and clean, vertically extruded facades. Second, we paint the mesh appearance from satellite images via differentiable rendering techniques. While the satellite inputs may contain long-range, blurry captures, we further train a generative texture restoration network to enhance the appearance, recovering high-frequency, plausible texture details from degraded inputs. Our method's scalability and robustness are demonstrated through extensive experiments on large-scale urban reconstruction. For example, in our teaser figure, we reconstruct a $4\,\mathrm{km}^2$ real-world region from only a few satellite images, achieving state-of-the-art performance in synthesizing photorealistic ground views. The resulting models are not only visually compelling but also serve as high-fidelity, application-ready assets for downstream tasks like urban planning and simulation. The project page can be found at https://pku-vcl-geometry.github.io/Orbit2Ground/.

  • 13 authors
·
Dec 8, 2025

Radii, masses, and transit-timing variations of the three-planet system orbiting the naked-eye star TOI-396

TOI-396 is an F6V star ($V\approx6.4$) orbited by three transiting planets. The orbital periods of the two innermost planets are close to the 5:3 commensurability ($P_b\sim3.6$ d and $P_c\sim6.0$ d). To measure the masses of the three planets, refine their radii, and investigate whether planets b and c are in mean-motion resonance (MMR), we carried out HARPS radial-velocity (RV) observations and retrieved photometric data from TESS. We extracted the RVs via a skew-normal fit onto the HARPS cross-correlation functions (CCFs) and performed an MCMC joint analysis of the Doppler measurements and transit photometry, while employing the breakpoint method to remove stellar activity from the RV time series. We also performed a thorough transit-timing variation (TTV) dynamical analysis of the system. Our analysis confirms that the three planets have similar sizes: $R_b=2.004_{-0.047}^{+0.045}\,R_{\oplus}$; $R_c=1.979_{-0.051}^{+0.054}\,R_{\oplus}$; $R_d=2.001_{-0.064}^{+0.063}\,R_{\oplus}$. For the first time, we have determined the RV masses for TOI-396b and d: $M_b=3.55_{-0.96}^{+0.94}\,M_{\oplus}$ ($\rho_b=2.44_{-0.68}^{+0.69}$ g cm$^{-3}$) and $M_d=7.1\pm1.6\,M_{\oplus}$ ($\rho_d=4.9_{-1.1}^{+1.2}$ g cm$^{-3}$). Our results suggest a quite unusual system architecture, with the outermost planet being the densest. The Doppler reflex motion induced by TOI-396c remains undetected in our RV time series, likely due to the proximity of $P_c$ to the star's rotation period ($P_{\mathrm{rot}}=6.7\pm1.3$ d). We also discovered that TOI-396b and c display significant TTVs. While the TTV dynamical analysis returns a formally precise mass for TOI-396c ($M_{c,\mathrm{dyn}}=2.24^{+0.13}_{-0.67}\,M_{\oplus}$), the result might not be accurate owing to the poor sampling of the TTV phase. We also conclude that TOI-396b and c are close to but out of the 5:3 MMR. Our numerical simulation suggests TTV semi-amplitudes of up to 5 hours over a temporal baseline of $\sim5.2$ years.

  • 41 authors
·
Nov 22, 2024
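
The bulk densities quoted in the TOI-396 abstract follow from the masses and radii through the scaling rho = rho_Earth * (M / M_Earth) / (R / R_Earth)^3. The short sketch below redoes that arithmetic from the central values in the abstract. The Earth mean density of 5.51 g cm^-3 and the function name `bulk_density` are assumptions introduced here for illustration; the abstract's densities presumably come from the full posterior, so small differences from this point estimate are expected.

```python
# Point-estimate check of the densities quoted in the TOI-396 abstract.
# Assumption (not from the abstract): Earth's mean density is ~5.51 g/cm^3.
EARTH_MEAN_DENSITY_G_CM3 = 5.51


def bulk_density(mass_earth: float, radius_earth: float) -> float:
    """Bulk density in g/cm^3 for a planet given in Earth masses and Earth radii."""
    return EARTH_MEAN_DENSITY_G_CM3 * mass_earth / radius_earth**3


# Central values taken from the abstract.
rho_b = bulk_density(3.55, 2.004)  # abstract quotes 2.44 g/cm^3
rho_d = bulk_density(7.1, 2.001)   # abstract quotes 4.9 g/cm^3

# The rounded periods (3.6 d and 6.0 d) give a ratio indistinguishable from 5/3,
# so the small offset from exact resonance cannot be recovered from them.
period_ratio = 6.0 / 3.6

print(f"rho_b ~ {rho_b:.2f} g/cm^3, rho_d ~ {rho_d:.2f} g/cm^3")
print(f"P_c / P_b ~ {period_ratio:.4f} (5/3 = {5 / 3:.4f})")
```

Because the three radii are nearly identical, the density ordering simply tracks the mass ordering, which is why the outermost planet comes out as the densest.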

Hidden orbital polarization in diamond, silicon, germanium, gallium arsenide and layered materials

It was previously believed that the Bloch electronic states of non-magnetic materials with inversion symmetry cannot have finite spin polarizations. However, since the seminal work by Zhang et al. [Nat. Phys. 10, 387-393 (2014)] on local spin polarizations of Bloch states in non-magnetic, centrosymmetric materials, the scope of spintronics has been significantly broadened. Here, we show, using a framework that is universally applicable independent of whether hidden spin polarizations are small (e.g., diamond, Si, Ge, and GaAs) or large (e.g., MoS2 and WSe2), that the corresponding quantity arising from orbital - instead of spin - degrees of freedom, the hidden orbital polarization, is (i) much more abundant in nature since it exists even without spin-orbit coupling and (ii) more fundamental since the interband matrix elements of the site-dependent orbital angular momentum operator determines the hidden spin polarization. We predict that the hidden spin polarization of transition metal dichalcogenides is reduced significantly upon compression. We suggest experimental signatures of hidden orbital polarization from photoemission spectroscopies and demonstrate that the current-induced hidden orbital polarization may play a far more important role than its spin counterpart in antiferromagnetic information technology by calculating the current-driven antiferromagnetism in compressed silicon.

  • 2 authors
·
Aug 21, 2016

Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3

Autonomous navigation in GPS-denied and visually degraded environments remains challenging for unmanned aerial vehicles (UAVs). To this end, we investigate the use of a monocular thermal camera as a standalone sensor on a UAV platform for real-time depth estimation and simultaneous localization and mapping (SLAM). To extract depth information from thermal images, we propose a novel pipeline employing a lightweight supervised network with recurrent blocks (RBs) integrated to capture temporal dependencies, enabling more robust predictions. The network combines lightweight convolutional backbones with a thermal refinement network (T-RefNet) to refine raw thermal inputs and enhance feature visibility. The refined thermal images and predicted depth maps are integrated into ORB-SLAM3, enabling thermal-only localization. Unlike previous methods, the network is trained on a custom non-radiometric dataset, obviating the need for high-cost radiometric thermal cameras. Experimental results on datasets and UAV flights demonstrate competitive depth accuracy and robust SLAM performance under low-light conditions. On the radiometric VIVID++ (indoor-dark) dataset, our method achieves an absolute relative error of approximately 0.06, compared to baselines exceeding 0.11. In our non-radiometric indoor set, baseline errors remain above 0.24, whereas our approach remains below 0.10. Thermal-only ORB-SLAM3 maintains a mean trajectory error under 0.4 m.

  • 5 authors
·
Mar 16

The S2 orbit and tidally disrupted binaries: indications for collisional depletion in the Galactic center

The properties of the stellar cluster surrounding Sagittarius A* can be assessed indirectly through the motion of the S-stars. Specifically, the current accuracy to which the prograde precession of the S2 star is measured allows significant constraints to be placed on the extended mass enclosed by its orbit. We suggest that high velocity destructive collisions (DCs) offer a natural mechanism for depleting the mass inside the S2 orbit, thus reconciling the measured precession with the existence of a dense stellar cluster. Such a solution is especially necessary when considering that stars are supplied to the inner part of the cluster both by dynamical relaxation and by stars being captured in tight orbits during tidal disruption of binaries. We use analytic arguments and results from simulations to demonstrate that in order to obtain a precession that is consistent with observations, collisional depletion is necessary if the capture rate is greater than a few $10^{-6}\,\mathrm{yr}^{-1}$. We also show that fluctuations arising from the finite number of stars cannot serve as an alternative to DCs for generating consistency with the observed S2 precession. We conclude that astrometric observations of the S-stars provide a meaningful indication that the inner part of our galactic center is shaped by collisional depletion, supporting the hypothesis that DCs occur in galactic nuclei at an astrophysically significant rate.

  • 2 authors
·
Dec 10, 2024

Colors and Dynamics of a Near-Sun Orbital Asteroid Family: 2021 PH27 and 2025 GN1

We observed the dynamically similar near-Sun asteroids 2021 PH27 and 2025 GN1 for their optical colors. These objects have the lowest known semi-major axes of any asteroids. 2021 PH27 has the largest general relativistic effects of any known solar system object. The small semi-major axis and very close passage to the Sun suggest that the extreme thermal and gravitational environment should highly modify these asteroids' surfaces. From g', r', i' and z'-band imaging, we find the colors of 2021 PH27 to be between the two major asteroid types, the S and C classes (g'-r' = 0.58 ± 0.02, r'-i' = 0.12 ± 0.02 and i'-z' = -0.08 ± 0.05 mags). With a spectral slope of 6.8 ± 0.03 percent per 100 nm, 2021 PH27 is an X-type asteroid and requires albedo or spectral features to further identify its composition. We find the dynamically similar 2025 GN1 also has colors very similar to those of 2021 PH27 (g'-r' = 0.55 ± 0.06 and r'-i' = 0.14 ± 0.04), suggesting these objects are fragments from a once larger parent asteroid or that 2021 PH27 is shedding material. The colors are not blue like those of some other near-Sun asteroids such as 3200 Phaethon, whose blue colors have been interpreted as resulting from the loss of reddening substances at extreme temperatures. There is no evidence of activity or a large amplitude period for 2021 PH27, whereas 2025 GN1 might have a more significant rotational light curve. 2025 GN1 may have a very close encounter with, or hit, Venus in about 2155 years and likely separated from 2021 PH27 in about the last 10 kyrs.

  • 9 authors
·
Apr 22, 2025

Origin of Phobos and Deimos : Orbital evolution shortly after formation from a potential dislocation

This paper deals with the formation and evolution of Mars' moons, Phobos and Deimos, assuming the dislocation of a larger progenitor as the origin of these moons. The study by Hyodo et al. (2022) argues that, under somewhat simplistic modeling, the post-dislocation orbits of Phobos and Deimos inevitably collide within 10,000 years, leading to their mutual annihilation. These findings are based on N-body simulations, accounting for Mars' $J_2$ and $J_4$ gravitational perturbations and mutual perturbations between the moons. In this paper, we challenge these findings by extending their work. We incorporate important perturbations such as solar perturbations, Mars' axial precession and nutation, and its deformation along three axes. We also extend some of the hypotheses made by Hyodo et al. (2022) concerning the initial distribution of Phobos and Deimos after the dislocation. Our analysis reveals that including these additional perturbations, as well as the possibility of having more than two fragments after the dislocation, does not alter the ultimate fate of Phobos and Deimos. The moons still converge towards collision within comparable timescales, supporting the conclusion of Hyodo et al. (2022) that the dislocation hypothesis under the dynamical scenario developed by Bagheri et al. (2021) has, under the best conditions, about a 10% chance of surviving the first 100,000 years following their formation.

  • 3 authors
·
Apr 11, 2025

PECCARY: A novel approach for characterizing orbital complexity, stochasticity, and regularity

Permutation Entropy and statistiCal Complexity Analysis for astRophYsics (PECCARY) is a computationally inexpensive, statistical method by which any time-series can be characterized as predominantly regular, complex, or stochastic. Elements of the PECCARY method have been used in a variety of physical, biological, economic, and mathematical scenarios, but have not yet gained traction in the astrophysical community. This study introduces the PECCARY technique with the specific aims to motivate its use in and optimize it for the analysis of astrophysical orbital systems. PECCARY works by decomposing a time-dependent measure, such as the x-coordinate or orbital angular momentum time-series, into ordinal patterns. Due to its unique approach and statistical nature, PECCARY is well-suited for detecting preferred and forbidden patterns (a signature of chaos), even when the chaotic behavior is short-lived or when working with a relatively short duration time-series or small sets of time-series data. A variety of examples are used to demonstrate the capabilities of PECCARY. These include mathematical examples (sine waves, varieties of noise, sums of sine waves, well-known chaotic functions), a double pendulum system, and astrophysical tracer particle simulations with potentials of varying intricacies. Since the adopted timescale used to diagnose a given time-series can affect the outcome, a method is presented to identify an ideal sampling scheme, constrained by the overall duration and the natural timescale of the system. The accompanying PECCARY Python package and its usage are discussed.

  • 3 authors
·
Jul 16, 2024
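
The ordinal-pattern decomposition described in the PECCARY abstract can be illustrated with a minimal permutation-entropy sketch. This is not the PECCARY package's API: the function names, the default pattern length `order=3`, and the sampling `delay=1` are assumptions chosen for illustration, and only the permutation entropy is computed (the statistical complexity measure is omitted).

```python
import math
from collections import Counter

import numpy as np


def ordinal_pattern_distribution(series, order=3, delay=1):
    """Relative frequency of each ordinal pattern of length `order` in `series`."""
    x = np.asarray(series, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for this order and delay")
    counts = Counter()
    for start in range(n_windows):
        # Take `order` samples spaced `delay` apart and record their rank ordering.
        window = x[start : start + order * delay : delay]
        counts[tuple(np.argsort(window, kind="stable"))] += 1
    return {pattern: c / n_windows for pattern, c in counts.items()}


def permutation_entropy(series, order=3, delay=1):
    """Shannon entropy of the ordinal patterns, normalized to [0, 1] by log(order!)."""
    probs = ordinal_pattern_distribution(series, order, delay).values()
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(math.factorial(order))


# A monotonic ramp shows a single pattern; white noise uses all patterns about equally.
ramp = np.arange(200)
noise = np.random.default_rng(0).normal(size=5000)

h_ramp = permutation_entropy(ramp)
h_noise = permutation_entropy(noise)
n_forbidden_ramp = math.factorial(3) - len(ordinal_pattern_distribution(ramp))
```

In this sketch a regular signal gives an entropy near 0 and a stochastic one gives an entropy near 1, while patterns that never occur (here counted as `n_forbidden_ramp`) correspond to the forbidden patterns mentioned in the abstract. The choice of `delay` plays the role of the sampling timescale that the abstract says can affect the outcome.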

The Impact of Stellar Flares on the Atmospheric Escape of Exoplanets orbiting M stars I: Insights from the AU Mic System

The X-ray and extreme ultraviolet (XUV) emission from M stars can drive atmospheric escape on planets orbiting them. M stars are also known for their frequent emission of stellar flares, which will increase the high-energy flux received by their orbiting planets. To understand how stellar flares impact the primordial atmospheres of planets orbiting young M stars, we use UV spectroscopic data of flares from the Habitable Zones and M dwarf Activity across Time (HAZMAT) and Measurements of the Ultraviolet Spectral Characteristics of Low-mass Exoplanetary Systems (MUSCLES) programs as a proxy for the XUV flare emission. Using the software package VPLanet, we simulate the young AU Mic planetary system composed of two Neptune-sized and one Earth-sized planet orbiting a 23-Myr-old M1 star. Our findings show that the Earth-sized planet AU Mic d should completely lose its atmosphere within the next couple of million years, solely due to the quiescent emission, with flares not significantly contributing to its atmospheric escape due to the small size of AU Mic d and its close-in distance from the star. However, our results indicate that flares would play a crucial role for such planets further away, in the habitable zone (i.e. 0.2935 AU) of AU Mic-like stars during the post-saturation phase, accelerating the total atmospheric loss process by a few billion years. For planets between 0.365 AU and the HZ outer edge, the additional XUV from flares is necessary to deplete primordial atmospheres fully since the quiescent emission alone is insufficient.

  • 4 authors
·
Mar 17, 2025

Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits

Proper evaluations are crucial for better understanding, troubleshooting, and interpreting model behaviors and for further improving model performance. While using scalar-based error metrics provides a fast way to overview model performance, they are often too abstract to display certain weak spots and lack information regarding important model properties, such as robustness. This not only hinders machine learning models from being more interpretable and gaining trust, but can also be misleading to both model developers and users. Additionally, conventional evaluation procedures often leave researchers unclear about where and how models fail, which complicates model comparisons and further developments. To address these issues, we propose a novel evaluation workflow, named Non-Equivariance Revealed on Orbits (NERO) Evaluation. The goal of NERO evaluation is to turn focus from traditional scalar-based metrics onto evaluating and visualizing model equivariance, closely capturing model robustness, as well as to allow researchers to quickly investigate interesting or unexpected model behaviors. NERO evaluation consists of a task-agnostic interactive interface and a set of visualizations, called NERO plots, which reveal the equivariance property of the model. Case studies on how NERO evaluation can be applied to multiple research areas, including 2D digit recognition, object detection, particle image velocimetry (PIV), and 3D point cloud classification, demonstrate that NERO evaluation can quickly illustrate different model equivariance and effectively explain model behaviors through interactive visualizations of the model outputs. In addition, we propose consensus, an alternative to ground truths, to be used in NERO evaluation so that model equivariance can still be evaluated with new, unlabeled datasets.

  • 5 authors
·
May 31, 2023
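
The idea of evaluating a model over an orbit, as in the NERO abstract, can be sketched as follows: apply every element of a transformation group to one input, run the model on each transformed copy, and look at how the outputs vary. This is only a toy illustration under stated assumptions, not the NERO interface: the group is taken to be the four 90-degree rotations of a 2D array, the "models" are hypothetical scalar functions, and the task is one where the ideal behavior is invariance (the simplest case of equivariance).

```python
import numpy as np


def rotation_orbit(image):
    """Orbit of a 2D array under rotations by 0, 90, 180 and 270 degrees."""
    return [np.rot90(image, k) for k in range(4)]


def orbit_outputs(model, image):
    """Model output at every point of the orbit, in rotation order."""
    return np.array([model(view) for view in rotation_orbit(image)])


def orbit_spread(model, image):
    """Max minus min of the outputs over the orbit; 0 means rotation-invariant here."""
    outputs = orbit_outputs(model, image)
    return float(outputs.max() - outputs.min())


# Hypothetical toy models: one ignores orientation, one only looks at the top row.
def invariant_model(image):
    return float(image.sum())


def top_row_model(image):
    return float(image[0].sum())


image = np.arange(16, dtype=float).reshape(4, 4)
spread_invariant = orbit_spread(invariant_model, image)
spread_sensitive = orbit_spread(top_row_model, image)
```

A single scalar metric on the untransformed input would not separate the two toy models, whereas the per-orbit outputs do. Plotting `orbit_outputs` against the rotation angle is the kind of view the abstract's NERO plots are described as providing.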

The Hubble Legacy Fields (HLF-GOODS-S) v1.5 Data Products: Combining 2442 Orbits of GOODS-S/CDF-S Region ACS and WFC3/IR Images

We have submitted to MAST the 1.5 version data release of the Hubble Legacy Fields (HLF) project covering a 25 x 25 arcmin area over the GOODS-S (ECDF-S) region from the HST archival program AR-13252. The release combines exposures from Hubble's two main cameras, the Advanced Camera for Surveys (ACS/WFC) and the Wide Field Camera 3 (WFC3/IR), taken over more than a decade between mid-2002 and the end of 2016. The HLF includes essentially all optical (ACS/WFC F435W, F606W, F775W, F814W and F850LP filters) and infrared (WFC3/IR F098M, F105W, F125W, F140W and F160W filters) data taken by Hubble over the original CDF-S region including the GOODS-S, ERS, CANDELS and many other programs (31 in total). The data has been released at https://archive.stsci.edu/prepds/hlf/ as images with a common astrometric reference frame, with corresponding inverse variance weight maps. We provide one image per filter of WFC3/IR images at 60 mas per pixel resolution and two ACS/WFC images per filter, at both 30 and 60 mas per pixel. Since this comprehensive dataset combines data from 31 programs on the GOODS-S/CDF-S, the AR proposal identified the MAST products by the global name "Hubble Legacy Field", with this region being identified by "HLF-GOODS-S". This dataset complements that of the Frontier Fields program. The total exposure time incorporated in the HLF-GOODS-S is 5.8 Msec in 7211 exposures from 2442 orbits. This is ~70% of a full HST cycle!

  • 10 authors
·
Jun 2, 2016

Chemical abundances and kinematics of 257 G-, K-type field giants. Setting a base for further analysis of giant-planet properties orbiting evolved stars

We performed a uniform and detailed abundance analysis of 12 refractory elements (Na, Mg, Al, Si, Ca, Ti, Cr, Ni, Co, Sc, Mn, and V) for a sample of 257 G- and K-type evolved stars from the CORALIE planet search program. To date, only one of these stars is known to harbor a planetary companion. We aimed to characterize this large sample of evolved stars in terms of chemical abundances and kinematics, thus setting a solid base for further analysis of planetary properties around giant stars. This sample, being homogeneously analyzed, can be used as a comparison sample for other planet-related studies, as well as for different types of studies related to stellar and Galactic astrophysics. The abundances of the chemical elements were determined using an LTE abundance analysis relative to the Sun, with the spectral synthesis code MOOG and a grid of Kurucz ATLAS9 atmospheres. To separate the Galactic stellar populations, both a purely kinematical approach and a chemical method were applied. We confirm the overabundance of Na in giant stars compared to the field FGK dwarfs. This enhancement might have a stellar evolutionary character, but departures from LTE may also produce a similar enhancement. Our chemical separation of stellar populations also suggests a "gap" in metallicity between the thick-disk and high-alpha metal-rich stars, as previously observed in the dwarf sample from HARPS. The present sample, like most giant star samples, also suffers from the B - V colour cut-off, which excludes low-log g stars with high metallicities and high-log g stars with low [Fe/H]. For future studies of the dependence of planet occurrence on stellar metallicity around these evolved stars, we suggest using a sub-sample of stars in a "cut-rectangle" in the log g - [Fe/H] diagram to overcome the aforementioned issue.

  • 12 authors
·
Mar 28, 2015

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that training value functions with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

  • 12 authors
·
Mar 6, 2024
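
To make the "classification in place of regression" idea in the entry above concrete, here is a minimal NumPy sketch. It assumes a "two-hot" encoding, in which a scalar target is spread over the two nearest bin centres; this is one common way to turn a scalar into a categorical target and is chosen here for brevity, not because the abstract specifies it. The bin range, bin count, and the function names `two_hot` and `cross_entropy` are all illustrative assumptions.

```python
import numpy as np

def two_hot(target, bins):
    """Encode a scalar as probabilities on its two neighbouring bin centres."""
    target = float(np.clip(target, bins[0], bins[-1]))
    upper = int(np.searchsorted(bins, target, side="left"))
    upper = min(max(upper, 1), len(bins) - 1)
    lower = upper - 1
    # Linear interpolation weight, so that probs @ bins == target.
    weight_upper = (target - bins[lower]) / (bins[upper] - bins[lower])
    probs = np.zeros(len(bins))
    probs[lower] = 1.0 - weight_upper
    probs[upper] = weight_upper
    return probs

def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between softmax(logits) and target_probs."""
    shifted = logits - logits.max()               # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target_probs * log_probs).sum())

bins = np.linspace(-1.0, 1.0, 5)      # assumed value support: 5 bin centres
target_probs = two_hot(0.25, bins)    # scalar (e.g. bootstrapped) target 0.25
logits = np.zeros(len(bins))          # stand-in for a network's output head

loss = cross_entropy(logits, target_probs)   # replaces (q - target) ** 2
decoded_target = float(target_probs @ bins)  # expectation recovers the scalar

print(target_probs, loss, decoded_target)
```

At inference time the scalar value estimate would be read back as the expectation of the predicted distribution over `bins`, in the same way `decoded_target` recovers the encoded scalar here.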

Rapid patient-specific neural networks for intraoperative X-ray to volume registration

The integration of artificial intelligence in image-guided interventions holds transformative potential, promising to extract 3D geometric and quantitative information from conventional 2D imaging modalities during complex procedures. Achieving this requires the rapid and precise alignment of 2D intraoperative images (e.g., X-ray) with 3D preoperative volumes (e.g., CT, MRI). However, current 2D/3D registration methods fail across the broad spectrum of procedures dependent on X-ray guidance: traditional optimization techniques require custom parameter tuning for each subject, whereas neural networks trained on small datasets do not generalize to new patients or require labor-intensive manual annotations, increasing clinical burden and precluding application to new anatomical targets. To address these challenges, we present xvr, a fully automated framework for training patient-specific neural networks for 2D/3D registration. xvr uses physics-based simulation to generate abundant high-quality training data from a patient's own preoperative volumetric imaging, thereby overcoming the inherently limited ability of supervised models to generalize to new patients and procedures. Furthermore, xvr requires only 5 minutes of training per patient, making it suitable for emergency interventions as well as planned procedures. We perform the largest evaluation of a 2D/3D registration algorithm on real X-ray data to date and find that xvr robustly generalizes across a diverse dataset comprising multiple anatomical structures, imaging modalities, and hospitals. Across surgical tasks, xvr achieves submillimeter-accurate registration at intraoperative speeds, improving upon existing methods by an order of magnitude. xvr is released as open-source software freely available at https://github.com/eigenvivek/xvr.

  • 8 authors
·
Mar 20, 2025