new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 15

Indirect measurement of atomic magneto-optical rotation via Hilbert transform

The Kramers-Kronig relations are a pivotal foundation of linear optics and atomic physics, embedding a physical connection between the real and imaginary components of any causal response function. A mathematically equivalent, but simpler, approach instead utilises the Hilbert transform. In a previous study, the Hilbert transform was applied to absorption spectra in order to infer the sole refractive index of an atomic medium in the absence of an external magnetic field. The presence of a magnetic field causes the medium to become birefringent and dichroic, and therefore it is instead characterised by two refractive indices. In this study, we apply the same Hilbert transform technique to independently measure both refractive indices of a birefringent atomic medium, leading to an indirect measurement of atomic magneto-optical rotation. Key to this measurement is the insight that inputting specific light polarisations into an atomic medium induces absorption associated with only one of the refractive indices. We show this is true in two configurations, commonly referred to in literature as the Faraday and Voigt geometries, which differ by the magnetic field orientation with respect to the light wavevector. For both cases, we measure the two refractive indices independently for a Rb thermal vapour in a 0.6 T magnetic field, finding excellent agreement with theory. This study further emphasises the application of the Hilbert transform to the field of quantum and atomic optics in the linear regime.

  • 4 authors
·
Mar 1, 2024

Space-time tradeoffs of lenses and optics via higher category theory

Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category 2-Optic(C) whose 2-cells explicitly track optics' internal configuration. We show that the 1-category Optic(C) arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic.

  • 1 authors
·
Sep 19, 2022

Towards Robust and Generalizable Lensless Imaging with Modular Learned Reconstruction

Lensless cameras disregard the conventional design that imaging should mimic the human eye. This is done by replacing the lens with a thin mask, and moving image formation to the digital post-processing. State-of-the-art lensless imaging techniques use learned approaches that combine physical modeling and neural networks. However, these approaches make simplifying modeling assumptions for ease of calibration and computation. Moreover, the generalizability of learned approaches to lensless measurements of new masks has not been studied. To this end, we utilize a modular learned reconstruction in which a key component is a pre-processor prior to image recovery. We theoretically demonstrate the pre-processor's necessity for standard image recovery techniques (Wiener filtering and iterative algorithms), and through extensive experiments show its effectiveness for multiple lensless imaging approaches and across datasets of different mask types (amplitude and phase). We also perform the first generalization benchmark across mask types to evaluate how well reconstructions trained with one system generalize to others. Our modular reconstruction enables us to use pre-trained components and transfer learning on new systems to cut down weeks of tedious measurements and training. As part of our work, we open-source four datasets, and software for measuring datasets and for training our modular reconstruction.

  • 3 authors
·
Feb 3, 2025

OPTIAGENT: A Physics-Driven Agentic Framework for Automated Optical Design

Optical design is the process of configuring optical elements to precisely manipulate light for high-fidelity imaging. It is inherently a highly non-convex optimization problem that relies heavily on human heuristic expertise and domain-specific knowledge. While Large Language Models (LLMs) possess extensive optical knowledge, their capabilities in leveraging the knowledge in designing lens system remain significantly constrained. This work represents the first attempt to employ LLMs in the field of optical design. We bridge the expertise gap by enabling users without formal optical training to successfully develop functional lens systems. Concretely, we curate a comprehensive dataset, named OptiDesignQA, which encompasses both classical lens systems sourced from standard optical textbooks and novel configurations generated by automated design algorithms for training and evaluation. Furthermore, we inject domain-specific optical expertise into the LLM through a hybrid objective of full-system synthesis and lens completion. To align the model with optical principles, we employ Group Relative Policy Optimization Done Right (DrGRPO) guided by Optical Lexicographic Reward for physics-driven policy alignment. This reward system incorporates structural format rewards, physical feasibility rewards, light-manipulation accuracy, and LLM-based heuristics. Finally, our model integrates with specialized optical optimization routines for end-to-end fine-tuning and precision refinement. We benchmark our proposed method against both traditional optimization-based automated design algorithms and LLM counterparts, and experimental results show the superiority of our method.

  • 9 authors
·
Feb 27

Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction

Optical aberrations significantly degrade image quality in microscopy, particularly when imaging deeper into samples. These aberrations arise from distortions in the optical wavefront and can be mathematically represented using Zernike polynomials. Existing methods often address only mild aberrations on limited sample types and modalities, typically treating the problem as a black-box mapping without leveraging the underlying optical physics of wavefront distortions. We propose ZRNet, a physics-informed framework that jointly performs Zernike coefficient prediction and optical image Restoration. We contribute a Zernike Graph module that explicitly models physical relationships between Zernike polynomials based on their azimuthal degrees-ensuring that learned corrections align with fundamental optical principles. To further enforce physical consistency between image restoration and Zernike prediction, we introduce a Frequency-Aware Alignment (FAA) loss, which better aligns Zernike coefficient prediction and image features in the Fourier domain. Extensive experiments on CytoImageNet demonstrates that our approach achieves state-of-the-art performance in both image restoration and Zernike coefficient prediction across diverse microscopy modalities and biological samples with complex, large-amplitude aberrations. Code is available at https://github.com/janetkok/ZRNet.

  • 7 authors
·
Dec 5, 2025

LensNet: An End-to-End Learning Framework for Empirical Point Spread Function Modeling and Lensless Imaging Reconstruction

Lensless imaging stands out as a promising alternative to conventional lens-based systems, particularly in scenarios demanding ultracompact form factors and cost-effective architectures. However, such systems are fundamentally governed by the Point Spread Function (PSF), which dictates how a point source contributes to the final captured signal. Traditional lensless techniques often require explicit calibrations and extensive pre-processing, relying on static or approximate PSF models. These rigid strategies can result in limited adaptability to real-world challenges, including noise, system imperfections, and dynamic scene variations, thus impeding high-fidelity reconstruction. In this paper, we propose LensNet, an end-to-end deep learning framework that integrates spatial-domain and frequency-domain representations in a unified pipeline. Central to our approach is a learnable Coded Mask Simulator (CMS) that enables dynamic, data-driven estimation of the PSF during training, effectively mitigating the shortcomings of fixed or sparsely calibrated kernels. By embedding a Wiener filtering component, LensNet refines global structure and restores fine-scale details, thus alleviating the dependency on multiple handcrafted pre-processing steps. Extensive experiments demonstrate LensNet's robust performance and superior reconstruction quality compared to state-of-the-art methods, particularly in preserving high-frequency details and attenuating noise. The proposed framework establishes a novel convergence between physics-based modeling and data-driven learning, paving the way for more accurate, flexible, and practical lensless imaging solutions for applications ranging from miniature sensors to medical diagnostics. The link of code is https://github.com/baijiesong/Lensnet.

  • 6 authors
·
May 3, 2025

MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers

Metalens is an emerging optical system with an irreplaceable merit in that it can be manufactured in ultra-thin and compact sizes, which shows great promise of various applications such as medical imaging and augmented/virtual reality (AR/VR). Despite its advantage in miniaturization, its practicality is constrained by severe aberrations and distortions, which significantly degrade the image quality. Several previous arts have attempted to address different types of aberrations, yet most of them are mainly designed for the traditional bulky lens and not convincing enough to remedy harsh aberrations of the metalens. While there have existed aberration correction methods specifically for metalens, they still fall short of restoration quality. In this work, we propose MetaFormer, an aberration correction framework for metalens-captured images, harnessing Vision Transformers (ViT) that has shown remarkable restoration performance in diverse image restoration tasks. Specifically, we devise a Multiple Adaptive Filters Guidance (MAFG), where multiple Wiener filters enrich the degraded input images with various noise-detail balances, enhancing output restoration quality. In addition, we introduce a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features from spatial self-attention and transposed self-attention modules to further ameliorate aberration correction. We conduct extensive experiments, including correcting aberrated images and videos, and clean 3D reconstruction from the degraded images. The proposed method outperforms the previous arts by a significant margin. We further fabricate a metalens and verify the practicality of MetaFormer by restoring the images captured with the manufactured metalens in the wild. Code and pre-trained models are available at https://benhenryl.github.io/MetaFormer

  • 10 authors
·
Dec 5, 2024

Interferometer response characterization algorithm for multi-aperture Fabry-Perot imaging spectrometers

In recent years, the demand for hyperspectral imaging devices has grown significantly, driven by their ability of capturing high-resolution spectral information. Among the several possible optical designs for acquiring hyperspectral images, there is a growing interest in interferometric spectral imaging systems based on division of aperture. These systems have the advantage of capturing snapshot acquisitions while maintaining a compact design. However, they require a careful calibration to operate properly. In this work, we present the interferometer response characterization algorithm (IRCA), a robust three-step procedure designed to characterize the transmittance response of multi-aperture imaging spectrometers based on the interferometry of Fabry-Perot. Additionally, we propose a formulation of the image formation model for such devices suitable to estimate the parameters of interest by considering the model under various regimes of finesse. The proposed algorithm processes the image output obtained from a set of monochromatic light sources and refines the results using nonlinear regression after an ad-hoc initialization. Through experimental analysis conducted on four different prototypes from the Image SPectrometer On Chip (ImSPOC) family, we validate the performance of our approach for characterization. The associated source code for this paper is available at https://github.com/danaroth83/irca.

  • 5 authors
·
Mar 24, 2023

MODEST: Multi-Optics Depth-of-Field Stereo Dataset

Reliable depth estimation under real optical conditions remains a core challenge for camera vision in systems such as autonomous robotics and augmented reality. Despite recent progress in depth estimation and depth-of-field rendering, research remains constrained by the lack of large-scale, high-fidelity, real stereo DSLR datasets, limiting real-world generalization and evaluation of models trained on synthetic data as shown extensively in literature. We present the first high-resolution (5472times3648px) stereo DSLR dataset with 18000 images, systematically varying focal length and aperture across complex real scenes and capturing the optical realism and complexity of professional camera systems. For 9 scenes with varying scene complexity, lighting and background, images are captured with two identical camera assemblies at 10 focal lengths (28-70mm) and 5 apertures (f/2.8-f/22), spanning 50 optical configurations in 2000 images per scene. This full-range optics coverage enables controlled analysis of geometric and optical effects for monocular and stereo depth estimation, shallow depth-of-field rendering, deblurring, 3D scene reconstruction and novel view synthesis. Each focal configuration has a dedicated calibration image set, supporting evaluation of classical and learning based methods for intrinsic and extrinsic calibration. The dataset features challenging visual elements such as multi-scale optical illusions, reflective surfaces, mirrors, transparent glass walls, fine-grained details, and natural / artificial ambient light variations. This work attempts to bridge the realism gap between synthetic training data and real camera optics, and demonstrates challenges with the current state-of-the-art monocular, stereo depth and depth-of-field methods. We release the dataset, calibration files, and evaluation code to support reproducible research on real-world optical generalization.

  • 5 authors
·
Nov 25, 2025

Searching for Materials with High Refractive Index and Wide Band Gap: A First-Principles High-Throughput Study

Materials combining both a high refractive index and a wide band gap are of great interest for optoelectronic and sensor applications. However, these two properties are typically described by an inverse correlation with high refractive index appearing in small gap materials and vice-versa. Here, we conduct a first-principles high-throughput study on more than 4000 semiconductors (with a special focus on oxides). Our data confirm the general inverse trend between refractive index and band gap but interesting outliers are also identified. The data are then analyzed through a simple model involving two main descriptors: the average optical gap and the effective frequency. The former can be determined directly from the electronic structure of the compounds, but the latter cannot. This calls for further analysis in order to obtain a predictive model. Nonetheless, it turns out that the negative effect of a large band gap on the refractive index can counterbalanced in two ways: (i) by limiting the difference between the direct band gap and the average optical gap which can be realized by a narrow distribution in energy of the optical transitions and (ii) by increasing the effective frequency which can be achieved through either a high number of transitions from the top of the valence band to the bottom of the conduction or a high average probability for these transitions. Focusing on oxides, we use our data to investigate how the chemistry influences this inverse relationship and rationalize why certain classes of materials would perform better. Our findings can be used to search for new compounds in many optical applications both in the linear and non-linear regime (waveguides, optical modulators, laser, frequency converter, etc.).

  • 6 authors
·
Sep 4, 2018

Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution

Exploiting spatial-angular correlation is crucial to light field (LF) image super-resolution (SR), but is highly challenging due to its non-local property caused by the disparities among LF images. Although many deep neural networks (DNNs) have been developed for LF image SR and achieved continuously improved performance, existing methods cannot well leverage the long-range spatial-angular correlation and thus suffer a significant performance drop when handling scenes with large disparity variations. In this paper, we propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR. In our method, we adopt the epipolar plane image (EPI) representation to project the 4D spatial-angular correlation onto multiple 2D EPI planes, and then develop a Transformer network with repetitive self-attention operations to learn the spatial-angular correlation by modeling the dependencies between each pair of EPI pixels. Our method can fully incorporate the information from all angular views while achieving a global receptive field along the epipolar line. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Comparative results on five public datasets show that our method not only achieves state-of-the-art SR performance, but also performs robust to disparity variations. Code is publicly available at https://github.com/ZhengyuLiang24/EPIT.

  • 6 authors
·
Feb 15, 2023

Learning Multiple-Scattering Solutions for Sphere-Tracing of Volumetric Subsurface Effects

Accurate subsurface scattering solutions require the integration of optical material properties along many complicated light paths. We present a method that learns a simple geometric approximation of random paths in a homogeneous volume of translucent material. The generated representation allows determining the absorption along the path as well as a direct lighting contribution, which is representative of all scattering events along the path. A sequence of conditional variational auto-encoders (CVAEs) is trained to model the statistical distribution of the photon paths inside a spherical region in presence of multiple scattering events. A first CVAE learns to sample the number of scattering events, occurring on a ray path inside the sphere, which effectively determines the probability of the ray being absorbed. Conditioned on this, a second model predicts the exit position and direction of the light particle. Finally, a third model generates a representative sample of photon position and direction along the path, which is used to approximate the contribution of direct illumination due to in-scattering. To accelerate the tracing of the light path through the volumetric medium toward the solid boundary, we employ a sphere-tracing strategy that considers the light absorption and is able to perform statistically accurate next-event estimation. We demonstrate efficient learning using shallow networks of only three layers and no more than 16 nodes. In combination with a GPU shader that evaluates the CVAEs' predictions, performance gains can be demonstrated for a variety of different scenarios. A quality evaluation analyzes the approximation error that is introduced by the data-driven scattering simulation and sheds light on the major sources of error in the accelerated path tracing process.

  • 3 authors
·
Nov 5, 2020

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

The rapid evolution of Multi-modality Large Language Models (MLLMs) is driving significant advancements in visual understanding and generation. Nevertheless, a comprehensive assessment of their capabilities, concerning the fine-grained physical principles especially in geometric optics, remains underexplored. To address this gap, we introduce GOBench, the first benchmark to systematically evaluate MLLMs' ability across two tasks: 1) Generating Optically Authentic Imagery and 2) Understanding Underlying Optical Phenomena. We curates high-quality prompts of geometric optical scenarios and use MLLMs to construct GOBench-Gen-1k dataset.We then organize subjective experiments to assess the generated imagery based on Optical Authenticity, Aesthetic Quality, and Instruction Fidelity, revealing MLLMs' generation flaws that violate optical principles. For the understanding task, we apply crafted evaluation instructions to test optical understanding ability of eleven prominent MLLMs. The experimental results demonstrate that current models face significant challenges in both optical generation and understanding. The top-performing generative model, GPT-4o-Image, cannot perfectly complete all generation tasks, and the best-performing MLLM model, Gemini-2.5Pro, attains a mere 37.35\% accuracy in optical understanding. Database and codes are publicly available at https://github.com/aiben-ch/GOBench.

  • 9 authors
·
Jun 1, 2025

Polarization aberrations in next-generation Giant Segmented Mirror Telescopes (GSMTs). II. Influence of segment-to-segment coating variations on high-contrast imaging and polarimetry

Direct exo-Earth imaging is a key science goal for astronomy in the next decade. This ambitious task imposes a target contrast of ~10^-7 at wavelengths from I to J-band. In our prior study, we determined that polarization aberrations can limit the achievable contrast to 10^-5 to 10^-6 in the infrared. However, these results assumed a perfect coronagraph coupled to a telescope with an ideal coating on each of the mirrors. In this study we seek to understand the influence of polarization aberrations from segment-to-segment coating variations on coronagraphy and polarimetry. We use the Poke open-source polarization ray tracing package to compute the Jones pupil of each GSMT with spatially-varying coatings applied to the segments. The influence of the resultant polarization aberrations is simulated by propagating the Jones pupil through physical optics models of coronagraphs using HCIPy. After applying wavefront control from an ideal adaptive optics system, we determine that the segment-to-segment variations applied limit the performance of coronagraphy to a raw contrast of approximately 10^-8 in I-band, which is 2-3 orders of magnitude lower the target performance for high-contrast imaging systems on the ground. This is a negligible addition to the nominal polarization aberrations for ground-based systems. We further observe negligible degradation in polarimetric imaging of debris disks from segment-to-segment aberrations above and beyond the impact of nominal polarization aberration.

  • 11 authors
·
Jan 7, 2025

Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial- Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With quicker inference speed and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.

  • 3 authors
·
Jun 23, 2024

KiloNeuS: A Versatile Neural Implicit Surface Representation for Real-Time Rendering

NeRF-based techniques fit wide and deep multi-layer perceptrons (MLPs) to a continuous radiance field that can be rendered from any unseen viewpoint. However, the lack of surface and normals definition and high rendering times limit their usage in typical computer graphics applications. Such limitations have recently been overcome separately, but solving them together remains an open problem. We present KiloNeuS, a neural representation reconstructing an implicit surface represented as a signed distance function (SDF) from multi-view images and enabling real-time rendering by partitioning the space into thousands of tiny MLPs fast to inference. As we learn the implicit surface locally using independent models, resulting in a globally coherent geometry is non-trivial and needs to be addressed during training. We evaluate rendering performance on a GPU-accelerated ray-caster with in-shader neural network inference, resulting in an average of 46 FPS at high resolution, proving a satisfying tradeoff between storage costs and rendering quality. In fact, our evaluation for rendering quality and surface recovery shows that KiloNeuS outperforms its single-MLP counterpart. Finally, to exhibit the versatility of KiloNeuS, we integrate it into an interactive path-tracer taking full advantage of its surface normals. We consider our work a crucial first step toward real-time rendering of implicit neural representations under global illumination.

  • 5 authors
·
Jun 22, 2022

Wave optics lensing of gravitational waves: theory and phenomenology of triple systems in the LISA band

We study lensing of gravitational waves by a black hole in the deep wave optics regime, i.e. when the wavelength is much larger than the black hole Schwarzschild radius. We apply it to triple systems, with a binary of stellar mass objects in the inspiraling phase orbiting around a central massive black hole. We describe the full polarisation structure of the wave and derive predictions for the polarisation modes of the scattered wave measured by the observer. We show that lensing in the wave optics regime is not helicity preserving, as opposed to lensing in the geometric optics regime. The amplitude of the total wave is modulated due to interference between the directly transmitted and lensed components. The relative amplitude of the modulation is fixed by the lensing geometry and can reach unity in the most favourable settings. This indicates that wave optics lensing is potentially detectable by LISA for sufficiently high SNR systems. Our findings show that in the wave optics regime it is necessary to go beyond the usual lensing description where the amplification factor is assumed to be the same for both helicity modes. While motivated by GW190521 and the AGN formation scenario, our results apply more broadly to stellar-mass binaries orbiting a third body described as a Schwarzschild black hole, with a period comparable to the GW observation time.

  • 4 authors
·
Apr 10, 2024

Efficient and tunable narrowband second-harmonic generation by a large-area etchless lithium niobate metasurface

Optical resonances in nanostructures enable strong enhancement of nonlinear processes at the nanoscale, such as second-harmonic generation (SHG), with high-Q modes providing intensified light--matter interactions and sharp spectral selectivity for applications in filtering, sensing, and nonlinear spectroscopy. Thanks to the recent advances in thin-film lithium niobate (TFLN) technology, these key features can be now translated to lithium niobate for realizing novel nanoscale nonlinear optical platforms. Here, we demonstrate a large-area metasurface, realized by scalable nanoimprint lithography, comprising a slanted titanium dioxide (TiO_2) nanograting on etchless TFLN for efficient narrowband SHG. This is enabled by the optimal coupling of quasi-bound state in the continuum (q-BIC) modes with a narrowband pulsed laser pump. The demonstrated normalized SHG efficiency is 0.15%,cm^2/GW, which is among the largest reported for LN metasurfaces. The low pump peak intensity (3.64~kW/cm^2) employed, which enables SHG even by continuous-wave pumping, allows envisioning integrated and portable photonic applications. SHG wavelength tuning from 870 to 920~nm with stable output power as well as polarization control is also achieved by off-normal pump illumination. This versatile platform opens new opportunities for sensing, THz generation and detection, and ultrafast electro-optic modulation of nonlinear optical signals.

  • 11 authors
·
Jan 31

waveOrder: generalist framework for label-agnostic computational microscopy

Correlative computational microscopy is accelerating the mapping of dynamic biological systems by integrating morphological and molecular measurements across spatial scales, from organelles to entire organisms. Visualization, measurement, and prediction of interactions among the components of biological systems can be accelerated by generalist computational imaging frameworks that relax the trade-offs imposed by multiplex dynamic imaging. This work reports a generalist framework for wave optical imaging of the architectural order (waveOrder) among biomolecules for encoding and decoding multiple specimen properties from a minimal set of acquired channels, with or without fluorescent labels. waveOrder expresses material properties in terms of elegant physically motivated basis vectors directly interpretable as phase, absorption, birefringence, diattenuation, and fluorophore density; and it expresses image data in terms of directly measurable Stokes parameters. We report a corresponding multi-channel reconstruction algorithm to recover specimen properties in multiple contrast modes. With this framework, we implement multiple 3D computational microscopy methods, including quantitative phase imaging, quantitative label-free imaging with phase and polarization, and fluorescence deconvolution imaging, across scales ranging from organelles to whole zebrafish. These advances are available via an extensible open-source computational imaging library, waveOrder, and a napari plugin, recOrder.

  • 9 authors
·
Dec 12, 2024

Self-Calibration and Bilinear Inverse Problems via Linear Least Squares

Whenever we use devices to take measurements, calibration is indispensable. While the purpose of calibration is to reduce bias and uncertainty in the measurements, it can be quite difficult, expensive, and sometimes even impossible to implement. We study a challenging problem called self-calibration, i.e., the task of designing an algorithm for devices so that the algorithm is able to perform calibration automatically. More precisely, we consider the setup y = A(d) x + epsilon where only partial information about the sensing matrix A(d) is known and where A(d) linearly depends on d. The goal is to estimate the calibration parameter d (resolve the uncertainty in the sensing process) and the signal/object of interests x simultaneously. For three different models of practical relevance, we show how such a bilinear inverse problem, including blind deconvolution as an important example, can be solved via a simple linear least squares approach. As a consequence, the proposed algorithms are numerically extremely efficient, thus potentially allowing for real-time deployment. We also present a variation of the least squares approach, which leads to a~spectral method, where the solution to the bilinear inverse problem can be found by computing the singular vector associated with the smallest singular value of a certain matrix derived from the bilinear system. Explicit theoretical guarantees and stability theory are derived for both techniques; and the number of sampling complexity is nearly optimal (up to a poly-log factor). Applications in imaging sciences and signal processing are discussed and numerical simulations are presented to demonstrate the effectiveness and efficiency of our approach.

  • 2 authors
·
Nov 13, 2016

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework leads to warp-level divergence. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website at: https://feature-3dgs.github.io/

  • 10 authors
·
Dec 5, 2023

TempoRL: laser pulse temporal shape optimization with Deep Reinforcement Learning

High Power Laser's (HPL) optimal performance is essential for the success of a wide variety of experimental tasks related to light-matter interactions. Traditionally, HPL parameters are optimised in an automated fashion relying on black-box numerical methods. However, these can be demanding in terms of computational resources and usually disregard transient and complex dynamics. Model-free Deep Reinforcement Learning (DRL) offers a promising alternative framework for optimising HPL performance since it allows to tune the control parameters as a function of system states subject to nonlinear temporal dynamics without requiring an explicit dynamics model of those. Furthermore, DRL aims to find an optimal control policy rather than a static parameter configuration, particularly suitable for dynamic processes involving sequential decision-making. This is particularly relevant as laser systems are typically characterised by dynamic rather than static traits. Hence the need for a strategy to choose the control applied based on the current context instead of one single optimal control configuration. This paper investigates the potential of DRL in improving the efficiency and safety of HPL control systems. We apply this technique to optimise the temporal profile of laser pulses in the L1 pump laser hosted at the ELI Beamlines facility. We show how to adapt DRL to the setting of spectral phase control by solely tuning dispersion coefficients of the spectral phase and reaching pulses similar to transform limited with full-width at half-maximum (FWHM) of ca1.6 ps.

  • 5 authors
·
Apr 20, 2023

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

Neural Radiance Fields (NeRF) methods have proved effective as compact, high-quality and versatile representations for 3D scenes, and enable downstream tasks such as editing, retrieval, navigation, etc. Various neural architectures are vying for the core structure of NeRF, including the plain Multi-Layer Perceptron (MLP), sparse tensors, low-rank tensors, hashtables and their compositions. Each of these representations has its particular set of trade-offs. For example, the hashtable-based representations admit faster training and rendering but their lack of clear geometric meaning hampers downstream tasks like spatial-relation-aware editing. In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions. PVD consequently empowers downstream applications to optimally adapt the neural representations for the task at hand in a post hoc fashion. The conversions are fast, as distillation is progressively performed on different levels of volume representations, from shallower to deeper. We also employ special treatment of density to deal with its specific numerical instability problem. Empirical evidence is presented to validate our method on the NeRF-Synthetic, LLFF and TanksAndTemples datasets. For example, with PVD, an MLP-based NeRF model can be distilled from a hashtable-based Instant-NGP model at a 10X~20X faster speed than being trained the original NeRF from scratch, while achieving a superior level of synthesis quality. Code is available at https://github.com/megvii-research/AAAI2023-PVD.

  • 6 authors
·
Nov 29, 2022

NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications

This paper presents a NeRF-based framework for point cloud (PCD) reconstruction, specifically designed for indoor high-throughput plant phenotyping facilities. Traditional NeRF-based reconstruction methods require cameras to move around stationary objects, but this approach is impractical for high-throughput environments where objects are rapidly imaged while moving on conveyors or rotating pedestals. To address this limitation, we develop a variant of NeRF-based PCD reconstruction that uses a single stationary camera to capture images as the object rotates on a pedestal. Our workflow comprises COLMAP-based pose estimation, a straightforward pose transformation to simulate camera movement, and subsequent standard NeRF training. A defined Region of Interest (ROI) excludes irrelevant scene data, enabling the generation of high-resolution point clouds (10M points). Experimental results demonstrate excellent reconstruction fidelity, with precision-recall analyses yielding an F-score close to 100.00 across all evaluated plant objects. Although pose estimation remains computationally intensive with a stationary camera setup, overall training and reconstruction times are competitive, validating the method's feasibility for practical high-throughput indoor phenotyping applications. Our findings indicate that high-quality NeRF-based 3D reconstructions are achievable using a stationary camera, eliminating the need for complex camera motion or costly imaging equipment. This approach is especially beneficial when employing expensive and delicate instruments, such as hyperspectral cameras, for 3D plant phenotyping. Future work will focus on optimizing pose estimation techniques and further streamlining the methodology to facilitate seamless integration into automated, high-throughput 3D phenotyping pipelines.

  • 7 authors
·
Mar 27, 2025

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Recent advances in radiance field reconstruction, such as 3D Gaussian Splatting (3DGS), have achieved high-quality novel view synthesis and fast rendering by representing scenes with compositions of Gaussian primitives. However, 3D Gaussians present several limitations for scene reconstruction. Accurately capturing hard edges is challenging without significantly increasing the number of Gaussians, creating a large memory footprint. Moreover, they struggle to represent flat surfaces, as they are diffused in space. Without hand-crafted regularizers, they tend to disperse irregularly around the actual surface. To circumvent these issues, we introduce a novel method, named 3D Convex Splatting (3DCS), which leverages 3D smooth convexes as primitives for modeling geometrically-meaningful radiance fields from multi-view images. Smooth convex shapes offer greater flexibility than Gaussians, allowing for a better representation of 3D scenes with hard edges and dense volumes using fewer primitives. Powered by our efficient CUDA-based rasterizer, 3DCS achieves superior performance over 3DGS on benchmarks such as Mip-NeRF360, Tanks and Temples, and Deep Blending. Specifically, our method attains an improvement of up to 0.81 in PSNR and 0.026 in LPIPS compared to 3DGS while maintaining high rendering speeds and reducing the number of required primitives. Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis. Project page: convexsplatting.github.io.

  • 9 authors
·
Nov 22, 2024 5

Bilateral Guided Radiance Field Processing

Neural Radiance Fields (NeRF) achieves unprecedented performance in synthesizing novel view synthesis, utilizing multi-view consistency. When capturing multiple inputs, image signal processing (ISP) in modern cameras will independently enhance them, including exposure adjustment, color correction, local tone mapping, etc. While these processings greatly improve image quality, they often break the multi-view consistency assumption, leading to "floaters" in the reconstructed radiance fields. To address this concern without compromising visual aesthetics, we aim to first disentangle the enhancement by ISP at the NeRF training stage and re-apply user-desired enhancements to the reconstructed radiance fields at the finishing stage. Furthermore, to make the re-applied enhancements consistent between novel views, we need to perform imaging signal processing in 3D space (i.e. "3D ISP"). For this goal, we adopt the bilateral grid, a locally-affine model, as a generalized representation of ISP processing. Specifically, we optimize per-view 3D bilateral grids with radiance fields to approximate the effects of camera pipelines for each input view. To achieve user-adjustable 3D finishing, we propose to learn a low-rank 4D bilateral grid from a given single view edit, lifting photo enhancements to the whole 3D scene. We demonstrate our approach can boost the visual quality of novel view synthesis by effectively removing floaters and performing enhancements from user retouching. The source code and our data are available at: https://bilarfpro.github.io.

  • 4 authors
·
Jun 1, 2024

Dynamic Mesh-Aware Radiance Fields

Embedding polygonal mesh assets within photorealistic Neural Radience Fields (NeRF) volumes, such that they can be rendered and their dynamics simulated in a physically consistent manner with the NeRF, is under-explored from the system perspective of integrating NeRF into the traditional graphics pipeline. This paper designs a two-way coupling between mesh and NeRF during rendering and simulation. We first review the light transport equations for both mesh and NeRF, then distill them into an efficient algorithm for updating radiance and throughput along a cast ray with an arbitrary number of bounces. To resolve the discrepancy between the linear color space that the path tracer assumes and the sRGB color space that standard NeRF uses, we train NeRF with High Dynamic Range (HDR) images. We also present a strategy to estimate light sources and cast shadows on the NeRF. Finally, we consider how the hybrid surface-volumetric formulation can be efficiently integrated with a high-performance physics simulator that supports cloth, rigid and soft bodies. The full rendering and simulation system can be run on a GPU at interactive rates. We show that a hybrid system approach outperforms alternatives in visual realism for mesh insertion, because it allows realistic light transport from volumetric NeRF media onto surfaces, which affects the appearance of reflective/refractive surfaces and illumination of diffuse surfaces informed by the dynamic scene.

  • 6 authors
·
Sep 8, 2023

EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media

Fluorescence LiDAR (FLiDAR), a Light Detection and Ranging (LiDAR) technology employed for distance and depth estimation across medical, automotive, and other fields, encounters significant computational challenges in scattering media. The complex nature of the acquired FLiDAR signal, particularly in such environments, makes isolating photon time-of-flight (related to target depth) and intrinsic fluorescence lifetime exceptionally difficult, thus limiting the effectiveness of current analytical and computational methodologies. To overcome this limitation, we present a Physics-Guided Mixture-of-Experts (MoE) framework tailored for specialized modeling of diverse temporal components. In contrast to the conventional MoE approaches our expert models are informed by underlying physics, such as the radiative transport equation governing photon propagation in scattering media. Central to our approach is EvidenceMoE, which integrates Evidence-Based Dirichlet Critics (EDCs). These critic models assess the reliability of each expert's output by providing per-expert quality scores and corrective feedback. A Decider Network then leverages this information to fuse expert predictions into a robust final estimate adaptively. We validate our method using realistically simulated Fluorescence LiDAR (FLiDAR) data for non-invasive cancer cell depth detection generated from photon transport models in tissue. Our framework demonstrates strong performance, achieving a normalized root mean squared error (NRMSE) of 0.030 for depth estimation and 0.074 for fluorescence lifetime.

  • 9 authors
·
May 23, 2025

WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields

Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation. However, it usually suffers from poor scalability as requiring densely sampled images for each new scene. Several studies have attempted to mitigate this problem by integrating Multi-View Stereo (MVS) technique into NeRF while they still entail a cumbersome fine-tuning process for new scenes. Notably, the rendering quality will drop severely without this fine-tuning process and the errors mainly appear around the high-frequency features. In the light of this observation, we design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF to achieve generalizable yet high-quality synthesis without any per-scene optimization. To preserve high-frequency information when generating 3D feature volumes, WaveNeRF builds Multi-View Stereo in the Wavelet domain by integrating the discrete wavelet transform into the classical cascade MVS, which disentangles high-frequency information explicitly. With that, disentangled frequency features can be injected into classic NeRF via a novel hybrid neural renderer to yield faithful high-frequency details, and an intuitive frequency-guided sampling strategy can be designed to suppress artifacts around high-frequency regions. Extensive experiments over three widely studied benchmarks show that WaveNeRF achieves superior generalizable radiance field modeling when only given three images as input.

  • 8 authors
·
Aug 9, 2023

MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions

Polarization-based underwater 3D imaging exploits polarization cues to suppress background scattering, exhibiting distinct advantages in turbid water. Although data-driven polarization-based underwater 3D reconstruction methods show great potential, existing public datasets lack sufficient diversity in scattering and observation conditions, hindering fair comparisons among different approaches, including single-view and multi-view polarization imaging methods. To address this limitation, we construct MuS-Polar3D, a benchmark dataset comprising polarization images of 42 objects captured under seven quantitatively controlled scattering conditions and five viewpoints, together with high-precision 3D models (+/- 0.05 mm accuracy), normal maps, and foreground masks. The dataset supports multiple vision tasks, including normal estimation, object segmentation, descattering, and 3D reconstruction. Inspired by computational imaging, we further decouple underwater 3D reconstruction under scattering into a two-stage pipeline, namely descattering followed by 3D reconstruction, from an imaging-chain perspective. Extensive evaluations using multiple baseline methods under complex scattering conditions demonstrate the effectiveness of the proposed benchmark, achieving a best mean angular error of 15.49 degrees. To the best of our knowledge, MuS-Polar3D is the first publicly available benchmark dataset for quantitative turbidity underwater polarization-based 3D imaging, enabling accurate reconstruction and fair algorithm evaluation under controllable scattering conditions. The dataset and code are publicly available at https://github.com/WangPuyun/MuS-Polar3D.

  • 4 authors
·
Dec 25, 2025