
Daily Papers

by AK and the research community

Apr 15

APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra

This paper presents a novel neural vocoder named APNet which reconstructs speech waveforms from acoustic features by predicting amplitude and phase spectra directly. The APNet vocoder is composed of an amplitude spectrum predictor (ASP) and a phase spectrum predictor (PSP). The ASP is a residual convolution network which predicts frame-level log amplitude spectra from acoustic features. The PSP also adopts a residual convolution network using acoustic features as input, then passes the output of this network through two parallel linear convolution layers respectively, and finally integrates into a phase calculation formula to estimate frame-level phase spectra. Finally, the outputs of ASP and PSP are combined to reconstruct speech waveforms by inverse short-time Fourier transform (ISTFT). All operations of the ASP and PSP are performed at the frame level. We train the ASP and PSP jointly and define multilevel loss functions based on amplitude mean square error, phase anti-wrapping error, short-time spectral inconsistency error and time domain reconstruction error. Experimental results show that our proposed APNet vocoder achieves an approximately 8x faster inference speed than HiFi-GAN v1 on a CPU due to the all-frame-level operations, while its synthesized speech quality is comparable to HiFi-GAN v1. The synthesized speech quality of the APNet vocoder is also better than that of several equally efficient models. Ablation experiments also confirm that the proposed parallel phase estimation architecture is essential to phase modeling and the proposed loss functions are helpful for improving the synthesized speech quality.

  • 2 authors
·
May 13, 2023
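
The frame-level combination step described in the APNet abstract can be illustrated for a single frame. This is a minimal sketch, not the authors' implementation: it assumes the "phase calculation formula" is a two-argument arctangent applied to the two parallel outputs (named `r` and `i` here, both hypothetical), and it inverts one full-length DFT frame rather than performing a complete ISTFT with windowing and overlap-add.

```python
import cmath
import math


def frame_spectrum(log_amp, r, i):
    """Combine per-bin log amplitudes with phases derived from two parallel outputs.

    Assumption: the phase is atan2(i, r), which depends only on the ratio and
    signs of the two outputs and always lies in (-pi, pi].
    """
    spectrum = []
    for la, rk, ik in zip(log_amp, r, i):
        phase = math.atan2(ik, rk)
        # exp(log amplitude) * exp(j * phase)
        spectrum.append(cmath.rect(math.exp(la), phase))
    return spectrum


def inverse_dft(spectrum):
    """Naive inverse DFT of one frame (O(n^2), for illustration only)."""
    n = len(spectrum)
    return [
        sum(X * cmath.exp(2j * math.pi * k * t / n) for k, X in enumerate(spectrum)) / n
        for t in range(n)
    ]
```

Because the phase comes from `atan2`, rescaling both parallel outputs by the same positive factor leaves the estimated phase unchanged, so only their relative values matter in this sketch.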

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth range of speech signals, making the speech sound brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of Amplitude and Phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The proposed AP-BWE generator is entirely based on convolutional neural networks (CNNs). It features a dual-stream architecture with mutual interaction, where the amplitude stream and the phase stream communicate with each other and respectively extend the high-frequency components from the input narrowband amplitude and phase spectra. To improve the naturalness of the extended speech signals, we employ a multi-period discriminator at the waveform level and design a pair of multi-resolution amplitude and phase discriminators at the spectral level. Experimental results demonstrate that our proposed AP-BWE achieves state-of-the-art performance in terms of speech quality for BWE tasks targeting sampling rates of both 16 kHz and 48 kHz. In terms of generation efficiency, due to the all-convolutional architecture and all-frame-level operations, the proposed AP-BWE can generate 48 kHz waveform samples 292.3 times faster than real-time on a single RTX 4090 GPU and 18.1 times faster than real-time on a single CPU. Notably, to our knowledge, AP-BWE is the first to achieve the direct extension of the high-frequency phase spectrum, which is beneficial for improving the effectiveness of existing BWE methods.

  • 4 authors
·
Jan 12, 2024

Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning

This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images despite limited training data. Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries. However, the rapid advancements in synthesis technology have led to specific artifacts for each generation model. Consequently, these detectors have exhibited a lack of proficiency in learning the frequency domain and tend to overfit to the artifacts present in the training data, leading to suboptimal performance on unseen sources. To address this issue, we introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors. Our method forces the detector to continuously focus on high-frequency information, exploiting high-frequency representation of features across spatial and channel dimensions. Additionally, we incorporate a straightforward frequency domain learning module to learn source-agnostic features. It involves convolutional layers applied to both the phase spectrum and amplitude spectrum between the Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (iFFT). Extensive experimentation involving 17 GANs demonstrates the effectiveness of our proposed method, showcasing state-of-the-art performance (+9.8%) while requiring fewer parameters. The code is available at https://github.com/chuangchuangtan/FreqNet-DeepfakeDetection.

  • 6 authors
·
Mar 11, 2024

First Order Quantum Phase Transition in the Hybrid Metal-Mott Insulator Transition Metal Dichalcogenide 4Hb-TaS2

Coupling together distinct correlated and topologically non-trivial electronic phases of matter can potentially induce novel electronic orders and phase transitions among them. Transition metal dichalcogenide compounds serve as a bedrock for exploration of such hybrid systems. They host a variety of exotic electronic phases, and their van der Waals nature makes it possible to combine them, either by exfoliation and stacking or by stoichiometric growth, and thereby induce novel correlated complexes. Here we investigate the compound 4Hb-TaS$_2$ that interleaves the Mott-insulating state of 1T-TaS$_2$ and the putative spin liquid it hosts together with the metallic state of 2H-TaS$_2$ and the low temperature superconducting phase it harbors. We reveal a thermodynamic phase diagram that hosts a first order quantum phase transition between a correlated Kondo cluster state and a flat band state in which the Kondo cluster becomes depleted. We demonstrate that this intrinsic transition can be induced by an electric field and temperature as well as by manipulation of the interlayer coupling with the probe tip, which allows reversible toggling between the Kondo cluster and the flat band states. The phase transition is manifested by a discontinuous change of the complete electronic spectrum accompanied by hysteresis and low frequency noise. We find that the shape of the transition line in the phase diagram is determined by the local compressibility and the entropy of the two electronic states. Our findings set such heterogeneous structures as an exciting platform for systematic investigation and manipulation of Mott-metal transitions and strongly correlated phases and quantum phase transitions therein.

  • 11 authors
·
Mar 2, 2023

Self Consistent Thermal Resummation: A Case Study of the Phase Transition in 2HDM

An accurate description of the scalar potential at finite temperature is crucial for studying cosmological first-order phase transitions (FOPT) in the early Universe. At finite temperatures, a precise treatment of thermal resummations is essential, as bosonic fields encounter significant infrared issues that can compromise standard perturbative approaches. The Partial Dressing (or the tadpole resummation) method provides a self consistent resummation of higher order corrections, allowing the computation of thermal masses and the effective potential including the proper Boltzmann suppression factors and without relying on any high-temperature approximation. We systematically compare the Partial dressing resummation scheme results with the Parwani and Arnold Espinosa (AE) ones to investigate the thermal phase transition dynamics in the Two-Higgs-Doublet Model (2HDM). Our findings reveal that different resummation prescriptions can significantly alter the nature of the phase transition within the same region of parameter space, confirming the differences that have already been noticed between the Parwani and AE schemes. Notably, the more refined resummation prescription, the Partial Dressing scheme, does not support symmetry non-restoration in 2HDM at high temperatures observed using the AE prescription. Furthermore, we quantify the uncertainties in the stochastic gravitational wave (GW) spectrum from an FOPT due to variations in resummation methods, illustrating their role in shaping theoretical predictions for upcoming GW experiments. Finally, we discuss the capability of the High-Luminosity LHC and proposed GW experiments to probe the FOEWPT-favored region of the parameter space.

  • 3 authors
·
Apr 2, 2025

SN 2023ixf in the Pinwheel Galaxy M101: From Shock Breakout to the Nebular Phase

We present photometric and spectroscopic observations of SN 2023ixf covering from day one to 442 days after explosion. SN 2023ixf reached a peak V-band absolute magnitude of $-18.2 \pm 0.07$, and light curves show that it is in the fast-decliner (IIL) subclass with a relatively short "plateau" phase (fewer than $\sim 70$ days). Early-time spectra of SN 2023ixf exhibit strong, very narrow emission lines from ionized circumstellar matter (CSM), possibly indicating a Type IIn classification. But these flash/shock-ionization emission features faded after the first week and the spectrum evolved in a manner similar to that of typical Type II SNe, unlike the case of most genuine SNe IIn in which the ejecta interact with CSM for an extended period of time and develop intermediate-width emission lines. We compare observed spectra of SN 2023ixf with various model spectra to understand the physics behind SN 2023ixf. Our nebular spectra (between 200-400 d) match best with the model spectra from a $15\,\mathrm{M}_{\odot}$ progenitor which experienced enhanced mass loss a few years before explosion. A last-stage mass-loss rate of $\dot{M} = 0.01\,\mathrm{M}_{\odot}\,\mathrm{yr}^{-1}$ from the r1w6 model matches best with the early-time spectra, higher than $\dot{M} \approx 2.4 \times 10^{-3}\,\mathrm{M}_{\odot}\,\mathrm{yr}^{-1}$ derived from the ionized H$\alpha$ luminosity at 1.58 d. We also use SN 2023ixf as a distance indicator and fit the light curves to derive the Hubble constant by adding SN 2023ixf to the existing sample; we obtain $H_{0} = 73.1^{+3.68}_{-3.50}$ km s$^{-1}$ Mpc$^{-1}$, consistent with the results from SNe Ia and many other independent methods.

  • 42 authors
·
Mar 18, 2025

More than Carbon: Cradle-to-Grave environmental impacts of GenAI training on the Nvidia A100 GPU

The rapid expansion of AI has intensified concerns about its environmental sustainability. Yet, current assessments predominantly focus on operational carbon emissions using secondary data or estimated values, overlooking environmental impacts in other life cycle stages. This study presents the first comprehensive multi-criteria life cycle assessment (LCA) of AI training, examining 16 environmental impact categories based on detailed primary data collection of the Nvidia A100 SXM 40GB GPU. The LCA results for training BLOOM reveal that the use phase dominates 11 of 16 impact categories including climate change (96%), while manufacturing dominates the remaining 5 impact categories including human toxicity, cancer (99%) and mineral and metal depletion (85%). For training GPT-4, the use phase dominates 10 of 16 impact categories, contributing about 96% to both the climate change and resource use, fossils category. The manufacturing stage dominates 6 of 16 impact categories including human toxicity, cancer (94%) and eutrophication, freshwater (81%). Assessing the cradle-to-gate environmental impact distribution across the GPU components reveals that the GPU chip is the largest contributor across 10 of 16 impact categories and shows particularly pronounced contributions to climate change (81%) and resource use, fossils (80%). While primary data collection results in modest changes in carbon estimates compared to database-derived estimates, substantial variations emerge in other categories. Most notably, minerals and metals depletion increases by 33%, demonstrating the critical importance of primary data for non-carbon accounting. This multi-criteria analysis expands the Sustainable AI discourse beyond operational carbon emissions, challenging current sustainability narratives and highlighting the need for policy frameworks addressing the full spectrum of AI's environmental impact.

  • 8 authors
·
Aug 27, 2025

NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models

Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, yielding signals that are rich but complex for representation learning. Recently, EEG foundation models trained to predict masked signal-tokens have shown promise for learning generalizable representations. However, their performance is hindered by their signal tokenization modules. Existing neural tokenizers fail to preserve high-frequency dynamics, limiting their ability to reconstruct EEG signals with high fidelity. We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and, (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. This design enables efficient EEG compression while supporting accurate reconstruction across all frequency bands, leading to robust generative masked modeling. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks. More broadly, NeuroRVQ tokenizer establishes a strong prior for codebook-based general-purpose brainwave models, enabling advances in neural decoding, generative modeling and multimodal biosignal integration.

  • 7 authors
·
Oct 14, 2025

Accelerated Bayesian Inference for Pulsar Timing Arrays: Normalizing Flows for Rapid Model Comparison Across Stochastic Gravitational-Wave Background Sources

The recent detection of nanohertz stochastic gravitational-wave backgrounds (SGWBs) by pulsar timing arrays (PTAs) promises unique insights into astrophysical and cosmological origins. However, traditional Markov Chain Monte Carlo (MCMC) approaches become prohibitively expensive for large datasets. We employ a normalizing flow (NF)-based machine learning framework to accelerate Bayesian inference in PTA analyses. For the first time, we perform Bayesian model comparison across SGWB source models in the framework of machine learning by training NF architectures on the PTA dataset (NANOGrav 15-year) and enabling direct evidence estimation via learned harmonic mean estimators. Our examples include 10 conventional SGWB source models such as supermassive black hole binaries, power-law spectrum, cosmic strings, domain walls, scalar-induced GWs, first-order phase transitions, and dual scenario/inflationary gravitational wave. Our approach jointly infers 20 red noise parameters and 2 SGWB parameters per model in $\sim 20$ hours (including training), compared to $\sim 10$ days with MCMC. Critically, the NF method preserves rigorous model selection accuracy, with small Hellinger distances ($\lesssim 0.3$) relative to MCMC posteriors, and reproduces MCMC-based Bayes factors across all tested scenarios. This scalable technique for SGWB source comparison will be essential for future PTA expansions and next-generation arrays such as the SKA, offering orders-of-magnitude efficiency gains without sacrificing physical interpretability.

  • 2 authors
·
Apr 5, 2025
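
The Hellinger distance used in the abstract above to compare NF and MCMC posteriors can be illustrated on binned samples. This is a minimal sketch under stated assumptions: both posteriors are histogrammed on identical bins, and the discrete definition $H = \sqrt{1 - \sum_i \sqrt{p_i q_i}}$ is used. The function name is hypothetical, and the abstract does not say how the authors compute the quantity.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions on the same bins.

    Inputs may be unnormalized histogram counts; each is normalized to sum to 1.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each histogram must have positive total mass")
    # Bhattacharyya coefficient: overlap between the two normalized histograms.
    bc = sum(math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))
```

On this scale a threshold such as 0.3 sits well below the maximum of 1, which is reached only when the two histograms share no occupied bins.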

Gravitational waves in massive gravity: Waveforms generated by a particle plunging into a black hole and the excitation of quasinormal modes and quasibound states

With the aim of testing massive gravity in the context of black hole physics, we investigate the gravitational radiation emitted by a massive particle plunging into a Schwarzschild black hole from slightly below the innermost stable circular orbit. To do so, we first construct the quasinormal and quasibound resonance spectra of the spin-2 massive field for odd and even parity. Then, we compute the waveforms produced by the plunging particle and study their spectral content. This allows us to highlight and interpret important phenomena in the plunge regime, including (i) the excitation of quasibound states, with particular emphasis on the amplification and slow decay of the post-ringdown phase of the even-parity dipolar mode due to harmonic resonance; (ii) during the adiabatic phase, the waveform emitted by the plunging particle is very well described by the waveform emitted by the particle living on the innermost stable circular orbit, and (iii) the regularized waveforms and their unregularized counterparts constructed from the quasinormal mode spectrum are in excellent agreement. Finally, we construct, for arbitrary directions of observation and, in particular, outside the orbital plane of the plunging particle, the regularized multipolar waveforms, i.e., the waveforms constructed by summing over partial waveforms.

  • 1 author
·
Nov 25, 2024

Preliminary sonification of ENSO using traditional Javanese gamelan scales

Sonification -- the mapping of data to non-speech audio -- offers an underexplored channel for representing complex dynamical systems. We treat El Niño-Southern Oscillation (ENSO), a canonical example of low-dimensional climate chaos, as a test case for culturally-situated sonification evaluated through complex systems diagnostics. Using parameter-mapping sonification of the Niño 3.4 sea surface temperature anomaly index (1870--2024), we encode ENSO variability into two traditional Javanese gamelan pentatonic systems (pelog and slendro) across four composition strategies, then analyze the resulting audio as trajectories in a two-dimensional acoustic phase space. Recurrence-based diagnostics, convex hull geometry, and coupling analysis reveal that the sonification pipeline preserves key dynamical signatures: alternating modes produce the highest trajectory recurrence rates, echoing ENSO's quasi-periodicity; layered polyphonic modes explore the broadest phase space regions; and the two scale families induce qualitatively distinct coupling regimes between spectral brightness and energy -- predominantly anti-phase in pelog but near-independent in slendro. Phase space trajectory analysis provides a rigorous geometric framework for comparing sonification designs within a complex systems context. Perceptual validation remains necessary; we contribute the dynamical systems methodology for evaluating such mappings.

Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction

While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging, as conventional networks typically operate within a flat Euclidean feature space, which makes it difficult to model the underlying circular topology of the phase. To address this, we propose a manifold-aware magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing the Global Rotation Equivariance (GRE) characteristic. Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual-FFN (HADF) bottleneck for unified feature fusion, both of which are designed to preserve GRE in the phase stream. Comprehensive evaluations are conducted across phase retrieval, denoising, dereverberation, and bandwidth extension tasks to validate the superiority of the proposed method over multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. The overall superiority is also established in universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, which are consistent with the intrinsic circular nature of the phase. The source code is available at https://github.com/wangchengzhong/RENet.

  • 4 authors
·
Feb 9

Dynamical phase diagram of synchronization in one dimension: universal behavior from Edwards-Wilkinson to random deposition through Kardar-Parisi-Zhang

Synchronization in one dimension displays generic scale invariance with universal properties previously observed in surface kinetic roughening and the wider context of the Kardar-Parisi-Zhang (KPZ) universality class. This has been established for phase oscillators and also for some limit-cycle oscillators, both in the presence of columnar (quenched) disorder and of time-dependent noise, by extensive numerical simulations, and has been analytically motivated by continuum approximations in the strong oscillator coupling limit. The robustness and the precise boundaries in parameter space for such critical behavior remain unclear, however, which may preclude further developments, including the extension of these results to higher dimensions and the experimental observation of nonequilibrium criticality in synchronizing (e.g., electronic or chemical) oscillators. We here present complete numerical phase diagrams of one-dimensional synchronization, including saturation times and values, but, most importantly, also dynamical features giving insight into the gradual emergence of synchronous dynamics, based on systems of phase oscillators with either type of randomness. In the absence of synchronization, the dynamics evolves as expected for random deposition (for time-dependent noise) or linear growth (for columnar disorder), while a crossover from Edwards-Wilkinson to Kardar-Parisi-Zhang behavior (with the corresponding type of randomness) is observed as the randomness strength, or the nonoddity of the coupling among oscillators, is increased in the synchronous region -- their combined effect being partially captured by the so-called KPZ coupling. The distortion of scaling due to phase slips near the desynchronization boundary, a feature that is likely to play a role in experimental contexts, is also discussed.

  • 2 authors
·
Apr 6

PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method

As the number of seismic sensors grows, it is becoming increasingly difficult for analysts to pick seismic phases manually and comprehensively, yet such efforts are fundamental to earthquake monitoring. Despite years of improvements in automatic phase picking, it is difficult to match the performance of experienced analysts. A more subtle issue is that different seismic analysts may pick phases differently, which can introduce bias into earthquake locations. We present a deep-neural-network-based arrival-time picking method called "PhaseNet" that picks the arrival times of both P and S waves. Deep neural networks have recently made rapid progress in feature learning, and with sufficient training, have achieved super-human performance in many applications. PhaseNet uses three-component seismic waveforms as input and generates probability distributions of P arrivals, S arrivals, and noise as output. We engineer PhaseNet such that peaks in probability provide accurate arrival times for both P and S waves, and have the potential to increase the number of S-wave observations dramatically over what is currently available. This will enable both improved locations and improved shear wave velocity models. PhaseNet is trained on the prodigious available data set provided by analyst-labeled P and S arrival times from the Northern California Earthquake Data Center. The dataset we use contains more than seven million waveform samples extracted from over thirty years of earthquake recordings. We demonstrate that PhaseNet achieves much higher picking accuracy and recall rate than existing methods.

  • 2 authors
·
Mar 8, 2018
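
The step from probability traces to arrival times mentioned in the PhaseNet abstract can be sketched as simple peak picking. This is an illustrative sketch only: the function name, the 0.5 threshold, and the local-maximum rule are assumptions rather than the authors' procedure, and a practical picker would usually also enforce a minimum separation between picks.

```python
def pick_arrivals(prob, sampling_rate_hz, threshold=0.5):
    """Return arrival times (seconds) at local maxima of a probability trace.

    prob: per-sample probabilities for one phase type (for example P or S).
    A sample counts as a pick if it reaches the threshold, exceeds its left
    neighbour, and is at least as large as its right neighbour.
    """
    picks = []
    for idx in range(1, len(prob) - 1):
        is_peak = prob[idx] > prob[idx - 1] and prob[idx] >= prob[idx + 1]
        if is_peak and prob[idx] >= threshold:
            # Convert the sample index to seconds from the start of the trace.
            picks.append(idx / sampling_rate_hz)
    return picks
```

Running the same function separately on the P and S probability traces yields one pick list per phase type; the noise trace needs no picking.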

HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers

Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective substitutes for large antenna arrays for beamforming in Millimeter and TeraHertz wave communication. However, to achieve desired channel gains through beamforming in HMT, phase-shifts of a large number of elements need to be appropriately set, which is challenging. Also, these optimal phase-shifts depend on the location of the receivers, which could be unknown. In this work, we develop a learning algorithm using a fixed-budget multi-armed bandit framework to beamform and maximize received signal strength at the receiver for far-field regions. Our algorithm, named HoloBeam, exploits the parametric form of channel gains of the beams, which can be expressed in terms of two phase-shifting parameters. Even after parameterization, the problem is still challenging as phase-shifting parameters take continuous values. To overcome this, HoloBeam works with the discrete values of phase-shifting parameters and exploits their unimodal relations with channel gains to learn the optimal values faster. We upper bound the probability of HoloBeam incorrectly identifying the (discrete) optimal phase-shift parameters in terms of the number of pilots used in learning. We show that this probability decays exponentially with the number of pilot signals. We demonstrate that HoloBeam outperforms state-of-the-art algorithms through extensive simulations.

  • 3 authors
·
Dec 29, 2023

SPRMamba: Surgical Phase Recognition for Endoscopic Submucosal Dissection with Mamba

Endoscopic Submucosal Dissection (ESD) is a minimally invasive procedure initially developed for early gastric cancer treatment and has expanded to address diverse gastrointestinal lesions. While computer-assisted surgery (CAS) systems enhance ESD precision and safety, their efficacy hinges on accurate real-time surgical phase recognition, a task complicated by ESD's inherent complexity, including heterogeneous lesion characteristics and dynamic tissue interactions. Existing video-based phase recognition algorithms, constrained by inefficient temporal context modeling, exhibit limited performance in capturing fine-grained phase transitions and long-range dependencies. To overcome these limitations, we propose SPRMamba, a novel framework integrating a Mamba-based architecture with a Scaled Residual TranMamba (SRTM) block to synergize long-term temporal modeling and localized detail extraction. SPRMamba further introduces the Hierarchical Sampling Strategy to optimize computational efficiency, enabling real-time processing critical for clinical deployment. Evaluated on the ESD385 dataset and the cholecystectomy benchmark Cholec80, SPRMamba achieves state-of-the-art performance (87.64% accuracy on ESD385, +1.0% over prior methods), demonstrating robust generalizability across surgical workflows. This advancement bridges the gap between computational efficiency and temporal sensitivity, offering a transformative tool for intraoperative guidance and skill assessment in ESD surgery. The code is accessible at https://github.com/Zxnyyyyy/SPRMamba.

  • 8 authors
·
Sep 18, 2024

Model-agnostic search for the quasinormal modes of gravitational wave echoes

Post-merger gravitational wave echoes provide a unique opportunity to probe the near-horizon structure of astrophysical black holes, which may be modified due to non-perturbative quantum gravity phenomena. However, since the waveform is subject to large theoretical uncertainties, it is necessary to develop model-agnostic search methods for detecting echoes from observational data. A promising strategy is to identify the characteristic quasinormal modes (QNMs) associated with echoes, in frequency space, which complements existing searches of quasiperiodic pulses in time. In this study, we build upon our previous work targeting these modes by incorporating relative phase information to optimize the Bayesian search algorithm. Using a new phase-marginalized likelihood, the performance can be significantly improved for well-resolved QNMs. This enables an efficient model-agnostic search for QNMs of different shapes by using a simple search template. To demonstrate the robustness of the search algorithm, we construct four complementary benchmarks for the echo waveform that span a diverse range of different theoretical possibilities for the near-horizon structure. We then validate our Bayesian search algorithms by injecting the benchmark models into different realizations of Gaussian noise. Using two types of phase-marginalized likelihoods, we find that the search algorithm can efficiently detect the corresponding QNMs. Therefore, our search strategy provides a concrete Bayesian and model-agnostic approach to "quantum black hole seismology".

  • 4 authors
·
Aug 2, 2023

The nature of an imaginary quasi-periodic oscillation in the soft-to-hard transition of MAXI J1820+070

A recent study shows that if the power spectra (PS) of accreting compact objects consist of a combination of Lorentzian functions that are coherent in different energy bands but incoherent with each other, the same is true for the Real and Imaginary parts of the cross spectrum (CS). Using this idea, we discovered imaginary quasi-periodic oscillations (QPOs) in NICER observations of the black hole candidate MAXI J1820+070. The imaginary QPOs appear as narrow features with a small Real and large Imaginary part in the CS but are not significantly detected in the PS when they overlap in frequency with other variability components. The coherence function drops and the phase lags increase abruptly at the frequency of the imaginary QPO. We show that the multi-Lorentzian model that fits the PS and CS of the source in two energy bands correctly reproduces the lags and the coherence, and that the narrow drop of the coherence is caused by the interaction of the imaginary QPO with other variability components. The imaginary QPO appears only in the decay of the outburst, during the transition from the high-soft to the low-hard state of MAXI J1820+070, and its frequency decreases from approximately 5 Hz to around 1 Hz as the source spectrum hardens. We also analysed the earlier observations of the transition, where no narrow features were seen, and we identified a QPO in the PS that appears to evolve into the imaginary QPO as the source hardens. As for the type-B and C QPOs in this source, the rms spectrum of the imaginary QPO increases with energy. The lags of the imaginary QPO are similar to those of the type-B and C QPOs above 2 keV but differ from the lags of those other QPOs below that energy. While the properties of this imaginary QPO resemble those of type-C QPOs, we cannot rule out that it is a new type of QPO.

  • 5 authors
·
Feb 17, 2025

M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations depending on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects with phase transitions is overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. Then, we present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M^3-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. It provides dense instance mask annotations that capture both object phases and their transitions. We evaluate state-of-the-art methods on M^3-VOS, yielding several key insights. Notably, current appearance-based approaches show significant room for improvement when handling objects with phase transitions. The inherent changes in disorder suggest that the predictive performance of the forward entropy-increasing process can be improved through a reverse entropy-reducing process. These findings lead us to propose ReVOS, a new plug-and-play model that improves its performance by reversal refinement. Our data and code will be publicly available at https://zixuan-chen.github.io/M-cube-VOS.github.io/.

  • 7 authors
·
Dec 18, 2024

Modelling & Steady State Compliance Testing of an Improved Time Synchronized Phasor Measurement Unit Based on IEEE Standard C37.118.1

Synchrophasor technology is an emerging and developing technology for monitoring and control of wide area measurement systems (WAMS). In an elementary WAMS, two identical phasors measured at two different locations show different phase angles because their reference waveforms are not synchronized with each other. Phasor measurement units (PMUs) measure input phasors with respect to a common reference wave based on the atomic clock pulses received from global positioning system (GPS) satellites, eliminating variation in the measured phase angles due to distant locations of the measurement nodes. This has found tremendous applications in quick fault detection, fault location analysis, and accurate current, voltage, frequency and phase angle measurements in WAMS. Commercially available PMU models often prove expensive for research and development as well as for grid integration projects. This research article proposes an economical PMU model optimized for accurate steady-state performance based on the recursive discrete Fourier transform (DFT) and provides results and detailed analysis of the proposed PMU model as per the steady-state compliance specifications of IEEE standard C37.118.1. Results accurate up to 13 digits after the decimal point are obtained through the developed PMU model for both nominal and off-nominal frequency inputs in steady state.
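The recursive DFT mentioned in this abstract can be sketched in its textbook one-cycle form: compute the phasor once over the first window, then update it per sample by adding the newest sample and dropping the oldest. This is a minimal sketch, not the paper's PMU model; the function name, the window length of 24 samples per cycle, and the test signal are all assumptions for illustration.

```python
import cmath
import math

def recursive_dft_phasors(samples, n_per_cycle):
    """Textbook recursive one-cycle DFT phasor estimate (illustrative only).

    Returns one RMS phasor per window position. For a nominal-frequency
    input, the fixed phase reference keeps the phasor stationary.
    """
    N = n_per_cycle
    scale = math.sqrt(2) / N
    # Full DFT at the fundamental for the first window.
    phasor = scale * sum(samples[n] * cmath.exp(-2j * math.pi * n / N)
                         for n in range(N))
    phasors = [phasor]
    # Recursive update: one multiply-add per new sample.
    for r in range(len(samples) - N):
        phasor += scale * (samples[r + N] - samples[r]) * cmath.exp(-2j * math.pi * r / N)
        phasors.append(phasor)
    return phasors

# Hypothetical test input: 1.0 (RMS) cosine at nominal frequency, 30 degree phase.
N = 24
rms, phi = 1.0, math.pi / 6
signal = [math.sqrt(2) * rms * math.cos(2 * math.pi * n / N + phi)
          for n in range(3 * N)]
phasors = recursive_dft_phasors(signal, N)
print(abs(phasors[-1]), math.degrees(cmath.phase(phasors[-1])))
```

For off-nominal frequencies the estimate from this basic form rotates and picks up an error, so a practical design adds corrections that the abstract does not detail.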

  • 1 author
·
Apr 14, 2025

A Comprehensive Perturbative Formalism for Phase Mixing in Perturbed Disks. II. Phase Spirals in an Inhomogeneous Disk Galaxy with a Non-responsive Dark Matter Halo

We develop a linear perturbative formalism to compute the response of an inhomogeneous stellar disk embedded in a non-responsive dark matter halo to perturbations like bars, spiral arms and satellite galaxy encounters. Without self-gravity to reinforce it, the response of a Fourier mode phase mixes away due to an intrinsic spread in the vertical ($\Omega_z$), radial ($\Omega_r$) and azimuthal ($\Omega_\phi$) frequencies, giving rise to local phase-space spirals. Collisional diffusion due to scattering of stars by structures like giant molecular clouds causes super-exponential damping of the phase-spiral amplitude. The $z-v_z$ phase-spiral is 1-armed (2-armed) for vertically anti-symmetric (symmetric) bending (breathing) modes. Only transient perturbations with timescales ($\tau_{\mathrm{P}}$) comparable to the vertical oscillation period ($\tau_z \sim 1/\Omega_z$) trigger $z-v_z$ phase-spirals. Each $(n,l,m)$ mode of the response to impulsive ($\tau_{\mathrm{P}} < \tau = 1/(n\Omega_z + l\Omega_r + m\Omega_\phi)$) perturbations is power law ($\sim \tau_{\mathrm{P}}/\tau$) suppressed, but that to adiabatic ($\tau_{\mathrm{P}} > \tau$) perturbations is exponentially weak ($\sim \exp\left[-\left(\tau_{\mathrm{P}}/\tau\right)^{\alpha}\right]$) except resonant ($\tau \to \infty$) modes. Slower ($\tau_{\mathrm{P}} > \tau_z$) perturbations, e.g., distant encounters with satellite galaxies, induce stronger bending modes. If the Gaia phase-spiral was triggered by a satellite, Sagittarius is the leading contender as it dominates the Solar neighborhood response of the Milky Way disk to satellite encounters. However, survival against collisional damping necessitates that the impact occurred within $\sim 0.6-0.7$ Gyr ago. We discuss how the detailed galactic potential dictates the phase-spiral shape: phase mixing occurs slower and phase-spirals are less wound in the outer disk and in presence of an ambient halo.

  • 3 authors
·
Feb 28, 2023

I Can't Believe It's Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation

The increasing congestion of the radio frequency spectrum presents challenges for efficient spectrum utilization. Cognitive radio systems enable dynamic spectrum access with the aid of recent innovations in neural networks. However, traditional real-valued neural networks (RVNNs) face difficulties in low signal-to-noise ratio (SNR) environments, as they were not specifically developed to capture essential wireless signal properties such as phase and amplitude. This work presents CMuSeNet, a complex-valued multi-signal segmentation network for wideband spectrum sensing, to address these limitations. Extensive hyperparameter analysis shows that a naive conversion of existing RVNNs into their complex-valued counterparts is ineffective. Built on complex-valued neural networks (CVNNs) with a residual architecture, CMuSeNet introduces a complex-valued Fourier spectrum focal loss (CFL) and a complex plane intersection over union (CIoU) similarity metric to enhance training performance. Extensive evaluations on synthetic, indoor over-the-air, and real-world datasets show that CMuSeNet achieves an average accuracy of 98.98%-99.90%, improving by up to 9.2 percentage points over its real-valued counterpart and consistently outperforming the state of the art. Strikingly, CMuSeNet achieves the accuracy level of its RVNN counterpart in just two epochs, compared to the 27 epochs required for RVNN, while reducing training time by up to 92.2% over the state of the art. The results highlight the effectiveness of complex-valued architectures in improving weak signal detection and training efficiency for spectrum sensing in challenging low-SNR environments. The dataset is available at: https://dx.doi.org/10.21227/hcc1-6p22

  • 2 authors
·
May 21, 2025

On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking

We present a comprehensive analysis of how two-layer neural networks learn features to solve the modular addition task. Our work provides a full mechanistic interpretation of the learned model and a theoretical explanation of its training dynamics. While prior work has identified that individual neurons learn single-frequency Fourier features and phase alignment, it does not fully explain how these features combine into a global solution. We bridge this gap by formalizing a diversification condition that emerges during training when overparametrized, consisting of two parts: phase symmetry and frequency diversification. We prove that these properties allow the network to collectively approximate a flawed indicator function on the correct logic for the modular addition task. While individual neurons produce noisy signals, the phase symmetry enables a majority-voting scheme that cancels out noise, allowing the network to robustly identify the correct sum. Furthermore, we explain the emergence of these features under random initialization via a lottery ticket mechanism. Our gradient flow analysis proves that frequencies compete within each neuron, with the "winner" determined by its initial spectral magnitude and phase alignment. From a technical standpoint, we provide a rigorous characterization of the layer-wise phase coupling dynamics and formalize the competitive landscape using the ODE comparison lemma. Finally, we use these insights to demystify grokking, characterizing it as a three-stage process involving memorization followed by two generalization phases, driven by the competition between loss minimization and weight decay.
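To make the "Fourier features combine into an indicator" idea concrete, the sketch below uses the idealized identity that a sum of cosines over several frequencies, evaluated at a + b - c, is largest exactly when c equals (a + b) mod p. The modulus, the chosen frequencies, and the function names are assumptions for illustration. This is the clean mathematical target, not the trained network or the noisy per-neuron signals analysed in the paper.

```python
import math

def fourier_logits(a, b, p, freqs):
    """Idealized logits: for each candidate c, sum cos(2*pi*k*(a+b-c)/p) over k.

    Every term equals 1 when c == (a + b) mod p; for any other c (p prime,
    k not a multiple of p) each term is strictly below 1.
    """
    return [sum(math.cos(2 * math.pi * k * (a + b - c) / p) for k in freqs)
            for c in range(p)]

def predict_sum(a, b, p, freqs):
    """Pick the candidate with the largest idealized logit."""
    logits = fourier_logits(a, b, p, freqs)
    return max(range(p), key=lambda c: logits[c])

p = 23                  # hypothetical prime modulus
freqs = [1, 3, 5, 7]    # hypothetical set of "diversified" frequencies
print(predict_sum(5, 20, p, freqs), (5 + 20) % p)
```

Using more distinct frequencies widens the gap between the correct answer and the runners-up, which gives some intuition for why frequency diversification matters.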

Interpretable Electrophysiological Features of Resting-State EEG Capture Cortical Network Dynamics in Parkinson's Disease

Parkinson's disease (PD) alters cortical neural dynamics, yet reliable non-invasive electrophysiological biomarkers remain elusive. This study examined whether interpretable EEG features capturing complementary aspects of neural dynamics can discriminate Parkinsonian neural states. A comprehensive set of interpretable features was extracted and grouped into Standard descriptors (spectral power, phase synchronization, time-domain statistics) and Dynamical descriptors (aperiodic activity, cross-frequency coupling, scale-free dynamics, neuronal avalanche statistics, and instantaneous frequency measures). A multi-head attention transformer classifier was trained using strict LOSO validation. Group-level comparisons were performed to identify electrophysiological differences associated with disease and medication state. Standard feature sets achieved the strongest performance in discriminating medication states (PDoff vs PDon), whereas Dynamical descriptors performed competitively in contrasts between PD patients and healthy controls. Random feature ablation analyses indicated that Dynamical descriptors provide complementary information distributed across features, while correlation analysis revealed low redundancy within both feature sets. Group-level comparisons revealed medication-sensitive reductions in delta power and voltage variance, modulation of neuronal avalanche statistics, persistent increases in theta phase synchronization in PD patients, and disease-related alterations in cross-frequency interactions. Traditional spectral and synchronization features primarily reflect medication-related neural modulation, whereas dynamical descriptors reveal broader alterations in cortical network organization associated with disease but also with medication. These findings support multivariate EEG representations as a promising framework for developing non-invasive biomarkers of PD.

  • 1 author
·
Mar 31

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, in this paper we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet comprises a Transformer-embedded encoder-decoder architecture. The encoder aims to encode the input distorted magnitude and phase spectra into time-frequency representations, which are further fed into time-frequency Transformers for alternately capturing time and frequency dependencies. The decoder comprises a magnitude mask decoder and a phase decoder, directly enhancing magnitude and wrapped phase spectra by incorporating a magnitude masking architecture and a phase parallel estimation architecture, respectively. Multi-level loss functions explicitly defined on the magnitude spectra, wrapped phase spectra, and short-time complex spectra are adopted to jointly train the MP-SENet model. A metric discriminator is further employed to compensate for the incomplete correlation between these losses and human auditory perception. Experimental results demonstrate that our proposed MP-SENet achieves state-of-the-art performance across multiple speech enhancement tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase by explicit phase estimation, elevating the perceptual quality of enhanced speech.
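The abstract does not spell out its phase formulas, so the following is only a sketch of one common way to handle wrapped phase, under two loudly stated assumptions: that a phase value is obtained from two parallel real-valued outputs via the two-argument arctangent, and that phase errors are compared after mapping the difference back into the principal interval (an "anti-wrapping" step). Function names and sample values are hypothetical.

```python
import math

def phase_from_parallel_outputs(pseudo_real, pseudo_imag):
    """Assumed parallel estimation: two outputs act as pseudo real and
    imaginary parts, and atan2 yields a wrapped phase in (-pi, pi]."""
    return math.atan2(pseudo_imag, pseudo_real)

def anti_wrapped_error(predicted, target):
    """Assumed anti-wrapping distance: absolute phase difference after
    removing whole multiples of 2*pi, so the result lies in [0, pi]."""
    diff = predicted - target
    return abs(diff - 2 * math.pi * round(diff / (2 * math.pi)))

# Two phases on either side of the +/- pi boundary are close on the circle,
# even though their raw difference is almost 2*pi.
pred = math.pi - 0.05
target = -math.pi + 0.05
print(abs(pred - target), anti_wrapped_error(pred, target))
```

The point of the second function is that a naive absolute difference would heavily penalise predictions that are in fact nearly correct once wrapping is taken into account.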

  • 3 authors
·
Aug 17, 2023

CPKD: Clinical Prior Knowledge-Constrained Diffusion Models for Surgical Phase Recognition in Endoscopic Submucosal Dissection

Gastrointestinal malignancies constitute a leading cause of cancer-related mortality worldwide, with advanced-stage prognosis remaining particularly dismal. Originating as a groundbreaking technique for early gastric cancer treatment, Endoscopic Submucosal Dissection (ESD) has evolved into a versatile intervention for diverse gastrointestinal lesions. While computer-assisted systems significantly enhance procedural precision and safety in ESD, their clinical adoption faces a critical bottleneck: reliable surgical phase recognition within complex endoscopic workflows. Current state-of-the-art approaches predominantly rely on multi-stage refinement architectures that iteratively optimize temporal predictions. In this paper, we present Clinical Prior Knowledge-Constrained Diffusion (CPKD), a novel generative framework that reimagines phase recognition through denoising diffusion principles while preserving the core iterative refinement philosophy. This architecture progressively reconstructs phase sequences starting from random noise and conditioned on visual-temporal features. To better capture three domain-specific characteristics, namely positional priors, boundary ambiguity, and relation dependency, we design a conditional masking strategy. Furthermore, we incorporate clinical prior knowledge into the model training to improve its ability to correct phase logical errors. Comprehensive evaluations on ESD820, Cholec80, and external multi-center data demonstrate that our proposed CPKD achieves superior or comparable performance to state-of-the-art approaches, validating the effectiveness of diffusion-based generative paradigms for surgical phase recognition.

  • 7 authors
·
Jul 4, 2025

A search for periodic activity in multi-peaked long gamma-ray bursts

A sizeable fraction of gamma-ray burst (GRB) light curves (LCs) features a sequence of peaks, which holds information on the unknown way energy is dissipated into gamma-rays over time. Traditional searches for periodic signals in GRB LCs turned out to be inconclusive, partly because they are challenging as a consequence of the short-lived, coloured-noise, and non-stationary nature of the LCs themselves. Yet, recent claims have revived the issue. We searched for periodic components in GRB LCs through an approach that is new to GRBs and avoids most of the issues faced by traditional techniques. We identified peaks through a well-tested algorithm and selected GRBs with at least 10 peaks out of 5 GRB catalogues (Swift/BAT, CGRO/BATSE, Fermi/GBM, Insight-HXMT, BeppoSAX/GRBM). Each GRB was simply treated as a discrete point process, whose realisation coincides with the sequence of peak times. We searched for possible periodic recurrences based on the multinomial distribution, after accounting for the clustering of peaks due to the non-stationarity of the GRB signals. The best candidate has a p-value of 3e-4 under the hypothesis of no periodic recurrence. However, accounting for the multiple trials of 555 searched GRBs, its statistical significance is demoted to 17%. The overall distribution of the p-values obtained for all GRBs is compatible with a uniform distribution in [0,1]. We found no robust evidence for multi-peaked GRBs with periodic recurrences. We can exclude that a sizeable fraction (>~ 0.75) of peaks of each GRB with at least 10 peaks are periodic. While our result does not necessarily clash with claimed periodicities based on Fourier techniques, it constrains the putative recurrent behaviour, which would not manifest itself through the sequence of peaks, but, evidently, in a more elusive way.
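The step from a single-trial p-value to a post-trial significance can be sketched with the standard formula for independent trials. This is an assumption about the kind of correction involved, not a statement of the authors' exact procedure, and the inputs are the rounded values quoted in the abstract.

```python
def post_trial_probability(p_single, n_trials):
    """Chance that at least one of n independent null trials
    yields a p-value as small as p_single."""
    return 1.0 - (1.0 - p_single) ** n_trials

p_best = 3e-4   # rounded single-trial p-value quoted in the abstract
n_grbs = 555    # number of searched GRBs quoted in the abstract
corrected = post_trial_probability(p_best, n_grbs)
print(f"post-trial probability: {corrected:.3f}")
```

With these rounded inputs the result is on the order of 15%, close to the 17% quoted. The residual difference plausibly reflects rounding of the quoted p-value, although the abstract does not say so.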

  • 13 authors
·
Apr 10, 2025

Peakbagging the K2 KEYSTONE sample with PBjam: characterising the individual mode frequencies in solar-like oscillators

The pattern of individual mode frequencies in solar-like oscillators provides valuable insight into their properties and interior structures. The identification and characterisation of these modes requires high signal-to-noise and frequency resolution. The KEYSTONE project unlocks the asteroseismic potential of the K2 mission by providing individually reduced, high-quality time series data, global asteroseismic parameters, and spectroscopic analysis for 173 solar-like oscillators. In this work, we build on the KEYSTONE project and present the first analysis of the pattern of individual modes in the oscillation spectra for the K2 KEYSTONE stars. We perform a robust identification and characterisation of the modes through peakbagging methods in the open-source analysis tool PBjam. We present over 6000 mode frequencies, widths, and heights for 168 stars in the sample, covering the HR diagram from FGK dwarfs to sub-giants and the lower red giant branch, providing a significant increase in the number of individual mode frequency detections for main sequence and sub-giant oscillators. This study also presents sample-wide trends of oscillation patterns as a function of the fundamental stellar properties, and improves the precision of the global asteroseismic parameters. These measurements are part of the legacy of the K2 mission, and can be used to perform detailed modelling to improve the precision of fundamental properties of these stars. The results of this analysis provide evidence for the validity of using PBjam to identify and characterise the modes resulting from the observations of the future PLATO mission.

  • 8 authors
·
Oct 24, 2025