new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 20

Can Alfvénic Fluctuations Affect the Correlation and Complexity of Magnetic Fields in Magnetic Ejecta? A Case Study Based on Multi-Spacecraft Measurements at 1~au

We investigate whether Alfv\'enic fluctuations (AFs) can affect the structure of magnetic ejecta (MEs) within interplanetary coronal mass ejections (ICMEs). We study an ICME observed on 2001 December 29 at 1 au by ACE and Wind, at a total angular separation of sim0.8^circ (sim0.014~au). We focus on the correlation and complexity of its magnetic structure measured between two spacecraft in association with large-amplitude AFs. The Alfv\'enicity of the ME is investigated in terms of the residual energy and cross helicity of fluctuations. We find that as for the event of interest, large-amplitude AFs occur in the rear region of the ME at both Wind and ACE with a duration of about six hours. We compare the correlation of the magnetic field strength and vector components measured between Wind and ACE, and investigate complexity in terms of the magnetic hodograms. The region showing AFs is found to be associated with a decreased correlation of the magnetic field components and an increased complexity of the ME magnetic configuration detected at ACE and Wind, which may be due to the fact that the two spacecraft crossing the same ME along different trajectories likely sampled AFs in different oscillation phases. Combining multi-point in-situ measurements and remote-sensing observations of the ICME source region, we further discuss different potential sources of the AFs.

  • 7 authors
·
Dec 10, 2024

Energy-Based Diffusion Language Models for Text Generation

Despite remarkable progress in autoregressive language models, alternative generative paradigms beyond left-to-right generation are still being actively explored. Discrete diffusion models, with the capacity for parallel generation, have recently emerged as a promising alternative. Unfortunately, these models still underperform the autoregressive counterparts, with the performance gap increasing when reducing the number of sampling steps. Our analysis reveals that this degradation is a consequence of an imperfect approximation used by diffusion models. In this work, we propose Energy-based Diffusion Language Model (EDLM), an energy-based model operating at the full sequence level for each diffusion step, introduced to improve the underlying approximation used by diffusion models. More specifically, we introduce an EBM in a residual form, and show that its parameters can be obtained by leveraging a pretrained autoregressive model or by finetuning a bidirectional transformer via noise contrastive estimation. We also propose an efficient generation algorithm via parallel important sampling. Comprehensive experiments on language modeling benchmarks show that our model can consistently outperform state-of-the-art diffusion models by a significant margin, and approaches autoregressive models' perplexity. We further show that, without any generation performance drop, our framework offers a 1.3times sampling speedup over existing diffusion models.

  • 8 authors
·
Oct 28, 2024

MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity

We propose a multi-singer emotional singing voice synthesizer, Muse-SVS, that expresses emotion at various intensity levels by controlling subtle changes in pitch, energy, and phoneme duration while accurately following the score. To control multiple style attributes while avoiding loss of fidelity and expressiveness due to interference between attributes, Muse-SVS represents all attributes and their relations together by a joint embedding in a unified embedding space. Muse-SVS can express emotional intensity levels not included in the training data through embedding interpolation and extrapolation. We also propose a statistical pitch predictor to express pitch variance according to emotional intensity, and a context-aware residual duration predictor to prevent the accumulation of variances in phoneme duration, which is crucial for synchronization with instrumental parts. In addition, we propose a novel ASPP-Transformer, which combines atrous spatial pyramid pooling (ASPP) and Transformer, to improve fidelity and expressiveness by referring to broad contexts. In experiments, Muse-SVS exhibited improved fidelity, expressiveness, and synchronization performance compared with baseline models. The visualization results show that Muse-SVS effectively express the variance in pitch, energy, and phoneme duration according to emotional intensity. To the best of our knowledge, Muse-SVS is the first neural SVS capable of controlling emotional intensity.

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throughput is imperative. To address this, we present a novel approach that transforms the original sequential denoising into the batching denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid and high throughput streams. To handle the frequency disparity between data input and model throughput, we design a novel input-output queue for parallelizing the streaming process. Moreover, the existing diffusion pipeline uses classifier-free guidance(CFG), which requires additional U-Net computation. To mitigate the redundant computations, we propose a novel residual classifier-free guidance (RCFG) algorithm that reduces the number of negative conditional denoising steps to only one or even zero. Besides, we introduce a stochastic similarity filter(SSF) to optimize power consumption. Our Stream Batch achieves around 1.5x speedup compared to the sequential denoising method at different denoising levels. The proposed RCFG leads to speeds up to 2.05x higher than the conventional CFG. Combining the proposed strategies and existing mature acceleration tools makes the image-to-image generation achieve up-to 91.07fps on one RTX4090, improving the throughputs of AutoPipline developed by Diffusers over 59.56x. Furthermore, our proposed StreamDiffusion also significantly reduces the energy consumption by 2.39x on one RTX3060 and 1.99x on one RTX4090, respectively.

  • 10 authors
·
Dec 19, 2023 5

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks, we introduce QuantVLA, a training-free post-training quantization (PTQ) framework that, to our knowledge, is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head. QuantVLA incorporates three scale-calibrated components: (1) a selective quantization layout that integerizes all linear layers in both the language backbone and the DiT while keeping attention projections in floating point to preserve the original operator schedule; (2) attention temperature matching, a lightweight per-head scaling mechanism that stabilizes attention logits and is folded into the dequantization scales at inference; and (3) output head balancing, a per-layer residual interface calibration that mitigates post-projection energy drift. The framework requires no additional training, uses only a small unlabeled calibration buffer, and supports integer kernels for low-bit weights and activations while leaving the architecture unchanged. Across representative VLA models on LIBERO, QuantVLA exceeds the task success rates of full-precision baselines, achieves about 70% relative memory savings on the quantized components, and delivers a 1.22x speedup in end-to-end inference latency, providing a practical pathway toward scalable low-bit embodied intelligence under strict compute, memory, and power constraints.

  • 8 authors
·
Feb 23 4

A Model RRNet for Spectral Information Exploitation and LAMOST Medium-resolution Spectrum Parameter Estimation

This work proposes a Residual Recurrent Neural Network (RRNet) for synthetically extracting spectral information, and estimating stellar atmospheric parameters together with 15 chemical element abundances for medium-resolution spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). The RRNet consists of two fundamental modules: a residual module and a recurrent module. The residual module extracts spectral features based on the longitudinally driving power from parameters, while the recurrent module recovers spectral information and restrains the negative influences from noises based on Cross-band Belief Enhancement. RRNet is trained by the spectra from common stars between LAMOST DR7 and APOGEE-Payne catalog. The 17 stellar parameters and their uncertainties for 2.37 million medium-resolution spectra from LAMOST DR7 are predicted. For spectra with S/N >= 10, the precision of estimations Teff and log g are 88 K and 0.13 dex respectively, elements C, Mg, Al, Si, Ca, Fe, Ni are 0.05 dex to 0.08 dex, and N, O, S, K, Ti, Cr, Mn are 0.09 dex to 0.14 dex, while that of Cu is 0.19 dex. Compared with StarNet and SPCANet, RRNet shows higher accuracy and robustness. In comparison to Apache Point Observatory Galactic Evolution Experiment and Galactic Archaeology with HERMES surveys, RRNet manifests good consistency within a reasonable range of bias. Finally, this work releases a catalog for 2.37 million medium-resolution spectra from the LAMOST DR7, the source code, the trained model and the experimental data respectively for astronomical science exploration and data processing algorithm research reference.

  • 3 authors
·
May 30, 2022

Probabilistic Assessment of Engineered Timber Reusability after Moisture Exposure

Engineered timber is pivotal to low-carbon construction, but moisture uptake during its service life can compromise structural reliability and impede reuse within a circular economy model. Despite growing interest, quantitative standards for classifying the reusability of moisture-exposed timber are still lacking. This study develops a probabilistic framework to determine the post-exposure reusability of engineered timber. Laminated specimens were soaked to full saturation, dried to 25% moisture content, and subjected to destructive three-point flexural testing. Structural integrity was quantified by a residual-performance metric that assigns 80% weight to the retained flexural modulus and 20% to the retained maximum load, benchmarked against unexposed controls. A hierarchical Bayesian multinomial logistic model with horseshoe priors, calibrated through Markov-Chain Monte-Carlo sampling, jointly infers the decision threshold separating three Modern Methods of Construction (MMC) reuse levels and predicts those levels from five field-measurable features: density, moisture content, specimen size, grain orientation, and surface hardness. Results indicate that a single wet-dry cycle preserves 70% of specimens above the 0.90 residual-performance threshold (Level 1), whereas repeated cycling lowers the mean residual to 0.78 and reallocates many specimens to Levels 2-3. The proposed framework yields quantified decision boundaries and a streamlined on-site testing protocol, providing a foundation for robust quality assurance standards.

  • 5 authors
·
May 29, 2025

AIMS-EREA -- A framework for AI-accelerated Innovation of Materials for Sustainability -- for Environmental Remediation and Energy Applications

Many environmental remediation and energy applications (conversion and storage) for sustainability need design and development of green novel materials. Discovery processes of such novel materials are time taking and cumbersome due to large number of possible combinations and permutations of materials structures. Often theoretical studies based on Density Functional Theory (DFT) and other theories, coupled with Simulations are conducted to narrow down sample space of candidate materials, before conducting laboratory-based synthesis and analytical process. With the emergence of artificial intelligence (AI), AI techniques are being tried in this process too to ease out simulation time and cost. However tremendous values of previously published research from various parts of the world are still left as labor-intensive manual effort and discretion of individual researcher and prone to human omissions. AIMS-EREA is our novel framework to blend best of breed of Material Science theory with power of Generative AI to give best impact and smooth and quickest discovery of material for sustainability. This also helps to eliminate the possibility of production of hazardous residues and bye-products of the reactions. AIMS-EREA uses all available resources -- Predictive and Analytical AI on large collection of chemical databases along with automated intelligent assimilation of deep materials knowledge from previously published research works through Generative AI. We demonstrate use of our own novel framework with an example, how this framework can be successfully applied to achieve desired success in development of thermoelectric material for waste heat conversion.

  • 3 authors
·
Nov 18, 2023