Daily Papers

by AK and the research community

Apr 15

Modelling the 5G Energy Consumption using Real-world Data: Energy Fingerprint is All You Need

The introduction of fifth-generation (5G) radio technology has revolutionized communications, bringing unprecedented automation, capacity, connectivity, and ultra-fast, reliable communications. However, this technological leap comes with a substantial increase in energy consumption, presenting a significant challenge. To improve the energy efficiency of 5G networks, it is imperative to develop sophisticated models that accurately reflect the influence of base station (BS) attributes and operational conditions on energy usage. Importantly, addressing the complexity and interdependencies of these diverse features is particularly challenging, both in terms of data processing and model architecture design. This paper proposes a novel 5G base station energy consumption modelling method by learning from a real-world dataset used in the ITU 5G Base Station Energy Consumption Modelling Challenge, in which our model ranked second. Unlike existing methods that omit the Base Station Identifier (BSID) information and thus fail to capture the unique energy fingerprint of different base stations, we incorporate the BSID into the input features and encode it with an embedding layer for precise representation. Additionally, we introduce a novel masked training method alongside an attention mechanism to further boost the model's generalization capabilities and accuracy. In evaluation, our method demonstrates significant improvements over existing models, reducing Mean Absolute Percentage Error (MAPE) from 12.75% to 4.98%, a performance gain of more than 60%.

  • 8 authors
·
Jun 13, 2024
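
The "more than 60%" figure in the abstract above follows from the two reported MAPE values. The sketch below is an illustration only, not the authors' code: it uses the standard MAPE definition, and the function names and the sample energy readings are hypothetical.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes no true value is zero, since MAPE is undefined there.
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def relative_reduction(baseline, improved):
    """Fraction by which an error metric shrank relative to the baseline."""
    return (baseline - improved) / baseline


# Hypothetical energy readings, for illustration only.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(f"MAPE on toy data: {mape(measured, predicted):.2f}%")

# MAPE values reported in the abstract: 12.75% before, 4.98% after.
gain = relative_reduction(12.75, 4.98)
print(f"Relative MAPE reduction: {gain:.1%}")
```

With the reported values, the relative reduction is (12.75 - 4.98) / 12.75, which is about 0.61 and consistent with the stated gain.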

Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems

Long-term user engagement (LTE) optimization in sequential recommender systems (SRS) has been shown to be well suited to reinforcement learning (RL), which finds a policy to maximize long-term rewards. However, RL has its shortcomings, particularly its need for a large number of online samples for exploration, which is risky in real-world applications. One appealing way to avoid the risk is to build a simulator and learn the optimal recommendation policy in the simulator. In LTE optimization, the simulator simulates multiple users' daily feedback for given recommendations. However, building a user simulator with no reality-gap, i.e., one that can predict users' feedback exactly, is unrealistic because users' reaction patterns are complex and historical logs for each user are limited, which might mislead the simulator-based recommendation policy. In this paper, we present a practical simulator-based recommender policy training approach, Simulation-to-Recommendation (Sim2Rec), to handle the reality-gap problem for LTE optimization. Specifically, Sim2Rec introduces a simulator set to generate various possibilities of user behavior patterns, then trains an environment-parameter extractor to recognize users' behavior patterns in the simulators. Finally, a context-aware policy is trained to make the optimal decisions on all of the variants of the users based on the inferred environment-parameters. The policy is directly transferable to unseen environments (e.g., the real world) because it has learned to recognize the various user behavior patterns and to make the correct decisions based on the inferred environment-parameters. Experiments are conducted in synthetic environments and a real-world large-scale ride-hailing platform, DidiChuxing. The results show that Sim2Rec achieves significant performance improvement, and produces robust recommendations in unseen environments.

  • 8 authors
·
May 3, 2023

Gotta Detect 'Em All: Fake Base Station and Multi-Step Attack Detection in Cellular Networks

Fake base stations (FBSes) pose a significant security threat by impersonating legitimate base stations (BSes). Though efforts have been made to defeat this threat, to this day the presence of FBSes and the multi-step attacks (MSAs) stemming from them can lead to unauthorized surveillance, interception of sensitive information, and disruption of network services. Therefore, detecting these malicious entities is crucial to ensure the security and reliability of cellular networks. Traditional detection methods often rely on additional hardware, rules, signal scanning, changing protocol specifications, or cryptographic mechanisms that have limitations and incur huge infrastructure costs. In this paper, we develop FBSDetector, an effective and efficient detection solution that can reliably detect FBSes and MSAs from layer-3 network traces using machine learning (ML) at the user equipment (UE) side. To develop FBSDetector, we create FBSAD and MSAD, the first-ever high-quality and large-scale datasets incorporating instances of FBSes and 21 MSAs. These datasets capture the network traces in different real-world cellular network scenarios (including mobility and different attacker capabilities) incorporating legitimate BSes and FBSes. Our novel ML framework, specifically designed to detect FBSes in a multi-level approach for packet classification using stateful LSTM with attention and trace level classification and MSAs using graph learning, can effectively detect FBSes with an accuracy of 96% and a false positive rate of 2.96%, and recognize MSAs with an accuracy of 86% and a false positive rate of 3.28%. We deploy FBSDetector as a real-world solution to protect end-users through a mobile app and validate it in real-world environments. Compared to the existing heuristic-based solutions that fail to detect FBSes, FBSDetector can detect FBSes in the wild in real-time.

  • 3 authors
·
Jan 10, 2024

Geo2SigMap: High-Fidelity RF Signal Mapping Using Geographic Databases

Radio frequency (RF) signal mapping, which is the process of analyzing and predicting the RF signal strength and distribution across specific areas, is crucial for cellular network planning and deployment. Traditional approaches to RF signal mapping rely on statistical models constructed based on measurement data, which offer low complexity but often lack accuracy, or ray tracing tools, which provide enhanced precision for the target area but suffer from increased computational complexity. Recently, machine learning (ML) has emerged as a data-driven method for modeling RF signal propagation, which leverages models trained on synthetic datasets to perform RF signal mapping in "unseen" areas. In this paper, we present Geo2SigMap, an ML-based framework for efficient and high-fidelity RF signal mapping using geographic databases. First, we develop an automated framework that seamlessly integrates three open-source tools: OpenStreetMap (geographic databases), Blender (computer graphics), and Sionna (ray tracing), enabling the efficient generation of large-scale 3D building maps and ray tracing models. Second, we propose a cascaded U-Net model, which is pre-trained on synthetic datasets and employed to generate detailed RF signal maps, leveraging environmental information and sparse measurement data. Finally, we evaluate the performance of Geo2SigMap via a real-world measurement campaign, where three types of user equipment (UE) collect over 45,000 data points related to cellular information from six LTE cells operating in the citizens broadband radio service (CBRS) band. Our results show that Geo2SigMap achieves an average root-mean-square-error (RMSE) of 6.04 dB for predicting the reference signal received power (RSRP) at the UE, representing an average RMSE improvement of 3.59 dB compared to existing methods.

  • 4 authors
·
Dec 21, 2023
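
The Geo2SigMap result above is stated as an RMSE in dB on predicted RSRP. As a reminder of how that metric is computed, here is a minimal sketch using the standard RMSE definition. The function name and the sample RSRP values are hypothetical and are not taken from the paper's measurement campaign.

```python
import math


def rmse_db(measured_dbm, predicted_dbm):
    """Root-mean-square error between two RSRP series.

    Both inputs are in dBm, so their differences (and the RMSE) are in dB.
    """
    if len(measured_dbm) != len(predicted_dbm) or not measured_dbm:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured_dbm, predicted_dbm)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical RSRP samples in dBm, for illustration only.
measured = [-90.0, -100.0]
predicted = [-93.0, -96.0]
print(f"RMSE: {rmse_db(measured, predicted):.2f} dB")
```

Because the errors are squared before averaging, a few large misses (for example behind a building that the model did not account for) raise the RMSE more than many small ones.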

Market-based Short-Term Allocations in Small Cell Wireless Networks

Mobile users (or UEs, to use 3GPP terminology) served by small cells in dense urban settings may abruptly experience a significant deterioration in their channel to their serving base stations (BSs) in several scenarios, such as after turning a corner around a tall building, or a sudden knot of traffic blocking the direct path between the UE and its serving BS. In this work, we propose a scheme to temporarily increase the data rate to/from this UE with additional bandwidth from the nearest Coordinated Multi-Point (CoMP) cluster of BSs, while the slower process of handover of the UE to a new serving BS is ongoing. We emphasize that this bandwidth is in addition to the data rates the UE is getting over its primary connection to the current serving BS and, after the handover, to the new serving BS. The key novelty of the present work is the proposal of a decentralized market-based resource allocation method to support Coordinated Beamforming (CB) CoMP. It is scalable to large numbers of UEs and BSs, and it is fast because resource allocations are made bilaterally, between BSs and UEs. Once the resource allocation to the UE has been made, the coordination of transmissions occurs as per the usual CB methods. Thus the proposed method has the benefit of giving the UE access to its desired amount of resources fast, without waiting for handover to complete, or reporting channel state information before it knows the resources it will be allocated for receiving transmissions from the serving BS.

  • 2 authors
·
May 8, 2020

A Disentangled Representation Learning Framework for Low-altitude Network Coverage Prediction

The expansion of the low-altitude economy has underscored the significance of Low-Altitude Network Coverage (LANC) prediction for designing aerial corridors. While accurate LANC forecasting hinges on the antenna beam patterns of Base Stations (BSs), these patterns are typically proprietary and not readily accessible. Operational parameters of BSs, which inherently contain beam information, offer an opportunity for data-driven low-altitude coverage prediction. However, collecting extensive low-altitude road test data is cost-prohibitive, often yielding only sparse samples per BS. This scarcity results in two primary challenges: imbalanced feature sampling due to limited variability in high-dimensional operational parameters against the backdrop of substantial changes in low-dimensional sampling locations, and diminished generalizability stemming from insufficient data samples. To overcome these obstacles, we introduce a dual strategy comprising expert knowledge-based feature compression and disentangled representation learning. The former reduces feature space complexity by leveraging communications expertise, while the latter enhances model generalizability through the integration of propagation models and distinct subnetworks that capture and aggregate the semantic representations of latent features. Experimental evaluation confirms the efficacy of our framework, yielding a 7% reduction in error compared to the best baseline algorithm. Real-network validations further attest to its reliability, achieving practical prediction accuracy with a mean absolute error (MAE) at the 5 dB level.

  • 8 authors
·
Jul 13, 2025

Multi-Agent Reinforcement Learning for Offloading Cellular Communications with Cooperating UAVs

Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles (UAVs), known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs, serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and user association indicators under quality of service constraints. Since the formulated UAV control problem is nonconvex and combinatorial, this study leverages the multi-agent reinforcement learning framework. In this framework, each UAV acts as an independent agent, aiming to maintain inter-UAV cooperative behavior. The proposed approach utilizes the finite-state Markov decision process to account for UAV velocity constraints and the relationship between their trajectories and state space. A low-complexity distributed state-action-reward-state-action (SARSA) algorithm is presented to determine the UAVs' optimal sequential decision-making policies over training episodes. The extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q-learning and particle swarm optimization.

  • 6 authors
·
Feb 5, 2024
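
The abstract above builds on the state action reward state action (SARSA) algorithm. The sketch below shows only the textbook tabular SARSA update for a single agent. It is not the paper's distributed multi-UAV method, and the state and action labels, learning rate, and discount factor are all hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One on-policy SARSA step.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)),
    where a' is the action the current policy actually takes in s'.
    """
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Toy example: one update on a two-cell grid (labels are hypothetical).
rng = random.Random(0)
actions = ["left", "right"]
q = defaultdict(float)
q[("cell_1", "right")] = 1.0

a_next = epsilon_greedy(q, "cell_1", actions, epsilon=0.0, rng=rng)
new_value = sarsa_update(q, "cell_0", "left", r=0.5,
                         s_next="cell_1", a_next=a_next,
                         alpha=0.5, gamma=0.9)
print(new_value)
```

The difference from Q-learning, which the abstract lists as a benchmark, is that SARSA bootstraps from the action actually chosen in the next state rather than from the maximum over actions.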

RFRL Gym: A Reinforcement Learning Testbed for Cognitive Radio Applications

Radio Frequency Reinforcement Learning (RFRL) is anticipated to be a widely applicable technology in the next generation of wireless communication systems, particularly 6G and next-gen military communications. Given this, our research is focused on developing a tool to promote the development of RFRL techniques that leverage spectrum sensing. In particular, the tool was designed to address two cognitive radio applications, specifically dynamic spectrum access and jamming. In order to train and test reinforcement learning (RL) algorithms for these applications, a simulation environment is necessary to simulate the conditions that an agent will encounter within the Radio Frequency (RF) spectrum. In this paper, such an environment has been developed, herein referred to as the RFRL Gym. Through the RFRL Gym, users can design their own scenarios to model what an RL agent may encounter within the RF spectrum as well as experiment with different spectrum sensing techniques. Additionally, the RFRL Gym is a subclass of OpenAI gym, enabling the use of third-party ML/RL Libraries. We plan to open-source this codebase to enable other researchers to utilize the RFRL Gym to test their own scenarios and RL algorithms, ultimately leading to the advancement of RL research in the wireless communications domain. This paper describes in further detail the components of the Gym, results from example scenarios, and plans for future additions. Index Terms: machine learning, reinforcement learning, wireless communications, dynamic spectrum access, OpenAI gym

  • 17 authors
·
Dec 20, 2023

U6G XL-MIMO Radiomap Prediction: Multi-Config Dataset and Beam Map Approach

The upper 6 GHz (U6G) band with XL-MIMO is a key enabler for sixth-generation wireless systems, yet intelligent radiomap prediction for such systems remains challenging. Existing datasets support only small-scale arrays (up to 8x8) with predominantly isotropic antennas, far from the 1024-element directional arrays envisioned for 6G. Moreover, current methods encode array configurations as scalar parameters, forcing neural networks to extrapolate array-specific radiation patterns, which fails when predicting radiomaps for configurations absent from training data. To jointly address data scarcity and generalization limitations, this paper advances XL-MIMO radiomap prediction from three aspects. To overcome data limitations, we construct the first XL-MIMO radiomap dataset containing 78400 radiomaps across 800 urban scenes, five frequency bands (1.8-6.7 GHz), and nine array configurations up to 32x32 uniform planar arrays with directional elements. To enable systematic evaluation, we establish a comprehensive benchmark framework covering practical scenarios from coverage estimation without field measurements to generalization across unseen configurations and environments. To enable generalization to arbitrary beam configurations without retraining, we propose the beam map, a physics-informed spatial feature that analytically computes array-specific coverage patterns. By decoupling deterministic array radiation from data learned multipath propagation, beam maps shift generalization from neural network extrapolation to physics-based computation. Integrating beam maps into existing architectures reduces mean absolute error by up to 60.0% when generalizing to unseen configurations and up to 50.5% when transferring to unseen environments. The complete dataset and code are publicly available at https://lxj321.github.io/MulticonfigRadiomapDataset/.

  • 5 authors
·
Mar 6

UrbanMIMOMap: A Ray-Traced MIMO CSI Dataset with Precoding-Aware Maps and Benchmarks

Sixth generation (6G) systems require environment-aware communication, driven by native artificial intelligence (AI) and integrated sensing and communication (ISAC). Radio maps (RMs), providing spatially continuous channel information, are key enablers. However, generating high-fidelity RM ground truth via electromagnetic (EM) simulations is computationally intensive, motivating machine learning (ML)-based RM construction. The effectiveness of these data-driven methods depends on large-scale, high-quality training data. Current public datasets often focus on single-input single-output (SISO) and limited information, such as path loss, which is insufficient for advanced multi-input multi-output (MIMO) systems requiring detailed channel state information (CSI). To address this gap, this paper presents UrbanMIMOMap, a novel large-scale urban MIMO CSI dataset generated using high-precision ray tracing. UrbanMIMOMap offers comprehensive complex CSI matrices across a dense spatial grid, going beyond traditional path loss data. This rich CSI is vital for constructing high-fidelity RMs and serves as a fundamental resource for data-driven RM generation, including deep learning. We demonstrate the dataset's utility through baseline performance evaluations of representative ML methods for RM construction. This work provides a crucial dataset and reference for research in high-precision RM generation, MIMO spatial performance, and ML for 6G environment awareness. The code and data for this work are available at: https://github.com/UNIC-Lab/UrbanMIMOMap.

  • 5 authors
·
Sep 7, 2025

RADIANCE: Radio-Frequency Adversarial Deep-learning Inference for Automated Network Coverage Estimation

Radio-frequency coverage maps (RF maps) are extensively utilized in wireless networks for capacity planning, placement of access points and base stations, localization, and coverage estimation. Conducting site surveys to obtain RF maps is labor-intensive and sometimes not feasible. In this paper, we propose radio-frequency adversarial deep-learning inference for automated network coverage estimation (RADIANCE), a generative adversarial network (GAN) based approach for synthesizing RF maps in indoor scenarios. RADIANCE utilizes a semantic map, a high-level representation of the indoor environment to encode spatial relationships and attributes of objects within the environment and guide the RF map generation process. We introduce a new gradient-based loss function that computes the magnitude and direction of change in received signal strength (RSS) values from a point within the environment. RADIANCE incorporates this loss function along with the antenna pattern to capture signal propagation within a given indoor configuration and generate new patterns under new configuration, antenna (beam) pattern, and center frequency. Extensive simulations are conducted to compare RADIANCE with ray-tracing simulations of RF maps. Our results show that RADIANCE achieves a mean average error (MAE) of 0.09, root-mean-squared error (RMSE) of 0.29, peak signal-to-noise ratio (PSNR) of 10.78, and multi-scale structural similarity index (MS-SSIM) of 0.80.

  • 3 authors
·
Aug 21, 2023

RadioDiff-3D: A 3D×3D Radio Map Dataset and Generative Diffusion Based Benchmark for 6G Environment-Aware Communication

Radio maps (RMs) serve as a critical foundation for enabling environment-aware wireless communication, as they provide the spatial distribution of wireless channel characteristics. Despite recent progress in RM construction using data-driven approaches, most existing methods focus solely on pathloss prediction in a fixed 2D plane, neglecting key parameters such as direction of arrival (DoA), time of arrival (ToA), and vertical spatial variations. Such a limitation is primarily due to the reliance on static learning paradigms, which hinder generalization beyond the training data distribution. To address these challenges, we propose UrbanRadio3D, a large-scale, high-resolution 3D RM dataset constructed via ray tracing in realistic urban environments. UrbanRadio3D is over 37×3 larger than previous datasets across a 3D space with 3 metrics as pathloss, DoA, and ToA, forming a novel 3D×33D dataset with 7×3 more height layers than prior state-of-the-art (SOTA) dataset. To benchmark 3D RM construction, a UNet with 3D convolutional operators is proposed. Moreover, we further introduce RadioDiff-3D, a diffusion-model-based generative framework utilizing the 3D convolutional architecture. RadioDiff-3D supports both radiation-aware scenarios with known transmitter locations and radiation-unaware settings based on sparse spatial observations. Extensive evaluations on UrbanRadio3D validate that RadioDiff-3D achieves superior performance in constructing rich, high-dimensional radio maps under diverse environmental dynamics. This work provides a foundational dataset and benchmark for future research in 3D environment-aware communication. The dataset is available at https://github.com/UNIC-Lab/UrbanRadio3D.

  • 8 authors
·
Jul 16, 2025

A Homogeneous Graph Neural Network for Precoding and Power Allocation in Scalable Wireless Networks

Deep learning is widely used in wireless communications but struggles with fixed neural network sizes, which limit their adaptability in environments where the number of users and antennas varies. To overcome this, this paper introduces a generalization strategy for precoding and power allocation in scalable wireless networks. Initially, we employ an innovative approach to abstract the wireless network into a homogeneous graph. This primarily focuses on bypassing the heterogeneous features between transmitter (TX) and user entities to construct a virtual homogeneous graph serving optimization objectives, thereby enabling all nodes in the virtual graph to share the same neural network. This "TX entity" is known as a base station (BS) in cellular networks and an access point (AP) in cell-free networks. Subsequently, we design a universal graph neural network, termed the information carrying graph neural network (ICGNN), to capture and integrate information from this graph, maintaining permutation invariance. Lastly, using ICGNN as the core algorithm, we tailor the neural network's input and output for specific problem requirements and validate its performance in two scenarios: 1) in cellular networks, we develop a matrix-inverse-free multi-user multi-input multi-output (MU-MIMO) precoding scheme using the conjugate gradient (CG) method, adaptable to varying user and antenna numbers; 2) in a cell-free network, facing dynamic variations in the number of users served by APs, the number of APs serving each user, and the number of antennas per AP, we propose a universal power allocation scheme. Simulations demonstrate that the proposed approach not only significantly reduces computational complexity but also achieves, and potentially exceeds, the spectral efficiency (SE) of conventional algorithms.

  • 6 authors
·
Aug 30, 2024

Transform-Invariant Generative Ray Path Sampling for Efficient Radio Propagation Modeling

Ray tracing has become a standard for accurate radio propagation modeling, but suffers from exponential computational complexity, as the number of candidate paths scales with the number of objects raised to the power of the interaction order. This bottleneck limits its use in large-scale or real-time applications, forcing traditional tools to rely on heuristics to reduce the number of path candidates at the cost of potentially reduced accuracy. To overcome this limitation, we propose a comprehensive machine-learning-assisted framework that replaces exhaustive path searching with intelligent sampling via Generative Flow Networks. Applying such generative models to this domain presents significant challenges, particularly sparse rewards due to the rarity of valid paths, which can lead to convergence failures and trivial solutions when evaluating high-order interactions in complex environments. To ensure robust learning and efficient exploration, our framework incorporates three key architectural components. First, we implement an experience replay buffer to capture and retain rare valid paths. Second, we adopt a uniform exploratory policy to improve generalization and prevent the model from overfitting to simple geometries. Third, we apply a physics-based action masking strategy that filters out physically impossible paths before the model even considers them. As demonstrated in our experimental validation, the proposed model achieves substantial speedups over exhaustive search (up to 10× faster on GPU and 1000× faster on CPU) while maintaining high coverage accuracy and successfully uncovering complex propagation paths. The complete source code, tests, and tutorial are available at https://github.com/jeertmans/sampling-paths.
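
The scaling claim in the abstract above (candidate paths grow as the number of objects raised to the interaction order) can be illustrated with a few lines of arithmetic. This is a rough count under that stated scaling only. The function names are hypothetical, and real ray tracers typically prune some of these candidates.

```python
def candidate_paths(num_objects, order):
    """Ordered sequences of `order` interactions drawn from `num_objects`."""
    return num_objects ** order


def total_candidates(num_objects, max_order):
    """Candidates summed over interaction orders 1..max_order."""
    return sum(candidate_paths(num_objects, k) for k in range(1, max_order + 1))


# Growth with interaction order for a scene of 100 objects.
for order in range(1, 5):
    print(order, candidate_paths(100, order))
```

Going from second-order to third-order interactions in a 100-object scene multiplies the search space by 100, which is why exhaustive enumeration becomes impractical and sampling only promising paths is attractive.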

Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G

Building future wireless systems that support services like digital twins (DTs) is challenging to achieve through advances to conventional technologies like meta-surfaces. While artificial intelligence (AI)-native networks promise to overcome some limitations of wireless technologies, developments still rely on AI tools like neural networks. Such tools struggle to cope with the non-trivial challenges of the network environment and the growing demands of emerging use cases. In this paper, we revisit the concept of AI-native wireless systems, equipping them with the common sense necessary to transform them into artificial general intelligence (AGI)-native systems. These systems acquire common sense by exploiting different cognitive abilities such as perception, analogy, and reasoning, that enable them to generalize and deal with unforeseen scenarios. Towards developing the components of such a system, we start by showing how the perception module can be built through abstracting real-world elements into generalizable representations. These representations are then used to create a world model, founded on principles of causality and hyper-dimensional (HD) computing, that aligns with intuitive physics and enables analogical reasoning, that define common sense. Then, we explain how methods such as integrated information theory play a role in the proposed intent-driven and objective-driven planning methods that maneuver the AGI-native network to take actions. Next, we discuss how an AGI-native network can enable use cases related to human and autonomous agents: a) analogical reasoning for next-generation DTs, b) synchronized and resilient experiences for cognitive avatars, and c) brain-level metaverse experiences like holographic teleportation. Finally, we conclude with a set of recommendations to build AGI-native systems. Ultimately, we envision this paper as a roadmap for the beyond 6G era.

  • 7 authors
·
Apr 29, 2024

Performance Limits of Network Densification

Network densification is a promising cellular deployment technique that leverages spatial reuse to enhance coverage and throughput. Recent work has identified that at some point ultra-densification will no longer be able to deliver significant throughput gains. In this paper, we provide a unified treatment of the performance limits of network densification. We develop a general framework, which incorporates multi-slope pathloss and the entire space of shadowing and small scale fading distributions, under strongest cell association in a Poisson field of interferers. First, our results show that there are three scaling regimes for the downlink signal-to-interference-plus-noise ratio (SINR), coverage probability, and average per-user rate. Specifically, depending on the near-field pathloss and the fading distribution, the user performance of 5G ultra dense networks (UDNs) would either monotonically increase, saturate, or decay with increasing network density. Second, we show that network performance in terms of coverage density and area spectral efficiency can scale with the network density better than the user performance does. Furthermore, we provide ordering results for both coverage and average rate as a means to qualitatively compare different transmission techniques that may exhibit the same performance scaling. Our results, which are verified by simulations, provide succinct insights and valuable design guidelines for the deployment of 5G UDNs.

  • 2 authors
·
Nov 23, 2016

Model Context Protocol-based Internet of Experts For Wireless Environment-aware LLM Agents

Large Language Models (LLMs) exhibit strong general-purpose reasoning abilities but lack access to wireless environment information due to the absence of native sensory input and domain-specific priors. Previous attempts to apply LLMs in wireless systems either depend on retraining with network-specific data, which compromises language generalization, or rely on manually scripted interfaces, which hinder scalability. To overcome these limitations, we propose a Model Context Protocol (MCP)-based Internet of Experts (IoX) framework that equips LLMs with wireless environment-aware reasoning capabilities. The framework incorporates a set of lightweight expert models, each trained to solve a specific deterministic task in wireless communications, such as detecting a specific wireless attribute, e.g., line-of-sight propagation, Doppler effects, or fading conditions. Through MCP, the LLM can selectively query and interpret expert outputs at inference time, without modifying its own parameters. This architecture enables modular, extensible, and interpretable reasoning over wireless contexts. Evaluated across multiple mainstream LLMs, the proposed wireless environment-aware LLM agents achieve 40%-50% improvements in classification tasks over LLM-only baselines. More broadly, the MCP-based design offers a viable paradigm for future LLMs to inherit structured wireless network management capabilities.

  • 2 authors
·
May 3, 2025

Harnessing Selective State Space Models to Enhance Semianalytical Design of Fabrication-Ready Multilayered Huygens' Metasurfaces: Part II - Generative Inverse Design (MetaMamba)

We present a generative framework for inverse design of five-layer transmissive Huygens' metasurfaces (HMSs), addressing a longstanding challenge in achieving full-phase, high-efficiency unit cell designs with minimal full-wave simulations. The key to achieving this is our reliance on the field-based semianalytical (SA) scheme developed in Part I of this paper, which allows rapid and highly effective synthesis of such multilayer composites, however with limited accuracy. To overcome the prohibitive data demands of traditional pipelines, we employ Mamba, a selective state space model well suited for long-range sequence modeling as the backbone of our learning framework. A bidirectional Mamba (Bi-Mamba) forward surrogate is first trained on SA-generated data and subsequently fine-tuned with full-wave CST samples. An ablation over a 1080-sample CST pool shows that as few as 270 full-wave calibration samples suffice to reach near-CST-level agreement at a fraction of the simulation cost. An autoregressive Mamba inverse generator is subsequently trained on surrogate-augmented data, treating unit-cell synthesis as a sequential generation task. The resulting one-to-many generative model produces diverse unit cell geometries conditioned on target scattering responses. It achieves CST-validated designs with field transmission magnitude 0.9 across the full 0-2π phase range at 20 GHz. Moreover, a CST-calibrated surrogate trained to accurately predict frequency responses (18-22 GHz) enables functional post-selection of inverse generated designs. Together, the hybrid SA-generative methodology in this two-part compilation establishes a scalable and data-efficient solution for multilayer HMS synthesis, with natural extensions toward broadband, oblique-incidence, and higher-dimensional electromagnetic inverse-design problems.

  • 5 authors
·
Mar 4

MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

The development of GUI agents could revolutionize the next generation of human-computer interaction. Motivated by this vision, we present MAI-UI, a family of foundation GUI agents spanning the full spectrum of sizes, including 2B, 8B, 32B, and 235B-A22B variants. We identify four key challenges to realistic deployment: the lack of native agent-user interaction, the limits of UI-only operation, the absence of a practical deployment architecture, and brittleness in dynamic environments. MAI-UI addresses these issues with a unified methodology: a self-evolving data pipeline that expands the navigation data to include user interaction and MCP tool calls, a native device-cloud collaboration system that routes execution by task state, and an online RL framework with advanced optimizations to scale parallel environments and context length. MAI-UI establishes a new state of the art across GUI grounding and mobile navigation. On grounding benchmarks, it reaches 73.5% on ScreenSpot-Pro, 91.3% on MMBench GUI L2, 70.9% on OSWorld-G, and 49.2% on UI-Vision, surpassing Gemini-3-Pro and Seed1.8 on ScreenSpot-Pro. On mobile GUI navigation, it sets a new SOTA of 76.7% on AndroidWorld, surpassing UI-Tars-2, Gemini-2.5-Pro, and Seed1.8. On MobileWorld, MAI-UI obtains a 41.7% success rate, significantly outperforming end-to-end GUI models and remaining competitive with Gemini-3-Pro-based agentic frameworks. Our online RL experiments show significant gains from scaling parallel environments from 32 to 512 (+5.2 points) and increasing the environment step budget from 15 to 50 (+4.3 points). Finally, the native device-cloud collaboration system improves on-device performance by 33%, reduces cloud model calls by over 40%, and preserves user privacy.

AlibabaTongyiLab TongyiLab
·
Dec 26, 2025

Think over Trajectories: Leveraging Video Generation to Reconstruct GPS Trajectories from Cellular Signaling

Mobile devices continuously interact with cellular base stations, generating massive volumes of signaling records that provide broad coverage for understanding human mobility. However, such records offer only coarse location cues (e.g., serving-cell identifiers), which limits their direct use in applications that require high-precision GPS trajectories. This paper studies the Sig2GPS problem: reconstructing GPS trajectories from cellular signaling. Inspired by how domain experts often lay the signaling trace on a map and sketch the corresponding GPS route, and unlike conventional solutions that rely on complex multi-stage engineering pipelines or regress coordinates, Sig2GPS is reframed as an image-to-video generation task that operates directly in the map-visual domain: signaling traces are rendered on a map, and a video generation model is trained to draw a continuous GPS path. To support this paradigm, a paired signaling-to-trajectory video dataset is constructed to fine-tune an open-source video model, and a trajectory-aware reinforcement learning-based optimization method is introduced to improve generation fidelity via rewards. Experiments on large-scale real-world datasets show substantial improvements over strong engineered and learning-based baselines, while additional results on next GPS prediction indicate scalability and cross-city transferability. Overall, these results suggest that map-visual video generation provides a practical interface for trajectory data mining by enabling direct generation and refinement of continuous paths under map constraints.

  • 6 authors
·
Mar 27