Abstract
Urban air quality forecasting is vital for managing pollution exposure and protecting public health. This study introduces AQNet, a spatiotemporal deep learning framework that integrates adaptive mesh-based graph construction with attention-driven temporal modeling to improve prediction accuracy. The model is evaluated on the Beijing Multi-Site Air Quality Dataset, which contains hourly pollutant and meteorological records from 12 monitoring stations. AQNet captures spatial dependencies through graph convolution over nonuniform mesh partitions and models temporal dynamics using gated recurrent units. A cross-fusion layer integrates these representations before the final prediction. The framework outperforms existing models such as those of Han et al. and Chen et al., achieving a Mean Absolute Error of 2.50, a Root Mean Square Error of 3.35, and an R² of 0.96, with precision, recall, and F1-score above 0.95. The adaptive mesh and attention modules allow the model to scale across cities with different sensor densities, as confirmed by runtime and GPU profiling on the Beijing, Delhi, Bangkok, and Kathmandu datasets. AQNet maintains high-resolution forecasting accuracy at low computational cost, supporting its suitability for deployment in diverse urban environments.
Introduction
Airborne fine particulate matter (PM2.5) and nitrogen dioxide (NO2) are widely recognized as major environmental health hazards. Long-term exposure to PM2.5 is strongly associated with increased risks of cardiovascular and respiratory diseases, contributing substantially to premature mortality and the global disease burden. Short- and long-term NO2 exposure exacerbates asthma, reduces lung function, and increases hospital admissions for respiratory illnesses. Mak et al. reported that district-level NOx and PM2.5 emissions are correlated with mortality in dense urban regions (correlation coefficients ranging from 0.371 to 0.783 for NOx and from 0.509 to 0.754 for PM2.5). Furthermore, recent clinical evidence indicates that combined exposure to NO2 and PM2.5 can produce synergistic or antagonistic health effects, worsening cardiovascular and metabolic outcomes. These findings underline the need for accurate and explainable air quality forecasting models capable of capturing the spatiotemporal behavior of these pollutants.
As industrialization and traffic density rise, cities struggle to monitor and predict pollution accurately across both space and time. Forecasting air quality helps governments implement timely policies, raise public awareness, and manage transportation and industrial activity. Traditional sensor-based forecasting systems are often limited in spatial coverage and fail to offer fine-grained, city-wide predictions. In many developing countries, air quality sensors are sparse, and the infrastructure for collecting real-time data remains underdeveloped. This makes it essential to design predictive systems that can handle incomplete, multi-source, and geographically dispersed data.
Proposed methodology
The proposed framework, AQNet, forecasts urban air quality by combining adaptive mesh construction with attention-based spatiotemporal learning. It is built on three major blocks: adaptive spatial graph modeling, temporal encoding, and mesh attention fusion. The approach begins by preprocessing heterogeneous, multi-source inputs from OpenAQ, Microsoft Planetary Computer, and local traffic/weather feeds to generate synchronized, geo-tagged sequences. Using dynamic mesh partitioning, urban regions are divided based on population density, industrial activity, and traffic flow, creating a graph G = (V, E) in which nodes represent spatial cells and edges reflect contextual similarity. The air quality prediction problem is then defined over these adaptive meshes, together with its learning objective.
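The graph construction step can be illustrated with a minimal sketch. It assumes that "contextual similarity" is measured as cosine similarity between per-cell context vectors and that an edge is added when the similarity reaches a fixed threshold; the similarity measure, the threshold value, the function names, and the example cell values are all illustrative assumptions, not details taken from AQNet.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_mesh_graph(cell_context, threshold=0.9):
    """Return (V, E): V is the sorted list of cell ids, E a set of undirected
    edges (u, v) with u < v whose context similarity is >= threshold.
    The threshold is an assumed hyperparameter."""
    nodes = sorted(cell_context)
    edges = set()
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if cosine_similarity(cell_context[u], cell_context[v]) >= threshold:
                edges.add((u, v))
    return nodes, edges

# Hypothetical context vectors per mesh cell:
# (population density, industrial activity, traffic flow), each scaled to [0, 1].
cell_context = {
    "m1": (0.9, 0.2, 0.8),  # dense residential area with heavy traffic
    "m2": (0.8, 0.3, 0.9),  # similar profile to m1
    "m3": (0.1, 0.9, 0.2),  # industrial zone
}
V, E = build_mesh_graph(cell_context)
```

In this toy example only m1 and m2 are connected, because their context profiles are nearly parallel, while the industrial cell m3 remains isolated at the chosen threshold.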
Feature embeddings for each node incorporate meteorological variables, traffic indicators, and pollutant levels, and are passed through a Graph Attention Network (GAT) to learn spatial dependencies. Temporal modeling is achieved using a Transformer encoder that captures long-range patterns across time steps. To bridge the spatial and temporal domains, a mesh-based fusion layer aggregates latent features with adaptive weights, refining predictions per node. The final AQI prediction ŷ_{t+1} is obtained through a fully connected regressor trained with a Mean Squared Error (MSE) loss. A total of 20 equations formally define the end-to-end process, covering data normalization, graph formation, attention weights, Transformer encoding, and final prediction mapping. This unified architecture is designed to handle missing data, varying spatial granularity, and cross-modal signal dependencies in urban environments.
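The spatial attention, fusion, and loss steps can be sketched in a few lines. The attention function follows the standard single-head GAT formulation (a LeakyReLU-scored, softmax-normalized weighting of projected neighbor features). The fusion step is assumed here to be a simple convex combination controlled by a scalar gate, which stands in for the learned adaptive weights; the projection matrix W, attention vector a, gate value, and temporal embedding are placeholder values, not parameters from AQNet.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gat_node_update(h_i, neighbor_feats, W, a):
    """Single-head GAT-style update for node i.
    neighbor_feats should include h_i itself (self-loop)."""
    z_i = matvec(W, h_i)
    z_js = [matvec(W, h_j) for h_j in neighbor_feats]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j])
    scores = [leaky_relu(sum(c * x for c, x in zip(a, z_i + z_j))) for z_j in z_js]
    alphas = softmax(scores)
    out = [sum(al * z_j[k] for al, z_j in zip(alphas, z_js)) for k in range(len(z_i))]
    return out, alphas

def fuse(spatial, temporal, gate):
    """Assumed fusion: convex combination; gate in [0, 1] stands in for a learned weight."""
    return [gate * s + (1.0 - gate) * t for s, t in zip(spatial, temporal)]

def mse(preds, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(preds)

W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, illustrative only
a = [0.5, 0.5, 0.5, 0.5]       # placeholder attention vector
h_i = [1.0, 0.0]
neighbors = [h_i, [0.0, 1.0], [1.0, 1.0]]
spatial, alphas = gat_node_update(h_i, neighbors, W, a)
temporal = [0.2, 0.4]          # stand-in for a Transformer encoder output
fused = fuse(spatial, temporal, gate=0.5)
```

In a trained model, W, a, and the fusion weights would be learned jointly by minimizing the MSE between the regressor output and the observed AQI; here they are fixed only to make the data flow visible.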
Figure 1 illustrates the top-to-bottom dataflow in the AQNet framework. At the top, multimodal data sources, including OpenAQ sensors, satellite imagery from Microsoft Planetary Computer, and urban traffic/weather APIs, are ingested and preprocessed. The city is then segmented into adaptive mesh cells using road density and emission sources. These regions are encoded as graph nodes, with edges reflecting spatial and functional proximity. A Graph Attention Network (GAT) module learns spatial dependencies, while a Transformer-based temporal encoder models time-evolving AQI patterns. The spatial and temporal embeddings are fused in a mesh-aware attention block. Finally, the fused features are passed through a regression head to predict the AQI of each mesh cell at the next time step. The entire model is trained end-to-end using uncertainty-aware loss functions and smoothness constraints.
3.1 Problem formulation
We define the urban region as a set of nonuniform mesh cells M = {m1, m2, ..., mN}. For each mesh cell, we collect input features over time, including pollutant concentration, weather, and traffic variables. Before defining the input matrices, it is important to note that AQNet operates over an adaptive mesh structure rather than a fixed spatial grid.
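To make the per-cell input layout concrete, the sketch below turns an hourly feature series for each mesh cell into (input window, next-step target) pairs. It assumes that each time step is a feature vector whose first entry is the pollutant concentration and that the target is that entry one step ahead; the window length, feature order, function name, and sample values are hypothetical.

```python
def build_input_windows(series, window):
    """series: {cell_id: [feature_vector_t, ...]} ordered in time.
    Returns {cell_id: [(X, y), ...]} where X holds `window` consecutive
    feature vectors and y is the pollutant value (feature 0) at the next step."""
    samples = {}
    for cell_id, steps in series.items():
        cell_samples = []
        for t in range(window, len(steps)):
            X = steps[t - window:t]   # the previous `window` time steps
            y = steps[t][0]           # pollutant concentration to predict
            cell_samples.append((X, y))
        samples[cell_id] = cell_samples
    return samples

# Hypothetical hourly features per cell: (pollutant, temperature, traffic index).
series = {
    "m1": [(35.0, 12.0, 0.4), (40.0, 13.0, 0.6), (38.0, 13.5, 0.5), (45.0, 14.0, 0.7)],
    "m2": [(20.0, 11.0, 0.1), (22.0, 11.5, 0.2), (21.0, 12.0, 0.2), (25.0, 12.5, 0.3)],
}
samples = build_input_windows(series, window=2)
```

Because the mesh is adaptive, the number of cells N may differ between cities, which is why the samples are keyed by cell identifier rather than stored in a fixed-size grid.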