Under Review — 2026

TopoFlow

A physics-informed Vision Transformer for multi-pollutant air quality forecasting over China, with wind-following patch reordering and topography-aware attention.

Paper Summary

We propose TopoFlow, a physics-informed Vision Transformer for multi-pollutant air quality forecasting over China. TopoFlow extends the ClimaX architecture with two key innovations: (1) wind-following patch reordering that aligns the sequential processing of spatial patches with atmospheric transport direction, and (2) topography-aware attention bias that encodes elevation barriers as a learnable inductive bias in the first transformer block.

The model operates on a 128 × 256 grid at 0.25° resolution covering mainland China and Taiwan, jointly forecasting six pollutants (PM2.5, PM10, SO2, NO2, CO, O3) at four temporal horizons (12h, 24h, 48h, 96h). Trained on the Chinese Air Quality Reanalysis (CAQRA) dataset from 2013–2016, validated on 2017, and tested on 2018, TopoFlow is compared against CAMS global forecasts, Microsoft Aurora, the ClimaX baseline, and independent OpenAQ station measurements.

The learnable parameter α controls the strength of topographic resistance in the elevation bias, physically encoding mountain blocking effects on pollutant transport — a phenomenon critical in regions like the Sichuan Basin, Tarim Basin, and North China Plain.

Read the Full Paper on arXiv

Architecture

TopoFlow extends the Vision Transformer with physics-informed components for atmospheric transport modeling.


Figure 1. TopoFlow architecture: Wind-guided patch reordering with sector-based estimation, elevation-aware attention bias (α parameter), variable-specific patch embedding, and multi-pollutant prediction heads.

Key Innovations

Two physics-informed inductive biases for atmospheric transport modeling, integrated into a pre-trained ClimaX backbone.

Wind-Following Patch Reordering

Standard ViTs process patches in raster-scan order, which is agnostic to physical transport. TopoFlow reorders patches from upwind to downwind using 16 pre-computed wind sectors, creating an inductive bias aligned with atmospheric advection.

$\pi_k^{(s)} = \cos(\theta_s) \cdot \frac{c_k}{W-1} - \sin(\theta_s) \cdot \frac{r_k}{H-1}$

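Here $c_k$ and $r_k$ are the column and row indices of patch $k$, and $W$ and $H$ are the width and height of the patch grid, so both normalized coordinates lie in $[0, 1]$. Patches are sorted by ascending projection $\pi_k^{(s)}$ along the dominant wind direction $\theta_s$ of sector $s$, so upwind patches come first. The wind direction is computed from a magnitude-weighted average of the $(u, v)$ wind components.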
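The following is a minimal pure-Python sketch of the sector-based ordering. It applies the projection formula above verbatim and rests on three assumptions that are not stated on this page:

- Row 0 of the patch grid is the northern edge.
- Sector angles are $\theta_s = s \cdot 22.5°$ for $s = 0, \dots, 15$.
- The grid has at least two rows and two columns.

The helper names (`sector_angles`, `wind_order`, `precompute_orderings`, `sector_for`) are hypothetical and are not taken from the TopoFlow codebase.

```python
import math


def sector_angles(num_sectors: int = 16) -> list[float]:
    """Hypothetical sector centres theta_s = s * (360 / num_sectors) degrees, in radians."""
    return [2.0 * math.pi * s / num_sectors for s in range(num_sectors)]


def wind_order(theta: float, H: int, W: int) -> list[int]:
    """Flat patch indices k = r * W + c, sorted by ascending projection pi_k (needs H, W >= 2)."""
    def projection(k: int) -> float:
        r, c = divmod(k, W)
        # pi_k = cos(theta) * c / (W - 1) - sin(theta) * r / (H - 1), as in the formula above
        return math.cos(theta) * c / (W - 1) - math.sin(theta) * r / (H - 1)

    # sorted() is stable, so patches with equal projection keep their raster order
    return sorted(range(H * W), key=projection)


def precompute_orderings(H: int, W: int, num_sectors: int = 16) -> list[list[int]]:
    """One permutation per sector, computed once and reused for every sample."""
    return [wind_order(theta, H, W) for theta in sector_angles(num_sectors)]


def sector_for(theta: float, num_sectors: int = 16) -> int:
    """Map an angle in radians to the nearest sector index (22.5 degree bins for 16 sectors)."""
    width = 2.0 * math.pi / num_sectors
    return round(theta / width) % num_sectors
```

For the 64 × 128 patch grid used here, `precompute_orderings(64, 128)` yields 16 permutations of the 8,192 patches. At run time, a sample only needs a table lookup, `orderings[sector_for(theta)]`, rather than a fresh sort.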

Alpha Elevation Bias

The first transformer block (Block 0) incorporates a learnable elevation bias that penalizes attention from patch $i$ to patch $j$ when $j$ lies at a higher elevation than $i$, encoding topographic barriers.

$\mathbf{B}_{\text{elev}}[i,j] = -\alpha \cdot \text{ReLU}\!\left(\frac{z_j - z_i}{z_0}\right)$

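Here $z_i$ and $z_j$ are the elevations of patches $i$ and $j$, and $z_0 = 1000$ m is a normalization constant. The learnable parameter $\alpha$ (initialized at 2.0) controls the strength of topographic resistance. The ReLU ensures that only uphill transport is penalized: pairs with $z_j \le z_i$ receive zero bias.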
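To make the formula concrete, here is a minimal pure-Python sketch of $\mathbf{B}_{\text{elev}}$ for a list of per-patch elevations. In the model, $\alpha$ would be a trainable parameter; here it is a plain float, and the function name `elevation_bias` is hypothetical.

```python
def elevation_bias(z: list[float], alpha: float = 2.0, z0: float = 1000.0) -> list[list[float]]:
    """B[i][j] = -alpha * ReLU((z[j] - z[i]) / z0); z holds per-patch elevations in metres."""
    # max(0.0, x) plays the role of ReLU, so downhill and flat pairs contribute nothing
    return [[-alpha * max(0.0, (zj - zi) / z0) for zj in z] for zi in z]
```

As a worked example, take a 3,000 m rise (for instance, from a basin floor up onto a plateau) with $\alpha = 2$. The bias is $-2 \cdot 3 = -6$, which multiplies the unnormalized attention weight by $e^{-6} \approx 0.0025$. The reverse, downhill direction is left untouched.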

Relative 2D Position Bias

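Following T5-style bucketing, relative spatial positions are discretized into 32 × 32 buckets with logarithmic binning. A separate learnable bias for every patch pair would need $\mathcal{O}(N^2)$ parameters, with $N = 8{,}192$ patches. The bucketed table instead needs only $\mathcal{O}(M^2)$ parameters, with $M = 32$ buckets per axis (1,024 per head), while still encoding spatial proximity.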
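The sketch below shows how a signed patch offset maps to one of 32 buckets. It follows the bidirectional relative-position bucketing used in T5: half the buckets per sign, exact buckets for small offsets, and logarithmic bins up to a maximum distance. Several details are assumptions rather than something stated on this page:

- `max_distance = 128` is borrowed from T5's default.
- Offsets are measured in patches.
- The 2D index is formed as row bucket × 32 + column bucket.

Both function names are hypothetical.

```python
import math


def relative_position_bucket(rel: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """T5-style bidirectional bucket for a signed 1D offset, in [0, num_buckets)."""
    half = num_buckets // 2           # half of the buckets for each sign
    offset = half if rel > 0 else 0   # positive offsets use the upper half
    n = abs(rel)
    max_exact = half // 2             # small offsets each get their own bucket
    if n < max_exact:
        return offset + n
    # larger offsets are binned logarithmically, saturating at max_distance
    log_ratio = math.log(n / max_exact) / math.log(max_distance / max_exact)
    large = max_exact + int(log_ratio * (half - max_exact))
    return offset + min(large, half - 1)


def bucket_index_2d(dr: int, dc: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """Combine row and column buckets into one index of a num_buckets**2 table (1,024 for 32)."""
    br = relative_position_bucket(dr, num_buckets, max_distance)
    bc = relative_position_bucket(dc, num_buckets, max_distance)
    return br * num_buckets + bc
```

Each head then looks up its bias as `table[bucket_index_2d(dr, dc)]`. That is 1,024 entries, compared with $8{,}192^2 \approx 6.7 \times 10^7$ for an unbucketed pairwise table.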

$\mathbf{A} = \text{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_h}} + \mathbf{B}_{\text{pos}} + \mathbf{B}_{\text{elev}}\right)$

The final attention scores combine raw dot-product similarity, position bias, and elevation bias before softmax.
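For a single head and a single query, the combination can be sketched in pure Python. The batched version would do the same on tensors. The function name and the list-based inputs are illustrative assumptions.

```python
import math


def biased_attention_row(q, keys, b_pos_row, b_elev_row):
    """Attention weights of one query over all keys: softmax(q.k / sqrt(d_h) + B_pos + B_elev)."""
    d_h = len(q)
    logits = [
        sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_h) + bp + be
        for k, bp, be in zip(keys, b_pos_row, b_elev_row)
    ]
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because both biases are added to the logits before the softmax, a bias of $-b$ on a key scales its weight by $e^{-b}$ relative to an otherwise identical key. The weights still sum to one.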

Multi-Pollutant Forecasting

Each of the 15 input variables is embedded independently via a variable-specific Conv2D projection, and the resulting per-variable tokens are aggregated through cross-attention. A 2-layer MLP decoder then predicts 6 pollutants × 4 horizons simultaneously from a single forward pass, giving one output tensor per horizon (batch size $B$):

$\hat{\mathbf{Y}} \in \mathbb{R}^{B \times 6 \times 128 \times 256}$
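The cross-attention step collapses the 15 per-variable tokens at each patch into a single token. A simplified sketch follows, assuming one head, a single learnable query, and identity key/value projections (the real layer presumably has learned projections). The name `aggregate_variables` is hypothetical.

```python
import math


def aggregate_variables(query, var_embeddings):
    """Collapse the per-variable tokens at one patch into one token via single-query cross-attention."""
    d = len(query)
    scores = [sum(q * e for q, e in zip(query, emb)) / math.sqrt(d) for emb in var_embeddings]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # weighted sum of the variable embeddings (keys and values share the embedding in this sketch)
    return [sum(w * emb[i] for w, emb in zip(weights, var_embeddings)) for i in range(d)]
```

Applied at every patch, this turns a stack of 15 × 8,192 tokens into 8,192 tokens, which the transformer blocks then process.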

Training uses MSE loss with geographic masking restricted to mainland China and Taiwan (~45% of the grid).

52M parameters · 6 pollutants · 4 forecast horizons · 128 GPUs (MI250X) · 0.25° resolution

Results & Case Studies

TopoFlow is validated on extreme pollution events, seasonal patterns, and multi-model comparisons using the 2018 test set and independent OpenAQ station data.

Animated Predictions

TopoFlow Predictions vs CAQRA Ground Truth

Side-by-side animation of TopoFlow 12h forecasts compared to CAQRA reanalysis across the 2018 test set.

Overview

PM2.5 Distribution over China

Annual mean PM2.5 concentrations across the study domain, highlighting the North China Plain, Sichuan Basin, and Yangtze River Delta as major pollution hotspots. The 128 × 256 grid at 0.25° resolution covers mainland China and Taiwan.

Multi-Model Comparison

Seasonal Validation: CAMS vs Aurora vs CAQRA vs TopoFlow vs OpenAQ

PM2.5 spatial distribution across Winter (Jan 15), Spring (Mar 1), Summer (Jul 12), and Autumn (Oct 19) 2019. Five rows compare CAMS global forecasts, Microsoft Aurora, CAQRA reanalysis (ground truth), TopoFlow predictions, and independent OpenAQ station measurements. TopoFlow captures fine-scale spatial structure absent from CAMS global forecasts and better resolves seasonal patterns than Aurora.

Case Study

Beijing Haze Episode (Nov 2018)

Back-trajectory analysis of a major trans-boundary pollution event. PM2.5 snapshots from Nov 8–16 show the plume building from Central China and converging on Beijing, reaching >200 μg/m³. TopoFlow captures the temporal evolution and spatial extent of the event.

Topographic Blocking

Sichuan Basin Case Study

Topographic blocking effect in the Sichuan Basin (Jul 2018). The Tibetan Plateau acts as a barrier, trapping pollutants in the basin. TopoFlow's elevation bias captures this mechanism: the transect at 30°N shows PM2.5 accumulation matching observed patterns, unlike ClimaX, which smooths across the barrier.

3D Visualization

PM2.5 Prediction Surface with OpenAQ Station Validation

Three-dimensional visualization of TopoFlow's predicted PM2.5 surface over the Sichuan Basin region, with independent OpenAQ station measurements overlaid as red markers. The surface captures the spatial gradient from elevated concentrations in the basin interior (trapped by topography) to lower values at the plateau edges.

Core Equations

The mathematical formulation of TopoFlow's physics-informed attention mechanism.

TopoFlow Attention (Block 0)

The first transformer block augments standard attention with position and elevation biases:

$$\mathbf{A} = \text{softmax}\!\left(\frac{\mathbf{QK}^\top}{\sqrt{d_h}} + \mathbf{B}_{\text{pos}} + \mathbf{B}_{\text{elev}}\right)$$

Elevation Barrier Bias

Penalizes uphill pollutant transport across topographic barriers:

$$\mathbf{B}_{\text{elev}}[i,j] = -\alpha \cdot \text{ReLU}\!\left(\frac{z_j - z_i}{1000}\right)$$

Wind Direction Estimation

Magnitude-weighted average of the horizontal wind components, with weights $w = \sqrt{u^2 + v^2}$ (the local wind speed):

$$\theta_{\text{wind}} = \arctan2\!\left(\frac{\sum u \cdot w}{\sum w},\; \frac{\sum v \cdot w}{\sum w}\right)$$
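Here is a minimal pure-Python sketch of this estimate over a flattened field of $(u, v)$ values, assuming $w$ is the local wind speed. Read literally, $\arctan2(\bar u, \bar v)$ returns the angle of the mean flow measured clockwise from north, i.e. the direction the wind blows toward. This page does not state how that angle is mapped onto the sector angle $\theta_s$ in the projection formula, so the sketch stops at the raw angle. The function name is hypothetical.

```python
import math


def dominant_wind_direction(u: list[float], v: list[float]) -> float:
    """theta = atan2(weighted mean u, weighted mean v), with weights w = sqrt(u^2 + v^2)."""
    w = [math.hypot(ui, vi) for ui, vi in zip(u, v)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("calm field: wind direction is undefined")
    u_bar = sum(ui * wi for ui, wi in zip(u, w)) / total
    v_bar = sum(vi * wi for vi, wi in zip(v, w)) / total
    # argument order follows the equation above: atan2(u_bar, v_bar)
    return math.atan2(u_bar, v_bar)
```

Weighting by speed lets strong, transport-relevant winds dominate the estimate. Without it, a weak wind in the opposite direction would count equally.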

Geographic Loss Function

MSE computed only over valid grid cells in mainland China and Taiwan, where $\mathbf{M}_{ij} \in \{0, 1\}$ is the geographic mask:

$$\mathcal{L} = \frac{1}{\|\mathbf{M}\|_1} \sum_{v,i,j} \mathbf{M}_{ij} \cdot (\hat{Y}_{v,i,j} - Y_{v,i,j})^2$$
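The following pure-Python sketch implements the loss exactly as written, over nested lists indexed `[variable][row][col]`; the name `masked_mse` is hypothetical. Note that the normalization is by $\|\mathbf{M}\|_1$, the number of valid cells, even though the sum runs over variables too. The result is therefore the sum of the per-pollutant masked MSEs; dividing further by the number of variables would give their mean.

```python
def masked_mse(pred, target, mask):
    """L = sum_{v,i,j} M_ij * (Yhat_vij - Y_vij)^2 / ||M||_1 for nested lists [var][row][col]."""
    denom = sum(sum(row) for row in mask)  # ||M||_1 = number of valid grid cells
    if denom == 0:
        raise ValueError("mask selects no grid cells")
    total = 0.0
    for p_v, y_v in zip(pred, target):
        for p_row, y_row, m_row in zip(p_v, y_v, mask):
            for p, y, m in zip(p_row, y_row, m_row):
                total += m * (p - y) ** 2   # masked-out cells (m = 0) contribute nothing
    return total / denom
```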

Model Specifications

Complete technical specifications of the TopoFlow architecture and training configuration.

| Component | Specification |
|---|---|
| Input Resolution | 128 × 256 (0.25° grid) |
| Patch Size | 2 × 2 pixels (8,192 patches) |
| Embedding Dimension | 768 |
| Transformer Depth | 6 blocks (1 TopoFlow + 5 standard ViT) |
| Attention Heads | 8 (per-head dim = 96) |
| MLP Ratio | 4.0 (hidden dim = 3,072) |
| Total Parameters | ~52M |
| Input Variables | 15 (5 meteo + 6 pollutants + 2 coords + 2 static) |
| Output Variables | 6 pollutants (PM2.5, PM10, SO2, NO2, CO, O3) |
| Forecast Horizons | 12h, 24h, 48h, 96h |
| Wind Sectors | 16 pre-computed orderings (22.5° each) |
| Position Bias Buckets | 32 × 32 = 1,024 per head |
| α Initialization | 2.0 (learnable, converges to ~1.0) |

| Training Parameter | Value |
|---|---|
| Optimizer | AdamW (weight decay = 0.01) |
| Base Learning Rate | 1 × 10⁻⁴ (cosine annealing + 2k warmup) |
| Effective Batch Size | 512 (128 GPUs × 2 × 2 grad accum) |
| Training Data | 2013–2016 (CAQRA) |
| Validation | 2017 |
| Test Set | 2018 |
| Epochs | 60 |
| Infrastructure | 128 AMD MI250X GPUs (16 LUMI-G nodes) |
| Framework | PyTorch Lightning + DDP |
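Several derived entries in these tables can be checked with simple arithmetic. The values below are copied from the tables. Reading the middle factor of "128 GPUs × 2 × 2 grad accum" as a per-GPU batch of 2 is an assumption.

```python
# Derived quantities implied by the specification tables (inputs copied from the tables).
img_h, img_w, patch, resolution_deg = 128, 256, 2, 0.25
embed_dim, num_heads, mlp_ratio = 768, 8, 4.0
gpus, per_gpu_batch, grad_accum = 128, 2, 2      # per-GPU batch of 2 is an assumed reading
num_buckets, num_sectors = 32, 16

num_patches = (img_h // patch) * (img_w // patch)   # 64 * 128 patch grid
head_dim = embed_dim // num_heads
mlp_hidden = int(embed_dim * mlp_ratio)
effective_batch = gpus * per_gpu_batch * grad_accum
bias_entries_per_head = num_buckets ** 2
full_pairwise_entries = num_patches ** 2             # size of an unbucketed pairwise bias table
sector_width_deg = 360 / num_sectors
lat_span_deg = img_h * resolution_deg
lon_span_deg = img_w * resolution_deg

print(num_patches, head_dim, mlp_hidden, effective_batch,
      bias_entries_per_head, full_pairwise_entries, sector_width_deg,
      lat_span_deg, lon_span_deg)
```

The grid therefore spans 32° of latitude by 64° of longitude, and the bucketed position bias is about 65,000 times smaller than a full pairwise table.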

Quick Start

Install TopoFlow and run inference in a few lines of code.

train.py
```python
# Install: pip install git+https://github.com/AmmarKheder/TopoFlow.git
import torch

from topoflow import TopoFlowModel, TopoFlowDataModule

# Initialize model
model = TopoFlowModel(
    variables=["u", "v", "temp", "rh", "psfc",
               "pm25", "pm10", "so2", "no2", "co", "o3",
               "lat2d", "lon2d", "elevation", "population"],
    img_size=(128, 256),
    embed_dim=768,
    depth=6,
    num_heads=8,
    parallel_patch_embed=True,  # wind-following
    use_physics_mask=True,      # elevation bias
)

# Placeholder input; layout [batch, 15 variables, 128, 256] is assumed. Replace with real data.
x = torch.randn(1, 15, 128, 256)

# Run inference
predictions = model(x, lead_times=[12, 24, 48, 96])  # Output: [B, 6, 128, 256] per horizon
```
submit_train.sh
```bash
#!/bin/bash
#SBATCH --nodes=16
#SBATCH --gpus-per-node=8
#SBATCH --time=48:00:00

# Distributed training on LUMI-G (128 AMD MI250X GPUs)
srun python scripts/train.py \
    --config configs/default.yaml \
    --data.train_years 2013 2014 2015 2016 \
    --data.val_years 2017 \
    --model.embed_dim 768 \
    --model.depth 6
```

Citation

citation.bib
```bibtex
@article{kheder2025topoflow,
  title  = {TopoFlow: Physics-Informed Deep Learning for Multi-Pollutant Air Quality Forecasting},
  author = {Kheder, Ammar},
  year   = {2026},
  note   = {Under review}
}
```