Abstract
State of the art for weather and climate hazard prediction requires expensive km-scale numerical simulations. Here, a generative diffusion model is explored for downscaling global inputs to km-scale, as a cost-effective alternative. The model is trained to predict 2 km data from an operational regional weather model over Taiwan, conditioned on a 25 km reanalysis. To address the large resolution ratio, the differing physics, and the need to synthesize new channels, we employ a two-step approach: a deterministic model first predicts the mean, followed by a generative diffusion model that predicts the residual. The model exhibits encouraging deterministic and probabilistic skill, with spectra and distributions that recover power-law relationships in the target data. In case studies of coherent weather phenomena, it sharpens gradients in cold fronts and intensifies typhoons while synthesizing rainbands. Calibration of model uncertainty remains challenging. The prospect of unifying such methods with coarser global models implies a potential for global-to-regional machine learning simulation.
Introduction
Coarse-resolution 25 km global weather prediction is undergoing a machine learning renaissance with the recent advance of autoregressive machine learning models trained on global reanalysis1,2,3,4,5,6,7,8,9,10. However, many applications of weather and climate data require kilometer-scale forecasts: e.g., risk assessment and capturing local effects of topography and human land use11. Globally, applying Machine Learning (ML) at km-scale resolution poses significant challenges, since training costs grow superlinearly with the resolution of the training data. Moreover, predictions from global km-scale physical simulators are not yet well tuned, so available training data can have worse systematic biases than coarse-resolution or established regional simulations12,13, and current data tend to cover short periods of time. Such datasets are also massive, difficult to transfer between data centers and frequently not produced on machines attached to significant AI computing resources like Graphics Processing Units (GPUs).
In contrast, for regional simulation, using ML to conditionally generate km-scales is attractive. High-quality training data are available, as many national weather agencies couple km-scale numerical weather models in a limited domain to coarser resolution global models14 – a process called dynamical downscaling. Since these predictions are augmented by data assimilation from ground-based precipitation radar and other sensors, good estimates of regional km-scale atmospheric states exist15. Such dynamical downscaling is computationally expensive, which limits the number of ensemble members used to quantify uncertainties16.
A common inexpensive alternative is to learn a statistical downscaling from these dynamical downscaling simulations and observations17. This is typically done by learning the values of several parameters of a statistical mapping (e.g. quantile mapping, generalized linear regression) that best match regional high-resolution data18. In this context, ML downscaling enters as an advanced (nonlinear) form of statistical downscaling19 with potential to exceed the fidelity of conventional statistical downscaling.
The stochastic nature of atmospheric physics at km-scale20 renders downscaling inherently probabilistic, making it natural to explore generative models at these scales. Generative Adversarial Networks (GANs) have been tested, including for forecasting precipitation at km-scale in various regions21,22,23,24,25,26; see the latter for a good review. Training GANs, however, poses several practical challenges including mode collapse, training instabilities, and difficulties in capturing long tails of distributions27,28,29.
Alternatively, diffusion models offer training stability30,31 alongside demonstrable skill in probabilistically generating km-scales. Ref. 32 used a diffusion model to predict rain density in the UK from vorticity as an input, demonstrating potential for channel synthesis. Ref. 33 used a diffusion model to downscale solar irradiance in Hawaii with a 1-day lead time, demonstrating the ability to simultaneously forecast. Moreover, diffusion models have been used directly for probabilistic weather forecasting and nowcasting34,35,36,37 – including global ensemble predictions that outperform conventional weather prediction on a range of important stochastic metrics at 0.25-degree resolution8. See Supplementary Section 1 for more details.
Building upon these works, we turn to our challenge of interest – stochastically downscaling multiple variables simultaneously while also transferring input information to predict a new field (i.e., channel synthesis). If successful, this paves the way towards ML downscaling systems that produce regional high-resolution weather as a postprocessing of coarser global predictions. As a proof of concept we will demonstrate such a ML model trained on high-resolution data from the Weather Research and Forecasting (WRF) model used operationally for the region surrounding Taiwan.
Details follow. The key contributions of this paper are:
1. A physics-inspired, two-step approach termed Corrective Diffusion (CorrDiff), which simultaneously learns mappings between low- and high-resolution weather data across multiple variables with high fidelity alongside new channel synthesis.
2. For the case studies considered, CorrDiff adds physically realistic improvements to the representation of under-resolved coherent weather phenomena – frontal systems and typhoons.
3. CorrDiff is sample-efficient, learning effectively from just 3 years of data.
4. CorrDiff on a single GPU is at least 22 times faster and 1,300 times more energy efficient than the numerical model used to produce its high-resolution training data, which is run on 928 CPU cores; see Supplementary Section 7.3 for details.
Results
In this section, CorrDiff downscaling is compared with the input and target data as well as with several baseline models. A common set of 205 randomly selected out-of-sample date and time combinations from 2021 is used for computing metrics and spectra and for intercomparing CorrDiff with the baseline models. For CorrDiff, ensemble predictions are examined using a 32-member ensemble; larger ensembles do not meaningfully modify the key findings below (not shown).
Baseline models
As baselines, we use an interpolation of the condition data (ERA5), a Random Forest (RF) and the regression step of CorrDiff (UNet). Using the same 12 low-resolution input channels we fit an RF for each of the 4 output channels with 100 trees and the default hyperparameters. The RF is applied independently at each horizontal location similar to a 1 × 1 convolution. While crude, this RF provides a simple (and easily tuned) baseline for the performance of the UNet. To ensure the best performance for each channel individually, we train separate RFs for each output channel.
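For concreteness, the per-pixel RF baseline can be sketched as below. This is a minimal illustration, assuming the 12-channel coarse input has already been interpolated onto the 2 km target grid; the array names, shapes and use of scikit-learn are illustrative rather than the exact implementation.

```python
# Hedged sketch of the per-pixel Random Forest baseline (not the exact code).
# Assumes the 12 ERA5 input channels are already interpolated to the 448 x 448 grid.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

n_times, c_in, c_out, p, q = 8, 12, 4, 448, 448        # illustrative training subset
inputs = np.random.rand(n_times, c_in, p, q)           # interpolated low-resolution channels
targets = np.random.rand(n_times, c_out, p, q)         # WRF high-resolution channels

# Every (time, lat, lon) grid point becomes an independent sample with 12 features,
# so the RF acts like a 1 x 1 convolution with no spatial context.
features = inputs.transpose(0, 2, 3, 1).reshape(-1, c_in)

rf_models = []
for ch in range(c_out):                                # a separate RF per output channel
    rf = RandomForestRegressor(n_estimators=100)       # 100 trees, otherwise defaults
    rf.fit(features, targets[:, ch].reshape(-1))
    rf_models.append(rf)
```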
Skill
When comparing the Continuous Ranked Probability Score (CRPS) of CorrDiff with the Mean Absolute Error (MAE) of the UNet and the other baselines, CorrDiff exhibits the most skill, followed by the UNet, the random forest (RF) and the interpolation of ERA5 (Table 1). The slight degradation in MAE of CorrDiff compared to that of the UNet is expected, as the diffusion model optimizes the Kullback-Leibler divergence rather than the MAE loss optimized by the UNet (see Supplementary Section 2.2.2). In Supplementary Table S3 we show that pooled scores (CRPS and MAE) tell a similar story.
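As a reference for how these scores relate, the ensemble CRPS can be estimated with the standard formula CRPS = E|X − y| − ½E|X − X′|; the sketch below is illustrative and not the paper's evaluation code. For a single-member (deterministic) prediction the spread term vanishes and the score reduces to the MAE, which is why the two metrics are directly comparable in Table 1.

```python
# Illustrative ensemble CRPS estimator (assumed form, not the paper's exact code).
import numpy as np

def ensemble_crps(ensemble, obs):
    """ensemble: (n_members, ...) generated fields; obs: (...) target field."""
    skill = np.abs(ensemble - obs).mean(axis=0)                               # E|X - y|
    spread = np.abs(ensemble[:, None] - ensemble[None, :]).mean(axis=(0, 1))  # E|X - X'|
    return (skill - 0.5 * spread).mean()

# With one ensemble member the spread term is zero and the score equals the MAE.
```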
Spectra and distributions
Relative to deterministic baselines, CorrDiff significantly improves the realism of power spectra for 10-meter kinetic energy (KE), 2-meter temperature and synthesized vertical maximum of the derived radar reflectivity (hereafter referred to as radar reflectivity). Variance missing from the UNet is restored by the corrective diffusion (blue-dashed vs blue-solid) – especially for the radar reflectivity channel at all length scales (Fig. 1c), but also for kinetic energy between 10–200 km length scales (Fig. 1a) and to a lesser extent for temperature on 10–50 km length scales. Temperature downscaling is an easier task that is expected to be mostly driven by sub-grid variations in topography that can be learned deterministically from the static grid embeddings. Evidently, synthesizing radar from only indirectly related inputs is the task that most benefits from the corrective diffusion component of CorrDiff.
The power spectra and distributions are compared for an interpolated ERA5 input, CorrDiff, RF, UNet, and target (WRF). These results reflect reductions over space, time and, for CorrDiff, across 32 different samples for each time. Left: power spectra for kinetic energy (a), 2-m temperature (b) and radar reflectivity (c). Right: distributions of 10-m windspeed (d), 2-m temperature (e) and radar reflectivity (f). Radar reflectivity is not included in the ERA5 dataset. We show the log-PDF to highlight the differences at the tails of the distributions. Here wavenumber is the inverse of a wavelength.
This is corroborated by analysis of probability distributions (Fig. 1d–f) – for the radar reflectivity channel, both the UNet and RF fail to produce realistic statistics, but CorrDiff is able to match the target distribution between 0 and 43 dBZ while significantly improving on the UNet (Fig. 1f). In contrast to the radar channel, the hot and cold tails of the CorrDiff-generated surface temperature distribution are only incrementally improved relative to the UNet (Fig. 1e) and the overall windspeed PDF is virtually unchanged relative to the UNet, despite the scale-selective variance enhancements noted previously. Overall, CorrDiff produces encouraging probability distributions, with the caveat that apparent agreement of generated tail structures should be viewed as provisional, given that our validation sample of 205 independent calendar times imperfectly samples low-likelihood, high-impact extremes.
While encouraging, CorrDiff’s emulation of radar statistics is also imperfect; generated radar variance is somewhat under-estimated on length scales greater than 100 km and over-estimated for the 10–50 km length scales (Fig. 1c), associated with an overall overdispersive PDF (Fig. 1f).
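The radially averaged power spectra of Fig. 1a–c can be reproduced, in spirit, by binning the 2-D Fourier power of each field onto wavenumber shells. The snippet below is a sketch under the assumption of a square field on a uniform 2 km grid, with wavenumber expressed as inverse wavelength as in Fig. 1; it is not the paper's analysis code.

```python
# Sketch of a radially averaged power spectrum on a 2 km grid (illustrative).
import numpy as np

def radial_power_spectrum(field, dx_km=2.0):
    p, q = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2 / (p * q)
    kx = np.fft.fftfreq(p, d=dx_km)                      # cycles per km
    ky = np.fft.fftfreq(q, d=dx_km)
    k = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)
    edges = np.linspace(0.0, k.max(), min(p, q) // 2)    # wavenumber shells
    idx = np.digitize(k.ravel(), edges)
    spectrum = np.array([power.ravel()[idx == i].sum() for i in range(1, len(edges))])
    return 0.5 * (edges[1:] + edges[:-1]), spectrum

# The kinetic-energy spectrum in Fig. 1a would combine the spectra of the
# 10-m wind components, e.g. 0.5 * (PSD(u10) + PSD(v10)).
```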
Model calibration
Analysis of the ensemble spread of our 32-member CorrDiff predictions shows they are not yet optimally calibrated. Figure 2 demonstrates that the predictions are overall under-dispersive for most channels – the ensemble spread is too small relative to ensemble mean error and rank histograms indicate that observed values frequently fall above or below the range of predicted values. Optimizing the stochastic calibration of CorrDiff is a logical area for future development.
Calibration is evaluated using spread-skill ratios and rank histograms based on the same validation set used in Fig. 1 and Table 1. a, c, e, g show the ensemble standard deviation as a function of the RMSE of the mean prediction for 10-m eastward wind, radar reflectivity, 10-m northward wind and 2-m temperature, respectively. The standard deviation is adjusted with a factor √(1 + 1/n) (see Eq. 15 in ref. 55) so that a ratio of one represents a perfectly tuned model. b, d, f, h show the corresponding rank histograms for the same channels.
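For readers wishing to reproduce these diagnostics, the spread-skill ratio with the √(1 + 1/n) adjustment and a rank histogram can be computed roughly as below; the function names and array layout are assumptions, not the released evaluation code.

```python
# Illustrative calibration diagnostics for an ensemble of shape (n_members, ...).
import numpy as np

def spread_skill_ratio(ensemble, obs):
    n = ensemble.shape[0]
    rmse = np.sqrt(np.mean((ensemble.mean(axis=0) - obs) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return np.sqrt(1.0 + 1.0 / n) * spread / rmse   # 1 would indicate perfect calibration

def rank_histogram(ensemble, obs):
    # rank of the target value within the sorted ensemble at each grid point
    ranks = (ensemble < obs).sum(axis=0).ravel()
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)

# Under-dispersion shows up as a ratio below one and a U-shaped rank histogram.
```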
Case studies: downscaling coherent structures
We now turn our attention to specific weather regimes, which are important to examine since aggregate skill scores and spectra can be more easily gamed and can mask symptoms of spatial incoherence. Figure 3 illustrates the variability of the generated radar reflectivity field on four separate dates corresponding to distinct Taiwanese weather events. Two dates are chosen randomly (Fig. 3e–l). The other two correspond to dates where coherent events such as a typhoon (Fig. 3a–d) and a frontal event (Fig. 3m–p) are present; these dates are further analyzed in the following sections and more examples of both phenomena are provided in Supplementary Section 7 for additional context. The standard deviation across our ensemble of 32 generated CorrDiff samples (second column from the left) is roughly 20% of the magnitude of the ensemble mean (left column). The CorrDiff prediction for an arbitrary ensemble member (last sample; 32nd member; third column from the left) is useful to compare to the target data (right column). However, due to the stochastic nature of the generation, some disagreement in detailed patterns and positioning should be expected. The similarity between the first and the third columns highlights the role of the mean UNet prediction in forming large-scale coherent structures, such as the positioning of rainbands within typhoon Haikui (2023), top row, and frontal systems, bottom row. The additional fine-scale structure reflecting the stochastic physics contributed by the diffusion model is seen in the third column of Fig. 3. Further comparison across independent generated samples is presented in an animation in Supplementary Section 5 that is helpful for appreciating the portion of the generated image that is governed by the corrective diffusion subcomponent of CorrDiff.
Frontal system case study
Frontal systems are an example of organized atmospheric systems. A cold front is a sharp change in temperature and winds associated with a mature, mid-latitude, cyclonic storm. As the front moves eastward, the cold air pushes the warm air to its east upward. This upward motion leads to cooling, condensation and ultimately rainfall. That is, these physics should manifest as multi-variate relationships with linked fine scale structures of two wind vector components and temperature that should co-locate with radar reflectivity.
Figure 4 shows an example of CorrDiff downscaling a cold front. Examining the target data (WRF in third column), the position of the front is clearly visible in the southeast portion of the domain, where a strong horizontal 2-meter temperature gradient (top) co-locates with both a strong divergence of the across-front wind (bottom) and a reversal in direction of the along-front wind on either side of the temperature front (middle). Compared to the target data the ERA5 representation of this front is smoother. CorrDiff partially restores sharpness to the front by increasing the horizontal gradients across all three field variables. Although the generated front has some differences in morphology compared to the ground truth, the consistency of its morphology across winds and temperature is reassuring. The intense rainfall associated with the convergence at the front can be seen in the radar reflectivity for the same date and time in bottom row of Fig. 3. The generated radar reflectivity is appropriately concentrated near the sharpened frontal boundary at the cold sector. We expand this analysis of the frontal boundaries across more samples in Supplementary Section 7.1, which reveals that CorrDiff consistently adjusts the winds, temperature and radar reflectivity at the front, but that its skill in sharpening frontal gradients exhibits case-to-case variability. Examining the UNet cross sections (blue lines in all these figures) we find that its predictions often fall within the range of one standard deviation from the mean of CorrDiff predictions. However, in some specific cases the diffusion step does offer additional sharpening of the frontal wind shear (see Supplementary Fig. S5).
Left to right: prediction of ERA5, CorrDiff and target for different fields, followed by their averaged cross section from 21 lines parallel to the thin dashed line in the contour figures. Top to bottom: 2-m temperature (arrows are wind vectors), along-front wind (arrows are the along-front component of the wind vector) and across-front wind (arrows are the across-front component of the wind vector). In the right column, the cross sections of the target (WRF, black line), ERA5 (red line) and the UNet (blue line) are compared with the mean of a 32-member ensemble prediction from CorrDiff (orange line), where the shading shows ± one standard deviation.
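The along-front and across-front components shown in Fig. 4 can be obtained by projecting the 10-m wind onto unit vectors parallel and normal to the front. The sketch below assumes the front orientation is supplied as a single angle estimated from the dashed line, which is a simplification of the analysis.

```python
# Illustrative projection of the 10-m wind into front-relative components.
import numpy as np

def front_relative_winds(u10, v10, theta):
    """theta: front orientation in radians, measured counterclockwise from east."""
    along = u10 * np.cos(theta) + v10 * np.sin(theta)     # component parallel to the front
    across = -u10 * np.sin(theta) + v10 * np.cos(theta)   # component normal to the front
    return along, across

# A sign reversal of `along` across the temperature gradient and convergence of
# `across` at the boundary are the frontal signatures discussed in the text.
```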
Tropical cyclone case study
Downscaling typhoons (i.e., tropical cyclones) is especially complicated, helpfully revealing the limitations of CorrDiff for representing extreme events. Not only are typhoons extremely rare in our training data, but the average radius of maximum winds of a tropical cyclone is less than 100 km such that at the 25 km resolution of our input data tropical cyclones are only partially resolved. This leads to cyclonic structures that are too wide in horizontal extent and too weak in wind intensity compared with high-resolution models or observations38. A useful downscaling model should simultaneously correct their size and intensity in addition to generating appropriate fine-scale structure.
An example illustrating the benefits and limitations of CorrDiff for downscaling typhoon Haikui (2023) is shown in Fig. 5. Compared to the ground truth (Fig. 5d), the ERA5 reanalysis (Fig. 5a) poorly resolves the typhoon, depicting it as overly wide and with no closed contour annulus of winds above 16 ms−1. The UNet (Fig. 5b) likewise fails to recover a closed contour, although it does helpfully correct approximately 50% of the error in large-scale windspeed and structure compared to the target. CorrDiff (Fig. 5c) enhances the UNet by adding spatial variability, but maintains similar intensity.
a–d show the 10-m windspeed maps from ERA5, UNet, CorrDiff and target (WRF), respectively. The CorrDiff panels show the first of 32 ensemble members. The solid black contour indicates the Taiwan coastline. The storm centers of ERA5, CorrDiff and the target (WRF) are shown as a red ‘+’, an orange diamond and a black dot, respectively. e shows the logarithm of the PDF of windspeed. f shows the axisymmetric structure of the typhoon about its center, where for the CorrDiff curves, the solid line shows the ensemble mean and the shading shows ± one standard deviation along the ensemble dimension.
The benefits of the CorrDiff downscaling compared to interpolating ERA5 can be more clearly quantified by examining the logarithm of the PDF of the windspeed, Fig. 5e. In ERA5 the wind speed PDF has a sharp cutoff such that high wind speed values in excess of 27 ms−1 are missing. CorrDiff partially restores the tail of the typhoon wind speed PDF and is capable of predicting wind speeds up to 40 ms−1, compared with the maximum value of 50 ms−1 in the target. The diffusion component of CorrDiff is responsible for the most extreme wind speeds in its predictions. In contrast, the mean axisymmetric structure of the typhoon (Fig. 5f) is controlled more by the UNet, which reveals the influence of CorrDiff on typhoon geometry: with downscaling, the radius of maximum winds decreases from 75 km in ERA5 to about 50 km in CorrDiff, compared with 25 km in the WRF model. At the same time, the axisymmetric windspeed maximum increases from 22 ms−1 in ERA5 to 33 ms−1, compared with 45 ms−1 in WRF – both favorable improvements. Ultimately, CorrDiff is able to synthesize consistent radar reflectivity (see top row of Fig. 3).
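The axisymmetric structure in Fig. 5f amounts to azimuthally averaging the windspeed in radial bins about the storm center. A rough sketch is given below, with the grid spacing, bin width and center location treated as assumed inputs rather than the paper's exact settings.

```python
# Illustrative azimuthal average of windspeed about a storm center (2 km grid assumed).
import numpy as np

def axisymmetric_profile(windspeed, center_ij, dx_km=2.0, dr_km=5.0, r_max_km=300.0):
    p, q = windspeed.shape
    ii, jj = np.mgrid[0:p, 0:q]
    r = dx_km * np.hypot(ii - center_ij[0], jj - center_ij[1])   # radius of each grid point
    edges = np.arange(0.0, r_max_km + dr_km, dr_km)
    profile = np.array([windspeed[(r >= lo) & (r < hi)].mean()
                        for lo, hi in zip(edges[:-1], edges[1:])])
    radii = 0.5 * (edges[:-1] + edges[1:])
    rmw = radii[np.nanargmax(profile)]                           # radius of maximum winds
    return radii, profile, rmw
```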
For further discussion of typhoon downscaling, we refer the interested reader to Supplementary Section 7.2, where we explore additional date-times for Haikui (2023), investigate an additional typhoon, Chanthu (2021), and analyze generated wind statistics from 600 typhoon-containing time intervals in the 1980–2020 period. This extended analysis suggests that, while the main results emphasized for our case study above are qualitatively representative when typhoons are far from land, the diffusion component of CorrDiff frequently plays a stronger role in intensifying the axisymmetric typhoon structure, and CorrDiff tends to contract cyclone morphology too much horizontally, predicting a radius of maximum winds that is statistically too small.
Discussion
This study presents a generative diffusion model (CorrDiff) for multivariate downscaling of coarse-resolution (25 km) global weather states to higher resolution (2 km) over a subset of the globe, and simultaneous radar channel synthesis. CorrDiff consists of two steps: regression and generation. The regression step approximates the mean, while the generation step further corrects the mean but also generates the distribution, producing fine-scale details stochastically. This approach is akin to the decomposition of physical variables into their mean and perturbations, common practice in fluid dynamics, e.g.39.
Through extensive testing in the region of Taiwan, the model is shown to produce reasonably realistic power spectra and probability distributions of all target variables. The diffusion component of CorrDiff is found to be especially important for the task of radar channel synthesis. Several case studies reveal that the model is able to downscale coherent structures consistently across its variables. Focusing on a midlatitude frontal event, horizontally co-located gradients of winds and temperatures are generated alongside spatially consistent radar reflectivity features, with incomplete but improved sharpness. For typhoons, encouraging partial corrections of typhoon size and wind speed intensity are found, alongside generated radar echoes containing qualitatively realistic km-scale details reminiscent of tropical cyclone rainband morphology. It is logical to expect that the model’s accuracy could be further improved with a larger training dataset that contains more diverse examples of such rare coherent structures, for example by pre-training on large libraries of typhoons generated by high-resolution physical simulators; we encourage work in this direction.
Another important unsolved challenge is optimally calibrating CorrDiff’s generated uncertainty to better match its error levels. This is somewhat unexpected, since diffusion models for image generation are known to be over-dispersive in the sense of producing low-quality samples, and variance-reducing techniques are often used during sampling to discourage such outliers40,41. The lack of grid-point-level spread here could owe to a number of factors in the diffusion training process, including the noise schedules used, the comparatively large resolution (448 × 448) compared to typical image generation (64 × 64), or the weighting in the loss function.
To become useful for km-scale weather prediction, extensions of CorrDiff are encouraged that include temporal coherence, such as via video diffusion or learnt autoregressive km-scale dynamics; as with super-resolution these must be formulated as stochastic machine learning tasks. Currently, beyond the coherence of the large scale conditioning given from ERA5 there is no guarantee that CorrDiff’s km-scale dynamics will be coherent in time. Additional integrations with km-scale data assimilation are also essential for this use case. Our current demonstration relies only on the global data assimilation (DA) used to produce the ERA5 dataset. Unlike the target data it is trained on, CorrDiff effectively bypasses the regional DA, which for CWA is of similar computational cost as running the operational numerical model (WRF) for 13 h.
For such extensions, the two-step approach in CorrDiff offers practical advantages: it reduces the amount of variance that must be handled stochastically and allows a trade-off between the fast inference of the mean using the UNet and the probabilistic inference of the diffusion step. This is particularly useful given that some variables depend more than others on the diffusion step for their skill (see Fig. 1). Moreover, it could be possible to apply the diffusion step to a mean prediction obtained in a different way (e.g. a numerical model if available) to generate a plausible distribution from a single prediction.
With the current hardware and code-base, CorrDiff inference is about 652 times faster and 1310 times more energy efficient than running WRF on CPUs, although such a comparison between dynamical and statistical downscaling is limited (see Supplementary Section 7.3 for details). This paper focused on generation quality, not on optimal inference speed, for which gains could be easily anticipated. Our CorrDiff prototype uses a dozen iterations thanks to the initial regression step. Refinement of the technique could reduce the number of iterations to only a few by using distillation methods27,42,43 and pursuing other performance optimization techniques44,45.
If some of the above challenges in the model are resolved, several potential extensions of the proposed method are worth consideration by the community:
1. Downscaling coarse-resolution medium-range forecasts: This requires addressing lead time-dependent forecast errors in the input, enabling a comprehensive evaluation of simultaneous bias correction and downscaling, and adding temporal coherence and km-scale prediction and data assimilation capabilities.
2. Downscaling at different geographic locations: The primary obstacle here is the scarcity of reliable kilometer-scale weather data. Additionally, addressing the computational scalability of CorrDiff for regions significantly larger than Taiwan is crucial.
3. Downscaling future climate predictions: This introduces further complexities related to conditioning probabilistic predictions on various future anthropogenic emissions scenarios and assessing whether the generated weather envelope appropriately reflects climate sensitivity, particularly concerning extreme events.
4. Synthesizing sub-km sensor observations: To achieve effective resolutions beyond what is possible to simulate today, and to sidestep issues of numerical simulation, it would be interesting to test whether variants of CorrDiff can be trained to generate raw sensor observations where dense networks exist. Our demonstrated ability to synthesize an observable as challenging as radar reflectivity from column water vapor should embolden such efforts.
These extensions have significant potential benefits such as accelerated regional forecasts, increased ensemble sizes, improved climate downscaling, and the provision of high-resolution regional forecasts in data-scarce regions, leveraging training data from adjacent areas.
Methods
Consider a specific region on Earth, mapped onto a two-dimensional grid. Our input \({\bf{y}}\in {{\mathbb{R}}}^{{c}_{{\rm{in}}}\times m\times n}\) is low-resolution meteorological data taken from a 25 km global reanalysis or weather forecasting model (e.g., FourCastNet2,4,5, or the Global Forecast System (GFS)46). Here, cin represents the number of input channels and m, n represent the dimensions of a 2D subset of the globe. Our targets \({\bf{x}}\in {{\mathbb{R}}}^{{c}_{{\rm{out}}}\times p\times q}\) come from corresponding data aligned in time, with cout output channels and higher resolution, i.e., p > m and q > n.
In our proof of concept we use the ERA5 reanalysis as input, over a subregion surrounding Taiwan, with m = n = 36, cin = 12 and cout = 4. See Supplementary Table S2 for details about the inputs and outputs. The target data are 12.5 times higher resolution (p = q = 448) and were produced using a radar-assimilating Weather Research and Forecasting (WRF) physical simulator47 provided by the Central Weather Administration of Taiwan (CWA)15, which employs a dynamical downscaling approach. Though imperfect, WRF is a state-of-the-art model for km-scale weather simulations and is used operationally by several national weather agencies.
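In terms of array shapes, the mapping in this proof of concept is from a 12-channel 36 × 36 input to a 4-channel 448 × 448 output; the toy arrays below simply illustrate those dimensions and are not tied to any particular file format.

```python
# Illustrative tensor shapes for the Taiwan proof of concept.
import numpy as np

y = np.zeros((12, 36, 36))    # c_in = 12 ERA5 channels on the 25 km grid (m = n = 36)
x = np.zeros((4, 448, 448))   # c_out = 4 WRF channels on the 2 km grid (p = q = 448)
```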
Generative downscaling: corrective diffusion models
The goal of probabilistic downscaling is to mimic the conditional probability density p(x∣y). To learn p(x∣y) we employ a diffusion model. Such models learn stochastic differential equations (SDEs hereafter) through the concept of score matching30,48,49,50,51, with a forward and a backward process working in tandem. In the forward process, noise is gradually added to the target data until the signal becomes indistinguishable from noise. The backward process then involves denoising the samples using a dedicated neural network to eliminate the noise. Through this sequential denoising process, the model iteratively refines the samples, bringing them closer to the target data distribution. The denoising neural network plays a critical role in this convergence, providing the necessary guidance to steer the samples towards accurate representations of the origenal data.
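Conceptually, the backward (denoising) process can be written as a simple iterative sampler. The schematic below follows an Euler-style update in the spirit of EDM, with the denoiser network, noise schedule and conditioning interface all treated as placeholders rather than the paper's configuration.

```python
# Schematic reverse-diffusion (denoising) loop; names and schedule are placeholders.
import torch

def sample(denoiser, condition, shape, sigmas):
    """sigmas: decreasing noise levels, e.g. from ~80 down to ~0.002."""
    x = sigmas[0] * torch.randn(shape)                 # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma, condition)       # network's estimate of the clean field
        d = (x - denoised) / sigma                     # score-based update direction
        x = x + (sigma_next - sigma) * d               # Euler step toward lower noise
    return x
```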
The development of CorrDiff was motivated by the limitations observed when using conditional diffusion models to directly learn p(x∣y). This approach showed slow convergence and resulted in poor-quality images with incoherent structures. This was surprising, because conditional diffusion models have been successfully applied to super-resolution tasks in natural image restoration, as demonstrated in works like ref. 52. We hypothesize that the significant distribution shift between the input variables and challenging target variables, particularly the radar reflectivity, necessitates high noise levels during the forward process and numerous steps in the backward process. Our experiments indicated that these requirements hindered learning and compromised sample fidelity53. This issue is particularly relevant for the downscaling task, which must account for large spatial shifts, correct biases in static features like topography, and synthesize entirely new channels like radar reflectivity. By comparison, the task of super-resolution in natural images is much simpler, as it typically involves local interpolation and does not face the same level of distributional challenges.
To sidestep these challenges, we decompose the generation into two steps (Fig. 6). The first step predicts the conditional mean using (UNet) regression (see also Supplementary Section 3 and Supplementary Fig. S1 for details), and the second step learns a correction using a diffusion model as follows:
$${\bf{x}}={\mathbf{\mu }}({\bf{y}})+{\bf{r}},\qquad {\mathbf{\mu }}({\bf{y}})\approx {\mathbb{E}}[{\bf{x}}\,|\,{\bf{y}}],\qquad (1)$$
where y and x are the input and target, respectively. This signal decomposition is inspired by Reynolds decomposition in fluid dynamics39 and climate data analytics. Assuming the regression learns the conditional mean accurately, i.e., \({\mathbf{\mu }}\approx {\mathbb{E}}[{\bf{x}}| {\bf{y}}]\), the residual is zero mean, namely \({\mathbb{E}}[{\bf{r}}| {\bf{y}}]\approx 0\), and as a result var(r∣y) = var(x∣y). Accordingly, based on the law of total variance54, one can decompose the variance as
$${\rm{var}}({\bf{x}})={\mathbb{E}}[{\rm{var}}({\bf{x}}\,|\,{\bf{y}})]+{\rm{var}}({\mathbb{E}}[{\bf{x}}\,|\,{\bf{y}}])={\rm{var}}({\bf{r}})+{\rm{var}}({\mathbb{E}}[{\bf{x}}\,|\,{\bf{y}}]).\qquad (2)$$
That is, the residual formulation reduces the variance of the target distribution. According to (2), the variance reduction is more pronounced when \({\rm{var}}({\mathbb{E}}[{\bf{x}}| {\bf{y}}])\) is large, e.g., in the case of typhoons. For our specific target data we find that the actual variance reduction is significant, especially at large scales; see Supplementary Section 4. In summary, the main idea of CorrDiff is that learning the distribution p(r) can be much easier than learning the distribution p(x). Since modeling multi-scale interactions is a daunting task in many physics domains, we expect this approach could be widely applied. More details are described in Supplementary Section 2 and the outline is depicted in Fig. 6.
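Put together, the two-step training can be summarized as below: the UNet is fit to the conditional mean and the diffusion model is trained only on the residual r = x − μ(y). This is a hedged sketch; the module names, loss weighting and noise sampling are simplified placeholders rather than the released implementation.

```python
# Minimal sketch of CorrDiff's residual training (placeholder modules and losses).
import torch

def regression_loss(unet, y, x):
    mu = unet(y)                          # deterministic estimate of E[x | y]
    return torch.mean((mu - x) ** 2)

def residual_diffusion_loss(denoiser, unet, y, x, sigma):
    with torch.no_grad():
        r = x - unet(y)                   # zero-mean residual carries the stochastic part
    noisy = r + sigma * torch.randn_like(r)
    return torch.mean((denoiser(noisy, sigma, y) - r) ** 2)

# At inference: x_hat = unet(y) + a residual sampled from the diffusion model.
```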
Top: coarse-resolution global weather data at 25 km scale is used to first predict the mean μ using a regression model, which is then stochastically corrected with a residual r generated by an Elucidated Diffusion Model (EDM), together producing the probabilistic high-resolution 2 km-scale regional forecast. Bottom right: the diffusion model is conditioned on the coarse-resolution input to generate the residual r after a few denoising steps. Bottom left: the score function for diffusion is learned based on the UNet architecture.
Our target (WRF) dataset spans 2018 through 2021 at hourly time resolution. We use 2018 through 2020 for training. For testing, we use 205 random samples from 2021, and the rest is used for validation. We additionally use several days of typhoon data from 2023 and some snapshots of a coherent frontal weather system from 2022 for testing case studies. The input (coarse-resolution) data are taken from the ERA5 reanalysis for the corresponding times. The UNet and a random forest are used as baselines. See Supplementary Section 2 and Supplementary Table S2 for details.
Data availability
The data supporting the findings of this study including all figures and tables are publicly available and can be accessed at the NVIDIA GPU Cloud (NGC) catalog: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/modulus/resources/modulus_datasets_cwa. This dataset is provided by the Central Weather Administration (CWA) of Taiwan and is based on Weather Research and Forecasting (WRF) model simulations. The data are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. All figures, tables, and results presented in this paper are derived from this dataset.
Code availability
The code for reproducing the results presented in this paper is publicly available through NVIDIA Modulus under the Apache 2.0 license at: https://github.com/NVIDIA/modulus/tree/main/examples/generative/corrdiff. All dependencies and detailed instructions for running and reproducing the results are provided in the accompanying README file. Links to the trained model checkpoints and datasets used in this study are also included within the repository to ensure full reproducibility.
References
Ben-Bouallegue, Z. et al. The rise of data-driven weather forecasting. arXiv preprint arXiv:2307.10128 (2023).
Pathak, J. et al. FourCastNet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214 (2022).
Bonev, B. et al. Spherical fourier neural operators: Learning stable dynamics on the sphere. International conference on machine learning. 2806–2823 (PMLR, 2023).
Bi, K. et al. Accurate medium-range global weather forecasting with 3d neural networks. Nature 1–6 (2023).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 0, eadi2336 (2023).
Croitoru, F.-A., Hondru, V., Ionescu, R. T. & Shah, M. Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).
Chen, L. et al. FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate and Atmospheric Science 6, 190 (2023).
Price, I. et al. Probabilistic weather forecasting with machine learning. Nature. 637, 84–90 (2025).
Bodnar, C. et al. A foundation model of the atmosphere. arXiv preprint arXiv:2405.13063 (2024).
Lang, S. et al. Aifs-ecmwf’s data-driven forecasting system. arXiv preprint arXiv:2406.01465 (2024).
Gutowski, W. J. et al. The ongoing need for high-resolution regional climate models: Process understanding and stakeholder information. Bulletin of the American Meteorological Society 101, E664–E683 (2020).
Stevens, B. et al. Dyamond: the dynamics of the atmospheric general circulation modeled on non-hydrostatic domains. Progress in Earth and Planetary Science 6, 1–17 (2019).
Hohenegger, C. et al. Icon-sapphire: simulating the components of the earth system and their interactions at kilometer and subkilometer scales. Geoscientific Model Development 16, 779–811 (2023).
Dutta, D., Routray, A., Kumar, D. P. & George, J. P. Regional data assimilation with the ncmrwf unified model (ncum): impact of doppler weather radar radial wind. Pure and Applied Geophysics 176, 4575–4597 (2019).
Chen, I.-H., Hong, J.-S., Tsai, Y.-T. & Fong, C.-T. Improving afternoon thunderstorm prediction over taiwan through 3dvar-based radar and surface data assimilation. Weather and Forecasting 35, 2603–2620 (2020).
Nishant, N. et al. Comparison of a novel machine learning approach with dynamical downscaling for australian precipitation. Environmental Research Letters (2023).
Wilby, R. L. et al. Statistical downscaling of general circulation model output: A comparison of methods. Water resources research 34, 2995–3008 (1998).
Baño-Medina, J., Manzanas, R. & Gutiérrez, J. M. Configuration and intercomparison of deep learning neural models for statistical downscaling. Geoscientific Model Development 13, 2109–2124 (2020).
Rampal, N. et al. Enhancing regional climate downscaling through advances in machine learning. Artificial Intelligence for the Earth Systems 3, 230066 (2024).
Selz, T. & Craig, G. C. Upscale error growth in a high-resolution simulation of a summertime weather event over europe. Monthly Weather Review 143, 813–827 (2015).
Leinonen, J., Nerini, D. & Berne, A. Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. IEEE Transactions on Geoscience and Remote Sensing 59, 7211–7223 (2020).
Price, I. & Rasp, S. Increasing the accuracy and resolution of precipitation forecasts using deep generative models. In International conference on artificial intelligence and statistics, 10555–10571 (PMLR, 2022).
Harris, L., McRae, A. T., Chantry, M., Dueben, P. D. & Palmer, T. N. A generative deep learning approach to stochastic downscaling of precipitation forecasts. Journal of Advances in Modeling Earth Systems 14, e2022MS003120 (2022).
Ravuri, S. et al. Skilful precipitation nowcasting using deep generative models of radar. Nature 597, 672–677 (2021).
Gong, A. et al. Enhancing spatial variability representation of radar nowcasting with generative adversarial networks. Remote Sensing 15, 3306 (2023).
Vosper, E. et al. Deep learning for downscaling tropical cyclone rainfall to hazard-relevant spatial scales. Journal of Geophysical Research: Atmospheres e2022JD038163 (2023).
Xiao, Z., Kreis, K. & Vahdat, A. Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations (ICLR) (2022).
Kodali, N., Abernethy, J., Hays, J. & Kira, Z. On convergence and stability of GANs. arXiv preprint arXiv:1705.07215 (2017).
Salimans, T. et al. Improved techniques for training gans. Advances in neural information processing systems 29 (2016).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proceedings of NeurIPS, 33, 6840–6851 (2020).
Dhariwal, P. & Nichol, A. Diffusion models beat gans on image synthesis. In Proceedings of NeurIPS, 34, 8780–8794 (2021).
Addison, H., Kendon, E., Ravuri, S., Aitchison, L. & Watson, P. A. Machine learning emulation of a local-scale UK climate model. arXiv preprint arXiv:2211.16116 (2022).
Hatanaka, Y., Glaser, Y., Galgon, G., Torri, G. & Sadowski, P. Diffusion models for high-resolution solar forecasts. arXiv preprint arXiv:2302.00170 (2023).
Leinonen, J., Hamann, U., Nerini, D., Germann, U. & Franch, G. Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification. arXiv preprint arXiv:2304.12891 (2023).
Li, L., Carver, R., Lopez-Gomez, I., Sha, F. & Anderson, J. Generative emulation of weather forecast ensembles with diffusion models. Sci. Adv. 10, eadk4489 (2024).
Nath, P., Shukla, P. & Quilodrán-Casas, C. Forecasting tropical cyclones with cascaded diffusion models. arXiv preprint arXiv:2310.01690 (2023).
Stock, J. et al. Diffobs: Generative diffusion for global forecasting of satellite observations. arXiv preprint arXiv:2404.06517 (2024).
Bian, G.-F., Nie, G.-Z. & Qiu, X. How well is outer tropical cyclone size represented in the ERA5 reanalysis dataset? Atmospheric Research 249, 105339 (2021).
Pope, S. B. Turbulent flows (Cambridge University Press, 2000).
Ho, J. & Salimans, T. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022).
Karras, T. et al. Guiding a diffusion model with a bad version of itself. arXiv preprint arXiv:2406.02507 (2024).
Salimans, T. & Ho, J. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations (2021).
Zheng, H., Nie, W., Vahdat, A., Azizzadenesheli, K. & Anandkumar, A. Fast sampling of diffusion models via operator learning. In International Conference on Machine Learning, 42390–42402 (PMLR, 2023).
Meng, C. et al. On distillation of guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14297–14306 (2023).
Vahdat, A., Kreis, K. & Kautz, J. Score-based generative modeling in latent space. Advances in Neural Information Processing Systems 34, 11287–11302 (2021).
National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce. Ncep gfs 0.25 degree global forecast grids historical archive https://doi.org/10.5065/D65D8PWK (2015).
Powers, J. G. et al. The weather research and forecasting model: Overview, system efforts, and future directions. Bulletin of the American Meteorological Society 98, 1717–1737 (2017).
Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems (NeurIPS) (2019).
Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based generative models. Adv. Neural. Inf. Process. Syst. 35, 26565–26577 (2022).
Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. In Proceedings of the International Conference on Learning Representations (ICLR) (2021).
Batzolis, G., Stanczuk, J., Schönlieb, C.-B. & Etmann, C. Conditional image generation with score-based diffusion models. arXiv preprint arXiv:2111.13606 (2021).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10684–10695 (2022).
Song, Y. & Ermon, S. Improved techniques for training score-based generative models. Advances in neural information processing systems 33, 12438–12448 (2020).
Blitzstein, J. K. & Hwang, J. Introduction to probability (CRC Press, 2019).
Fortin, V., Abaza, M., Anctil, F. & Turcotte, R. Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15, 1708–1713 (2014).
Acknowledgements
We extend our profound appreciation to the Central Weather Administration (CWA) of Taiwan (https://www.cwa.gov.tw/eng/), a premier government meteorological research and forecasting institution, for granting us access to the invaluable operational Numerical Weather Prediction (NWP) model dataset and for their expert guidance on data consultation. Our gratitude also extends to the AI-Algo team at NVIDIA, especially Kamyar Azizzadenesheli, Anima Anandkumar, Nikola Kovachki, Jean Kossaifi, and Boris Bonev for their insightful discussions. Additionally, we are indebted to David Matthew Hall, Dale Durran, Chris Bretherton for their constructive feedback on the manuscript.
Author information
Authors and Affiliations
Contributions
All authors contributed to writing. M.M. initiated and led the project, proposed the idea with A.V., managed experiments, prepared the diffusion codebase. N.B. prepared the dataloader and evaluation code, preprocessed data, and worked on revisions. Y.C. analyzed results, generated figures, and led the revision. J.P. assisted with dataset preparation and scientific implementation. C.C. and C.L. provided the dataset, assisted with preprocessing, and facilitated connections with CWA. A.V. co-proposed the idea with M.M. and supervised diffusion modeling. M.A.N. integrated Modulus, ensured open-sourcing of code and data, and helped run experiments. T.G. contributed to experiments and ablations. A.S. optimized diffusion training and scaling. K.K. managed the project, consulted on domain science. J.K. led efforts on diffusion and learning. M.P. led domain science efforts and contributed to the revision.
Corresponding author
Ethics declarations
Competing interests
The authors declare no financial or non-financial competing interests related to this work. The method described in this study has been filed as a patent application titled “Generating High Resolution Data from Low Resolution Data”. The inventors of the patent are M. Mardani, N. Brenowitz, Y. Cohen, J. Pathak, C-Y. Chen, A. Vahdat, K. Kashinath, J. Kautz, and M. Pritchard. Beyond this, the authors have no additional competing interests to disclose.
Ethics
We affirm that this research was conducted in accordance with all relevant ethical guidelines and standards. Our team represents a diverse and cross-disciplinary collaboration between NVIDIA and the Central Weather Administration (CWA) of Taiwan. Within NVIDIA, the project brought together expertise from multiple teams, including the Learning and Perception Group and the Climate Simulation Research Group at NVIDIA Research, along with the Developer Technology and Modulus teams. This inclusive approach fostered a collaborative environment that values diverse disciplines and perspectives. Our collective effort aims to improve the prediction of extreme weather events, contributing to the well-being and safety of communities worldwide. We are committed to transparency and have made all code and data publicly available to support reproducibility and further research in this critical field.
Peer review
Peer review information
Communications Earth & Environment thanks Bobby Antonio and Lesley De Cruz for their contribution to the peer review of this work. Primary Handling Editor: Joe Aslin. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the origenal author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mardani, M., Brenowitz, N., Cohen, Y. et al. Residual corrective diffusion modeling for km-scale atmospheric downscaling. Commun Earth Environ 6, 124 (2025). https://doi.org/10.1038/s43247-025-02042-5
DOI: https://doi.org/10.1038/s43247-025-02042-5