https://doi.org/10.5194/ar-2025-18
01 Jul 2025
Status: this preprint is currently under review for the journal AR.

Global fields of daily accumulation-mode particle number concentrations using in situ observations, reanalysis data and machine learning

Aino Ovaska, Elio Rauth, Daniel Holmberg, Paulo Artaxo, John Backman, Benjamin Bergmans, Don Collins, Marco Aurélio Franco, Shahzad Gani, Roy M. Harrison, Rakes K. Hooda, Tareq Hussein, Antti-Pekka Hyvärinen, Kerneels Jaars, Adam Kristensson, Markku Kulmala, Lauri Laakso, Ari Laaksonen, Nikolaos Mihalopoulos, Colin O'Dowd, Jakub Ondracek, Tuukka Petäjä, Kristina Plauškaitė, Mira Pöhlker, Ximeng Qi, Peter Tunved, Ville Vakkari, Alfred Wiedensohler, Kai Puolamäki, Tuomo Nieminen, Veli-Matti Kerminen, Victoria A. Sinclair, and Pauli Paasonen

Abstract. Accurate global estimates of accumulation-mode particle number concentrations (N100) are essential for understanding aerosol–cloud interactions and their climate effects, and for improving Earth System Models. However, traditional methods relying on sparse in situ measurements lack comprehensive coverage, and indirect satellite retrievals have limited sensitivity in the relevant size range. To overcome these challenges, we apply two machine learning (ML) techniques, multiple linear regression (MLR) and eXtreme Gradient Boosting (XGB), to generate daily global N100 fields, using in situ measurements as target variables and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS) and ERA5 as predictor variables. Our cross-validation showed that the ML models captured N100 concentrations well in environments well represented in the training set, with over 70 % of daily estimates within a factor of 1.5 of observations. However, performance declines in underrepresented regions and conditions, such as clean and remote environments, underscoring the need for more diverse observations. The most important predictors for N100 in the ML models were aerosol-phase sulphate and gas-phase ammonia concentrations, followed by carbon monoxide and sulfur dioxide. Although black carbon and organic matter showed the highest feature importance values, their opposing signs in the MLR model coefficients suggest that their contributions to the N100 estimate largely offset each other. By directly linking estimates to in situ measurements, our ML approach provides valuable insights into the global distribution of N100 and serves as a complementary tool for evaluating Earth System Model outputs and advancing the understanding of aerosol processes and their role in the climate system.
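
The "within a factor of 1.5" criterion quoted in the abstract can be illustrated with a minimal sketch. It assumes that an estimate counts as a hit when it lies between the observation divided by 1.5 and the observation multiplied by 1.5. The function name `fraction_within_factor` and the sample values are hypothetical and do not come from the preprint or its data set.

```python
import numpy as np

def fraction_within_factor(obs, est, factor=1.5):
    """Return the share of estimates with obs/factor <= est <= obs*factor.

    Assumed reading of the abstract's metric, not the authors' code.
    """
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    ratio = est / obs  # >1 means overestimate, <1 means underestimate
    inside = (ratio >= 1.0 / factor) & (ratio <= factor)
    return float(np.mean(inside))

# Hypothetical daily N100 values in cm^-3 (illustrative only)
obs = np.array([100.0, 200.0, 400.0, 800.0])
est = np.array([120.0, 150.0, 700.0, 500.0])

# Ratios are 1.2, 0.75, 1.75 and 0.625, so two of the four fall inside the band
score = fraction_within_factor(obs, est)
print(score)
```

Because the band is symmetric in ratio space, over- and underestimates by the same factor are treated equally, which suits concentrations that span several orders of magnitude.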

Competing interests: Some authors are members of the editorial board of journal AR.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 12 Aug 2025)


Data sets

Daily Averaged Accumulation Mode Particle Number Concentrations (N100) from 35 Stations (2003-2019)
A. Ovaska, E. Rauth, D. Holmberg, P. Artaxo, J. Backman, B. Bergmans, D. Collins, M. A. Franco, S. Gani, R. M. Harrison, R. K. Hooda, T. Hussein, A. Hyvärinen, K. Jaars, A. Kristensson, M. Kulmala, L. Laakso, A. Laaksonen, N. Mihalopoulos, C. O'Dowd, J. Ondracek, T. Petäjä, K. Plauškaitė-Šukienė, M. Pöhlker, X. Qi, P. Tunved, V. Vakkari, A. Wiedensohler, K. Puolamäki, T. Nieminen, V.-M. Kerminen, V. A. Sinclair, and P. Paasonen
https://doi.org/10.5281/zenodo.15222674

Latest update: 01 Jul 2025
Short summary
We trained machine learning models to estimate the number of aerosol particles large enough to form clouds and generated daily estimates for the entire globe. The models performed well in many continental regions but struggled in remote and marine areas. Still, this approach offers a way to quantify these particles in areas that lack direct measurements, helping us understand their influence on clouds and climate on a global scale.