NTRS - NASA Technical Reports Server

ClimateBench2.0: Probabilistic Climate Model Scoring

Despite their central role in climate science and policy, Earth system models (ESMs) remain difficult to compare in any rigorous or transparent way. Most existing evaluations either emphasize specific processes or rely on qualitative assessments across diverse metrics, making it nearly impossible to rank models by their predictive skill. ClimateBench2.0 introduces a probabilistic scoring framework that focuses instead on what matters most: a model’s ability to accurately simulate the historical climate and project future multi-decadal change.

The benchmark leverages high-quality observations from the satellite era (1980–present), with a particular focus on present-day metrics for which observational constraints are strongest, such as top-of-atmosphere (TOA) energy balance, seasonal cycle fidelity, and variability in clouds, aerosols, precipitation, and ocean heat uptake. Paleoclimate reconstructions of the Last Glacial Maximum (LGM), the Last Interglacial (LIG), and the Mid-Holocene are incorporated as out-of-distribution tests to evaluate models beyond the narrow window of recent data. Scoring is based on robust probabilistic metrics such as the continuous ranked probability score (CRPS) and the Brier score, designed to assess ensemble skill and uncertainty quantification.
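
As an illustration of the two metrics named above, the following is a minimal sketch of the standard ensemble estimator of the CRPS and of the Brier score for a single scalar quantity. It is not the ClimateBench2.0 implementation; the function names `crps_ensemble` and `brier_score` and the example values are assumptions made for this sketch.

```python
from itertools import product
from statistics import fmean


def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `members` is a sequence of ensemble values for one scalar quantity
    and `obs` is the verifying observation. Lower is better; the score
    reduces to the absolute error for a single-member ensemble.
    (Hypothetical helper written for illustration only.)
    """
    members = [float(m) for m in members]
    # Mean absolute error of the members against the observation.
    accuracy = fmean([abs(m - obs) for m in members])
    # Mean absolute difference over all ordered member pairs.
    spread = fmean([abs(a - b) for a, b in product(members, repeat=2)])
    return accuracy - 0.5 * spread


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return fmean([(p - o) ** 2 for p, o in zip(probabilities, outcomes)])


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any model or observation.
    print(crps_ensemble([0.0, 2.0], 1.0))
    print(brier_score([0.5, 0.5], [1, 0]))
```

Both scores are negatively oriented (smaller is better), and both reward ensembles that are sharp as well as well calibrated, which is why they are common choices for assessing uncertainty quantification.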

Crucially, statistical performance alone is not sufficient. ClimateBench2.0 will also introduce a dedicated Physical Consistency category, evaluating properties such as global energy balance closure, conservation of water and carbon, and realistic land-ocean-atmosphere energy exchanges. These physical integrity checks are essential for trusting a model’s out-of-distribution predictions, especially under strong forcings not seen in the historical record.
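
One way to picture a global energy balance closure check is sketched below: compute the area-weighted global mean of net TOA flux on a regular latitude-longitude grid and look at its magnitude. This is only a sketch under stated assumptions, not the benchmark's actual procedure; the function names, the cosine-latitude weighting on a regular grid, and the example fluxes are all assumptions made here for illustration.

```python
import math


def global_mean(field, lats_deg):
    """Area-weighted mean of field[lat][lon] on a regular lat-lon grid.

    Each latitude row is weighted by cos(latitude), which is proportional
    to grid-cell area on a regular grid. (Hypothetical helper.)
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    weighted = sum(w * sum(row) / len(row) for w, row in zip(weights, field))
    return weighted / sum(weights)


def toa_imbalance(absorbed_sw, outgoing_lw, lats_deg):
    """Global-mean net TOA flux (absorbed shortwave minus outgoing longwave)."""
    net = [
        [sw - lw for sw, lw in zip(sw_row, lw_row)]
        for sw_row, lw_row in zip(absorbed_sw, outgoing_lw)
    ]
    return global_mean(net, lats_deg)


if __name__ == "__main__":
    # Illustrative uniform fluxes in W m^-2, not real model output.
    lats = [-60.0, 0.0, 60.0]
    sw = [[240.0, 240.0] for _ in lats]
    lw = [[239.0, 239.0] for _ in lats]
    print(toa_imbalance(sw, lw, lats))
```

In practice the resulting imbalance would be compared against an observational estimate or a chosen tolerance; any such threshold is a design decision that this sketch does not specify.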

By combining empirical benchmarks with physically grounded constraints, ClimateBench2.0 transforms evaluation into a reproducible, quantitative, and outcome-driven ranking framework. It applies across model types, from physical to hybrid to ML-based, and integrates with existing efforts (e.g., CMIP, Obs4MIPs) to ensure transparency and broad adoption.
Document ID
20250011354
Acquisition Source
Goddard Space Flight Center
Document Type
Abstract
Authors
Duncan Watson-Parris
(University of California San Diego San Diego, United States)
Venkatramani Balaji
(Schmidt Futures New York, United States)
Christopher S Bretherton
(Allen Institute for AI Seattle, United States)
William Chapman
(NSF National Center for Atmospheric Research Boulder, United States)
Gregory S Elsaesser
(Goddard Institute for Space Studies New York, United States)
Pierre Gentine
(Columbia University New York, United States)
Ralph F Keeling
(University of California San Diego San Diego, United States)
David Lawrence
(NSF National Center for Atmospheric Research Boulder, United States)
J David Neelin
(University of California, Los Angeles Los Angeles, United States)
Sarah G Purkey
(Scripps Institution of Oceanography La Jolla, United States)
Tapio Schneider
(California Institute of Technology Pasadena, United States)
Isla Simpson
(NSF National Center for Atmospheric Research Boulder, United States)
Graeme L Stephens
(Jet Propulsion Laboratory Pasadena, United States)
Willa Tobin
(University of California San Diego San Diego, United States)
Laure Zanna
(University of Oxford Oxford, United Kingdom)
Kevin W Bowman
(Jet Propulsion Laboratory Pasadena, United States)
Peter Martin Caldwell
(Lawrence Livermore National Laboratory Livermore, United States)
William Drew Collins
(Berkeley Lab Reading, United Kingdom)
Veronika Eyring
(German Aerospace Center (DLR))
Stephan Hoyer
(Google, Inc. Mountain View, CA, United States)
Nikolay Koldunov
(Alfred Wegener Institute Helmholtz-Center for Polar and Marine Research Bremerhaven)
Christian Lessig
(Otto-von-Guericke Universität)
Mike S Pritchard
(University of California, Irvine Irvine, United States)
Gavin A Schmidt
(Goddard Institute for Space Studies New York, United States)
Michael Schulz
(Norwegian Meteorological Institute Oslo, Norway)
Tiffany Shaw
(University of Chicago Chicago, United States)
Joao P Teixeira
(Jet Propulsion Laboratory Pasadena, United States)
Andrew Williams
(University of Oxford Oxford, United Kingdom)
Rose Yu
(University of California San Diego San Diego, United States)
Date Acquired
December 12, 2025
Subject Category
Meteorology and Climatology
Meeting Information
Meeting: American Geophysical Union (AGU25) Meeting
Location: New Orleans, LA
Country: US
Start Date: December 15, 2025
End Date: December 19, 2025
Sponsors: American Geophysical Union
Funding Number(s)
CONTRACT_GRANT: 80NM0018D0004
CONTRACT_GRANT: 80NSSC22M0054
WBS: 509496.02.08.04.24
Distribution Limits
Public
Copyright
Use by or on behalf of the US Gov. Permitted.
Technical Review
Single Expert
Keywords
satellite observations
future multi-decadal change
historical climate
ClimateBench2.0