Center for Nonlinear Data Assimilation, Causal Discovery and Prediction

Welcome to the webpages of the Center for Nonlinear Data Assimilation, Causal Discovery and Prediction. The mission of the center is to develop cutting-edge, fully nonlinear data-assimilation and causal-discovery methods and to apply them to the understanding and prediction of complex high-dimensional systems, with a strong emphasis on atmospheric and oceanic applications.

The center members have been at the forefront of many mainstream developments in advanced data assimilation, such as Ensemble Kalman Filters and Smoothers, Particle Filters and Smoothers, Particle Flow Filters and Smoothers, and the consistent application of synchronization in data assimilation, as well as exciting new developments such as exploring Optimal Transportation and Wasserstein distances in data assimilation. We work with pure and applied mathematicians and practitioners to generate and explore the newest ideas, and at the same time develop methods that are useful for the real world. This involves implementing our methods in state-of-the-art data-assimilation platforms such as DART and JEDI.

We also work on understanding representation errors and developing ways to estimate them, accelerating variational minimizations via randomized methods, and applying data assimilation for systematic model improvement, going beyond simply inspecting forecast output and trying to work out what is wrong and where.

Center members have recently developed a powerful causal discovery framework for non-intervenable systems and applied it to complex systems such as hurricane rapid intensification. All understanding in science starts with causal reasoning, and large steps have recently been taken in building a formal mathematical framework for causal discovery. However, most causal discovery methods are based on interventions, while in many systems, such as the ocean and atmosphere, interventions are difficult or unethical. Even in numerical models, where one can intervene on model parameters or input variables, interventions on internal prognostic model variables will break internal feedbacks. This means that the resulting model evolution will differ from what the model would normally do. Hence, causal discovery on non-intervenable systems is a crucial tool for enhancing present-day understanding of both nature and complex models.

A distinguishing factor in our research is the full nonlinearity of the methodology, which allows for causal inference in highly nonlinear systems. We found that standard Directed Acyclic Graphs are not general enough, and we are working on hypergraphical representations of the new framework.

Our predictability research focuses on information flow in complex systems, analyzing observations and model output. It is partly based on the causal discovery framework and has strong ties with data assimilation, bringing the research in the center full circle.

We use observations and models, from conceptual models to full General Circulation Models, and combine these with statistics, probability, Bayesian inference, information theory and dynamical systems theory, to name a few.

The figure below is an example of the kind of work we do in the ocean, an amazingly complex and least-well-observed part of the climate system where the full beauty of geophysical fluid dynamics and thermodynamics can be studied in full detail. It shows that state-of-the-art ocean models are quite accurate but misrepresent dominant physics in several regions, which can have global implications. Data assimilation and causal discovery are among the tools that we use to unravel what is going on and why, with two main aims: 1) to increase our understanding, and 2) to improve predictions.

Ocean sea-surface variability (standard deviation in cm) as determined from satellite observations (top figure) and from a high-resolution ocean model (bottom figure). Note that the large-scale features are reasonably well represented, but many details are still incorrect. These discrepancies might seem minor, but they in fact point to serious misrepresentation of the dominant physics in different regions of the model. The aim is first to understand better what is missing and why, and then to improve prediction.