ATM S 591: Predictability and Data Assimilation

Winter Quarter 2005. (2009)

Instructor: Greg Hakim

Class meets: M 2:30-3:20. ATG 310C.


11 March 2005 (29): I finished square-root filters. I summarized the ensemble transform Kalman filter (ETKF) in terms of the general square-root derivation given last lecture. The ETKF allows us to work in the subspace spanned by the ensemble, which is typically very small compared to both the number of degrees of freedom in the model and the number of observations. Basically, all we need are the eigenvectors and eigenvalues of the small matrix built from R^-1/2 H X^b; that is, from the ensemble matrix of observation estimates normalized by the observation error covariance. Because this calculation is cheap, it may be used to build inexpensive ensemble systems by updating short-term ensemble forecast members with the ETKF, and replacing the ensemble mean with an operationally derived analysis (i.e., let somebody else do the data assimilation). The downside to the ETKF is that the ensemble background-error covariance estimates cannot be localized in physical space, because the calculations take place in observation space; such localization is important for reducing spurious long-range covariances. I finished with a brief summary of the "Potter method," which was originally derived for the classical Kalman filter, but applies naturally to the EnKF. The essential difference from the ETKF is that observations are assimilated serially, one at a time, and covariance localization may be applied. As with the ETKF, this method may be used for cheap ensemble schemes, although it is potentially much more expensive, because the cost scales linearly with the number of observations.
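
The square-root identity behind the ETKF can be sketched in a few lines of numpy. This is a minimal illustration, not any operational implementation; the dimensions, ensemble, H, and (diagonal) R are made up. It forms the symmetric square-root transform from the eigendecomposition of the small k x k matrix S^T S, where S = R^-1/2 H X^b, and checks that the transformed perturbations reproduce the Kalman analysis covariance (I - KH)B:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 5, 3, 4                      # state dim, obs dim, ensemble size (made up)
E = rng.standard_normal((n, k))        # background ensemble (columns = members)
xb = E.mean(axis=1, keepdims=True)
X = (E - xb) / np.sqrt(k - 1)          # scaled perturbation matrix, so B = X X^T
H = rng.standard_normal((m, n))        # linear observation operator
R = np.diag(rng.uniform(0.5, 2.0, m))  # diagonal obs error covariance

# ETKF works in observation space: S = R^{-1/2} H X is only m x k
Rm12 = np.diag(1.0 / np.sqrt(np.diag(R)))
S = Rm12 @ H @ X
# eigendecomposition of the small k x k matrix S^T S
w, V = np.linalg.eigh(S.T @ S)
T = V @ np.diag(1.0 / np.sqrt(1.0 + w)) @ V.T   # symmetric square-root transform
Xa = X @ T                                       # analysis perturbations

# check: Xa Xa^T reproduces the Kalman analysis covariance (I - KH) B
B = X @ X.T
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
Pa = (np.eye(n) - K @ H) @ B
assert np.allclose(Xa @ Xa.T, Pa)
```

The equality in the final check is the Sherman-Morrison-Woodbury "push-through" identity; the expensive n x n algebra never has to be formed in practice.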

9 March 2005 (28): Student presentations (Ryan, Alex, Lucas).

7 March 2005 (27): Student presentations (Eric, Bryan, Steven).

4 March 2005 (26): No class. PNWWW.

3 March 2005 (25): Make-up class. I started discussing the details of ensemble Kalman filters. The first issue concerns filter divergence, which relates to the ensemble becoming 'overconfident' and 'decoupling' from observations due to a reduction in variance. The reduction in variance can arise if all ensemble members get the same observations; effectively, the K R K^T term in the analysis error covariance, A, is absent. Two solutions to this problem are 'perturbed observations' and 'square root filters.' For perturbed observations, each ensemble member gets a slightly different set of observations by adding random noise drawn from R. Unfortunately, this leads to skewed posterior distributions, especially for small ensembles. For square root filters, the ensemble mean is updated with the observations, and the ensemble perturbations (deviations from the mean) are transformed to have the appropriate A; this transformation is not unique.
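
A scalar numpy sketch (made-up variances and observation) shows both the variance collapse and the perturbed-observations fix. Updating every member with the same observation leaves the analysis variance at (1-k)^2 b, missing the k^2 r contribution; drawing a different noisy observation for each member restores the correct analysis variance a = br/(b+r):

```python
import numpy as np

rng = np.random.default_rng(1)
b, r, y = 2.0, 1.0, 0.5        # background var, obs-error var, observation (made up)
k = b / (b + r)                # scalar Kalman gain
a_true = (1 - k) * b           # correct analysis-error variance, b*r/(b+r)

N = 200_000
ens = rng.normal(0.0, np.sqrt(b), N)   # background ensemble

# (a) every member assimilates the same observation: variance collapses
same = ens + k * (y - ens)
# (b) perturbed observations: each member gets y plus noise drawn from R
pert = ens + k * (y + rng.normal(0.0, np.sqrt(r), N) - ens)

print(a_true, same.var(), pert.var())
```

With these numbers the collapsed variance is (1-k)^2 b = 2/9, well below the correct a = 2/3, which the perturbed-observations update recovers to within sampling error.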

2 March 2005 (24): Adjoint sensitivity. I derived the adjoint sensitivity equation, which shows that the sensitivity gradient of a forecast aspect to the initial conditions can be obtained by integrating the forecast-time sensitivity gradient backward in time using an adjoint model. I derived the relationship between sensitivity and singular vectors, and argued that 4DVAR can be interpreted in terms of forward model error growth that is then reduced and whitened by innovations during the "backward adjoint" integration.
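
For a toy linear model the backward adjoint integration is just repeated application of the transpose propagator. In this sketch (model matrix, forecast aspect, and step count all made up), the forecast aspect is J = c . x(T), the forecast-time sensitivity gradient is c, and stepping it backward with A^T yields dJ/dx0, verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
n, nsteps = 4, 6
A = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)  # one step of a linear model
c = rng.standard_normal(n)                          # forecast aspect J = c . x(T)
x0 = rng.standard_normal(n)

def J(x):
    """Forward integration, then evaluate the forecast aspect."""
    for _ in range(nsteps):
        x = A @ x
    return c @ x

# backward: carry the forecast-time gradient (c) to t=0 with the adjoint A^T
lam = c.copy()
for _ in range(nsteps):
    lam = A.T @ lam          # one adjoint step, backward in time
# lam is now the sensitivity gradient dJ/dx0

# finite-difference check, component by component
eps = 1e-6
fd = np.array([(J(x0 + eps*e) - J(x0 - eps*e)) / (2*eps) for e in np.eye(n)])
assert np.allclose(lam, fd, atol=1e-6)
```

One adjoint integration delivers the whole gradient, whereas the finite-difference route costs two forward integrations per state-space dimension.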

28 February 2005 (23): I cleared up some confusion on 3DVAR, and derived the "incremental form," which requires only one application of the nonlinear observation operator prior to minimizing the cost function. In contrast, the "full" version of 3DVAR requires an observation estimate from the full nonlinear operator at every iteration. I then finished 4DVAR by deriving an algorithm to calculate the gradient of Jo, the observation contribution to the cost function. This algorithm involves a forward integration of the full nonlinear model, and then a backward-in-time integration of the adjoint model tangent to the nonlinear forward trajectory, "forced" by innovations.
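
The forward-nonlinear/backward-adjoint recipe for the gradient of Jo can be sketched with a scalar stand-in model (a logistic map; the parameters, observations, and error variance are all invented). The adjoint variable is initialized with the final innovation and swept backward with the tangent-linear derivative, picking up ("forced by") the innovation at each observation time; the result matches a finite-difference gradient:

```python
import numpy as np

r_model, robs = 3.7, 0.1**2      # made-up model parameter and obs-error variance
nsteps = 8
x0 = 0.3
y = 0.5 + 0.1*np.sin(np.arange(nsteps + 1))   # made-up observations at every step

m  = lambda x: r_model * x * (1.0 - x)        # nonlinear model (logistic map)
mp = lambda x: r_model * (1.0 - 2.0 * x)      # its tangent-linear (derivative)

def forward(x):
    traj = [x]
    for _ in range(nsteps):
        x = m(x)
        traj.append(x)
    return np.array(traj)

def Jo(x0):
    traj = forward(x0)
    return 0.5 * np.sum((traj - y)**2) / robs

# adjoint sweep: backward in time, forced by the innovations
traj = forward(x0)
lam = (traj[-1] - y[-1]) / robs
for k in range(nsteps - 1, -1, -1):
    lam = mp(traj[k]) * lam + (traj[k] - y[k]) / robs
grad_adjoint = lam               # dJo/dx0

eps = 1e-7
grad_fd = (Jo(x0 + eps) - Jo(x0 - eps)) / (2 * eps)
assert abs(grad_adjoint - grad_fd) < 1e-3 * max(1.0, abs(grad_fd))
```

Note that the adjoint is evaluated along the stored nonlinear trajectory, which is why the forward integration must come first.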

23-25 February 2005 (21 & 22): No class.

18 February 2005 (20): I derived the 3DVAR algorithm, which involves evaluating a cost function and its gradient to minimize the analysis-error variance given statistics for the background and observations.
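
A minimal numpy sketch of 3DVAR (all matrices and vectors invented for illustration): evaluate the cost function and its gradient, descend, and confirm that the minimizer coincides with the closed-form Kalman/OI analysis. Since J is quadratic here, steepest descent with an exact line search suffices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
Bc = rng.standard_normal((n, n)); B = Bc @ Bc.T + n*np.eye(n)   # background covariance
Rc = rng.standard_normal((m, m)); R = Rc @ Rc.T + m*np.eye(m)   # obs covariance
H = rng.standard_normal((m, n))                                 # linear obs operator
xb, y = rng.standard_normal(n), rng.standard_normal(m)

Bi, Ri = np.linalg.inv(B), np.linalg.inv(R)

def cost(x):
    """J(x) = 1/2 (x-xb)' B^-1 (x-xb) + 1/2 (Hx-y)' R^-1 (Hx-y)."""
    db, do = x - xb, H @ x - y
    return 0.5*db @ Bi @ db + 0.5*do @ Ri @ do

def grad(x):
    return Bi @ (x - xb) + H.T @ Ri @ (H @ x - y)

# steepest descent with exact line search (valid because J is quadratic)
Hess = Bi + H.T @ Ri @ H
x = xb.copy()
for _ in range(2000):
    g = grad(x)
    if g @ g < 1e-30:
        break
    x = x - (g @ g) / (g @ Hess @ g) * g

# the minimizer is the Kalman/OI analysis xa = xb + K (y - H xb)
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
xa = xb + K @ (y - H @ xb)
assert np.allclose(x, xa, atol=1e-7)
assert cost(x) <= cost(xb)
```

Operational 3DVAR uses preconditioned conjugate-gradient or quasi-Newton searches, but the cost/gradient structure is the same.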

16 February 2005 (19): Beginning with the full EKF update algorithm, I discussed the relationships with historical approaches to data assimilation. The historical review began with "statistical interpolation" schemes (SI) that trace back to Cressman (1959), which is univariate with a fixed K that depends on a distance penalty. First attempts at solving the full Gaussian analysis-update equation were called "optimum interpolation" (OI), and used fixed background-error covariance matrices. The matrices were kept to a manageable size by restricting the spatial influence of observations. The spatial limitation was relaxed in 3DVAR, which eliminates the explicit matrix calculations altogether and minimizes the cost function by a direct search. 4DVAR is the extension of 3DVAR to a time interval. The assumption of fixed background-error covariance matrices is relaxed in the ensemble Kalman filter, which estimates the covariance propagation using a small ensemble of nonlinear forecasts.

14 February 2005 (18): I reviewed the extended Kalman filter (EKF), emphasizing that the update step involves integration of the nonlinear model for the new background expected value, and of the tangent-linear model for the new background error covariances. I then introduced the ensemble Kalman filter (EnKF) with a high-level overview and summary of the algorithm and how it differs from the EKF. I then reviewed the update-equation limits for perfect and worthless (infinite-variance) observations, showing how in the former case the observations are simply interpolated, and in the latter case they have no impact (the background is unmodified). Last we examined a simple one-dimensional scalar example, which explicitly reveals the relative weighting of background and observation according to their respective error variances.
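
The scalar example and both limits fit in a few lines. This is a toy sketch with made-up numbers; the gain k = var_b/(var_b + var_o) weights the innovation by the relative error variances:

```python
import numpy as np

def scalar_update(xb, y, var_b, var_o):
    """Analysis as a variance-weighted blend of background xb and observation y."""
    k = var_b / (var_b + var_o)          # scalar Kalman gain
    xa = xb + k * (y - xb)               # update with the innovation (y - xb)
    var_a = (1 - k) * var_b              # analysis-error variance
    return xa, var_a

xb, y = 1.0, 3.0
# perfect observation (var_o -> 0): the analysis interpolates the observation
xa, _ = scalar_update(xb, y, 2.0, 1e-12)
assert abs(xa - y) < 1e-6
# worthless observation (var_o -> inf): the background is unmodified
xa, _ = scalar_update(xb, y, 2.0, 1e12)
assert abs(xa - xb) < 1e-6
# equal variances: the analysis is the simple average
xa, _ = scalar_update(xb, y, 1.0, 1.0)
assert xa == 2.0
```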

11 February 2005 (17): Review of HW#2 (handed in today). I summarized the Kalman filter in terms of the "update step" (x^a and A), and the "forecast step." The extended Kalman filter (EKF) follows immediately, and I gave an overview of the differences. Last, I showed how to interpret the Kalman gain in terms of covariance and variance relationships; e.g., B H^T is the covariance between the background and the background estimate of the observations, and H B H^T is the error covariance matrix for the background estimate of the observations (analogous to R).
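
The covariance interpretation of the gain can be checked empirically by sampling. With made-up B and H, draws of the background error x satisfy cov(x, Hx) = B H^T and cov(Hx) = H B H^T, up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(8)
B = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])          # made-up background-error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])          # made-up linear observation operator

N = 400_000
X = rng.multivariate_normal(np.zeros(3), B, size=N)   # background-error samples
Y = X @ H.T                                           # obs-space images H x

# sample covariances approach B H^T and H B H^T
assert np.allclose(X.T @ Y / N, B @ H.T, atol=0.03)
assert np.allclose(Y.T @ Y / N, H @ B @ H.T, atol=0.03)
```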

9 February 2005 (16): Following from the end of the previous lecture I derived the expression for the gain matrix (K) that gives the minimum analysis-error variance estimate. This analysis is tricky because we want to take the derivative of the general analysis-error expression with respect to the matrix K. We arrived at an expression containing one linear and one quadratic term in a perturbation matrix. After proving that the quadratic term was positive definite, and recalling that the perturbations are small (zero limit), we discovered that we had to suppress the linear term altogether. Out pops the Kalman gain as the minimum-variance estimate.
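
The minimum-variance property can be tested numerically. For an arbitrary gain W, the analysis-error covariance is A(W) = (I - WH) B (I - WH)^T + W R W^T (the Joseph form); with made-up B, R, and H, random perturbations of the Kalman gain never reduce the total analysis variance:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3
Bc = rng.standard_normal((n, n)); B = Bc @ Bc.T + np.eye(n)
Rc = rng.standard_normal((m, m)); R = Rc @ Rc.T + np.eye(m)
H = rng.standard_normal((m, n))

def trace_A(W):
    """Total analysis-error variance for an arbitrary gain W (Joseph form)."""
    ImWH = np.eye(n) - W @ H
    return np.trace(ImWH @ B @ ImWH.T + W @ R @ W.T)

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain

# any perturbation of the gain increases (or leaves unchanged) trace(A)
for _ in range(200):
    W = K + 0.1 * rng.standard_normal((n, m))
    assert trace_A(W) >= trace_A(K)
```

This is exactly the lecture's argument: trace(A) is quadratic in the perturbation with a positive-definite quadratic term, so the linear term must vanish at the minimum, which forces W = K.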

7 February 2005 (15): After revising problem 4 on HW #2 (state-space velocity cannot be reliably defined for Henon due to the fixed "time step"), I returned to our analysis of the maximum likelihood derivation of the analysis-update equation by deriving error properties for the analysis. I showed that if the errors in the prior and observation have an expected value of zero, then so does the analysis. I then derived the analysis error covariance matrix, A, for an arbitrary weight matrix. In the case where the weight matrix is given by the Kalman gain, we get simply A = (I - KH)B. Returning to the general expression for A, I started to derive the K that gives the minimum total analysis error variance.

4 February 2005 (14): I derived the "optimum interpolation" (OI) equation by determining the maximum likelihood state, the "analysis," given the prior and observations by taking the gradient of the GA equation and setting the result to zero. We linearized the gradient of the observation operator ("curly" H) in terms of a linear operator, H. The OI, or analysis-update, equation says that the analysis is determined by a linear combination of the prior and the "innovation," which represents the new information provided by observations (difference between the observation and model estimate of the observation). The weight matrix for this linear combination is given by the Kalman gain matrix, K, which is a function of B, R, and H.

2 February 2005 (13): We moved on to assume Gaussian statistics in the full probabilistic update equation. I introduced the observation operator, "curly H," which maps the model state into observation space. Curly H is, in general, nonlinear, and is distinguished from the linearized map, H. Observation error was broken down into two pieces, instrument error and representativeness error; assuming that they are independent, the observation-error pdf is the product of these constituent Gaussian pdfs, and their covariances add to give the observation error covariance matrix, R. These assumptions result in the "Gaussian Analysis" (GA) equation, and a fork in the analysis. One branch, using the "direct solve" approach, will give us an optimum interpolation formula and be a component of the Kalman filter. The second branch, using a "variational solve" approach, will give us 3DVAR. I began discussion of the first branch by reviewing the calculus of the gradient operator on vectors and matrices, including the chain rule.

31 January 2005 (12): Started data assimilation with "big picture" overview and definition of the problem. Plan is to cover sequential DA first, then distributed in time. Using Bayes' rule, I derived the fully general analysis update equation as a product of the observation likelihood function and the PDF of the prior. I discussed how to use Liouville's equation in connection with this update equation in a recursive algorithm. Plots for analytic update and Gaussian approximation were shown for a scalar having a bimodal prior.
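
The scalar bimodal-prior update can be reproduced on a grid with made-up parameters: multiply the prior pdf by the observation likelihood, normalize, and watch the posterior mass collapse onto the mode nearest the observation:

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
gauss = lambda z, mu, var: np.exp(-0.5*(z - mu)**2/var) / np.sqrt(2*np.pi*var)

prior = 0.5*gauss(x, -2.0, 0.5) + 0.5*gauss(x, 2.0, 0.5)  # bimodal prior (made up)
like = gauss(1.0, x, 1.0)          # likelihood of observing y = 1 given state x
post = prior * like                # Bayes' rule: posterior ~ likelihood * prior
post /= post.sum() * dx            # normalize on the grid

mass_right = post[x > 0].sum() * dx
assert abs(post.sum()*dx - 1.0) < 1e-10
assert mass_right > 0.9            # mass collapses onto the mode nearer to y
```

A single Gaussian fit to this posterior would sit between the modes, which is the pitfall of the Gaussian approximation shown in the lecture plots.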

28 January 2005 (11): Measures of error and skill scores. I covered continuous and dichotomous variables, and deterministic and probabilistic methods. These included contingency tables, RMS error, Brier skill score, and ensemble rank histograms.
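
Two of these diagnostics are easy to sketch with synthetic data (all forecasts and truths below are random draws, purely illustrative): the Brier score for a binary event, and the rank histogram, which is flat when the truth is statistically indistinguishable from the ensemble members:

```python
import numpy as np

rng = np.random.default_rng(5)

def brier(p, o):
    """Brier score for probabilistic forecasts p of a binary event o (0/1)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def rank_histogram(ensembles, truths):
    """Counts of the verifying value's rank within each sorted ensemble."""
    ranks = [np.searchsorted(np.sort(e), t) for e, t in zip(ensembles, truths)]
    return np.bincount(ranks, minlength=ensembles.shape[1] + 1)

# a perfect forecast beats the climatological base rate on the Brier score
o = rng.random(10_000) < 0.3
bs_clim = brier(np.full(o.size, 0.3), o)     # always forecast the base rate
bs_perfect = brier(o.astype(float), o)
assert bs_perfect < bs_clim

# a perfect ensemble yields a flat rank histogram
ens = rng.standard_normal((10_000, 9))       # 9-member ensembles
truth = rng.standard_normal(10_000)          # truth from the same distribution
counts = rank_histogram(ens, truth)
assert counts.min() > 0.8 * counts.mean()    # approximately flat
```

U-shaped rank histograms would instead indicate an under-dispersive ensemble; a dome shape, over-dispersion.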

26 January 2005 (10): Continued discussion of MC ensembles. (i) Ensemble construction was summarized for Gaussian statistics in terms of simple ways of populating an ensemble. (1) climatological mean and covariance; N(mu_c, B_c); we don't expect this to be very good. (2) a state-dependent mean (i.e. a state on the attractor) and climatological covariance; this may be good at times, but will have problems when the climo covariances are a poor approximation for the state-dependent covariances. (3) fully state dependent mean and covariance; this will be good, but not perfect. Gaussian statistics will draw states off the attractor. I then defined a perfect ensemble. Next I discussed error growth, and motivated that, on average, the ensemble mean should have smaller errors than any individual member. In fact the latter asymptotes to twice the error of the former; more on that next time. I finished with a discussion of spread-error correlation.
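
The factor-of-two claim can be checked by simulation. For a perfect ensemble, truth and members are exchangeable draws from the same pdf (here a unit Gaussian, for illustration), so the mean-squared error of a single member approaches twice that of the ensemble mean as the ensemble grows:

```python
import numpy as np

rng = np.random.default_rng(7)
N, trials = 50, 20_000
# perfect ensemble: truth and members are exchangeable draws from the same pdf
draws = rng.standard_normal((trials, N + 1))
truth, members = draws[:, 0], draws[:, 1:]

mse_member = np.mean((members[:, 0] - truth) ** 2)        # one random member
mse_mean = np.mean((members.mean(axis=1) - truth) ** 2)   # the ensemble mean

# member MSE -> 2 sigma^2, mean MSE -> sigma^2 (1 + 1/N), so the ratio -> 2
ratio = mse_member / mse_mean
assert 1.8 < ratio < 2.1
```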

24 January 2005 (9): I finished discussing the Liouville equation and its exact solution. For numerical calculation, we found that "reasonable" discretizations of the PDF give enormous calculations, O(10^6) for the 3D Lorenz attractor, and O(10^(10^6)) for modern weather and climate models; even Gaussian statistics are prohibitive, except for low-order models. This basic fact motivates the next aspect of ensemble prediction: Monte Carlo methods. Basically, with MC, we take random samples from p(x,t=0) and integrate each forward using a numerical model to estimate p(x,t). Clearly we need reasonable means of sampling the initial pdf, and we're often limited to Gaussian statistics due to relatively small ensemble sizes.
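
The MC recipe in three steps, sketched for Lorenz (1963) with invented initial statistics: sample p(x,0), integrate each member forward (RK4 here), and estimate the forecast moments from the ensemble:

```python
import numpy as np

def l63(X, s=10.0, r=28.0, b=8.0/3.0):
    """Lorenz (1963) tendencies for an array of states X with shape (N, 3)."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.stack([s*(y - x), x*(r - z) - y, x*y - b*z], axis=1)

def rk4_step(X, dt=0.01):
    k1 = l63(X)
    k2 = l63(X + 0.5*dt*k1)
    k3 = l63(X + 0.5*dt*k2)
    k4 = l63(X + dt*k3)
    return X + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

rng = np.random.default_rng(6)
mu0 = np.array([1.0, 1.0, 20.0])                 # made-up initial mean
ens = mu0 + 0.1*rng.standard_normal((500, 3))    # sample p(x,0): N(mu0, 0.01 I)

for _ in range(200):                             # integrate 2 time units forward
    ens = rk4_step(ens)

mean_t, cov_t = ens.mean(axis=0), np.cov(ens.T)  # MC estimates of p(x,t) moments
assert np.all(np.isfinite(ens))
assert np.trace(cov_t) > 0.05                    # chaotic divergence grows the spread
```

The 500-member ensemble stands in for the O(10^6)-point grid a direct Liouville discretization would need for this 3-variable model.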

21 January 2005 (8): Ensemble prediction. We've seen that, for chaotic systems, state-space trajectories diverge exponentially fast, so that a single deterministic forecast has the potential to mislead. This admission motivates solutions for probability density functions on the attractor. I discussed stochastic-dynamic prediction, and derived the Liouville equation (LE; a.k.a. probability continuity equation), which governs the evolution of PDFs on the attractor. The interpretation is the same as for mass continuity, except fluid velocity is replaced by state-space velocity. Probability density increases when state-space trajectories converge, and decreases when state-space trajectories diverge. Using the method of characteristics, I then gave the general solution of the LE, which shows that, for chaotic systems, state-space probability density decreases exponentially along characteristic curves (state-space trajectories).

19 January 2005 (7): Continued practical aspects of calculating Lyapunov exponents (LE) and vectors (LVs). (ii) orthogonalization and normalization. Since everything rotates in time toward the leading Lyapunov vector, and we need to average over long time periods, we need to orthogonalize our growing perturbations and normalize their size. For orthogonalization I discussed Gram-Schmidt (GS), SVD, and QR methods. First, order the vectors by size, so the biggest is the current best estimate of the leading LV; then apply a method. GS is straightforward, and retains the direction of the leading vector; it does suffer from error accumulation. SVD is robust, but does not retain the direction of the leading vector. QR is equally robust, and preserves the direction of the leading vector. So, use QR when a routine is available. For normalization I showed how to accumulate a sum of the log of the normalizing factor when computing LEs. Revision: Do not sort the vectors by length; this causes spurious results.
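
The QR recipe in full, for the Henon map (standard parameters a = 1.4, b = 0.3): evolve an orthonormal frame with the tangent-linear map, re-orthonormalize with QR, and accumulate the logs of the diagonal of R. Since the Jacobian determinant is constant (-b), the exponents must sum to log b exactly, which makes a sharp check; no sorting of the vectors is done, per the revision above:

```python
import numpy as np

a, b = 1.4, 0.3
henon = lambda x, y: (1.0 - a*x*x + y, b*x)
jac   = lambda x, y: np.array([[-2.0*a*x, 1.0],
                               [b,        0.0]])   # tangent-linear map

x, y = 0.1, 0.1
for _ in range(1000):                 # spin up onto the attractor
    x, y = henon(x, y)

Q = np.eye(2)
logsum = np.zeros(2)
nsteps = 20_000
for _ in range(nsteps):
    Q = jac(x, y) @ Q                 # evolve the perturbation frame
    Q, Rf = np.linalg.qr(Q)           # re-orthonormalize; growth on R's diagonal
    logsum += np.log(np.abs(np.diag(Rf)))
    x, y = henon(x, y)

lyap = logsum / nsteps                # Lyapunov exponents (per iteration, natural log)
assert lyap[0] > 0                    # chaos: positive leading exponent
assert abs(lyap.sum() - np.log(b)) < 1e-6   # exponents sum to log|det J| = log b
```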

14 January 2005 (6): Generalized stability of maps follows easily from flows. Practical aspects of calculating Lyapunov exponents and vectors. (i) tangent linear model is determined analytically (rare; HW), coded by hand, or by algorithmic differentiation. Simple alternative is complex-step derivative method.
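
The complex-step method is a two-liner: for real-analytic f, the derivative is Im(f(x + ih))/h, with no subtractive cancellation, so h can be taken absurdly small. A toy check against an analytic derivative (the test function is made up):

```python
import numpy as np

def complex_step(f, x, h=1e-20):
    """Derivative of real-analytic f at x via the complex-step method.

    No subtraction of nearly equal numbers, so h can be tiny and the
    result is accurate to machine precision."""
    return np.imag(f(x + 1j*h)) / h

f  = lambda x: np.exp(x) * np.sin(x)                 # toy test function
fp = lambda x: np.exp(x) * (np.sin(x) + np.cos(x))   # its exact derivative

x = 0.7
assert abs(complex_step(f, x) - fp(x)) < 1e-12 * abs(fp(x))

# by contrast, central differencing bottoms out near sqrt(machine epsilon)
h = 1e-9
fd = (f(x + h) - f(x - h)) / (2*h)
assert abs(complex_step(f, x) - fp(x)) < abs(fd - fp(x)) + 1e-15
```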

12 January 2005 (5): Instantaneous optimal modes as the logical t->0 limit of singular vectors; still have norm dependence. Summary of stability for linear autonomous systems. Linear time dependent stability (non-autonomous systems). Lyapunov exponents and vectors; Lyapunov dimension. Started tangent-linear time-dependent stability. Derived tangent linear model (Jacobian matrix); Change of Variables Theorem; equivalence to linear time dependent problem. HW#1 handed out.

10 January 2005 (4): Still on linear autonomous systems. If the major axes of the solution "error ellipse" are not determined by the leading eigenvector, then what determines them? Gave SVD diagonalization of M, initial SVs, V, from M^T M, and evolved SVs, U. Discussed orthogonality of initial and evolved SVs, amplification in terms of singular values, and dependence of the analysis on time interval, location on attractor (not for fixed point, but will come up later), and norm. Showed equivalence between norms, coordinate systems, and statistics of errors. Argued that norms and coordinates can be chosen arbitrarily, but the statistics of errors cannot; these statistics should be used to constrain the norm (covariance norm).
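
A small numpy example (the propagator is invented, chosen to be non-normal) makes the eigenvector/singular-vector distinction concrete: both eigenvalues lie inside the unit circle, so every mode decays asymptotically, yet the leading singular value exceeds one, so some initial errors amplify over the interval. The propagator maps each initial SV onto the corresponding evolved SV, stretched by the singular value:

```python
import numpy as np

# a non-normal propagator over some fixed interval
M = np.array([[0.5, 10.0],
              [0.0,  0.5]])

U, s, Vt = np.linalg.svd(M)       # M = U diag(s) V^T
V = Vt.T                          # columns of V: initial singular vectors
for i in range(2):                # M maps initial SVs to evolved SVs (columns of U)
    assert np.allclose(M @ V[:, i], s[i] * U[:, i])

eigvals = np.linalg.eigvals(M)
assert np.all(np.abs(eigvals) < 1)   # every eigenmode decays asymptotically...
assert s[0] > 1                      # ...yet transient growth: sigma_1 > 1

# initial and evolved SVs each form an orthonormal set
assert np.allclose(V.T @ V, np.eye(2))
assert np.allclose(U.T @ U, np.eye(2))
```

The large off-diagonal entry is what makes M non-normal; for a normal matrix the singular values coincide with the eigenvalue magnitudes and no such transient growth occurs.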

7 January 2005 (3): Began stability analysis. Divided up the material into hierarchy of increasing complexity: (i) linear time-independent systems, (ii) linear time-dependent systems, and (iii) nonlinear time-dependent systems. Will do flows first based on GFD relevance. Started (i) stability of "autonomous" system at a fixed point on the attractor. Defined system matrix, L, and propagator, M (a.k.a. resolvant, transfer matrix). Solution is set of ODEs with constant coefficients: integrating factor. The integrating factor yields a similarity transformation based on e-vecs and e-vals of either L or M. Although these vectors span the solution space, they are non-orthogonal.

5 January 2005 (2): Dynamical systems overview: state space, state vector, state space dimension, state space trajectory, state space volumes (conservative & non-conservative systems and dissipative systems), attractor, basin of attraction, system dimension, fractal, strange attractor, stable manifold, unstable manifold, chaos, ergodicity. I also introduced maps and flows, and examples of each that we will use: Lorenz (1963; flow), Ikeda (1979; map), and Henon (1976; map).

3 January 2005 (1): Review syllabus; course overview; DART; introduced dynamical systems.