Computational Methods for Data Analysis

**Winter 2014**

http://www.atmos.washington.edu/~breth/classes/AM582/

MWF 8:30-9:20: Loew 216

This class is being offered on-line through EDGE, which provides live streaming of each class and archived lecture videos. You'll need to sign in with your UW NetID. If you are an EDGE student, you can also reach this site through the UWEO Moodle portal, moodle.extn.washington.edu. Registration in the virtual section 582C is equivalent to registration in the on-campus 'A' section, and you are welcome to come to the classroom to hear the lectures live (we will have enough seats, at least after the first lecture or two). Assignments and grading for the 482 sections will be identical to those for the 582 sections.

A Canvas page will be used for announcements, discussions, homework submission and grading.

Instructor: Prof. Chris Bretherton, breth@washington.edu, ATG 704, x5-7414. Office hours: MF 1:30-2:20, or by appointment.

TAs: Xing Fu (xingf@uw.edu) and Meghana Velegar (mvelegar@uw.edu). Office hours: Tu 12:30-1:30 (LEW 128) and Fr 12:30-1:30 (LEW 129); also via Canvas teleconferencing.


Exploratory and objective data analysis methods applied to the physical, engineering, and biological sciences. Statistics, including normal distributions, confidence intervals, linear regression. Fourier spectral analysis and filtering for time series, wavelet analysis, image processing and compression, principal component analysis, cluster analysis, Kalman filter.

Useful references to supplement lecture notes:

- D. L. Hartmann, Atm S 552 (Objective Analysis) lecture notes. (freely available at http://www.atmos.washington.edu/~dennis/, follow the Atm S 552 link, then Class Notes link)
- Kutz, J. N., 2013: *Data-Driven Modeling & Scientific Computation*. Oxford University Press, 608 pp.
- von Storch, H., and F. W. Zwiers, 1999: *Statistical Analysis in Climate Research*. Cambridge University Press, 484 pp.

Statistics potpourri (6 lectures)

- Probability distributions (Lecture 1, Jan. 6)
- Expectation, mean and variance (Lecture 2, Jan. 10). Figure: 1000 samples of unit normal PDF.
- Central limit theorem; confidence intervals. (Lecture 3, Jan. 10 makeup). Figures: Comparison of trials of the mean of 20 samples of a Bernoulli distribution with Central Limit Theorem prediction; Graph of the normal distribution; Serially correlated data with several lag-1 autocorrelations r.
- Statistical hypothesis testing (Lecture 4, Jan. 13)
- Linear regression (Lecture 5, Jan. 15). Figure: Different manifestations of correlation coefficient 0.7. Matlab: regression_example
- Multiple linear regression (Lecture 6, Jan. 17). Matlab: regression_example (cont.)

Fourier spectral analysis and filtering (9 lectures)

- Complex Fourier series and the DFT (Lecture 7, Jan. 22)
- Properties of the DFT (Lecture 8, Jan. 24)
- DFT data analysis, power spectrum (Lecture 9, Jan. 27). Matlab: DFT_gauss, nino1, nino2
- Lagged autocovariance and autocorrelation (Lecture 10, Jan. 29). Matlab: nino2 (cont.)
- White and red noise (Lecture 11, Jan. 31). Matlab: rednoise.m, nino2 (cont.)
- Windowed spectral analysis (Lecture 12, Feb. 3)
- Application to El Nino dataset, including Matlab SP toolbox spectral analysis functions (Lecture 13, Feb. 5). Matlab: nino3
- Application to music (time-frequency analysis using spectrograms) (Lecture 14, Feb. 7). Matlab: music
- Spectral filtering (Lecture 15, Feb. 7 makeup and Feb. 12). Matlab: music2, runningmean
- Filter design and Butterworth filters (Lecture 16, Feb. 12). Matlab: butterworth, music2

Wavelet analysis and image compression (3 lectures)

- Single-level wavelet analysis (Lecture 17, Feb. 14). Matlab: wavelet_leleccum
- Multiresolution wavelet analysis (Lecture 18, Feb. 19). Figures: Haar 1-level and 3-level wavelet filter response; Wavelets and scaling vector for 3-level Haar transform. Matlab: wavelet_leleccum
- Wavelet compression of time series and images (Lecture 19, Feb. 20). Matlab: wavelet_leleccum; wavelet_image; wavelet_leleccum_cwt

Principal component and cluster analysis for dimensionality reduction (6 lectures)

- Singular value decomposition (Lecture 20, Feb. 24). Matlab: PCA_SSTA
- Principal component analysis (Lecture 21, Feb. 26) Figure: Scatterplot of highly correlated bivariate data (from Hartmann Ch. 4).
- PCA: Implementation (Lecture 22, Feb. 28 and Mar. 3). Matlab: PCA_SSTA, PCA_cities
- K-means cluster analysis (Lecture 23, Mar. 3). Matlab: cluster_cities
- Pattern recognition (Lecture 24, Mar. 5). Matlab: classify_one_two

Data assimilation and model-data fusion (3 lectures)

- Sequential state estimation (Kalman filter) for a simple 1D system (Lecture 25, Mar. 7). Matlab: sequential_estimation_simple1
- Theory of Kalman filtering (Lecture 26, Mar. 10)
- Kalman filter for a multivariate ball-tracking problem + course Q&A (Lecture 27, Mar. 12) Matlab: kalman2

- Homework (50%), posted to the class web page and assigned on a quasi-biweekly basis. Homework is heavily oriented toward problem solving and exploratory data analysis in Matlab based on methods discussed in lectures. Consultation with your fellow students is encouraged, but everyone needs to write out homework assignments in their own words and write their own Matlab scripts. The class Canvas page has Discussions set up for homework and other class issues. Please use the Canvas Assignments tab to submit homework, and include supporting Matlab scripts. Late homework (after 11:59 pm on the due date) will be accepted until 5 pm on the second day after the due date. There will be a late penalty of 25% on all but the first late assignment. Homework solutions will be posted on the class web page.
- Midterm assignment (20%) and final assignment (30%). Like regular homework, but with no consultation with anyone, and no late final assignments will be accepted. The final assignment will be posted on Wed 12 Mar (penultimate class) and is due by 5 pm on Wed 19 Mar (in finals week).

No class:

- We 8 Jan (Instructor travel)
- Mo 20 Jan: MLK Day (UW holiday)
- Mo 10 Feb (Instructor travel)
- Mo 17 Feb: Presidents Day (UW holiday)
- Fr 21 Feb (Instructor family day)
- Fr 14 Mar (We're done...ponder your final instead!)

Makeup classes for instructor travel days (please attend if you want to watch live; otherwise watch the on-line video before the following class)

- Mo 6 Jan 2:30 pm (Loew 202)
- Fr 7 Feb. 3:30 pm (Loew 202)
- Th 20 Feb. 8:30 am (Loew 216)

| Item | Due Date | Download Solutions |
|---|---|---|
| Homework #1; uses hw1_dat.mat | due We 22 Jan | HW #1 solution |
| Homework #2; uses snow.mat | due Fr 31 Jan | HW #2 solution |
| Homework #3; uses SP500.dat | due Fr 7 Feb | HW #3 solution |
| Homework #4 (Midterm assignment; no collaboration) | due Fr 14 Feb | HW #4 solution |
| Homework #5; uses raymo.mat | due Fr 21 Feb | HW #5 solution |
| Homework #6; uses person.jpg | due Fr 28 Feb | HW #6 solution |
| Homework #7; uses USTA.mat | due Mo 10 Mar | HW #7 solution |
| Final assignment; uses x_y_train.png and x_y_test.png. If you get stuck on problem 1, ScaledLetter.mat provides two 4D arrays containing 8x8 rescaled images for each of the 8x5 letters in the training and testing datasets, which you can use for problem 2. | due We 19 Mar 5 pm, no collaboration, no late submissions | Final solutions |

For the scripts below with a .html link, I've used Matlab's publish capability to make self-documenting web page (html) versions of the scripts. To extract the original Matlab script from the web page, copy the URL (web address) and, in Matlab, type grabcode('URL') to bring up an untitled file with the code in an editor window. For example, to get the file foo.m that generated the web page foo.html below, type grabcode('http://www.atmos.washington.edu/~breth/classes/AM582/matlab/html/foo.html') and save the result as foo.m in the Matlab editor.

regression_example.html: Simple and multiple linear regression on a dataset of car MPG vs. weight and horsepower.

fft_hw1.m: DFT of HW1 dataset. Makes plot of the amplitudes of the complex-valued DFT components.

DFT_gauss.html: Power spectra of a Gaussian and a half-Gaussian.

nino1.html: Uses 1950-2012 monthly Nino3.4 sea-surface temperature dataset nino.mat. Plots the SST, its power spectrum, and the same for the SST anomaly after the mean and first three harmonics of the annual cycle of SST are removed.

nino2.html: Uses 1950-2012 monthly Nino3.4 sea-surface temperature anomaly dataset SSTA.mat optionally made by nino1.m. Plots subannual SSTA power spectrum with red noise fit, and plots of its autocovariance and autocorrelation sequences.

rednoise.m: Function to generate a sample of standardized Gaussian red noise with a given lag-1 covariance r.
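As a rough illustration of what such a generator does, here is a minimal sketch in Python (not the course's Matlab function rednoise.m). It assumes the usual first-order autoregressive construction x_n = r x_{n-1} + sqrt(1 - r^2) e_n with unit-variance Gaussian innovations e_n, which keeps the series at unit variance. The names `red_noise` and `lag1_autocorrelation` are hypothetical and chosen only for this example.

```python
import math
import random


def red_noise(n, r, seed=None):
    """Return n samples of unit-variance Gaussian AR(1) ("red") noise.

    Assumed model: x[k] = r * x[k-1] + sqrt(1 - r**2) * e[k],
    with e[k] independent standard normal innovations.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("lag-1 autocorrelation r must satisfy |r| < 1")
    if n <= 0:
        return []
    rng = random.Random(seed)
    innovation_scale = math.sqrt(1.0 - r * r)
    # Draw the first sample from the stationary (unit normal) distribution.
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(r * x[-1] + innovation_scale * rng.gauss(0.0, 1.0))
    return x


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    cov1 = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    var = sum(d * d for d in dev)
    return cov1 / var
```

For a long sample, `lag1_autocorrelation(red_noise(n, r))` should come out close to the requested r, which is a quick sanity check on the construction.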

nino3.html: Uses monthly Nino3.4 sea-surface temperature anomaly (SSTA) time series derived by nino1.m, given in SSTA.mat. Plots windowed power spectrum of SSTA, using 20-year overlapping Hann (cosine-taper) windows, both directly and using Matlab signal-processing toolbox function pwelch, and compares to red noise fit.

music.html: Plays a short segment of Handel's Messiah and performs windowed, tapered power spectral analysis on it. A copy of the score suggests that the initial notes of the segment are D, A and F, which are also visible in our analysis (which further suggests they are played slightly flat).

music2.html: Low-, band- and high-pass filtering of a short segment of Handel's Messiah with Fourier and Butterworth filters.

runningmean.html: Power spectrum of a running mean filter using the DFT.

butterworth.html: Calculation of an Nth order Butterworth filter and plots showing its properties.

wavelet_leleccum_notoolbox.html: Application of single and multilevel Haar wavelet transform to an electricity consumption dataset. Uses leleccum.mat (included in Matlab wavelet toolbox) and my functions dwtHaar.m and idwtHaar.m in place of the Matlab wavelet toolbox, for portability. See wavelet_leleccum.html for an equivalent script from the wavelet toolbox documentation that uses the toolbox functions instead.
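For readers unfamiliar with the transform, a single-level Haar analysis and its inverse can be sketched in a few lines of Python. This is not the code of dwtHaar.m or idwtHaar.m; it assumes an even-length input, the orthonormal 1/sqrt(2) scaling, and the sign convention detail = (first - second)/sqrt(2), which may differ from other implementations. The function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(x):
    """Single-level Haar transform: return (approximation, detail) coefficients."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    s = math.sqrt(2.0)
    # Scaled sums of adjacent pairs capture the smooth part of the signal.
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    # Scaled differences of adjacent pairs capture the fine-scale part.
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, rebuilding the original sequence."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x
```

Applying `haar_dwt` again to the approximation coefficients gives the next level of a multilevel transform, and setting small detail coefficients to zero before calling `haar_idwt` is the basic idea behind wavelet compression.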

wavelet_leleccum_cwt.html: Continuous Haar wavelet transform on the electricity consumption dataset. Requires Matlab wavelet toolbox function cwt.

wavelet_image.html: Image compression example using 2D multilevel Haar wavelet transform. Uses ngc6543a.jpg and my functions dwt2Haar.m and idwt2Haar.m in place of the Matlab wavelet toolbox, for portability.

PCA_SSTA.html: Application of PCA to gridded tropical Pacific sea-surface temperature dataset. Uses SSTPac.mat.

PCA_cities.html: Application of PCA to multiparameter dataset of indices for 9 categories contributing to quality of life in 329 U.S. cities. Uses cities.mat, which is also already included in the Matlab Statistics toolbox.

cluster_cities.html: K-means cluster analysis of cities dataset. Uses cities.mat.

classify_one_two.html: Classify spoken 'one's and 'two's using DWT power spectral analysis and PCA. Uses dwtcolHaar.m, ones.m4a, twos.m4a, and ones-twos.m4a. If your Matlab doesn't have audioread or equivalent, load the following .mat files instead: ones.mat, twos.mat, ones-twos.mat; these also include the 20 sounds being spoken in each file.

sequential_estimation_simple1D.html: Sequential state estimation (Kalman filter) on the simple 1D system x_n = ax_{n-1}.
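The forecast/update cycle for a scalar system of this kind can be sketched as follows in Python. This is not the course script; it assumes that the state is observed directly, with additive observation noise of variance `r_obs`, and allows an optional process noise of variance `q` in the dynamics x_n = a x_{n-1}. All names (`kalman_1d`, `q`, `r_obs`, `x0`, `p0`) are hypothetical and chosen for this example only.

```python
def kalman_1d(observations, a, q, r_obs, x0, p0):
    """Scalar Kalman filter for x[n] = a * x[n-1] (+ process noise of variance q).

    Each observation is assumed to be y[n] = x[n] + noise of variance r_obs.
    Returns the analysis estimates and their error variances after each update.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for y in observations:
        # Forecast step: advance the state estimate and its error variance.
        x = a * x
        p = a * a * p + q
        # Update step: blend forecast and observation using the Kalman gain.
        gain = p / (p + r_obs)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append(x)
        variances.append(p)
    return estimates, variances
```

With `a = 1`, `q = 0` and equal prior and observation variances, the first update simply averages the prior guess and the first observation, and the error variance halves; later observations receive progressively smaller weight.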

kalman2.html: Kalman filter on a multivariate ball-tracking problem.