DATA 604, HW 1
1. Download the USPS Handwritten Digits data set (usps all) from the website of
Prof. Roweis. Please produce a 2 by 5 image grid of images: the 1st digit one, the
2nd digit two, the 3rd digit three, ..., the 9th digit nine, and the 10th digit zero
from the set, arranged as:
1 2 3 4 5
6 7 8 9 0
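One possible way to assemble the 2 by 5 grid is sketched below. It assumes the data have been loaded into an array of shape (256, 1100, 10), ordered as (pixel, example, class), with classes 0 to 8 holding the digits one to nine and class 9 holding zero; this layout is an assumption about the file and should be checked after loading. The array here is a synthetic stand-in, and make_grid and example_index are hypothetical names used only for illustration.

```python
import numpy as np

def make_grid(images, rows=2, cols=5):
    """Tile rows*cols equally sized 2-D images into a single 2-D array."""
    images = np.asarray(images)
    n, h, w = images.shape
    if n != rows * cols:
        raise ValueError("expected exactly rows * cols images")
    # (rows, cols, h, w) -> (rows, h, cols, w) -> (rows*h, cols*w)
    return (images.reshape(rows, cols, h, w)
                  .transpose(0, 2, 1, 3)
                  .reshape(rows * h, cols * w))

# Synthetic stand-in for the real data; assumed layout (pixel, example, class).
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(256, 1100, 10), dtype=np.uint8)

# Hypothetical choice of which example to show for each class.
example_index = [0] * 10

# Note: data exported from MATLAB may need a transpose after reshaping
# (for example .reshape(16, 16).T) to display upright; this is an assumption.
picked = np.stack([data[:, example_index[k], k].reshape(16, 16)
                   for k in range(10)])
grid = make_grid(picked)  # shape (32, 80): two rows of five 16 x 16 images
```

The resulting array can be passed to any image display routine; keeping the example index as a parameter leaves open which sample of each digit is shown.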
2. From now on, we think of the pixel values as random variables. Find the sample
mean for each set of 1100 digit examples. Please produce a 2 by 5 image grid of these
averages (reshaped as 16 x 16 images), as before. (Definition of what the sample
mean is can also be found at
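As a minimal sketch of the per-class sample mean, assuming the same (pixel, example, class) layout of shape (256, 1100, 10), the average can be taken along the example axis. The array below is synthetic and the variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in; assumed layout (pixel, example, class).
rng = np.random.default_rng(1)
data = rng.integers(0, 256, size=(256, 1100, 10)).astype(np.float64)

# Average over the 1100 examples of each class: one 256-vector per class.
class_means = data.mean(axis=1)                    # shape (256, 10)

# Reshape each class mean into a 16 x 16 image for the 2 by 5 grid.
mean_images = class_means.T.reshape(10, 16, 16)    # shape (10, 16, 16)
```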
3. Find the sample mean for all 10 full (1100 samples) sets of handwritten digit
examples. This would be the average handwritten digit from our collection. Reshape
as 16 x 16 and include this overall average image.
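The overall average might be computed by pooling all 11000 examples, as in the sketch below. It again assumes a (256, 1100, 10) array and uses synthetic values. Because every class has the same number of examples, the pooled mean coincides with the mean of the ten class means, which gives a simple consistency check.

```python
import numpy as np

# Synthetic stand-in; assumed layout (pixel, example, class).
rng = np.random.default_rng(2)
data = rng.integers(0, 256, size=(256, 1100, 10)).astype(np.float64)

# Pool all classes: 256 pixels by 11000 examples.
pooled = data.reshape(256, -1)
overall_mean = pooled.mean(axis=1)           # shape (256,)
overall_image = overall_mean.reshape(16, 16)

# Equal class sizes, so the mean of the class means is the same vector.
mean_of_class_means = data.mean(axis=1).mean(axis=1)
```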
4. Find the sample covariance matrices for each of the 10 sets of 1100 digit
examples. Please display these (256 x 256) covariance matrices as images in a 2 by 5
grid, as before.
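A sketch of the covariance computation follows, under the same assumed (256, 1100, 10) layout and with synthetic data. It uses the unbiased 1/(n - 1) normalization, which is also the default of np.cov; if the course defines the sample covariance with 1/n instead, the divisor should be changed accordingly. The helper name class_covariance is hypothetical.

```python
import numpy as np

# Synthetic stand-in; assumed layout (pixel, example, class).
rng = np.random.default_rng(3)
data = rng.integers(0, 256, size=(256, 1100, 10)).astype(np.float64)

def class_covariance(X):
    """Sample covariance of X with shape (pixels, examples), 1/(n-1) scaling."""
    centered = X - X.mean(axis=1, keepdims=True)
    return centered @ centered.T / (X.shape[1] - 1)

# One 256 x 256 matrix per class.
covs = np.stack([class_covariance(data[:, :, k]) for k in range(10)])
```

Each covs[k] can be shown as a 256 x 256 image, so the same 2 by 5 tiling used for the digits applies here with larger tiles.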
5. Now that you have calculated sample means and sample covariance matrices,
define formally the multivariate normal distribution modeling each single digit
class. Estimate the computational complexity of computing these distributions
in practice in terms of the parameters associated with the USPS data set
(such as the number of classes, the number of training points, or the size of an individual image).
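For orientation, one common way to write such a model is sketched below. The notation is an assumption and not taken from the assignment: K is the number of classes, n the number of examples per class, d the number of pixels per image, and x_i^{(k)} the i-th example of class k. The 1/(n - 1) scaling is one convention; 1/n is also used.

```latex
\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} x_i^{(k)},
\qquad
\hat{\Sigma}_k = \frac{1}{n-1} \sum_{i=1}^{n}
  \bigl(x_i^{(k)} - \hat{\mu}_k\bigr)\bigl(x_i^{(k)} - \hat{\mu}_k\bigr)^{\top}

p(x \mid k) = (2\pi)^{-d/2} \, \det\bigl(\hat{\Sigma}_k\bigr)^{-1/2}
  \exp\Bigl( -\tfrac{1}{2} \, (x - \hat{\mu}_k)^{\top} \hat{\Sigma}_k^{-1} (x - \hat{\mu}_k) \Bigr)
```

Under this notation, a rough operation count is O(K n d) for the means, O(K n d^2) for the covariance matrices, and O(K d^3) for factoring each covariance (for example by Cholesky) to obtain the determinant and inverse, so O(K n d^2 + K d^3) overall. For the USPS set this would mean K = 10, n = 1100, d = 256. The density above requires each covariance matrix to be invertible; a sample covariance may be singular, for instance if some pixel is constant within a class, in which case some form of regularization would be needed.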