# Regression _ SOLUTION

Overview In this assignment you will explore regression techniques on high-dimensional data. You will use a few data sets for this assignment:

These data sets are in matrix format and can be loaded into MATLAB or OCTAVE. By calling load filename (for instance load X.dat) it will put in memory the data in the ﬁle, for instance in the above example the matrix X. You can then display this matrix by typing X As usual, it is highly recommended that you use LaTeX for this assignment. If you do not, you may lose points if your assignment is difﬁcult to read or hard to follow. Find a sample form in this directory:

1 Singular Value Decomposition (20 points) First we will compute the SVD of the matrix A we have loaded [U,S,V] = svd(A) Then take the top k components of A for values of k = 1 through k = 10 using Uk = U(:,1:k) Sk = S(1:k,1:k) Vk = V(:,1:k) Ak = Uk*Sk*Vk’ A (10 points): Compute and report the L2 norm of the difference between A and Ak for each value of k using norm(A-Ak,2) B (5 points): Find the smallest value k so that the L2 norm of A-Ak is less than 10% that of A; k might or might not be larger than 10.

2 Frequent Directions (40 points) Use the stub ﬁle FD.m to create a function for the Frequent Directions algorithm (Algorithm 15.2.1). We will consider running this code on matrix A. A (15 points): We can measure the error maxkxk=1|kAxk2 −kBxk2|as norm(A’*A - B’*B, 2). How large does l need to be for the above error to be at most kAk2 F/10? How does this compare to the theoretical bound (e.g. for k = 0). Note: you can calculatek Ak2 F as norm(A, ’fro’)ˆ2. B (25 points): Frequent Directions should also satisfy another bound based on its Frobenius norm. We can compute AΠBk using Bk = B(1:k,:) and then calculating A * pinv(Bk) * Bk. How large does l need to be to achieve kA−AΠBkk2 F ≤ 1.1·kA−Akk2 F; foreachofthe 7 valuesofvalue k intheset{1,2,3,4,5,6,7}. Answer both by running your algorithm and reporting the theoretical bound provided in the notes. (e.g.,youshouldreport 7 pairs of values,an empirical and theoretical bound for each value k) 3 Linear Regression (40 points) We will ﬁnd coefﬁcients C (was a1,...,ad in notes, but changed to avoid confusion with matrix A in Q1) to estimate X*C ≈ Y, using the provided datasets X and Y. We will compare two approaches least squares and ridge regression. Least Squares: Set C = inverse(X’ * X)*X’*Y Ridge Regression: Set Cs = inverse(X’*X + sˆ2*eye(12))*X’*Y A (20 points): Solve for the coefﬁcients C (or Cs) using Least Squares and Ridge Regression with s = {0.1,0.3,0.5,1.0,2.0}(i.e. s will take on one of those 5 values each time you try, say obtaining C05 for s = 0.5). For each set of coefﬁcients, report the error in the estimate ˆ Y of Y as norm(Y - X*C,2). B (20 points): Create three row-subsets of X and Y • X1 = X(1:66,:) and Y1 = Y(1:66) • X2 = X(34:100,:) and Y2 = Y(34:100) • X3 = [X(1:33,:); X(67:100,:)] and Y3 = [Y(1:33); Y(67:100)] Repeat the above procedure on these subsets and cross-validate the solution on the remainder of X and Y. Speciﬁcally, learn the coefﬁcients C using, say, X1 and Y1 and then measure norm(Y(67:100) X(67:100,:)*C,2). Which approach works best (averaging the results from the three subsets): Least Squares, or for which value of s using Ridge Regression? 4 BONUS (3 points) Consider a linear equationW = M*Swhere Mis a measurement matrix ﬁlled with random values {−1,0,+1} (although now that they are there, they are no longer random), and W is the output of the sparse signal S when measured by M. Use Orthogonal Matching Pursuit (as described in the notes) to recover the non-zero entries from S. Record the order in which you ﬁnd each entry and the residual vector after each step.