Starting from:

$25

CS 434 assignment 1-  Linear regression Solved


Data You will use the Boston Housing dataset of the housing prices in Boston suburbs. The goal is to predict the median value of housing of an area (in thousands) based on 13 attributes describing the area (e.g., crime rate, accessibility etc). The file housing desc.txt describes the data. Data is divided into two sets: (1) a training set housing train.csv for learning the model, and (2) a testing set housing test.csv for evaluating the performance of the learned model. Your task is to implement linear regression and explore some variations with it on this data.

1.     Load the training data into the corresponding X and Y matrices, where X stores the features and Y stores the desired outputs. The rows of X and Y correspond to the examples and the columns of X correspond to the features. Introduce the dummy variable to X by adding an extra column of ones to X (You can make this extra column to be the first column. Changing the position of the added column will only change the order of the learned weight and does not matter in practice). Compute the optimal weight vector w using w = (XTX)−1XTY . Feel free to use existing numerical packages (e.g., numpy) to perform the computation. Report the learned weight vector.

2.   Apply the learned model to make predictions for the training and testing data respectively and compute for each case the average squared error(ASE), defined by 1 , which is the sum of squared error normalized by n, the total number of examples in the data. Report the training and testing ASEs respectively. Which one is larger? Is it consistent with your expectation?

Write your code so that you get the results for questions 1 and 2 using the following command: python q1 2.py housing train.csv housing test.csv The output should include:

•    the learned weight vector

•    ASE over the training data

•    ASE over the testing data

3.     Remove the dummy variable (the column of ones) from X, repeat 1 and 2. How does this change influence the ASE on the training and testing data? Provide an explanation for this influence.

Write your code so that you get the results for question 3 using the following command: python q1 3.py housing train.csv housing test.csv The output should include:

•    the learned weight vector

•    ASE over the training data

•    ASE over the testing data

4.    Modify the data by adding additional random features. You will do this to both training and testing data. In particular, generate 20 random features by sampling from a standard normal distribution. Incrementally add the generated random features to your data, 2 at a time. So we will create 20 new train/test datasets, each with d of random features, where d = 2,4,...,20. For each version, learn the optimal linear regression model (i.e., the optimal weight vector) and compute its resulting training and testing ASEs. Plot the training and testing ASEs as a function of d. What trends do you observe for training and testing ASEs respectively? In general, how do you expect adding more features to influence the training ASE? How about testing ASE? Why?

Write your code so that you get the results for question 4 using the following command: python q1 4.py housing train.csv housing test.csv The output should include:

•    plot of the training ASE (y-axis) as a function of d (x-axis)

•    plot of the testing ASE (y-axis) as a function of d (x-axis)

2           Logistic regression with regularization (to come)
Data. For this part you will work with the USPS handwritten digit dataset and implement the logistic regression classifier to differentiate digit 4 from digit 9. Each example is an image of digit 4 or 9, with 16 by 16 pixels. Treating the gray-scale value of each pixel as a feature (between 0 and 255), each example has 162 = 256 features. For each class, we have 700 training samples and 400 testing samples. For this assignment, we have injected some small amount of salt and pepper noise to the image. You can view the original images collectively at http://www.cs.nyu.edu/~roweis/data/usps_4.jpg, andhttp://www.cs. nyu.edu/~roweis/data/usps_9.jpg The data is in the csv format and each row corresponds to a handwritten digit (the first 256 columns) and its label (last column, 0 for digit 4 and 1 for digit 9).

1.    Implement the batch gradient descent algorithm to train a binary logistic regression classifier. The behavior of Gradient descent can be strongly influenced by the learning rate. Experiment with different learning rates, report your observation on the convergence behavior of the gradient descent algorithm. For your implementation, you will need to decide a stopping condition. You might use a fixed number of iterations, the change of the objective value (when it ceases to be significant) or the norm of the gradient (when it is smaller than a small threshold). Note, if you observe an overflow, then your learning rate is too big, so you need to try smaller (e.g., divide by 2 or 10) learning rates. Once you identify a suitable learning rate, rerun the training of the model from the beginning. For each gradient descent iteration, plot the training accuracy and the testing accuracy of your model as a function of the number of gradient descent iterations. What trend do you observe? Write your code so that you get the results for question 1 using the following command: python q2 1.py usps train.csv usps test.csv learningrate The output should include:

•    plot of the learning curve: training accuracy (y-axis) as a function of the number of gradient descent iterations (x-axis)

•    plot of the learning curve: testing accuracy (y-axis) as a function of the number of gradient descent iterations (x-axis)

2.     Logistic regression is typically used with regularization. Here we will explore L2 regularization, which adds to the logistic regression objective an additional regularization term of the squared Euclidean norm of the weight vector.



where the loss function l is the same as introduced in class. Find the gradient for this objective function and modify the batch gradient descent algorithm with this new gradient. Provide the pseudo code for your modified algorithm.

3.     Implement your derived algorithm, and experiment with different λ values (e.g., 10

Report the training and testing accuracies (i.e., the percentage of correct predictions) achieved by the weight vectors learned with different λ values. Discuss your results in terms of the relationship between training/testing performance and the λ values. Write your code so that you get the results for question 3 using the following command: python q2 3.py usps train.csv usps test.csv lambdas

where lambdas contains the list of λ values to be tested. The output should include:

•    plot of the training accuracy (y-axis) as a function of the λ value (x-axis)

•    plot of the testing accuracy (y-axis) as a function of the λ value (x-axis)

Remark 1 For logistic regression, it would be a good idea to normalize the features to the range of [0,1]. This will makes it easier to find a proper learning rate. You can find information about feature normalization at https://en.wikipedia.org/wiki/Feature_scaling)

More products