Machine Learning Homework #3 Solution

1. (25 points) Let Z = {(x1, y1), . . . , (xn, yn)} be a given training set. We consider the following regularized logistic regression objective function:

 



$$ f(w) = \frac{1}{n}\sum_{i=1}^{n}\left[ -y_i w^T x_i + \log\!\left(1 + \exp(w^T x_i)\right) \right] + \frac{\lambda}{2}\,\|w\|^2, $$

 

where λ > 0 is a constant. Let w∗ be the global minimizer of the objective, and let ‖w∗‖₂ ≤ c for some constant c > 0.

 

(a) (10 points) Clearly show and explain the steps of the projected gradient descent algorithm for optimizing the regularized logistic regression objective function. The steps should include an exact expression for the gradient. (For reference, an illustrative sketch is given after part (d).)

(b) (5 points) Is the objective function strongly convex? Clearly explain your answer using the definition of strong convexity.

(c) (5 points) Is the objective function smooth? Clearly explain your answer using the definition of smoothness.

(d) (5 points) Let w_T be the iterate after T steps of the projected gradient descent algorithm. What is a bound on the difference f(w_T) − f(w∗)? Clearly explain all quantities in the bound.
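For reference only (not the required written answer), here is a minimal NumPy sketch of projected gradient descent for this objective, assuming labels y_i ∈ {0, 1}, the feasible set {w : ‖w‖₂ ≤ c}, and a fixed step size; the function name, step size, and iteration count are illustrative choices.

```python
import numpy as np

def projected_gd(X, y, lam, c, eta=0.1, T=1000):
    """Sketch of projected gradient descent for the regularized logistic objective above."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # sigma(w^T x_i) for all i
        grad = X.T @ (p - y) / n + lam * w       # exact gradient: (1/n) sum_i (sigma(w^T x_i) - y_i) x_i + lam * w
        w = w - eta * grad                       # gradient step
        norm = np.linalg.norm(w)
        if norm > c:                             # projection onto the feasible set {w : ||w||_2 <= c}
            w = w * (c / norm)
    return w
```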

 

2. (25 points) Let X = {x^1, . . . , x^N}, x^t ∈ R^d, be a set of N samples drawn i.i.d. from a mixture of k multivariate Gaussian distributions in R^d. For component G_i, i = 1, . . . , k, let π_i, µ_i, Σ_i respectively denote the prior probability, mean, and covariance of G_i. We will focus on the expectation maximization (EM) algorithm for learning the mixture model, in particular for estimating the parameters {(π_i, µ_i, Σ_i), i = 1, . . . , k} as well as the posterior probabilities h_i^t = p(G_i | x^t).

 

(a) (10 points) In your own words, describe the EM algorithm for a mixture of Gaussians, highlighting the two key steps (E- and M-), illustrating the methods used in the steps at a high level, and noting what information they need. (An illustrative sketch follows part (c).)



(b) (10 points) Assuming the posterior probabilities h_i^t are known, show the estimates of the component priors, means, and covariances π_i, µ_i, Σ_i, i = 1, . . . , k given by the M-step (you do not need to show how they are derived).

(c) (5 points) Assuming the component priors, means, and covariances π_i, µ_i, Σ_i, i = 1, . . . , k are known, show how the posterior probabilities h_i^t are computed in the E-step.
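For orientation only, here is a minimal sketch of one EM iteration for a Gaussian mixture, assuming NumPy arrays X of shape (N, d), weights pi of shape (k,), means mu of shape (k, d), and covariances Sigma of shape (k, d, d); the use of scipy.stats.multivariate_normal and all names here are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a k-component Gaussian mixture (illustrative sketch)."""
    N = X.shape[0]
    k = len(pi)
    # E-step: h[t, i] = p(G_i | x^t) by Bayes' rule, using the current parameters.
    h = np.zeros((N, k))
    for i in range(k):
        h[:, i] = pi[i] * multivariate_normal.pdf(X, mean=mu[i], cov=Sigma[i])
    h /= h.sum(axis=1, keepdims=True)
    # M-step: re-estimate priors, means, and covariances from the posteriors.
    Nk = h.sum(axis=0)                          # effective number of points per component
    pi_new = Nk / N
    mu_new = (h.T @ X) / Nk[:, None]
    Sigma_new = np.zeros_like(Sigma)
    for i in range(k):
        diff = X - mu_new[i]
        Sigma_new[i] = (h[:, i, None] * diff).T @ diff / Nk[i]
    return pi_new, mu_new, Sigma_new, h
```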

Programming assignments:

The next two problems involve programming. For Question 3, we will be using the 2-class classification datasets Boston50 and Boston75, and for Question 4, we will be using the 10-class classification dataset Digits, all of which were used in Homework 1. For Q3, we will develop code for 2-class logistic regression with only one set of parameters (w, w0). For Q4, we will develop code for k-class logistic regression with k sets of parameters (wi, wi0).

 

3. (25 points) We will develop code for 2-class logistic regression with one set of parameters (w, w0). Assuming the two classes are {C1, C2}, the posterior probability of class C1 is given by

$$ P(C_1 \mid x) = \frac{\exp(w^T x + w_0)}{1 + \exp(w^T x + w_0)}, $$

and P(C2 | x) = 1 − P(C1 | x).

We will develop code for MyLogisticReg2 with corresponding MyLogisticReg2.fit(X,y) and MyLogisticReg2.predict(X) functions. Parameters for the model can be initialized following suggestions in the textbook.
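For orientation, a minimal sketch of one possible shape for this class is shown below, assuming a plain gradient-descent fit of the logistic loss, labels in {0, 1}, and zero initialization; the constructor signature, step size, and iteration count are illustrative guesses, not requirements from the assignment or the textbook.

```python
import numpy as np

class MyLogisticReg2:
    """Sketch of 2-class logistic regression with parameters (w, w0)."""

    def __init__(self, d, eta=0.01, n_iter=1000):
        self.w = np.zeros(d)     # weight vector; zero initialization is an illustrative choice
        self.w0 = 0.0            # bias term
        self.eta = eta           # fixed step size (illustrative)
        self.n_iter = n_iter     # number of gradient steps (illustrative)

    def fit(self, X, y):
        n = X.shape[0]
        for _ in range(self.n_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.w0)))   # P(C1 | x) for every row of X
            self.w -= self.eta * (X.T @ (p - y)) / n             # gradient step on w
            self.w0 -= self.eta * np.mean(p - y)                 # gradient step on w0
        return self

    def predict(self, X):
        p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.w0)))
        return (p >= 0.5).astype(int)                            # label C1 when P(C1 | x) >= 0.5
```

Whether the dimensionality is passed to __init__ or inferred inside fit, and how the parameters are initialized, is up to your implementation.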

We will compare the performance of MyLogisticReg2 with LogisticRegression [1] on two datasets: Boston50 and Boston75. Using my_cross_val with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across all folds for the two methods, MyLogisticReg2 and LogisticRegression, applied to the two 2-class classification datasets: Boston50 and Boston75.

 

You will have to submit  (a) code and (b) summary of results:

 

(a) Code: You will have to submit code for MyLogisticReg2() as well as a wrapper code q3().

For MyLogisticReg2(), you are encouraged to consult the code for MultiGaussClassify() from HW2 (or code for classifiers in scikit-learn). You need to make sure you have __init__, fit, and predict implemented in MyLogisticReg2. Your class will NOT inherit any base class in sklearn.

The wrapper code (main file) has no input and is used to prepare the datasets and make calls to my_cross_val(method, X, y, k) to generate the error rate results for each dataset and each method. The code for my_cross_val(method, X, y, k) must be yours (e.g., code you wrote in HW1, with modifications as needed) and you cannot use cross_val_score() from sklearn. The results should be printed to the terminal (not written to an additional file in the folder). Make sure the calls to my_cross_val(method, X, y, k) are made in the following order, and add a print to the terminal before each call to show which method and dataset is being used:

1. MyLogisticReg2 with Boston50; 2. MyLogisticReg2 with Boston75; 3. LogisticRegression with Boston50; 4. LogisticRegression with Boston75.

*For the wrapper code, you need to make a q3.py file for it, and one should be able to run your code by calling "python q3.py" in a command line window.
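One possible shape for the q3.py wrapper is sketched below. The module names mylogisticreg2 and my_cross_val, the constructor call MyLogisticReg2(X.shape[1]), and the assumption that Boston50/Boston75 threshold the Boston housing target at its 50th/75th percentiles (as in HW1) are all illustrative; verify them against your own HW1/HW2 code.

```python
# q3.py -- illustrative sketch of the wrapper; dataset construction and module names are assumptions.
import numpy as np
from sklearn.datasets import load_boston            # removed in scikit-learn >= 1.2; shown for illustration
from sklearn.linear_model import LogisticRegression

from mylogisticreg2 import MyLogisticReg2           # hypothetical module name for your class
from my_cross_val import my_cross_val               # your HW1 routine (hypothetical module name)


def q3():
    boston = load_boston()
    X = boston.data
    # Assumed HW1 convention: label 1 when the target exceeds its 50th (resp. 75th) percentile.
    y50 = (boston.target >= np.percentile(boston.target, 50)).astype(int)
    y75 = (boston.target >= np.percentile(boston.target, 75)).astype(int)

    runs = [
        ("MyLogisticReg2", MyLogisticReg2(X.shape[1]), "Boston50", y50),
        ("MyLogisticReg2", MyLogisticReg2(X.shape[1]), "Boston75", y75),
        ("LogisticRegression", LogisticRegression(), "Boston50", y50),
        ("LogisticRegression", LogisticRegression(), "Boston75", y75),
    ]
    for name, method, dataset, y in runs:
        print("Error rates for %s with %s:" % (name, dataset))
        my_cross_val(method, X, y, 5)                # assumed to print per-fold error rates, mean, and std


if __name__ == "__main__":
    q3()
```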

 

[1] You should use LogisticRegression from scikit-learn, similar to HW1 and HW2.

(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (4 tables in total). Each column of the table represents a fold; add two columns at the end to show the overall mean error rate and standard deviation over the k folds.

 

4. (25 points) We will develop code for c-class logistic regression with c sets of parameters {(wi, wi0), i = 1, . . . , c}. Assuming the c classes are {C1, C2, . . . , Cc}, the posterior probability of class Ci is given by

$$ P(C_i \mid x) = \frac{\exp(w_i^T x + w_{i0})}{\sum_{i'=1}^{c} \exp(w_{i'}^T x + w_{i'0})}. $$

We will develop code for MyLogisticRegGen with corresponding MyLogisticRegGen.fit(X,y) and MyLogisticRegGen.predict(X) functions. Parameters for the model can be initialized following suggestions in the textbook.
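As a minimal sketch of the posterior computation only (the array names W and w0, with shapes (c, d) and (c,), are illustrative assumptions), the c-class posteriors above can be evaluated for all samples at once:

```python
import numpy as np

def softmax_posteriors(X, W, w0):
    """P(C_i | x) for every row of X, given W of shape (c, d) and w0 of shape (c,)."""
    scores = X @ W.T + w0                           # entry (t, i) holds w_i^T x^t + w_i0
    scores -= scores.max(axis=1, keepdims=True)     # subtract the row max for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)         # each row sums to 1 over the c classes
```

A predict(X) method could then return np.argmax of these posteriors along axis 1; a gradient-based fit would mirror the 2-class sketch in Question 3, with one pair (wi, wi0) per class.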

We will compare the performance of MyLogisticRegGen with LogisticRegression [2] on one dataset, Digits, for which the number of classes is c = 10. Using my_cross_val with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across all folds for the two methods, MyLogisticRegGen and LogisticRegression, applied to the 10-class classification dataset: Digits.

You will have to submit  (a) code and (b) summary of results:

 

(a) Code: You will have to submit code for MyLogisticRegGen() as well as a wrapper code q4().

For MyLogisticRegGen(), you are encouraged to consult the code for MultiGaussClassify() from HW2 (or code for classifiers in scikit-learn). You need to make sure you have __init__, fit, and predict implemented in MyLogisticRegGen. Your class will NOT inherit any base class in sklearn.

The wrapper code (main file) has no input and is used to prepare the datasets and make calls to my_cross_val(method, X, y, k) to generate the error rate results for each dataset and each method. The code for my_cross_val(method, X, y, k) must be yours (e.g., code you wrote in HW1, with modifications as needed) and you cannot use cross_val_score() from sklearn. The results should be printed to the terminal (not written to an additional file in the folder). Make sure the calls to my_cross_val(method, X, y, k) are made in the following order, and add a print to the terminal before each call to show which method and dataset is being used:

1. MyLogisticRegGen with Digits; 2. LogisticRegression with Digits.

*For the wrapper code, you need to create a q4.py file, and one should be able to run your code by calling "python q4.py" in a command line window.

(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (2 tables in total). Each column of the table represents a fold; add two columns at the end to show the overall mean error rate and standard deviation over the k folds.

[2] You should use LogisticRegression from scikit-learn, similar to HW1 and HW2.

 

Additional instructions: Code can only be written in Python (not IPython notebook); no other programming languages will be accepted. One should be able to execute all programs directly from the command prompt (e.g., "python q3.py") without the need to run a Python interactive shell first. Test your code yourself before submission and suppress any warning messages that may be printed. Your code must run on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). Please make sure you specify the full Python version you are using, as well as instructions on how to run your program, in the README file (which must be readable through a text editor such as Notepad). Information on the size of the datasets, including the number of data points and the dimensionality of features, as well as the number of classes, can be readily extracted from the datasets in scikit-learn. Each function must take its inputs in the order specified in the problem and display the output via the terminal or as specified.

For each part,  you can submit  additional files/functions (as needed)  which will be used by the main file. Please put  comments  in your code so that one can follow the key parts  and steps in your code.

Follow the rules strictly. If we  cannot run your code, you  will  not get any credit.

 

•  Things to submit

 

1. hw3.pdf: A document which contains the solutions to Problems 1, 2, 3 and 4, including the summary of results for 3 and 4. This document must be in PDF format (Word documents, photos, etc., are not accepted). If you submit a scanned copy of a hand-written document, make sure the copy is clearly readable; otherwise no credit may be given.

2.  Python code for Problems  3 and 4 (must  include the required  q3.py and q4.py).

3. README.txt: A README file that contains your name, student ID, email, instructions on how to run your code, the full Python version (e.g., Python 2.7) you are using, any assumptions you are making, and any other necessary details. The file must be readable by a text editor such as Notepad.

4.  Any other  files, except the data,  which are necessary for your code.
