EE526-Homework 2 Solved

Problem 1. Prove that the softplus function

\[
f(z) = \log(1 + e^{z}) \tag{1}
\]

is convex in z by showing that its second derivative is positive for all z.
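The proof itself is the analytic differentiation asked for above; as a hedged side check (my own sketch, not part of the assignment), finite differences confirm that f''(z) = σ(z)(1 − σ(z)) is positive on a grid of z values:

```python
import numpy as np

def softplus(z):
    # log(1 + e^z), computed in a numerically stable way
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 201)
h = 1e-3  # central-difference step

# Numerical second derivative of softplus.
f2_numeric = (softplus(z + h) - 2.0 * softplus(z) + softplus(z - h)) / h**2

# Closed form from differentiating twice: f''(z) = sigmoid(z) * (1 - sigmoid(z)),
# strictly positive because 0 < sigmoid(z) < 1 for every z.
f2_analytic = sigmoid(z) * (1.0 - sigmoid(z))

assert np.all(f2_numeric > 0.0)
assert np.allclose(f2_numeric, f2_analytic, atol=1e-6)
print("second derivative positive everywhere on the grid")
```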

Problem 2. Let z denote a vector z = [z_1, z_2, ..., z_n]^T. Let p = [p_1, p_2, ..., p_n]^T, such that

\[
p_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad i = 1, \dots, n. \tag{2}
\]

That is, p is the output when the softmax function is applied to z. Derive the Jacobian matrix

\[
\frac{\partial p}{\partial z} = \left[ \frac{\partial p_i}{\partial z_j} \right]_{i,j = 1,\dots,n}. \tag{3}
\]
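For reference, the derivation should land on ∂p_i/∂z_j = p_i(δ_ij − p_j), i.e. the matrix diag(p) − p p^T; the hedged sketch below (my own code, not part of the handout) verifies that form against finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def softmax_jacobian(z):
    # Analytic form: dp_i/dz_j = p_i * (delta_ij - p_j)  <=>  diag(p) - p p^T
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
z = rng.normal(size=5)

# Column-by-column central finite differences of the softmax output.
eps = 1e-6
J_num = np.empty((5, 5))
for j in range(5):
    d = np.zeros(5)
    d[j] = eps
    J_num[:, j] = (softmax(z + d) - softmax(z - d)) / (2.0 * eps)

print(np.allclose(J_num, softmax_jacobian(z), atol=1e-8))  # expect True
```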

Problem 3. Let z denote a “logit” vector z = [z_1, z_2, ..., z_n]^T. Let p = [p_1, p_2, ..., p_n]^T, such that

\[
p_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad i = 1, \dots, n. \tag{4}
\]

Let y denote a probability vector y = [y_1, y_2, ..., y_n]^T such that y_i ≥ 0, ∀ i ∈ [1, n], and ∑_{i=1}^{n} y_i = 1. Let J denote the cross entropy between p and y:

\[
J = -\sum_{i=1}^{n} y_i \log(p_i), \tag{5}
\]

where log is the natural logarithm.

Derive the gradient vector

\[
\frac{\partial J}{\partial z} = \left[ \frac{\partial J}{\partial z_1}, \frac{\partial J}{\partial z_2}, \dots, \frac{\partial J}{\partial z_n} \right]^T. \tag{6}
\]

Hint: You can either use the Jacobian from Problem 2, or take the gradient directly. If you take the derivative directly, without using the Jacobian matrix, it is helpful to write log(p_i) = z_i − log(∑_{j=1}^{n} e^{z_j}).
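One way the hint plays out (my own working, not the official solution): substituting log(p_i) = z_i − log ∑_j e^{z_j} into (5) and using ∑_i y_i = 1 gives

\[
J = -\sum_{i=1}^{n} y_i z_i + \log\sum_{j=1}^{n} e^{z_j},
\qquad
\frac{\partial J}{\partial z_k} = -y_k + \frac{e^{z_k}}{\sum_{j=1}^{n} e^{z_j}} = p_k - y_k,
\]

so the gradient vector collapses to ∂J/∂z = p − y.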



Problem 4. Consider a neural network with two neurons in a single layer. The two neurons share the same two inputs. The model can be described as

\[
z = Wx + b. \tag{7}
\]

We are given two training points:

(a) When x = [1, 0]^T, y = [1, 0]^T.

(b) When x = [0, 1]^T, y = [0, 1]^T.


Note that the output y has been one-hot encoded. Let X and Y both be the 2×2 identity matrix, denoting the inputs and outputs of the training data. Apply softmax to z and use cross-entropy as the cost function. Run forward and backward propagation by hand calculation to update W and b for two iterations, under the following conditions:

(a)    Both W and b are initialized to all zeros.

(b)    Learning rate is η = 1.

(c)    Use gradient descent.

That is, make two updates of W and b by hand calculation. You need to show the intermediate steps (values of z, dz, p, dW, db, etc.).
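A minimal NumPy sketch for cross-checking the hand calculation is given below (my own code, not the Canvas program). It assumes the cost is averaged over the two training points; if the course convention sums instead, drop the divisions by m.

```python
import numpy as np

def softmax_cols(Z):
    # column-wise softmax (each column is one training point)
    E = np.exp(Z - Z.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

X = np.eye(2)            # columns: the inputs [1,0]^T and [0,1]^T
Y = np.eye(2)            # columns: the one-hot targets
W = np.zeros((2, 2))     # condition (a): zero initialization
b = np.zeros((2, 1))
eta = 1.0                # condition (b): learning rate
m = X.shape[1]           # two training points

for it in range(2):      # condition (c): two gradient-descent updates
    Z = W @ X + b                          # forward pass: logits
    P = softmax_cols(Z)                    # forward pass: probabilities
    dZ = P - Y                             # softmax + cross-entropy gradient
    dW = dZ @ X.T / m                      # averaged over the batch
    db = dZ.sum(axis=1, keepdims=True) / m
    W -= eta * dW
    b -= eta * db
    print(f"iteration {it + 1}:")
    print("Z =\n", Z, "\nP =\n", P, "\ndW =\n", dW, "\ndb =\n", db)
    print("updated W =\n", W, "\nupdated b =\n", b)
```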

Problem 5. Design a three-layer neural network that has the following specifications:

(a)    Input dimensions: 57 (all real numbers);

(b)   Output dimension: 1 (binary);

(c)    Layer 1: d1 neurons, with ReLU (Rectified Linear Unit) nonlinearity;

(d)   Layer 2: d2 neurons, with ReLU nonlinearity;

(e)    Output Layer: 1 neuron, with logistic nonlinearity;

(f)     The objective function is cross-entropy (logistic regression).

Use the same training and test data from the Spambase Data Set. Train the neural network using forward and backward propagation and gradient descent.

You can use the DNN.py program available on Canvas. You need to submit the source program in Python. Also report the learning rate(s) used, the training and test errors, and the running time.
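As a hedged illustration of the requested architecture (57 → d1 ReLU → d2 ReLU → 1 logistic with cross-entropy), here is a hypothetical minimal NumPy skeleton; it is not the Canvas DNN.py, and the widths d1, d2 and learning rate eta are placeholder choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d1, d2 = 57, 20, 10     # placeholder hidden-layer widths
eta = 0.1                     # placeholder learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Small random initialization; columns of X are examples, y holds 0/1 labels.
params = {
    "W1": rng.normal(scale=0.1, size=(d1, d_in)), "b1": np.zeros((d1, 1)),
    "W2": rng.normal(scale=0.1, size=(d2, d1)),   "b2": np.zeros((d2, 1)),
    "W3": rng.normal(scale=0.1, size=(1, d2)),    "b3": np.zeros((1, 1)),
}

def gd_step(X, y, p):
    """One full-batch gradient-descent step; returns the cross-entropy cost."""
    m = X.shape[1]
    # forward propagation
    Z1 = p["W1"] @ X + p["b1"];  A1 = np.maximum(Z1, 0.0)       # ReLU layer 1
    Z2 = p["W2"] @ A1 + p["b2"]; A2 = np.maximum(Z2, 0.0)       # ReLU layer 2
    Z3 = p["W3"] @ A2 + p["b3"]; A3 = sigmoid(Z3)               # logistic output
    cost = -np.mean(y * np.log(A3) + (1.0 - y) * np.log(1.0 - A3))
    # backward propagation (logistic output + cross-entropy gives dZ3 = A3 - y)
    dZ3 = A3 - y
    dW3 = dZ3 @ A2.T / m; db3 = dZ3.mean(axis=1, keepdims=True)
    dZ2 = (p["W3"].T @ dZ3) * (Z2 > 0)
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (p["W2"].T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m;  db1 = dZ1.mean(axis=1, keepdims=True)
    for k, g in zip(["W1", "b1", "W2", "b2", "W3", "b3"],
                    [dW1, db1, dW2, db2, dW3, db3]):
        p[k] -= eta * g
    return cost
```

Calling gd_step(X_train, y_train, params) in a loop over epochs is plain batch gradient descent; the learning rate(s), training and test errors, and running time would then go into the report.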

Problem 6.

Using the DNN.py code available on Canvas, experiment with classification on the MNIST data set, using the following settings:

(a)    Single layer, 10 neurons, softmax + cross-entropy objective function.

(b)    Three layers, [(50, ReLU), (50, ReLU), (10, Linear)], softmax + cross-entropy objective function.

(c)    Experiment with other neuron settings, with no more than 3 layers, and no more than 150 neurons in total.
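One hedged way to keep the three settings organized is as plain data, as sketched below; the actual DNN.py interface on Canvas may differ, and the part (c) entry is just one hypothetical example that respects the stated limits:

```python
# Layer settings for parts (a)-(c); each entry is (number of neurons, activation),
# with softmax + cross-entropy applied on top of the final layer in every case.
configs = {
    "a": [(10, "linear")],
    "b": [(50, "relu"), (50, "relu"), (10, "linear")],
    "c": [(80, "relu"), (60, "relu"), (10, "linear")],  # hypothetical example for part (c)
}

for name, layers in configs.items():
    total = sum(n for n, _ in layers)
    # Part (c) limits: at most 3 layers and at most 150 neurons in total.
    assert len(layers) <= 3 and total <= 150
    print(name, layers, "total neurons:", total)
```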
