CS6643-Final Exam Solved

1.  In the basic stereo imaging setup below, the origin of the world coordinate system W is located at the lens center of the left camera. The distance between the lens centers of the two cameras is 12 cm. The two cameras have a focal length of 50 mm, and the sensor chips (real image planes) of the cameras have a physical size of 1.2 cm × 1.2 cm. The output of the cameras is a pair of digital stereo images, each of size 512 × 512 pixels. The tip of vertical pole #1 appears in the left image at pixel location (i, j) = (185, 125) and in the right image at (i, j) = (185, 115). The tip of vertical pole #2 appears in the left image at pixel location (i, j) = (185, 179) and in the right image at (i, j) = (185, 169). Compute the horizontal distance between the tips of the two poles in the world coordinate system (horizontal distance = distance in the x direction). Show all work to get full credit. (The integer image plane uses the i-j coordinate system, with i going from top to bottom and j going from left to right.)
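One way to check the arithmetic for problem 1 is the sketch below. It assumes the standard parallel-axis pinhole stereo model (depth = focal length × baseline / disparity), square pixels, and a principal point at the image center; these conventions, and all function and constant names, are assumptions rather than anything stated in the exam. The principal point cancels out of the final difference, and a different sign convention for the image plane would only flip signs, not the magnitude.

```python
# Sketch of problem 1 under an ASSUMED parallel-axis pinhole stereo model.
BASELINE_MM = 120.0          # 12 cm between lens centers (from the problem)
FOCAL_MM = 50.0              # focal length (from the problem)
SENSOR_MM = 12.0             # 1.2 cm sensor width (from the problem)
IMAGE_PIXELS = 512           # image width in pixels (from the problem)
PIXEL_MM = SENSOR_MM / IMAGE_PIXELS   # physical width of one pixel


def depth_mm(j_left, j_right):
    """Depth from column disparity: Z = f * B / d (assumed model)."""
    disparity_mm = (j_left - j_right) * PIXEL_MM
    return FOCAL_MM * BASELINE_MM / disparity_mm


def world_x_mm(j_left, j_right, j_center=IMAGE_PIXELS / 2):
    """World x of a point seen at column j_left in the left image."""
    # j_center is an assumed principal point; it cancels in differences.
    x_image_mm = (j_left - j_center) * PIXEL_MM
    return x_image_mm * depth_mm(j_left, j_right) / FOCAL_MM


x1 = world_x_mm(125, 115)    # tip of pole #1
x2 = world_x_mm(179, 169)    # tip of pole #2
horizontal_distance_cm = abs(x2 - x1) / 10.0
print(horizontal_distance_cm)
```

Because both poles have the same 10-pixel disparity, the result reduces to baseline × (column difference) / disparity, so the focal length and pixel size drop out.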

 

  

 

2.  We would like to use a minimum-distance classifier formulated using linear discriminant functions D_i(X) to classify an input X into one of three classes. The prototype vectors for the three classes are given below. Find the equation of the decision boundary between classes 1 and 3, simplify it into an algebraic equation (not a matrix equation), and then plot the decision boundary as a graph.
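The general recipe for problem 2 can be sketched as follows, assuming the usual linear form D_i(X) = XᵀM_i − ½ M_iᵀM_i, so that the boundary between classes 1 and 3 is D_1(X) = D_3(X). The prototype values used here are hypothetical placeholders (the exam's own prototypes are not reproduced in this copy), and the 2-D feature space is likewise an assumption.

```python
def boundary_1_3(m1, m3):
    """Coefficients (a, b, c) of a*x1 + b*x2 + c = 0, where D1(X) = D3(X).

    Assumes D_i(X) = X.M_i - 0.5 * M_i.M_i and 2-D prototypes.
    """
    a = m1[0] - m3[0]
    b = m1[1] - m3[1]
    c = -0.5 * ((m1[0] ** 2 + m1[1] ** 2) - (m3[0] ** 2 + m3[1] ** 2))
    return a, b, c


# HYPOTHETICAL prototypes, for illustration only (not the exam's values).
m1 = (1.0, 3.0)
m3 = (5.0, 1.0)
a, b, c = boundary_1_3(m1, m3)
print(f"{a}*x1 + {b}*x2 + {c} = 0")
```

For these placeholder prototypes the boundary simplifies to x2 = 2·x1 − 4, the perpendicular bisector of the segment joining the two prototypes.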

             

  

         

3.  Given an input grayscale image, we would like to use the Harris Corner Detector to detect interest points in the image. Write pseudocode to compute the Local Structure Matrix A of the image at every pixel location. Do not write more than 10 lines of pseudocode.
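A minimal NumPy sketch of the computation asked for in problem 3 is given here. It is one possible realization, not the expected answer: it assumes central-difference gradients and a uniform (box) summation window, whereas the Harris detector is often described with a Gaussian weighting window. The function names and the `radius` parameter are illustrative.

```python
import numpy as np


def window_sum(a, radius):
    """Sum over each pixel's (2*radius+1)^2 neighbourhood (edges replicated)."""
    padded = np.pad(a, radius, mode="edge")
    rows, cols = a.shape
    out = np.zeros((rows, cols))
    for di in range(2 * radius + 1):
        for dj in range(2 * radius + 1):
            out += padded[di:di + rows, dj:dj + cols]
    return out


def local_structure_matrix(image, radius=1):
    """Return an array of shape (rows, cols, 2, 2) holding A at every pixel."""
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns).
    iy, ix = np.gradient(image.astype(float))
    a = np.empty(image.shape + (2, 2))
    a[..., 0, 0] = window_sum(ix * ix, radius)
    a[..., 0, 1] = a[..., 1, 0] = window_sum(ix * iy, radius)
    a[..., 1, 1] = window_sum(iy * iy, radius)
    return a


# Usage: a horizontal ramp has Ix = 1 and Iy = 0 everywhere.
ramp = np.tile(np.arange(6.0), (5, 1))
A = local_structure_matrix(ramp)
```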

 

4.  We would like to use the signed representation of the Histogram of Oriented Gradients (HOG) descriptor to detect humans in images. In the signed representation, the histogram has 18 bins.

 

(a)   What is the dimension of the descriptor if we assume the following parameter settings:

detection window size = 296 x 168 pixels (rows x columns), cell size = 8 x 8 pixels, block size = 3 x 3 cells, and block overlap = 8 pixels.
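The block count in part (a) can be sketched as below. The sketch reads "block overlap = 8 pixels" literally, so that a 24-pixel block advances by 24 − 8 = 16 pixels; that reading, and the function name, are assumptions. If the intended block stride is instead one cell (8 pixels), pass `overlap_px=16`.

```python
def hog_dimension(win_rows, win_cols, cell=8, block_cells=3,
                  overlap_px=8, bins=18):
    """Descriptor length = number of blocks * cells per block * bins per cell."""
    block_px = block_cells * cell            # block side length in pixels
    stride = block_px - overlap_px           # ASSUMED meaning of "overlap"
    blocks_down = (win_rows - block_px) // stride + 1
    blocks_across = (win_cols - block_px) // stride + 1
    return blocks_down * blocks_across * block_cells * block_cells * bins


print(hog_dimension(296, 168))
```

Under the literal reading this gives 18 × 10 blocks, each contributing 9 cells × 18 bins.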

(b)   The bin centers for the 18 histogram bins and the gradient magnitudes and gradient angles of an 8 × 8 cell are given below. Compute the histogram of the cell (before block normalization).

 

 

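Bin #:                      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
Bin centers (in degrees):   0  20  40  60  80 100 120 140 160 180 200 220 240 260 280 300 320 340

[Gradient Magnitudes table (8 × 8): only the entries 220, 180, and 120 are legible; the remaining values are missing from the source.]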

 

Gradient Angles:

  200   45   23   98  130  260  255  250
  125  295   85   90  130  265  249  240
  123   35   85   95  125  260  250  240
  100   90   45   90  120  265  240  230
   95   99  105  106  355  120  100  110
   90  205  110  120  120  130  125  120
   85   90  100  110  110  120  120  110
   80   80  100  110  100  100  100  110
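The voting step in part (b) can be sketched as follows. The sketch assumes each gradient magnitude is split between the two nearest bin centers in proportion to angular distance, with wrap-around between 340° and 0°; the exam's intended voting rule may differ. The magnitudes in the usage example are hypothetical, not the exam's values.

```python
def cell_histogram(magnitudes, angles, n_bins=18, step=20.0):
    """Signed HOG histogram with bin centers at 0, 20, ..., 340 degrees.

    ASSUMED rule: each vote is shared by the two nearest bin centers,
    weighted by how close the angle is to each center.
    """
    hist = [0.0] * n_bins
    for mag, ang in zip(magnitudes, angles):
        pos = (ang % 360.0) / step      # position measured in bin widths
        lo = int(pos) % n_bins          # bin center at or below the angle
        hi = (lo + 1) % n_bins          # next center, wrapping 340 -> 0
        frac = pos - int(pos)           # fraction of the way toward hi
        hist[lo] += mag * (1.0 - frac)
        hist[hi] += mag * frac
    return hist


# HYPOTHETICAL magnitudes paired with three sample angles.
demo = cell_histogram([10.0, 10.0, 4.0], [30.0, 350.0, 200.0])
```

To apply it to a full cell, flatten the 8 × 8 magnitude and angle tables into two lists in the same order.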

 

5.  Suppose we have already computed the normalized co-occurrence matrix P[i, j] of an input image using displacement vector d = (dx, dy). Can we obtain the normalized co-occurrence matrix P′[i, j] for displacement vector d′ = (−dx, −dy) without referring to the original input image? If so, how do we do that? Do not write more than six sentences. (Hint: displacement vector d′ has the same magnitude as d but points in the opposite direction.)
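A small experiment can make the relationship in problem 5 concrete. The sketch below counts pixel pairs for a displacement and for its negation on a tiny hypothetical image and compares the results; the image, the helper names, and the choice to treat the displacement as (row offset, column offset) are all assumptions made for illustration.

```python
def cooccurrence(image, d, levels):
    """Normalized co-occurrence matrix for displacement d = (dr, dc)."""
    dr, dc = d
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def transpose(m):
    return [list(col) for col in zip(*m)]


# HYPOTHETICAL 3-level image, for illustration only.
image = [[0, 1, 2, 1],
         [2, 0, 1, 0],
         [1, 2, 2, 0]]
p = cooccurrence(image, (1, 1), 3)
p_opposite = cooccurrence(image, (-1, -1), 3)
print(p_opposite == transpose(p))
```

Each pixel pair counted at [a, b] for d is the same pair counted at [b, a] for −d, which is why the comparison against the transpose holds here.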

 

6.  Consider the camera coordinate system C and the world coordinate system W as shown in the figure below. The origin of the camera coordinate system is located at w(x, y, z) = w(6, 2, 0) with respect to the world coordinate system. The x axis of the camera coordinate system is parallel to the y axis of the world coordinate system, the y axis of the camera coordinate system is parallel to but points in the opposite direction of the x axis of the world coordinate system, and the z axis of the camera coordinate system is parallel to the z axis of the world coordinate system. The camera has a focal length of 45 mm, and the real image plane (x′, y′) of the camera is of size 1 cm × 1 cm. The real image plane is digitized into a digital image of size 1024 × 1024 pixels. Derive the 3 × 4 camera transform that transforms points in the world coordinate system to the pixel coordinate system of the camera.

 

Note: Assume that the real image plane has its origin at the lower left corner, with the x′ axis pointing to the right and the y′ axis pointing upward. The digital image plane has its origin (0, 0) at the upper left corner, with the i axis pointing downward and the j axis pointing to the right. The range for both i and j is [0, 1023].
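The first stage of the transform in problem 6, from world coordinates to camera coordinates, follows directly from the axis description and can be sketched as below. This covers only the rigid (rotation plus translation) part; the perspective projection and the mapping to pixel indices depend on conventions shown in the figure, which is not reproduced here, so they are deliberately left out. The names are illustrative.

```python
# Camera origin expressed in world coordinates (from the problem).
CAMERA_ORIGIN_W = (6.0, 2.0, 0.0)

# Rows are the camera axes written in world coordinates (from the problem).
R = ((0.0, 1.0, 0.0),     # camera x is parallel to world y
     (-1.0, 0.0, 0.0),    # camera y points opposite to world x
     (0.0, 0.0, 1.0))     # camera z is parallel to world z


def world_to_camera(p_world):
    """Apply p_camera = R * (p_world - camera_origin)."""
    shifted = [p - o for p, o in zip(p_world, CAMERA_ORIGIN_W)]
    return tuple(sum(r * s for r, s in zip(row, shifted)) for row in R)


print(world_to_camera((7.0, 2.0, 5.0)))
```

Written out, the camera coordinates are X_c = Y_w − 2, Y_c = 6 − X_w, and Z_c = Z_w.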

 

  

 

7.  In the LeNet-5 convolutional neural network below, (a) what is the total number of links between the input layer and the C1 layer? (b) How many different parameters need to be trained for the links between the input layer and the C1 layer?
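The counts in problem 7 can be checked with the sketch below. It assumes the standard LeNet-5 dimensions (a 32 × 32 single-channel input, 5 × 5 kernels, and six 28 × 28 feature maps in C1) because the network figure is not reproduced here, and it counts each unit's bias as one link and one parameter per feature map. The function name is illustrative.

```python
def conv_layer_counts(in_size=32, kernel=5, maps=6, in_channels=1):
    """Return (links, trainable parameters) for a shared-weight conv layer.

    ASSUMES standard LeNet-5 sizes and that the bias counts as a link.
    """
    out_size = in_size - kernel + 1                       # 28 for LeNet-5 C1
    weights_per_unit = kernel * kernel * in_channels + 1  # 25 weights + bias
    params = maps * weights_per_unit                      # shared per map
    links = maps * out_size * out_size * weights_per_unit
    return links, params


links, params = conv_layer_counts()
print(links, params)
```

If biases are not counted as links, the same reasoning gives 6 × 28 × 28 × 25 = 117,600 links and 150 weights.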

 

  

 

8.  A deep neural network has been designed to classify the input into one of five classes. The final output layer of the network is a Softmax layer. Suppose the input to the Softmax layer is [0 7 5 0 1]^T. What are the final outputs of the neural network?

 

Hint: the formula for the Softmax function is: softmax(z)_i = exp(z_i) / Σ_j exp(z_j)
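The Softmax outputs for problem 8 can be evaluated with a few lines of Python. The sketch subtracts the maximum input before exponentiating, a common numerical-stability step that does not change the result; the function name is illustrative.

```python
import math


def softmax(z):
    """exp(z_i) / sum_j exp(z_j), computed with a max shift for stability."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


outputs = softmax([0, 7, 5, 0, 1])
print([round(p, 4) for p in outputs])
```

The second class (input 7) receives most of the probability mass, and the five outputs sum to 1.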

 

               

 

9.  In the Eigenface method for face recognition, we compute the distance between an input face and its reconstruction as d_0 = dist(I_R, I). The distance between an input face image and its reconstruction should be small. Explain why the distance will be large for a non-face input image. Do not write more than six sentences.
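The effect described in problem 9 can be illustrated with synthetic data. In the sketch below, the "faces" are random vectors confined to a 3-dimensional subspace of a 50-dimensional space; this stands in for the idea that face images occupy a low-dimensional face space and is purely an assumption for illustration, not real face data. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# SYNTHETIC data: 20 training "faces" that all lie in a 3-D subspace of 50-D.
basis = rng.normal(size=(3, 50))
train = rng.normal(size=(20, 3)) @ basis
mean = train.mean(axis=0)

# Eigenfaces = leading principal directions of the centered training set.
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
eigenfaces = vt[:3]


def reconstruction_distance(x):
    """d_0: distance between x and its reconstruction from the eigenfaces."""
    weights = eigenfaces @ (x - mean)            # project onto face space
    x_rec = mean + eigenfaces.T @ weights        # reconstruct from weights
    return np.linalg.norm(x - x_rec)


face_like = rng.normal(size=3) @ basis   # lies in the same subspace
non_face = rng.normal(size=50)           # arbitrary vector, mostly outside it
d_face = reconstruction_distance(face_like)
d_non_face = reconstruction_distance(non_face)
print(d_face, d_non_face)
```

The face-like input is reproduced almost exactly, while most of the arbitrary input lies outside the span of the eigenfaces and cannot be reconstructed, so its distance is large.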
