# ECE 269 Project solution

ECE 269 Project Description

List of Projects

1. Sparse Dictionary Learning: In many signal processing problems, the objective is to represent a

signal as a linear combination of known vectors (also called atoms), and these atoms form a dictionary.

Let Φ ∈ CM×N be a dictionary and φi

, i = 1, 2, · · · , N are the atoms. We seek to represent a signal y

as a linear combination of these atoms, i.e find coefficients xi such that y =

PN

i=1 xiφi which essentially

requires us to find solution to the following system of linear equations:

y = Φx (1)

If the dictionary Φ is chosen to be overcomplete (N M) containing a basis, it is possible to represent

any signal y ∈ CM. However, the system of linear equations (1) will have infinite number of solutions

in x. Thus, we need the notion of sparsity where the objective is to look for the sparsest x (i.e. x

with minimum number of non-zero elements) which satisfies (1). This amounts to solving the following

optimization problem:

min

x

||x||0 (2)

subject to y = Φx

1

Popular examples of the dictionary Φ are DFT, DCT or Wavelet dictionaries. The ability to obtain a sparse representation depends on the class of signals, as well as the choice of the dictionary.

This is the motivation to understand if it is possible to “learn” a dictionary given a certain class

of input signals. Let yi

, i = 1, · · · , k be training samples from which we want to learn the dictionary Φ such that it permits sparse representation for every yi with sparsity at most L. Let

X =

x1 x2 · · · xk

, Y =

y1 y2 · · · yk

. Dictionary learning can be cast as the following

optimization problem:

min

X,Φ

||Y − ΦX||F (3)

subject to ||xi

||0 ≤ L

In this project you are expected to do the following:

(a) Read the papers listed in the reference and reformulate the sparse representation problem when

noise is added to the signals.

(b) Discuss at least three commonly used techniques in Dictionary learning. State the assumptions

that are necessary for ensuring uniqueness of this problem. Is the decomposition Y = ΦX always

unique, or are there some trivial ambiguities ?

(c) Given a fixed dictionary, implement Orthogonal Matching Pursuit (OMP) to obtain a sparse

representation for any given signal. Is OMP always successful ? Show numerical examples.

(d) Implement k-SVD to generate adaptive dictionary using a training dataset of clean images. You

may want to try synthetic data first before using real data.

(e) One of the applications of signal representation using overcomplete dictionaries is de-noising. Add

Gaussian noise to a test image and obtain a sparse representation using the dictionary learned in

(d). Is the reconstructed image noisy ? Explain your observation. Compare the reconstruction

quality with fixed dictionary and adaptive dictionary. Show the effect of addition of noise at

different levels and different noise models (Gaussian, Poisson etc). Plot the dictionary that you

learned using k-SVD.

(f) (Bonus:) In practical applications, you may not have access to clean images and thus, you may

need to learn the dictionary from the corrupted data itself. Paper (iii) provides an algorithm to

achieve this by using k-SVD. Study and implement the algorithm.

References:

i M. Aharon, M. Elad, and A. M. Bruckstein. “On the uniqueness of overcomplete dictionaries,

and a practical way to retrieve them.” Linear algebra and its applications, vol. 416, no. 1,

pp.48-67,2006.

ii M. Aharon, M. Elad, and A. Bruckstein A. “k-SVD: An algorithm for designing overcomplete

dictionaries for sparse representation.” IEEE Transactions on signal processing, vol. 54, no. 11,

pp. 4311-22, 2006.

iii M. Elad, M. Aharon. “Image denoising via sparse and redundant representations over learned

dictionaries.” IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736-45, 2006.

iv D. P. Wipf, and B. D. Rao. “Sparse Bayesian learning for basis selection.” IEEE Transactions on

Signal processing, vol. 52, no. 8, pp.2153-2164, 2004.

v J. A. Tropp, and A. C. Gilbert, “Signal recovery from random measurements via orthogonal

matching pursuit.” IEEE Transactions on information theory, vol. 53, no. 12, pp. 4655-4666,2007.

vi Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. “Orthogonal matching pursuit: Recursive

function approximation with applications to wavelet decomposition.” In Conference Record of

The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers. pp. 40-44. IEEE,

1993.

2

2. Recommender system using Matrix Factorization:

A recommender system tries to model the preference of a user by analyzing the usage pattern and

make appropriate suggestions. Recommendation systems have become ubiquitous in many areas such

as movies, music, e-commerce. The data for a recommendation system is often stored as a matrix

A ∈ RM×N where tM users provide ratings for N items. Matrix factorization models aim to decompose

this rating/preference matrix as a product of two matrices.

A = PQ, P ∈ RM×K, Q ∈ R

K×N

This factorization of the matrix allows us to map each user as well as each item to a common latent

space of dimension K ≤ min(M, N). The resulting dot product, p

T

i

qj , captures the interaction between

user-i and item j that will allow the recommender system to make predictions. In this project you are

expected to do the following:

(a) Read the papers listed in the reference and discuss different techniques used for designing a

recommendation system.

(b) Read (i) and (iii). Using real data, implement a recommendation system using the Singular Value

Decomposition (SVD) based technique. In most cases, the matrix A is incomplete. Can you

directly use SVD or do you need to make certain assumptions?

(c) Compare the performance of your implementation of (b) with alternative approaches given in (a).

Explain the performance evaluation metrics used.

(d) Does the implementation in (b) take care of new users/items ? Suggest a modification in (b) to

incorporate addition of new items (and users) with proper mathematical justification. Compare

the performance of this on-line system with the off-line version.

(e) (Bonus:) There are alternative ways to deal with the problem of incomplete data matrix. Reference (ii) provides two algorithms based on Stochastic Gradient Descent and Alternating Least

Squares. Study and implement these algorithms.

References:

i P. Cremonesi, Y. Koren, and R. Turrin. “Performance of recommender algorithms on top-n

recommendation tasks.” In Proceedings of the fourth ACM conference on Recommender systems,

pp. 39-46. ACM, 2010.

ii Y. Koren, R. Bell, and C. Volinsky. “Matrix factorization techniques for recommender systems.”

Computer vol. 42, no. 8 (2009).

iii B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. “Application of dimensionality reduction in

recommender system-a case study”, innesota Univ Minneapolis Dept of Computer Science 2000.

iv B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. “Incremental singular value decomposition algorithms for highly scalable recommender systems.” In Fifth International Conference on Computer

and Information Science, pp. 27-28. Citeseer, 2002.

v M. W. Berry, S. T. Dumais, and G. W. O’Brien. “Using linear algebra for intelligent information

retrieval.” SIAM review, vol.37, no. 4, pp. 573-595, 1995.

Dataset Suggestion: Jester (dimensions 24983 X 101)- http://www.ieor.berkeley.edu/ goldberg/jesterdata/

3. Robust Linear Regression

This paper studies a robust linear regression problem based on OMP.

3

(a) Read the entire paper carefully (including the appendix) and implement the original GARD

algorithm (see Algorithm 1 on page 4). The authors further use Cholesky decomposition to speed

up their algorithm. Try to use QR decomposition as well as Matrix Inversion Lemma instead of

Cholesky decomposition for faster implementation.

(b) Compare GARD (one of your fast implementations) with least squares and M-estimators (see

MATLAB robustfit function, and choose any two weight functions) by conducting the experiments

described in section V.A, V.C, V.D of the paper (no need to compare with other methods listed

in the paper).

(c) In experiment section V.A, compare the run-time of all versions of GARD (original, QR, Matrix Inversion Lemma) with least squares and M-estimators. Finally, test these methods on any two real

datasets given below: http://people.sc.fsu.edu/ jburkardt/datasets/regression/regression.html (x01.txtx28.txt) E.g., The selling price of houses is to be represented as a function of 11 variables:

http://people.sc.fsu.edu/ jburkardt/datasets/regression/x26.txt

References:

i G. Papageorgiou, P. Bouboulis, S. Theodoridis, and K. Themelis, “Robust linear regression analysis - a greedy approach.” IEEE Transactions on Signal Processing vol. 63,pp. 3872-3887, 2015.

4. Transceiver Design via Generalized Triangular Decomposition (GTD)

These two papers introduce the idea of Generalized Triangular Decomposition (GTD) that provides the

most general unitary decomposition of a complex matrix. Several known decompositions such as the

Singular Value Decomposition (SVD) and the QR Decomposition become special cases of GTD. Reference (2) establishes the mathematical foundations of GTD (such as its existence) and algorithms for

computing GTD, whereas reference (1) shows how GTD plays a central role in transceiver optimization

for MIMO channels. In this project, you will

(a) Implement the GTD algorithm and use the GMD decomposition for transceiver design with the

zero-forcing VBLAST (ZF-VBLAST) detector, following the method described in Sec IV and V

of reference (1).

(b) Produce the BER plots similar to Fig 3, 4, 5 in reference (1).

(c) Test your code and verify that SVD and QR decomposition are special cases of GTD. Note:

Although reference (2) contains MATLAB codes for GTD, you should not use it, but develop

your own MATLAB code for implementing GTD.

i Y. Jiang, J. Li, W.W. Hager. “Joint transceiver design for MIMO communications using geometric

mean decomposition.” IEEE Transactions on Signal Processing, vol. 53, pp. 3791-3803, 2005.

ii Y. Jiang, W. Hager, J. Li. “The generalized triangular decomposition.” Mathematics of computation vol. 77, pp. 1037-1056, , 2008.

4

List of Projects

1. Sparse Dictionary Learning: In many signal processing problems, the objective is to represent a

signal as a linear combination of known vectors (also called atoms), and these atoms form a dictionary.

Let Φ ∈ CM×N be a dictionary and φi

, i = 1, 2, · · · , N are the atoms. We seek to represent a signal y

as a linear combination of these atoms, i.e find coefficients xi such that y =

PN

i=1 xiφi which essentially

requires us to find solution to the following system of linear equations:

y = Φx (1)

If the dictionary Φ is chosen to be overcomplete (N M) containing a basis, it is possible to represent

any signal y ∈ CM. However, the system of linear equations (1) will have infinite number of solutions

in x. Thus, we need the notion of sparsity where the objective is to look for the sparsest x (i.e. x

with minimum number of non-zero elements) which satisfies (1). This amounts to solving the following

optimization problem:

min

x

||x||0 (2)

subject to y = Φx

1

Popular examples of the dictionary Φ are DFT, DCT or Wavelet dictionaries. The ability to obtain a sparse representation depends on the class of signals, as well as the choice of the dictionary.

This is the motivation to understand if it is possible to “learn” a dictionary given a certain class

of input signals. Let yi

, i = 1, · · · , k be training samples from which we want to learn the dictionary Φ such that it permits sparse representation for every yi with sparsity at most L. Let

X =

x1 x2 · · · xk

, Y =

y1 y2 · · · yk

. Dictionary learning can be cast as the following

optimization problem:

min

X,Φ

||Y − ΦX||F (3)

subject to ||xi

||0 ≤ L

In this project you are expected to do the following:

(a) Read the papers listed in the reference and reformulate the sparse representation problem when

noise is added to the signals.

(b) Discuss at least three commonly used techniques in Dictionary learning. State the assumptions

that are necessary for ensuring uniqueness of this problem. Is the decomposition Y = ΦX always

unique, or are there some trivial ambiguities ?

(c) Given a fixed dictionary, implement Orthogonal Matching Pursuit (OMP) to obtain a sparse

representation for any given signal. Is OMP always successful ? Show numerical examples.

(d) Implement k-SVD to generate adaptive dictionary using a training dataset of clean images. You

may want to try synthetic data first before using real data.

(e) One of the applications of signal representation using overcomplete dictionaries is de-noising. Add

Gaussian noise to a test image and obtain a sparse representation using the dictionary learned in

(d). Is the reconstructed image noisy ? Explain your observation. Compare the reconstruction

quality with fixed dictionary and adaptive dictionary. Show the effect of addition of noise at

different levels and different noise models (Gaussian, Poisson etc). Plot the dictionary that you

learned using k-SVD.

(f) (Bonus:) In practical applications, you may not have access to clean images and thus, you may

need to learn the dictionary from the corrupted data itself. Paper (iii) provides an algorithm to

achieve this by using k-SVD. Study and implement the algorithm.

References:

i M. Aharon, M. Elad, and A. M. Bruckstein. “On the uniqueness of overcomplete dictionaries,

and a practical way to retrieve them.” Linear algebra and its applications, vol. 416, no. 1,

pp.48-67,2006.

ii M. Aharon, M. Elad, and A. Bruckstein A. “k-SVD: An algorithm for designing overcomplete

dictionaries for sparse representation.” IEEE Transactions on signal processing, vol. 54, no. 11,

pp. 4311-22, 2006.

iii M. Elad, M. Aharon. “Image denoising via sparse and redundant representations over learned

dictionaries.” IEEE Transactions on Image processing, vol. 15, no. 12, pp. 3736-45, 2006.

iv D. P. Wipf, and B. D. Rao. “Sparse Bayesian learning for basis selection.” IEEE Transactions on

Signal processing, vol. 52, no. 8, pp.2153-2164, 2004.

v J. A. Tropp, and A. C. Gilbert, “Signal recovery from random measurements via orthogonal

matching pursuit.” IEEE Transactions on information theory, vol. 53, no. 12, pp. 4655-4666,2007.

vi Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. “Orthogonal matching pursuit: Recursive

function approximation with applications to wavelet decomposition.” In Conference Record of

The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers. pp. 40-44. IEEE,

1993.

2

2. Recommender system using Matrix Factorization:

A recommender system tries to model the preference of a user by analyzing the usage pattern and

make appropriate suggestions. Recommendation systems have become ubiquitous in many areas such

as movies, music, e-commerce. The data for a recommendation system is often stored as a matrix

A ∈ RM×N where tM users provide ratings for N items. Matrix factorization models aim to decompose

this rating/preference matrix as a product of two matrices.

A = PQ, P ∈ RM×K, Q ∈ R

K×N

This factorization of the matrix allows us to map each user as well as each item to a common latent

space of dimension K ≤ min(M, N). The resulting dot product, p

T

i

qj , captures the interaction between

user-i and item j that will allow the recommender system to make predictions. In this project you are

expected to do the following:

(a) Read the papers listed in the reference and discuss different techniques used for designing a

recommendation system.

(b) Read (i) and (iii). Using real data, implement a recommendation system using the Singular Value

Decomposition (SVD) based technique. In most cases, the matrix A is incomplete. Can you

directly use SVD or do you need to make certain assumptions?

(c) Compare the performance of your implementation of (b) with alternative approaches given in (a).

Explain the performance evaluation metrics used.

(d) Does the implementation in (b) take care of new users/items ? Suggest a modification in (b) to

incorporate addition of new items (and users) with proper mathematical justification. Compare

the performance of this on-line system with the off-line version.

(e) (Bonus:) There are alternative ways to deal with the problem of incomplete data matrix. Reference (ii) provides two algorithms based on Stochastic Gradient Descent and Alternating Least

Squares. Study and implement these algorithms.

References:

i P. Cremonesi, Y. Koren, and R. Turrin. “Performance of recommender algorithms on top-n

recommendation tasks.” In Proceedings of the fourth ACM conference on Recommender systems,

pp. 39-46. ACM, 2010.

ii Y. Koren, R. Bell, and C. Volinsky. “Matrix factorization techniques for recommender systems.”

Computer vol. 42, no. 8 (2009).

iii B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. “Application of dimensionality reduction in

recommender system-a case study”, innesota Univ Minneapolis Dept of Computer Science 2000.

iv B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. “Incremental singular value decomposition algorithms for highly scalable recommender systems.” In Fifth International Conference on Computer

and Information Science, pp. 27-28. Citeseer, 2002.

v M. W. Berry, S. T. Dumais, and G. W. O’Brien. “Using linear algebra for intelligent information

retrieval.” SIAM review, vol.37, no. 4, pp. 573-595, 1995.

Dataset Suggestion: Jester (dimensions 24983 X 101)- http://www.ieor.berkeley.edu/ goldberg/jesterdata/

3. Robust Linear Regression

This paper studies a robust linear regression problem based on OMP.

3

(a) Read the entire paper carefully (including the appendix) and implement the original GARD

algorithm (see Algorithm 1 on page 4). The authors further use Cholesky decomposition to speed

up their algorithm. Try to use QR decomposition as well as Matrix Inversion Lemma instead of

Cholesky decomposition for faster implementation.

(b) Compare GARD (one of your fast implementations) with least squares and M-estimators (see

MATLAB robustfit function, and choose any two weight functions) by conducting the experiments

described in section V.A, V.C, V.D of the paper (no need to compare with other methods listed

in the paper).

(c) In experiment section V.A, compare the run-time of all versions of GARD (original, QR, Matrix Inversion Lemma) with least squares and M-estimators. Finally, test these methods on any two real

datasets given below: http://people.sc.fsu.edu/ jburkardt/datasets/regression/regression.html (x01.txtx28.txt) E.g., The selling price of houses is to be represented as a function of 11 variables:

http://people.sc.fsu.edu/ jburkardt/datasets/regression/x26.txt

References:

i G. Papageorgiou, P. Bouboulis, S. Theodoridis, and K. Themelis, “Robust linear regression analysis - a greedy approach.” IEEE Transactions on Signal Processing vol. 63,pp. 3872-3887, 2015.

4. Transceiver Design via Generalized Triangular Decomposition (GTD)

These two papers introduce the idea of Generalized Triangular Decomposition (GTD) that provides the

most general unitary decomposition of a complex matrix. Several known decompositions such as the

Singular Value Decomposition (SVD) and the QR Decomposition become special cases of GTD. Reference (2) establishes the mathematical foundations of GTD (such as its existence) and algorithms for

computing GTD, whereas reference (1) shows how GTD plays a central role in transceiver optimization

for MIMO channels. In this project, you will

(a) Implement the GTD algorithm and use the GMD decomposition for transceiver design with the

zero-forcing VBLAST (ZF-VBLAST) detector, following the method described in Sec IV and V

of reference (1).

(b) Produce the BER plots similar to Fig 3, 4, 5 in reference (1).

(c) Test your code and verify that SVD and QR decomposition are special cases of GTD. Note:

Although reference (2) contains MATLAB codes for GTD, you should not use it, but develop

your own MATLAB code for implementing GTD.

i Y. Jiang, J. Li, W.W. Hager. “Joint transceiver design for MIMO communications using geometric

mean decomposition.” IEEE Transactions on Signal Processing, vol. 53, pp. 3791-3803, 2005.

ii Y. Jiang, W. Hager, J. Li. “The generalized triangular decomposition.” Mathematics of computation vol. 77, pp. 1037-1056, , 2008.

4

Starting from: $34.99

You'll get 1 file (6.9MB)