Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
1ECE 273 Course Projects
General Instructions
1) Deadline for Project Report: June 5, 2021, 11:59 pm. Online Submission on Canvas.
2) Guidelines: You should work individually on the project and choose one topic from the list provided below.
Alternatively, you can define your own project (see announcement from Prof. Pal on Canvas). You are expected
to submit a project report, along with figures, as well as MATLAB/Python source codes, and the datasets with
proper citations. You will also present your project results in the form of a short presentation ( „ 15 minutes per
student) which will be scheduled during the finals week. You are welcome to compose your report in Jupyter
Notebook with live code and visualizations. Wherever applicable, you should write your own code. Any
plagiarism will be considered a violation of academic integrity and will be dealt with accordingly. Ideally, we
should be able to reproduce the exact results as those in your report; otherwise you will lose points. Negative
results are welcome, but be sure to discuss them in your report. The report should include the following
sections:
‚ Objective/Goal of the project.
‚ Background and motivation: You should cite relevant references, explain why the problem is important,
and its applications. You should also cite the datasets being used.
‚ Method/Techniques.
‚ Results: Include all figures; you can also use tables to present your results, especially for comparing
multiple algorithms.
‚ Discussion: This section will provide a critical discussion of your results, and evaluate the strengths and
weaknesses of different algorithms and the associated trade-offs.
‚ Dataset: In your report, you should include a web link to all datasets you use (if any) and give proper
credit. If the datasets are not publicly accessible, submit them with your report in a separate file. If you
have more than one file, pack all files into a .zip file. Dont use the formats (like .rar) that require specific
software to unpack your file.
‚ You will use CVX (for MATLAB) or CVXPY (for Python) which come with inbuilt solvers (optimized
for respective platforms) for convex problems. Since CVX is a software package meant for modeling, you
will only need to formulate the optimization problem in a standard form using MATLAB/Python syntax.
Refer to your syllabus for more details for resources on how to quickly get started with CVX/CVXPY.
List of Topics
1. Topic 1: Blind Deconvolution
References:
1) Ahmed, A. and Recht, B. (2014). “Blind Deconvolution Using Convex Programming” IEEE Transactions
on Signal Processing,vol. 60, no. 3, Mar. 2014..
Blind deconvolution is the problem of recovering two signals given the output of their convolution. The
signals can not be recovered unless certain restrictions are imposed on them. In this project, you will study
and implement convex algorithms for solving the blind deconvolution problem under suitable conditions, and
evaluate their performance guarantees.
– Study the paper by Ahmed, Recht and Romberg to understand the formulation of the blind deconvolution
problem. Why is the solution to the problem non-unique in general? Can you explain what kind of
ambiguities exist? Under what conditions on the signals (sparsity, subspace structure etc), the problem
can permit unique solutions? Explain why.
– Why is the problem non-convex? Explain the main idea in this paper under which the measurements can be
expressed as a linear function of a suitable transformed variable. How is this transformed variable related
to the constituent signals, and given this transformed variable, how can one obtain the constituent signals?
2Does the variable transformation make the problem convex? What additional relaxations are necessary to
cast it as a convex problem?
– What are the theoretical guarantees of the convex algorithm proposed in the paper? Are they deterministic,
or probabilistic? What should be the relation of the number of measurements, sparsity and the dimension
of the subspace, so that exact (or stable, in presence of noise) recovery is possible? Clearly state the
mathematical guarantees and spell out the assumptions.
– Implement the convex algorithm proposed in the paper. You should be able to recreate the figures. Test
the algorithm by generating the phase transition plots which plots the probability of successful recovery
as a function of number of measurements, and the sparsity and/or dimension of the subspace. Do you see
a sharp region around which the probability changes from close to zero, to close to 1? Compare this with
the theoretical guarantees from the paper.
– Compare performance of the blind deconvolution algorithm against non-blind deconvolution (i.e. where
you are given one of the constituent signals). Do a literature survey on alternating minimization techniques
for blind deconvolution (which are non-convex), and implement one algorithm of your choice. Compare
the performance of convex and non-convex blind deconvolution.
– Design numerical experiments to evaluate the performance of the algorithm when one (or more) conditions
(such as sparsity, low rank) are violated. Can you comment on the robustness of the algorithm against
such violations?
– (BONUS): Extend the algorithm to perform blind deconvolution in 2D. One example is image deblurring.
Test the algorithm on an image of your choice (by blurring it with a suitable kernel).
2. Topic 2: Low-Rank Matrix Recovery From Quadratic Measurements (Application in Polynomial Neural
Network)
References
1) Soltani, Mohammadreza, and Chinmay Hegde. “Towards provable learning of polynomial neural networks
using low-rank matrix estimation”. International Conference on Artificial Intelligence and Statistics. 2018.
2) Candes, Emmanuel J., and Yaniv Plan. “Tight oracle inequalities for low-rank matrix recovery from a
minimal number of noisy random measurements.” IEEE Transactions on Information Theory 57.4 (2011):
2342-2359.
3) Recht, B., Fazel, M. and Parrilo, P.A., 2010. “Guaranteed minimum-rank solutions of linear matrix equations
via nuclear norm minimization”. SIAM review, 52(3), pp.471-501.
4) Goldstein, Tom, Christoph Studer, and Richard Baraniuk. “FASTA: A generalized implementation of forward-
backward splitting.” arXiv preprint arXiv:1501.04979 (2015).
In [1], the authors showed that training a neural network with one hidden layer consisting of fewer number of
units compared to the size of input, and with quadratic activation function can be modeled as low-rank matrix
recovery problem with noisy quadratic measurements.
min
LPRpˆp F pLq “
1
2m
mÿ
i“1
pyi ´ xTi Lxiq2
s.t. rankpLq ď r (1)
where tpyi,xiqumi“1,xi P Rp is the input data set, r is the number of hidden units with the activation function
σpzq “ z2. The ground truth matrix L˚ is constructed from the weights in the first and second layers of the
corresponding neural network in the following form:
L˚ “
rÿ
j“1
αjwjwTj
where wj is the weight vector in the jth hidden unit and α P Rr is the weight vector of the second layer.
This optimization problem is not a convex problem (Why?), however, there are approaches which first convert
the original problem to a simpler convex problem and then solve it as a convex optimization problem. The
3authors in [2,3] have proposed a convex relaxation of the low-rank constraint based on the nuclear norm of a
matrix. In particular, consider the following problem:
y “ ApLq ` n (2)
where L P Rpˆp is an unknown matrix, A : Rpˆp Ñ Rm is a linear mapping,
rApLqsi “ xTi Lxi
and n P Rm denotes additive noise. The desired matrix L˚ (which has p2 unknown entries) can be recovered
from (compressive) measurements (with m ! p2) by solving the following convex problem:
min
LPRpˆp }L}˚
s.t. }A˚peq} ď λ
e “ y ´ApLq (3)
where A˚ is the adjoint operator of A,
A˚peq “
mÿ
i“1
eixix
T
i
}.} is the operator norm, }.}˚ is its dual (i.e the nuclear norm), and λ is a constant which is dependent on the
size of L and noise.
In this project, you are expected to do all of the followings (exlcuding BONUS):
– Read the references and explain why solving the problem (1) is equal to training the corresponded network?
Why is it not a convex optimization problem? Why is problem (3) a suitable convex relaxation for problem
(2)?
– Generate a random data set txiumi“1 from a normal distribution (you can choose the mean and the
covariance), and one ground truth network (Based on the given architecture in [1]) which generates
the output yi corresponding to the input xi. Implement Algorithm 1 in [1] and test it on the data set you
have generated. Try data sets with different sizes and different initialization. Can you recover the ground
truth L˚ (or equivalently, the network)? Plot the probability of success by varying the values of m, p and
r, as well as different initializations (you can refer to [1] for some representative plots). Comment on
your results. Add noise to the measurements and try different noise powers. Report where the algorithm
fails and succeeds.
– Formulate the problem of training neural network (or equivalently, finding L˚) as a convex problem,
based on [2] and [3]. Use the data set you generated in the last part and solve this convex problem with
CVX/CVXPY library. You may need to tune λ by trying different values and cross validating against the
training dataset (Read [2] for suitable ways to choose λ). Compare your result with previous solutions
and plots, including the need for initialization. Try different sizes of data sets. How accurate is the convex
relaxation for this problem? How sensitive is it to the size of data set and noise?
– Generate a new data set where each xi can come from two different Gaussian distributions (representing
two classes) with two different means (you can keep the covariance same if you wish). Assign each data
point a binary label based on its class. (Note that there is no ground truth network in this case). Run
the two algorithms that you have implemented in the last two parts (non-convex and convex) to train the
polynomial neural network (which is also equivalent to learning a matrix L). Generate some new test data
points from your classes and evaluate the performance of the learned network on the test dataset. Are
these algorithms able to classify the data? If not, why? Try to justify.
– (BONUS) If you were able to solve a binary classification problem on the random data set in previous
question, try to apply your algorithm on the MNIST dataset (You can restrict your dataset to only 0 and
1 digits). Is there any way to extend this idea for multi-class classification? How?
43. Topic 3: 1-bit Compressed Sensing
References:
1) Plan, Y., and Vershynin, R. (2013) “Robust 1-bit compressed sensing and sparse logistic regression: A convex
programming approach”, IEEE Transactions on Information Theory, 59(1), 482-494.
2) Chen, L., Hassani, S. H., and Karbasi, A. (2017) “Near-Optimal Active Learning of Halfspaces via Query
Synthesis in the Noisy Setting”, In AAAI (pp. 1798-1804).
Compressed Sensing (sparse signal recovery) has been a popular research topic in signal processing since
early 90s. It has many applications in image processing, machine learning, communication, radar, and medical
imaging. The main goal of Compressed Sensing is to recover a sparse signal x (}x}0 ď s) from linear
measurements of the form y “ Ax where A is a measurement matrix. For many practical applications, infinite-
precision linear measurements are not available. Instead, linear measurements are collected with finite precision
(quantization). In the extreme case with 1-bit quantization, the noiseless 1-bit compressed sensing measurements
is modeled as
yi “ signpaTi xq, i “ 1, 2, ¨ ¨ ¨m (4)
In this project, you are expected to do the following:
– Find two different applications of 1-bit Compressive Sensing. Describe why 1-bit is important in those
applications. Examine their noise models and comment on whether you think the noise is modelled
reasonably in each application. Cite references in your report
– Read [1] and compute, by simulation, the Gaussian mean width of the sparse signal set wpSn,sq. Fix s “ 5
and plot wpSn,sq versus n with its upper bound and lower bound given by Lemma 2.3
– Let S1n,s “ tx : x P Sn,s, xi ě 0u, S2n,s “ tx : x P Sn,s, xi “ c or 0u and S3n,s “ tx : x P Sn,s, xi “
˘c or 0u Plot wpS1n,sq, wpS2n,sq and wpS3n,sq with (b). Interpret the result by using Theorem 1.1 in [1]
– Explain why the relaxation from l0-norm to l1-norm is reasonable, by using the Gaussian mean width,
(III.1) and Theorem 1.1
– In [2], the measurement/observation is also t`1,´1u. Compare the problem in [1] and [2] and describe
the mathematical differences between them
– By using the noise model in [2], implement both algorithms in [1] and [2] and compare them numerically
for s “ 1 to s “ n with various n, which one is better and why.
– (BONUS) Develop an algorithm that incorporates the sparse constraint s into the algorithm in [2] and
compare your algorithm with [1].
4. Topic 4: Coordinate Descent, Dykstra’s Algorithm and LASSO
References:
1) R. J. Tibshirani, “Dykstra’s algorithm, ADMM, and Coordinate Descent: Connections, insights, and ex-
tensions”, in Advances in Neural Information Processing Systems (NeurIPS), I. Guyon, U. V. Luxburg, S.
Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc.,
2017.
2) Tseng, Paul, “Convergence of a block coordinate descent method for nondifferentiable minimization”, Journal
of optimization theory and applications 109.3 (2001): 475-494.
3) Hastie, Trevor, Robert Tibshirani, and Martin Wainwright, “Statistical learning with sparsity: the LASSO
and generalizations”. CRC press, 2015.
In this project, you will implement two different algorithms: Dykstra and Block Coordinate descent to solve
certain types of optimization problems. You are expected to implement those algorithms (i.e. write the codes
on your own) on different problems and study their connections. Consider the following two types of problems
(P1) and (P2)
min
β P Rp
1
2
}y ´Xβ}22 `
dÿ
i“1
hipβq (P1)
5where y P Rn,X P Rnˆp, hi : Rp Ñ R
and
min
z P Rn }y ´ z}2 (P2)
s.t. z P C1 X C2 X . . .X Cd
where Ci’s are closed convex sets.
1) Review coordinate-descent algorithm. Briefly describe how and why it works. What are the conditions on
hi for coordinate-descent to succeed on pP1q?
2) Review Dykstra’s algorithm. Describe for what kind of problems it works, and how. How is it different
from alternating projection algorithm? When are they equivalent?
3) What are the relationships between pP1q and pP2q? Under what conditions are they equivalent? When they
are equivalent, how are Dykstra’s algorithm and Coordinate Descent related?
4) Consider the problem pP1q when hi “ λ|βi| and d “ p. This problem is called LASSO. This problem is
often used to recover sparse β˚, |β˚|0 “ s from measurements of the form
y “ Xβ˚ `w
where w denotes additive noise. What is the dual of the LASSO problem? How are the solutions of LASSO
and its dual related?
5) Write a code implementing the coordinate descent algorithm to solve LASSO. What should be the coordinate
descent iterations? Design various experiments (by generating data in suitable ways) for analyzing the
recovery performance of LASSO in different regimes of N, s, p. Analyze both l2 error }β´β˚}2 and support
recovery performance. Clearly describe your settings and experiments. Study the performance convergence
performance of coordinate descent. Compare the speed of your implementation of coordinate descent and
that from CVX for large problem sizes.
6) Solve the dual of LASSO with Dykstra’s algorithm. Verify the relationship between coordinate descent and
Dykstra’s algorithm through your simulations.
7) (BONUS). Choose another problem/application in literature for which you can apply coordinate descent.
Describe coordinate descent iterations. Similarly find and implement another problem which can be solved
using Dykstra.
5. Topic 5: Federated Learning
References:
1) Gafni, Tomer, et al. “Federated Learning: A Signal Processing Perspective”, arXiv preprint arXiv:2103.17150
(2021).
2) T. Li, A. K. Sahu, A. Talwalkar and V. Smith, “Federated Learning: Challenges, Methods, and Future Direc-
tions”, in IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50-60, May 2020, doi: 10.1109/MSP.2020.2975749.
3) Yin, Dong, Yudong Chen,Kannan Ramchandran and Peter Bartlett. “Byzantine-robust distributed learning:
Towards optimal statistical rates”, International Conference on Machine Learning. PMLR, 2018.
4) L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients”, in Advances in Neural Information Processing
Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alch-Buc, E. Fox,and R. Garnett, Eds., vol.
32.Curran Associates, Inc., 2019.
In this project, you will study various task pertaining to Federated Learning. Federated Learning (FL) is a
distributed Machine Learning scheme where multiple users train a model using only local (and possibly private)
data. Then each user shares its trained model with a central server. The server aggregates the model of each
user, judiciously generates one global model and sends that model back to the user devices. You can find more
information about federated learning in references [1] and [2].
1) Find two applications of Federated Learning in literature and include them in your report. Describe a simple
Federative Learning model. Describe how each edge device trains their model, what update they send to the
central server and how central device aggregates them.
62) Implement a simple Federated Learning task of your choice using Fedarative Averaging Scheme from [1].
Describe your data set, loss function etc. Report your results. How quickly does the algorithm converge?
Suppose now you can train the same model using a centralized learning. How quick is it compared to
federated learning?
3) The success of Federated Learning depends on successful participation of edge users in process. Now suppose
a percentage of edge users sends wrong information to the central server during training as described in
[3]. Implement algorithms described in [3] to adapt your federative learning task to this challenge. Conduct
simulations for different percentages of corrupt users and report your results.
4) Even though one of the purposes of Federated Learning is to ensure user privacy, it is possible to infer user
data from their updates to the central server as described in [4]. Implement the algorithm described in [4]
to learn user data from their gradient updates. Do multiple experiments and report your results.
5) (BONUS) Peform literature review on Federated Learning and identify one more associated challenge related
to Federated Learning and discuss possible solutions for that challenge as reported in literature. Implement
the problem that you found interesting and report your results.