ECE 273 Course Projects
General Instructions

1) Deadline for Project Report: June 5, 2021, 11:59 pm. Online submission on Canvas.

2) Guidelines: You should work individually on the project and choose one topic from the list provided below. Alternatively, you can define your own project (see the announcement from Prof. Pal on Canvas). You are expected to submit a project report, along with figures, MATLAB/Python source code, and the datasets with proper citations. You will also present your project results in the form of a short presentation (approximately 15 minutes per student), which will be scheduled during finals week. You are welcome to compose your report in a Jupyter Notebook with live code and visualizations. Wherever applicable, you should write your own code. Any plagiarism will be considered a violation of academic integrity and will be dealt with accordingly. Ideally, we should be able to reproduce the exact results in your report; otherwise you will lose points. Negative results are welcome, but be sure to discuss them in your report.

The report should include the following sections:
• Objective/Goal of the project.
• Background and motivation: Cite relevant references, explain why the problem is important, and describe its applications. You should also cite the datasets being used.
• Method/Techniques.
• Results: Include all figures; you can also use tables to present your results, especially for comparing multiple algorithms.
• Discussion: Provide a critical discussion of your results, and evaluate the strengths and weaknesses of different algorithms and the associated trade-offs.
• Dataset: In your report, include a web link to all datasets you use (if any) and give proper credit. If the datasets are not publicly accessible, submit them with your report in a separate file. If you have more than one file, pack all files into a single .zip file. Don't use formats (like .rar) that require specific software to unpack your file.
• You will use CVX (for MATLAB) or CVXPY (for Python), which come with built-in solvers (optimized for their respective platforms) for convex problems. Since CVX is a software package meant for modeling, you will only need to formulate the optimization problem in a standard form using MATLAB/Python syntax. Refer to your syllabus for resources on how to quickly get started with CVX/CVXPY.

List of Topics

1. Topic 1: Blind Deconvolution

References:
1) Ahmed, A., Recht, B., and Romberg, J. (2014). "Blind Deconvolution Using Convex Programming", IEEE Transactions on Information Theory, vol. 60, no. 3, Mar. 2014.

Blind deconvolution is the problem of recovering two signals given the output of their convolution. The signals cannot be recovered unless certain restrictions are imposed on them. In this project, you will study and implement convex algorithms for solving the blind deconvolution problem under suitable conditions, and evaluate their performance guarantees.

– Study the paper by Ahmed, Recht, and Romberg to understand the formulation of the blind deconvolution problem. Why is the solution to the problem non-unique in general? Can you explain what kinds of ambiguities exist? Under what conditions on the signals (sparsity, subspace structure, etc.) does the problem permit unique solutions? Explain why.
– Why is the problem non-convex? Explain the main idea of this paper, under which the measurements can be expressed as a linear function of a suitably transformed variable. How is this transformed variable related to the constituent signals, and given this transformed variable, how can one obtain the constituent signals? Does the variable transformation make the problem convex? What additional relaxations are necessary to cast it as a convex problem?
– What are the theoretical guarantees of the convex algorithm proposed in the paper? Are they deterministic or probabilistic?
What should the relation be between the number of measurements, the sparsity, and the dimension of the subspace, so that exact (or, in the presence of noise, stable) recovery is possible? Clearly state the mathematical guarantees and spell out the assumptions.
– Implement the convex algorithm proposed in the paper. You should be able to recreate the figures. Test the algorithm by generating phase transition plots, which plot the probability of successful recovery as a function of the number of measurements and the sparsity and/or dimension of the subspace. Do you see a sharp region around which the probability changes from close to zero to close to one? Compare this with the theoretical guarantees from the paper.
– Compare the performance of the blind deconvolution algorithm against non-blind deconvolution (i.e., where you are given one of the constituent signals). Do a literature survey on alternating minimization techniques for blind deconvolution (which are non-convex), and implement one algorithm of your choice. Compare the performance of convex and non-convex blind deconvolution.
– Design numerical experiments to evaluate the performance of the algorithm when one (or more) conditions (such as sparsity or low rank) are violated. Can you comment on the robustness of the algorithm against such violations?
– (BONUS): Extend the algorithm to perform blind deconvolution in 2D. One example is image deblurring. Test the algorithm on an image of your choice (by blurring it with a suitable kernel).

2. Topic 2: Low-Rank Matrix Recovery From Quadratic Measurements (Application in Polynomial Neural Networks)

References:
1) Soltani, Mohammadreza, and Chinmay Hegde. "Towards provable learning of polynomial neural networks using low-rank matrix estimation", International Conference on Artificial Intelligence and Statistics, 2018.
2) Candes, Emmanuel J., and Yaniv Plan.
"Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements", IEEE Transactions on Information Theory, 57(4) (2011): 2342-2359.
3) Recht, B., Fazel, M., and Parrilo, P. A. (2010). "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization", SIAM Review, 52(3), pp. 471-501.
4) Goldstein, Tom, Christoph Studer, and Richard Baraniuk. "FASTA: A generalized implementation of forward-backward splitting", arXiv preprint arXiv:1501.04979 (2015).

In [1], the authors showed that training a neural network with one hidden layer, consisting of fewer units than the input dimension and with a quadratic activation function, can be modeled as a low-rank matrix recovery problem with noisy quadratic measurements:

  min_{L ∈ R^{p×p}}  F(L) = (1/2m) Σ_{i=1}^m (y_i − x_i^T L x_i)^2   s.t.  rank(L) ≤ r        (1)

where {(y_i, x_i)}_{i=1}^m, x_i ∈ R^p, is the input data set, and r is the number of hidden units with activation function σ(z) = z^2. The ground truth matrix L* is constructed from the weights of the first and second layers of the corresponding neural network in the following form:

  L* = Σ_{j=1}^r α_j w_j w_j^T

where w_j is the weight vector of the j-th hidden unit and α ∈ R^r is the weight vector of the second layer. This optimization problem is not convex (why?); however, there are approaches that first convert the original problem to a simpler convex problem and then solve it as a convex optimization problem. The authors in [2,3] have proposed a convex relaxation of the low-rank constraint based on the nuclear norm of a matrix. In particular, consider the following measurement model:

  y = A(L) + n        (2)

where L ∈ R^{p×p} is an unknown matrix, A : R^{p×p} → R^m is a linear mapping with [A(L)]_i = x_i^T L x_i, and n ∈ R^m denotes additive noise. The desired matrix L* (which has p^2 unknown entries) can be recovered from (compressive) measurements (with m ≪ p^2) by solving the following convex problem:

  min_{L ∈ R^{p×p}}  ‖L‖_*   s.t.  ‖A*(e)‖ ≤ λ,  e = y − A(L)        (3)

where A* is the adjoint operator of A, A*(e) = Σ_{i=1}^m e_i x_i x_i^T, ‖·‖ is the operator norm, ‖·‖_* is its dual (i.e., the nuclear norm), and λ is a constant that depends on the size of L and the noise. In this project, you are expected to do all of the following (excluding the BONUS):

– Read the references and explain why solving problem (1) is equivalent to training the corresponding network. Why is it not a convex optimization problem? Why is problem (3) a suitable convex approach for recovering L from the measurements in (2)?
– Generate a random data set {x_i}_{i=1}^m from a normal distribution (you can choose the mean and the covariance), and one ground truth network (based on the architecture given in [1]) that generates the output y_i corresponding to the input x_i. Implement Algorithm 1 in [1] and test it on the data set you have generated. Try data sets of different sizes and different initializations. Can you recover the ground truth L* (or, equivalently, the network)? Plot the probability of success for varying values of m, p, and r, as well as different initializations (you can refer to [1] for some representative plots). Comment on your results. Add noise to the measurements and try different noise powers. Report where the algorithm fails and succeeds.
– Formulate the problem of training the neural network (or, equivalently, finding L*) as a convex problem, based on [2] and [3]. Use the data set you generated in the last part and solve this convex problem with the CVX/CVXPY library. You may need to tune λ by trying different values and cross-validating against the training dataset (read [2] for suitable ways to choose λ). Compare your results with the previous solutions and plots, including the need for initialization. Try different sizes of data sets. How accurate is the convex relaxation for this problem? How sensitive is it to the size of the data set and to noise?
– Generate a new data set where each x_i can come from two different Gaussian distributions (representing two classes) with two different means (you can keep the covariance the same if you wish). Assign each data point a binary label based on its class. (Note that there is no ground truth network in this case.) Run the two algorithms that you implemented in the last two parts (non-convex and convex) to train the polynomial neural network (which is also equivalent to learning a matrix L). Generate some new test data points from your classes and evaluate the performance of the learned network on the test dataset. Are these algorithms able to classify the data? If not, why? Try to justify your answer.
– (BONUS) If you were able to solve the binary classification problem on the random data set in the previous question, try to apply your algorithm to the MNIST dataset (you can restrict your dataset to only the digits 0 and 1). Is there any way to extend this idea to multi-class classification? How?

3. Topic 3: 1-bit Compressed Sensing

References:
1) Plan, Y., and Vershynin, R. (2013). "Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach", IEEE Transactions on Information Theory, 59(1), 482-494.
2) Chen, L., Hassani, S. H., and Karbasi, A. (2017). "Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting", in AAAI (pp. 1798-1804).

Compressed sensing (sparse signal recovery) has been a popular research topic in signal processing since the early 90s. It has many applications in image processing, machine learning, communication, radar, and medical imaging. The main goal of compressed sensing is to recover a sparse signal x (‖x‖_0 ≤ s) from linear measurements of the form y = Ax, where A is a measurement matrix. For many practical applications, infinite-precision linear measurements are not available. Instead, linear measurements are collected with finite precision (quantization).
In the extreme case of 1-bit quantization, the noiseless 1-bit compressed sensing measurements are modeled as

  y_i = sign(a_i^T x),  i = 1, 2, ..., m        (4)

In this project, you are expected to do the following:
– Find two different applications of 1-bit compressed sensing. Describe why 1-bit measurements are important in those applications. Examine their noise models and comment on whether you think the noise is modeled reasonably in each application. Cite references in your report.
– Read [1] and compute, by simulation, the Gaussian mean width of the sparse signal set, w(S_{n,s}). Fix s = 5 and plot w(S_{n,s}) versus n together with its upper and lower bounds given by Lemma 2.3.
– Let S^1_{n,s} = {x : x ∈ S_{n,s}, x_i ≥ 0}, S^2_{n,s} = {x : x ∈ S_{n,s}, x_i = c or 0}, and S^3_{n,s} = {x : x ∈ S_{n,s}, x_i = ±c or 0}. Plot w(S^1_{n,s}), w(S^2_{n,s}), and w(S^3_{n,s}) together with the plot from (b). Interpret the results using Theorem 1.1 in [1].
– Explain why the relaxation from the l0-norm to the l1-norm is reasonable, using the Gaussian mean width, (III.1), and Theorem 1.1.
– In [2], the measurements/observations are also in {+1, −1}. Compare the problems in [1] and [2] and describe the mathematical differences between them.
– Using the noise model in [2], implement both algorithms in [1] and [2] and compare them numerically for s = 1 to s = n with various n. Which one is better, and why?
– (BONUS) Develop an algorithm that incorporates the sparsity constraint s into the algorithm in [2] and compare your algorithm with [1].

4. Topic 4: Coordinate Descent, Dykstra's Algorithm, and LASSO

References:
1) R. J. Tibshirani, "Dykstra's algorithm, ADMM, and coordinate descent: Connections, insights, and extensions", in Advances in Neural Information Processing Systems (NeurIPS), I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30, Curran Associates, Inc., 2017.
2) Tseng, Paul, "Convergence of a block coordinate descent method for nondifferentiable minimization", Journal of Optimization Theory and Applications, 109(3) (2001): 475-494.
3) Hastie, Trevor, Robert Tibshirani, and Martin Wainwright, "Statistical Learning with Sparsity: The LASSO and Generalizations", CRC Press, 2015.

In this project, you will implement two different algorithms, Dykstra's algorithm and block coordinate descent, to solve certain types of optimization problems. You are expected to implement these algorithms (i.e., write the code on your own) for different problems and study their connections. Consider the following two types of problems, (P1) and (P2):

  min_{β ∈ R^p}  (1/2)‖y − Xβ‖_2^2 + Σ_{i=1}^d h_i(β)        (P1)

where y ∈ R^n, X ∈ R^{n×p}, h_i : R^p → R, and

  min_{z ∈ R^n}  ‖y − z‖_2   s.t.  z ∈ C_1 ∩ C_2 ∩ ... ∩ C_d        (P2)

where the C_i are closed convex sets.

1) Review the coordinate descent algorithm. Briefly describe how and why it works. What are the conditions on h_i for coordinate descent to succeed on (P1)?
2) Review Dykstra's algorithm. Describe for what kind of problems it works, and how. How is it different from the alternating projection algorithm? When are they equivalent?
3) What are the relationships between (P1) and (P2)? Under what conditions are they equivalent? When they are equivalent, how are Dykstra's algorithm and coordinate descent related?
4) Consider problem (P1) when h_i = λ|β_i| and d = p. This problem is called the LASSO. It is often used to recover a sparse β*, ‖β*‖_0 = s, from measurements of the form y = Xβ* + w, where w denotes additive noise. What is the dual of the LASSO problem? How are the solutions of the LASSO and its dual related?
5) Write code implementing the coordinate descent algorithm to solve the LASSO. What should the coordinate descent iterations be? Design various experiments (by generating data in suitable ways) to analyze the recovery performance of the LASSO in different regimes of n, s, and p. Analyze both the l2 error ‖β − β*‖_2 and the support recovery performance.
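For the coordinate descent iterations above: with h_i = λ|β_i|, each one-dimensional subproblem has a closed-form solution given by soft thresholding. A minimal sketch on synthetic noiseless data (the sizes, λ, and the iteration count are illustrative choices):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.| (solves the 1-D LASSO subproblem)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_cycles=200):
    """Cyclic coordinate descent for min_beta 0.5*||y - X beta||_2^2 + lam*||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)      # ||X_j||_2^2 for each column
    resid = y - X @ beta               # running residual
    for _ in range(n_cycles):
        for j in range(p):
            resid += X[:, j] * beta[j]           # remove coordinate j's contribution
            rho = X[:, j] @ resid                # correlation with partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]           # add the updated contribution back
    return beta

# Small synthetic check: sparse ground truth, noiseless measurements
rng = np.random.default_rng(0)
n, p, s = 50, 20, 3
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:s] = [2.0, -1.5, 1.0]
y = X @ beta_star
beta_hat = lasso_cd(X, y, lam=0.1)
print(np.linalg.norm(beta_hat - beta_star))
```

Maintaining the running residual makes each coordinate update O(n), which is what makes coordinate descent competitive with generic solvers at large problem sizes.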
Clearly describe your settings and experiments. Study the convergence behavior of coordinate descent. Compare the speed of your implementation of coordinate descent with that of CVX for large problem sizes.
6) Solve the dual of the LASSO with Dykstra's algorithm. Verify the relationship between coordinate descent and Dykstra's algorithm through your simulations.
7) (BONUS) Choose another problem/application from the literature to which you can apply coordinate descent. Describe the coordinate descent iterations. Similarly, find and implement another problem that can be solved using Dykstra's algorithm.

5. Topic 5: Federated Learning

References:
1) Gafni, Tomer, et al., "Federated Learning: A Signal Processing Perspective", arXiv preprint arXiv:2103.17150 (2021).
2) T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, "Federated Learning: Challenges, Methods, and Future Directions", IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50-60, May 2020, doi: 10.1109/MSP.2020.2975749.
3) Yin, Dong, Yudong Chen, Kannan Ramchandran, and Peter Bartlett, "Byzantine-robust distributed learning: Towards optimal statistical rates", International Conference on Machine Learning, PMLR, 2018.
4) L. Zhu, Z. Liu, and S. Han, "Deep leakage from gradients", in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32, Curran Associates, Inc., 2019.

In this project, you will study various tasks pertaining to Federated Learning. Federated Learning (FL) is a distributed machine learning scheme in which multiple users train a model using only local (and possibly private) data. Each user then shares its trained model with a central server. The server aggregates the users' models, judiciously generates one global model, and sends that model back to the user devices. You can find more information about federated learning in references [1] and [2].
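The broadcast / local-training / aggregation loop described above can be sketched end to end on a toy least-squares model. Everything here (number of users, the linear model, the local optimizer, learning rate, and round counts) is an illustrative assumption, not a prescription from [1] or [2]:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_local, d = 5, 40, 3   # illustrative: 5 users, 40 samples each, 3 features

# A common ground-truth linear model; each user holds a private local dataset
w_star = rng.standard_normal(d)
local_data = []
for _ in range(n_users):
    A = rng.standard_normal((n_local, d))
    b = A @ w_star + 0.01 * rng.standard_normal(n_local)
    local_data.append((A, b))

def local_update(w, A, b, lr=0.05, epochs=10):
    """A few gradient descent steps on the user's private least-squares loss."""
    for _ in range(epochs):
        w = w - lr * A.T @ (A @ w - b) / len(b)
    return w

# Federated averaging: broadcast the global model, train locally, average the results
w = np.zeros(d)
for _ in range(30):
    updates = [local_update(w, A, b) for A, b in local_data]
    w = np.mean(updates, axis=0)
print(np.linalg.norm(w - w_star))
```

For the centralized-versus-federated comparison asked for in the tasks below, the same data stacked into one matrix and trained with the same optimizer gives a direct baseline.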
1) Find two applications of Federated Learning in the literature and include them in your report. Describe a simple Federated Learning model. Describe how each edge device trains its model, what update it sends to the central server, and how the central server aggregates the updates.
2) Implement a simple Federated Learning task of your choice using the Federated Averaging scheme from [1]. Describe your data set, loss function, etc. Report your results. How quickly does the algorithm converge? Now suppose you train the same model using centralized learning. How quick is it compared to federated learning?
3) The success of Federated Learning depends on the successful participation of edge users in the process. Now suppose a percentage of edge users sends wrong information to the central server during training, as described in [3]. Implement the algorithms described in [3] to adapt your federated learning task to this challenge. Conduct simulations for different percentages of corrupt users and report your results.
4) Even though one of the purposes of Federated Learning is to ensure user privacy, it is possible to infer user data from their updates to the central server, as described in [4]. Implement the algorithm described in [4] to learn user data from their gradient updates. Do multiple experiments and report your results.
5) (BONUS) Perform a literature review on Federated Learning, identify one more associated challenge, and discuss possible solutions for that challenge as reported in the literature. Implement the problem that you found interesting and report your results.
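For task 3), a minimal sketch of the core idea in [3]: replace the server's mean aggregation with a coordinate-wise median, so that a minority of corrupt updates cannot drag the aggregate arbitrarily far. The attack model here (corrupt users add a large constant offset) and all sizes are illustrative assumptions; [3] analyzes this and other attacks, plus a trimmed-mean variant:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_bad, d = 11, 3, 4   # illustrative: 3 of 11 users are corrupt

# Honest users report noisy copies of the true update; corrupt users add a
# large offset (one simple Byzantine attack among many possible ones)
g_true = rng.standard_normal(d)
reports = [g_true + 0.1 * rng.standard_normal(d) for _ in range(n_users - n_bad)]
reports += [g_true + 100.0 for _ in range(n_bad)]
reports = np.array(reports)

mean_agg = reports.mean(axis=0)            # plain federated averaging
median_agg = np.median(reports, axis=0)    # coordinate-wise median, as in [3]

err_mean = np.linalg.norm(mean_agg - g_true)
err_median = np.linalg.norm(median_agg - g_true)
print(err_mean, err_median)
```

The mean is pulled far from the true update by the three corrupt reports, while the coordinate-wise median stays close, since a majority of honest values surrounds it in every coordinate.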