ECE599/CS539 Nonlinear Optimization
Homework 5: Newton's Method / Conjugate Gradient Method (Due: 11:59pm on Friday, May 28)
Instruction: Students should provide enough detail of the logical procedure used to derive their answers. Answers without sufficient justification will receive partial or no credit.

Reading: Sections 8.6-8.8 and Chapter 9 of the textbook (Luenberger and Ye).
1. (MATLAB experiment) Consider the following nonlinear optimization problem:

\[
\min_{x,y\in\mathbb{R}} \; f(x,y) = x^2 - 5xy + y^4 - 25x - 8y. \tag{1}
\]

In the lecture, we discussed how Newton's method can be modified to ensure global convergence; read Section 8.8 for more details, especially the part on modifying Newton's method to guarantee global convergence. Implement a modification of Newton's method that ensures both global convergence and quadratic local convergence, and explain the modifications you made. Plot (i) $\|\nabla f(x_k)\|$ versus $k$ and (ii) $f(x_k)$ versus $k$, and provide an interpretation. Check whether the algorithm's output satisfies the first- and second-order necessary conditions for a local minimum point. (A sketch of one possible modification appears at the end of this problem set.)

2. Exercise 10 in Chapter 9 of the textbook. What does this result imply about the convergence rate of the conjugate gradient method, especially in comparison with the convergence rate of Gradient Descent? (You may use the inequality in the textbook's hint for Exercise 10 without proving it. In addition, see the note on Chebyshev polynomials, which was posted together with this problem set. For reference, the standard bounds being compared are restated at the end of this problem set.)

3. (MATLAB experiment) Consider the following minimization problem:

\[
\min_{x\in\mathbb{R}^n} \; f(x) = \frac{1}{2} x^\top Q x - b^\top x + \frac{1}{2}\mu\,(c^\top x)^2, \tag{2}
\]

where $\mu$ is a large positive constant. As discussed in past lectures, this formulation is used to find an approximate solution to minimizing $\frac{1}{2} x^\top Q x - b^\top x$ subject to $c^\top x = 0$. Use the $Q$, $b$, and $c$ given in HW5_data.mat, which was posted together with this problem set. For your information, the matrix $Q$ is a $1000 \times 1000$ positive definite matrix. (MATLAB sketches for parts (a)-(c) appear at the end of this problem set.)

(a) Using MATLAB, solve the first-order necessary condition to identify a local minimum point $x^*$. Argue that this local minimum point is the unique global minimum point. (Remark: when $n$ is very large, this method of obtaining the solution is infeasible due to the memory and computation cost of solving the first-order condition directly.)

(b) Implement Gradient Descent with Backtracking Line Search to find a local minimum point. Use $x_0 = 0$ as the initial point, and explain the parameters of the algorithm you implemented. For $\mu = 1$, plot (i) $\|\nabla f(x_k)\|$ versus $k$, (ii) $f(x_k)$ versus $k$, and (iii) $\|x_k - x^*\|$ versus $k$; produce the same three plots for $\mu = 1000$. Interpret the plots. Note that Gradient Descent may need a large number of iterations to converge when $\mu$ is large; explain why this is expected.

(c) Implement PARTAN with Backtracking Line Search. Specifically, implement the following algorithm, where $g(x)$ denotes $\nabla f(x)^\top$:

Step 1. Starting at $x_0$, let $x_1 = x_0 - \alpha_0 g(x_0)$, with $\alpha_0$ found by Backtracking Line Search.

Step 2. (PARTAN update) For $k = 1, \dots, n-1$:
  i. $y_k = x_k - \alpha_k g(x_k)$ ($\alpha_k$ found by Backtracking Line Search);
  ii. $x_{k+1} = y_k + \beta_k (y_k - x_{k-1})$ ($\beta_k$ found by Backtracking Line Search);
  iii. If $x_{k+1}$ satisfies the stopping criterion, return $x_{k+1}$ and terminate. Otherwise, continue.

Step 3. (Restart) Set $x_0$ to $x_n$ and go to Step 1.

Use $x_0 = 0$ as the initial point, and let $\{\bar{x}_k\}$ denote the sequence of points generated by PARTAN. For $\mu = 1$, plot (i) $\|\nabla f(\bar{x}_k)\|$ versus $k$, (ii) $f(\bar{x}_k)$ versus $k$, and (iii) $\|\bar{x}_k - x^*\|$ versus $k$; produce the same three plots for $\mu = 1000$. Interpret the plots, compare the convergence behavior with that of Gradient Descent in Part (b), and provide interpretations.
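For Problem 1, one common globalization in the spirit of Section 8.8 is to add a multiple of the identity to the Hessian whenever it is not positive definite and to pick the step size by backtracking line search; near a nondegenerate local minimizer the unmodified Newton step with unit step size is eventually accepted, so quadratic local convergence is retained. Below is a minimal sketch of this approach; the starting point, tolerance, and backtracking constants are illustrative assumptions, not part of the assignment.

```matlab
% Modified Newton's method for f(x,y) = x^2 - 5xy + y^4 - 25x - 8y.
% Globalization: add tau*I to the Hessian until a Cholesky factorization
% succeeds; step size chosen by backtracking (Armijo) line search.
f    = @(z) z(1)^2 - 5*z(1)*z(2) + z(2)^4 - 25*z(1) - 8*z(2);
grad = @(z) [2*z(1) - 5*z(2) - 25; -5*z(1) + 4*z(2)^3 - 8];
hess = @(z) [2, -5; -5, 12*z(2)^2];

z = [0; 0];                            % assumed starting point
tol = 1e-8; maxit = 100;
gnorm = zeros(maxit,1); fval = zeros(maxit,1);
for k = 1:maxit
    g = grad(z); H = hess(z);
    gnorm(k) = norm(g); fval(k) = f(z);
    if gnorm(k) < tol, break; end
    tau = 0;
    while true                         % increase tau until H + tau*I is PD
        [R, p] = chol(H + tau*eye(2));
        if p == 0, break; end
        tau = max(2*tau, 1e-4);
    end
    d = -(R \ (R' \ g));               % (modified) Newton direction
    a = 1;                             % backtracking: halve a until the
    while f(z + a*d) > f(z) + 1e-4*a*(g'*d)   % Armijo condition holds
        a = a/2;
    end
    z = z + a*d;
end
semilogy(gnorm(1:k)); xlabel('k'); ylabel('||\nabla f(x_k)||');  % plot (i); use fval(1:k) for (ii)
```

At the returned point, checking the necessary conditions amounts to verifying that norm(grad(z)) is numerically zero and that eig(hess(z)) are nonnegative.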
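For Problem 2, the comparison being asked for is between the standard error bounds of the two methods on a strictly convex quadratic. As a reference (the exact constants in Exercise 10 may differ slightly), the bounds usually derived via the Kantorovich inequality and the Chebyshev-polynomial argument are

\[
\text{GD:}\quad E(x_{k+1}) \le \left(\frac{\kappa-1}{\kappa+1}\right)^{2} E(x_k),
\qquad
\text{CG:}\quad E(x_k) \le 4\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2k} E(x_0),
\]

where $E(x) = \frac{1}{2}(x-x^*)^\top Q (x-x^*)$ and $\kappa$ is the condition number of $Q$. Since $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ is much smaller than $(\kappa-1)/(\kappa+1)$ for large $\kappa$, CG reaches a given accuracy in roughly $\sqrt{\kappa}$ iterations where Gradient Descent needs roughly $\kappa$.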
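For Problem 3(a), the first-order condition is the linear system $(Q + \mu cc^\top)x = b$; since $Q$ is positive definite and $\mu > 0$, $Q + \mu cc^\top$ is positive definite, so $f$ is strictly convex and the stationary point is the unique global minimizer. A minimal sketch, assuming the data file stores column vectors $b$ and $c$ under those names:

```matlab
% Solve the first-order condition: grad f(x) = (Q + mu*c*c')*x - b = 0.
load('HW5_data.mat');               % assumed to provide Q (1000x1000, PD), b, c
mu = 1;                             % repeat with mu = 1000
xstar = (Q + mu*(c*c')) \ b;        % unique global minimizer of (2)
norm(Q*xstar - b + mu*(c'*xstar)*c) % sanity check: should be ~0
```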
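For Problem 3(b), a minimal Gradient Descent sketch with Armijo backtracking, reusing Q, b, c, mu, and xstar from the snippet above; the sufficient-decrease constant 1e-4, halving factor 1/2, initial trial step 1, and tolerance are assumed parameter choices that you should state and may tune.

```matlab
% Gradient Descent with Backtracking Line Search for (2).
ffun = @(x) 0.5*(x'*Q*x) - b'*x + 0.5*mu*(c'*x)^2;
gfun = @(x) Q*x - b + mu*(c'*x)*c;

x = zeros(size(b));                  % x0 = 0, as specified
tol = 1e-6; maxit = 50000;
gnorm = zeros(maxit,1); fval = zeros(maxit,1); err = zeros(maxit,1);
for k = 1:maxit
    g = gfun(x);
    gnorm(k) = norm(g); fval(k) = ffun(x); err(k) = norm(x - xstar);
    if gnorm(k) < tol, break; end
    a = 1;                           % backtracking: halve a until the
    while ffun(x - a*g) > fval(k) - 1e-4*a*(g'*g)   % Armijo condition holds
        a = a/2;
    end
    x = x - a*g;
end
semilogy(err(1:k)); xlabel('k'); ylabel('||x_k - x^*||');  % plot (iii); (i) and (ii) are analogous
```

The slowdown for large $\mu$ is consistent with the fact that the condition number of the Hessian $Q + \mu cc^\top$ grows with $\mu$, and Gradient Descent's contraction factor degrades accordingly.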
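For Problem 3(c), a sketch of the PARTAN loop as specified above, again reusing ffun, gfun, and xstar. One detail is an assumption: the backtracking search for $\beta_k$ along $d = y_k - x_{k-1}$ simply halves the trial value from 1 until the objective decreases, and if no decrease is found, $\beta_k$ collapses to essentially 0, so the step reduces to the plain gradient step.

```matlab
% PARTAN with Backtracking Line Search for (2); ffun, gfun, xstar as above.
n = length(b); tol = 1e-6;
x0 = zeros(n,1);                          % initial point, as specified
gnorm = []; fval = []; err = [];          % history for plots (i)-(iii)
done = false; restarts = 0;
while ~done && restarts < 100             % cap restarts as a safeguard
    % Step 1: x1 = x0 - alpha0*g(x0), alpha0 by backtracking.
    g = gfun(x0); a = 1;
    while ffun(x0 - a*g) > ffun(x0) - 1e-4*a*(g'*g), a = a/2; end
    xprev = x0; xk = x0 - a*g;
    % Step 2: PARTAN updates for k = 1, ..., n-1.
    for k = 1:n-1
        g = gfun(xk);
        gnorm(end+1) = norm(g); fval(end+1) = ffun(xk); err(end+1) = norm(xk - xstar);
        if gnorm(end) < tol, done = true; break; end      % stopping criterion
        a = 1;                                            % alpha_k by backtracking
        while ffun(xk - a*g) > ffun(xk) - 1e-4*a*(g'*g), a = a/2; end
        yk = xk - a*g;
        d = yk - xprev;                                   % acceleration direction
        bt = 1;                                           % beta_k: halve until f decreases
        while bt > 1e-12 && ffun(yk + bt*d) >= ffun(yk)
            bt = bt/2;
        end
        xprev = xk; xk = yk + bt*d;                       % PARTAN step
    end
    x0 = xk; restarts = restarts + 1;                     % Step 3: restart
end
semilogy(err); xlabel('k'); ylabel('||x_k - x^*||');
```

On a quadratic with exact line searches, PARTAN generates the conjugate gradient iterates and terminates in at most $n$ steps; with backtracking this holds only approximately, but the error $\|\bar{x}_k - x^*\|$ should still decay much faster than under Gradient Descent, especially for $\mu = 1000$.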