Project: Regularisation for Least Squares Problems
In the course, we have defined the least squares problem (LSQ) as that of finding the
minimiser x ∈ R^n of ‖Ax − b‖_2, for A ∈ R^(m×n) and b ∈ R^m. Theorem 4.1 tells us that this
minimiser x solves the normal equations

    A^T A x = A^T b.    (1)

For rank(A) = n and m > n, A^T A is invertible, and the unique solution of the normal
equations (1) is

    x = A^† b, where A^† = (A^T A)^(−1) A^T.    (2)
In Assignment A3, we saw how the method of least squares could be used to fit a linear
model of the form b = x_0 + a x_1 to a set of observed data points {a^(i), b^(i)}_{i=1,...,m}, by forming
the Vandermonde matrix A ∈ R^(m×2) with entries A_ij = (a^(i))^(j−1), and solving the normal
equations (1). This process is called simple linear regression, and finds the straight line which
best fits the observations.
Suppose a straight line doesn’t quite fit the observations {a^(i), b^(i)}_{i=1,...,m}. We want to fit
a polynomial of degree (n − 1) to the data, of the form

    b = x_0 + a x_1 + a^2 x_2 + ... + a^(n−1) x_(n−1) = Σ_{j=0}^{n−1} a^j x_j.
We can do this with polynomial regression, which is simply an extension of simple linear regression.
The least squares method for polynomial regression requires forming the Vandermonde
matrix A ∈ R^(m×n), with elements A_ij = (a^(i))^(j−1) for j = 1, ..., n, and solving the normal
equations (1).
Although the model is a polynomial in terms of the input variable a, it is a linear function
of the unknown model parameters xj — this is why we can still perform linear regression.
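As a concrete illustration, the normal-equations fit can be sketched as follows. This is a NumPy sketch rather than the MATLAB scripts the assignment asks for, and the synthetic data below are assumptions, not the assignment's data set:

```python
import numpy as np

# Synthetic data (illustrative assumption): noisy quadratic observations.
rng = np.random.default_rng(0)
m = 50
a = np.linspace(0.0, 6.0, m)
b = 1.0 + 2.0 * a - 0.5 * a**2 + 0.1 * rng.standard_normal(m)

n = 3  # fit a polynomial of degree n - 1 = 2
# Vandermonde matrix with A[i, j] = (a_i)**j for j = 0, ..., n-1
A = np.vander(a, n, increasing=True)

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # fitted coefficients x_0, ..., x_{n-1}
```

Because the model is linear in the coefficients x_j, the same three lines handle any polynomial degree; only n changes.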
(a) In this question, we want to implement the least squares method to fit a polynomial to a
set of data observations. The degree of the best fitting polynomial is unknown.
In a MATLAB script named LSQ.m:
(i) Load data_points_50.m, containing a set of 50 data points with values {a^(i), b^(i)}_{i=1,...,50}
stored in the arrays a and b.
(ii) We want to fit the data with a polynomial of degree (n − 1), for n = 2, . . . , 11. For
each n, set up the Vandermonde matrix A of size m × n.
(iii) For each n, solve the normal equations for x using backslash. Compute the condition
number of ATA in the 2-norm with the cond function in MATLAB. Comment on the
results.
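For intuition about the conditioning experiment in (iii), here is a rough NumPy sketch (not the LSQ.m script itself; the sample points on [0, 6] are an assumption). In the 2-norm, cond(A^T A) = cond(A)^2, so the condition number of the normal equations grows very quickly with n:

```python
import numpy as np

a = np.linspace(0.0, 6.0, 50)  # assumed sample abscissae on [0, 6]
for n in (2, 5, 11):
    A = np.vander(a, n, increasing=True)
    # In the 2-norm, cond(A^T A) = cond(A)^2, so it blows up as n grows
    print(n, np.linalg.cond(A.T @ A))
```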
(b) The matrix ATA is badly conditioned. As we’ve seen in the lectures, we can use the reduced
SVD of A to solve the least squares problem, allowing us to avoid forming ATA.
Write a new MATLAB script named LSQ_SVD.m, adapted from LSQ.m. Instead of solving
the normal equations directly, for each n, compute the reduced SVD of A using the
function svd. Compute x using Algorithm LSQ-SVD.
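The SVD-based solve can be sketched as follows in NumPy (the data here are assumptions; in MATLAB, svd(A,'econ') returns the reduced SVD). The key point is that A^T A is never formed:

```python
import numpy as np

# Hedged sketch of an SVD-based least squares solve; data are assumptions.
rng = np.random.default_rng(1)
m, n = 50, 6
a = np.linspace(0.0, 6.0, m)
b = np.sin(a) + 0.05 * rng.standard_normal(m)
A = np.vander(a, n, increasing=True)

# Reduced SVD: A = U1 @ diag(s) @ Vt, with U1 of size m x n
U1, s, Vt = np.linalg.svd(A, full_matrices=False)

# x = V Sigma1^{-1} U1^T b, computed without ever forming A^T A
x = Vt.T @ ((U1.T @ b) / s)
print(x)
```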
(c) A more complex model is not necessarily a better model. Overfitting occurs when a model
is complex enough to correspond very closely to a particular set of data points, but does
not generalise well, and will not be able to predict future observations. Instead of capturing
the relationship between input and output in a dataset, an overfitted model describes the
statistical noise in the data.
(i) Plot the discrete data points with plot(a,b,'.'). Plot all the resulting polynomials
using the plot_poly function provided on Learn, on the same graph as the data points,
over the interval [0, 6]. Comment on your plots; do the high-degree polynomials fit
the data more closely? Would they be useful to extrapolate the behaviour of the
data beyond the interval over which we have collected observations? Do you observe
overfitting?
(ii) A method to avoid overfitting is regularisation. Instead of minimising ‖Ax − b‖_2^2, we
seek to minimise the new cost function

    ‖Ax − b‖_2^2 + μ‖x‖_2^2,

where μ > 0 is a regularisation parameter. The idea is to introduce a penalty term,
so that the polynomial coefficients x (particularly the high-order coefficients) remain
relatively small.
Show that if x minimises ‖Ax − b‖_2^2 + μ‖x‖_2^2, then it solves the regularised normal
equations

    (A^T A + μI)x = A^T b.    (3)
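One way to sketch the derivation being asked for: the cost is a differentiable quadratic in x, so expand it and set the gradient to zero,

```latex
\phi(x) = \|Ax-b\|_2^2 + \mu\|x\|_2^2
        = x^T A^T A x - 2 b^T A x + b^T b + \mu\, x^T x ,
\qquad
\nabla\phi(x) = 2A^T A x - 2A^T b + 2\mu x .
```

Setting ∇φ(x) = 0 gives (A^T A + μI)x = A^T b, and since A^T A + μI is symmetric positive definite for μ > 0, this stationary point is the unique minimiser.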
(iii) Let A = U_1 Σ_1 V^T be the reduced SVD of A. Show that x is given by

    x = V S U_1^T b,

where the matrix S ∈ R^(n×n) is diagonal, with diagonal entries given by

    s_ii = σ_i / (σ_i^2 + μ),    i = 1, . . . , n,    (4)

with σ_i the singular values of A.
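The regularised solve via Equation (4) can be sketched in NumPy as follows (the data and the value of μ are assumptions); as a sanity check, the result should also satisfy the regularised normal equations (3):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 6
a = np.linspace(0.0, 6.0, m)                   # assumed abscissae
b = np.exp(-a) + 0.1 * rng.standard_normal(m)  # assumed observations
A = np.vander(a, n, increasing=True)
mu = 0.5                                       # assumed regularisation parameter

U1, s, Vt = np.linalg.svd(A, full_matrices=False)

# Diagonal of S from Equation (4): s_ii = sigma_i / (sigma_i^2 + mu)
filt = s / (s**2 + mu)
x = Vt.T @ (filt * (U1.T @ b))
print(x)
```

Note that as μ → 0 the factors σ_i/(σ_i² + μ) tend to 1/σ_i, recovering the unregularised SVD solution, while a large μ damps the contributions of the small singular values.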
(iv) Write a script called LSQ_reg.m, adapted from LSQ_SVD.m, which computes x using
Equation (4) (again, for each n).
(v) Test your script with μ = 5, 0.5, 0.01, and 10^(−5). For each value of μ, produce the same
plots as in Question (i).
How does regularisation affect the fitted polynomials, and the coefficient values?
Which value of μ produces the best results? What happens when μ is too large,
or too small?