ECON3203-ECON5403 Final Exam
The exam duration is 2 hours and 30 minutes, plus 10 minutes reading time, for both
the lab part and written part. The buffer time for submission is 20 minutes; this is not
exam time and should be used only for submission. Please make sure you submit on
time; a late penalty of 5% will apply for every 5 minutes late.
The exam has two parts: a Lab part and a Written Part. You should spend about
60% of your time on the Written Part.
For students with Equitable Learning, please submit your work within your allowed time
frame either on Moodle or via email to [email protected]. Late penalties will apply.
Plagiarism and student misconduct are treated very seriously. You are allowed to use
any course materials. This exam material cannot be shared in any form. Students
cannot communicate during the exam. UNSW uses a range of technologies to detect
cheating. Serious misconduct may result in a fail mark for the course, suspension or
permanent exclusion from the university.
The Lab part is given in the Jupyter notebook. The Written Part starts on the next
page. Make sure you submit both parts.
ECON3203-ECON5403 Final Exam (Written Part)
ECON3203 students should attempt questions 1 to 4. Your written exam is out of 35
marks.
ECON5403 students should attempt all questions. Your written exam is out of 40 marks.
In answering questions, be precise in your answer but concise. Do not answer just yes
or no unless asked to do so.
All answers should be correct to 4 decimal places (for example, 0.0121, 12.3444).
You may type the written exam in Word or hand-write it. Please clearly mark the
question numbers.
Question 1 (9 marks)
(a) For large study courses, it is important for the teaching team to be able to identify well
in advance students who are having difficulty in studying the course. This enables the team
to bring those students back on the right track. Using the data from the courses offered in
the past, the team runs a classification algorithm using a logistic regression model in order to
classify students into the “fail” or “pass” categories, based on three predictors: hours studied
per week, mid-term exam marks and number of absences from the class. What statistical
estimation method can be used to estimate this model? Briefly explain what it is and how to
use it.
(b) The table below summarizes the estimates of the logistic regression model from part (a)
for the probability of “pass”:
Variable         Estimate
Intercept        -0.05
Hours studied     0.12
Mid-term          0.03
Absence          -0.37
Under the 0-1 loss, which category would the teaching team classify a student who studies
2 hours per week, has a midterm mark of 45 and 6 absences into?
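The calculation in part (b) can be sketched as follows. This is a minimal illustration, assuming the coefficients in the table enter a standard logistic (sigmoid) link for P(“pass”); the names `coef` and `pass_probability` are hypothetical and are not part of the exam. Under the 0-1 loss, the student is assigned to the more probable class, which corresponds to a threshold of 0.5.

```python
import math

# Coefficients copied from the table in part (b); the model is for P("pass").
coef = {"intercept": -0.05, "hours": 0.12, "midterm": 0.03, "absence": -0.37}

def pass_probability(hours, midterm, absences):
    """Return P(pass | x), assuming a standard logistic link."""
    z = (coef["intercept"]
         + coef["hours"] * hours
         + coef["midterm"] * midterm
         + coef["absence"] * absences)
    return 1.0 / (1.0 + math.exp(-z))

p_pass = pass_probability(hours=2, midterm=45, absences=6)

# Under 0-1 loss, choose the class with the larger probability.
label = "pass" if p_pass > 0.5 else "fail"
print(round(p_pass, 4), label)
```

Here the linear predictor is -0.05 + 0.24 + 1.35 - 2.22 = -0.68, so the estimated probability of “pass” is below 0.5.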
(c) The distribution of a waiting time for an event to occur is often modeled by an exponential
distribution
$$p(y \mid \theta) = \theta e^{-\theta y}, \quad y > 0,$$
with $\theta > 0$ being the parameter to be estimated. Given a dataset $y = \{y_1, \dots, y_n\}$ with
sample mean $\bar{y} = 10$, find the maximum likelihood estimate of $\theta$.
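For part (c), the log-likelihood of n independent exponential observations is n log θ − θ Σ yᵢ, which depends on the data only through the sample mean. The sketch below computes the closed-form maximizer and cross-checks it with a crude grid search. The grid and the helper `avg_loglik` are illustrative assumptions, not something the exam prescribes.

```python
import math

y_bar = 10.0  # sample mean given in part (c)

# Setting the derivative of n*log(theta) - n*theta*y_bar to zero
# gives theta_hat = 1 / y_bar.
theta_hat = 1.0 / y_bar

def avg_loglik(theta, y_bar):
    """Log-likelihood divided by n; it has the same maximizer as the full one."""
    return math.log(theta) - theta * y_bar

# Sanity check: maximize over a coarse grid of candidate theta values.
grid = [k / 1000 for k in range(1, 1001)]
theta_grid = max(grid, key=lambda t: avg_loglik(t, y_bar))

print(theta_hat, theta_grid)
```

Dividing by n does not change the maximizer, which is why the sample size is not needed here.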
Question 2 (12 marks)
We want to fit a multiple linear regression model to a dataset of $n = 97$ observations and
carry out variable selection using the Lasso. We find that $\lambda_{\max} = 0.9$ is a value of the shrinkage
parameter $\lambda$ at which all the coefficients are shrunk to zero. To find the optimal shrinkage, we
create a grid of 10 values for $\lambda$: $0, 0.1, 0.2, \dots, 0.9$, and compute the Lasso estimates
$\hat{\beta}^{\text{lasso}}_\lambda$ at each of these values. Let
$$\hat{\sigma}^2_\lambda = \frac{\lVert y - X\hat{\beta}^{\text{lasso}}_\lambda \rVert^2}{n}$$
be the estimate of the variance of the error term $\epsilon$.
The table below gives the degrees of freedom $df_\lambda$ and $\hat{\sigma}^2_\lambda$.
$\lambda$                  0       0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
$df_\lambda$               8       5       3       3       2       1       1       1       1       0
$\hat{\sigma}^2_\lambda$   0.9033  0.8116  1.8865  4.1080  4.6396  5.1962  5.7991  6.4485  7.1444  7.4611
(a) Find the best value of λ among these 10 values using the BIC criterion.
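The comparison in part (a) can be sketched as below. The exam text does not state which BIC formula is intended, so this example assumes the common Gaussian form BIC(λ) = n log(σ̂²_λ) + log(n) · df_λ; if the course notes define BIC differently, the function `bic` (a hypothetical helper) should be replaced accordingly. The data are copied from the table above.

```python
import math

n = 97
lambdas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
df = [8, 5, 3, 3, 2, 1, 1, 1, 1, 0]
sigma2 = [0.9033, 0.8116, 1.8865, 4.1080, 4.6396,
          5.1962, 5.7991, 6.4485, 7.1444, 7.4611]

def bic(sigma2_hat, dof, n):
    # ASSUMPTION: Gaussian-likelihood BIC, n*log(sigma^2) + log(n)*df.
    # The exam does not specify the formula; adapt to the course definition.
    return n * math.log(sigma2_hat) + math.log(n) * dof

scores = [bic(s2, d, n) for s2, d in zip(sigma2, df)]

# The preferred lambda is the one with the smallest BIC.
best_index = min(range(len(scores)), key=scores.__getitem__)
best_lambda = lambdas[best_index]

for lam, score in zip(lambdas, scores):
    print(f"lambda={lam:.1f}  BIC={score:.4f}")
print("best lambda:", best_lambda)
```

Under this assumed form, the penalty term log(n) · df_λ trades off against the fit term n log(σ̂²_λ), so a larger model is preferred only when it reduces the error variance enough to offset the penalty.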
(b) Explain why variable selection is often necessary in regression and classification.