ETF5952 Quantitative Methods for Risk Analysis
ASSIGNMENT 2
Important Instructions
• This assignment comprises 20% of the assessment for ETF5952. This is an individual, NOT a syndicate,
assignment. On the Assignment Cover Sheet, read the references to plagiarism and collusion from University
Statute 4.1, Part III - Academic Misconduct.
• Answer all questions, and start each question on a new page. Your assignment must be typed, and
you must submit a PDF file (A4 pages) with an Assignment Cover Sheet (from the “ASSIGNMENTS”
section of Moodle). To confirm later that your upload was successful, go to the “ASSIGNMENTS” section and
click on the “Assignment 2” upload link. The uploaded file’s name will be shown.
• If you have a valid reason for not meeting the deadline, you will be asked to submit what you have done by
the due date and will receive a grade relative to opportunity. Without a valid reason, 10% of the assignment’s
allocated marks will be deducted for each day that it is late.
• Submit one PDF file only. Do NOT submit/attach R scripts or output files. Do not submit your assignment
in a folder.
• You should summarize what you obtain to answer the questions, instead of providing all code and output.
If you provide too much output relative to the questions, we will consider that you may not have understood
the questions, and your answers will be subject to a point deduction.
• If you have questions regarding the materials, you are encouraged to use our consultation sessions. The
course email should be used only for pointing out typos and for personal matters.
Question 1 (35 points: 5+10+10+10)
In AS1, we analyzed vaccination data on daily doses administered in Australia. The data set, “AU dose.csv”¹,
contains:
• Date: date (format: Day/Month)
• First.doses: the number of vaccinated individuals (one shot)
• Second.doses: the number of vaccinated individuals (two shots)
• X70..adults: the number of vaccinated individuals needed to reach 70% of the adult population
• X80..adults: the number of vaccinated individuals needed to reach 80% of the adult population
We compare our predictions with realized data, which is contained in “AU dose2.csv”.
1. In AS1, we estimated a model with time trends (t, t² and t³). Report a figure displaying the autocorrelations
of the residuals from the estimated model up to lag 20. Explain the statistical significance of the
autocorrelations (no more than 20 words).
2. Obtain predicted values from the model of Q1.1 and report the predicted and realized values in a figure.
You need only report the out-of-sample period (i.e. you do not need to display the in-sample period). Explain
the difference between predicted and actual outcomes (no more than 30 words).
3. Estimate an AR(1) model with time trends (t, t² and t³) and report the result. Present a figure of the
autocorrelations of the residuals up to lag 20. Report the values of AIC and BIC for the model in Q1.1 and for
this model. Based on the results (ACF and AIC/BIC), explain which model is preferred and why (no more than 30 words).
4. Obtain predicted values from the model of Q1.3 and report the predicted and realized values in a figure,
as in Q1.2. Explain the difference between predicted and actual outcomes (no more than 30 words).
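
The residual autocorrelation check used in Q1.1 and Q1.3, and the AIC/BIC comparison in Q1.3, can be sketched as follows. This is a minimal illustration in Python on synthetic residuals, not on the AU dose data. The function names acf and gaussian_aic_bic, the AR coefficient 0.7, and the two residual sums of squares are assumptions made purely for illustration. The band ±1.96/√n is the usual approximate 95% band under white noise, and the way parameters are counted in k varies between software packages.

```python
import math
import random


def acf(resid, max_lag=20):
    """Sample autocorrelations r_0 .. r_max_lag of a residual series."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def gaussian_aic_bic(rss, n, k):
    """AIC and BIC from the Gaussian log-likelihood of a least-squares fit.

    k is the number of estimated parameters (a counting convention; some
    software also counts the error variance).
    """
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik


# Synthetic (made-up) residuals with AR(1) dependence, coefficient 0.7.
random.seed(0)
resid = [0.0]
for _ in range(299):
    resid.append(0.7 * resid[-1] + random.gauss(0, 1))

r = acf(resid, max_lag=20)
band = 1.96 / math.sqrt(len(resid))  # approximate 95% band under white noise
significant = [k for k in range(1, 21) if abs(r[k]) > band]
print("lags outside the band:", significant)

# Hypothetical residual sums of squares for two fits, both treated as n = 300.
aic_trend, bic_trend = gaussian_aic_bic(rss=450.0, n=300, k=4)  # 1, t, t^2, t^3
aic_ar1, bic_ar1 = gaussian_aic_bic(rss=300.0, n=300, k=5)      # plus AR(1) term
print("trend only:", round(aic_trend, 1), round(bic_trend, 1))
print("trend + AR(1):", round(aic_ar1, 1), round(bic_ar1, 1))
```

Autocorrelations outside the band suggest dependence left in the residuals, and lower AIC/BIC values indicate the preferred fit. In practice an AR(1) fit may drop the first observation, so the two models should be compared on the same sample.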
Question 2 (30 points: 5 + 10 + 5 + 10)
In the United States, the Home Mortgage Disclosure Act requires financial institutions to provide mortgage data to
the public. The data set, Q2.csv, contains 2,380 observations with 14 variables:
• deny: Was the mortgage denied?
• pirat: Payments to income ratio.
• hirat: Housing expense to income ratio.
• lvrat: Loan to value ratio.
• chist: Credit history: consumer payments.
• mhist: Credit history: mortgage payments.
• phist: Public bad credit record?
• unemp: 1989 Massachusetts unemployment rate in applicant’s industry.
• selfemp: Is the individual self-employed?
• insurance: Was the individual denied mortgage insurance?
• condomin: Is the unit a condominium?
• afam: Is the individual African-American?
• single: Is the individual single?
• hschool: Does the individual have a high-school diploma?
¹ Data source for “AU dose.csv”: http://www.covid19data.com.au/vaccines
In this question, we want to understand the determinants of mortgage decisions. Set the seed to “98765” before
starting the analysis.
1. Estimate a classification tree in which the dependent variable is deny and the regressors are pirat, hirat, and
lvrat. Report the estimated tree in a plot. Describe the characteristics of the node with the highest proportion
of deny = yes (no more than 20 words).
2. Estimate a classification tree in which the dependent variable is deny and the regressors are all remaining
variables in the data set. Report the estimated tree in a plot. Describe the characteristics of the node with the
lowest proportion of deny = yes (no more than 30 words).
3. Estimate a logistic regression model in which the dependent variable is deny and the regressors are all remaining
variables in the data set. Report the estimation result. Explain the effect of the payments to income ratio on
deny (no more than 30 words).
4. Apply LASSO to estimate a logistic regression model in which the dependent variable is deny and the regressors
are all remaining variables in the data set and their pairwise interactions. Use BIC to select the tuning parameter
and then report only the non-zero coefficients. How many regressors have non-zero coefficients? Explain the
effect of insurance on deny (no more than 30 words).
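
The size of the regressor set in Q2.4 and the BIC-based choice of tuning parameter can be sketched as follows. This is a minimal illustration in Python rather than a fitted model. The interaction naming "a:b", the helper lasso_bic, and every number in the candidate path (tuning parameters, log-likelihoods, counts of non-zero coefficients) are made up for illustration. The count also treats each variable as a single column, whereas categorical variables coded as several dummy columns would produce more.

```python
import math
from itertools import combinations

# Regressor names as listed in the variable description (deny is the outcome).
regressors = ["pirat", "hirat", "lvrat", "chist", "mhist", "phist", "unemp",
              "selfemp", "insurance", "condomin", "afam", "single", "hschool"]

# One product term per unordered pair, named "a:b" (a naming choice made here).
interactions = [f"{a}:{b}" for a, b in combinations(regressors, 2)]
print(len(regressors), len(interactions), len(regressors) + len(interactions))
# prints: 13 78 91


def lasso_bic(loglik, n_nonzero, n_obs):
    """BIC using the number of non-zero coefficients as degrees of freedom
    (a common convention, assumed here)."""
    return -2.0 * loglik + n_nonzero * math.log(n_obs)


# Hypothetical path: (tuning parameter, log-likelihood, non-zero coefficients).
# These values are invented and do not come from Q2.csv.
path = [(0.10, -800.0, 3), (0.03, -650.0, 9), (0.01, -640.0, 30), (0.003, -635.0, 60)]
n_obs = 2380  # number of observations stated for Q2.csv
best_lambda = min(path, key=lambda p: lasso_bic(p[1], p[2], n_obs))[0]
print("tuning parameter with the smallest BIC:", best_lambda)
```

BIC trades fit against the number of retained coefficients, so a denser model is chosen only when its gain in log-likelihood outweighs the log(n) penalty per extra coefficient.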
Question 3 (35 points: 5 + 10 + 10 + 10)
We consider the effect of price on cigarette consumption, using US state-level data (Q3.csv):
• state: Factor indicating state.
• year: Factor indicating year.
• population: State population.
• packs: Number of packs per capita.
• income: State personal income (total, nominal).
• tax: Average state, federal and average local excise taxes for fiscal year.
• price: Average price during fiscal year, including sales tax.
Set the seed to “12345” before starting the analysis.
1. Estimate a linear regression model by regressing the log of packs on the log of price and the remaining variables
in the data set. Report the estimation result and explain the effect of price on the number of packs per capita
(no more than 30 words).
2. Re-estimate the linear regression model of Q3.1, additionally including squared income and tax. Report the
estimation result and explain the effect of price on the number of packs per capita (no more than 30 words).
3. We are interested in the relation between the log of packs and the log of price; the other variables used in Q3.2
are control variables. Apply LASSO to select control variables (all regressors except the log of price) in the
linear regression model of Q3.2, using cross-validation to choose the tuning parameter (minimum CV error).
Report only the estimated effect of price on the number of packs per capita and explain
the result (no more than 30 words).
4. Explain why the estimation result in Q3.3 cannot be interpreted as a causal effect (no more than 40 words).
Apply double machine learning to estimate the causal effect in the model of Q3.3, using cross-validation to
choose the tuning parameter (minimum CV error). Report only the estimated effect of price
on the number of packs per capita and explain the result (no more than 30 words).
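
The partialling-out idea behind double machine learning in Q3.4 can be sketched as follows. This is a minimal illustration in Python on synthetic data with a single control, not on Q3.csv. To keep the sketch dependency-free, a simple least-squares line stands in for the LASSO nuisance fits that the question asks for. The function names ols_line and dml_partial_out, the true effect of -1.0, and the data-generating coefficients are all assumptions made for illustration.

```python
import random


def ols_line(x, y):
    """Intercept and slope of a simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def dml_partial_out(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    u = [0.0] * n  # residual of y after predicting it from x
    v = [0.0] * n  # residual of d after predicting it from x
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Nuisance fits use only the other fold(s): this is the cross-fitting step.
        ay, by = ols_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = ols_line([x[i] for i in train], [d[i] for i in train])
        for i in held_out:
            u[i] = y[i] - (ay + by * x[i])
            v[i] = d[i] - (ad + bd * x[i])
    # Regress the outcome residual on the treatment residual (no intercept).
    return sum(vi * ui for vi, ui in zip(v, u)) / sum(vi * vi for vi in v)


# Synthetic data: known effect theta = -1.0 and one confounding control x.
random.seed(12345)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
d = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [-1.0 * di + 2.0 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat = dml_partial_out(y, d, x)
naive = ols_line(d, y)[1]  # ignores x, so it absorbs the confounding through x
print("cross-fitted estimate:", round(theta_hat, 3))
print("naive slope:", round(naive, 3))
```

In this synthetic setting the cross-fitted estimate should land near the true value of -1.0, while the naive slope does not, because x drives both d and y. With many controls, the two ols_line calls would be replaced by a regularized learner such as LASSO with a cross-validated tuning parameter.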