QBUS2810 Statistical Modelling for Business
Individual Assignment
This assignment is worth 20% of your final mark in the unit.
The deadline is Monday, March 28th by 11:59pm Sydney time.
Submission is via TurnItIn on Canvas.
Requirements:
• Complete your entire assignment in Jupyter Notebook, including your code and
markdown sections for your written answers. Use LaTeX in markdown sections
where needed.
• Submit the resulting downloaded HTML file as your entire assignment. Take care
with presentation in this file; however, unavoidable error messages and
page formatting issues will be ignored in marking.
• Only relevant analysis outputs (graphs, tables, etc.) should appear in the
assignment file, and all output should appear together with the discussion of that
output.
Task 1 (30 marks). Business problem:
This assignment follows the analysis conducted in the lectures regarding the dependence
between earnings and asset returns for companies listed on the NYSE. You will assess
whether earnings in one year (say t−1) affect asset returns in the subsequent year (say
t), and in particular whether returns are typically higher following positive, compared
to negative, earnings years and also assess whether there may be a linear relationship
between returns and lagged earnings.
Data: The data file for the analysis is “SampleData from US 90 08 wk3.csv” which
was sampled from “US 90 08 wk3.csv”.
Questions:
(a) Conduct an appropriate exploratory analysis on the asset returns, both individually
and in terms of one of the primary questions being considered in this assignment:
are returns in the subsequent year t typically higher following positive, compared to
negative, earnings years in year t − 1? Discuss any cleaning of the data you did,
including why and how you did it, or why you did not do it. (3 marks)
(b) Conduct the appropriate t-test (with α = 0.05), median and Mann-Whitney tests,
to assess whether returns are typically higher following positive, compared to negative,
earnings years. For median tests, use two-sided testing. Assess all assumptions made.
(10 marks)
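As an illustration of how the three tests named in part (b) can be called in a notebook, here is a minimal sketch using SciPy. The two arrays are synthetic stand-ins, and their names (`ret_after_pos`, `ret_after_neg`) are hypothetical; the real data file and its column names are not reproduced here. The Welch (unequal variance) form of the t-test is one possible choice, not necessarily the one expected, and checking the assumptions of each test is still up to you.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption): heavy-tailed "returns" for two groups.
rng = np.random.default_rng(0)
ret_after_pos = 0.10 + 0.3 * rng.standard_t(df=4, size=300)  # after positive earnings years
ret_after_neg = 0.02 + 0.4 * rng.standard_t(df=4, size=200)  # after negative earnings years

# Two-sample Welch t-test, one-sided: H1 is mean(pos) > mean(neg).
t_stat, t_p = stats.ttest_ind(
    ret_after_pos, ret_after_neg, equal_var=False, alternative="greater"
)

# Mood's median test (two-sided): do the groups share a common median?
med_stat, med_p, grand_median, table = stats.median_test(ret_after_pos, ret_after_neg)

# Mann-Whitney U test, two-sided.
u_stat, u_p = stats.mannwhitneyu(ret_after_pos, ret_after_neg, alternative="two-sided")

alpha = 0.05
for name, p in [("t-test", t_p), ("median test", med_p), ("Mann-Whitney", u_p)]:
    print(f"{name}: p = {p:.4f}, reject H0 at alpha = {alpha}: {p < alpha}")
```

With the real data, the two arrays would be the year-t returns split by the sign of year t−1 earnings.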
(c) Which test’s result do you believe the most in part (b)? Discuss and explain. (2
marks)
(d) Conduct an appropriate exploratory analysis to assess whether there may be a
linear relationship between returns and lagged earnings. (3 marks)
(e) Conduct a simple linear regression analysis, using OLS estimation, for returns on
lagged earnings. Fully assess all assumptions of OLS. Also obtain the LAD estimates
and list and assess the assumptions of LAD. Discuss any cleaning of the data you did,
including why and how you did it, or why you didn’t do it. (9 marks)
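The difference between the OLS and LAD criteria in part (e) can be sketched as follows. This is an illustration on synthetic data with hypothetical variable names, not the course's implementation: OLS is solved by least squares in NumPy, and LAD is written as a linear programme and solved with `scipy.optimize.linprog`. In a notebook, a regression library such as statsmodels (OLS, and quantile regression at the median for LAD) would be the more usual route.

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic stand-in data (assumption): heavy-tailed errors around a linear trend.
rng = np.random.default_rng(4)
n = 200
lagged_earnings = rng.normal(size=n)
returns = 0.05 + 0.2 * lagged_earnings + 0.3 * rng.standard_t(df=3, size=n)

X = np.column_stack([np.ones(n), lagged_earnings])  # intercept and slope columns

# OLS: minimise the sum of squared residuals.
ols_coef, *_ = np.linalg.lstsq(X, returns, rcond=None)

# LAD: minimise the sum of absolute residuals.
# Write each residual as u_i - v_i with u_i, v_i >= 0 and minimise sum(u_i + v_i)
# subject to b0 + b1 * x_i + u_i - v_i = y_i.
c = np.concatenate([np.zeros(2), np.ones(2 * n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * 2 + [(0, None)] * (2 * n)
lp = linprog(c, A_eq=A_eq, b_eq=returns, bounds=bounds, method="highs")
lad_coef = lp.x[:2]

ols_resid = returns - X @ ols_coef
lad_resid = returns - X @ lad_coef
print("OLS intercept, slope:", ols_coef)
print("LAD intercept, slope:", lad_coef)
print("Sum of |residuals|  OLS:", np.abs(ols_resid).sum(), " LAD:", np.abs(lad_resid).sum())
print("Sum of residuals^2  OLS:", (ols_resid ** 2).sum(), " LAD:", (lad_resid ** 2).sum())
```

Each fit wins on its own criterion: the LAD line has the smaller sum of absolute residuals and the OLS line has the smaller sum of squared residuals, which is why the two can disagree when the data contain outliers.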
(f) Write a brief (< 0.5 page) report summarising and discussing your findings and
conclusions in layman’s terms. Include a discussion of whether you would recommend
an investment strategy based on your findings. (3 marks)
Task 2 (20 marks). Theoretical derivations:
Consider the population SLR model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
and an observed random sample of data $(y_1, x_1), \ldots, (y_n, x_n)$ from that model. An
OLS regression is run on this data.
Questions:
(a) Show that under LSA 1, the OLS slope estimator can be written as
$\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} a_i \varepsilon_i$, where
$a_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$.
Hint: take the formula for $\hat{\beta}_1$ from slide 13 of Lecture 4, write it in terms of
$y_i$ and $a_i$, and then plug in the formula for $y_i$ under LSA 1. (3 marks)
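A numerical sanity check of the identity in part (a) can help before attempting the algebra. The sketch below simulates one sample from the SLR model with illustrative parameter values (assumptions, not taken from the assignment) and compares the OLS slope with $\beta_1 + \sum a_i \varepsilon_i$. The slope formula used is the usual covariance-over-variance form, which is assumed to match the lecture slide. This check is not a substitute for the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
beta0, beta1 = 1.0, 2.0              # illustrative true parameters (assumption)
x = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)  # simulated errors, known here only because we generated them
y = beta0 + beta1 * x + eps

dx = x - x.mean()
a = dx / np.sum(dx ** 2)             # the weights a_i from part (a)

# Usual OLS slope: sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2).
beta1_hat = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

# Right-hand side of the identity: beta1 + sum(a_i * eps_i).
decomposition = beta1 + np.sum(a * eps)

print(beta1_hat, decomposition)
print("sum of a_i:", a.sum(), " sum of a_i * x_i:", np.sum(a * x))
```

The two printed properties of the weights, that they sum to zero and that $\sum a_i x_i = 1$, are the facts that make the intercept and slope terms simplify in the derivation.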
(b) Show that under LSA 3 and 5 the variance of the OLS estimator of $\beta_1$ can be
written as $V(\hat{\beta}_1 \mid x) = \sigma^2 \sum_{i=1}^{n} a_i^2$. Where did you use each assumption? Hint: use
the result in part (a) to write out the variance in a general form and see what simplifies
and why. (5 marks)
(c) Show that the formula in part (b) can be equivalently written as
$V(\hat{\beta}_1 \mid x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$.
(3 marks)
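The variance formula can also be checked by simulation. The sketch below holds the regressors fixed, draws many samples with independent, constant-variance errors, and compares the empirical variance of the OLS slope with $\sigma^2 / \sum (x_i - \bar{x})^2$. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 20000
beta0, beta1, sigma = 1.0, 2.0, 0.7   # illustrative values (assumption)

x = rng.uniform(0, 10, size=n)        # regressors held fixed across replications
dx = x - x.mean()
sxx = np.sum(dx ** 2)

# Each row is one simulated sample with independent, constant-variance errors.
eps = rng.normal(scale=sigma, size=(reps, n))
y = beta0 + beta1 * x + eps

# OLS slope for every replication: sum((x - xbar) * (y - ybar)) / Sxx.
slopes = ((y - y.mean(axis=1, keepdims=True)) @ dx) / sxx

simulated_var = slopes.var(ddof=1)
formula_var = sigma ** 2 / sxx
weights_form = sigma ** 2 * np.sum((dx / sxx) ** 2)   # sigma^2 * sum(a_i^2)

print("simulated:", simulated_var)
print("sigma^2 / Sxx:", formula_var)
print("sigma^2 * sum(a_i^2):", weights_form)
```

The last two numbers agree up to floating-point rounding, while the simulated variance matches them only up to Monte Carlo error.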
(d) Show that if LSA 5 is relaxed (but LSA 3 is maintained) the variance formula
can be written as $V(\hat{\beta}_1 \mid x) = \sum_{i=1}^{n} a_i^2 V(\varepsilon_i \mid x_i)$. Argue why
$\sum_{i=1}^{n} a_i^2 e_i^2$ is a reasonable
estimator of $V(\hat{\beta}_1 \mid x)$ in this case. (4 marks)
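To see how the estimator $\sum a_i^2 e_i^2$ behaves, the sketch below simulates errors whose spread grows with $x$ (an assumed form of heteroskedasticity chosen only for illustration) and computes it from the OLS residuals. It also confirms numerically that, for the slope in a simple regression, this quantity coincides with the corresponding entry of the matrix "sandwich" estimator often called HC0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 5, size=n)
eps = rng.normal(scale=0.3 * x)       # error standard deviation proportional to x (assumption)
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ coef                      # OLS residuals e_i

dx = x - x.mean()
a = dx / np.sum(dx ** 2)              # the weights a_i

robust_var = np.sum(a ** 2 * e ** 2)                     # sum(a_i^2 * e_i^2)
true_var = np.sum(a ** 2 * (0.3 * x) ** 2)               # sum(a_i^2 * V(eps_i | x_i)), known by construction
classical_var = (np.sum(e ** 2) / (n - 2)) / np.sum(dx ** 2)  # formula that assumes constant variance

# Matrix form: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}; the [1, 1] entry is the slope variance.
XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ (X.T * e ** 2) @ X @ XtX_inv

print("sum a_i^2 e_i^2:", robust_var)
print("sandwich slope entry:", sandwich[1, 1])
print("true conditional variance:", true_var)
print("constant-variance formula:", classical_var)
```

Comparing the printed values over several seeds gives a feel for when the constant-variance formula is misleading.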
(e) Suppose you had estimates of the covariances $\widehat{\mathrm{cov}}(\varepsilon_i, \varepsilon_j)$. Propose a reasonable
estimator of $V(\hat{\beta}_1 \mid x)$ if both LSA 3 and 5 are relaxed. How would you estimate the
component that depends on $\widehat{\mathrm{cov}}(\varepsilon_i, \varepsilon_j)$? (5 marks)