QBUS2810 Statistical Modelling for Business
Individual Assignment
This individual assignment will contribute 20% towards your final result in
the unit. The deadline is 11:59pm Friday 9th September, 2022. Submission
is via Canvas.
Key requirements:
You are required to create your entire assignment in a Jupyter notebook, with Markdown
sections for your written answers, and to submit the resulting Jupyter notebook file
as your entire assignment. Use LaTeX in the Markdown sections as required. Care
must be taken with presentation for this option; however, unavoidable (e.g. pink) error
messages will be ignored in marking, as discussed in class.
Only relevant analysis outputs (graphs, tables, etc) should appear in the assignment
file, while all output should appear together with, or very close to, the discussion of
that output, in the file. Less relevant outputs may be placed at the end, e.g. in an
appendix.
Section 1 (35 marks)
Business problem:
Investors and financial institutions are greatly interested in assessing and measuring risk.
The increase in the availability of intra-day data has only increased the focus for these
groups, and of researchers in associated areas (like me), in how to measure, model and
predict financial risk. Volatility (or variance) of financial returns plays a major role
here, and it is commonplace to report and use daily financial return volatility measures
to assess risk. One of the most popular daily measures is called "Realized Volatility"
(RV). The RV on any day is the sum of the squared intra-day returns, over a specific
time interval; the most common interval employed is 5 minutes, giving the so-called "5
min RV". Such measures are widely used and publicly available, for both individual
assets and various financial indices.
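
The RV definition above can be sketched in a few lines of Python. This is only an illustration: the function name `realized_volatility` and the return values are hypothetical, and the provided notebook may compute the series differently.

```python
import numpy as np

def realized_volatility(intraday_returns):
    """RV for one trading day: the sum of the squared intra-day returns."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))

# Hypothetical 5 minute returns for a single day (illustrative values only).
returns_5min = [0.001, -0.002, 0.0005, 0.0015]

rv = realized_volatility(returns_5min)
rv_sqrt = np.sqrt(rv)  # square root of RV, the scale used for the response variable

print(rv, rv_sqrt)
```

With real data the same calculation would be applied to each trading day's set of 5 minute returns, giving one RV value per day.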
Today, we focus on the 5 min RV series from the Australian All Ordinaries (AORD)
index on the Australian Stock Exchange (ASX). The data are in the file "aord last 5min.csv"
and were collected manually by Dr Wilson Chen, by downloading 5 minute return data
from the Thomson Reuters Tick History database.
So-called "day of the week" effects are commonly theorized in finance for financial
returns. One theory has it that Monday is a high-volatility day, compared to the rest
of the week. You will examine and assess this theory in the ASX market using the
daily 5 min RV data from 2000-2021.
Python code is provided in 'Assignment prep.ipynb' to help you download and clean the
data and calculate the 5 minute RV series required.
Your goal is to analyse this data and assess whether or not there is a day of the week
effect, specifically examining whether Monday is a higher volatility day, compared to
the rest of the trading week.
Data:
The data file is "aord last 5min.csv". Use the Python commands in "Assignment prep.ipynb"
to prepare the data for analysis, including taking a random sample of the data for
analysis. The primary response variable is the square root of 5 min RV, labelled 'rv5 sqr'.
Tasks:
1. Conduct an exploratory analysis on the sqrt 5 min RV series, both individually and
in terms of the primary question being considered in this assignment. (6 marks)
2. Conduct a two-sample t-test to assess the hypothesized Monday volatility
effect. List all assumptions made, then assess and discuss whether each of these
assumptions is met, or not. (8 marks)
3. Conduct the median test to assess the hypothesized Monday volatility effect. List
all assumptions made, then assess and discuss whether each of these assumptions is
met, or not. (5 marks)
4. Conduct the Mann-Whitney test to assess the hypothesized Monday volatility effect.
List all assumptions made, then assess and discuss whether each of these assumptions
is met, or not. (8 marks)
5. Which test should we put the most faith in here? Discuss in detail (up to 0.5 page).
(4 marks)
6. Write a brief (e.g. 0.5 page) report summarising and discussing your findings and
conclusions. (4 marks)
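
As a rough orientation for tasks 2 to 4, the three tests can be called from SciPy as sketched below. This is a minimal sketch on synthetic data, not the assignment data: the arrays `monday` and `other_days` are hypothetical stand-ins, and the choices shown (a Welch t-test and one-sided alternatives) are assumptions that you would need to justify for your own analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for sqrt 5 min RV on Mondays and on other days
# (hypothetical values, NOT the assignment data).
monday = rng.lognormal(mean=-4.9, sigma=0.5, size=200)
other_days = rng.lognormal(mean=-5.0, sigma=0.5, size=800)

# Two-sample t-test, one-sided: H1 is that the Monday mean is larger.
# equal_var=False gives Welch's version; equal_var=True gives the pooled version.
t_stat, t_p = stats.ttest_ind(
    monday, other_days, equal_var=False, alternative="greater"
)

# Mood's median test: a chi-square test on the counts above and below the
# grand median, so the p-value is two-sided.
chi2_stat, med_p, grand_median, table = stats.median_test(monday, other_days)

# Mann-Whitney U test, one-sided in the same direction as the t-test.
u_stat, mw_p = stats.mannwhitneyu(monday, other_days, alternative="greater")

print(f"t-test:       stat={t_stat:.3f}, p={t_p:.4f}")
print(f"median test:  stat={chi2_stat:.3f}, p={med_p:.4f}")
print(f"Mann-Whitney: stat={u_stat:.1f}, p={mw_p:.4f}")
```

The `alternative` argument of `ttest_ind` requires SciPy 1.6 or later. The code only produces test statistics and p-values; listing and assessing the assumptions behind each test remains a separate step.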
Section 2 (24 marks). Theoretical derivations:
Consider the population SLR model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
and an observed, random sample of data $(y_1, x_1), \ldots, (y_n, x_n)$ from that model. An
OLS regression is run on this data yielding OLS estimates $b_0, b_1$.
Questions:
(a) Show that $b_1$ is the effect on $Y$ of increasing $X$ by 1 unit, stating all assumptions
and showing all working (3 marks)
(b) Show that $\bar{Y} = b_0 + b_1 \bar{X}$, stating all assumptions and showing all working (3 marks)
(c) Find the relationship between $b_1$ and the sample correlation coefficient for $Y, X$,
i.e. $\hat{\rho}_{Y,X}$, stating all assumptions and showing all working (6 marks)
(d) Show that the t-test testing $H_0 : \rho_{Y,X} = 0$ is mathematically equivalent to the
t-test $H_0 : \beta_1 = 0$, stating all assumptions and showing all working (6 marks)
(e) Show that $\mathrm{Cov}(X_i, \varepsilon_i) = 0$, stating all assumptions and showing all working (6
marks)
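
Before writing up the derivations, it can help to check the identities in (b), (c) and (d) numerically on simulated data. The sketch below is a sanity check only and does not replace the required working. The simulated values and variable names are hypothetical, and the closed forms it compares against are the standard textbook results for simple linear regression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50

# Simulated sample from an SLR model (hypothetical parameter values).
x = rng.normal(size=n)
y = 1.0 + 0.3 * x + rng.normal(size=n)

fit = stats.linregress(x, y)
b0, b1 = fit.intercept, fit.slope

# (b) The fitted line passes through the point of sample means.
gap_means = y.mean() - (b0 + b1 * x.mean())

# (c) Slope expressed through the sample correlation and standard deviations.
r = np.corrcoef(x, y)[0, 1]
b1_from_r = r * y.std(ddof=1) / x.std(ddof=1)

# (d) t-statistic for the slope versus t-statistic for the correlation.
t_slope = b1 / fit.stderr
t_corr = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

print(gap_means, b1, b1_from_r, t_slope, t_corr)
```

Agreement up to floating-point error is what the algebra predicts; a visible discrepancy would usually point to a mismatch in degrees of freedom or in the `ddof` setting.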