ETF2121/ETF5912 Data Analysis in Business
Week 4: Hypothesis testing and prediction for one population
1 Hypothesis testing
Setting up the hypotheses
Specifying a significance level
Estimation, sampling distribution, and test statistic
Making the decision
Hypothesis tests and confidence interval
Summary
2 Prediction
Prediction interval and confidence interval
Dr Wei Wei (Monash University) ETF2121/5912 2 / 41
Hypothesis testing
Design of Hypothesis Testing
There are five steps in the process of hypothesis testing:
1 Formulate the hypotheses: set up the null (H0) and alternative (HA)
hypotheses for the question at hand.
2 Specify the level of significance, α.
3 Determine the appropriate estimator, test statistic and its sampling
distribution.
4 Compute the value of the test statistic from the sample.
5 Make a decision (of whether or not to reject the null) using one of the
following approaches:
1 the p-value approach;
2 the critical value approach.
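As a rough illustration of how steps 2 to 5 fit together, the sketch below runs a one-sample z-test for a population mean. It assumes the population standard deviation is known and that the test statistic is standard normal under H0; the function name `one_sample_z_test` and all input values are hypothetical, chosen only to exercise the code. It also assumes the standard definitions of the p-value and the critical value.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Steps 2-5 for H0: mu = mu0, assuming the population sd (sigma) is known."""
    std_normal = NormalDist()  # assumed null distribution of the test statistic

    # Steps 3-4: standardize the sample mean using the value stated in H0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))

    # Step 5: p-value and critical value for the chosen alternative.
    if alternative == "greater":        # HA: mu > mu0
        p_value = 1 - std_normal.cdf(z)
        critical = std_normal.inv_cdf(1 - alpha)
        reject_by_critical_value = z > critical
    elif alternative == "less":         # HA: mu < mu0
        p_value = std_normal.cdf(z)
        critical = std_normal.inv_cdf(alpha)
        reject_by_critical_value = z < critical
    elif alternative == "two-sided":    # HA: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject_by_critical_value = abs(z) > critical
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

    return {
        "z": z,
        "p_value": p_value,
        "critical_value": critical,
        "reject_by_p_value": p_value < alpha,
        "reject_by_critical_value": reject_by_critical_value,
    }


# Hypothetical inputs, not taken from any example in these slides.
result = one_sample_z_test(sample_mean=26.5, mu0=25.0, sigma=5.0, n=100, alpha=0.05)
print(result)
```

With these made-up inputs the test statistic is 3.0, and both approaches lead to the same decision, which is what we expect since they are two ways of applying the same rule.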
Hypothesis testing Setting up the hypotheses
Step 1: setting up the hypotheses
The null hypothesis, H0, is assumed to be true in the absence of
contradictory evidence.
The alternative hypothesis, HA, is designed to answer a specific
question.
In two-sided (two-tailed) tests, H0 states that the population
parameter is equal to a single value, while HA states that the
population parameter is not equal to that value.
H0 : θ = θ0
HA : θ ≠ θ0
In one-sided (one-tailed) tests, H0 states that the population
parameter is equal to a single value, while HA states that the
population parameter is greater than or less than that value.
case 1:
H0 : θ = θ0
HA : θ > θ0
case 2:
H0 : θ = θ0
HA : θ < θ0
Example 4.1: sugar content in breakfast cereal
The manufacturer of Cocoa Puffs claims that average sugar content in
100g of this cereal is 25.7g.
A dietitian wants to check if the average sugar content in Cocoa Puffs
is indeed what the company claims. She obtains a sample of 50 boxes,
each with 100g of cocoa puffs, and measures the sugar content in the
sample.
Let µ denote the population parameter, state the null and the
alternative hypotheses.
Example 4.2: pizza waiting times
A pizza outlet advertises that its average waiting time is 12 minutes
from the time an order is placed.
A frequent customer claims that the average waiting time is more than
the advertised time and presents the waiting times for his last 10
orders as evidence.
Let µ denote the population parameter, state the null and the
alternative hypotheses.
Example 4.3: are voting results manipulated?
In a recent election, candidate A is reported to have received 45% of the
vote. However, candidate A believes that the voting results are
manipulated and more people must have voted for him.
Candidate A hires a consulting agency to randomly sample 1000 voters
to record whether or not each person voted for him.
Let π denote the population parameter, state the null and the
alternative hypotheses.
Example 4.4: standard deviation of shafts
Company ABC manufactures steel shafts with a standard length of
20cm. The industry standard allows a standard deviation of 2cm in
the production process. Company ABC claims that they have better
quality control and the standard deviation of the lengths of their shafts
is smaller than the industry standard. They provide the lengths of 50
shafts from a recently produced batch as evidence.
Let σ denote the population parameter, state the null and the
alternative hypotheses.
Example 4.5: presumption of innocence
In most legal systems around the world, a person accused of any crime
is “presumed innocent until proven guilty”.
How does the presumption of innocence translate to the null and
alternative hypotheses?
Hypothesis testing Specifying a significance level
Type I and type II errors
In hypothesis testing, we can make two kinds of mistakes.
Type I error: rejecting H0 when H0 is true; its probability is Pr(R|H0).
Type II error: not rejecting H0 when H0 is false; its probability is Pr(NR|HA).
                 State of the world
                 H0 is true                 HA is true
Reject           Type I error: Pr(R|H0)     Correct
Do not reject    Correct                    Type II error: Pr(NR|HA)
The probabilities of type I and type II errors are inversely related;
generally speaking, we cannot decrease one without increasing the other
unless we change the sample or the estimator.
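The trade-off can be seen numerically. The sketch below computes the probability of a type II error for an upper-tailed z-test at three significance levels. Everything in it is an assumption made for illustration: the helper `type_ii_error`, the normal population with known standard deviation, and the specific values (null mean 12, true mean 13, standard deviation 2, sample size 10).

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Pr(do not reject H0 | mu = mu_true) for the upper-tailed z-test of H0: mu = mu0."""
    critical = std_normal.inv_cdf(1 - alpha)       # reject H0 when Z > critical
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # mean of Z when the true mean is mu_true
    return std_normal.cdf(critical - shift)


# Hypothetical values, chosen only to illustrate the trade-off.
for alpha in (0.01, 0.05, 0.10):
    beta = type_ii_error(alpha, mu0=12, mu_true=13, sigma=2, n=10)
    print(f"alpha = {alpha:.2f}  ->  Pr(type II error) = {beta:.3f}")
```

Holding the sample and the estimator fixed, the printed type II error probability falls as α rises, which is the inverse relationship described above.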
Step 2: specifying a significance level
Hypothesis tests are constructed to control type I error. Specifically,
we define the significance level = Pr(R|H0) and denote it by α.
In economics/business, 1%, 5% and 10% are the commonly used
significance levels. In other words, α = 0.01, 0.05 or 0.1.
Which α to use depends on how we evaluate the cost of each error.
α       Evidence needed       Probability of     Preferred if the loss
        to reject the null    type II error      from a type I error is
0.01    stronger              higher             higher
0.05
0.10    weaker                lower              lower
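To make "stronger evidence" concrete, the sketch below lists the critical values a standardized test statistic must exceed at each common significance level. It assumes the null distribution is standard normal (a z-test), which is only one possible case; the helper `z_critical` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist()


def z_critical(alpha, two_sided):
    """Upper critical value of a standard normal test statistic at level alpha."""
    tail = alpha / 2 if two_sided else alpha   # a two-sided test splits alpha across both tails
    return std_normal.inv_cdf(1 - tail)


for alpha in (0.01, 0.05, 0.10):
    one = z_critical(alpha, two_sided=False)
    two = z_critical(alpha, two_sided=True)
    print(f"alpha = {alpha:.2f}: one-sided {one:.3f}, two-sided {two:.3f}")
# alpha = 0.01: one-sided 2.326, two-sided 2.576
# alpha = 0.05: one-sided 1.645, two-sided 1.960
# alpha = 0.10: one-sided 1.282, two-sided 1.645
```

A smaller α pushes the critical value further into the tail, so the sample must deviate more from the null before we reject.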
Example 4.2: pizza waiting times
A pizza outlet advertises that its average waiting time is 12 minutes
from the time an order is placed. A frequent customer claims that the
average waiting time is more than the advertised time and presents the
waiting times for his last 10 orders as evidence.
The null and alternative hypotheses are given below:
H0 : µ = 12
HA : µ > 12
Two statisticians, A and B, are asked to evaluate if there’s enough
evidence to support the customer’s claim. Statistician A cares more
about protecting customer rights, while statistician B cares more
about protecting the rights of small business owners.
Interpret type I and type II errors in this example.
Which statistician would use a higher significance level for the test
above?
Example 4.5: presumption of innocence
In most legal systems around the world, a person accused of any crime
is “presumed innocent until proven guilty”.
The English jurist William Blackstone once wrote “It is better that ten
guilty persons escape than that one innocent suffer.” This has become
a maxim known as the Blackstone ratio. The German politician
Bismarck allegedly said, “It is better that ten innocent men suffer
than one guilty man escape.”
Interpret type I and type II errors in this example.
If these two were using statistical evidence to make a decision, who
would use a higher significance level?
Hypothesis testing Estimation, sampling distribution, and test statistic
Step 3: estimation, sampling distribution, and test statistic
1 Work out an appropriate sample estimator for the population
parameter.
2 Work out the sampling distribution for the estimator under the null
hypothesis. In practice, we transform/standardize the estimator using
the null to obtain a test statistic with a known distribution.
3 The distribution of a test statistic under the null is also called the
null distribution.
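A small simulation can make the idea of a null distribution tangible. The sketch below repeatedly draws samples from a population in which H0 is true, standardizes each sample mean using the null value, and checks that the resulting statistics behave like a standard normal variable. The population (normal with mean 25.7 and standard deviation 5), the sample size of 50, and the number of replications are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

MU0, SIGMA, N = 25.7, 5.0, 50   # assumed null value, known sd, and sample size
REPLICATIONS = 2000

z_values = []
for _ in range(REPLICATIONS):
    # Generate one sample from a population in which H0 (mu = MU0) is true.
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    # Standardize the estimator (the sample mean) using the null value.
    z = (mean(sample) - MU0) / (SIGMA / sqrt(N))
    z_values.append(z)

print(f"mean of z: {mean(z_values):.3f}, sd of z: {stdev(z_values):.3f}")
```

Under these assumptions the simulated statistics should have a mean close to 0 and a standard deviation close to 1, in line with a standard normal null distribution.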
Example 4.1: sugar content in breakfast cereal
The manufacturer of Cocoa Puffs claims that average sugar content in
100g of this cereal is 25.7g.
A dietitian wants to check if the average sugar content in Cocoa Puffs
is indeed what the company claims. She obtains a sample of 50 boxes,
each with 100g of cocoa puffs, and measures the sugar content in the
sample.
What is an appropriate estimator for the population parameter?
What is the sampling distribution of this estimator under the null
(assume that the sugar content follows a normal distribution with
standard deviation equal to 5g)?
Standardize the estimator under the null to obtain the test statistic.
What is the null distribution of the test statistic?
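One way to work through these questions is sketched below. It takes the null hypothesis to be H0 : µ = 25.7 (the manufacturer's claim) and additionally assumes that the 50 measurements X1, ..., X50 are independent draws from the stated normal population with standard deviation 5g.

```latex
\bar{X} = \frac{1}{50}\sum_{i=1}^{50} X_i ,
\qquad
\bar{X} \sim N\!\left(25.7,\ \frac{5^2}{50}\right) \quad \text{under } H_0 : \mu = 25.7 ,
\qquad
Z = \frac{\bar{X} - 25.7}{5/\sqrt{50}} \sim N(0, 1) .
```

Here the sample mean is the estimator, its standard error is 5/√50 ≈ 0.707g, and the standardized statistic Z has a standard normal null distribution.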
Example 4.2: pizza waiting times
A pizza outlet advertises that its average waiting time is 12 minutes
from the time an order is placed.
A frequent customer claims that the average waiting time is more than
the advertised time and presents the waiting times for his last 10
orders as evidence.