AMS3328 Categorical Data Analysis
Statistics and Insurance
Midterm Test
Question 1. Suppose that every customer of an Internet service provider is enrolled in either Plan A or Plan B, and customers are allowed to switch plans. Sampled customers are asked which plan they used in 2020 and which plan they used in 2021. The data are as follows.
                                  Year 2021
Branch        Year 2020      Plan A    Plan B
Shatin        Plan A             35        75
              Plan B             30        80
Taipo         Plan A             25        30
              Plan B             20        35
Tsuen Wan     Plan A             35         6
              Plan B             18         3
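(a) Find the marginal odds ratio between the plans used in Year 2020 and Year 2021 (that is, not controlling for branch). [5 marks]
[Ans] Adding the three partial tables cell by cell gives the marginal counts. Among customers on Plan A in 2020, 95 used Plan A and 111 used Plan B in 2021. Among customers on Plan B in 2020, 68 used Plan A and 118 used Plan B in 2021. The marginal odds ratio is
$$\hat\theta = \frac{95 \times 118}{68 \times 111} = 1.49$$
(or the reciprocal, 1/1.49 = 0.67, if the rows or columns are taken in the opposite order).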
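The collapsing step in (a) can be checked with a short Python sketch using only the standard library. The names `tables`, `marginal_table` and `odds_ratio` are hypothetical and serve only as illustration. Each partial table is stored as `[[n_AA, n_AB], [n_BA, n_BB]]`, with rows for the 2020 plan and columns for the 2021 plan.

```python
# Partial 2x2 tables from Question 1 (hypothetical variable names).
# Rows: plan in 2020 (A, B); columns: plan in 2021 (A, B).
tables = {
    "Shatin": [[35, 75], [30, 80]],
    "Taipo": [[25, 30], [20, 35]],
    "Tsuen Wan": [[35, 6], [18, 3]],
}


def marginal_table(partials):
    """Collapse over branch by adding the partial tables cell by cell."""
    total = [[0, 0], [0, 0]]
    for t in partials.values():
        for i in range(2):
            for j in range(2):
                total[i][j] += t[i][j]
    return total


def odds_ratio(t):
    """Sample odds ratio n11 * n22 / (n12 * n21) of a 2x2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


marginal = marginal_table(tables)
print(marginal)                            # [[95, 111], [68, 118]]
print(round(odds_ratio(marginal), 2))      # 1.49
print(round(1 / odds_ratio(marginal), 2))  # 0.67
```

Keeping the tables in a dictionary keyed by branch lets the same `odds_ratio` helper compute either the marginal odds ratio or each conditional odds ratio.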
(b) For the partial table of Shatin, test for independence between the plans used in Year 2020 and Year 2021. State clearly the test method, null hypothesis, and alternative hypothesis that you are using. [5 marks]
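[Ans] Pearson's chi-squared test can be used. The null hypothesis is H0: the plan used in 2020 and the plan used in 2021 are independent. The alternative is H1: they are not independent. For Shatin, both row totals are 110, the column totals are 65 and 155, and n = 220. The expected count for each cell is therefore (row total × column total)/n. The test statistic is
$$X^2 = \frac{\left(35 - \frac{65 \times 110}{220}\right)^2}{\frac{65 \times 110}{220}} + \frac{\left(75 - \frac{155 \times 110}{220}\right)^2}{\frac{155 \times 110}{220}} + \frac{\left(30 - \frac{65 \times 110}{220}\right)^2}{\frac{65 \times 110}{220}} + \frac{\left(80 - \frac{155 \times 110}{220}\right)^2}{\frac{155 \times 110}{220}} = 0.55.$$
This is less than $\chi^2_{1,\,5\%} = 3.841$, the upper 5% point of the chi-squared distribution with 1 degree of freedom. Therefore, the null hypothesis is not rejected at the 5% level.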
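As a rough cross-check of (b), the Python sketch below computes Pearson's $X^2$ for the Shatin table without a continuity correction. The helper name `pearson_chi2` is hypothetical. The critical value 3.841 is hard-coded from the answer above rather than looked up from a library.

```python
def pearson_chi2(t):
    """Pearson X^2 for an r x c table; expected = row total * column total / n."""
    n = sum(sum(row) for row in t)
    row_tot = [sum(row) for row in t]
    col_tot = [sum(col) for col in zip(*t)]
    x2 = 0.0
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2


# Shatin: rows = plan in 2020 (A, B), columns = plan in 2021 (A, B).
shatin = [[35, 75], [30, 80]]
CRITICAL_1DF_5PCT = 3.841  # upper 5% point of chi-squared(1), as quoted above

x2 = pearson_chi2(shatin)
print(round(x2, 2))            # 0.55
print(x2 > CRITICAL_1DF_5PCT)  # False, so H0 is not rejected
```

The expected counts here are 32.5 and 77.5 in each row, so every cell contributes $2.5^2$ divided by its expected count.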
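(c) Explain the meanings of "conditional independence" and "homogeneous association". [5 marks]
[Ans] Conditional independence means that, given the branch, the plan used in 2020 and the plan used in 2021 are independent. Equivalently, the conditional odds ratios in all three branches equal one. Homogeneous association means that the conditional odds ratios in all three branches are equal to a common value, which need not be one.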
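Both definitions in (c) concern the three conditional (partial-table) odds ratios. The Python sketch below computes the sample versions from the Question 1 data, using the same hypothetical `odds_ratio` helper. These values are descriptive only. Deciding whether they differ significantly from 1, or from each other, would need a formal test, and this sketch does not attempt one.

```python
# Partial tables: rows = plan in 2020 (A, B), columns = plan in 2021 (A, B).
tables = {
    "Shatin": [[35, 75], [30, 80]],
    "Taipo": [[25, 30], [20, 35]],
    "Tsuen Wan": [[35, 6], [18, 3]],
}


def odds_ratio(t):
    """Sample odds ratio n11 * n22 / (n12 * n21) of a 2x2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# Conditional odds ratio within each branch.
conditional = {branch: odds_ratio(t) for branch, t in tables.items()}
for branch, value in conditional.items():
    print(branch, round(value, 2))
# Shatin 1.24
# Taipo 1.46
# Tsuen Wan 0.97
```

Under conditional independence all three population values would be 1. Under homogeneous association they would share one common value.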
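(d) For the partial table of Tsuen Wan, construct a 95% confidence interval for the odds ratio. [5 marks]
[Ans] The sample odds ratio is
$$\hat\theta = \frac{35 \times 3}{18 \times 6} = 0.97.$$
A 95% confidence interval, built on the log scale and then exponentiated, is therefore
$$\exp\left\{\log\hat\theta \pm 1.96\sqrt{\frac{1}{35} + \frac{1}{18} + \frac{1}{6} + \frac{1}{3}}\right\} = (0.22,\ 4.35).$$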
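The interval in (d) is the usual Wald interval for $\log\theta$, exponentiated back to the odds-ratio scale. A minimal Python sketch follows. The function name `log_or_wald_ci` is hypothetical, and z = 1.96 is assumed for 95% coverage.

```python
import math


def log_or_wald_ci(n11, n12, n21, n22, z=1.96):
    """Odds ratio and Wald CI computed on the log scale, then exponentiated."""
    theta = (n11 * n22) / (n12 * n21)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # SE of log(theta)
    lower = math.exp(math.log(theta) - z * se)
    upper = math.exp(math.log(theta) + z * se)
    return theta, lower, upper


# Tsuen Wan: 2020 Plan A -> (35, 6); 2020 Plan B -> (18, 3).
theta, lower, upper = log_or_wald_ci(35, 6, 18, 3)
print(round(theta, 2), round(lower, 2), round(upper, 2))  # 0.97 0.22 4.35
```

The sketch uses the unrounded estimate 105/108, which gives an upper limit of about 4.35. Plugging the rounded value 0.97 into the formula gives about 4.34 instead.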
Question 2. [10 marks] Suppose that a university has three schools: science, humanity, and engineering. The university would like to understand the career status of its students one year after graduation. For each student, the record shows (i) the student's school, (ii) whether the student participated in a summer internship, and (iii) whether the student got a job within one year. Suggest a model for predicting the probability that a student gets a job within one year. State clearly the meaning of each symbol that you introduce.
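[Ans] Define the following indicator variables:
- $x_1 = 1$ if the student belongs to the science school and 0 otherwise.
- $x_2 = 1$ if the student belongs to the humanity school and 0 otherwise. Engineering is the baseline school, with $x_1 = x_2 = 0$.
- $x_3 = 1$ if the student participated in a summer internship and 0 otherwise.
- $Y = 1$ if the student got a job within one year and 0 otherwise.

Logistic regression model:
$$P(Y = 1) = \frac{\exp(\eta)}{1 + \exp(\eta)}, \qquad \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3.$$
Here, $\beta_0, \beta_1, \beta_2, \beta_3$ are unknown parameters.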
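The Python sketch below shows how the model in Question 2 turns a student's record into a predicted probability. The names `SCHOOL_DUMMIES`, `prob_job` and `beta_demo` are hypothetical. The coefficient values in `beta_demo` are made up for illustration and were not estimated from any data. In practice the $\beta$'s would be estimated by maximum likelihood from the university's records.

```python
import math

# Dummy coding for school (x1, x2); engineering is the baseline (0, 0).
SCHOOL_DUMMIES = {
    "science": (1, 0),
    "humanity": (0, 1),
    "engineering": (0, 0),
}


def prob_job(school, internship, beta):
    """P(Y = 1) = exp(eta) / (1 + exp(eta)), eta = b0 + b1*x1 + b2*x2 + b3*x3.

    Written as 1 / (1 + exp(-eta)), which is algebraically the same.
    """
    x1, x2 = SCHOOL_DUMMIES[school]
    x3 = 1 if internship else 0
    b0, b1, b2, b3 = beta
    eta = b0 + b1 * x1 + b2 * x2 + b3 * x3
    return 1 / (1 + math.exp(-eta))


# Hypothetical coefficients (b0, b1, b2, b3), for illustration only.
beta_demo = (0.2, 0.3, -0.4, 0.8)
for school in SCHOOL_DUMMIES:
    print(school, round(prob_job(school, True, beta_demo), 3))
```

Under this coding, $\exp(\beta_3)$ is the odds ratio for getting a job (internship vs. no internship) for students in the same school. $\exp(\beta_1)$ and $\exp(\beta_2)$ compare science and humanity with engineering at a fixed internship status.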