Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES
Page 1 of 6 12.1 GOODNESS-OF-FIT TEST A Goodness-of-Fit Test is used to see if the proportions of the categories
of a qualitative variable fit a certain distribution. (Is it a good fit?) Usually this is given by data values but sometimes
also given as a distribution. • If a die is rolled many times, the die should land on each number about 1/6th of the time.
Does your sample distribution of dice rolls fit this shape? • A population has a certain demographics; do its government
offices (or juries) employ the same percent of each group? MAIN IDEA Start with the assumption that the proportions
of your experiment DO follow a certain distribution. • Calculate the number of data points we should get in each
category IF the distribution is a good fit. In general, we expect about ______________ data points from each category
of the experiment if the distribution is a good fit for the given sample size. • Collect data and count how many times we
observe each category. • Find the level of agreement/disagreement in the values If what we observe from the actual
data is relatively close to what we expect to see if the assumption was true, If the difference between our observations
from what we expected to see is relatively large, EXAMPLE 0 Find the expected counts for each category. Perform the
Goodness-of-Fit Test = = 1 = 2 = 3 = 4 0.2 0.1 0.45 0.25 Expected Counts Suppose Observe 100 50 150 100 PERFORM A
GOODNESS OF FIT TEST 1. Write the hypothesis 0: The random variable fits/follows ... 1: The random variable does not
2. Check the conditions for the test (assume categories) 3. Sketch Distribution (2) with critical value. Use = − 1. 4.
Find Test statistic 2 = ∑ ( − ) 2 5. Answer the question MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES
Page 2 of 6 EXAMPLE 1 (12.1 #10) Determine the 2 test statistic, the degrees of freedom, critical value for = 0.05, and test the
hypothesis. 0: The Random variable is binomial with = 4, = 0.3 1: The Random variable is not binomial with = 4, = 0.3 EXAMPLE 2 (12.1 #12) According to the manufacturer of peanut M & M’s, in one bag there should be 12% brown M & M’s, 15% yellow, 12% red, 23% blue, 23% orange, 15% green. You buy a bag of peanut M & M’s and count the number of different colored M & M’s. The frequencies are shown below. Test to see if the manufacturer has correctly advertised these percentages using a 5% significance level. Observed Expected 0 260 240.1 1 400 411.6 2 280 264.6 3 50 75.6 4 10 8.1 Color Frequency Brown 53 Yellow 66 Red 38 Blue 96 Orange 88 Green 59 MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES Page 3 of 6 REVIEW: What is the Goodness-of-Fit Test checking? EXAMPLE 3 A supervisor at FastTrak Professional Temporary Services, Inc. suspects that her employees are more likely to call in sick on Mondays and Fridays than on the other days of the week. To test this claim, she collects data on the number of people who call in sick on each day of the week. Her data are summarized in the following table. Test the claim that the relative frequency of days in which employees call in sick is not equal for the five days in the work week. Use a level of significance of 0.05. Does it appear that the absentees are not equally distributed across the five days? 12.2 TEST FOR INDEPENDENCE AND THE HOMOGENEITY OF PROPORTIONS Motivating Problem: Are health and happiness related (dependent on each other in some way)? Suppose you survey 2,000 individuals by giving them the questionnaire below: How happy are you in your life right now? Not Too Happy Pretty Happy Very Happy How is your current health? Poor Fair Good Excellent The totals (not the complete dataset) for each category collected from the survey is in the table below. Health Condition Poor Fair Good Excellent Total H ap p in es s Not Too Happy 36 92 105 33 266 Pretty Happy 53 231 567 247 1098 Very Happy 20 82 263 271 636 Total 109 405 935 551 2,000 ASUME INDEPENDENCE BETWEEN CATEGORIES Day of Week Monday Tuesday Wednesday Thursday Friday Frequency 9 7 7 11 13 MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES Page 4 of 6 Under independence ( ) = () ⋅ () Pattern for Expected Frequency Now we test for INDEPENDENCE (is it true our data agrees with this independence assumption?... THAT’S A GOODNESS-OF-FIT TEST!) EXAMPLE 4 Are happiness and health related? Test to see if there is a relationship using a 5% significance level. Health Condition Poor Fair Good Excellent Total H ap p in es s Not Too Happy 266 Pretty Happy 1098 Very Happy 636 Total 109 405 935 551 2,000 PERFORM A TEST FOR THE HOMOGENEITY OF PROPORTIONS (Are the Proportions the same?) 1. Write the hypothesis 0: Categorical variables are INDependent 0: Categorical variables are DEPendent 2. Check the conditions for the test: 3. Sketch Distribution (2) Use = ( − 1)( − 1) 4. Find Test statistic 2 = ∑ ( − ) 2 5. Answer the question Health Condition Poor Fair Good Excellent Total H ap p in es s Not Too Happy 36 92 105 33 266 Pretty Happy 53 231 567 247 1098 Very Happy 20 82 263 271 636 Total 109 405 935 551 2,000 MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES Page 5 of 6 EXAMPLE 5 Are drugs treating the same ailment just as effective. Zocor is a drug manufactured by Merck and Co. that is meant to reduce the level of LDL (bad) cholesterol and increase the level of HDL (good) cholesterol. In clinical trials of the drug, patients were randomly divided into three groups. Group 1 received Zocor; group 2 received a placebo; group 3 received cholestyramine, a cholesterol- lowering drug that is currently available. Is there evidence to indicate that the proportion of subjects in each group who experienced abdominal pain is different at the = 0.01 level of significance? EXAMPLE 6 A paper compared head injuries for collegiate soccer players, athletes in sports other than soccer, and a group of students who were not involved in collegiate sports. The table contains the results for each group of students. Are the proportions of concussions the same for all groups? Use = 0.05. Group 1 (Zocor) Group 2 (placebo) Group 3 (cholestyramine) Number of people who experience abdominal pain 51 5 16 Number of people who did not experience abdominal pain 1532 152 163 Number of Concussions 0 1 2 3 or More Marginal Total Soccer Player 45 25 11 10 91 Non-Soccer Athletes 68 15 8 5 96 Non-Athletes 45 5 3 0 53 Marginal Total 158 45 22 15 240 MATH 136 CH 12 INFERENCE ON CATEGORICAL DATA GUIDED NOTES Page 6 of 6 12.3 INFERENCE ABOUT TWO POPULATION PROPORTIONS: DEPENDENT SAMPLES EXAMPLE 7 A recent General Social Survey asked the following two questions of a random sample of 1492 adult Americans under the hypothetical scenario that the government suspected that a terrorist act was about to happen: • Do you believe that the authorities should have the right to tap people’s telephone conversations? • Do you believe that the authorities should have the right to stop and search people on the street at random? Do the proportion of people who agree with each scenario differ significantly? Use the = 0.05 level of significance. EXAMPLE 8 A randomized controlled trial was designed to test the effectiveness of hip protectors in preventing hip fractures in the elderly. Nursing home residents each wore protection on one hip, but not the other. Results are summarized in the table (based on data from “Efficacy of Hip Protector to Prevent Hip Fracture in Nursing Home Residents,” by Kiel et al., Journal of the American Medical Association, Vol. 298, No. 4).
Test the null hypothesis that the following two proportions are the same. Random Stop Agree Disagree Ta p P h o n e Agree 494 335
Disagree 126 537 No Hip Protector Worn No Hip Fracture Hip Fracture H ip P ro te ct o r W o rn No Hip Fracture 309 10
Hip Fracture 15 2 PERFORM A TEST ON TWO (DEP) POPULATION PROPORTIONS 1. Write the hypothesis 0:
Proportions from two populations are equal (1 = 2) 0: Proportions from two populations are not equal (1 ≠ 2)
2. Check the conditions for the test: 3. Sketch Distribution (2) with critical value ( = 1)
4. Find Test statistic (McNemar’s test) 2 = ∑ (12 − 21) 2 12 + 21 5. Answer the question