PPGA 503 Measurement and Data Analysis for Policy
Week 11
Lab Session
Outline of Lab Session Today
• Prep for HW4
• Reporting and interpreting regression results
• Dealing with categorical and dummy variables in multiple regression
• Interaction effects
• STATA Exercise
Homework 4 prep
Stata code
reg dependentvar iv1 iv2 iv3
dependent variable (DV): a column that contains the variable you want to explain or predict
independent variable (IV): a column that contains a variable you use to explain the dependent variable
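As a minimal sketch of the syntax above, the lines below run a regression on the auto dataset that ships with Stata. The dataset and the choice of variables are assumptions for illustration only; they are not the Homework 4 data.

```stata
* Load an example dataset bundled with Stata (assumption: not the HW4 data)
sysuse auto, clear

* DV = price; IVs = mpg, weight, foreign
regress price mpg weight foreign
```

The first variable after reg (or regress) is always the DV; every variable after it is treated as an IV.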
What do you do after you estimate a regression?
Answer the question in the prompt!
• What is your conclusion?
• Determining influence of IV(s) on DV – Don’t stop at p-values!!!
Substantive significance
• Beta coefficient and estimated effect of IV on DV
• Is it large enough for us to care?
• Depends on research question, or the scale of values for the DV
• Large relative to…? The effect of the other IV(s)
Statistical significance
• P-value: the likelihood of observing a result at least this large due to random chance alone, if the IV truly had no effect on the DV
Communicating regression results
Report the key statistics
Beta coefficient, p-value, R-square
• Beta coefficient: “Every one-dollar (unit of IV) increase in taxes (IV) is associated with a decrease of 0.03 in the net domestic immigration rate (DV), or 30 fewer migrants per 1 million population (unit of DV).”
• P-value: “If taxes and the net domestic immigration rate were truly unrelated, the likelihood of observing an effect this large due to random chance alone would be 0.41%, or about 1 out of 250 times.”
• R-square: “16.6% of the variation in net domestic immigration rate is explained by taxes.”
• Multiple regression (beta coefficients and p-values): for each IV that we include, we assess its association with the DV while holding all other included IVs constant (roughly speaking; this is not the full technical explanation).
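The three key statistics can also be pulled out of Stata after a regression instead of being read off the output table. This is a rough sketch on the bundled auto dataset (an assumption; any dataset works), using Stata's stored results _b[], _se[], e(df_r), and e(r2).

```stata
sysuse auto, clear
regress price mpg

* Beta coefficient: change in price per one-unit increase in mpg
display "Coefficient on mpg: " _b[mpg]

* Two-sided p-value, from the t statistic and residual degrees of freedom
display "P-value: " 2*ttail(e(df_r), abs(_b[mpg]/_se[mpg]))

* R-squared: share of the variation in price explained by the model
display "R-squared: " e(r2)
```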
Communicating regression results
Tables!
• Include relevant details
• Beta coefficient, p-value, R-square
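One possible way to build such a table inside Stata is estimates store followed by estimates table. The sketch below assumes the bundled auto dataset and two arbitrary models named m1 and m2 (both names are made up for illustration).

```stata
sysuse auto, clear

* Model 1: a single IV
regress price mpg
estimates store m1

* Model 2: several IVs
regress price mpg weight foreign
estimates store m2

* Side-by-side table with coefficients, p-values, N, and R-squared
estimates table m1 m2, b(%9.3f) p(%9.3f) stats(N r2)
```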
Potential limitations of multiple regression analysis
• Small sample size -> larger standard errors -> less likely to reach statistical significance
• Small sample size + many IV(s) -> fewer degrees of freedom -> less likely to reach statistical significance
• How variables were measured – some variables were measured for one year while others
were measured for a range of years
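To see how sample size feeds into the standard errors, one can run the same model on the full sample and on a small subsample and compare. The sketch assumes the bundled auto dataset, and the cutoff of 20 observations is arbitrary; the exact numbers depend on which observations are kept.

```stata
sysuse auto, clear

* Full sample (74 cars)
regress price mpg weight
scalar se_full = _se[mpg]

* Same model on the first 20 observations only
regress price mpg weight in 1/20
scalar se_small = _se[mpg]

* Compare the standard error on mpg across the two runs
display "SE (full sample): " se_full
display "SE (20 obs):      " se_small
```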
Interpreting dummy and categorical variables in multiple regression
Dummy variables in regression
What are dummy variables? Dummy variables, or binary variables, split the observations into 2 groups.
The two groups are usually coded as “1” and “0”
How to interpret regression results?
• Identify the two groups (use tab indep_var and tab indep_var, nolabel)
• Remember that the “1-unit change” idea still applies here
• Basically: compared to the “0s group”, the DV for the “1s group” is higher/lower by x units
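The steps above can be sketched on the bundled auto dataset (an assumption, used only because it ships with Stata), where foreign is a dummy variable coded 0 = Domestic and 1 = Foreign.

```stata
sysuse auto, clear

* Identify the two groups: first with value labels, then with the raw 0/1 codes
tab foreign
tab foreign, nolabel

* The coefficient on foreign is the average difference in price
* between the "1s group" (Foreign) and the "0s group" (Domestic)
regress price foreign
```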
Example: Interpreting dummy variables
Y [the DV]          Coeff    Std. Err.   t       P>|t|    [95% Conf. Interval]
Age                  0.10    0.05         2.00   0.048     0.0902    0.110
Gender              -1.50    0.7         -2.14   0.035    -1.640    -1.360
Education Levels     0.02    1.0          0.02   0.984    -0.176     0.216
_cons                1.30    0.40         3.25   0.002     1.22      1.38
Star Wars example:
• For the Gender variable, 1 = Female and 0 = Male
• The outcome Y, “Love for Star Wars”, is a response on a scale from 0 to 5
• 0 = do not love Star Wars at all
• 5 = 100% love Star Wars
Interpretation of the Gender variable?
Interpretation: Compared to males, females on average scored 1.5 points lower on how much they love Star Wars, holding age and education level constant
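A quick way to check this reading is to plug the coefficients from the table into the regression equation for two otherwise identical respondents. The age of 30 and education level of 2 below are made-up values chosen only for illustration.

```stata
* Hypothetical respondent: age 30, education level 2 (both values are made up)
* Predicted Y = _cons + 0.10*Age - 1.50*Gender + 0.02*Education

display 1.30 + 0.10*30 - 1.50*0 + 0.02*2   // male   (Gender = 0)
display 1.30 + 0.10*30 - 1.50*1 + 0.02*2   // female (Gender = 1)
```

Whatever values are chosen for age and education, the two predictions differ by exactly the Gender coefficient, 1.5 points.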
What happens when there are multiple dummy
variables?
• If there are multiple (non-overlapping) groups in one dataset (Q4)
• You can create dummy variables for each of them and include these in
your regression
• Remember that Stata requires dummy variables to be numeric (0 or
1), NOT string (text)!
• Essentially, the groups that are not included in the regression become the comparison / reference group
• What does this mean?
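As a rough sketch of these points, the code below uses the bundled auto dataset (an assumption; it is not the Homework 4 data), where rep78 happens to have five non-overlapping groups. The variable names rep1 to rep5, origin_str, and is_foreign are created here purely for illustration.

```stata
sysuse auto, clear

* Create one 0/1 dummy per group of rep78: rep1, rep2, ..., rep5
tabulate rep78, generate(rep)

* Include four of the five dummies; the omitted group (rep5) is the reference
regress price rep1 rep2 rep3 rep4

* Same model using factor-variable notation, with group 5 as the base
regress price ib5.rep78

* Turning a string variable into a numeric dummy:
* first build a string version of foreign, then recode it to 0/1
decode foreign, generate(origin_str)
generate byte is_foreign = (origin_str == "Foreign")
```

Each coefficient in either regression is read as "this group compared to group 5".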
What happens when there are multiple dummy variables?
Think of it as a party, where we compare each group in the regression (IV) to the reference category
[Diagram: five groups, A to E, shown under different choices of which groups enter the regression]
• No group in the regression: no comparison
• A in regression; comparison group is B, C, D, and E
  Coeff in regression table: A compared to B&C&D&E
• A and B in regression; comparison group is C, D, and E
  Coeffs in regression table: A compared to C&D&E, B compared to C&D&E
• A, B, C, and D in regression; comparison group is E
  Coeffs in regression table: A compared to E, B compared to E (and likewise for C and D)
Note: Always leave at least 1 group out of the regression to know what you are comparing against
What happens when there are multiple dummy variables?
A, B, C, and D in regression; comparison group is E
Y [the DV]    Coeff    P>|t|
A              0.10    0.048
B             -1.50    0.035
C              0.02    0.984
D              1.30    0.002
“On average, individuals in group A earn $0.10 more than individuals in group E”
Homework 4 Questions?
Interaction effects
Interaction effects – What the heck are they?
Definition: An interaction effect occurs when the effect of one variable depends on the value of
another variable.
An example, for the sake of simplicity
Imagine our model is:
Satisfaction = Food + Condiment + Food#Condiment (in Stata, Food##Condiment includes both main effects and their interaction)
To keep things simple, we’ll include only two foods (ice cream and hot dogs) and two condiments (chocolate sauce and mustard) in our analysis.
If someone asks you, “Do you prefer mustard or chocolate sauce on your food?”, you will undoubtedly respond, “It depends on the type of food!” That’s the “it depends” nature of an interaction effect.
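The food and condiment example can be sketched with simulated data. Everything below is made up for illustration: the sample size, the satisfaction scores, and the variable names are assumptions, not data from the course.

```stata
clear
set seed 503
set obs 200

* Two foods and two condiments, 50 observations per combination
generate byte food = mod(_n, 2)                        // 0 = ice cream, 1 = hot dog
generate byte condiment = mod(floor((_n - 1)/2), 2)    // 0 = chocolate sauce, 1 = mustard

* Made-up satisfaction: high when food and condiment match, low otherwise
generate satisfaction = 80 - 50*food - 50*condiment ///
    + 100*food*condiment + rnormal(0, 5)

* ## adds both main effects and the food-by-condiment interaction
regress satisfaction i.food##i.condiment

* Predicted satisfaction for each food/condiment combination
margins food#condiment
```

In this simulated setup the effect of switching condiments flips sign depending on the food, which is what the interaction term captures.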
Interaction effects – still confused?
Very helpful video!!
Exercise – Predicting BMI
We are doing analysis for a healthcare company, building a model
to predict BMI of their patients.
Download “W11 Viktoria Lab.do”
**some fun with interaction effects and margins command**
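The contents of the do-file are not reproduced here. As a loose sketch of what an interaction model with the margins command can look like, the code below uses Stata's nhanes2 example dataset instead (an assumption: it is downloaded with webuse, needs an internet connection, and is not the healthcare company's data).

```stata
* Example health survey data from Stata's website (not the lab's dataset)
webuse nhanes2, clear

* BMI as the DV; let the effect of age differ by sex
regress bmi i.sex##c.age

* Predicted BMI for each sex at a few illustrative ages
margins sex, at(age=(30 50 70))
```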