EFIMM0120
Applied Quantitative
Research in Accounting
and Finance
Lecture 5: Difference-in-Difference Design
and Quasi-Natural Experiments
Part 1: Endogeneity concerns in OLS regression
Dr. Zilu Shan
[email protected]
Endogeneity Problem in Causal Inference
§ Causal inference is the leveraging of theory and deep knowledge
of institutional details to estimate the impact of events and choices
on a given outcome of interest.
§ In traditional OLS regression, there are three main sources of bias:
–Omitted variables
–Simultaneity
–Measurement error
Endogeneity Source (1)
Omitted variables
§ The possibility that an explanatory variable modeled as exogenous
is in fact endogenous because of an omitted variable
Assume that the structural equation is given by
$y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon$
§ The OLS method provides the Best Linear Unbiased Estimator
(BLUE) if all explanatory variables in this regression are exogenous
§ If $z$ is omitted from the regression, the bias in the estimate of $\beta_1$ is equal to
$E[\hat{\beta}_1] - \beta_1 = \beta_2 \frac{Cov(x, z)}{Var(x)}$
We are only concerned with omitted variables that affect both
the explanatory and the explained variables
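The bias formula can be checked numerically. The following is a minimal sketch, assuming a hypothetical data-generating process (true coefficients 1, 2 and 3, standard normal noise, and an omitted variable `z` that is correlated with `x` by construction); none of these numbers come from the lecture. It regresses `y` on `x` alone and compares the resulting slope with the true coefficient plus the bias predicted by the formula.

```python
import random

random.seed(0)
n = 50_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0  # assumed "true" coefficients (illustrative only)

# z is the omitted variable; x is correlated with z by construction
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]
y = [beta0 + beta1 * xi + beta2 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# "Short" regression of y on x only: slope = Cov(x, y) / Var(x)
beta1_short = cov(x, y) / cov(x, x)

# Bias predicted by the formula: beta2 * Cov(x, z) / Var(x)
predicted_bias = beta2 * cov(x, z) / cov(x, x)

print(f"slope when z is omitted: {beta1_short:.3f}")
print(f"beta1 + predicted bias:  {beta1 + predicted_bias:.3f}")
```

If the 0.5 loading of `x` on `z` is set to zero, the predicted bias shrinks towards zero, which matches the point that only omitted variables related to both sides of the regression matter.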
Endogeneity Source (1)
Omitted variables –example
§ Omitted variable issues are particularly severe in corporate finance,
because the objects of study (firms or CEOs, for example) are
heterogeneous along many different dimensions
§ Some examples (outcome studied: hard-to-observe determinant):
–Executive compensation: executives’ abilities
–Corporate financial and investment policies: financing frictions (i.e.
information asymmetry and incentive conflicts)
–Corporate decisions: both public and non-public information
Endogeneity Source (2)
Simultaneity
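Assume a two-equation structural model:
$y = \beta_0 + \beta_1 x + \beta_2 z_1 + \varepsilon_1$
$x = \alpha_0 + \alpha_1 y + \alpha_2 z_2 + \varepsilon_2$
If written in reduced form:
$x = \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{\alpha_1 \beta_2}{1 - \alpha_1 \beta_1} z_1 + \frac{\alpha_2}{1 - \alpha_1 \beta_1} z_2 + \frac{\alpha_1}{1 - \alpha_1 \beta_1} \varepsilon_1 + \frac{1}{1 - \alpha_1 \beta_1} \varepsilon_2$
If $\alpha_1 \neq 0$, then $x$ depends on $\varepsilon_1$, so there exists endogeneity due to simultaneity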
§ Simultaneity arises when one or more explanatory variables are jointly
determined with the explained variable in an equilibrium.
§ If simultaneity exists, the structural error term is correlated with the explanatory
variable
§ The causal relationship between an explained and an explanatory variable runs
both ways
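A small simulation can make the two-way causality concrete. This is a sketch under simplifying assumptions that are not in the slides: intercepts and the additional exogenous variables are dropped, both error terms are standard normal, and the parameter values (0.5 for the effect of x on y, 0.8 for the feedback from y to x) are purely illustrative. The helper names `ols_slope` and `simulate_slope` are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x: Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_slope(alpha1, beta1=0.5, n=50_000, seed=0):
    """Draw (x, y) from the two-equation system and return the OLS slope of y on x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Reduced form: solve y = beta1*x + e1 and x = alpha1*y + e2 jointly
        x = (alpha1 * e1 + e2) / (1 - alpha1 * beta1)
        y = beta1 * x + e1
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)

print("no feedback (alpha1 = 0):  ", round(simulate_slope(0.0), 3))
print("feedback    (alpha1 = 0.8):", round(simulate_slope(0.8), 3))
```

Without feedback the OLS slope should sit close to the assumed 0.5; with feedback it drifts away from it, because x now contains part of the structural error of the y equation.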
Endogeneity Source (2)
Simultaneity - example
§ In a regression of a value multiple (such as market-to-book) on an
index of anti-takeover provisions, the usual result is a negative
coefficient on the index.
§ However, this does not mean that the presence of anti-takeover
provisions leads to a loss of firm value.
§ Alternative explanation: managers of low-value firms adopt
anti-takeover provisions in order to entrench themselves.
Endogeneity Source (3)
Measurement errors
§ Measurement error comes from any discrepancy between the true
variable and the proxy used for an unobservable or difficult-to-quantify
variable.
§ It can affect either the dependent variable or the independent
variables.
§ Some examples:
–Market value of debt: Most debt is privately held by banks and other
financial institutions, so there is no observable market value
–Executive compensation: Stock options often vest over time and are valued
using an approximation, such as Black-Scholes
–Corporate governance: it’s a nebulous concept with a variety of different
facets. Commonly used proxies, such as an anti-takeover provision index or the
presence of large blockholders, are unlikely to be sufficient
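To see why the location of the measurement error matters, the sketch below simulates the textbook "classical" case. This is an assumption that goes beyond the slides: the noise is taken to be independent of the true variable and of the regression error. The true slope of 2 and the unit variances are illustrative values, and `ols_slope` is a hypothetical helper.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x: Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 50_000
beta = 2.0  # assumed true slope (illustrative)

x_true = [rng.gauss(0, 1) for _ in range(n)]
y_true = [beta * xi + rng.gauss(0, 1) for xi in x_true]

# Proxies: true value plus independent noise (classical measurement error)
x_noisy = [xi + rng.gauss(0, 1) for xi in x_true]
y_noisy = [yi + rng.gauss(0, 1) for yi in y_true]

slope_clean = ols_slope(x_true, y_true)     # no measurement error
slope_noisy_x = ols_slope(x_noisy, y_true)  # error in the independent variable
slope_noisy_y = ols_slope(x_true, y_noisy)  # error in the dependent variable

print(f"clean: {slope_clean:.3f}  noisy x: {slope_noisy_x:.3f}  noisy y: {slope_noisy_y:.3f}")
```

Under these assumptions, noise in the independent variable pulls the slope towards zero, while noise in the dependent variable leaves the slope roughly unchanged and only makes it less precise.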
Part 2: An introduction to DID method
Controlled vs. (Quasi-) Natural
Experiments (1)
Controlled experiments are often used in other areas of science
§ e.g. to check whether a certain drug reduces cholesterol, researchers could
randomly assign patients to a treatment group (the drug) and
a control group (placebo pills)
§ The difference in the average change in cholesterol between the
two groups of patients is the average treatment effect (ATE), or in
regression format, the difference-in-difference estimator.
§ It’s very rare in social science that researchers can apply controlled
experiments, because of the difficulty of imposing treatment.
Controlled vs. (Quasi-) Natural
Experiments (2)
A natural experiment is a sharp change in one or more variables of
interest that occurs for exogenous reasons.
§ It could be caused either
– by natural causes (e.g. natural disasters),
– or by some kind of human action, such as changes in regulation, economic
policy and political changes (generally referred to as “quasi-natural”
experiments)
§ Key assumption to allow causal inference
– the treatment assignment is random or at least “as good as random”
– In other words, any other variable that is important to determine the
outcome variable is uncorrelated with the treatment assignment
§ In the literature, it is often called “the shock”
Causal inference issue
§ We ideally would like the average treatment effect (ATE): $E[y^1] - E[y^0]$,
where $y^1$ and $y^0$ are the outcomes with and without treatment
§ But we can only estimate the naïve estimator: $E[y^1 | d = 1] - E[y^0 | d = 0]$,
where $d$ is the treatment indicator
§ If we assume $E[y^0 | d = 1] = E[y^0 | d = 0]$, then we can get the ATT
(average treatment effect on the treated) from observed data: $E[y^1 | d = 1] - E[y^0 | d = 0]$
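The gap between the naïve comparison and the treatment effect can be illustrated with simulated potential outcomes. This is a sketch, assuming a hypothetical unobserved trait (`ability`) that raises the untreated outcome and, in the self-selection case, also decides who gets treated. The function name `naive_difference`, the constant effect of 2 and the distributions are illustrative assumptions, not part of the lecture.

```python
import random

def naive_difference(random_assignment, effect=2.0, n=20_000, seed=0):
    """Mean observed outcome of the treated minus mean observed outcome of the untreated."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)       # hypothetical unobserved trait
        y0 = ability + rng.gauss(0, 1)  # potential outcome without treatment
        y1 = y0 + effect                # potential outcome with treatment
        if random_assignment:
            d = rng.random() < 0.5      # treatment unrelated to the trait
        else:
            d = ability > 0             # self-selection on the trait
        # Only one potential outcome is ever observed for each unit
        if d:
            treated.append(y1)
        else:
            control.append(y0)
    return sum(treated) / len(treated) - sum(control) / len(control)

print("random assignment:", round(naive_difference(True), 3))
print("self-selection:   ", round(naive_difference(False), 3))
```

With random assignment the untreated outcome has the same mean in both groups, so the naïve difference should land near the assumed effect of 2. With self-selection the treated group would have had higher outcomes anyway, and the naïve difference overstates the effect.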
Single difference in cross-section
Comparing the post-treatment values of the outcome variable between
treated and untreated firms:
$y_i = \beta_0 + \beta_1 d_i + \varepsilon_i$
§ $y_i$ is the outcome variable
§ $d_i$ is a dummy variable that indicates whether firm $i$ is in the treatment group
§ If treatment is random, then $d_i$ is uncorrelated with the error term $\varepsilon_i$
§ $\hat{\beta}_1$ is therefore an unbiased estimate of the average treatment effect (ATE)
§ The approach is useful when the researcher does not have data on
the values of outcome $y$ prior to the treatment
§ The potential problem:
the average $y$ of treated and untreated firms may have been different ex ante (i.e.
before the treatment)
Time-difference regressions
Comparing post-treatment values to pre-treatment values for all firms:
$y_{i,t} = \beta_0 + \beta_1 P_t + \varepsilon_{i,t}$
§ $y_{i,t}$ is the outcome variable
§ $P_t$ is a dummy variable which takes value 1 for the observations in
the post-shock period, and 0 for the observations in the pre-shock
period
§ Key assumption: no other event that affects the outcome $y_{i,t}$ occurred between the
pre-shock and post-shock periods.
§ In other words, there is no omitted variable, correlated with $P_t$, that
affects $y_{i,t}$
Difference-in-Difference
§ The difference-in-difference (DiD) model combines the cross-sectional and
the time-series difference models into a single model
§ Intuition: compute the difference between
– the change in outcome $y$ pre- versus post-treatment for the treated group
and
– the change in outcome $y$ pre- versus post-treatment for the control group
§ We would need a panel of treated and untreated firms, with
observations before the shock and after it
§ A typical DiD regression would be
$y_{i,t} = \beta_0 + \beta_1 P_t + \beta_2 d_i + \beta_3 P_t \cdot d_i + \varepsilon_{i,t}$
DiD coefficient
$y_{i,t} = \beta_0 + \beta_1 P_t + \beta_2 d_i + \beta_3 P_t \cdot d_i + \varepsilon_{i,t}$
§ $\hat{\beta}_1$ captures the average change in $y_{i,t}$ from the pre- to post-shock
periods for the untreated group ($d_i = 0$)
§ $\hat{\beta}_2$ captures the pre-shock difference in $y_{i,t}$ between treated and
untreated firms ($P_t = 0$)
§ Our main coefficient $\hat{\beta}_3$ (DiD coefficient) captures the effect of the
shock, that is the average differential change in $y_{i,t}$ from the pre- to
post-treatment period for the treatment group relative to the control
group
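The interpretation of the coefficients can be verified on simulated data. With only a post-shock dummy, a treatment dummy and their interaction, the OLS coefficients of the DiD regression equal differences of the four group-by-period means, so the sketch below works directly with those means. The true coefficients (10, 1, 2, 3), the sample size and the standard normal noise are illustrative assumptions.

```python
import random

random.seed(1)
b0, b1, b2, b3 = 10.0, 1.0, 2.0, 3.0  # assumed true coefficients (illustrative)
n_per_cell = 5_000

# cells[(d, P)]: outcomes for group d (1 = treated) in period P (1 = post-shock)
cells = {}
for d in (0, 1):
    for P in (0, 1):
        cells[(d, P)] = [
            b0 + b1 * P + b2 * d + b3 * P * d + random.gauss(0, 1)
            for _ in range(n_per_cell)
        ]

m = {key: sum(vals) / len(vals) for key, vals in cells.items()}

beta1_hat = m[(0, 1)] - m[(0, 0)]  # pre-to-post change for the untreated group
beta2_hat = m[(1, 0)] - m[(0, 0)]  # pre-shock gap between treated and untreated
beta3_hat = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])  # DiD coefficient

# The two single-difference estimators, for comparison
time_diff_treated = m[(1, 1)] - m[(1, 0)]   # mixes the time effect with the shock
cross_section_post = m[(1, 1)] - m[(0, 1)]  # mixes the pre-existing gap with the shock

print(f"DiD: {beta3_hat:.3f}, time difference: {time_diff_treated:.3f}, "
      f"cross-section: {cross_section_post:.3f}")
```

In this setup only the DiD estimate should be close to the assumed effect of 3. The time difference on treated firms also absorbs the common time effect, and the post-shock cross-sectional difference also absorbs the pre-existing gap between the groups.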
Further comments on Diff-in-Diff
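§ We can add individual-level covariates, including fixed effects, but they
mainly help to obtain more precise estimates.
§ Diff-in-diff logic with one group treated and one group not treated (control group), taking
expectations to eliminate the error term $\varepsilon$, which we assume is orthogonal to
treatment:
$E[y^1_{t+1} | d_{t+1} = 1] - E[y^0_t | d_t = 1] + E[y^0_t | d_t = 0] - E[y^0_{t+1} | d_{t+1} = 0]$
$= (\alpha + \delta + \lambda_{t+1}) - (\alpha + \lambda_t) + (\alpha + \lambda_t) - (\alpha + \lambda_{t+1})$
$= \delta$
where $\alpha$ is the baseline level, $\delta$ the treatment effect and $\lambda_t$ the time effect
§ If we assume $E[y^0 | d = 0] = E[y^0 | d = 1]$, then the ATT
(average treatment effect on the treated), $E[y^1 | d = 1] - E[y^0 | d = 1]$, can be obtained from observed data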
Graphic illustration – DiD without trends
§ Pre- and post-shock average $y$ of
treated and untreated observations
are constant (no trend)
§ $\hat{\beta}_1$ captures the difference between average pre-shock and average
post-shock outcome for untreated
observations
§ $\hat{\beta}_2$ captures the pre-shock difference in outcome between treated and
untreated firms
§ $\hat{\beta}_3$ (DiD coefficient) captures the difference between the observed
average post-shock $y$ and the
average unobserved counterfactual $y$
after the shock (i.e. the hypothetical
value of $y$ of treated observations
absent the shock)
Graphic illustration – DiD with trends
§ Pre- and post-shock average $y$ of
treated and untreated observations
increase at a constant trend
§ $\hat{\beta}_1$ captures the difference between average pre-shock and average
post-shock outcome for untreated
observations
§ $\hat{\beta}_2$ captures the pre-shock difference in outcome between treated and
untreated firms
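The trend case can also be sketched in a short simulation. It assumes a hypothetical panel of 500 firms per group over 10 periods, a shock at period 5, an effect of 2 and linear trends; all of these are illustrative choices. The second call, where the two groups follow different trends, goes beyond what the slide shows and is included only as a contrast. The function name `did_with_trends` is hypothetical.

```python
import random

def did_with_trends(trend_treated, trend_control, effect=2.0,
                    n_firms=500, n_periods=10, shock_period=5, seed=0):
    """DiD from pre/post group means when outcomes follow linear time trends."""
    rng = random.Random(seed)
    sums, counts = {}, {}
    # (group dummy, trend slope, baseline level) for treated and control firms
    for d, trend, base in ((1, trend_treated, 12.0), (0, trend_control, 10.0)):
        for _ in range(n_firms):
            for t in range(n_periods):
                post = int(t >= shock_period)
                y = base + trend * t + effect * d * post + rng.gauss(0, 1)
                sums[(d, post)] = sums.get((d, post), 0.0) + y
                counts[(d, post)] = counts.get((d, post), 0) + 1
    m = {key: sums[key] / counts[key] for key in sums}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

print("common trend:    ", round(did_with_trends(0.5, 0.5), 3))
print("different trends:", round(did_with_trends(0.8, 0.5), 3))
```

When both groups share the same trend, the control group's change removes the trend from the treated group's change and the estimate should be near the assumed effect. When the trends differ, the difference in trends is attributed to the shock.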