EFIMM0120
Applied Quantitative Research in Accounting and Finance
Lecture 5: Difference-in-Difference Design and Quasi-Natural Experiments
Part 1: Endogeneity concerns in OLS regression
Dr. Zilu Shan [email protected]

Endogeneity Problem in Causal Inference
§ Causal inference is the leveraging of theory and deep knowledge of institutional details to estimate the impact of events and choices on a given outcome of interest.
§ In traditional OLS regression, there are three main sources of bias
– Omitted variables
– Simultaneity
– Measurement error

Endogeneity Source (1) Omitted variables
§ The possibility that an explanatory variable modeled as exogenous will in fact be endogenous because of an omitted variable
Assume that the structural equation is given by
$y = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \varepsilon$
§ The OLS method provides the Best Linear Unbiased Estimate (BLUE) if all explanatory variables in this regression are exogenous
§ If $z_1$ is omitted from the regression, the bias is equal to
$E[\hat{\beta}_1] - \beta_1 = \beta_2 \frac{Cov(x_1, z_1)}{Var(x_1)}$
We are only concerned with omitted variables that affect both the explanatory and the explained variables (the bias vanishes if $\beta_2 = 0$ or $Cov(x_1, z_1) = 0$)

Endogeneity Source (1) Omitted variables – example
§ Omitted variable issues are particularly severe in corporate finance, because the objects of study (firms or CEOs, for example) are heterogeneous along many different dimensions
§ Some examples (area of study: typical omitted variable)
– Executive compensation: executives' abilities
– Corporate financial and investment policies: financing frictions (i.e. information asymmetry and incentive conflicts)
– Corporate decisions: both public and non-public information

Endogeneity Source (2) Simultaneity
Assume a two-equation structural model:
$y_1 = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \varepsilon_1$
$x_1 = \alpha_0 + \alpha_1 y_1 + \alpha_2 z_2 + \varepsilon_2$
If written in reduced form
$x_1 = \frac{\alpha_0 + \alpha_1 \beta_0}{1 - \alpha_1 \beta_1} + \frac{\alpha_1 \beta_2}{1 - \alpha_1 \beta_1} z_1 + \frac{\alpha_2}{1 - \alpha_1 \beta_1} z_2 + \frac{\alpha_1}{1 - \alpha_1 \beta_1} \varepsilon_1 + \frac{1}{1 - \alpha_1 \beta_1} \varepsilon_2$
If $\alpha_1 \neq 0$, then $x_1$ depends on $\varepsilon_1$ and there exists endogeneity due to simultaneity
§ Simultaneity arises when one or more explanatory variables are jointly determined with the explained variable in an equilibrium.
§ If simultaneity exists, the structural error term is correlated with the explanatory variable
§ The causal relationship between an explained and an explanatory variable runs both ways

Endogeneity Source (2) Simultaneity - example
§ In a regression of a value multiple (such as market-to-book) on an index of anti-takeover provisions, the usual result is a negative coefficient on the index.
§ However, this does not mean that the presence of anti-takeover provisions leads to a loss of firm value.
§ Alternative explanation: Managers of low-value firms adopt anti-takeover provisions in order to entrench themselves.

Endogeneity Source (3) Measurement errors
§ Measurement error comes from any discrepancy between the true variable and the proxy used for an unobservable or difficult-to-quantify variable.
§ It can affect either dependent or independent variables.
§ Some examples:
– Market value of debt: Most debt is privately held by banks and other financial institutions, so there is no observable market value
– Executive compensation: Stock options often vest over time and are valued using an approximation, such as Black-Scholes
– Corporate governance: It is a nebulous concept with a variety of different facets. Proxies in current use, such as an anti-takeover provision index or the presence of large blockholders, are unlikely to be sufficient

Part 2: An introduction to DID method

Controlled vs. (Quasi-) Natural Experiments (1)
Controlled experiments are often used in other areas of science
§ For example, to
check whether a certain drug reduces cholesterol, researchers could randomly assign patients to a treatment group (the drug) and a control group (placebo pills)
§ The difference in the average change in cholesterol between the two groups of patients is the average treatment effect (ATE), or in regression format, the difference-in-difference estimator.
§ Researchers in social science can very rarely run controlled experiments, because of the difficulty of imposing treatment.

Controlled vs. (Quasi-) Natural Experiments (2)
A natural experiment is a sharp change in one or more variables of interest that occurs for exogenous reasons.
§ It could arise either
– from natural causes (e.g. natural disasters),
– or from some kind of human action, such as changes in regulation, economic policy and political changes (generally referred to as "quasi-natural" experiments)
§ Key assumption to allow causal inference
– the treatment assignment is random or at least "as good as random"
– In other words, any other variable that is important in determining the outcome variable is uncorrelated with the treatment assignment
§ In the literature, it is often called "the shock"

Causal inference issue
Let $y^1$ and $y^0$ denote the outcome with and without treatment, and let $d$ be the treatment dummy.
§ We ideally would like the average treatment effect (ATE)
$E[y^1] - E[y^0]$
§ But from the data we can only estimate the naïve difference:
$E[y^1 | d = 1] - E[y^0 | d = 0]$
§ If we assume $E[y^0 | d = 1] = E[y^0 | d = 0]$, then we can get the ATT (average treatment effect on the treated) from observed data
$E[y^1 | d = 1] - E[y^0 | d = 1] = E[y^1 | d = 1] - E[y^0 | d = 0]$

Single difference in cross-section
Comparing the post-treatment values of the outcome variable between treated and untreated firms
$y_i = \beta_0 + \beta_1 d_i + \varepsilon_i$
§ $y_i$ is the outcome variable
§ $d_i$ is a dummy variable that indicates whether firm $i$ is in the treatment group
§ If treatment is random, then $d_i$ is uncorrelated with the error term $\varepsilon_i$
§ $\hat{\beta}_1$ is therefore an unbiased estimate of the average treatment effect (ATE)
§ The approach is useful when the researcher does not have data on the values of the outcome $y$
prior to the treatment
§ The potential problem: the average $y$ of treated and untreated firms may already have differed ex ante (i.e. before the treatment)

Time-difference regressions
Comparing post-treatment values to pre-treatment values for all firms
$y_{i,t} = \beta_0 + \beta_1 P_t + \varepsilon_{i,t}$
§ $y_{i,t}$ is the outcome variable
§ $P_t$ is a dummy variable which takes value 1 for the observations in the post-shock period, and 0 for the observations in the pre-shock period
§ Key assumption: no other event that affects the outcome $y_{i,t}$ occurred between the pre-shock and post-shock periods.
§ In other words, there is no omitted variable, correlated with $P_t$, that affects $y_{i,t}$

Difference-in-Difference
§ The difference-in-difference model combines the cross-sectional and the time-series difference models into a single model
§ Intuition: compute the difference between
– the change in outcome $y$ pre- versus post-treatment for the treated group and
– the change in outcome $y$ pre- versus post-treatment for the control group
§ We would need a panel of treated and untreated firms, with observations before the shock and after it
§ A typical DID regression would be
$y_{i,t} = \beta_0 + \beta_1 P_t + \beta_2 d_i + \beta_3 P_t \times d_i + \varepsilon_{i,t}$

DiD coefficient
$y_{i,t} = \beta_0 + \beta_1 P_t + \beta_2 d_i + \beta_3 P_t \times d_i + \varepsilon_{i,t}$
§ $\hat{\beta}_1$ captures the average change in $y_{i,t}$ from the pre- to post-shock periods for the untreated group ($d_i = 0$)
§ $\hat{\beta}_2$ captures the pre-shock difference in $y_{i,t}$ between treated and untreated firms ($P_t = 0$)
§ Our main coefficient $\hat{\beta}_3$ (DiD coefficient) captures the effect of the shock, that is, the average differential change in $y_{i,t}$ from the pre- to post-treatment period for the treatment group relative to the control group

Further comments on Diff-in-Diff
§ We can add individual-level covariates, including fixed effects, but they mainly help to obtain more precise estimates.
§ Diff-in-diff logic with $i$ treated and $j$ not treated (control group), taking expectations to eliminate the error term $\varepsilon$, which we assume orthogonal to treatment
$E[y^1_{t+1} | d_{t+1} = 1] - E[y^0_t | d_t = 1] + E[y^0_t | d_t = 0] - E[y^0_{t+1} | d_{t+1} = 0]$
$= (\alpha + \delta + \gamma_{t+1}) - (\alpha + \gamma_t) + (\alpha + \gamma_t) - (\alpha + \gamma_{t+1}) = \delta$
where $\alpha$ is a common level, $\gamma_t$ a time effect and $\delta$ the treatment effect
§ If we assume $E[y^0 | d = 0] = E[y^0 | d = 1]$, then we can get the ATT (average treatment effect on the treated) from observed data
$E[y^1 | d = 1] - E[y^0 | d = 1] = E[y^1 | d = 1] - E[y^0 | d = 0]$

Graphic illustration – DiD without trends
§ Pre- and post-shock average $y$ of treated and untreated observations are constant (no trend)
§ $\hat{\beta}_1$ captures the difference between average pre- and average post-shock outcome for untreated observations
§ $\hat{\beta}_2$ captures the pre-shock difference in outcome between treated and untreated firms
§ $\hat{\beta}_3$ (DiD coefficient) captures the difference between the observed average post-shock $y$ of treated observations and the average unobserved counterfactual $y$ after the shock (i.e. the hypothetical value of $y$ for treated observations absent the shock)

Graphic illustration – DiD with trends
§ Pre- and post-shock average $y$ of treated and untreated observations increase along a constant trend
§ $\hat{\beta}_1$ captures the difference between average pre- and average post-shock outcome for untreated observations
§ $\hat{\beta}_2$ captures the pre-shock difference in outcome between treated and untreated firms
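
To make the interpretation of the DID coefficients concrete, the following is a minimal simulation sketch, not part of the original slides. It assumes a two-period panel with hypothetical parameter values (the function names, the sample size and the coefficients 2.0, 0.5, 1.0 and 1.5 are all illustrative choices). It regresses the outcome on a post-shock dummy, a treatment-group dummy and their interaction, and compares the interaction coefficient with the difference-in-difference of the four group means.

```python
import numpy as np


def simulate_panel(n_firms=2000, effect=1.5, seed=0):
    """Simulate a two-period panel (all parameter values are illustrative)."""
    rng = np.random.default_rng(seed)
    d_firm = rng.integers(0, 2, size=n_firms)   # 1 = treated firm, 0 = control
    d = np.repeat(d_firm, 2)                    # treatment dummy, two rows per firm
    p = np.tile([0, 1], n_firms)                # post dummy: 0 = pre, 1 = post
    eps = rng.normal(0.0, 1.0, size=2 * n_firms)
    # Outcome follows the DID regression: constant, post, treated, interaction
    y = 2.0 + 0.5 * p + 1.0 * d + effect * p * d + eps
    return y, p, d


def did_from_means(y, p, d):
    """(post - pre) change for treated minus (post - pre) change for controls."""
    def cell_mean(post, treated):
        return y[(p == post) & (d == treated)].mean()
    return (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))


def did_from_ols(y, p, d):
    """OLS of y on [1, post, treated, post * treated]; returns all four coefficients."""
    X = np.column_stack([np.ones_like(y), p, d, p * d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


y, p, d = simulate_panel()
beta = did_from_ols(y, p, d)
print("DiD from group means:   ", did_from_means(y, p, d))
print("Interaction coefficient:", beta[3])
```

Because the regression is saturated (one parameter per group-period cell), the interaction coefficient coincides with the difference of the four cell means up to floating-point error, and both should be close to the simulated effect. With real data one would typically also want standard errors, for example clustered at the firm level, which this sketch leaves out.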