class: center, middle, inverse, title-slide .title[ # Instrumental Variables Assumptions, Tests, and Interpretations ] .subtitle[ ##
] .author[ ### Ian McCarthy | Emory University ] .date[ ### Econ 771, Fall 2022 ] --- class: inverse, center, middle <!-- Adjust some CSS code for font size and maintain R code font size --> <style type="text/css"> .remark-slide-content { font-size: 30px; padding: 1em 2em 1em 2em; } .remark-code { font-size: 15px; } .remark-inline-code { font-size: 20px; } </style> <!-- Set R options for how code chunks are displayed and load packages --> # Instrumental Variables Assumptions <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Key IV assumptions 1. *Exclusion:* Instrument is uncorrelated with the error term<br> 2. *Validity:* Instrument is correlated with the endogenous variable<br> 3. *Monotonicity:* Treatment more (less) likely for those with higher (lower) values of the instrument<br> -- <br> Assumptions 1 and 2 sometimes grouped into an *only through* condition. --- # Exclusion Kippersluis and Rietveld (2018), "Beyond Plausibly Exogenous" - "zero-first-stage" test - Focus on subsample for which your instrument is not correlated with the endogenous variable of interest 1. Regress the outcome on all covariates and the instruments among this subsample 2. Coefficient on the instruments captures any potential direct effect of the instruments on the outcome (since the correlation with the endogenous variable is 0 by assumption). --- # Exclusion Beckert (2019), "A Note on Specification Testing in Some Structural Regression Models" - With at least `\(n\)` valid instruments, test if all instruments are valid against the alternative that up to `\(m - n\)` instruments are valid 1. Estimate the first-stage regressions and save residuals, denoted `\(\hat{u}\)`. 2. Estimate the "artificial" regression `$$y=\beta x + \delta \tilde{z} + \gamma \hat{u} + \varepsilon$$` where `\(\tilde{z}\)` denotes a subset of `\(m-n\)` instruments from the full instrument matrix `\(z\)`. 3. Test the null that `\(\delta=0\)` using a standard F-test --- # Solving and Exclusion Problem Conley et al (2010) and "plausible exogeneity", union of confidence intervals approach - Suppose extent of violation is known in `\(y_{i} = \beta x_{i} + \gamma z_{i} + \varepsilon_{i}\)`, so that `\(\gamma = \gamma_{0}\)` - IV/TSLS applied to `\(y_{i} - \gamma_{0}z_{i} = \beta x_{i} + \varepsilon_{i}\)` works - With `\(\gamma_{0}\)` unknown...do this a bunch of times! - Pick `\(\gamma=\gamma^{b}\)` for `\(b=1,...,B\)` - Obtain `\((1-\alpha)\)` % confidence interval for `\(\beta\)`, denoted `\(CI^{b}(1-\alpha)\)` - Compute final CI as the union of all `\(CI^{b}\)` --- # Solving an Exclusion Problem Nevo and Rosen (2012) ReStat: `$$y_{i} = \beta x_{i} + \delta D_{i} + \varepsilon_{i}$$` - Allow instrument, `\(z\)`, to be correlated with `\(\varepsilon\)`, but `\(|\rho_{x, \varepsilon}| \geq |\rho_{z, \varepsilon}|\)` - i.e., IV is better than just using the endogenous variable - Assume `\(\rho_{x, \varepsilon} \times \rho_{z, \varepsilon} >0\)` (same sign of correlation in the error) - Denote `\(\lambda = \frac{\rho_{z, \varepsilon}}{\rho_{x, \varepsilon}}\)`, then valid `\(IV\)` would be `\(V(z) = \sigma_{x} z - \lambda \sigma_{z} x\)` - Can bound `\(\beta\)` using range of `\(\lambda\)` --- class: inverse, center, middle # Validity and Weak Instruments <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Validity Just says that your instrument is correlated with the endogenous variable, but what about the **strength** of the correlation? .center[ ![](https://media.giphy.com/media/3oFzlXvco5Wt2gnMcg/giphy.gif) ] --- # Why we care about instrument strength Recall our schooling and wages equation, `$$y = \beta S + \epsilon.$$` Bias in IV can be represented as: `$$Bias_{IV} \approx \frac{Cov(S, \epsilon)}{V(S)} \frac{1}{F+1} = Bias_{OLS} \frac{1}{F+1}$$` - Bias in IV may be close to OLS, depending on instrument strength - **Bigger problem:** Bias could be bigger than OLS if exclusion restriction not *fully* satisfied --- # Testing strength of instruments Two things going on simultaneously: 1. Strength of the first-stage 2. Inference on coefficient of interest in the structural equation -- <br> Applied researchers tend to (wrongly) think of these as separate issues. --- # Testing strength of instruments **Many endogenous variables** - Stock & Yogo (2005) test based on Cragg & Donald statistic (homoskedasticity only) - Kleibergen & Paap (2007) Wald statistic - Sanderson & Windmeijer (2016) extension - Effective F-statistic from Olea & Pflueger (2013) (as approximation) --- # Testing strength of instruments **Single endogenous variable** - Partial `\(R^{2}\)` (never see this) - Stock & Yogo (2005) test based on first-stage F-stat (homoskedasticity only) - Critical values in tables, based on number of instruments - Rule-of-thumb of 10 with single instrument (higher with more instruments) - Lee (2021): With first-stage F-stat of 10, standard "95% confidence interval" for second stage is really an 85% confidence interval --- # Testing strength of instruments **Single endogenous variable** - Partial `\(R^{2}\)` (never see this) - Stock & Yogo (2005) test based on first-stage F-stat (homoskedasticity only) - Kleibergen & Paap (2007) Wald statistic - Effective F-statistic from Olea & Pflueger (2013) --- # Testing strength of instruments **Single endogenous variable** - Inference with Anderson-Rubin CIs (homoskedastic) - Inference with Lee (2021) tables (allows for more general error structure) --- # Testing strength of instruments: First-stage .pull-left[ **Single endogenous variable** 1. Homoskedasticity - Stock & Yogo, effective F-stat 2. Heteroskedasticity - Effective F-stat ] .pull-left[ **Many endogenous variables** 1. Homoskedasticity - Stock & Yogo with Cragg & Donald statistic, Sanderson & Windmeijer (2016), effective F-stat 2. Heteroskedasticity - Kleibergen & Papp Wald is robust analog of Cragg & Donald statistic, effective F-stat ] --- # Testing strength of instruments: Inference .pull-left[ **Single endogenous variable** 1. Homoskedasticity - Anderson-Rubin CIs, Lee tF, likelihood ratio test of Moreira (2003) 2. Heteroskedasticity - Lee tF ] .pull-left[ **Many endogenous variables** 1. Homoskedasticity - Projection-based confidence sets using the Anderson-Rubin CI (Dufour & Taamouti, 2005) but low power 2. Heteroskedasticity - Go figure it out! ] --- # Making sense of all of this... 1. Test first-stage using effective F-stat 2. Present weak-instrument-robust inference using Anderson-Rubin CIs or Lee tF tests -- <br> Many endogenous variables problematic because strength of instruments for one variable need not imply strength of instruments for others --- class: inverse, center, middle # Interpreting Instrumental Variables Results <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Heterogenous TEs - In constant treatment effects, `\(Y_{i}(1) - Y_{i}(0) = \delta_{i} = \delta, \text{ } \forall i\)` - Heterogeneous effects, `\(\delta_{i} \neq \delta\)` - With IV, what parameter did we just estimate? Need **monotonicity** assumption to answer this --- # Monotonicity Assumption: Denote the effect of our instrument on treatment by `\(\pi_{1i}\)`. Monotonicity states that `\(\pi_{1i} \geq 0\)` or `\(\pi_{1i} \leq 0, \text{ } \forall i\)`. - Allows for `\(\pi_{1i}=0\)` (no effect on treatment for some people) - All those affected by the instrument are affected in the same "direction" - With heterogeneous ATE and monotonicity assumption, IV provides a "Local Average Treatment Effect" (LATE) --- # LATE and IV Interpretation - LATE is the effect of treatment among those affected by the instrument (compliers only). - Recall original Wald estimator: `$$\delta_{IV} = \frac{E[Y_{i} | Z_{i}=1] - E[Y_{i} | Z_{i}=0]}{E[D_{i} | Z_{i}=1] - E[D_{i} | Z_{i}=0]}=E[Y_{i}(1) - Y_{i}(0) | \text{complier}]$$` - Practically, monotonicity assumes there are no defiers and restricts us to learning only about compliers --- # Is LATE meaningful? - Learn about average treatment effect for compliers - Different estimates for different compliers - IV based on merit scholarships - IV based on financial aid - Same compliers? Probably not --- # LATE with defiers - In presence of defiers, IV estimates a weighted difference between effect on compliers and defiers (in general) - LATE can be restored if subgroup of compliers accounts for the same percentage as defiers and has same LATE - Offsetting behavior of compliers and defiers, so that remaining compliers dictate LATE --- class: inverse, center, middle # Marginal Treatment Effects <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Idea - Lots of different treatment effects...ATT, ATU, LATE - Would be nice to have full distribution, `\(f(\delta-{i})\)`, from which we could derive ATE, ATT, LATE, etc. - Marginal treatment effect (MTE) is nonparametric function that links all of these together - Treatment effect, `\(\delta_{i} = Y_{i}(1) - Y_{i}(0)\)` - Treatment governed by index, `\(D_{i} = \mathbb{1}(\gamma z_{i} \geq v_{i})\)`, with instruments `\(z\)` `$$\delta^{MTE} (x, v_{i}) = E[\delta_{i} | x_{i} = x, \gamma z_{i} = v_{i}]$$` --- # Idea `$$\delta^{MTE} (x, v_{i}) = E[\delta_{i} | x_{i} = x, \gamma z_{i}=v_{i}]$$` - `\(\gamma z_{i} = v_{i}\)` captures indifference between treatment and no treatment - Instead of number (like ATT or LATE), `\(\delta^{MTE}(x,v)\)` is a function of `\(v\)` - Average effect of treatment for everyone with the same `\(\gamma z_{i}\)` --- # Idea What is `\(\gamma z_{i}\)`? - For any single index model `$$D_{i} = \mathbb{1}(\gamma z_{i} \geq v_{i}) = \mathbb{1}(u_{is} \leq F(\gamma z_{i})) \text{ for } u_{is} \in [0,1]$$` - `\(F(\gamma z_{i})\)` is CDF of `\(v{i}\)` - Inverse CDF transform to translate into uniform distribution - Yields propensity score, `\(P(z_{i}) = Pr(D_{i}=1 | z_{i}) = F(\gamma z_{i})\)` (ranking all units according to probability of treatment) --- # Idea - MTE is LATE as change in `\(z\)` approaches 0 (Heckman, 1997) - MTE is Local IV: `$$\delta^{LIV} (x,p) = \frac{\partial E[Y | x, P(z)=p]}{\partial p}$$` - How does outcome `\(Y_{i}\)` change as we push one more person into treatment --- # Derivation `$$\begin{eqnarray*} Y(0) &=& \gamma_0 x + u_0\\ Y(1) &=& \gamma_1 x + u_1 \end{eqnarray*}$$` `\(P(D=1 | z) = P(z)\)` works as our instrument with two assumptions: 1. `\((u_0, u_1, u_s) \perp P(z) | x\)`. (Exogeneity) 2. Conditional on `\(z\)` there is enough variation in `\(z\)` for `\(P(z)\)` to take on all values `\(\in(0,1)\)`. - Much stronger than typical validity assumption, akin to special regressor in Lewbel's work - Binary variable, `\(D=\mathbb{1}(V+W^{*}\geq 0)\)`. "If `\(V\)` is independent of `\(W\)`, then variation in `\(V\)` changes the probability of `\(D=1\)` in such a way that traces out the distribution of `\(W^{*}\)` --- # Derivation - Now we can write, `$$\small \begin{eqnarray*} Y &=& \gamma_0' x + D(\gamma_1 - \gamma_0)' x + u_0 + D(u_1 - u_0)\\ E[Y| x,P(z)=p] &=& \gamma_0' x + p(\gamma_1 - \gamma_0)'x + E[D(u_1 - u_0)|x,P(z)=p] \end{eqnarray*}$$` - Observe `\(D=1\)` over the interval `\(u_s = [0,p]\)` and zero for higher values of `\(u_s\)`. Let `\(u_1-u_0 \equiv \eta\)`. `$$\small \begin{eqnarray*} E[D(u_1 - u_0) | P(z) =p,x] &=& \int_{-\infty}^{\infty} \int_{0}^{p} (u_1 - u_0) f((u_1-u_0) | u_s) d u_s d(u_1 -u_0)\\ E[D(\eta) | P(z) =p,x] &=& \int_{-\infty}^{\infty} \int_{0}^{p} \eta f(\eta | u_s) d\, \eta d\, u_s\\ \end{eqnarray*}$$` --- # Derivation Recall: `$$\begin{align*} E[Y| x,P(z)=p] &= \gamma_0' X + p(\gamma_1 - \gamma_0)'X + E[D(u_1 - u_0)|x,P(z)=p]\\ \end{align*}$$` And the derivative: `$$\begin{align*} \delta^{MTE}(p) = \frac{\partial E[Y | x, P(z)=p]}{\partial p} &= (\gamma_1 - \gamma_0)'x + \int_{-\infty}^{\infty} \eta f(\eta | u_s =p) d\, \eta\\ &= \underbrace{(\gamma_1 - \gamma_0)'x}_{ATE(x)}+ E[\eta | u_s =p] \end{align*}$$` What is `\(E[\eta | u_s =p]\)`? The expected unobserved gain from treatment of those people who are on the treatment/no-treatment margin `\(P(z)=p\)`. --- # MTE to ATE Calculate the outcome given `\((x,z)\)` (actually `\(z\)` and `\(P(z)=p\)`). `$$\begin{align*} \delta^{ATE}(x, T=1)&=E\left(\delta^{MTE} | X=x \right)\\ \end{align*}$$` ATE: We treat everyone. `$$\begin{eqnarray*} \int_{-\infty}^{\infty} \delta^{MTE}(p) = (\gamma_1 - \gamma_0)'x + \underbrace{\int_{-\infty}^{\infty} E(\eta | u_s) d\, u_s}_{0} \end{eqnarray*}$$` --- # MTE to ATT Calculate the outcome given `\((x,z)\)` (actually `\(z\)` and `\(P(z)=p\)`). `$$\begin{align*} \delta^{ATT}(x, P(z), T=1)&=E\left(\delta^{MTE} | X=x, u_{s} \leq P(z)\right)\\ \end{align*}$$` ATT: Treat only those with a large enough propensity score `\(P(z)>p\)`: `$$\begin{eqnarray*} \int_{-\infty}^{\infty} \delta^{MTE}(p,x) \frac{Pr(P(z | x) > p)}{E[P(z | x)]} d\,p \end{eqnarray*}$$` --- # MTE to LATE Calculate the outcome given `\((x,z)\)` (actually `\(z\)` and `\(P(z)=p\)`). `$$\begin{align*} \delta^{L A T E}\left(x, P(z), P\left(z^{\prime}\right)\right)&=E\left(\delta^{MTE} | X=x, P\left(z^{\prime}\right) \leq u_s \leq P(z)\right) \end{align*}$$` LATE: Integrate over the compliers: `$$\begin{eqnarray*} LATE(x,z,z')= \frac{1}{P(z) - P(z')} \int_{P(z')}^{P(z)} \delta^{MTE}(p,x) \end{eqnarray*}$$` --- # MTE to OLS and IV This is harder... `$$\begin{align*} w^{I V}\left(u_{s}\right)=\left[E\left(P(z) | P(z)>u_{s}\right)-E(P(z))\right] \frac{E(P(z))}{\operatorname{Var}(P(z))}\\ w^{O L S}\left(u_{s}\right)=1+\frac{E\left(u_{1} | u_{s}\right) h_{1}-E\left(u_{0} | u_{s}\right) h_{0}}{\delta^{M T E}\left(u_{s}\right)}\\ h_{1}=\frac{E\left(P(z) | P(z)>u_{s}\right)}{ E(P(z))}, \quad h_{0}=\frac{E\left(P(z) | P(z)<u_{s}\right)}{E(P(z))} \end{align*}$$` --- # MTE in practice 1. Estimate `\(P(z)= Pr(D=1 | z)\)` nonparametrically ($z$ includes instruments and all exogenous `\(x\)` variables) 2. Flexible/nonparametric regression of `\(Y\)` on `\(x\)` and `\(P(z)\)` 3. Differentiate w.r.t. `\(P(z)\)` 4. Plot for all values of `\(P(z)=p\)` As long as `\(P(z)\)` covers `\((0,1)\)`, we can trace out the full distribution of `\(\delta^{MTE}(p)\)` --- # Example [Grennan et al (2021)](https://www.stern.nyu.edu/sites/default/files/assets/documents/GMSC_nofreelunch.pdf) - Idea: what is the effect of detailing in prescription drugs (statins) - IV: Restrictions on detailing from conflict-of-interest policies imposed by large Academic Medical Centers...spillover to other hospitals - Estimate MTE and examine welfare effects in structural framework