class: center, middle, inverse, title-slide .title[ # Panel Data and Fixed Effects ] .subtitle[ ##
] .author[ ### Ian McCarthy | Emory University ] .date[ ### Econ 771, Fall 2022 ] --- class: inverse, center, middle name: panel <!-- Adjust some CSS code for font size and maintain R code font size --> <style type="text/css"> .remark-slide-content { font-size: 30px; padding: 1em 2em 1em 2em; } .remark-code { font-size: 15px; } .remark-inline-code { font-size: 20px; } </style> <!-- Set R options for how code chunks are displayed and load packages --> # Understanding Panel Data <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Nature of the Data - Repeated observations of the same units over time (balanced vs unbalanced) - Identification due to variation **within unit** -- **Notation** - Unit `\(i=1,...,N\)` over several periods `\(t=1,...,T\)`, which we denote `\(y_{it}\)` - Treatment status `\(D_{it}\)` - Regression model, <br> `\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` --- # Benefits of Panel Data - *May* overcome certain forms of omitted variable bias - Allows for unobserved but time-invariant factor, `\(\gamma_{i}\)`, that affects both treatment and outcomes -- **Still assumes** - No time-varying confounders - Past outcomes do not directly affect current outcomes - Past outcomes do not affect treatment (reverse causality) --- # Some textbook settings - Unobserved "ability" when studying schooling and wages - Unobserved "quality" when studying physicians or hospitals --- class: inverse, center, middle name: panelreg # Panel Data and Regression <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Fixed effects and regression `\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Allows correlation between `\(\gamma_{i}\)` and `\(D_{it}\)` - Physically estimate `\(\gamma_{i}\)` in some cases via set of dummy variables - More generally, "remove" `\(\gamma_{i}\)` via: - "within" estimator - first-difference estimator --- # Within Estimator `\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Most common approach (default in most statistical software) - Equivalent to demeaned model,<br> `$$y_{it} - \bar{y}_{i} = \delta (D_{it} - \bar{D}_{i}) + (\gamma_{i} - \bar{\gamma}_{i}) + (\gamma_{t} - \bar{\gamma}_{t}) + (\epsilon_{it} - \bar{\epsilon}_{i})$$` - `\(\gamma_{i} - \bar{\gamma}_{i} = 0\)` since `\(\gamma_{i}\)` is time-invariant - Requires *strict exogeneity* assumption (error is uncorrelated with `\(D_{it}\)` for all time periods) --- # First-difference `\(y_{it} = \delta D_{it} + \gamma_{i} + \gamma_{t} + \epsilon_{it}\)` for `\(t=1,...,T\)` and `\(i=1,...,N\)` -- - Instead of subtracting the mean, subtract the prior period values<br> `\(y_{it} - y_{i,t-1} = \delta(D_{it} - D_{i,t-1}) + (\gamma_{i} - \gamma_{i}) + (\gamma_{t} - \gamma_{t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})\)` - Requires exogeneity of `\(\epsilon_{it}\)` and `\(D_{it}\)` only for time `\(t\)` and `\(t-1\)` (weaker assumption than within estimator) - Sometimes useful to estimate both FE and FD just as a check --- # Keep in mind... - Discussion only applies to linear case or very specific nonlinear models - Fixed effects at lower "levels" accommodate fixed effects at higher levels (e.g., FEs for hospital combine to form FEs for zip code, etc.) - Fixed effects can't solve reverse causality - Fixed effects don't address unobserved, time-varying confounders - Can't estimate effects on time-invariant variables - May "absorb" a lot of the variation for variables that don't change much over time --- class: inverse, center, middle name: irl # Panel Data and Fixed Effects IRL <html><div style='float:left'></div><hr color='#EB811B' size=1px width=1055px></html> --- # Within Estimator (Default) in practice .pull-left[ **Stata**<br> ```stata ssc install causaldata causaldata gapminder.dta, use clear download gen lgdp_pc=log(gdppercap) tsset country year xtreg lifeExp lgdp_pc, fe ``` ] .pull-right[ **R**<br> ```r library(fixest) library(causaldata) reg.dat <- causaldata::gapminder %>% mutate(lgdp_pc=log(gdpPercap)) feols(lifeExp~lgdp_pc | country, data=reg.dat) ``` ] --- # Within Estimator (Default) in practice <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Default FE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Log GDP per Capita </td> <td style="text-align:center;"> 9.769 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.702) </td> </tr> </tbody> </table> --- # Within Estimator (Manually Demean) in practice .pull-left[ **Stata**<br> ```stata causaldata gapminder.dta, use clear download gen lgdp_pc=log(gdppercap) foreach x of varlist lifeExp lgdp_pc { egen mean_`x'=mean(`x') egen demean_`x'=`x'-mean_`x' } reg demean_lifeExp demean_lgdp_pc ``` ] .pull-right[ **R**<br> ```r library(causaldata) reg.dat <- causaldata::gapminder %>% mutate(lgdp_pc=log(gdpPercap)) %>% group_by(country) %>% mutate(demean_lifeexp=lifeExp - mean(lifeExp, na.rm=TRUE), demean_gdp=lgdp_pc - mean(lgdp_pc, na.rm=TRUE)) lm(demean_lifeexp~ 0 + demean_gdp, data=reg.dat) ``` ] --- # Within Estimator (Manually Demean) in practice <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Default FE </th> <th style="text-align:center;"> Manual FE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Log GDP per Capita </td> <td style="text-align:center;"> 9.769 </td> <td style="text-align:center;"> 9.769 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.702) </td> <td style="text-align:center;"> (0.701) </td> </tr> </tbody> </table> **Note:** `feols` defaults to clustering at level of FE, `lm` requires our input --- # First differencing (default) in practice .pull-left[ **Stata**<br> ```stata causaldata gapminder.dta, use clear download gen lgdp_pc=log(gdppercap) reg d.lifeExp d.lgdp_pc, noconstant ``` ] .pull-right[ **R**<br> ```r library(plm) reg.dat <- causaldata::gapminder %>% mutate(lgdp_pc=log(gdpPercap)) plm(lifeExp ~ 0 + lgdp_pc, model="fd", individual="country", index=c("country","year"), data=reg.dat) ``` ] --- # First differencing (manual) in practice <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Default FE </th> <th style="text-align:center;"> Manual FE </th> <th style="text-align:center;"> Default FD </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Log GDP per Capita </td> <td style="text-align:center;"> 9.769 </td> <td style="text-align:center;"> 9.769 </td> <td style="text-align:center;"> 5.290 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.702) </td> <td style="text-align:center;"> (0.284) </td> <td style="text-align:center;"> (0.291) </td> </tr> </tbody> </table> --- # First differencing (manual) in practice .pull-left[ **Stata**<br> ```stata causaldata gapminder.dta, use clear download gen lgdp_pc=log(gdppercap) reg d.lifeExp d.lgdp_pc, noconstant ``` ] .pull-right[ **R**<br> ```r reg.dat <- causaldata::gapminder %>% mutate(lgdp_pc=log(gdpPercap)) %>% group_by(country) %>% arrange(country, year) %>% mutate(fd_lifeexp=lifeExp - lag(lifeExp), lgdp_pc=lgdp_pc - lag(lgdp_pc)) %>% na.omit() lm(fd_lifeexp~ 0 + lgdp_pc , data=reg.dat) ``` ] --- # First differencing (manual) in practice <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Default FE </th> <th style="text-align:center;"> Manual FE </th> <th style="text-align:center;"> Default FD </th> <th style="text-align:center;"> Manual FD </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Log GDP per Capita </td> <td style="text-align:center;"> 9.769 </td> <td style="text-align:center;"> 9.769 </td> <td style="text-align:center;"> 5.290 </td> <td style="text-align:center;"> 5.290 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.702) </td> <td style="text-align:center;"> (0.284) </td> <td style="text-align:center;"> (0.291) </td> <td style="text-align:center;"> (0.291) </td> </tr> </tbody> </table> --- # FE and FD with same time period <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Default FE </th> <th style="text-align:center;"> Default FD </th> <th style="text-align:center;"> Manual FD </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Log GDP per Capita </td> <td style="text-align:center;"> 8.929 </td> <td style="text-align:center;"> 5.290 </td> <td style="text-align:center;"> 5.290 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> (0.741) </td> <td style="text-align:center;"> (0.291) </td> <td style="text-align:center;"> (0.291) </td> </tr> </tbody> </table> Don't want to read too much into this, but... - Likely strong serial correlation in this case (almost certainly) - Mispecified model