Chapter 6 Difference-in-Differences
Difference-in-differences can be used to evaluate our Health Insurance Subsidy Program (HISP). In this scenario, you have two rounds of data on two groups of households: one group that enrolled in the program and another that did not. Recalling the earlier comparison of enrolled and non-enrolled households, you realize that you cannot simply compare the average health expenditures of the two groups, because of selection bias. Because you have data from two periods for each household in the sample, you can address part of this problem by comparing the change in health expenditures between the two groups. This assumes that the change in the health expenditures of the non-enrolled group reflects what would have happened to the expenditures of the enrolled group in the absence of the program. Note that it does not matter which way you calculate the double difference: you can difference over time first and then across groups, or across groups first and then over time.
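As a small illustration of why the order of differencing does not matter, the sketch below computes the double difference both ways from four group means. The numbers are made up for the example and are not the HISP data; the object names (`means`, `did_by_time`, `did_by_group`) are hypothetical.

```r
# Hypothetical average health expenditures (US$); NOT the HISP data.
means <- matrix(
  c(14,  8,    # enrolled:     before, after
    20, 22),   # non-enrolled: before, after
  nrow = 2, byrow = TRUE,
  dimnames = list(c("enrolled", "non_enrolled"), c("before", "after"))
)

# Way 1: change over time within each group, then the difference across groups.
did_by_time <- (means["enrolled", "after"] - means["enrolled", "before"]) -
  (means["non_enrolled", "after"] - means["non_enrolled", "before"])

# Way 2: gap between groups within each period, then the difference across periods.
did_by_group <- (means["enrolled", "after"] - means["non_enrolled", "after"]) -
  (means["enrolled", "before"] - means["non_enrolled", "before"])

did_by_time   # (8 - 14) - (22 - 20) = -8
did_by_group  # (8 - 22) - (14 - 20) = -8
```

Both routes rearrange the same four means, so they always give the same number.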
Next, you estimate the effect using regression analysis. Using a simple linear regression to compute the difference-in-differences estimate, you find that the program reduced household health expenditures by US$8.16. You then refine your analysis by adding control variables. In other words, you use a multivariate linear regression that takes into account a host of other factors, and you find the same reduction in household health expenditures.
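For reference, the simple regression can be written out as follows. This is a sketch in notation introduced here, not taken from the text: $Y_{it}$ stands for the health expenditures of household $i$ in round $t$, and it assumes that `round` and `enrolled` are coded as 0/1 indicators.

$$
Y_{it} = \beta_0 + \beta_1\,\text{round}_t + \beta_2\,\text{enrolled}_i + \beta_3\,(\text{round}_t \times \text{enrolled}_i) + \varepsilon_{it}
$$

In R, the formula `round * enrolled` expands to both main effects plus their interaction. Under the 0/1 coding assumed above, the estimated interaction coefficient in the model without covariates equals the double difference of the four group means, where $E$ and $N$ denote enrolled and non-enrolled households and 0 and 1 denote the two rounds:

$$
\hat\beta_3 = \left(\bar Y_{E,1} - \bar Y_{E,0}\right) - \left(\bar Y_{N,1} - \bar Y_{N,0}\right)
$$

Once control variables are added, $\hat\beta_3$ is a covariate-adjusted version of this quantity and need not match the raw double difference exactly.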
    library(dplyr)     # filter(), %>%
    library(estimatr)  # lm_robust()
    library(texreg)    # htmlreg()

    # Difference-in-differences without covariates, restricted to treatment
    # localities; standard errors are clustered by locality.
    out_did <- lm_robust(health_expenditures ~ round * enrolled,
                         data = df %>% filter(treatment_locality == 1),
                         clusters = locality_identifier)

    # Same specification with control variables added.
    out_did_wcov <- lm_robust(health_expenditures ~ round * enrolled +
                                age_hh + age_sp + educ_hh + educ_sp +
                                female_hh + indigenous + hhsize + dirtfloor +
                                bathroom + land + hospital_distance,
                              data = df %>% filter(treatment_locality == 1),
                              clusters = locality_identifier)

    # Report both models side by side; the interaction term is the DiD estimate.
    htmlreg(list(out_did, out_did_wcov), doctype = FALSE,
            custom.coef.map = list('enrolled' = "Enrollment",
                                   'round' = "Round",
                                   'round:enrolled' = "Enrollment X Round"),
            caption = "Evaluating HISP: Difference-in-Differences with Regression",
            caption.above = TRUE,
            custom.model.names = c("No Covariate Adjustment", "With Covariate Adjustment"))
| | No Covariate Adjustment | With Covariate Adjustment |
|---|---|---|
| Enrollment | -6.30* | -1.51* |
| | [-6.69; -5.91] | [-1.77; -1.25] |
| Round | 1.51* | 1.45* |
| | [ 0.79; 2.24] | [ 0.73; 2.17] |
| Enrollment X Round | -8.16* | -8.16* |
| | [-8.81; -7.52] | [-8.81; -7.52] |
| R2 | 0.34 | 0.55 |
| Adj. R2 | 0.34 | 0.55 |
| Num. obs. | 9919 | 9919 |
| RMSE | 7.91 | 6.54 |
| N Clusters | 100 | 100 |

* Null hypothesis value outside the confidence interval.
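
To see why the `Enrollment X Round` row is read as the difference-in-differences estimate, the sketch below checks on simulated data that the interaction coefficient from a regression without covariates matches the double difference of the four cell means. The data frame `toy` and all of its values are hypothetical and are not the HISP data. The sketch uses base R `lm()` rather than `lm_robust()`; clustering changes the standard errors but not the point estimates.

```r
set.seed(1)

# Simulated toy panel: 200 households observed in two rounds (all values made up).
n <- 200
toy <- expand.grid(id = seq_len(n), round = c(0, 1))
toy$enrolled <- as.numeric(toy$id <= n / 2)
toy$y <- 20 - 6 * toy$enrolled + 1.5 * toy$round -
  8 * toy$round * toy$enrolled + rnorm(nrow(toy), sd = 3)

# Double difference from the four cell means (rows: enrolled, columns: round).
cell <- with(toy, tapply(y, list(enrolled, round), mean))
did_means <- (cell["1", "1"] - cell["1", "0"]) -
  (cell["0", "1"] - cell["0", "0"])

# Interaction coefficient from ordinary least squares.
fit <- lm(y ~ round * enrolled, data = toy)
did_reg <- unname(coef(fit)["round:enrolled"])

# The two numbers agree up to floating-point error.
all.equal(did_means, did_reg)
```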
What are the basic assumptions required to accept this result from difference-in-differences?
To accept this result, we assume that no time-varying factors other than the program affect the two groups differently. That is, we assume that the treatment and comparison groups would have followed equal trends (changes in outcomes) in the absence of treatment. This assumption cannot be tested in the postintervention period, but we can compare the trends of the two groups before the intervention starts.
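One way such a pre-intervention comparison could look is sketched below. It assumes that at least two rounds of data collected before the program started are available, which goes beyond the two rounds described in this chapter. The data frame `pre`, the variable `pre_round`, and all values are simulated and hypothetical.

```r
set.seed(2)

# Hypothetical panel with two rounds, both BEFORE the program starts.
n <- 300
pre <- expand.grid(id = seq_len(n), pre_round = c(0, 1))
pre$enrolled <- as.numeric(pre$id <= n / 2)

# Simulated so that both groups share the same trend (+1.5 per round).
pre$y <- 20 - 6 * pre$enrolled + 1.5 * pre$pre_round +
  rnorm(nrow(pre), sd = 3)

# The interaction measures how much the pre-program trend of the
# (future) enrolled group differs from that of the comparison group.
pretrend <- lm(y ~ pre_round * enrolled, data = pre)
summary(pretrend)$coefficients["pre_round:enrolled", ]
```

An interaction estimate close to zero is consistent with equal pre-program trends, but it does not prove that the trends would have stayed equal afterward. The standard errors in this sketch also ignore the fact that each household appears twice.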
Based on the result from difference-in-differences, should HISP be scaled up nationally?
No. Based on this result, HISP should not be scaled up nationally, because it decreased health expenditures by US$8.16, less than the US$10 threshold. Moreover, taking the estimated impact under random assignment as the “true” impact of the program suggests that the difference-in-differences estimate may be biased. In this case, the non-enrolled households do not accurately represent the counterfactual trend in health expenditures of the enrolled households.