Chapter 4 Instrumental Variables

Let us now try using the randomized promotion method to evaluate the impact of the Health Insurance Subsidy Program (HISP). Assume that the ministry of health makes an executive decision that the health insurance subsidy should be made available immediately to any household that wants to enroll. You note that this is a different scenario from the randomized assignment case we have considered so far. However, you know that realistically this national scale-up will be incremental over time, so you reach an agreement to try to accelerate enrollment in a random subset of villages through a promotion campaign. In a random subsample of villages (indicated by promotion_locality), you undertake an intensive promotion effort that includes communication and social marketing aimed at increasing awareness of HISP. The promotion activities are carefully designed to avoid content that may inadvertently encourage changes in other health-related behaviors, since this would invalidate the promotion as an instrumental variable (IV). Instead, the promotion concentrates exclusively on boosting enrollment in HISP.

What was the effect of the promotion campaign upon enrollment?

Note: use the variable enrolled_rp for this question.

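library(dplyr)    # %>% and filter()
library(estimatr) # lm_robust() and iv_robust()
library(texreg)   # htmlreg()

m_enroll <- lm_robust(enrolled_rp ~ promotion_locality,  # enrollment regressed on promotion
                      clusters = locality_identifier,    # standard errors clustered by locality
                      data = df %>% filter(round == 1))  # follow-up round only; df is assumed to be loaded already

htmlreg(m_enroll, doctype = FALSE,
        custom.coef.map = list(`promotion_locality` = "Promotion Locality",
                               `(Intercept)` = "Intercept"),
        caption = "Randomized Promotion Comparison of Enrollment Rate in HISP",
        custom.model.names = "Enrollment Rate")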
Randomized Promotion Comparison of Enrollment Rate in HISP

| | Enrollment Rate |
|---|---|
| Promotion Locality | 0.41* [0.34; 0.48] |
| Intercept | 0.08* [0.04; 0.13] |
| R² | 0.20 |
| Adj. R² | 0.20 |
| Num. obs. | 9914 |
| RMSE | 0.41 |
| N Clusters | 200 |

Note: * indicates that 0 is outside the confidence interval.

After two years of promotion and program implementation, you find that 49 percent of households in villages that were randomly assigned to the promotion have enrolled in the program, while only 8 percent of households in nonpromoted villages have enrolled.
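The two enrollment rates follow directly from the regression coefficients: the intercept is the enrollment rate in nonpromoted villages, and the intercept plus the promotion coefficient is the rate in promoted villages. A minimal sketch of that arithmetic, using the rounded coefficients from the table (the variable names are illustrative, not part of the original analysis):

```r
# Rounded coefficients copied from the enrollment regression table
intercept <- 0.08         # enrollment rate in nonpromoted villages
promotion_effect <- 0.41  # difference in enrollment rate due to promotion

rate_nonpromoted <- intercept
rate_promoted <- intercept + promotion_effect

# Express the proportions as percentages
sprintf("Nonpromoted: %.0f%%; promoted: %.0f%%",
        100 * rate_nonpromoted, 100 * rate_promoted)
```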

Because the promoted and nonpromoted villages were assigned at random, you know that the average characteristics of the two groups should be the same in the absence of the promotion. You can verify that assumption by comparing the baseline health expenditures (as well as any other characteristics) of the two populations.

Compare baseline healthcare expenditures based upon assignment to promotion.

m_base_health <- lm_robust(health_expenditures ~ promotion_locality,
                           clusters = locality_identifier,
                           data = df %>% filter(round == 0))  # baseline round only
htmlreg(m_base_health, doctype = FALSE,
        custom.coef.map = list(`promotion_locality` = "Promotion Locality",
                               `(Intercept)` = "Intercept"),
        caption = "Randomized Promotion Comparison of Mean Household Expenditures at Baseline",
        custom.model.names = "Household Expenditures at Baseline")
Randomized Promotion Comparison of Mean Household Expenditures at Baseline

| | Household Expenditures at Baseline |
|---|---|
| Promotion Locality | -0.05 [-0.62; 0.51] |
| Intercept | 17.24* [16.76; 17.71] |
| R² | 0.00 |
| Adj. R² | -0.00 |
| Num. obs. | 9913 |
| RMSE | 5.59 |
| N Clusters | 200 |

Note: * indicates that 0 is outside the confidence interval.
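The same balance check extends to any other baseline characteristic. The sketch below is a minimal illustration on simulated data, not on the HISP data set: the variable names (village, promotion, baseline_expenditure, hhsize) and all numeric settings are hypothetical. It collapses households to village means, since promotion was assigned at the village level, and compares the two groups with a t-test. This is a simpler stand-in for the clustered regression used above.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): 200 villages, 50 households each
n_villages <- 200
hh_per_village <- 50
village <- rep(seq_len(n_villages), each = hh_per_village)
promoted_village <- sample(rep(c(0, 1), n_villages / 2))  # random assignment
n_hh <- length(village)

sim <- data.frame(
  village = village,
  promotion = promoted_village[village],
  # Village-level shock plus household noise; unrelated to promotion by design
  baseline_expenditure = 17 + rnorm(n_villages)[village] + rnorm(n_hh, sd = 5),
  hhsize = 5 + rnorm(n_hh)
)

# Collapse to village means so the unit of analysis matches the unit of randomization
village_means <- aggregate(cbind(baseline_expenditure, hhsize) ~ village + promotion,
                           data = sim, FUN = mean)

# Difference in village means and t-test p-value for one characteristic
compare_groups <- function(v) {
  promoted <- village_means[village_means$promotion == 1, v]
  control <- village_means[village_means$promotion == 0, v]
  c(difference = mean(promoted) - mean(control),
    p_value = t.test(promoted, control)$p.value)
}

balance <- sapply(c("baseline_expenditure", "hhsize"), compare_groups)
round(balance, 3)
```

With random assignment, the differences should be small relative to their sampling variability. A large, systematic imbalance across many characteristics would be a reason to question the randomization.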

Estimate the difference in health expenditures by assignment to promotion in the post-treatment period.

m_post_health <- lm_robust(health_expenditures ~ promotion_locality,
                           clusters = locality_identifier,
                           data = df %>% filter(round == 1))  # follow-up round only
htmlreg(m_post_health, doctype = FALSE,
        custom.coef.map = list(`promotion_locality` = "Promotion Locality",
                               `(Intercept)` = "Intercept"),
        caption = "Randomized Promotion Comparison of Mean Household Expenditures at Follow-Up",
        custom.model.names = "Household Expenditures at Follow-Up")
Randomized Promotion Comparison of Mean Household Expenditures at Follow-Up

| | Household Expenditures at Follow-Up |
|---|---|
| Promotion Locality | -3.87* [-5.14; -2.61] |
| Intercept | 18.85* [17.87; 19.82] |
| R² | 0.03 |
| Adj. R² | 0.03 |
| Num. obs. | 9914 |
| RMSE | 11.73 |
| N Clusters | 200 |

Note: * indicates that 0 is outside the confidence interval.

Using this health expenditure estimate and the estimated proportion of “compliers”, estimate the LATE/CACE.

After two years of program implementation, you observe that the average health expenditure in the promoted villages is 14.97 USD, compared with 18.85 USD in nonpromoted villages (a difference of -3.87 USD). Because promotion was assigned at random, the only systematic difference between the promoted and nonpromoted villages is that enrollment in the program is higher in the promoted villages (thanks to the promotion). The difference of -3.87 USD in health expenditures must therefore be due to the additional 41 percent of households (0.41 as a proportion) that enrolled in the promoted villages because of the promotion. To find the impact of the program on these Enroll-if-promoted households, we divide the intention-to-treat estimate (the straight difference between the promoted and nonpromoted groups) by the proportion of Enroll-if-promoted households: -3.87 / 0.41 = -9.50 when computed from the unrounded coefficients (the rounded figures shown here give about -9.44).
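This ratio is commonly called the Wald estimator. As a sketch in notation that is not used elsewhere in this chapter (so the symbols are assumptions), let $Z$ indicate assignment to promotion, $D$ enrollment in HISP, and $Y$ health expenditures at follow-up:

```latex
\[
\widehat{\mathrm{LATE}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
  = \frac{\text{intention-to-treat effect on expenditures}}
         {\text{effect of promotion on enrollment}}
\]
```

The numerator is the promotion coefficient from the follow-up expenditure regression, and the denominator is the promotion coefficient from the enrollment regression.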

Your colleague, an econometrician who suggests using the randomized promotion as an IV, then estimates the impact of the program through a two-stage least-squares procedure.

Conduct this estimation with and without covariate adjustment. Interpret.

m_cace <- iv_robust(health_expenditures ~ enrolled_rp |  # outcome ~ treatment | instrument
                      promotion_locality,
                    clusters = locality_identifier,
                    data = df %>% filter(round == 1))
m_cace_wcov <- iv_robust(health_expenditures ~ enrolled_rp +
                           age_hh + age_sp + educ_hh + educ_sp +
                           female_hh + indigenous + hhsize + dirtfloor +
                           bathroom + land + hospital_distance |  # covariates are repeated after the bar
                           promotion_locality +
                           age_hh + age_sp + educ_hh + educ_sp +
                           female_hh + indigenous + hhsize + dirtfloor +
                           bathroom + land + hospital_distance,
                         clusters = locality_identifier,
                         data = df %>% filter(round == 1))
htmlreg(list(m_cace, m_cace_wcov), doctype = FALSE,
        custom.coef.map = list(`enrolled_rp` = "Enrollment",
                               `(Intercept)` = "Intercept"),
        custom.model.names = c("No Covariate Adjustment", "With Covariate Adjustment"),
        caption = "Evaluating HISP: Randomized Promotion as an Instrumental Variable")
Evaluating HISP: Randomized Promotion as an Instrumental Variable

| | No Covariate Adjustment | With Covariate Adjustment |
|---|---|---|
| Enrollment | -9.50* [-11.76; -7.24] | -9.74* [-11.63; -7.86] |
| Intercept | 19.65* [18.70; 20.59] | 29.17* [27.49; 30.85] |
| R² | 0.22 | 0.41 |
| Adj. R² | 0.22 | 0.40 |
| Num. obs. | 9914 | 9914 |
| RMSE | 10.49 | 9.17 |
| N Clusters | 200 | 200 |

Note: * indicates that the null hypothesis value is outside the confidence interval.
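With a single binary instrument and no covariates, the two-stage least-squares coefficient equals the ratio of the two intention-to-treat differences, which is why the unadjusted estimate matches the ratio computed by hand. The sketch below illustrates this on simulated data using base R only. The data-generating process, the compliance shares, and the true effect of -10 are all hypothetical and are not taken from the HISP data.

```r
set.seed(42)
n <- 10000

# Randomized promotion (the instrument)
z <- rbinom(n, 1, 0.5)

# Hypothetical compliance types: "always" enroll regardless of promotion,
# "complier" enroll only if promoted, "never" do not enroll
type <- sample(c("always", "complier", "never"), n,
               replace = TRUE, prob = c(0.1, 0.4, 0.5))
d <- as.numeric(type == "always" | (type == "complier" & z == 1))

# Outcome with a hypothetical true program effect of -10
y <- 19 - 10 * d + rnorm(n, sd = 5)

# Ratio of the two differences in means (Wald estimator)
itt_y <- mean(y[z == 1]) - mean(y[z == 0])  # effect of promotion on the outcome
itt_d <- mean(d[z == 1]) - mean(d[z == 0])  # effect of promotion on enrollment
wald <- itt_y / itt_d

# Manual two-stage least squares: regress enrollment on the instrument,
# then regress the outcome on the fitted enrollment values
first_stage <- lm(d ~ z)
d_hat <- fitted(first_stage)
second_stage <- lm(y ~ d_hat)
tsls <- unname(coef(second_stage)["d_hat"])

c(wald = wald, tsls = tsls)
```

The two point estimates agree up to numerical precision. The standard errors reported by the manual second stage are not valid, because they ignore that d_hat is itself estimated, which is one reason to use a dedicated routine such as iv_robust in practice.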

What are the key conditions required to accept the results from the randomized promotion evaluation of HISP?

There are three basic assumptions required to accept the results from the randomized promotion evaluation of HISP. First, the promoted and nonpromoted villages have the same characteristics before HISP is introduced. This assumption holds because promotion was randomly assigned at the village level, and it can be verified by comparing the baseline data from both groups. Second, the promotion is effective in encouraging households to enroll in HISP. This assumption can be verified by checking that the promoted villages have substantially higher enrollment in HISP than the nonpromoted villages. Third, the promotion itself does not directly affect health expenditures; it affects them only through enrollment. This assumption usually cannot be verified, but it is informed by theory and experience.

Based on these results, should HISP be scaled up nationally?

Based strictly on the covariate-adjusted two-stage least-squares estimate, HISP should not be scaled up nationally: it decreased health expenditures by 9.74 USD, which falls short of the government-determined threshold of 10 USD. However, the estimate of 9.74 USD is very close to 10 USD, and its confidence interval (a reduction of between 7.86 and 11.63 USD) includes the threshold, so the estimate is not statistically different from 10 USD. You might therefore still argue that HISP should be expanded nationally.
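The comparison with the threshold can be made explicit. The sketch below backs out an approximate standard error from the reported confidence interval and tests the covariate-adjusted estimate against a reduction of 10 USD. It assumes that the interval is a 95 percent interval and that a normal approximation is adequate. Neither assumption is stated in the table, so treat the result as a rough check, not an exact test.

```r
# Covariate-adjusted IV estimate and confidence interval from the table
estimate <- -9.74
ci_lower <- -11.63
ci_upper <- -7.86
threshold <- -10  # a 10 USD reduction in health expenditures

# Approximate standard error, ASSUMING a 95 percent normal-approximation interval
se_approx <- (ci_upper - ci_lower) / (2 * qnorm(0.975))

# Test of the null hypothesis that the true effect equals the threshold
z_stat <- (estimate - threshold) / se_approx
p_value <- 2 * pnorm(-abs(z_stat))

# Simple check: does the interval contain the threshold?
threshold_in_ci <- threshold >= ci_lower && threshold <= ci_upper

round(c(se = se_approx, z = z_stat, p = p_value), 2)
threshold_in_ci
```

A large p-value here means that the data cannot distinguish the estimated effect from a 10 USD reduction. It does not show that the program meets the threshold.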