Machine Learning and Causal Inference

These are the Jupyter Notebook scripts that my fellow students Eljaer Eusebio, Andre Tapia, Luis Sandoval and I replicated for our course on Machine Learning and Causal Inference.

Replication of “Estimating Treatment Effects with Causal Forests: An Application” - Athey S. & Wager S.

Here I replicate the results of the article ‘Estimating Treatment Effects with Causal Forests: An Application’ by Athey and Wager.

Script available for R here

Causal Trees

We used an abridged version of a public dataset from the General Social Survey (GSS) (Smith, 2016). The setting is a randomized controlled trial. Individuals were asked about their thoughts on government spending on the social safety net. The treatment is the wording of the question: about half of the individuals were asked if they thought the government spends too much on “welfare” $(W_i = 1)$, while the remaining half were asked about “assistance to the poor” $(W_i = 0)$. The outcome is binary, with $Y_i = 1$ corresponding to a positive answer. We applied honest causal tree estimation using the Pennsylvania re-employment bonus experiment data.
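To illustrate what "honest" estimation means, here is a minimal sketch and not the code from our notebooks. It assumes a single hypothetical covariate `x`, synthetic data in place of the survey or experiment data, and a tree with only one split. One half of the sample chooses the split; the other half estimates the treatment effect in each leaf as a difference in means, which is valid because $W_i$ is randomized. The scoring rule in `choose_split` is a simple stand-in for the criterion a real causal tree uses.

```python
import numpy as np

def diff_in_means(y, w):
    """Treated-minus-control mean outcome (an effect estimate under randomization)."""
    return y[w == 1].mean() - y[w == 0].mean()

def choose_split(x, y, w, candidates):
    """Pick the threshold on x with the largest size-weighted squared effect gap."""
    best_t, best_score = None, -np.inf
    for t in candidates:
        left = x <= t
        right = ~left
        gap = diff_in_means(y[left], w[left]) - diff_in_means(y[right], w[right])
        score = left.sum() * right.sum() * gap ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Synthetic data (an assumption for illustration, not the real data).
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)                 # hypothetical covariate
w = rng.integers(0, 2, n)                # randomized binary treatment
tau = np.where(x > 0.5, 0.4, 0.1)        # true effect differs across x
y = rng.binomial(1, 0.3 + tau * w)       # binary outcome

# Honest sample split: one half finds the split, the other estimates effects.
idx = rng.permutation(n)
split_idx, est_idx = idx[: n // 2], idx[n // 2:]

candidates = np.quantile(x[split_idx], np.linspace(0.1, 0.9, 17))
threshold = choose_split(x[split_idx], y[split_idx], w[split_idx], candidates)

# Leaf effects are computed only on the estimation half.
x_est, y_est, w_est = x[est_idx], y[est_idx], w[est_idx]
left = x_est <= threshold
tau_left = diff_in_means(y_est[left], w_est[left])
tau_right = diff_in_means(y_est[~left], w_est[~left])

print(f"split at x <= {threshold:.3f}")
print(f"estimated effect, left leaf:  {tau_left:.3f}")
print(f"estimated effect, right leaf: {tau_right:.3f}")
```

Because the estimation half plays no role in choosing the split, the leaf estimates are not biased by the search over thresholds.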

Script available for R here

Bootstrapping using unemployment databases

The bootstrap can be used to estimate the standard error of a coefficient from a linear regression fit, or a confidence interval for that coefficient. The power of the bootstrap lies in the fact that it can be easily applied to a wide range of statistical learning methods. In this replication, we use the Pennsylvania re-employment bonus experiment to compare treatment group 4 with the control group. For this purpose, we compute standard errors from 1,000 bootstrap estimates of the coefficients on the $T4$, $female$ and $black$ variables.
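The procedure can be sketched as follows. This is a minimal illustration, not the code from our notebooks. It uses synthetic data in place of the Pennsylvania data, and the outcome model and the helper names `ols_coefficients` and `bootstrap_se` are assumptions made for the example. Each iteration resamples rows with replacement, refits the regression, and stores the coefficients; the bootstrap standard error is the standard deviation of those stored coefficients.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least-squares coefficients for y ~ X (X already has an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bootstrap_se(X, y, n_boot=1000, seed=0):
    """Standard deviation of the OLS coefficients across n_boot row resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # draw n rows with replacement
        estimates[b] = ols_coefficients(X[idx], y[idx])
    return estimates.std(axis=0, ddof=1)

# Synthetic stand-in data (an assumption, NOT the Pennsylvania experiment data).
rng = np.random.default_rng(42)
n = 500
T4 = rng.integers(0, 2, size=n)
female = rng.integers(0, 2, size=n)
black = rng.integers(0, 2, size=n)
y = 2.0 - 0.1 * T4 + 0.05 * female - 0.05 * black + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), T4, female, black])
se = bootstrap_se(X, y, n_boot=1000, seed=0)

for name, s in zip(["intercept", "T4", "female", "black"], se):
    print(f"{name}: bootstrap SE = {s:.4f}")
```

A percentile confidence interval could be obtained from the same resampled coefficients by taking their 2.5% and 97.5% quantiles instead of their standard deviation.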