CAUSAL DATA SCIENCE
DAGs and Control Variables
How to select control variables for causal inference using Directed Acyclic Graphs
When analyzing causal relationships, it is often hard to know which variables to condition the analysis on, i.e. how to “split” the data so that we are comparing apples to apples. For example, if you want to understand the effect of having a tablet on a student’s performance, it makes sense to compare schools where students have similar socio-economic backgrounds. Otherwise, the risk is that only wealthier students can afford a tablet and, without controlling for socio-economic background, we might attribute to tablets an effect that is actually due to that background.
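To make the tablet example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variable names (`income`, `tablet`, `score`), the linear data-generating process, and the true tablet effect of 1.0 are invented, not taken from real data. It compares a naive difference in means with a regression that controls for socio-economic background.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (all numbers are made up).
income = rng.normal(size=n)  # proxy for socio-economic background
# Wealthier students are more likely to own a tablet.
tablet = (income + rng.normal(size=n) > 0).astype(float)
true_effect = 1.0
# Scores depend on both the tablet and the background.
score = true_effect * tablet + 2.0 * income + rng.normal(size=n)

# Naive comparison: tablet owners vs. non-owners, ignoring background.
naive = score[tablet == 1].mean() - score[tablet == 0].mean()

# Regression of score on tablet, controlling for income.
X = np.column_stack([np.ones(n), tablet, income])
coefs, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted = coefs[1]  # coefficient on the tablet column

print(f"true effect:        {true_effect:.2f}")
print(f"naive difference:   {naive:.2f}")
print(f"controlling income: {adjusted:.2f}")
```

In this simulated setting, the naive difference overstates the tablet effect because it also picks up the effect of income, while the adjusted estimate lands close to the true value. Controlling works here only because the simulation was built so that income is the sole confounder and its effect is linear.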
When the treatment of interest comes from a proper randomized experiment, we do not need to worry about conditioning on other variables. If tablets are distributed randomly across schools, and we have enough schools in the experiment, the socio-economic background of students is, on average, balanced between schools with and without tablets. The only advantage of conditioning the analysis on some so-called “control variable” could be an increase in statistical power. However, this is a different story.
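The randomized case can be sketched with a similar simulation. Again, the names, the linear data-generating process, and the true effect of 1.0 are illustrative assumptions, and `ols_coef_and_se` is a hypothetical helper written for this sketch (it uses classical homoskedastic standard errors). Here tablets are assigned by coin flip, so the simple comparison is already unbiased and the control variable only tightens the estimate.

```python
import numpy as np

def ols_coef_and_se(X, y):
    """Return the OLS coefficient and classical standard error for column 1 of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # covariance of beta
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (all numbers are made up).
income = rng.normal(size=n)
tablet = rng.integers(0, 2, size=n).astype(float)  # coin flip, independent of income
score = 1.0 * tablet + 2.0 * income + rng.normal(size=n)

ones = np.ones(n)
est_simple, se_simple = ols_coef_and_se(np.column_stack([ones, tablet]), score)
est_ctrl, se_ctrl = ols_coef_and_se(np.column_stack([ones, tablet, income]), score)

print(f"without control: {est_simple:.2f} (se {se_simple:.3f})")
print(f"with control:    {est_ctrl:.2f} (se {se_ctrl:.3f})")
```

Both estimates should sit near the true effect, but the one that includes `income` has a smaller standard error, which is the increase in power mentioned above.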
In this post, we are going to have a brief introduction to Directed Acyclic Graphs and how they can be useful to select variables to condition a…

