Sitemap
TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Member-only story

DAGs and Control Variables

How to select control variables for causal inference using Directed Acyclic Graphs

7 min readApr 25, 2022

--

Press enter or click to view image in full size
All images by Author, generated with mermaid-js

When analyzing causal relationships, it is very hard to understand which variables to condition the analysis on, i.e. how to “split” the data so that we are comparing apples to apples. For example, if you want to understand the effect of having a tablet on a student’s performance, it makes sense to compare schools where students have similar socio-economic backgrounds. Otherwise, the risk is that only wealthier students can afford a tablet and, without controlling for it, we might attribute the effect to tablets instead of to the socio-economic background.

When the treatment of interest comes from a proper randomized experiment, we do not need to worry about conditioning on other variables. If tablets are distributed randomly across schools, and we have enough schools in the experiment, we do not have to worry about the socio-economic background of students. The only advantage of conditioning the analysis on some so-called “control variable” could be an increase in power. However, this is a different story.

In this post, we are going to have a brief introduction to Directed Acyclic Graphs and how they can be useful to select variables to condition a…

--

--

TDS Archive
TDS Archive

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.