Adventures with Correlation and Causation

One of the first things that I learned for this project was that correlation does not imply causation. While it is easy to criticize other people's misrepresentations of causation, it is much trickier to apply the concept myself! This week, I was composing a research proposal and struggling to design an experiment that actually tests causation. My first iterations would have revealed only correlations. After working with a research professor to redesign the proposed experiment, I added a qualitative test to determine the effect of the independent variable on the dependent variable. I hope this change will reveal causation if it exists. The experience taught me what a slippery concept causation is!

To improve my understanding, I revisited one of the books that our whole team read to build our data literacy. Naked Statistics by Charles Wheelan covers basic statistics with real-world examples. Wheelan offers a clear explanation of the difference between correlation and causation:

…a positive or negative association between two variables does not necessarily mean that a change in one of the variables is causing a change in the other. For example, I alluded earlier to a likely positive correlation between a student’s SAT scores and the number of televisions that his family owns. This does not mean that overeager parents can boost their children’s test scores by buying an extra five televisions for the house. Nor does it likely mean that watching lots of television is good for academic achievement.

The most logical explanation for such a correlation would be that highly educated parents can afford a lot of televisions and tend to have children who test better than average. Both the televisions and the test scores are likely caused by a third variable, which is parental education. I can’t prove the correlation between TVs in the home and SAT scores. (The College Board does not provide such data.) However, I can prove that students in wealthy families have higher mean SAT scores than students in less wealthy families. (p. 63)

This illuminating passage helped me grasp the distinction between correlation and causation. Televisions may well be correlated with higher test scores, but that does not mean they cause them. Digging deeper reveals other variables, such as parental education and family wealth, that plausibly drive both the number of televisions and the test scores.
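To convince myself of this, it helps to play with made-up numbers. The following is a minimal Python sketch, not anything from Wheelan's book: the data is entirely simulated, and the variable names, the sample size, and the assumption that education influences both televisions and scores with equal strength are my own illustrative choices. In the simulation, televisions never affect scores, yet the two end up correlated because both depend on parental education. A partial correlation, one common way to hold a third variable fixed, should then shrink toward zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after accounting for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def simulate(n=5000, seed=0):
    """Simulated families (hypothetical data, standardized units).

    Parental education drives both televisions and SAT scores.
    Televisions have no direct effect on scores by construction.
    """
    rng = random.Random(seed)
    education = [rng.gauss(0, 1) for _ in range(n)]
    televisions = [e + rng.gauss(0, 1) for e in education]
    sat_scores = [e + rng.gauss(0, 1) for e in education]
    return education, televisions, sat_scores


education, televisions, sat_scores = simulate()
raw = pearson(televisions, sat_scores)
adjusted = partial_correlation(televisions, sat_scores, education)

print(f"TVs vs. SAT, raw correlation: {raw:.2f}")
print(f"TVs vs. SAT, holding education fixed: {adjusted:.2f}")
```

The raw correlation comes out clearly positive, while the adjusted one sits close to zero. That only works here because I built the data and know the third variable; in a real study the hidden variable may be unknown or unmeasured, which is part of why a designed experiment is so valuable.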

By applying these concepts in my own proposal and then returning to a trusted resource, I now have a much deeper understanding of correlation and causation!

Source: Wheelan, Charles. 2014. Naked Statistics: Stripping the Dread from the Data. New York: W.W. Norton.

Image: “Family watching television 1958” by Evert F. Baumgardner on Wikimedia Commons. Public Domain.