Correlation, causation and association studies

Correlation analysis is a quick and easy first-line technique for exploring large data sets. When you are searching for novel causal relationships, though, it is only the initial hypothesis generator: a promising correlation is the cue to move on to more sophisticated statistical or domain-specific analyses, or to targeted collection of more data.

If you ever need a quick and easy demonstration of why, have a look at Tyler Vigen's excellent website, Spurious Correlations. Over the last nine years there is a correlation of 0.97 between the number of people in the US who died by becoming tangled in their bedsheets and the total revenue generated by US skiing facilities.

Another example is the negative correlation of -0.93 between honey bees and convictions for cannabis possession.
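One plausible reason coincidences like these are easy to find is that any two quantities that drift steadily over time will correlate strongly, related or not. The sketch below is only an illustration, not anything taken from Spurious Correlations: the two series are made up (independent noise around two different upward trends), and the names `pearson`, `changes`, `series_a` and `series_b` are hypothetical. It compares the correlation of the raw values with the correlation of the step-to-step changes, which is one simple way of taking a steady trend out of the picture.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def changes(series):
    """Step-to-step differences, which remove a steady linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
steps = range(200)

# Two made-up series (an assumption for illustration only): each has its
# own upward trend plus noise that is independent of the other series.
series_a = [1.0 * t + rng.gauss(0, 5) for t in steps]
series_b = [0.5 * t + rng.gauss(0, 3) for t in steps]

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(f"correlation of raw levels:  {r_levels:.2f}")
print(f"correlation of the changes: {r_changes:.2f}")
```

With these settings the raw values should correlate very strongly, simply because both series rise over time, while the correlation of the changes should sit much closer to zero. A shared trend is only one source of spurious correlation, so a check like this narrows the field rather than settling the question.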

The last word goes to the excellent xkcd:


