Linear regression


This illustration shows three data sets, each in a different color, that all yield the same linear regression function, drawn as a red line. It demonstrates that data sets with very different characteristics can produce an identical regression line.

Strictly speaking, many statisticians would consider a linear regression model inappropriate for data that do not meet certain criteria, including an approximately linear relationship between two continuous variables. Here, only the blue data would be truly appropriate for a linear regression model.

The blue data are best represented by a true first-degree polynomial and therefore match the linear regression line. The yellow data are better represented by a power function with an exponent between zero and one. The pink data show almost no variance.

The idea that radically different data sets can share many statistical characteristics, including the same linear regression, was elegantly illustrated by the statistician Francis John Anscombe in a set of four data sets and graphs now called Anscombe’s quartet.
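
The claim that differently shaped data can share one regression line can be checked numerically. The following is a minimal sketch using made-up data, not the values plotted in the illustration and not the quartet itself. The names fit_line, residuals and fits are hypothetical. Each data set is built by adding a residual pattern to the line y = 3 + 0.5x, and each pattern is chosen to sum to zero and to be orthogonal to x, which is the condition under which an ordinary least-squares fit is left unchanged.

```python
# Hypothetical data for illustration only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
line = [3.0 + 0.5 * x for x in xs]  # shared line y = 3 + 0.5x

# Each residual pattern sums to zero and has zero dot product with xs,
# so adding it to the line does not change the least-squares fit.
residuals = {
    "exactly linear": [0.0, 0.0, 0.0, 0.0, 0.0],
    "curved": [2.0, -1.0, -2.0, -1.0, 2.0],
    "oscillating": [-1.0, 2.0, 0.0, -2.0, 1.0],
}


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit every data set and collect the results by name.
fits = {}
for name, res in residuals.items():
    ys = [y + r for y, r in zip(line, res)]
    fits[name] = fit_line(xs, ys)

for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.3f}")
```

All three fits return the same slope and intercept, even though only the first data set actually lies on a straight line. Plotting the data, or inspecting the residuals, is what reveals the difference.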