A statistical rival for Anscombe’s quartet


Quantitative and statistical models are extremely useful in investing. But they are guides, not gospel.

Forgetting that there are real consumers behind the sales numbers and real companies beneath the profit numbers is the first step towards an investment model that fails when you need it most. There is a reason you never see a bad investment backtest: if a quant analyst gets a bad result, they simply run the backtest again and again until they get a good one.
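To make that concrete, here is a minimal sketch in Python. It is not a real backtest: the function name, the 200 trials, the 252 trading days and the 1% daily volatility are all assumptions chosen for illustration. Every "strategy" is pure noise with no edge at all, yet the best of the bunch will usually look like a winner.

```python
import random


def backtest_random_strategy(rng, n_days=252, daily_vol=0.01):
    """Cumulative return of a hypothetical strategy with NO real edge.

    Daily returns are drawn from a zero-mean normal distribution,
    so any profit or loss is luck alone.
    """
    growth = 1.0
    for _ in range(n_days):
        growth *= 1.0 + rng.gauss(0.0, daily_vol)
    return growth - 1.0


# Fixed seed so the illustration is repeatable.
rng = random.Random(42)

# "Run the backtest again and again": 200 attempts, all equally skill-free.
results = [backtest_random_strategy(rng) for _ in range(200)]

average = sum(results) / len(results)
best = max(results)

print(f"average of all 200 backtests: {average:+.1%}")
print(f"best backtest (the one that gets shown): {best:+.1%}")
```

The average across all attempts is the honest answer. The best one is what ends up in the pitch deck.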

Anscombe’s quartet

For years I have kept a chart of Anscombe’s quartet as a reminder of how stats can mislead. Anscombe’s quartet is a set of four different datasets whose summary statistics are close to identical:

  • The average x value is 9 for each dataset
  • The average y value is 7.50 for each dataset
  • The variance of x is 11 and the variance of y is roughly 4.125 for each dataset
  • The correlation between x and y is 0.816 for each dataset
  • A linear regression (line of best fit) for each dataset follows the equation y = 0.5x + 3
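These figures are easy to check. The sketch below uses the values published in Anscombe's 1973 paper and plain Python. The helper name `summarise` is only illustrative, and the variance is assumed to be the sample variance (dividing by n - 1).

```python
# Anscombe's published data. Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Means, sample variances, correlation and least-squares line for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarise(x, y) for name, (x, y) in quartet.items()}

for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows should agree to about two decimal places, which is exactly the problem: nothing in the numbers hints at how different the datasets are.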

So, at face value, all four datasets look very similar. However, graph the data and the differences become obvious:


XKCD’s Curve Fitting Methods

Along the same lines, XKCD has a light-hearted look at curve-fitting models that many an investment analyst would be wise to keep near their desk. With many datasets, trends are in the eye of the beholder:


Damien Klassen is Head of Investments at the Macrobusiness Fund, which is powered by Nucleus Wealth.

The information on this blog contains general information and does not take into account your personal objectives, financial situation or needs. Past performance is not an indication of future performance. Damien Klassen is an authorised representative of Nucleus Wealth Management, a Corporate Authorised Representative of Integrity Private Wealth Pty Ltd, AFSL 436298.