A statistical rival for Anscombe’s quartet

Quantitative and statistical models are extremely useful in investing. But they are guides, not gospel.

Forgetting there are real consumers behind the sales numbers and real companies beneath the profit numbers is the first step to an investment model that is going to fail when you need it most. There is a reason you never see a bad investment backtest: if a quant analyst gets a bad result, they simply run the backtest again and again until they get a good one.
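A rough way to see why rerunning backtests is dangerous is to simulate it. The sketch below is purely illustrative and assumes things that are not in any real strategy: 200 hypothetical "strategies" whose daily returns are random noise with zero true edge, a year of 252 trading days, and made-up function names (annualised_sharpe, run_random_backtests). Even though none of them has any skill, the best of the bunch will usually look impressive.

```python
import math
import random
import statistics


def annualised_sharpe(daily_returns):
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * math.sqrt(252)


def run_random_backtests(n_strategies=200, n_days=252, seed=42):
    """Backtest strategies whose returns are pure noise (no real edge)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sharpes = []
    for _ in range(n_strategies):
        # Assumption: daily returns ~ Normal(0, 1%), i.e. zero expected return
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(annualised_sharpe(returns))
    return sharpes


sharpes = run_random_backtests()
print(f"average Sharpe across all runs: {statistics.fmean(sharpes):.2f}")
print(f"best Sharpe (the one that gets published): {max(sharpes):.2f}")
```

The average across all runs sits near zero, which is the honest answer. The maximum is what tends to end up in the pitch book.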

Anscombe’s quartet

For years I have kept a chart of Anscombe’s quartet as a reminder of how stats can mislead. Anscombe’s quartet is a set of four different data series whose summary statistics are all close to identical:

  • The average x value is 9 for each dataset
  • The average y value is 7.50 for each dataset
  • The variance for x is 11 and the variance for y is 4.12 for each dataset
  • The correlation between x and y is 0.816 for each dataset
  • A linear regression (line of best fit) for each dataset follows the equation y = 0.5x + 3
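These figures are easy to check for yourself. The sketch below uses only the Python standard library and the values from Anscombe’s published 1973 quartet; the helper name summarise and the dictionary layout are my own choices for illustration. It uses sample variance (dividing by n - 1), which is the convention that gives the 11 and 4.12 quoted above.

```python
import math
import statistics

# Anscombe's quartet: datasets I-III share the same x values
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the summary statistics listed above for one dataset."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    var_x = statistics.variance(xs)  # sample variance (n - 1)
    var_y = statistics.variance(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    slope = cov / var_x  # ordinary least squares line of best fit
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "corr": cov / math.sqrt(var_x * var_y),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarise(xs, ys)
    print(name, " ".join(f"{key}={value:.3f}" for key, value in s.items()))
```

The four printed rows agree to about two decimal places, which is exactly the trap: nothing in the numbers tells you the shapes are different.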

So, at face value, all four datasets are very similar. However, graph the data and the differences become obvious:

XKCD’s Curve Fitting Methods

Along the same lines, XKCD has a light-hearted look at curve fitting models that many an investment analyst would be wise to keep near their desk. With many data sets, trends are in the eye of the beholder:

[Image: XKCD, “Curve-Fitting”]

Damien Klassen is Head of Investments at the Macrobusiness Fund, which is powered by Nucleus Wealth.

The information on this blog contains general information and does not take into account your personal objectives, financial situation or needs. Past performance is not an indication of future performance. Damien Klassen is an authorised representative of Nucleus Wealth Management, a Corporate Authorised Representative of Integrity Private Wealth Pty Ltd, AFSL 436298.

Damien Klassen

Damien has a wealth of experience across international equities (Schroders) and asset allocation (Wilson HTM), and he helped create one of Australia’s largest independent research firms, Aegis Equities. He lectured for over a decade at the Securities Institute, Finsia and Kaplan and spent many of those years as the external Chair for the subject of Industrial Equity Analysis.

Comments

  1. “Essentially all models are wrong, but some are useful.” George Box – statistical scientist.

    One of the great deficiencies of statistical models used in economics and finance is the lack of a control and an inability to run an actual experiment in most cases. Trend analysis may become self-fulfilling, i.e. if the stock trend is ‘up’, that’s the path we assume it will follow, so we reinforce it by buying. If most believe that house prices always go up, they do. If they think they are crashing, they will.

    Statistical models are most useful in real science if they can predict the future from past data. Overwhelmingly I believe that sentiment and group psychology are far more important in economics. If statistics worked so well there would be no need for the bullshit industries of marketing, advertising and public relations and governments would manage economies using data analysis. That industry spends billions on bullshit tells you everything you need to know. Greed is fuelled by hubris until the Minsky moment arrives. Only retrospectively will the model be fitted and the market declared to be ‘right’.

  2. The Traveling Wilbur

    LOL. Keep publishing stuff like this and I’ll be out of a job with busted myths!

    I’m stunned there’s no ’18 months from now’ fit though…

    Maybe the y-axis isn’t long enough?

  3. I suspect you’re stuck in a bit of a millennial era time warp if even caring about the nature of the curve is part of your modelling method.
    Time to upgrade your methods and break open a book or two on Statistical modelling.
    Ideas you’ll need to understand include:
    Particle filtering
    Stochastic modelling
    Kalman filters (but whatever you do don’t forget the chapter on Extended and Unscented Kalman filters)
    Understand what others understand about VAR and most importantly why they’re probably wrong

    Finally understand the limits of any modelling methodology and the reason why you should simply sit out some of the dances.
    Hint: good models tell you when it’s time to pack up shop and take the family on a world cruise (as much as your clients bitch, they’ll thank you in the end).

    • There are gunna be lots of disappointed greenies.
      Between Gladstone and Rockhampton there were end-to-end coal trains going south
      and endless empty coal trains going north.
      I have never seen the line so busy.