Thursday, 15 July 2010


Anthony Goldbloom of the data prediction website Kaggle sent me this piece on the competition they ran, inviting statisticians to take on the quants in predicting the World Cup.

In the lead-up to the World Cup, Kaggle invited statisticians and data miners to take on the big investment banks in predicting the outcome of the World Cup. Now that the final has been decided, we can take a look at how Kagglers stacked up against the quants at JP Morgan, Goldman Sachs, UBS and Danske Bank in forecasting the World Cup.

In total, 65 teams participated in the Take on the Quants challenge. JP Morgan finished 28th, Goldman Sachs 33rd, UBS 55th and Danske Bank 64th. The betting markets fared better, finishing 16th.

The winner of the competition was Thomas Mahony, an Australian economist. His approach relied on Elo ratings with an adjustment for home country/continent advantage. His strategy correctly tipped Spain to win, the Netherlands to finish second and Germany to finish in the top four. The investment banks all had their top picks bow out early (UBS, Goldman Sachs and Danske Bank picked Brazil and JP Morgan picked England), hurting their overall performance.
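For readers who haven't met Elo ratings before, here is a minimal sketch of how an Elo expectation and update with a home advantage might look. It is not Mahony's model: it assumes the 400-point logistic scale used by the World Football Elo Ratings, a hypothetical `home_bonus_a` parameter standing in for his home country/continent adjustment, and an assumed weight `k=60`. Goal-difference multipliers are left out.

```python
def expected_score(rating_a: float, rating_b: float, home_bonus_a: float = 0.0) -> float:
    """Win expectancy for team A against team B on the Elo 400-point scale.

    home_bonus_a is an assumed rating bonus for A (e.g. playing at home or
    on its own continent); pass a negative value if B has the advantage.
    """
    diff = rating_a + home_bonus_a - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def update(rating_a: float, rating_b: float, result_a: float,
           k: float = 60.0, home_bonus_a: float = 0.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one match.

    result_a is 1 for an A win, 0.5 for a draw, 0 for an A loss.
    k=60 is an assumed weight; goal-difference multipliers are omitted.
    """
    exp_a = expected_score(rating_a, rating_b, home_bonus_a)
    delta = k * (result_a - exp_a)
    # Points gained by one side are exactly the points lost by the other.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(expected_score(1800, 1800))       # evenly matched, neutral venue
    print(expected_score(1800, 1800, 100))  # same teams, A given a 100-point bonus
    print(update(1800.0, 1800.0, 1.0))      # A wins a match between equals
```

With equal ratings each side expects 0.5; a 100-point bonus lifts the favoured side to roughly 0.64, and a win between equals moves 30 points from loser to winner at `k=60`.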

The next big question is whether Kagglers can also outperform the quants in forecasting financial markets (we won’t have to wait long to find out, as Kaggle is currently hosting a competition to predict stock price movements).
I did a little poking around, and for those of us who have faith in Elo ratings, it was good to see not only that the competition winner used them, but also this:
When statisticians entered Kaggle’s World Cup forecasting competition, they had the option to give a brief outline of their methods. A glance at these descriptions tells us what ingredient statisticians think is most important in predicting the World Cup winner. The variable that appears in most statistical models isn’t FIFA ranking, betting prices or the aggregate salary of a team’s players. It is the Elo rating. 
With the new football season just around the corner, I have updated my spreadsheet so that superiority will no longer be rounded to a whole number, but to the nearest 0.25 of a goal. I'll wait until September before using the ratings for real, to give the summer transfers and promotions/relegations time to settle a little; I shall also be away from my computer until the middle of the month anyway.
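As a small illustration of the rounding change, the sketch below rounds a superiority figure to the nearest multiple of a step, with `step=1.0` reproducing the old whole-goal behaviour and `step=0.25` the new one. How the spreadsheet derives superiority from the ratings isn't described here, so the input is just an assumed goal figure, and the half-away-from-zero tie rule is my assumption rather than the spreadsheet's.

```python
import math


def round_to_step(goals: float, step: float = 0.25) -> float:
    """Round a goal superiority to the nearest multiple of `step`.

    Halves round away from zero (Python's built-in round() would use
    banker's rounding instead), so home and away favourites are treated
    symmetrically. The tie rule is an assumption.
    """
    steps = math.floor(abs(goals) / step + 0.5)
    value = steps * step
    # Restore the sign, but return a plain 0.0 rather than -0.0.
    return math.copysign(value, goals) if value else 0.0


if __name__ == "__main__":
    for sup in (1.3, 0.6, -0.6):
        print(sup, round_to_step(sup, 1.0), round_to_step(sup))
```

A superiority of 0.6 goals, for example, used to be rounded up to a full goal but now stays at 0.5.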


Anonymous said...

A thought:

When trumpeting ELO and the fact most of these statistical models use ELO, have a think:

If ELO is so widely used then the market-makers (vague term I know, no time to be more specific) will of course be aware of it.

If ELO is good then the market-makers will use it.

If that is the case then you have no edge from using ELO - you need a different approach.

Or, you need to use an ELO rating system which is superior to those used by others.

Typed this in a bit of a rush, so it may not quite make sense; feel free to pull it apart.

John said...

Hi Cassini,

I've recently moved my blog to my own host. Would you mind updating the link so that it shows in your sidebar when I update a post?