Tuesday, 24 May 2011

I-O, Compare, Win


Imagine a game played with two coins. One head / one tail is a home win. Two heads is a draw, and two tails is an away win. To price up a match precisely would be rather easy, but when it comes to football, it’s a little trickier.

As I mentioned a couple of posts ago, mano said in a comment:

You mentioned in an earlier post that you have configured your Elo spreadsheet so that it can give you a goal expectation for each team in a forthcoming game. I maintain Elo ratings myself for a few leagues, and was wondering how you managed this. I don't expect you necessarily to reveal your workings and endanger your edge, but would you be able at all to point me at least in the right direction?

When I first started rating teams, I thought I would be able to plug in the two numbers and have a magic formula spit out some prices for me, but that didn't work.

What did prove to be more successful was taking the next step: calculating the result needed to maintain a team's rating. This is the strategy used to generate the Strong Draw picks that have proved so successful. When it comes to generating prices, though, it finally dawned on me that the starting point has to be the number of goals each team is expected to score.

Once you have these two numbers, it is relatively easy to use Excel and the POISSON function to generate the probability that each team will score exactly 0, 1, 2, 3… goals. And from there it’s simple enough to calculate the probability of any correct score, match odds or over / under outcome. If both teams have a probability of 0.25 of scoring zero goals, then the 0-0 has a probability of 0.25 × 0.25 = 0.0625, and we can price it up at 1 / 0.0625 = 16.0. Repeat this exercise for all the Under 2.5 results, add the probabilities together, and you have a price for that market. Add up all the home win results and you have the Match Odds home price, and so on.
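For anyone working outside Excel, the same steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the spreadsheet itself: the function names, the cut-off of 10 goals per team and the assumption that the two teams' goal counts are independent are all choices made for the example.

```python
from math import exp, factorial, log

def poisson_pmf(k, mean):
    """Probability of exactly k goals; the same value as Excel's POISSON(k, mean, FALSE)."""
    return exp(-mean) * mean ** k / factorial(k)

def score_matrix(home_mean, away_mean, max_goals=10):
    """matrix[i][j] = P(home scores i and away scores j), assuming independence.
    max_goals is an arbitrary cut-off chosen for this sketch."""
    home = [poisson_pmf(i, home_mean) for i in range(max_goals + 1)]
    away = [poisson_pmf(j, away_mean) for j in range(max_goals + 1)]
    return [[h * a for a in away] for h in home]

def market_probs(matrix):
    """Add up correct-score probabilities into Match Odds and Under 2.5."""
    cells = [(i, j, p) for i, row in enumerate(matrix) for j, p in enumerate(row)]
    return {
        "home": sum(p for i, j, p in cells if i > j),
        "draw": sum(p for i, j, p in cells if i == j),
        "away": sum(p for i, j, p in cells if i < j),
        "under_2.5": sum(p for i, j, p in cells if i + j <= 2),
    }

def fair_price(prob):
    """Decimal odds with no margin."""
    return 1.0 / prob

# The 0-0 example from the text: a 0.25 chance of not scoring
# corresponds to a goal expectancy of ln(4), roughly 1.39.
mean = log(4)
matrix = score_matrix(mean, mean)
probs = market_probs(matrix)
zero_zero_price = fair_price(matrix[0][0])  # approximately 16.0
```

Calling `fair_price` on each entry of `probs` gives the Match Odds and Under 2.5 prices; the Over 2.5 probability is simply one minus the Under 2.5 figure.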

It’s easier said than done of course, because you first have to come up with the number of goals a team might be expected to score. I use a team's previous games, both home and away, with any 'extreme' results smoothed out.

As might be expected, recent results carry the most weight, but more than just the result goes into generating this number. I look at the shots, shots-on-goal, and corners numbers too, as well as the relative strength of the opponents in each game. This is where the Elo ratings come in. Since each team has a rating, a ‘performance’ worth 1.5 versus a bottom-of-the-table team might be worth 2.0 versus a highly rated team. A goal at Manchester United is given more value than a goal at home to Blackpool, and a 4-0 win is significantly devalued if the team ‘lost’ on shots, shots-on-goal and corners.

The end result of all these inputs is one number per team per game – the estimated number of goals that team will score against their opposition. Again, this number will vary depending on the strength of the opposition for obvious reasons.
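The exact weightings are not given here, so the following is only a hypothetical sketch of the general shape of such a calculation: recent games weighted more heavily, extreme results capped, and each performance scaled by the opponent's rating. Every parameter (`decay`, `cap`, `avg_rating`, `scale`) and the exponential form of the strength adjustment are assumptions invented for the example, and `performance` stands in for whatever blend of goals, shots and corners is used.

```python
from math import exp

def goal_expectancy(games, decay=0.9, cap=3.0, avg_rating=1500.0, scale=1000.0):
    """Hypothetical recency-weighted, opposition-adjusted average.

    games: list of (performance, opponent_rating) tuples, most recent first.
    All defaults are illustrative guesses, not values from the spreadsheet.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for age, (performance, opponent_rating) in enumerate(games):
        smoothed = min(performance, cap)  # damp 'extreme' results
        # Assumed adjustment: worth more against stronger opposition, less against weaker.
        strength = exp((opponent_rating - avg_rating) / scale)
        weight = decay ** age  # older games count for less
        weighted_sum += weight * smoothed * strength
        weight_total += weight
    return weighted_sum / weight_total

# Made-up example: three recent performances against opponents of varying strength.
recent = [(1.5, 1700.0), (4.0, 1350.0), (1.0, 1500.0)]
baseline = goal_expectancy(recent)
```

The figure returned is opposition-neutral; it would still need scaling for the strength of the forthcoming opponent before being fed into the Poisson step.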

The numbers are also time-sensitive because ratings are constantly changing.

In the same way that the ‘average’ UK family used to have an impossible 2.4 kids (now down to an equally impossible 1.9 I believe), the probable number of goals a team will score is unlikely to be an integer, but we can still use this fractional number as input to the POISSON function.

From Wikipedia:

The first model predicting outcomes of football matches between teams with different skills was proposed by Maher in 1982. According to his model, the goals, which the opponents score during the game, are drawn from the Poisson distribution.

While the Poisson distribution arguably doesn’t work well for ‘binary’ scores (0-0, 0-1, 1-0 and 1-1), by factoring in more than just goals scored, the model seems to work reasonably well. Besides, I remember Poisson from my Pure Mathematics With Statistics 'A' Level, whereas my recollection, knowledge and understanding of the Binomial distribution are pretty much non-existent. No, that's a lie. They are non-existent.

Early observations are that my draw prices do tend to come in on the high side, possibly the result of my model underestimating these ‘binary’ scores, which contain two of the most common draw results. A couple of statisticians, Mark J. Dixon and Stuart Coles (authors of the 1997 paper Modelling Association Football Scores and Inefficiencies in the Football Betting Market), introduced a correlation factor to compensate for these low scores, and I may have to reinvent something similar.
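As a rough illustration of what such a correction looks like, here is a sketch of the Dixon and Coles low-score adjustment applied to independent Poisson probabilities. Only the four ‘binary’ scores are touched, and the total probability still sums to one. The value of `rho` below is purely illustrative; in the paper it is estimated from match data, and nothing here reflects any figure used in the spreadsheet.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals given the expected number of goals."""
    return exp(-mean) * mean ** k / factorial(k)

def dixon_coles_tau(home_goals, away_goals, home_mean, away_mean, rho):
    """Multiplier for the four low scores; 1.0 for every other score.
    A negative rho raises 0-0 and 1-1 and lowers 1-0 and 0-1."""
    if home_goals == 0 and away_goals == 0:
        return 1.0 - home_mean * away_mean * rho
    if home_goals == 0 and away_goals == 1:
        return 1.0 + home_mean * rho
    if home_goals == 1 and away_goals == 0:
        return 1.0 + away_mean * rho
    if home_goals == 1 and away_goals == 1:
        return 1.0 - rho
    return 1.0

def adjusted_score_prob(home_goals, away_goals, home_mean, away_mean, rho):
    """Independent Poisson score probability with the low-score adjustment applied."""
    return (dixon_coles_tau(home_goals, away_goals, home_mean, away_mean, rho)
            * poisson_pmf(home_goals, home_mean)
            * poisson_pmf(away_goals, away_mean))

# Illustrative inputs only: goal expectancies of 1.4 and 1.1, and rho = -0.1.
plain_00 = adjusted_score_prob(0, 0, 1.4, 1.1, 0.0)
adjusted_00 = adjusted_score_prob(0, 0, 1.4, 1.1, -0.1)
```

With a negative `rho` the 0-0 and 1-1 probabilities rise, so the fair draw price shortens, which is the direction of adjustment the observation above calls for.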

The maths can get complicated, but if I am finding value in the 0-0, either backing or laying, in every game, then clearly an adjustment is needed, and I have no qualms about tweaking.

I’ve been saying for a while now that this spreadsheet is a work in progress, but I do feel that there is finally light at the end of the tunnel. Of course, it could be a train, and 31 games at the end of a season is a very small sample, but a profit on the Match Odds markets of 14.43 points from 26 markets where value was identified is a good start.

The Under / Over 2.5 selections were a little more, well, selective, generating 5.75 points from 14 qualifying matches.

Roll on 2011-12.

1 comment:

mano said...

Many thanks for this Cassini. It has given me some food for thought.

I have written a simple program, in Ruby, that, when given the goal expectancy for each team, spits out fair prices for the markets I like to get involved in. Of course, it is only as good as the inputs. Rubbish in, rubbish out, as the old saying goes.

I'm now working on making my goal expectancy estimations as reasonable as possible. I think your approach of coupling performance (goals + shots etc) with strength of opposition, derived from Elo ratings, is a sensible one.

Estimating goal expectancy appears to be the dark art. I guess the P/L sheet will be the best indicator of how good I am at it.