Saturday, 5 April 2014

Zero-Inflation Revisited

I had a couple of questions about Elo-based ratings. One I can no longer find, but fortunately I recall the gist of it; the second asked me to clarify how I handled the zero-inflation problem in this recent post.

To address the second one first, the comment from Alkalmazasok was this:

Hi Cassini! I'm a bit confused here... You say you track the goal expectancy against the actual number of goals scored. The zero-inflation comes in when you compute the probabilities for different number of goals (with the help of the goal exp.). So how does the comparison of goal expectancies and the actual results say anything about the zero-inflation?

Zero-inflation is a problem with the Poisson distribution when applied to football: essentially, it underestimates the likelihood of a team scoring zero goals and, to a lesser extent, of it scoring one goal. The input into the formula is the number of goals you expect a team to score, a number that will not usually be an integer, but something like 1.67341. When you plug this number into Poisson, it tells you what percentage of the time the team will score zero goals, one goal, two goals etc. The numbers for zero and one are too low, and need to be 'inflated' with a manual adjustment.
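
For anyone who wants to see the raw numbers, here is a minimal sketch of the plain (uninflated) Poisson calculation in Python. The function names `poisson_pmf` and `goal_probabilities` are my own for illustration, and lumping everything above five goals into one bucket is an arbitrary choice.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k goals given a goal expectancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def goal_probabilities(goal_expectancy: float, max_goals: int = 5) -> list:
    """Probabilities for 0..max_goals goals, plus one final bucket for more."""
    probs = [poisson_pmf(k, goal_expectancy) for k in range(max_goals + 1)]
    probs.append(1.0 - sum(probs))  # everything above max_goals
    return probs


# The non-integer expectancy used as an example in the text.
probs = goal_probabilities(1.67341)
for goals, p in enumerate(probs[:-1]):
    print(f"{goals} goals: {p:.1%}")
print(f"6+ goals: {probs[-1]:.1%}")
```

These are the unadjusted figures; the zero and one rows are the ones that tend to come out too low for football.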

Cassini Top 10
My solution to this, and there are most likely better ways out there, was to experiment a little and come up with an inflation formula that appeared to make the correct, or close to correct, adjustments. In simple numbers, if the goal expectancy is 1.5, Poisson tells me that 22.3% of the time the team will not score, and 33.5% of the time they will score one goal. If over a period of time my goal expectancy of 1.5 is on average correct, but the team scores zero goals 30% of the time and one goal 36% of the time, then I can adjust my inflation accordingly. If my average expectancy is 1.5 but over a large number of matches teams are averaging 3 goals, that's a problem with the expected number of goals, not with the inflation.
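
To make the 1.5 example concrete, here is a rough Python sketch of that kind of check. It is not my actual inflation formula: the helper names `inflation_factors` and `inflate` are hypothetical, the 100-match sample is made up to reproduce the 30% and 36% figures above, and simply scaling the zero and one probabilities and shrinking the rest is just one of several possible adjustments.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of exactly k goals given a goal expectancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def inflation_factors(observed_goals, goal_expectancy, scores=(0, 1)):
    """Observed frequency divided by the Poisson probability for each score."""
    counts = Counter(observed_goals)
    n = len(observed_goals)
    return {k: (counts[k] / n) / poisson_pmf(k, goal_expectancy) for k in scores}


def inflate(goal_expectancy, factors, max_goals=10):
    """Scale the chosen scores by their factors and shrink the remaining
    scores proportionally so the probabilities still sum to 1."""
    raw = [poisson_pmf(k, goal_expectancy) for k in range(max_goals + 1)]
    boosted = {k: raw[k] * f for k, f in factors.items()}
    remaining = 1.0 - sum(boosted.values())
    rest_total = sum(p for k, p in enumerate(raw) if k not in factors)
    return [boosted[k] if k in factors else raw[k] * remaining / rest_total
            for k in range(max_goals + 1)]


# Made-up sample: 100 matches where the expectancy was 1.5 and the
# average goals scored was also 1.5, but with 30 zeros and 36 ones.
observed = [0] * 30 + [1] * 36 + [2] * 9 + [3] * 12 + [4] * 7 + [5] * 4 + [6] * 2

factors = inflation_factors(observed, 1.5)
adjusted = inflate(1.5, factors)
print({k: round(f, 3) for k, f in factors.items()})
print([round(p, 3) for p in adjusted[:4]])
```

One caveat with this simple rescaling: it no longer guarantees that the adjusted probabilities average out to the original goal expectancy, so it is worth re-checking the implied average after inflating.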

Club Elo Top 10
The first question was how I determine the starting ratings for each team. Again, this is a best guess, although one good resource is the set of numbers already calculated by an established ratings site. Once you start applying your own bespoke adjustments to them, they will settle down soon enough. I started my own ratings about five or six years ago now, using league points along with the UEFA coefficients for each league as a starting point, and the rankings are not hugely dissimilar to those at ClubElo, although my scale appears to be a little more stretched out. Paris St Germain suffer because Ligue 1 is rated the lowest league, but as points also transfer between clubs as a result of Champions League and Europa League matches, a run in the Champions League will help them. Since the ratings are used for domestic league games, how Bayern Munich compare to, say, Manchester City is of academic interest only anyway.


Hejik S said...

There seems to have been much debate about Poisson on various forums of late and the danger would appear to be the perfectly natural beginner's obsession with building the perfect model.

Such a thing does not exist as you'll well be aware and 'best guess' is a reasonably fair assessment of what professional bettors are doing.

As long as your 'guess' is extremely well educated you have half a chance at the books who are also, apparently surprisingly to some, employing 'best guess' pricing models, not psychic witches with crystal balls.

For what it's worth, I think it's important that we question how efficiently Poisson fits to football and observation would suggest it's far from perfect.

However, even without that information it's easy enough to arrive at the same conclusion when we consider that some of the biggest bookmakers in the world are actively promoting it as a method of beating them. I'm not sure they'd be so inclined to do that were it even near as effective as they suggest.

Danny Murphy said...

Cardiff 0 Crystal Palace 3 was a Poisson buster alright!