Thursday, 27 March 2014

Zero-Inflation

As the carnage continues unabated in the FTL table (I'll update the numbers after tonight's matches complete the midweek schedule), I thought I'd address the topic of zero-inflation. It's a subject I mentioned in a post I wrote for Betting Expert back in 2012, and a few people have asked me about it since, including one recent request for help on Twitter. In my post, I wrote:
The intricacies of the Poisson distribution need not be fully understood for you to make use of it, because Microsoft’s Excel has a built-in Poisson function. Before we look at Poisson in action, it is important to know that several studies have found that the probability of draws is underestimated by Poisson.
The reason for this is that the probability of zero, and to a lesser extent 1, is under-estimated by Poisson, which is why an adjustment needs to be made to the output. A search of the Internet for ‘zero-inflation’ – a fancy term for increasing the probability of zero – will reveal a number of studies, and some ideas on how to apply this.
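For anyone working outside Excel, here is a minimal sketch of the unadjusted Poisson calculation the quote refers to. It assumes the two teams' goals are independent, and the names poisson_pmf and draw_probability and the max_goals cut-off are made up for illustration. poisson_pmf mirrors what Excel's POISSON function returns with the cumulative argument set to FALSE.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals given a goal expectancy of lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def draw_probability(home_exp, away_exp, max_goals=10):
    """Unadjusted draw probability: sum of P(home = k) * P(away = k).

    Assumes independent scores and ignores scorelines above max_goals,
    which carry negligible probability at typical football expectancies.
    """
    return sum(
        poisson_pmf(k, home_exp) * poisson_pmf(k, away_exp)
        for k in range(max_goals + 1)
    )

# Example with hypothetical expectancies of 1.4 (home) and 1.1 (away)
print(round(draw_probability(1.4, 1.1), 4))
```

This is the raw figure before any zero-inflation, so it is the number that the studies mentioned above suggest will come out too low.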
The zero-inflation in football is a challenge, and my attempts to solve the problem involved a certain amount of trial and error since there is, to my knowledge at least, no set formula for football models. Basically, I made an initial 'best guess' and then tracked my expected (predicted) scores against the actual scores. It's not a quick fix, but over a period of time (you probably need at least a season), you can see if your calculated goal expectancies are reasonably close to the actual results.

If they are not, then you adjust the inflation parameter accordingly. I track my goal expectancies in 0.1 increments, so over time an expectation of, say, 1.0 goals should see an actual average close to 1.0 goals. Some bands will be off by more than others; for example, my 0.7 band currently averages 0.702, which is excellent, but my 0.9 band averages 0.995, which is not so good (off by about 10.6%). If the actual averages are all lower or all higher than your expectancies, then you need to make an adjustment, but if some are higher and some lower, the difference is probably noise, particularly at the extreme expectancies, which will have relatively low sample sizes.
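The band tracking described above can be sketched roughly as follows. This is only an illustration: band_report is a hypothetical name, the input is assumed to be a list of (expected goals, actual goals) pairs, and rounding each expectancy to one decimal place is an assumption about how the 0.1 bands are formed.

```python
from collections import defaultdict

def band_report(records):
    """Group (expected, actual) pairs into 0.1 bands and compare averages.

    Returns {band: (sample size, mean actual goals, percent off)}.
    Banding by rounding to one decimal place is an assumption.
    """
    totals = defaultdict(lambda: [0, 0])  # band -> [goals scored, matches]
    for expected, actual in records:
        band = round(expected, 1)
        totals[band][0] += actual
        totals[band][1] += 1

    report = {}
    for band, (goals, n) in sorted(totals.items()):
        mean = goals / n
        # Percentage by which the actual average differs from the band value
        pct_off = (mean - band) / band * 100 if band else float("nan")
        report[band] = (n, mean, pct_off)
    return report

# Made-up sample data: (expected goals, actual goals)
sample = [(0.72, 1), (0.68, 0), (0.71, 1), (0.93, 1), (0.88, 1)]
for band, (n, mean, pct_off) in band_report(sample).items():
    print(f"{band:.1f}  n={n}  mean={mean:.3f}  off={pct_off:+.1f}%")
```

Keeping the sample size alongside each band makes it easier to judge whether a large percentage difference is a real bias or just a thinly populated band.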

Whatever amounts you inflate the zero and one by, a deflation needs to take place on the other outcomes to compensate, since the combined probabilities of all possible goal totals must still sum to 1.0. The two-goal outcome bears the brunt of the deflation in my model, with less on the three and less again on the four, five and so on.
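To make the inflate-and-compensate step concrete, here is one possible sketch. It is not my actual model: the boost values are invented, the function names are hypothetical, and the deflation is simply spread across the two-or-more outcomes in proportion to their Poisson probabilities. At typical football expectancies (below about 2 goals) that proportional scaling does take most from the two, less from the three, and less again after that.

```python
import math

def poisson_probs(lam, max_goals=8):
    """Poisson probabilities for 0..max_goals-1 goals, plus a final
    entry holding the lumped probability of max_goals or more."""
    probs = [math.exp(-lam) * lam ** k / math.factorial(k)
             for k in range(max_goals)]
    probs.append(1.0 - sum(probs))
    return probs

def inflate_low_scores(probs, zero_boost=0.10, one_boost=0.03):
    """Inflate P(0) and P(1), then deflate the rest so the total stays 1.0.

    zero_boost and one_boost are illustrative relative increases,
    not recommended values.
    """
    p0 = probs[0] * (1 + zero_boost)
    p1 = probs[1] * (1 + one_boost)
    added = (p0 - probs[0]) + (p1 - probs[1])
    rest = sum(probs[2:])
    if added >= rest:
        raise ValueError("boosts too large for this distribution")
    scale = (rest - added) / rest  # common shrink factor for 2+ goals
    return [p0, p1] + [p * scale for p in probs[2:]]

raw = poisson_probs(1.3)
adjusted = inflate_low_scores(raw)
for goals, (before, after) in enumerate(zip(raw, adjusted)):
    print(goals, round(before, 4), round(after, 4))
```

The boosts themselves are exactly the parameters you would tune by trial and error against your tracked results.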

1 comment:

alkalmazasok.blogspot.com said...

Hi Cassini! I'm a bit confused here... You say you track the goal expectancy against the actual number of goals scored. The zero-inflation comes in when you compute the probabilities for different numbers of goals (with the help of the goal exp.).
So how does the comparison of goal expectancies and the actual results say anything about the zero-inflation?