Monday, 1 January 2001

Elo Ratings In Football

Introduction

Often incorrectly written as ELO, Elo ratings take their name from their inventor, Arpad Elo, a Hungarian-born American physics professor and chess player who devised the method as a way of comparing the skill levels of chess players. The method has since been adapted for several sports, including American football, basketball and football, and it is its use in football that is the focus of the rest of this article.

The Basics

The essence of Elo ratings is that each team has a rating. When comparing two teams, the team with the higher rating is considered to be stronger. The ratings are constantly changing, and are calculated based upon the results of matches. The winner of a match between two teams typically gains a certain number of points in their rating while the losing team loses the same amount. The number of points in the total pool thus remains the same. The number of points won or lost in a contest depends on the difference in the ratings of the teams, so a team will gain more points by beating a higher-rated team than by beating a lower-rated team.

Raw Elo suggests that both teams ‘risk’ a certain percentage of their rating in each contest, with the winner gaining the total pot, i.e. their rating increases by the losing team’s ante. In the event of a draw, the pot is shared equally.

A Simple Example

A simple example shows how this works when two evenly matched teams meet, and both have 5% of their rating at risk. Arsenal and Chelsea both have a rating of 1000 so both teams risk 5%, i.e. 50 points, and the pot contains 100 points.

There are three possible outcomes.

1) Arsenal win, and the result of this is that Chelsea’s rating drops by 50 to 950, and Arsenal’s rating increases by 50 to 1050.

2) Chelsea win, and the result of this is that Arsenal’s rating drops by 50 to 950, and Chelsea’s rating increases by 50 to 1050.

3) The result is a draw. The pot is divided between the two teams, resulting in the ratings for both Arsenal and Chelsea remaining unchanged at 1000.

A Second Example

A second example shows how this works when the home side is stronger. Manchester City (with a rating of 1200) plays Aston Villa (with a rating of 1000). Again, both sides risk 5% (60 points and 50 points respectively), so the pot contains 110 points.

The three possible results and their effect on the ratings are:

1) Manchester City win, and the result of this is that Aston Villa’s rating drops by 50 to 950, while Manchester City’s rating increases by 50 to 1250.

2) Aston Villa win, in which case Manchester City lose their 60 points and their rating drops to 1140, while Aston Villa gain the 60 to improve their rating to 1060.

3) The result is a draw. The (60+50) 110 points in the pot are divided by two, resulting in Manchester City’s rating dropping by 5 points to 1195, and Aston Villa’s rating improving to 1005.

A Third Example

A third example shows how this works when the away side is stronger. Wigan Athletic (with a rating of 800) plays Manchester United (with a rating of 1000). Again, both sides risk 5% (40 points and 50 points respectively), so the pot contains 90 points.

The three possible results and their effect on the ratings are:

1) Wigan win. Their rating increases by 50 to 850, while Manchester United’s rating decreases by 50 to 950.

2) Manchester United win, in which case Wigan lose their 40 points and their rating drops to 760, while Manchester United gain the 40 to improve their rating to 1040.

3) The result is a draw. The (40+50) 90 points in the pot are divided by two, resulting in Manchester United’s rating dropping by 5 points to 995, and Wigan’s rating improving by 5 points to 805.

The table below summarises these combinations of pre-match ratings, match results, and updated ratings:
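The three examples above can be reproduced with a short sketch. The function name `settle` and the result codes 'H', 'D' and 'A' are assumptions made for illustration; only the 5% stake and the pot-sharing rule come from the text.

```python
def settle(home, away, result, home_pct=0.05, away_pct=0.05):
    """Return (new_home, new_away) under the raw 'pot' scheme.

    result is 'H' (home win), 'D' (draw) or 'A' (away win).
    """
    home_ante = home * home_pct      # points the home side puts in the pot
    away_ante = away * away_pct      # points the away side puts in the pot
    pot = home_ante + away_ante

    if result == "H":
        home_take, away_take = pot, 0.0
    elif result == "A":
        home_take, away_take = 0.0, pot
    elif result == "D":
        home_take = away_take = pot / 2   # a drawn pot is shared equally
    else:
        raise ValueError("result must be 'H', 'D' or 'A'")

    new_home = home - home_ante + home_take
    new_away = away - away_ante + away_take
    return round(new_home, 2), round(new_away, 2)


# The three fixtures used in the examples: (home, rating, away, rating)
fixtures = [
    ("Arsenal", 1000, "Chelsea", 1000),
    ("Manchester City", 1200, "Aston Villa", 1000),
    ("Wigan Athletic", 800, "Manchester United", 1000),
]

for home_name, home, away_name, away in fixtures:
    for result in ("H", "D", "A"):
        new_home, new_away = settle(home, away, result)
        print(f"{home_name} {home} v {away_name} {away} [{result}]: "
              f"{new_home} / {new_away}")
```

Note that the total number of points is the same before and after every match, whatever the result.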

Some Issues

All very simple, but for football, it is much too simple. Anyone with a basic understanding of football can see a number of problems with the above examples. One obvious problem is that home advantage is not taken into account, so in a match between two evenly rated teams, in the event of a draw, the away side should be rewarded, and the home side penalised. In the ‘teams evenly rated’ example above, a draw for Chelsea at Arsenal is clearly a better result for them than it is for Arsenal, and it is illogical that both teams walk away at full-time with the same rating as when the match started.

In Part Two, I will look at some ways in which these problems can be remediated.

In Part One we explained the basic premise of Elo ratings, and illustrated how they are applied. Part Two will offer some suggestions on how the principles of Elo can be enhanced to make our ratings more useful. It is important to understand that these are only suggestions. There are no hard and fast rules that dictate what these parameters should be. There is no right and no wrong, only what works and what doesn’t work.

We finished Part One with an example of two evenly rated teams, risking the same percentage of their ratings, and identified one major problem which is that an away draw is better than a home draw, and it is thus illogical for both teams to end the match with the same rating as they started.

The Punter’s Revenge: Adjusting For Home Field Advantage

One way to handle this is by having the home team risk a slightly higher percentage of their rating than the away team. Back in the 1980s, two authors, Tony Drapkin and Richard Forsyth, wrote a book called “The Punter’s Revenge: Computers In The World Of Gambling”, which was targeted at computer-literate punters at a time when the personal computer was just becoming popular. One of the more memorable chapters was on rating football teams, and the authors’ suggestion, after running trials, was to use 7% for the home team, and 5% for the away team. I’ve found no reason to diverge too far from these numbers.

If we re-visit the earlier examples from part one, using the 7% and 5% numbers, the results become:

When the teams are identically rated going in, after a drawn match, the away team gains slightly and the home team loses slightly, something that intuitively seems right. If you’re not happy with the adjustments that 7% and 5% give you, then there’s absolutely no reason not to tweak these, but I would caution against exceeding 10% or going below 3%. Changes in rating should be in modest increments, but not so modest that it takes a season for a declining team’s rating to reflect its form.
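As a rough sketch of the 7% and 5% adjustment, the snippet below re-runs the evenly matched example with the home side risking more than the away side. The function name and result codes are illustrative assumptions; the percentages are the ones suggested above.

```python
HOME_PCT = 0.07   # share of its rating the home team risks
AWAY_PCT = 0.05   # share of its rating the away team risks


def settle_with_home_advantage(home, away, result):
    """Return (new_home, new_away); result is 'H', 'D' or 'A'."""
    home_ante = home * HOME_PCT
    away_ante = away * AWAY_PCT
    pot = home_ante + away_ante

    # Fraction of the pot that goes to the home side for each result.
    home_share = {"H": 1.0, "D": 0.5, "A": 0.0}[result]

    new_home = home - home_ante + home_share * pot
    new_away = away - away_ante + (1.0 - home_share) * pot
    return round(new_home, 2), round(new_away, 2)


for result in ("H", "D", "A"):
    print(result, settle_with_home_advantage(1000, 1000, result))
```

With both teams on 1000, the pot is 120 points (70 from the home side, 50 from the away side), so a draw returns 60 to each: the home side finishes on 990 and the away side on 1010.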

Result Adjustment: Incorporating Margin Of Victory

Now to address the next problem: match results. Basic Elo doesn’t quantify wins. A win is a win, whether it is by one goal or by a dozen. Most readers will agree that this is an unsatisfactory state of affairs, and will want to make adjustments. One method is to increase the percentages that each team risks, but to split the pool between the winners and losers in proportions that vary with the margin of victory.

For example, Arsenal and Liverpool are both rated at 1000, and Arsenal are at home. The pot (or pool) contains 120 points, 70 from Arsenal, 50 from Liverpool. If the game finishes 6-0 to Arsenal, it’s reasonable to give all the points to them. My own preference is for a four-goal win or more to be sufficient to secure the entire pot. A three-goal win is pretty good, and earns most of the pool, whereas a two-goal win earns a little less, and a one-goal win the minimum. The following table is a suggestion.

Winning is worth at least 70% of the pot, with the margin of victory becoming less significant as it grows. Winning 6-0 rather than 5-0 is neither here nor there, but winning 1-0 rather than drawing 0-0 is much more significant – even though the difference between both pairs of scores is just one goal. You may want to consider a 1-2 defeat as a better result than a 0-1 defeat, but again, decisions such as these come down to personal preference. With all the time in the world, you might analyse goal times, and conclude that a 2-0 win decided in the 30th minute is a stronger win than a 2-0 win in which the second goal was scored on a breakaway in the 93rd minute with the vanquished team pressing hard for an equalizer. A fair conclusion in my opinion, and an example of how you can modify Elo to suit your own needs, and add flexibility based on the amount of time you have available.
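The margin-based split of the pot might be sketched as follows. Only two values are taken from the text: a one-goal win earns 70% of the pot, and a win by four or more earns all of it. The 80% and 90% figures for two- and three-goal wins are placeholder assumptions, as are the function names, so substitute your own numbers.

```python
# Winner's share of the pot by goal margin.
# 0.70 (one goal) comes from the text; 0.80 and 0.90 are assumed placeholders.
WIN_SHARE = {1: 0.70, 2: 0.80, 3: 0.90}


def winner_share(margin):
    """Share of the pot for the winner; four goals or more takes it all."""
    if margin < 1:
        raise ValueError("margin must be a positive number of goals")
    return WIN_SHARE.get(margin, 1.0)


def settle(home, away, home_goals, away_goals, home_pct=0.07, away_pct=0.05):
    """Return (new_home, new_away) after splitting the pot by margin."""
    home_ante = home * home_pct
    away_ante = away * away_pct
    pot = home_ante + away_ante

    margin = home_goals - away_goals
    if margin == 0:
        home_share = 0.5                       # drawn pot is shared equally
    elif margin > 0:
        home_share = winner_share(margin)
    else:
        home_share = 1.0 - winner_share(-margin)

    new_home = home - home_ante + home_share * pot
    new_away = away - away_ante + (1.0 - home_share) * pot
    return round(new_home, 2), round(new_away, 2)


# Arsenal (home, 1000) v Liverpool (away, 1000): pot of 120 points
print(settle(1000, 1000, 6, 0))   # Arsenal take the whole pot
print(settle(1000, 1000, 1, 0))   # Arsenal take 70% of the pot
```

Under these assumptions a 6-0 win moves Arsenal to 1050 and Liverpool to 950, while a 1-0 win moves them only to 1014 and 986.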

Maintaining accurate ratings is time consuming, and in previous years I would attempt to maintain ratings for the Premier League, Football League and Conference as well as the Scottish Leagues. These days, I restrict my tracking to the top divisions of England, France, Germany, Italy and Spain, in part because there is a wealth of data readily available to input, and on the output side, there are many liquid markets available. It is also my opinion that in the lower leagues, ratings are not so stable. A modest amount of money goes a long way, as recently seen with Crawley Town and Fleetwood Town, and ratings can soon be out of date.

In Part Three, I will look at more ideas for maintaining accurate ratings.

In Part Two, I looked at one way in which Elo ratings could be improved by measuring the strength of a win based on winning margin. However, the low scoring nature of football means that the match result often does not reflect the performance of the teams.

We have all seen games where one team has dominated, only to lose 0-1 to a goal very much against the run of play. If you limit your input to this single figure, goals scored minus goals conceded, you risk entering less than accurate data into your ratings.

While it is true that Birmingham City did beat Chelsea 1-0 on November 20th, 2010, is it fair and reasonable to award 100% or 70% of the points available to them? You might think it is, and I would say that is your decision to make, but my take on it is to look behind the result and use some of the other data that is readily available these days.

When deciding what data I should include, my rule is that there is a correlation between the data and goals. For example, simple logic tells you that there is a relationship between shots, shots on target, and goals. 10 shots, of which 5 were on target, doesn’t necessarily mean that a team will score 2 goals, but for each league there are fairly consistent ratios which we can use.

Charles Reep: Incorporating Shots On Goal

Pioneering football statistician Charles Reep began his research in 1950 (at 3:50pm on 18 March while watching Swindon Town play Bristol Rovers to be precise) and discovered (among other things) "that over a number of seasons it appears that it takes 10 shots to get 1 goal (on average)".

This average will of course vary from season to season, by league and by team, but the important thing is that there is a correlation between shots, shots on target, and goals scored. A note here that some of this data has an element of subjectivity about it, and you will often see major differences in the statistics for the same game from individual observers.

Again, how much effort you want to put into this is a personal choice. Researching the leagues you are interested in will show there are differences, which you can incorporate if you wish. For example, as of 2011, the EPL is more efficient at converting shots to goals than Serie A.

I would however caution against changing these parameters too frequently once you have determined reasonable values, with my preference being to use an average for the past three seasons. The soccerbythenumbers.com website often has some interesting articles on this subject, along the lines of this entry from January 2011:

"Recall that, over the long run, the goal to shot ratio tends to be around .111 - or 1 goal in 9 shots. Across the four big leagues, it's clear that Serie A has by far the lowest goal to shot ratio - that is, it takes Serie A teams systematically more shots to score goals than teams in the other leagues. So far this season, Serie A is at .085 or roughly 1 in 12 shots - a third lower than what is "normal" for the big leagues. In contrast, the other three leagues are around the historical average at .104 (La Liga), .117 (EPL), and .123 (Bundesliga). So spectators in the Bundesliga only have had to see their teams take 8 shots before scoring a goal, while those in Serie A have seen their teams take a full 50% more shots (12) before getting on the scoreboard."

Adding Meaning

This data is important because it allows you to enter more meaningful data into your calculations. Arsenal 2 Chelsea 1 is a start, but in my view, the data is made more valuable by entering the shots and shots-on-target figures also, so you now have for example Arsenal 2:5:12 Chelsea 1:8:19 - a set of numbers that might reasonably lead you to conclude that Chelsea were a little unlucky in that their goals scored were lower than might have been expected.

You can include other data too, although I have yet to see any evidence of correlation between free-kicks or yellow cards and goals. Red cards can obviously be more significant, but you would want to factor in the amount of time remaining at the time of the dismissal. A headline of "10 man City see off United" might sound dramatic and sell newspapers, or draw clicks, but if the dismissal was in the 90th minute, it's a little misleading to say the least.

In Part Four, I'll look at corner kicks and whether this additional data should be included in your Elo based ratings.  

Dangerous Corner

I concluded Part Three with a discussion about what data can or should be included when adjusting a team's Elo ratings. It might seem logical and reasonable to include corner kicks, but perhaps surprisingly, the evidence shows that there is essentially no correlation between the number of corner kicks and goals scored. The correlation is actually strongest in the English Premier League, and weakest in Serie A and La Liga.

This isn’t to say that corners do not lead to goals. One of the problems is that the readily available data is based on match totals, i.e. they do not reveal how many corners lead to a goal; only that over the course of a match, Team A had 2 goals and 12 corners. Both goals may have come from corners, but at this 'macro' level, there is no evidence that says there should be say one goal for every eight corners.

Having decided what data to include, we are now in a position to expand upon the simple table seen in Part Two, which looked like this:

Putting It All Together

By using more data than simply the round figure of goals, it is possible to 'more accurately' reflect the result of a game. I mentioned in Part Three the real-life example of Birmingham City beating Chelsea 1-0 on November 20th, 2010, and we will use this match as an example of how additional data can be incorporated.

Birmingham City had one shot, and they scored one goal. Chelsea had 24 shots, 9 were on target, yet none resulted in a goal. If we have done our analysis and concluded that from ten shots, you can expect one goal (on average), or that from three shots on target, one goal can be expected, you have tripled the amount of data you are entering, and this helps to smooth out any outlying data points.

It is at this point that a basic knowledge of spreadsheets will be useful, since the easiest way to automate these calculations is by creating LOOKUP tables.

Using the ratios for this example (Shots : Shots-On-Target : Goals) we have a result of 4:1:1 to 20:10:0. Dividing the first parameter by 10 (10 shots approximates to 1 goal), and the second by 3 (3 shots-on-target approximates to 1 goal), we have in goal units 0.4:0.33:1 to 2:3.33:0. You can average (mean or median) these numbers out, or apply a weighting to them. Taking the simple mean, the match result becomes Birmingham City 0.58, Chelsea 1.78. Any weightings, and the choice of average, are a personal preference. While Chelsea did not win the game, their overall performance based on these numbers suggests they were the better team, and my ratings would adjust in line with a more accurate scale, for example:

Again, it is personal preference how granular you make these numbers. Breaking them down into 0.25 increments is one idea, but you can use any number. Once the factors are entered into your spreadsheet, and you have set the LOOKUPs correctly, they do not need to be maintained. Your spreadsheet can calculate your match result, e.g. 1.78 to 0.58 and update the Elo ratings accordingly.
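For readers who would rather script this than build LOOKUP tables, here is a minimal sketch of the modified-result calculation. It assumes the ratios used above (10 shots or 3 shots on target per goal) and a simple unweighted mean. The function names are illustrative, and snapping to the nearest 0.25 is just one possible granularity.

```python
SHOTS_PER_GOAL = 10      # ten shots approximates to one goal
ON_TARGET_PER_GOAL = 3   # three shots on target approximates to one goal


def implied_goals(shots, on_target, goals):
    """Unweighted mean of the three measures, all expressed in goal units."""
    units = [shots / SHOTS_PER_GOAL, on_target / ON_TARGET_PER_GOAL, goals]
    return sum(units) / len(units)


def snap(value, step=0.25):
    """Round a value to the nearest multiple of `step`."""
    return round(value / step) * step


# Shots : shots on target : goals, using the figures from the example above
birmingham = implied_goals(4, 1, 1)
chelsea = implied_goals(20, 10, 0)

print(f"Birmingham City {birmingham:.2f} Chelsea {chelsea:.2f}")
print("Margin in 0.25 steps:", snap(chelsea - birmingham))
```

The implied scores come out at 0.58 and 1.78, a margin of 1.2 in Chelsea's favour, which falls in the 1.25 band when using 0.25 increments.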

Modified Results

At this point in the process, you might also want to consider weighting the ‘modified’ result based on the strength of the opposition. An implied score of 1.5 to 0.5 can reasonably be considered a more creditable achievement against Manchester City than against a struggling team.

Update the Elo ratings based on Table A above, or your version of it, and you’re done. Most matches will see a small change in rating for both teams, some one-sided affairs may see a bigger shift, but the ratings, once established, ‘should’ reflect the strength of one team when compared with another.

Predictions

How do I use my ratings to make a fortune, I hear you ask? One way is to expand your spreadsheet to incorporate a predictive feature. For predicting a future match, you would enter the two teams’ ratings, say 800 and 1000. Create a table with the same margins as there are in Table A, and this can easily be programmed to calculate the post-match Elo ratings for each team if the winning margin is 0, 0.25, 0.5 etc. Your spreadsheet can be coded to display the margin of victory which will keep the ratings as close to their pre-game position as possible. Note that you will also need to allow for the negative equivalents to cater for away wins, so the table would also have values assigned for -0.25, -0.5 etc.
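The idea of searching for the margin that leaves the ratings least changed can be sketched in a few lines. Everything numerical here is an assumption: the 7% and 5% stakes, and especially the `home_share` curve, which linearly interpolates between a 50% share for a draw, 70% for a one-goal win and 100% for a four-goal win. It is a stand-in for Table A, not a copy of it, so the margins it produces will not match figures from a spreadsheet built on different parameters.

```python
def home_share(margin):
    """Hypothetical share of the pot for the home side at a given margin.

    Positive margins are home wins, negative margins are away wins.
    """
    if margin < 0:
        return 1.0 - home_share(-margin)
    if margin <= 1:
        return 0.5 + 0.2 * margin                 # 0 -> 50%, 1 -> 70%
    return min(1.0, 0.7 + 0.1 * (margin - 1))     # 4 or more -> 100%


def home_change(home, away, margin, home_pct=0.07, away_pct=0.05):
    """Points gained (or lost) by the home side for a given margin."""
    home_ante = home * home_pct
    pot = home_ante + away * away_pct
    return home_share(margin) * pot - home_ante


def expected_margin(home, away, step=0.25, max_margin=4.0):
    """Margin on the grid that leaves the ratings closest to unchanged."""
    n = int(max_margin / step)
    grid = [i * step for i in range(-n, n + 1)]   # includes away wins
    return min(grid, key=lambda m: abs(home_change(home, away, m)))


print(expected_margin(800, 1000))    # weaker team at home
print(expected_margin(1000, 800))    # stronger team at home
```

Because the scheme is zero-sum, the away side's change is simply the negative of the home side's, so minimising one minimises both.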

For example, Wigan Athletic are currently rated at 1257, Manchester United at 1549. If Wigan plays Manchester United at home, a margin of 0 would result in the ratings being unchanged (top right number 0.00) as in the picture below:

If we look at the reverse fixture between these teams using the same ratings, the spreadsheet shows the following:

The ‘expected’ result in this example is that Manchester United will win by 1.5 goals.

If the modified result entered is 3.21 to 0.91, (e.g. United win 3:1 and these numbers are modified for shots and other criteria) the picture below shows how the ratings would change. Manchester United would gain 9 points, and Wigan Athletic would lose the same 9 points. United's win by a margin of 2.3 exceeded expectations, so they are duly rewarded, but winning does not always boost a team's ratings.

By entering in all the upcoming fixtures, your spreadsheet will give you a starting point before you bet. Whether your preference is to focus on the matches expected to be draws, or to look for value on the Asian Handicaps based on your computations, is up to you. I tend to focus on the draws, matches where the predicted result is 0 to 0.5, but that’s just my preference as I consider the draw price to be somewhat ignored, and thus be more likely to offer value.

Caution

I mentioned that this prediction is a starting point. You should always be aware of the relevance of the match to both teams – early and late season can be treacherous, and if you use the ratings in domestic cup games that some teams may take less seriously than others, the spreadsheet won’t help you.

This also raises the question of whether you should include Cup matches in your ratings or not. My preference is to not use domestic cup matches, but I do use inter-league Champions League games or Europa League games to adjust my ratings, e.g. AC Milan v Chelsea.

For anyone interested in my starting point for these ratings, I used the 2008-09 season standings, and used UEFA’s coefficients to make the English league stronger than the French league, for example. After three years of maturity, the top four clubs in sequence are Barcelona, Real Madrid, Manchester City and Manchester United. The weakest is Ligue 1’s Espérance Sportive Troyes Aube Champagne, a.k.a. Troyes.

Conclusion

This concludes the series on Elo ratings, and once again, I would like to make it clear that many of the parameters I use are a personal preference and can be adjusted in any way you wish. The process described above is quite possibly unique to me, as it is a combination of ideas and thoughts collected over more years than I care to remember. It is not intended to be a ‘copy and paste’ answer for you; the purpose of these articles has been to show you how one person’s thought process works, and perhaps prompt you to have some ideas of your own.

There is another component to the spreadsheet which is the use of the ‘modified result’ mentioned above as input to a Poisson calculation from which you can estimate the probability of every result, and thus all the Over / Under, Match Odds and Correct Score markets, and that will be the subject of a future article.
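Until then, here is a very rough sketch of how a Poisson calculation can turn two expected-goals figures into match odds. It assumes the textbook model in which each side's goals are independent Poisson variables, which may well differ from the spreadsheet's own method. The function names and the example inputs of 1.5 and 0.5 are purely illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return mean ** k * exp(-mean) / factorial(k)


def match_odds(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Scores above `max_goals` per side are ignored, so the three
    probabilities sum to very slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


home_win, draw, away_win = match_odds(1.5, 0.5)
print(f"Home {home_win:.3f}  Draw {draw:.3f}  Away {away_win:.3f}")
```

The same grid of score probabilities can be summed in other ways to price Correct Score or Over / Under markets.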

While I have tried to make this series as clear and as easy to understand as possible, it is not impossible that I have assumed some knowledge or understanding that I should not have, so if anyone has any questions on the above, please comment or send me an email, and I will try to respond.
