Green All Over: Data Mining

Wednesday, 9 June 2010

Data Mining

Three good comments on my early Elo observations.

1)

Your draw hit rate seems purely in line with expected % of draws. Fluctuations from league to league are obviously to be expected in the relatively short term and short-to-medium term.

Whatever your short-medium term results I don't see how you can expect such a model to be profitable in the long-term if you're not actually pricing up matches yourself and are simply going on the way you use your ELO model (seemingly with scant regard for team news etc.)

Overall this is somewhat true, in that the strike rate is not hugely better than expected, but it is better, and my focus was primarily on the Premier League, where, for whatever reason, the draw hit rate far exceeded expectations: 25.26% of Premier League games resulted in draws, yet the Elo ratings had a hit rate of 38.67%. The other leagues where the hit rate was significantly higher were the Scottish First Division, Ligue 1 and Bundesliga 1. I use the word 'significantly' not in its statistical sense, although the differences might well be statistically significant. I need to refresh my statistics skills. Also note that only the Conference National was more than a percentage point below the expected percentage.

Whether the success in this league is down to the quality of the league, a statistical anomaly that will regress to the mean next season, or simply that higher, more spaced out, ratings perform better, I really don't know.
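One rough way to judge whether a hit rate like this is more than noise is an exact one-sided binomial test against the league's base draw rate. The sketch below is only an illustration: the number of selections is not recorded in this post, so the counts used (29 draws from 75 selections, which happens to reproduce 38.67%) are hypothetical, and the function name is my own.

```python
from math import comb


def binomial_upper_tail(hits: int, n: int, base_rate: float) -> float:
    """Probability of seeing at least `hits` successes in `n` independent
    trials when each trial succeeds with probability `base_rate`."""
    return sum(
        comb(n, k) * base_rate**k * (1 - base_rate) ** (n - k)
        for k in range(hits, n + 1)
    )


# HYPOTHETICAL counts: the real number of selections is not given in the post.
selections = 75
draws = 29
league_draw_rate = 0.2526  # Premier League draw rate quoted above

p_value = binomial_upper_tail(draws, selections, league_draw_rate)
print(f"hit rate {draws / selections:.2%}, one-sided p-value {p_value:.4f}")
```

A small p-value would suggest the hit rate is unlikely to be chance alone, but it says nothing about whether the effect will persist, and it assumes each selection is an independent trial with the same draw probability.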

2)

Have you thought of fleshing out your data by using the historic odds on, say, football-data.co.uk instead of the fairly meaningless percentages?

Yes, but to do this retroactively will be a lengthy task that I doubt I'll have time for. Also, a lot of my plays were on the correct score markets for matches where a selection was expected to win by 2 or more goals, and I'm not sure the correct score prices are available anywhere.

3)

How do these strike rates compare to average prices you did(/could) have obtained?

Looking at the overall results, I think I was lucky that I concentrated on the Premier League, because the results here were very good. The draws I have already mentioned, and these were profitable. Also promising: the selections expected to win by 2 goals fared extremely well, with 34.28% winning by the 2-0 or 3-1 scorelines. 26.67% of the 3-goal games finished 3-0, also profitable.
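A strike rate only means something next to the price taken. As a minimal sketch, the break-even decimal odds for a strike rate are simply its reciprocal, and the expected profit per unit staked follows from the price. The prices of 3.0 and 2.7 below are illustrative only (they are the ones raised in the comments), not prices I recorded, and treating the 2-0 and 3-1 plays as a single bet at one net price is a simplification. The function names are my own.

```python
def break_even_odds(strike_rate: float) -> float:
    """Decimal odds at which a bet with this strike rate breaks even."""
    return 1.0 / strike_rate


def expected_profit_per_unit(strike_rate: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked at the given net decimal odds
    (i.e. odds after commission)."""
    return strike_rate * (decimal_odds - 1.0) - (1.0 - strike_rate)


strike_rate = 0.3428  # 2-goal selections finishing 2-0 or 3-1, from the post

print(f"break-even odds: {break_even_odds(strike_rate):.3f}")
for price in (3.0, 2.7):  # ILLUSTRATIVE prices, not recorded ones
    edge = expected_profit_per_unit(strike_rate, price)
    print(f"at {price}: expected profit per unit {edge:+.4f}")
```

The same two functions can be applied to the 26.67% figure for 3-0 scorelines once an average price is known.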

Because this was the first season for keeping these records, and the process of what data to collect evolved as the season progressed, I do not have all the data I need for a full analysis though. My Elo based live betting was non-existent after February when I lost the latest data, and the all-important odds are approximate at best. I also made the (in hindsight) mistake of including cup games between same division sides. Although I doubt they skewed the numbers too much, there's certainly that possibility.

There are other observations that could be significant. For example, in the Premier League, a league where 47% of matches finish Under 2.5 goals, my expected draws finished Under 2.5 goals 63.27% of the time. If not statistically significant, it's certainly something to watch for next season.

If Over 2.5 is your thing, back the away selections in Leagues One and Two - 63.63%. But not in Serie A where 58.62% of such selections finish Under 2.5.

Why do 21.41% of draw selections in the Premier League finish 0-0, while the favoured score in the Championship is 1-1 (18.45%), and in League One 0-0, 1-1 and 2-2 occur equally often (10.1% each)?
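One caution when slicing a season's data this many ways: the more league-and-market subsets that are examined, the more likely it is that at least one looks unusual purely by chance. A rough sketch of that effect, assuming the subsets are independent and using a hypothetical count of 14 subsets (the real number of splits is not something I have tallied):

```python
def chance_of_spurious_finding(alpha: float, subsets: int) -> float:
    """Probability that at least one of `subsets` independent comparisons
    looks 'significant' at level `alpha` when nothing real is going on."""
    return 1.0 - (1.0 - alpha) ** subsets


alpha = 0.05   # conventional significance threshold
subsets = 14   # HYPOTHETICAL number of league/market splits examined

print(f"chance of at least one fluke: {chance_of_spurious_finding(alpha, subsets):.1%}")
```

This is why the patterns above are better treated as things to watch next season than as conclusions.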

I'll continue to pore over this data during the summer, and keep even better records next season. Given that one only needs a small edge in the football markets to do very well, I'm confident that these ratings are a good first step towards that aim.

5 comments:

Anonymous said...: re: point 1. You are consistently missing the point about short-term results.

Also, you have an overall sample which may, possibly, be approaching a significant size. However, by splitting it down into divisions you are guilty of creating many insignificantly sized data samples.

The advice you give re: overs in different divisions also shows you to be guilty - unknowingly I believe - of data mining to suit your own needs.; 9 June 2010 at 09:25
vital statistix said...: @ Anonymous

Instead of constantly criticizing what Cassini is trying to do, why not be a little constructive and offer suggestions to improve, or better approaches to finding an edge?

1) He is making money
2) He is offering food for thought to others

@ Cassini
I respect what you are doing with your trading and blog but most of all I am amazed you have the patience and energy to spend on this person; 9 June 2010 at 10:50
Anonymous said...: I guess he gives the anon posters a bit of his energy because sometimes you'll learn more from people questioning your approach than people kissing your arse, vital; 9 June 2010 at 11:37
Anonymous said...: Anonymous's post (anonymous 1 that is) makes some extremely valid points.; 9 June 2010 at 13:19
Curly said...: Cassini,

I didn't leave my name, in haste, the other day, apologies.

Moving on; I appreciate you taking the time to answer comments left but I do feel somewhat let down by the reply.

My question was number 3. Knowing 34.28% of matches predicted to win by 2 goals did so doesn't really tell us a lot about how successful it is. If the average price obtained was 3.0 after commission then it would be slightly profitable; 2.7 and it would be unprofitable.

I appreciate that you may not have complete figures but it would be interesting to know them if you do.

Best of luck; 9 June 2010 at 22:07