Wednesday, 9 June 2010

Quality Or Quantity


It has been suggested that I should not split my data by division, because doing so leaves me with many samples too small to be significant.

I take the point, but as Talkbet observed, and as I have mentioned before myself, each division and league has its own 'personality'. When analysing these results, it doesn't make sense to me to lump the Premier League findings in with, say, the amateur and part-time teams of the Scottish Third Division.

I fully expect there to be differences. In fact, I would be concerned if there weren't.

League One averaged 2.22 goals a game last year - Bundesliga 1 averaged 2.83.

39% of games in Scottish Second Division are won by the home team - 50.79% are home wins in the Premier League.

Away wins: 24% in La Liga, 35% in the Scottish Second Division.

Ligue 1 and Scottish Third Division - 57% of matches finish Under. 55% Over in the Conference National and Bundesliga 1.

If we simply total up all the stats and work on averages alone, yes, we have a large collection of data, but it's my opinion that we miss potential nuggets.

After another season or two, each division will have data samples that are significantly sized, but until then, to me it makes more sense to use the limited data with caution than to throw everything in together and miss the opportunities that the individual personalities of the leagues provide.
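As a rough sketch of what using limited data "with caution" might look like, the Python below puts an approximate 95% interval around a home-win percentage and compares two leagues with a standard two-proportion z-test. The match counts are my assumptions, not figures from the data set: one full season each, 380 games for the Premier League (20 teams playing 38 each) and 180 for the Scottish Second Division (10 teams playing 36 each). The function names are made up for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions (pooled)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# ASSUMPTION: one season of matches per league (not stated in the post).
PL_MATCHES = 380      # Premier League
SCOT2_MATCHES = 180   # Scottish Second Division

# Home-win counts recovered from the quoted percentages.
pl_home = round(0.5079 * PL_MATCHES)
scot2_home = round(0.39 * SCOT2_MATCHES)

for name, wins, n in [("Premier League", pl_home, PL_MATCHES),
                      ("Scottish Second Division", scot2_home, SCOT2_MATCHES)]:
    low, high = wilson_interval(wins, n)
    print(f"{name}: {wins}/{n} home wins, 95% interval {low:.1%} to {high:.1%}")

z = two_proportion_z(pl_home, PL_MATCHES, scot2_home, SCOT2_MATCHES)
print(f"z statistic for the difference: {z:.2f}")
```

On those assumed counts, a single season leaves each percentage uncertain by several points either side, and the smaller the division, the wider the band. A z statistic also says nothing about how many league pairs were looked at before an interesting gap turned up, so it is a sanity check rather than proof.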

Anonymous then goes on to say:

"The advice you give re: overs in different divisions also shows you to be guilty - unknowingly I believe - of data mining to suit your own needs."

Let no one reading this consider for a moment that I am giving any advice. I am simply pointing out some preliminary observations that I found worthy of note.

Thanks for the comment, 'vital statistix'. Offering food for thought is one of the aims of this blog. As for Anonymous - critical as he may be, and sometimes the criticism is positive - what is most illuminating is that he keeps returning to this blog, not only reading it but also taking the time to post. It is a form of flattery, and he does make for good posting material.

1 comment:

Anonymous said...

"39% of games in Scottish Second Division are won by the home team - 50.79% are home wins in the Premier League"

Are won? Were won last season? Were won over the last few seasons?

Short-term / long-term again, my friend.

"Ligue 1 and Scottish Third Division - 57% of matches finish Under. 55% Over in the Conference National and Bundesliga 1."

Another sweeping statement. Presumably based on one season?

These "individual personalities" of different leagues cannot be assumed to be correct on the evidence of such a small sample size. It is certainly correct that the Bundesliga is a high-scoring league, but that is a fact which can't be proved over one season.