Saturday, 21 November 2009

Benford Revisited

I have written before about Benford's Law, and how it can be applied to identify faked results, and this article by Leighton Vaughan Williams on the Betfair blog from five days ago is worth a read.

http://betting.betfair.com/specials/politics-betting/prediction-markets/the-betfair-prof/the-betfair-prof-how-benfords-law-might-help-you-beat-the-od-161109.html#comments

In a fascinating article published in the New York Times, a certain Malcolm Browne relates how Dr. Theodore Hill would ask his maths students to go home and either toss a coin 200 times and record the results, or else pretend that they had done so. Either way, he would ask them to produce for him the results of their (real or imaginary) coin-tossing experiment.

Dr. Hill's purpose in this experiment was to show just how difficult it is to fake data convincingly. It just isn't that easy to make up a random sequence. Based on this knowledge, he would astound his students by unerringly picking out the fakers from the tossers!

One of the ways he would do this was to check whether heads or tails appeared six or more times in a row. In a genuine sequence of 200 coin throws, at least one such run is overwhelmingly probable. To most of his students a run this long is counter-intuitive, an example of what is often termed the Gambler's Fallacy, i.e. the erroneous perception that independent random sequences will balance out over time, so that for example an extended sequence of heads is more likely to be followed by a tail than a head. The fakers, susceptible to the Fallacy, avoid long runs and are thus easily exposed. Ordinary people, even mathematics students, simply can't help introducing patterns into what is random noise.
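To put a figure on "overwhelmingly probable", here is a minimal Python sketch that counts exactly how many sequences of fair coin tosses avoid any run of a given length. The function name prob_run_at_least and its parameters are my own, not anything from Dr. Hill or the article, and it assumes a fair coin and independent tosses.

```python
from fractions import Fraction


def prob_run_at_least(n_tosses: int, run_length: int) -> Fraction:
    """Exact chance that n fair tosses contain a run of at least
    run_length identical outcomes (heads or tails)."""
    if run_length <= 1:
        return Fraction(1) if n_tosses >= 1 else Fraction(0)
    if n_tosses < run_length:
        return Fraction(0)
    # counts[r] = number of sequences so far whose current run has
    # length r + 1 and in which no run has yet reached run_length.
    counts = [0] * (run_length - 1)
    counts[0] = 2  # the first toss is either H or T
    for _ in range(n_tosses - 1):
        new = [0] * (run_length - 1)
        new[0] = sum(counts)  # the coin switches side: run resets to 1
        for r in range(1, run_length - 1):
            new[r] = counts[r - 1]  # same side again: run grows by one
        counts = new
    # Everything not counted above contains at least one long run.
    return 1 - Fraction(sum(counts), 2 ** n_tosses)


if __name__ == "__main__":
    p = prob_run_at_least(200, 6)
    print(f"P(run of 6 or more in 200 tosses) = {float(p):.4f}")
```

Running it for 200 tosses and a run length of 6 gives a probability well above 90%, which is why a faked sequence with no long run stands out.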

This is an example of a broader analysis which is usually referred to as Benford's Law, which essentially states that if we randomly select a number from a table of real-life data, the first digit is far more likely to be some digits than others. For example, the probability that the first digit will be a '1' is about 30%, rather than the intuitive 11% or so that we would expect if all nine possible first digits were equally likely.

The empirical support for this proportion can be traced to the man after whom the Law is named, physicist Dr. Frank Benford, in a paper he published in 1938, called 'The Law of Anomalous Numbers'. In that paper he examined 20,229 numbers drawn from 20 data sets, as diverse as baseball statistics, the areas of rivers, numbers in magazine articles and so forth, confirming the 30% rule for number 1. For information, the chance of throwing up a '2' as first digit is 17.6%, and of a '9' just 4.6%. A related check applies to trailing (i.e. last) digits, which in genuine data tend to be spread roughly evenly rather than following the first-digit pattern. It's a great way, therefore, of checking the veracity of receipts. If, for example, there is an unusual number of trailing digit '7's, there's a decent chance that the figures are cooked. Tax authorities are alert to this.
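The percentages quoted above match the usual statement of the Law, in which the chance of a leading digit d is log10(1 + 1/d). That formula is not given in the article, so treat the following as a rough Python sketch rather than anyone's official method. The helper names benford_probability, leading_digit and first_digit_shares are mine, and the powers of 2 are just a convenient stand-in for a real data table.

```python
import math
from collections import Counter


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First significant digit of a non-zero number."""
    # Scientific notation always puts the first significant digit first.
    return int(f"{abs(value):.16e}"[0])


def first_digit_shares(values):
    """Observed share of each leading digit 1 to 9 in `values`."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    # Illustrative data only: the first 100 powers of 2.
    observed = first_digit_shares([2 ** k for k in range(100)])
    for d in range(1, 10):
        print(d, f"expected {benford_probability(d):.3f}",
              f"observed {observed[d]:.3f}")
```

A set of figures whose observed shares sit a long way from the expected column is not proof of fakery, but it is a reason to look more closely.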

So randomness is not so random, it seems. Applying the analysis to some recent real-world events, Nate Silver, founder of www.fivethirtyeight.com, polling guru and all-round statistical wizard, examined the results of the recent Iranian election, declaring them "probably forged." More recently still, he's turned his attention to one of the US's high-profile pollsters. The results of his analysis of their findings and his interpretation make fascinating reading.

It's important that survey results are genuine, as they are integral factors in driving the markets, and in particular the election markets. If you are able to differentiate the genuine from the phony, it may thus give you the edge in beating the odds. Let's call it Benford's bonus!
