Thursday, 23 April 2020

Flaw in the Fifth Column

A couple of weeks ago, I mentioned that I had just started reading Ben Cohen's 'The Hot Hand'. I rather wish I hadn't, although the book is actually very readable.

The problem I have is with Chapter 7, Section 4, p223, where the author writes about two researchers, Josh Miller and Adam Sanjurjo, flipping a coin and recording the outcome of each toss 'after he gets a head'.

They order their beers, read the napkin, check the phone, and study the Hs - the heads after heads. They are shocked to see that their intuitive sense of randomness has led them astray. They realize that the proportion of heads after heads on a coin flip is not equal to the odds of getting heads on any old coin flip.
At this point my brain was struggling, but it gets worse. They then decided that to prove this, they:
...don't have to flip a coin hundreds of times... you only have to do it three times. The short version of the math behind their breakthrough is simple enough to fit on a real bar napkin. Here is every possible outcome in a sequence of three coin flips: 
Now let's take each of those three-flip sequences and look at the flips after heads flips. What percentage would you expect to be heads? It feels like the answer should be 50% - another coin flip. But let's average the results from the fifth column.
Before I get to the table with the fifth column: yes, I would expect the answer to be 50%. I had to read the previous section again to see what I was missing (it's been a rough few days), but this seemed complete nonsense.

Next came a table with five columns: the three-flip sequence, the number of flips after a head, the number of heads on those flips, heads / flips, and the percentage of heads after heads.
I included the two lines of text because I see a flaw here in the logic. Yes, 250% divided by the six rows with data is close to 42%, but this isn't how averages should be calculated. (Simpson's Paradox?) 

The average should be the total number of heads divided by the total number of flips, i.e. the sum of the third column divided by the sum of the second column, which in this case is 4/8, or 50%. Unless I am missing something (admittedly not the remotest of possibilities), that is the expected outcome.
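For anyone who wants to check the arithmetic, here is a minimal Python sketch that rebuilds the table's numbers from the column descriptions above and computes the average both ways. The row order and the names (`after_heads_counts`, `mean_of_percentages`, `pooled`) are my own choices for illustration, not anything taken from the book.

```python
from fractions import Fraction
from itertools import product


def after_heads_counts(seq):
    """Return (flips that follow a head, how many of those flips are heads)."""
    flips = sum(1 for a in seq[:-1] if a == "H")
    heads = sum(1 for a, b in zip(seq, seq[1:]) if a == "H" and b == "H")
    return flips, heads


# One row per three-flip sequence: (sequence, flips after a head, heads on those flips)
rows = []
for seq in product("TH", repeat=3):
    flips, heads = after_heads_counts(seq)
    rows.append(("".join(seq), flips, heads))

for name, flips, heads in rows:
    pct = f"{100 * heads / flips:.0f}%" if flips else "-"
    print(f"{name}  {flips}  {heads}  {pct}")

# Only the six rows with at least one flip after a head have a percentage.
with_data = [(f, h) for _, f, h in rows if f > 0]

# The book's method: a simple average of the row percentages.
mean_of_percentages = sum(Fraction(h, f) for f, h in with_data) / len(with_data)

# The method argued for here: total heads divided by total flips.
pooled = Fraction(sum(h for _, h in with_data), sum(f for f, _ in with_data))

print(f"mean of row percentages: {float(mean_of_percentages):.1%}")
print(f"pooled heads / flips:    {float(pooled):.1%}")
```

The simple average of the six percentages comes out at 5/12 (about 41.7%), while the pooled figure is 4/8, i.e. exactly 50%.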

The text continues:
Miller and Sanjurjo found the proportion of success after a streak is less than the underlying probability of success. If you were to generate a short, finite sequence like this string of coin flips and randomly select one of the heads, then the probability that the next flip would be a heads is closer to 40% than 50%. This is so trippy that Miller and Sanjurjo could hardly believe it. Their brains weren't biased. The statistic was. They double-checked and triple-checked and would have quadruple-checked and quintuple-checked if they weren't already sure their careers were about to change forever.
Well, I'm not so sure. In that example there are clearly 8 heads that are followed by another flip; four of those following flips are an H and four are a T.

Hopefully someone else has read this book, or will read it at some point, and let me know if I am completely mad or if the referenced study indeed contains a rather basic mathematical flaw.

The flawed conclusion was that:
...if a basketball player was a 50% shooter, and he was still a 50% shooter when he was hot, this was evidence against the hot hand. In fact that 50% was evidence for the hot hand.  
I'm not seeing this at all. Streaks will occur whether tossing a coin, spinning a roulette wheel, throwing a die etc., but past events do not change the probability that certain events will occur in the future. To believe otherwise is surely the Gambler's Fallacy which:
occurs when an individual erroneously believes that a certain random event is less likely or more likely, given a previous event or a series of events.
Whether this remains true when you add a human element to it - such as with shooting a basketball - remains to be proven, but this isn't proof.  


Unknown said...

Yeah, that's totally invalid. Their error is to take a simple average of the percentages, thus giving equal weight to each of the 6 cases. They needed to take a weighted average of the percentages taking into account what proportion of the 'after heads' situations occurred in that case. So of 8 total 'after heads' situations, 1 (1/8 of the total) occurred in four of the cases and 2 (1/4 of the total) occurred in the last 2 cases. So the weighted average is

[1/8 x (0% + 0% + 100% + 0%)] + [1/4 x (50% + 100%)] = 12.5% + 37.5% = 50%

RedBallData said...

I agree it’s a badly written section in an otherwise fine book.

What I believe the writer is trying to get across is that even though the coin toss is always 50%, the chance of a streak continuing in a finite sequence of tosses appears to be less than 50%.

Even that only holds if you measure streaks in a flawed way (“mean of each player’s percentage of heads after heads”) rather than the logical “percentage of heads”.
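The distinction between the two measures can be checked for sequence lengths other than three. The sketch below is only an illustration under my own assumptions: it enumerates every equally likely sequence of n fair flips exactly rather than simulating players, and the function name `heads_after_heads` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def heads_after_heads(n):
    """Enumerate all 2**n sequences of n fair flips (n >= 2).

    Returns (mean of per-sequence proportions, pooled proportion).
    Sequences with no head among the first n - 1 flips have no
    proportion to report and are left out of the mean.
    """
    proportions = []
    total_flips = 0
    total_heads = 0
    for seq in product("HT", repeat=n):
        flips = seq[:-1].count("H")  # flips that follow a head
        heads = sum(1 for a, b in zip(seq, seq[1:]) if a == b == "H")
        if flips:
            proportions.append(Fraction(heads, flips))
        total_flips += flips
        total_heads += heads
    mean_of_proportions = sum(proportions) / len(proportions)
    return mean_of_proportions, Fraction(total_heads, total_flips)


for n in (3, 4, 5, 10):
    mean_prop, pooled = heads_after_heads(n)
    print(f"n={n:2d}  mean of proportions={float(mean_prop):.3f}  pooled={float(pooled):.3f}")
```

For n = 3 this gives the 5/12 (roughly 42%) from the post as the mean of proportions, and the pooled figure is exactly 1/2 for every length, which is the difference between the two ways of measuring described above.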