Sunday 16 June 2019

Flipping Variables

I mentioned recently the importance of having a logical reason why a system should work, rather than using 'data mining' to find a back-fitted system which has no predictive value at all.

For example, yesterday's post shared an idea for a system built on the premise that the effect of time zones in the NBA may not be fully understood by the public and, as hopefully everyone understands by now, such situations offer an opportunity, at least in the short term.

Contrast the solid logic and rationale of this idea with my tongue-in-cheek example of:

Back the Home Favourite in an American League game when the pitcher lost last time out, the visiting team won their last game but gave up two runs or more in the third, the temperature was 73F or hotter, and the game is being played on a Thursday afternoon.
One of the forums I frequent has such 'systems' recommended by contributors, but sadly most of them have multiple conditions applied. For most of these, there is simply no logic or rationale as to why these conditions might lead to a market inefficiency. Fortunately, there is the occasional nugget that does make sense, but these tend to be rare.

There was a voice of reason who jumped in on one 'idea' as it spiralled out of control and explained the problem quite neatly. While I have taken the liberty of correcting errors of spelling, punctuation and grammar to improve readability, the gist of the comment is unchanged.

I need to speak out in terms of experience here before someone gets hurt.
The system included here has little likelihood of future success, and here is why. My experience, which is plenty, has shown me that when you back-fit a situation with so many conditions (at least 12 here), you're no longer predictive in terms of the future, but creating the most perfect flow chart of the past.
Most of these systems are built on finding something with a modest ROI, and then experimenting with variables, meaningful or meaningless, adding only those that make the situation appear better.
If you experiment with a lot of variables, then you will "stumble" into those that make a certain situation look better, but in the process, you "contaminate" future predictability.
What you are left with is thinking and believing you have found the holy grail of sports betting, only to be fooled by a false premise of profitability. I guarantee you that over the next 100 games, the results will be closer to a 0% ROI than to the back-fitted ROI.
We are putting faith in a system that is built upon 12-15 conditions, many of which are used, not out of known predictive advantages, but out of anything that improves the win rate. By definition you will find something that is better than the one condition variable.
Think of this. You flipped a coin 200 times, and without adding any filters your results were 100 heads and 100 tails, no predictability of future flips.
You videotaped every flip. Now you go back and analyse what happened. You notice that when you flipped with your left hand, heads came up 55 times out of 100, while when you flipped with your right hand, heads came up just 45 times.
So you add the variable: if flipped with the left hand, you get heads 55% of the time!
A further look shows that if you placed the coin in your left hand from your right hand, you got 30 heads and 20 tails, but when you picked the coin up without any use of your right hand, you got 25 heads and 25 tails. So you now have a way to get 60% heads!
Next, you noticed if you paused for more than 10 seconds after placing the coin in your left hand from your right, you got 17 heads and just 8 tails. You now have a situation that generates 68% heads, just by flipping the coin from your left hand after placing it there from your right hand, and waiting at least 10 seconds before you flipped it.
So my question is:
If you then flipped the coin 300 more times, doing everything the way you did to get 68%, what percentage of heads is expected from the 300 flips?
The answer is 50%!
The variables used above were selected not based on anything that is predictive, but based on anything that made the system look better!
THAT IS THE PROBLEM!
The heads scenario here was built the same way: none of the added conditions is predictive in any way!
The moral of the story is this:
Build a concept on known +EV variables, not a system built to make +EV concepts.
What do I mean by meaningful variables? When you do a search in a given sport with one variable, and it shows an advantage, then you have found a meaningful variable.
Build a pile of meaningful variables for a given sport, then stack the meaningful variables, to get meaningful situations.
Hope that helps everyone here.
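The coin example in the comment is easy enough to try for yourself. What follows is only a rough sketch of the idea, not anything the commenter posted: the six yes/no attributes, the minimum subset size of 20 flips, the 200 repetitions and all of the function names are my own assumptions, chosen purely for illustration. Each simulated flip is a fair coin tagged with meaningless attributes (think "left hand", "paused ten seconds" and so on), and the back-fitting step keeps adding whichever condition makes the heads rate look best.

```python
import random


def make_flips(n, n_attrs, rng):
    """Fair coin flips, each tagged with irrelevant yes/no attributes."""
    return [
        {
            "heads": rng.random() < 0.5,
            "attrs": tuple(rng.random() < 0.5 for _ in range(n_attrs)),
        }
        for _ in range(n)
    ]


def heads_rate(flips):
    """Fraction of flips that came up heads."""
    return sum(f["heads"] for f in flips) / len(flips)


def back_fit(flips, n_attrs, min_size=20):
    """Greedily add whichever unused condition most improves the heads rate.

    Stops when no remaining condition improves the rate, or when the
    filtered subset would fall below min_size flips (an assumed cut-off).
    """
    chosen = {}  # attribute index -> required value
    subset = flips
    while True:
        best = None
        for i in range(n_attrs):
            if i in chosen:
                continue
            for value in (True, False):
                cand = [f for f in subset if f["attrs"][i] == value]
                if len(cand) < min_size:
                    continue
                rate = heads_rate(cand)
                if best is None or rate > best[0]:
                    best = (rate, i, value, cand)
        if best is None or best[0] <= heads_rate(subset):
            return chosen, subset
        chosen[best[1]] = best[2]
        subset = best[3]


def experiment(trials=200, seed=1):
    """Average back-fitted heads rate versus the rate on 300 fresh flips."""
    rng = random.Random(seed)
    in_sample, out_sample = [], []
    for _ in range(trials):
        flips = make_flips(200, 6, rng)
        _, subset = back_fit(flips, 6)
        in_sample.append(heads_rate(subset))
        # The next 300 flips are all made "the winning way", but the
        # conditions never influenced the coin, so it is still a fair coin.
        fresh = [rng.random() < 0.5 for _ in range(300)]
        out_sample.append(sum(fresh) / len(fresh))
    return sum(in_sample) / trials, sum(out_sample) / trials


if __name__ == "__main__":
    avg_in, avg_out = experiment()
    print(f"back-fitted heads rate: {avg_in:.1%}")
    print(f"next 300 flips:         {avg_out:.1%}")
```

The back-fitted figure can only go up as conditions are added, because the search keeps a condition only when it improves the rate, while the fresh flips have no idea which hand you used.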
I should add that it is quite possible to identify a variable that appears meaningless but for which it later turns out there was a valid reason all along. A Tweet from A Lucky A Day referenced the 15th a few weeks ago.
Some readers may be familiar with the apparently illogical pattern in the 15th round of Sumo Wrestling competitions mentioned in Freakonomics where, in certain contests, the supposedly 'weaker' wrestler was winning 80% of the contests.
In a sumo tournament, all wrestlers in the top division compete in 15 matches and face demotion if they do not win at least eight of them.
The sumo community is very close-knit, and the wrestlers at the top levels tend to know each other well. The authors looked at the final match, and considered the case of a wrestler with seven wins, seven losses, and one fight to go, fighting against an 8–6 wrestler.
Statistically, the 7–7 wrestler should have a slightly below even chance, since the 8–6 wrestler is slightly better. However, the 7–7 wrestler actually wins around 80% of the time. Levitt uses this statistic and other data gleaned from sumo wrestling matches, along with the effect that allegations of corruption have on match results, to conclude that those who already have 8 wins collude with those who are 7–7 and let them win, since they have already secured their position for the following tournament.
There was an underlying reason for this apparently illogical finding, even if the reason wasn't understood until later, so it's important not to dismiss every impacting variable as meaningless. 
