Sunday, 19 June 2011

Moneyball Comes To Football


From the Financial Times this weekend comes an interesting piece by Simon Kuper on the growing use of statistics in football, “A Football Revolution”:

I recently visited Manchester City’s tranquil training ground in the village of Carrington. It was a glorious sunny morning, and outside the gates hired hands were washing footballers’ SUVs and sports cars. The defender Kolo Touré coasted past in a giant black contraption straight out of The Godfather. Carrington is used to cars like that: Manchester United train in the village too.

“Abu Dhabi Travellers Welcome”, said the message on the façade of City’s sky-blue training centre. Abu Dhabi’s ruling family owns Manchester City, and one thing it has done since buying the club is hire a large team of data analysts. Inside the building I found Gavin Fleig, City’s head of performance analysis, a polite sandy-haired man in a neat black City sweater. Hardly anyone outside Carrington has heard of him, and yet Fleig is a prime mover in English football’s data revolution. Largely unseen by public and media, data on players have begun driving clubs’ decisions – particularly decisions about which players to buy and sell. At many clubs, obscure statisticians in back-rooms will help shape this summer’s transfer market.

Fleig gave me the sort of professional presentation you’d expect from a “quant” in an investment bank. Lately, to his excitement, City had acquired stats on every player in the Premier League. Imagine, said Fleig, that you were thinking of signing an attacking midfielder. You wanted someone with a pass completion rate of 80 per cent, who had played a good number of games. Fleig typed the two criteria into his laptop. Portraits of the handful of men in the Premier League who met them flashed up on a screen. A couple were obvious: Arsenal’s Cesc Fàbregas and Liverpool’s Steven Gerrard. You didn’t need data to know they were good. But beside them was a more surprising face: Newcastle’s Kevin Nolan. The numbers wouldn’t immediately spur you to sign him. But they might prompt you to take a closer look.
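The kind of two-criteria search Fleig describes can be sketched in a few lines of Python. This is only an illustration, not City's actual system: the `Player` record, the `shortlist` function, the 20-game cut-off standing in for "a good number of games", and every figure in `squad_data` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    position: str
    pass_completion: float  # share of passes completed, 0.0 to 1.0
    games_played: int


def shortlist(players, position, min_pass_completion, min_games):
    """Return the players in a position who meet both thresholds."""
    return [
        p for p in players
        if p.position == position
        and p.pass_completion >= min_pass_completion
        and p.games_played >= min_games
    ]


# Invented placeholder records, not real Premier League figures.
squad_data = [
    Player("Midfielder A", "attacking midfield", 0.84, 30),
    Player("Midfielder B", "attacking midfield", 0.81, 6),
    Player("Midfielder C", "attacking midfield", 0.72, 33),
    Player("Defender D", "centre back", 0.88, 35),
]

candidates = shortlist(squad_data, "attacking midfield", 0.80, 20)
print([p.name for p in candidates])
```

As in the article, a filter like this only produces names worth a closer look; it says nothing about whether to sign them.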

In recent years, after many false starts, the number-crunchers at big English clubs have begun to unearth the player stats that truly matter. For instance, said Fleig, “The top four teams consistently have a higher percentage of pass completion in the final third of the pitch. Since the recruitment of Carlos Tévez, David Silva, Adam Johnson and Yaya Touré to our football team, in the last six months alone, our ability to keep the ball in the final third has grown by 7.7 per cent.”

That stat had not necessarily driven their recruitment, Fleig cautioned. Indeed, there are probably clubs that lean far more on stats than Manchester City do. I recently toured several actors in football’s data revolution, and was struck by how far it had progressed. “We’ve somewhere around 32 million data points over 12,000, 13,000 games now,” Mike Forde, Chelsea’s performance director, told me one morning in February in the empty stands of Stamford Bridge. Football is becoming clever.

Probably ever since the personal computer arrived, a few pioneers in football have tried to use data to judge players. Among the first was Arsenal’s future manager, Arsène Wenger, an economics graduate and keen mathematician. In the late 1980s, as manager of Monaco, Wenger used a computer program called Top Score, developed by a friend. A less likely pioneer was the late, great vodka-sodden Ukrainian manager Valeri Lobanovski. When I visited Kiev in 1992, Lobanovski’s pet scientist, Professor Anatoly Zelentsov, had me play the computer games that Dynamo Kiev had developed to test players. When Lobanovski said things like, “A team that commits errors in no more than 15 to 18 per cent of its actions is unbeatable,” he wasn’t guessing. Zelentsov’s team had run the numbers.

But the broader breakthrough came in 1996, after the Opta Index company began collecting “match data” from the English Premier League, explains the German author Christoph Biermann in Die Fussball-Matrix, the pioneering book on football and data. For the first time, clubs knew how many kilometres each player ran per match, and how many tackles and passes he made. Other data companies entered the market. Some football managers began to look at the stats. In August 2001 Manchester United’s manager Alex Ferguson suddenly sold his defender Jaap Stam to Lazio. The move surprised everyone. Some thought Ferguson was punishing the Dutchman for a silly autobiography he had just published. In truth, although Ferguson didn’t say this publicly, the sale was prompted partly by match data. Studying the numbers, Ferguson had spotted that Stam was tackling less often than before. He presumed the defender, then 29, was declining. So he sold him.

As Ferguson later admitted, this was a mistake. Like many football men in the early days of match data, the manager had studied the wrong numbers. Stam wasn’t in decline at all: he would go on to have several excellent years in Italy. Still, the sale was a milestone in football history: a transfer driven largely by stats.

At Arsenal, Wenger embraced the new match data. He has said that the morning after a game he’s like a junkie who needs his fix: he reaches for the spreadsheets. In about 2002 he began substituting his forward Dennis Bergkamp late in matches. Bergkamp would go to Wenger to complain. “Then he’d produce the stats,” Bergkamp later recalled. “‘Look Dennis, after 70 minutes you began running less. And your speed declined.’ Wenger is a football professor.”

Few would suspect it of West Ham’s new manager “Big Sam” Allardyce, and yet his somewhat neolithic appearance also conceals a professorial mind. As a player, Allardyce spent a year with Tampa Bay, Florida, where he grew fascinated with the way American sports used science and data. In 1999 he became manager of little Bolton. Unable to afford the best players, he hired good statisticians instead. They unearthed one particular stat that enchanted Allardyce. “The average game, the ball changes hands 400 times,” recites Chelsea’s Forde, who got his start in football under Allardyce. “Big Sam” would drum it into his players. To him, it summed up the importance of switching instantly to defensive positions the moment the ball was lost.

More concretely, stats led Allardyce to a source of cheap goals: corners, throw-ins and free kicks. Fleig, another Allardyce alumnus, recalled that Bolton would score 45 to 50 per cent of their goals from such “set-pieces”, compared with a league average of about a third. Fleig said, “We would be looking at, ‘If a defender cleared the ball from a long throw, where would the ball land? Well, this is the area it most commonly lands. Right, well that’s where we’ll put our man.’”

In 2003, football’s data revolution got a new impetus from across the Atlantic. Michael Lewis published his seminal baseball book Moneyball, and some people in English football read it and sat up. Moneyball recounts how the Oakland A’s general manager Billy Beane used new stats to value baseball players. Aided by data, the little A’s briefly punched far above their weight until bigger clubs began hiring statisticians too. The Boston Red Sox, owned by John Henry, himself a “numbers guy” who had made his fortune trading commodities, won two World Series using “Moneyball” methods.

This February I visited Beane at the Oakland Coliseum. We spoke in what looked like the junk room, but is in fact the dingy clubhouse where the A’s players change. Beane – soon to be portrayed by Brad Pitt in the movie Moneyball – was keen to talk about the data revolution in soccer. Like many Americans this last decade, Beane has embraced the European game with the almost unhealthy fervour of the convert. He can often be found sprawled on a dilapidated sofa in the clubhouse watching European soccer matches.

He believes that just as baseball has turned into “more of a science”, soccer will too. Beane said, “If somebody’s right 30 per cent of the time using gut feel, and you can find a way to be right 35 per cent, you create a 5 per cent arbitrage, and in sports that can make the difference between winning and losing.” If using numbers gives you an edge, then everyone will end up having to do it, Beane thinks.

Mike Forde, who had studied in Beane’s hometown of San Diego and followed American sports, made the pilgrimage to Oakland to quiz Beane about the uses of data. That proved tricky: Beane spent the first few hours of the conversation quizzing Forde about soccer. “In the last half an hour I managed to turn it around to talk about his role in baseball,” laughs Forde. He became friends with Beane, as did the Frenchman Damien Comolli, a former assistant of Wenger’s. In 2005, Comolli became director of football at Tottenham and began using data there.

Comolli’s three years at Spurs encapsulated many of the early struggles of the data revolution. British football had always been suspicious of educated people. The typical football manager was an ex-player who had left school at 16 and ruled his club like an autocrat. He relied on “gut”, not numbers. He wasn’t about to obey a spreadsheet-wielding Frenchman who had never played professionally himself. Comolli was always having to fight “nerds versus jocks” battles. With hindsight, he unearthed some excellent players for Spurs: Luka Modrić, Dimitar Berbatov, Heurelho Gomes and the 17-year-old Gareth Bale. Yet eventually Comolli was forced out.

There was one question the nerds kept having to answer. Yes, the traditionalists would say, stats may well be useful in a stop-start game like baseball. The pitcher pitches, the batter hits, and that event provides oodles of clear data for nerds to crunch. But surely football is too fluid a game to measure?

Forde responds: “Well, I think it’s a really genuine question. It’s one that we ask ourselves all the time.” However, the nerds can answer it. For a start, good mathematicians can handle complex systems. At Chelsea, for instance, one of Forde’s statisticians has a past in insurance modelling. Football – a game of 22 men played on a limited field with set rules – is not of unparalleled complexity.

Second, in recent years the fluid game of basketball has found excellent uses for data. Beane says: “If it can be done there, it can be done on the soccer field.” And third, a third of all goals in football don’t come from fluid situations at all. They come from corners, free kicks, penalties and throw-ins – stop-start set-pieces that you can analyse much like a pitch in baseball.

The new nerds could point to so many obvious irrationalities in football, especially in the transfer market, so many areas where smart clubs could clean up. For instance: goalkeepers have longer careers than forwards, yet earn less and command much lower transfer fees. Clubs often sign large players but actually tend to use the smaller ones, having belatedly realised that they have overvalued size. And few clubs have asked themselves even basic questions such as: do they earn more points when certain players are on the field?
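That last question, whether a team earns more points with a given player on the field, is simple to pose in code. The sketch below is a hypothetical illustration: the `points_per_game` function and the `results` list are invented, and a raw comparison like this ignores the strength of the opposition and who else was playing.

```python
def points_per_game(matches, player):
    """Average points earned with and without `player` in the lineup.

    `matches` is a list of (points, lineup) pairs, where points is 3, 1 or 0
    and lineup is the set of players who appeared.
    Returns (with_player, without_player); a side is None if it has no games.
    """
    with_pts = [pts for pts, lineup in matches if player in lineup]
    without_pts = [pts for pts, lineup in matches if player not in lineup]

    def average(values):
        return sum(values) / len(values) if values else None

    return average(with_pts), average(without_pts)


# Invented results for illustration only.
results = [
    (3, {"A", "B", "C"}),
    (1, {"A", "C"}),
    (0, {"B", "C"}),
    (3, {"A", "B"}),
]
print(points_per_game(results, "A"))
```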

Given that you can hire perhaps 30 statisticians for the £1.5m that the average footballer in the Premier League earns each year, you’d think it might be worth paying some nerds to study these questions. Nonetheless, to some degree football’s suspicion of numbers persists. “Letting even a top-level statistician loose with a more traditional football manager is not really the right combination,” Forde once told me. He himself looks like a football man: trim, greying, regional accent, nice suit. That helps him sell numbers to old-style football men. But, in many clubs, the nerds are only slowly gaining power. Probably every club in the Premier League now employs analysts, but some of these people get locked in computer-filled back-rooms and never meet the manager.

That’s why the data revolution was led by clubs where the manager himself trusted numbers. Arsenal and Allardyce’s Bolton began to value players in much the way that financial investors value cattle futures. Take Bolton’s purchase of the 34-year-old central midfielder Gary Speed in 2004. On paper, Speed looked too old. But Bolton, said Fleig, “was able to look at his physical data, to compare it against young players in his position at the time who were at the top of the game, the Steven Gerrards, the Frank Lampards. For a 34-year-old to be consistently having the same levels of physical output as those players, and showing no decline over the previous two seasons, was a contributing factor to say: ‘You know what, this isn’t going to be a huge concern.’” Speed played for Bolton until he was 38.

Football’s shrewdest number-crunchers have always understood that data can only support a decision about a player. They cannot determine it. Biermann tells the story of how Wenger in 2004 was looking for an heir to Arsenal’s all-action midfielder Patrick Vieira. Wenger wanted a player who could cover lots of ground. He scanned the data from different European leagues and spotted an unknown teenager at Olympique Marseille named Mathieu Flamini, who was running 14km a game. Alone, that stat wasn’t enough. Did Flamini run in the right direction? Could he play football? Wenger went to look, established that he could, and signed him for peanuts. Flamini prospered at Arsenal before joining Milan to earn even more.

Conversely, the clubs that stuck with “gut” rather than numbers began to suffer. In 2003, Real Madrid sold Claude Makélélé to Chelsea for £17m. It seemed a big fee for an unobtrusive 30-year-old defensive midfielder. “We will not miss Makélélé,” said Madrid’s president Florentino Pérez. “His technique is average, he lacks the speed and skill to take the ball past opponents, and 90 per cent of his distribution either goes backwards or sideways. He wasn’t a header of the ball and he rarely passed the ball more than three metres. Younger players will cause Makélélé to be forgotten.”

Pérez’s critique wasn’t totally wrong, and yet Madrid had made a terrible error. Makélélé would have five excellent years at Chelsea. There’s now even a position in football named after him: the “Makélélé role”. If only Real had studied the numbers, they might have spotted what made him unique. Forde explained: “Most players are very active when they’re aimed towards the opposition’s goal, in terms of high-intensity activity. Few players are strong going the other way. If you look at Claude, 84 per cent of the time he did high-intensity work, it was when the opposition had the ball, which was twice as much as anyone else on the team.”

If you watched the game, you could miss Makélélé. If you looked at the data, there he was. Similarly, if you looked at Manchester City’s Yaya Touré, with his languid running style, you might think he was slow. If you looked at the numbers, you’d see that he wasn’t. Beane says, “What stats allow you to do is not take things at face value. The idea that I trust my eyes more than the stats, I don’t buy that because I’ve seen magicians pull rabbits out of hats and I just know that rabbit’s not in there.”

Yet by the mid-2000s, the numbers men in football were becoming uneasily aware that many of the stats they had been trusting for years were useless. In any industry, people use the data they have. The data companies had initially calculated passes, tackles and kilometres per player, and so the clubs had used these numbers to judge players. However, it was becoming clear that these raw stats – which now get beamed up on TV during big games – mean little. Forde remembers the early hunt for meaning in the data on kilometres. “Can we find a correlation between total distance covered and winning? And the answer was invariably no.”

Tackles seemed a poor indicator too. There was the awkward issue of the great Italian defender Paolo Maldini. “He made one tackle every two games,” Forde noted ruefully. Maldini positioned himself so well that he didn’t need to tackle. That rather argued against judging defenders on their number of tackles, the way Ferguson had when he sold Stam. Forde said, “I sat in many meetings at Bolton, and I look back now and think ‘Wow, we hammered the team over something that now we think is not relevant.’” Looking back at the early years of data, Fleig concludes: “We should be looking at something far more important.”

That is starting to happen now. Football’s “quants” are isolating the numbers that matter. “A lot of that is proprietary,” Forde told me. “The club has been very supportive of this particular space, so we want to keep some of it back.” But the quants will discuss certain findings that are becoming common knowledge in soccer. For instance, rather than looking at kilometres covered, clubs now prefer to look at distances run at top speed. “There is a correlation between the number of sprints and winning,” Daniele Tognaccini, AC Milan’s chief athletics coach, told me in 2008.

That’s why Fleig cares about “a player’s high-intensity output”. Different data companies measured this quality differently, he said, “but ultimately it’s a player’s ability to reach a speed threshold of seven metres per second.” If you valued this quality, you would probably have never made the mistake Juventus did in 1999 of selling Thierry Henry to Arsenal. “For Henry to reach seven metres per second, it’s a relative coast,” said Fleig admiringly. The Frenchman got there almost whenever he ran.
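One way to turn that threshold into a number is to count the separate bursts in which a player's tracked speed reaches seven metres per second. The sketch below is an assumption about how such a count might work, not any data company's method: the `count_sprints` function, the regularly sampled speeds and the sample values are all invented.

```python
def count_sprints(speeds_mps, threshold=7.0):
    """Count separate bursts in which speed reaches the threshold.

    `speeds_mps` is a sequence of speed samples in metres per second,
    taken at regular intervals. Consecutive samples at or above the
    threshold count as one sprint.
    """
    sprints = 0
    sprinting = False
    for speed in speeds_mps:
        if speed >= threshold and not sprinting:
            sprints += 1  # a new burst starts here
        sprinting = speed >= threshold
    return sprints


# Invented tracking samples, not real match data.
samples = [3.1, 5.4, 7.2, 7.8, 6.0, 2.5, 7.1, 4.0, 7.0, 7.3]
print(count_sprints(samples))
```

Counting bursts rather than samples matters here: a player who holds top speed for five seconds has made one sprint, not five.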

Equally crucial is the ability to make repeated sprints. Tévez, Manchester City’s little forward, is a bit like a wind-up doll: he’ll sprint, briefly collapse, then very soon afterwards be sprinting again. Fleig said, “If we want to press from the front, then we can look at Carlos’s physical output and know that he’s capable of doing that for 90 minutes-plus.”

Just as clubs have learned to isolate sprints from other running, they have learned to isolate telling passes from meaningless square balls. On the screen in Carrington, Fleig flashed up a list of City’s players, ranked by how many chances each had created. One name stood out: David Silva had passed for a third more goal-scoring opportunities than any of his teammates.

The new wash of data has made it easy to compare players to players, and clubs to clubs. Wigan, for instance, were recently conceding a greater proportion of their goals from crosses than any other team in the Premier League. If you’re playing Wigan, that’s handy to know. Increasingly, clubs are acting on the data. A quant has controlled Arsenal for 15 years now, but last autumn the numbers guys took over another English giant. The Boston Red Sox’s owner John Henry, who in 2002 had tried to hire Billy Beane, bought Liverpool and immediately hired Beane’s mate Comolli to do a “Moneyball of soccer”.

From his perch at Anfield, Comolli often chats to the father of Moneyball 5,000 miles away. Beane says: “You can call him anytime. I’ll e-mail him and it will be two in the morning there and he’ll be up, and he’ll e-mail me and say, ‘Hey, I’m watching the A’s game’, because he watches on the computer. The guy never sleeps.” At Liverpool, Comolli has genuine power. He has said that data informed the club’s recent purchases of Andy Carroll and Luis Suárez for a combined £60m.

And football’s data revolution has only just got going. Fleig thinks there is an exciting future in sociograms: who passes to whom, who tends to start a team’s dangerous attacks? If you play Barcelona, that man is obviously Xavi. But in another team, the data may show that the launcher of attacks is someone unexpected. If you know the zones where he puts his key passes, you can try blocking them.
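A sociogram of this kind boils down to counting passer-and-receiver pairs. The sketch below is a hypothetical illustration rather than anything City uses: the function names and the pass sequences are invented, and treating the first passer in a sequence as the player who launched the attack is a simplifying assumption.

```python
from collections import Counter


def pass_network(passes):
    """Count how often each (passer, receiver) pair occurs."""
    return Counter(passes)


def top_attack_starters(attacks, n=1):
    """Rank players by how many dangerous attacks they started.

    `attacks` is a list of pass sequences; the first passer in each
    sequence is treated as the player who launched the attack.
    """
    starters = Counter(seq[0][0] for seq in attacks if seq)
    return starters.most_common(n)


# Invented pass sequences: each pair is (passer, receiver).
attacks = [
    [("Left back", "Playmaker"), ("Playmaker", "Striker")],
    [("Playmaker", "Winger"), ("Winger", "Striker")],
    [("Left back", "Winger"), ("Winger", "Striker")],
    [("Left back", "Playmaker"), ("Playmaker", "Striker")],
]

all_passes = [p for seq in attacks for p in seq]
print(pass_network(all_passes).most_common(2))
print(top_attack_starters(attacks))
```

In this made-up data the attacks start most often with the left back rather than the playmaker, which is the sort of unexpected answer the passage has in mind.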

Someone who has thought harder than most about the future of soccer stats is the director of baseball operations at the Oakland A’s. Farhan Zaidi is a round MIT economics graduate with a sense of humour. He’s the sort of guy you’d expect to meet late one night in a bar in a college town, after a gig, not at a professional sports club. For work, Zaidi crunches baseball stats. But he and Beane spend much of their time at the Coliseum arguing about their other loves: the British band Oasis, and soccer. In 2006, in the middle of the baseball season, they travelled to the soccer World Cup in Germany together. Zaidi chuckled: “We spend so much time together, that if all we ever talked about was the numbers on these spreadsheets, we would have killed each other a long time ago.”

Because Zaidi knows where the data revolution in baseball has gone, he can make predictions for soccer. The sport’s holy grail, he thinks, is a stat he calls “Goal Probability Added”. That stat would capture how much each player’s actions over his career increased the chance of his team scoring (for instance, whenever he successfully passed the ball five yards forward from the halfway line), or decreased it (for example, whenever his pass was unsuccessful). I asked Zaidi whether one day pundits might say things like, “Luis Suárez has a Goal Probability Added of 0.60, but Carroll’s GPA is only 0.56.”

Zaidi replied, “I tend to think that will happen, because that’s what happened in baseball. We talk now about players in ways that we wouldn’t have dreamed of 10 or 15 years ago.”
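The article does not say how a “Goal Probability Added” figure would be calculated or scaled, so the following is only one plausible reading: add up, for each player, the change in the team's estimated chance of scoring before and after each of his actions. The `goal_probability_added` function, the idea of a model supplying the before and after probabilities, and all the numbers are assumptions made for the sketch.

```python
from collections import defaultdict


def goal_probability_added(actions):
    """Sum, per player, the change in scoring probability from each action.

    `actions` is a list of (player, prob_before, prob_after) tuples, where
    the probabilities are some model's estimate of the team's chance of
    scoring before and after the action.
    """
    totals = defaultdict(float)
    for player, before, after in actions:
        totals[player] += after - before
    return dict(totals)


# Invented probabilities for illustration only.
actions = [
    ("Player A", 0.02, 0.05),  # successful forward pass
    ("Player A", 0.05, 0.01),  # pass intercepted
    ("Player B", 0.03, 0.10),  # through ball into the box
]
gpa = goal_probability_added(actions)
print({name: round(value, 2) for name, value in gpa.items()})
```

The hard part, which this sketch takes as given, is the model that estimates the chance of scoring from any position on the pitch.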

In their ancient battle against the jocks, the nerds are finally taking revenge.

Simon Kuper is author of ‘The Football Men: Up Close with the Giants of the Modern Game’ (Simon & Schuster, £16.99)
