Predicting the World Cup, through the power of math

There are many ways of coming up with predictions for the World Cup, so let's go down the "rigorous mathematical model" route. There might be a surprise or two along the way.

SB Nation's 2014 World Cup Preview

Is it possible to predict the outcome of soccer matches? Certainly, it's impossible to do so perfectly. On the other hand, once you get past the fundamental uncertainty of human events, there is evidence that predictive models can provide value. The key insight that current cutting-edge soccer stats provide is this: there is a huge amount of variability in finishing. The rate at which a team converts its chances fluctuates significantly, even over 20 or more matches.
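A quick simulation shows how much conversion rates can swing through chance alone. It assumes, purely for illustration, a team taking 12 shots per match over 20 matches, with every shot having the same 10 percent chance of scoring; these figures are not drawn from the article's data.

```python
import random
import statistics


def simulate_conversion_rates(n_runs=1000, matches=20, shots_per_match=12,
                              p_goal=0.10, seed=2014):
    """Return one conversion rate (goals / shots) per simulated 20-match run.

    Every shot scores with the same probability p_goal, so any spread in the
    results comes from random variation alone, not from finishing skill.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    shots = matches * shots_per_match
    rates = []
    for _ in range(n_runs):
        goals = sum(1 for _ in range(shots) if rng.random() < p_goal)
        rates.append(goals / shots)
    return rates


rates = simulate_conversion_rates()
print(f"mean {statistics.mean(rates):.3f}, "
      f"stdev {statistics.stdev(rates):.3f}, "
      f"range {min(rates):.3f} to {max(rates):.3f}")
```

Under these assumptions the standard deviation should land near √(0.1 × 0.9 / 240) ≈ 0.019, or roughly 4 to 5 goals over 240 shots, even though every simulated team has identical chances and identical finishing.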

Predictive models focus instead on the quality of the chances that teams create or concede, and pay far less attention to the top-line goals scored and allowed numbers that are more heavily influenced by game-to-game variability in finishing. In the past year in the English Premier League, the chance-quality model predicted Liverpool's title run because the Reds had been the league's best team in 2012-2013 at creating chances. This year, they finished those chances; their points total duly spiked upwards.

This key insight, that chances are more predictive than what a team does with them, forms the basis of our World Cup model, here applied to international soccer. The model is built on shot locations and shot types from more than 1,000 international matches since 2009, plus scores from another 2,000 matches. It accounts for strength of schedule, the recency of each match, and whether it was a competitive fixture or a friendly. For more on the model, see the Appendix below.
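The recency and friendly adjustments can be pictured as weights on each match before expected goals are totalled. The sketch below is a hypothetical illustration, not the article's specification: it assumes an exponential decay with a made-up one-year half-life and a made-up 50 percent discount for friendlies, and it leaves strength-of-schedule adjustment out entirely.

```python
from dataclasses import dataclass


@dataclass
class Match:
    xg_for: float      # expected goals created
    xg_against: float  # expected goals conceded
    days_ago: int      # age of the match in days
    competitive: bool  # False for friendlies


def weighted_xg_ratio(matches, half_life_days=365.0, friendly_weight=0.5):
    """Recency- and competition-weighted xG / (xG + xGA).

    half_life_days and friendly_weight are hypothetical parameters chosen
    for illustration only.
    """
    total_for = total_against = 0.0
    for m in matches:
        w = 0.5 ** (m.days_ago / half_life_days)  # older matches count less
        if not m.competitive:
            w *= friendly_weight                  # friendlies count less
        total_for += w * m.xg_for
        total_against += w * m.xg_against
    return total_for / (total_for + total_against)


# A recent competitive win outweighs a two-year-old friendly loss.
history = [Match(2.0, 0.5, 0, True), Match(0.5, 2.0, 730, False)]
print(round(weighted_xg_ratio(history), 3))
```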

Brazil and Spain: Co-Favorites

My model predicts a greater than 50 percent chance that either Brazil or Spain wins the World Cup. No one else, including other favorites Argentina and Germany, has a chance over one in ten of being crowned champions.
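Both claims can be checked against the "Champ" column of the group tables below. The figures here are transcribed from those tables; because they are rounded, the column need not sum to exactly 100 percent.

```python
# Model championship odds (%) transcribed from the group tables.
CHAMP_ODDS = {
    "Brazil": 28, "Mexico": 0.7, "Croatia": 0.4, "Cameroon": 0.4,
    "Spain": 24, "Chile": 2, "Netherlands": 1, "Australia": 0,
    "Cote d'Ivoire": 3, "Colombia": 2, "Japan": 0.3, "Greece": 0.3,
    "Uruguay": 4, "England": 2, "Italy": 1, "Costa Rica": 0,
    "France": 5, "Ecuador": 3, "Switzerland": 0.1, "Honduras": 0,
    "Argentina": 4, "Bosnia-Herzegovina": 2, "Nigeria": 1, "Iran": 0,
    "Portugal": 7, "Germany": 5, "Ghana": 0.6, "United States": 0.2,
    "Belgium": 2, "Algeria": 1, "Russia": 1, "South Korea": 0,
}

top_two = CHAMP_ODDS["Brazil"] + CHAMP_ODDS["Spain"]
best_of_rest = max(v for k, v in CHAMP_ODDS.items()
                   if k not in ("Brazil", "Spain"))
total = sum(CHAMP_ODDS.values())

print(f"Brazil + Spain: {top_two}%")           # 52%
print(f"Best of the rest: {best_of_rest}%")    # 7% (Portugal)
print(f"All 32 teams: {total:.1f}%")           # about 101%, due to rounding
```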

[Chart: championship odds for the model's top six teams]

The first unexpected result here is probably Spain's very high championship odds: they come very close to Brazil despite the Selecao's home-field advantage. Spain had fantastic expected goals numbers throughout the qualification campaign and during their 2012 European Championship win. And yes, they lost 3-0 to Brazil in the Confederations Cup final, but the margin in chances was far tighter; had both sides' finishing been up to their "normal" standards, Spain would only have lost by a goal.

Dark Horse Picks: Portugal and France

Behind Brazil and Spain, the second tier holds the two other pre-tournament favorites, Germany and Argentina, plus Portugal and France. That last pair may be something of a surprise, considering neither topped its UEFA qualifying group. Nevertheless, they rated as the next two best teams in Europe by expected goals. The Portuguese side in particular dominated qualifying by expected goals, but dropped points to Israel and Northern Ireland in matches where they had by far the better of the shot chart.

The following image maps all the shots in Portugal's three qualifying draws against lesser sides, with the size of the marker relative to the expected goals value of the shot. If Portugal's finishing had been better -- and remember, the evidence we have suggests that much of game-to-game finishing is the result of random variation -- they'd probably have come away with seven points, if not all nine.
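The reasoning behind a shot map like this can be sketched by treating each shot as an independent chance that scores with probability equal to its expected goals value, then converting both sides' goal distributions into expected points. This per-shot independence is a common simplification and an assumption here, not necessarily how the article's model works, and the shot values in the usage lines are invented rather than read off the chart.

```python
def goal_distribution(shot_xgs):
    """P(exactly k goals) for k = 0..len(shot_xgs), with each shot an
    independent chance that scores with probability equal to its xG."""
    dist = [1.0]  # before any shots: P(0 goals) = 1
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot missed
            new[k + 1] += prob * p    # this shot scored
        dist = new
    return dist


def expected_points(team_shots, opp_shots):
    """Expected points (3 for a win, 1 for a draw) from two lists of shot xG."""
    ours = goal_distribution(team_shots)
    theirs = goal_distribution(opp_shots)
    win = draw = 0.0
    for i, p_i in enumerate(ours):
        for j, p_j in enumerate(theirs):
            if i > j:
                win += p_i * p_j
            elif i == j:
                draw += p_i * p_j
    return 3 * win + draw


# Hypothetical shot values, not taken from the chart above.
dominant_side = [0.12] * 15
opponent = [0.08] * 5
print(round(expected_points(dominant_side, opponent), 2))
```

Summed over three matches, a figure like this is what separates the three points Portugal actually took from the seven-to-nine points their chances suggested.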

[Shot map: Portugal's three qualifying draws against lesser sides; marker size reflects each shot's expected goals value]

These sorts of performances tend to even out. In their last tournament, Portugal made the semifinals of Euro 2012 before losing to eventual champions Spain on penalties. Portugal may not be the third-best team in the world, but their chance-quality numbers seem solid evidence that they are in the top six. Which, of course, is bad news for the U.S.

France require somewhat less explanation. This is a team with perhaps the best midfield in international soccer and so much depth in wide forwards that they can replace the injured Franck Ribery with either Mathieu Valbuena or Antoine Griezmann. France's qualifying campaign required a trip to the playoffs, but only because they were drawn in the same group as Spain. The French played the defending champions well in the group stage, losing a close match at home and taking a point in Spain from a match they could easily have won. This is a side stocked with elite talent, drawn into probably the weakest group in this World Cup. The numbers just reflect that.

So let's go to the odds. The following tables for each group list average wins, draws and losses (W, D, L) as well as average points (Pts). "Qualify" is the odds of reaching the knockout stage, while "Champ" lists the odds of winning the World Cup. As a check on the model's numbers, the last two columns ("Bk Qual" and "Bk Champ") give the broad consensus of the betting houses for the same two outcomes. This is not intended as betting advice, but rather as a marker of divergences from the global consensus that might require further explanation.
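Two internal consistency checks follow directly from how these columns are defined: average points should equal three per average win plus one per average draw, and since exactly two teams advance from each group, the "Qualify" column should sum to 200 percent. The sketch below runs both checks on Group A, transcribed from the table that follows; small gaps in points (such as Mexico's listed 3.6 against 3W + D = 3.5) are consistent with the W/D/L averages being rounded to one decimal.

```python
# Group A rows transcribed from the table: (team, W, D, L, Pts, Qualify %)
GROUP_A = [
    ("Brazil",   2.1, 0.6, 0.3, 6.9, 93),
    ("Mexico",   0.9, 0.8, 1.3, 3.6, 43),
    ("Croatia",  0.8, 0.8, 1.5, 3.0, 33),
    ("Cameroon", 0.7, 0.9, 1.4, 2.9, 31),
]

for team, w, d, l, pts, _ in GROUP_A:
    # Average points follow from average results: 3 per win, 1 per draw.
    implied = 3 * w + d
    print(f"{team:<10} listed {pts:.1f}   3W + D = {implied:.1f}")

# Exactly two teams advance from each group.
print(sum(row[5] for row in GROUP_A))  # 200
```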

Groups and Odds

Group A             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Brazil              2.1  0.6  0.3  6.9  93%      28%    92%      22%
Mexico              0.9  0.8  1.3  3.6  43%      0.7%   43%      0.5%
Croatia             0.8  0.8  1.5  3.0  33%      0.4%   46%      0.6%
Cameroon            0.7  0.9  1.4  2.9  31%      0.4%   19%      0.1%

The Brazilians are the favorites. They not only have either the best or second-best roster in the world; they also have home-field advantage. In competitive international matches since 2009, the home side has scored about 40 percent more goals than the away side. By contrast, home-field advantage in the English Premier League is generally in the range of 20 percent. In international soccer, playing at home is a huge deal.
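To see what a 40 percent home edge does to results, here is a sketch using independent Poisson goal counts, a common textbook simplification rather than a confirmed description of the article's model. It assumes two evenly matched sides averaging 1.3 goals each and splits the home edge evenly between a boost for the home side and a penalty for the away side; all of those choices are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probs(home_xg, away_xg, max_goals=10):
    """Home win/draw/loss probabilities with independent Poisson goals,
    truncated at max_goals per side (the neglected tail is tiny here)."""
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


def with_home_edge(base_xg, edge):
    """Scale expected goals so home / away = 1 + edge, split evenly."""
    factor = math.sqrt(1 + edge)
    return base_xg * factor, base_xg / factor


for edge in (0.20, 0.40):  # roughly EPL-like vs international home edge
    home, away = with_home_edge(1.3, edge)
    win, draw, loss = outcome_probs(home, away)
    print(f"edge {edge:.0%}: home win {win:.1%}, "
          f"draw {draw:.1%}, away win {loss:.1%}")
```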

Mexico might seem a little high here, although the model has them at the same level as the bookies. While El Tri were legitimately bad in their last several matches of CONCACAF qualifying, the chance-quality model takes the long view. The Mexican resume still includes quality performances in friendlies against Cote d'Ivoire, Nigeria and Ecuador, plus a capable showing at the Confederations Cup that included a good win over Japan. Like Croatia, Mexico are a clear step down from the likes of Italy or Chile, sides that could make a run to the semifinals if they catch a break or two. But Mexico do not project to be nearly as bad as they appeared in losses to Honduras and Panama in the Hex.

Group B             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Spain               2.1  0.6  0.4  6.8  91%      24%    82%      12%
Chile               1.2  0.8  1.0  4.4  59%      2%     52%      2%
Netherlands         0.9  0.8  1.4  3.3  35%      1%     58%      3%
Australia           0.5  0.7  1.8  2.1  15%      0%     9%       0.1%

If there's one consistent theme in these rankings, it's that the depth in South American soccer equals Europe's. Our numbers rate not just Chile, but Colombia, Uruguay and Ecuador as good bets to get to the knockout rounds. While this model's numbers diverge significantly from Nate Silver's 538 projections, mostly because he uses goals scored and not underlying stats, one place of agreement is the evaluation of CONMEBOL. Just getting through qualifying is a huge accomplishment. Chile had only a roughly 0.53 expected goals ratio* in qualifying, but that is enough to get them a significant step up on the Netherlands for second in a very difficult group. You can see the case against the Netherlands at the Washington Post's Fancy Stats blog. In short, the Dutch romp through qualifying looks like a function of fantastic shot conversion, and the doubtful talent in the side probably gives us a better idea of their quality than their raw goal numbers.

*Expected goals ratio: xG/(xG+xGA), where xG is expected goals created and xGA is expected goals allowed. A ratio of 0.50 represents an average team; higher numbers are better.
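As a minimal sketch, the footnote's definition translates directly into a function. The second example uses the 7.5 xG to 4.5 that Uruguay's last four qualifiers rate at, as discussed under Group D.

```python
def xg_ratio(xg_for, xg_against):
    """Expected goals ratio: xG / (xG + xGA). 0.50 is average; higher is better."""
    total = xg_for + xg_against
    if total == 0:
        raise ValueError("ratio is undefined when no chances were created or conceded")
    return xg_for / total


print(xg_ratio(1.0, 1.0))  # 0.5, an evenly matched record
print(xg_ratio(7.5, 4.5))  # 0.625, Uruguay's final four qualifiers
```

Expressing performance as a share of the total, rather than as a difference, keeps every team on the same 0 to 1 scale regardless of how many chances their matches produced.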

Group C             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Cote d'Ivoire       1.3  0.9  0.8  4.9  66%      3%     49%      0.6%
Colombia            1.2  0.9  0.9  4.5  59%      2%     73%      3%
Japan               0.9  0.8  1.3  3.5  39%      0.3%   45%      0.5%
Greece              0.8  1.0  1.3  3.3  36%      0.3%   33%      0.3%

The model's embrace of Cote d'Ivoire, it should be noted, rests on lower-quality data than its embrace of Spain or Portugal. The data available for African soccer is fragmentary, consisting mostly of raw goals scored or total shots in CAF qualifying and the recent Africa Cup of Nations. One hopeful note for Les Éléphants is that their recent record in friendlies somewhat underrates them. An 11-to-10 goals ratio over eight matches is not world-beating, but none of those eight friendlies was on home turf (five were away and three at neutral venues), and they include away draws with Belgium and Russia. Cote d'Ivoire have experience playing away from home and a solid record of success doing so. That cannot hurt heading into Brazil.

These numbers cannot account for player fitness, so the Colombia rating here is based significantly on matches in which Radamel Falcao featured. It does not appear that the rating needs to be downgraded much to account for his absence, but Japan and Greece will both likely have a slightly easier time getting out of Group C without him.

Group D             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Uruguay             1.5  0.8  0.7  5.4  72%      4%     62%      3%
England             1.4  0.7  0.9  4.8  63%      2%     61%      3%
Italy               1.2  0.7  1.0  4.4  53%      1%     68%      3%
Costa Rica          0.4  0.7  1.9  1.9  12%      0%     9%       0.1%

The same may not be true of Uruguay and Group D. If Luis Suarez is struggling for fitness as reported, the South Americans lose the significant edge of featuring the best player in the group. Still, the model likes Uruguay quite a bit. One thing that has been underrated is how well they performed down the stretch in CONMEBOL qualifying. In danger of falling to sixth, the Uruguayans beat back challengers Peru and Venezuela in clutch road wins, and added home victories over Colombia and Argentina. Those four matches rate at a total of 7.5 xG to 4.5. Uruguay's defense, disparaged in some circles, stepped up huge when they needed it and that is enough to get them a small edge on England and Italy.

Group E             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
France              1.6  0.9  0.6  5.6  78%      5%     79%      4%
Ecuador             1.4  0.9  0.7  5.1  71%      3%     49%      0.5%
Switzerland         0.9  0.8  1.3  3.4  36%      0.1%   58%      1%
Honduras            0.5  0.8  1.7  2.2  16%      0%     13%      0.1%

The stats hate Switzerland. Drawn into an extremely easy group in European qualifying, they played close match after close match with teams like Albania, Slovenia and Iceland. Somehow this was enough to also get them seeded into another favorable group in the World Cup. There's a bean counter in Bern who knows his or her way around the FIFA rankings, perhaps.

Group F             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Argentina           1.6  0.8  0.6  5.5  75%      4%     92%      18%
Bosnia-Herzegovina  1.4  0.8  0.8  5.0  65%      2%     54%      0.5%
Nigeria             1.2  0.7  1.0  4.4  53%      1%     40%      0.3%
Iran                0.3  0.8  2.0  1.6  7%       0%     15%      0.1%

Expected goals methodology works, such as it does, because usually teams do not have enough great strikers, with a large enough sample size of matches, to swamp random variation in finishing skill. Argentina's ludicrous strike force might be the exception on the international stage. Argentina have not performed at the level of Spain or Brazil, but xG is almost certainly underrating them because it cannot know they have Lionel Messi, Gonzalo Higuaín and Sergio Agüero taking a huge percentage of their shots.
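One way to see why finishing skill usually drowns in noise is to ask how many shots it takes before a genuinely better finisher stands out. The sketch below uses the normal approximation to the binomial: a finisher converting at skilled_rate becomes distinguishable from base_rate once the gap exceeds z standard errors of a conversion rate. The 10 and 15 percent rates and the two-standard-error threshold are illustrative assumptions, not figures from the article.

```python
import math


def shots_to_detect(base_rate, skilled_rate, z=2.0):
    """Approximate shots needed before a conversion rate of skilled_rate sits
    z standard errors above base_rate, where one standard error is
    sqrt(base_rate * (1 - base_rate) / n) (normal approximation)."""
    gap = skilled_rate - base_rate
    if gap <= 0:
        raise ValueError("skilled_rate must exceed base_rate")
    # Solve gap = z * sqrt(p(1 - p) / n) for n.
    return z ** 2 * base_rate * (1 - base_rate) / gap ** 2


n = shots_to_detect(0.10, 0.15)
print(f"{n:.0f} shots")  # about 144 shots
```

Under these assumed rates, a single striker taking a few shots per match would need dozens of internationals before his finishing clearly separated from chance, which is why a side with several elite finishers may be one of the rare cases where expected goals understates a team.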

Group G             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Portugal            1.6  0.7  0.7  5.5  75%      7%     66%      3%
Germany             1.5  0.7  0.8  5.1  69%      5%     84%      12%
Ghana               0.9  0.7  1.4  3.4  34%      0.6%   25%      0.4%
United States       0.6  0.7  1.7  2.6  22%      0.2%   25%      0.4%

There's no one simple explanation for why the model rates Germany so far below the consensus line. Their numbers in friendlies and qualifiers are good, and they reached the semifinals of the last Euros. The issue is that the stats in all cases are sub-elite. Excluding matches against the Faroes, Germany put together about a 0.65 expected goals ratio in qualifying, which is excellent, but well below the 0.70 of Portugal and the 0.74 of Spain. Their Euros performance was good, but they were eliminated (and beaten badly) by Italy, while Spain won and Portugal came close to taking out the champions. It's all little margins, but they add up to a below-par rating for Germany.
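A campaign-level ratio like Germany's 0.65 can be computed by summing xG and xGA over the chosen matches and only then dividing, with minnows such as the Faroes filtered out first. This is a sketch under that assumption; the article does not say exactly how its aggregation works, and the per-match numbers below are invented for illustration, not Germany's real data.

```python
def aggregate_xg_ratio(matches, exclude=()):
    """Sum xG and xGA over matches whose opponent is not excluded, then
    return xG / (xG + xGA). Summing first weights each match by its volume
    of chances, unlike averaging per-match ratios."""
    kept = [m for m in matches if m["opponent"] not in exclude]
    xg = sum(m["xg"] for m in kept)
    xga = sum(m["xga"] for m in kept)
    return xg / (xg + xga)


# Invented per-match values, for illustration only.
qualifiers = [
    {"opponent": "Faroe Islands", "xg": 4.0, "xga": 0.1},
    {"opponent": "Opponent A", "xg": 1.8, "xga": 1.0},
    {"opponent": "Opponent B", "xg": 1.5, "xga": 0.8},
]
print(round(aggregate_xg_ratio(qualifiers), 3))
print(round(aggregate_xg_ratio(qualifiers, exclude={"Faroe Islands"}), 3))
```

Dropping a lopsided match against a very weak side can move the ratio a long way, which is why the exclusion matters when comparing Germany with Portugal and Spain.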

Group H             W    D    L    Pts  Qualify  Champ  Bk Qual  Bk Champ
Belgium             1.5  0.8  0.8  5.2  70%      2%     80%      5%
Algeria             1.3  0.8  0.9  4.6  58%      1%     19%      0.1%
Russia              1.2  0.8  0.9  4.5  56%      1%     63%      1%
South Korea         0.5  0.7  1.8  2.2  16%      0%     37%      0.2%

Belgium look a touch overrated. They notched some solid wins in qualifying over Serbia and Croatia, but those are the full extent of Belgium's good recent wins. They have lost home friendlies to Japan and Colombia while drawing at home with France and Cote d'Ivoire. There's not a lot there. Belgium are also one of the more cross-happy sides in the World Cup, by my numbers. For a hip pick, they play a traditional brand of soccer that doesn't display a lot of beauty or creative brilliance. Without fullbacks to provide width, they space the field with attackers and play a lot of long passes and speculative crosses. There's nothing wrong with traditionalism, but the numbers suggest that the world has fallen in love with the vague concept of the Belgian side rather than its lived reality.

That outlying South Korea rating is a reflection of their results. Only goals-scored numbers were available for Asia, but it would take a lot of random variation in finishing to make South Korea's record look respectable, let alone good. They lost twice to Iran and drew with Lebanon and Uzbekistan in their final qualifying group. Their World Cup tune-up schedule includes a loss to Tunisia. The model rates South Korea as one of the very worst teams in this tournament, and their performance record backs it up.

Algeria are a bit of a mystery. Michael Cox likes them, so maybe they're good.

The Other Models

In recent weeks, other World Cup models have been rolled out by FiveThirtyEight, Bloomberg Sports, and Goldman Sachs. Goldman and 538 are explicitly based on goals, though both include a few adjustments beyond that. Bloomberg's model is entirely hidden from public view.

The top-line numbers converge most for Goldman and FiveThirtyEight. Both are very high on Brazil's chances, giving the hosts odds of winning that fall just short of the 50 percent mark. Strangely, all three of the other projections share a skepticism of Spain's chances. While Spain rate at about 25 percent in the Expected Goals model, no one else has them above one in ten: 538 says 7.6 percent, Bloomberg 9.1 percent, Goldman 9.8 percent. Spain, even more than dark horses France and Portugal, look like the big Expected Goals favorite in this World Cup.

The Expected Goals model's love for France and Portugal is not reflected elsewhere, and its skepticism of Germany, the Netherlands and Argentina likewise seems peculiar to Expected Goals. 538's writeup even cites Portugal's draws with Northern Ireland and Israel as a mark against them. This is a clear case where a goals-based model reads events differently from a chance-quality one: Expected Goals considered those matches predictive of good Portuguese performances to come, not evidence of weakness. We will have to wait for the games to begin to see how well the model performs.

The full method specifications are in the Appendix.

Appendix

Michael Caley's writing can be found at SB Nation blog Cartilage Free Captain and the Washington Post Fancy Stats blog.