After I released a mid-tournament updated forecast for the French Open last week, Matt Zemek had a mild criticism of my giving Nicolas Almagro higher odds of advancement than Jo-Wilfried Tsonga. This led to an interesting side conversation:
I understand the appeal of the argument. Grand Slams are the biggest stage in the sport, and it's not hard to imagine a hypothetical player who's good in the regular season but can't stand the pressure of the big tournament and underperforms. It's also not hard to imagine a big competitor who lives for the spotlight and rises above his or her regular-season performance.
So in a performance-based rating system, is there justification to incorporate something like a Slam-specific variable that accounts for players that either thrive or wilt in the majors?
First, a bit of nitpicking in regard to Matt's specific argument. The claim of "Player X will underperform in Grand Slams because they have not gone far before" ignores a fundamental question: All else being equal, how well would we expect Player X to have performed to date at Grand Slams? If Player X is ranked 80th in the world, I would say he's not likely to make a semifinal because there are 79 players better than him, not because he has never made a deep run before. For the specific example of Almagro, it's fair to hold a player of his talent to a higher standard than, say, Michael Russell, but we can still calibrate those expectations a little further.
For one, accounting for surfaces alone means we can have wildly different expectations for the same player at different majors. Not all Slams are created equal; half of them are on hard court and one is on grass. So for those Slams, maybe it's not all that unreasonable that Almagro hasn't made a semifinal, since he's easily at his best on clay. Second, citing a player's historical Slam performance doesn't account for the difficulty of their draws. In a vacuum, we could say Almagro has underachieved at the French Open since it's his best surface. However, Almagro also happens to be playing in the time of Rafael Nadal. As a 9-16 seed, you have to avoid drawing him in both the fourth round and the quarterfinals to even have a chance at making the semifinals. (For the record, Almagro has reached the quarterfinals of the French Open three times and has drawn Nadal in every single appearance.) And even if you avoid Nadal in both of those rounds, you're almost guaranteed to get Roger Federer or David Ferrer in at least one, which is no picnic either. It's tough to imagine a French Open draw where we can reasonably expect Almagro to make the semifinals.
Now, none of this detracts from the legitimacy of the hypothesis that Slams should be predicted differently than the regular season. So in order to answer the question in a way that addresses the above concerns, I went back and looked at all the past win probabilities Advanced Baseline (AB) has generated for each match, Slam and non-Slam, since 2006. I narrowed the data set to players who have played at least 20 best-of-five Grand Slam matches in that period, and summed their win probabilities to get an idea of how many wins we would've expected them to tally in that period. The difference between their actual wins and their expected wins, a metric I call Relative Performance, is a measure of how much each player has over- or under-performed his expectations to date.
If there are players that AB overrates or underrates in Slams, they should do worse or better with respect to relative performance in Slams than in the regular season. Comparing regular season and Slam expectations also reduces a potential bias if AB overrates or underrates a player in general. If a player regularly beats expectations by 10 percent in both regular season and Slam matches, it's fair to question whether AB quite captures that player's skill level, but if that bias shows up equally in both types of matches, it's probably still safe to say AB wouldn't need to change its expectations of that player just for Slams.
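As a rough illustration of the metric and the comparison described above, here is a minimal Python sketch. The `Match` record, its field names, and the sample numbers are all made up for the example, and dividing the win difference by matches played (to get a percentage-style figure) is an assumption on my part; the post doesn't spell out how the difference is normalized.

```python
from dataclasses import dataclass

@dataclass
class Match:
    # Hypothetical record layout, not AB's actual data format.
    win_prob: float  # pre-match win probability for the player
    won: bool        # whether the player actually won
    is_slam: bool    # True for Grand Slam matches

def relative_performance(matches):
    """(actual wins - expected wins) / matches played.

    Expected wins are the sum of pre-match win probabilities.
    Dividing by matches played is an assumed normalization.
    """
    if not matches:
        raise ValueError("need at least one match")
    expected = sum(m.win_prob for m in matches)
    actual = sum(1 for m in matches if m.won)
    return (actual - expected) / len(matches)

def slam_minus_non_slam(matches):
    """Slam relative performance minus regular-season relative performance."""
    slam = [m for m in matches if m.is_slam]
    regular = [m for m in matches if not m.is_slam]
    return relative_performance(slam) - relative_performance(regular)

# Made-up match history for one player.
history = [
    Match(0.70, True, False),
    Match(0.60, False, False),
    Match(0.55, True, False),
    Match(0.40, True, False),
    Match(0.80, True, True),
    Match(0.50, False, True),
]

print(f"Slam minus non-Slam: {slam_minus_non_slam(history):+.1%}")
```

Because the same player appears on both sides of the subtraction, any across-the-board bias in how the model rates him cancels out, and only the Slam-specific gap is left.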
And a quick note for reference: AB does make a minor adjustment for Slams, where higher-ranked players are more likely to win in best-of-five than in best-of-three. The theory is that it's more difficult for an underdog to sustain success against a better player over the longer format, which gives an advantage to the favorite. This empirically does improve AB's Slam predictions a little bit.
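To see why a longer format tends to favor the favorite, here is a small sketch using a textbook simplification: each set is an independent event that the favorite wins with a fixed probability `p`. This is not AB's adjustment, just an illustrative model, and the function names are mine.

```python
def best_of_three(p):
    """Probability of winning a best-of-three match given set win probability p.

    Win 2-0, or win 2-1 (two possible orders for the lost set).
    """
    q = 1 - p
    return p**2 * (1 + 2 * q)

def best_of_five(p):
    """Probability of winning a best-of-five match given set win probability p.

    Win 3-0, 3-1 (three orders), or 3-2 (six orders).
    """
    q = 1 - p
    return p**3 * (1 + 3 * q + 6 * q**2)

for p in (0.50, 0.55, 0.60, 0.70):
    print(f"set win prob {p:.2f}: "
          f"best-of-3 {best_of_three(p):.3f}, best-of-5 {best_of_five(p):.3f}")
```

Under these assumptions, a player who wins 60 percent of sets wins about 64.8 percent of best-of-three matches and about 68.3 percent of best-of-five matches, while an even matchup stays at 50 percent in both formats.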
Below is a table of each player's relative performance in regular season and Slam matches:
| Player | Non-Slam Relative Performance | Slam Relative Performance | Slam Minus Non-Slam |
|---|---|---|---|
| Martin Vassallo Arguello | -0.1% | 6.6% | 6.7% |
| Juan Carlos Ferrero | 0.4% | -1.6% | -2.0% |
| Juan Martin Del Potro | 7.4% | 1.1% | -6.3% |
| Juan Ignacio Chela | 0.4% | -7.7% | -8.2% |
If you're looking for evidence of a player who remains calm in big pressure moments to deliver their best, you were probably not expecting Bernard Tomic -- BERNARD TOMIC -- to be at the top of the list. Why aren't Federer, Nadal, and Djokovic up here? Remember exactly what this is comparing: how players perform relative to their expectation, not how well they perform overall. AB's win probabilities do a decent job of predicting the big names' performance already based on their regular season results, and the question is whether it should do anything differently for the Slams.
Is this table a definitive answer for which players are predictably better or worse in Slams? Hardly. If I have a thousand people flip a coin 50 times, I can expect somewhere around 10 people to get roughly twice as many heads as tails, and about as many to get roughly twice as many tails as heads. This doesn't mean these people are especially good or bad at getting heads; it's just the way randomness shakes out. Compare a distribution graph of coin flips to a distribution graph of the table above:
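The coin-flip thought experiment can be checked directly with the binomial distribution. The sketch below computes the exact chance of a lopsided result in 50 fair flips; the two cutoffs are my own reading of "twice as many" (34 heads to 16 tails is the first split that strictly doubles, 33 to 17 falls just short), and the function name is illustrative.

```python
from math import comb

def prob_at_least(n_flips, min_heads):
    """Exact P(heads >= min_heads) for a fair coin flipped n_flips times."""
    favorable = sum(comb(n_flips, k) for k in range(min_heads, n_flips + 1))
    return favorable / 2**n_flips

people = 1000
flips = 50

strict = prob_at_least(flips, 34)  # 34-16 or better: at least double
loose = prob_at_least(flips, 33)   # 33-17 or better: just short of double

print(f"strict cutoff: p = {strict:.4f}, expected people = {people * strict:.1f}")
print(f"loose cutoff:  p = {loose:.4f}, expected people = {people * loose:.1f}")
```

The expected head count per thousand people lands in the high single digits under the strict cutoff and in the mid-teens under the looser one, with a matching group on the tails side, all from a coin that has no skill at all.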
Those two look awfully similar. A rule of thumb: The more your relative performance distribution resembles coin flipping, the higher the likelihood it shows randomness instead of meaningful differences. Additionally, the distribution graph isn't exactly an apples-to-apples comparison between players. As I've written before, not all players have the same number of data points, and that has a number of consequences. Here's a scatter plot of each player's number of Slam matches compared to their relative Slam performance:
Notice how the highest and lowest points on the graph are all clustered on the left? That's not a coincidence; that's the effect of small sample size. When you have a small number of matches, your relative performance is far more subject to short-term variance, so you can overperform or underperform wildly. As sample sizes increase and the graph goes further out to the right, actual performance tends to come in line with expected performance. You can see that in action on this graph, where the points gradually converge towards the middle. If there truly were a player who does significantly better or worse in Slams with a sufficient sample size, we would see at least one dot on the right side that's high or low, but there's not a single point past 60 matches that strays more than 10 percent in either direction. All signs are pointing to differences in Slam performance being random.
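The funnel shape has a simple back-of-the-envelope explanation. As a sketch, assume every match is independent with the same win probability (real matches vary, so treat this as an approximation, and the function name as illustrative); then the spread of per-match relative performance from luck alone shrinks with the square root of the number of matches.

```python
from math import sqrt

def relative_performance_sd(n_matches, win_prob=0.5):
    """Standard deviation of (actual wins - expected wins) / n_matches.

    Assumes independent matches that all share the same win probability.
    win_prob=0.5 is the noisiest case; lopsided matchups have less variance.
    """
    return sqrt(win_prob * (1 - win_prob) / n_matches)

for n in (20, 40, 60, 100):
    print(f"{n:3d} matches: one standard deviation = {relative_performance_sd(n):.1%}")
```

At the 20-match minimum, one standard deviation of pure luck is already around 11 percent, so double-digit gaps on the left side of the plot are unremarkable; by 60 matches it has dropped to roughly 6.5 percent.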
Okay, maybe you believe that differences in Slam performances are random for players in general, but there's still a player out there who really does do better in Slams and whose genuine clutchness is obscured by everyone else's randomness. Setting aside the fact that it's really hard to tell the difference between the two from looking at results alone, it would be fair to assume that such a player would overperform in all Slams. Here's the relative performance level for all of the listed Slam overachievers broken out by tournament:
| Player | Australian Open | French Open | U.S. Open | Wimbledon |
There's not a lot here to indicate any given player is consistently better across all Slams. Why does Jo-Wilfried Tsonga overperform in the Australian Open but underperform in the U.S. Open, when they're 90 percent the same tournament? Why does Kei Nishikori do the exact opposite, thriving at the U.S. Open but faltering at the Australian Open? And if Bernard Tomic really does elevate his Slam performance better than anyone else, why does it only happen at the Australian Open and Wimbledon?
And for fairness, let's look at the underachievers too:
| Player | Australian Open | French Open | U.S. Open | Wimbledon |
This looks slightly more promising. You see a little more consistency with big underachievers across all Slams, although not by much. AB doesn't (and from box scores alone, cannot) account for something like fitness level in more tiring best-of-five matches, and that's a very plausible explanation for why some players might do consistently worse in Slams. But look at how few players this affects. That was a lot of work to say Juan Monaco, James Blake, and Ivo Karlovic probably do worse in Slams. If AB is doomed to overrate these three specific players by not accounting for their Slam-specific tendencies, I'm okay with that.
There isn't a lot of meaningful evidence that Slam performance should be predicted any differently than regular season performance. More specifically, there isn't enough to distinguish differences in Slam performance from general randomness. I don't know what kind of Slam-specific variable you'd incorporate besides a correction factor for over- or under-performance, and the risk of fitting to noise would outweigh any potential marginal gain in accuracy. This doesn't mean there aren't players that are more or less affected by Slams than the regular season, and I will readily admit AB wouldn't capture that. But the idea that those players can be reliably discerned -- in a sport where opt-in, single-elimination tournaments introduce an enormous amount of variance -- sounds incredibly overconfident. Substituting confidence intervals for narratives isn't a very fun thing to do as a fan (this remains one of the truest things ever written about sports), but even acknowledging the role of randomness in the sport affected by it the most would be a start. Sometimes that means saying the big moments can be predicted just fine from the little moments.