I don't think it can get any weirder than Wimbledon 2013. The forecast gave me heartburn last year before everyone started losing. And any time stuff happens you gave a 1-in-10,000 chance of occurring, it's healthier to assume you're wrong than unlucky. So, I spent a fair amount of time digging into grass court data to see if I could find any particular weirdness that would've made my 2013 projections a little less wrong. I didn't find anything that would've dramatically changed Advanced Baseline's outlook, but I did find plenty of grass court weirdness.
One of the first things I looked at was whether the aging curves I recently developed looked any different on grass than on the other surfaces. Roger Federer went out super-early after I pegged him as the favorite, so I thought players might age worse on grass. I split up the AB men's data set by each of the major surfaces (clay, hard, indoor hard, and grass) and generated the basic aging curve for all matches played on each of those surfaces. Here is each of the aging curves normalized by peak value:
One of these things is not like the others. Grass just doesn't have much of an aging curve compared to the other three surfaces. Yes, there are fewer data points for grass since there are fewer tournaments, but there should be enough to pick out any kind of signal. It's just not there.
And here's what the curves look like for women's players:
The grass curve looks even weirder for the women's players. Apparently you receive a rebirth at age 27 on grass, and aging is straight-up reversed.
Saying Roger Federer and Serena Williams were underrated on grass wouldn't exactly have helped with last year's disaster. But why do the curves look that way? I don't have a good answer yet, but I have two theories so far.
One is that there's a bias in the data set. Grass tournaments are pretty much exclusively Wimbledon and its warm-up tournaments, which means only the top 150 or so players are in the grass data set. It's possible that the top players have different aging curves than lower-ranked players. You might think it's because better players are more physically fit and age slower, but I would suspect a completely different reason: they can afford to mitigate the effects of aging more than the lower-ranked players. That's just conjecture for now, and it deserves its own article down the road.
The second explanation is that grass plays much more randomly than other surfaces. It's intuitive that faster surfaces play more unpredictably, since reaction times are shorter and bounces can take more unexpected directions. And randomness has a very important characteristic: it always works against the favorite in each match.
Take an extreme example. Let's say I were to play Rafael Nadal right now. All normal rules apply, except every fifth point will be determined by a coin toss. I would have no chance of winning a point off of him by playing actual tennis, but I'd win 50 percent of coin tosses, so I'd "win" 10 percent of my points against him.
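To make the arithmetic explicit, here's a minimal sketch of the coin-toss example. The function name and its parameters are mine, purely for illustration, and it treats "every fifth point" as a flat 20 percent share of points, which gives the same expected result.

```python
def blended_point_win_prob(skill_prob, coin_fraction):
    """Chance of winning a point when some share of points are coin tosses.

    skill_prob: chance of winning a point played as actual tennis
    coin_fraction: share of points decided by a fair coin (assumed 0..1)
    """
    return (1 - coin_fraction) * skill_prob + coin_fraction * 0.5


# Me against Nadal: no chance on real points, one point in five is a coin toss.
underdog = blended_point_win_prob(0.0, 0.2)

# Nadal against me: the same coin tosses drag him down from 100 percent.
favorite = blended_point_win_prob(1.0, 0.2)

print(f"underdog wins {underdog:.0%} of points, favorite wins {favorite:.0%}")
```

Both numbers get pulled toward 50 percent, which is just another way of saying the randomness only ever hurts the favorite.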
So how does increased randomness explain screwy aging curves? If all else is equal, you'd expect a 22-year-old player to be better than a 32-year-old player due to typical aging effects. But if those two play in an environment with increased randomness, e.g. a grass court, then the 22-year-old's edge is mitigated. Randomness alone isn't enough to explain the women's curve where older players somehow get better on grass past 27, but it could still be a contributing factor.
The small sample size of grass tournaments compounds the uncertainty that goes into predicting them, which can certainly lead to some unexpected results. There's no way around the lack of data, so to make grass predictions better, it becomes necessary to squeeze every drop of information out of non-grass matches. That involves going beyond simple surface adjustments and looking at what players' performance on other surfaces can tell us about how they can be expected to do on grass.
Clay and hard matches have big enough sample sizes that they're useful for establishing some kind of ground truth as to what each player's general style is. So, what does their data show about players' grass abilities? I grouped each of the current men's and women's players based on the simple hard/clay definition I use for aging curves. Then, I plotted the average grass factor for all the players along the hard-clay axis. Here's what a plot of grass factors versus hard-minus-clay factors looks like:
That's a pretty straight line, indicating there's a pretty strong relationship to be found. The result isn't exactly earth-shattering, either: hard-court players do better on grass courts than clay players. It's completely believable that typical hard-court advantages, such as booming serves and fast groundstrokes, would be amplified on even faster grass courts, and clay-friendly topspin and footwork would be nullified on grass.
If this seems obvious, you're probably wondering why I'd even do a post about it at all. The interesting question to me isn't whether or not you can infer grass-court information from hard and clay results. It's quantifying exactly how much information you can extract. That's determined by the slopes of those plots, which are actually pretty small all things considered. For every point of hard-minus-clay each player has, you can throw about 10 percent into their grass rating. That number is a reminder that there's plenty of noise in the results as well as a signal, and the noise level ultimately limits how strong a conclusion you can draw from non-surface-specific results. Sure, hard courters will generally do better on grass, but there is a multitude of other variables that go into grass play: home-court advantage, confidence levels, matchup-specific considerations, etc.
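For anyone who wants to see what "determined by the slopes" means in practice, here's a rough sketch of a plain least-squares fit. The numbers are made up for illustration and are not Advanced Baseline data. The helper name is hypothetical, and reading "about 10 percent" as a slope of roughly 0.1 is my assumption about how to express it.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up group averages, NOT real data: positive x means a player leans
# toward hard courts, negative x means a player leans toward clay.
hard_minus_clay = [-2.0, -1.0, 0.0, 1.0, 2.0]
grass_factor = [-0.25, -0.05, 0.0, 0.15, 0.15]

slope, intercept = fit_slope(hard_minus_clay, grass_factor)
print(f"each point of hard-minus-clay adds about {slope:.2f} to the grass factor")
```

A small slope like this means a player's hard/clay split only nudges the grass estimate, and everything else is left to the noise and the other variables listed above.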
In addition, it's useful to remember that all match results should be put in the context of surface as well as possible. Rafael Nadal is coming off a French Open victory, so you might think he has plenty of momentum to carry over into Wimbledon. Yes, he definitely played well, but his clay streak also reinforces that a lot of his advantage over the field is surface-specific. Based on the above plot, the more you believe clay is Nadal's best surface, the more you have to believe grass is his worst surface. That doesn't mean that losing on clay is a recipe for grass court success, but it does contextualize Nadal's results a little better than just assuming his winning will continue into grass court season.
I know I'll probably have Nadal's Wimbledon chances a lot lower in a couple weeks than most people think they should be, and a lot of it is due to the conclusions above.