Advanced Baseline's ranks and forecasts produced a few player rankings I disagreed with last year, but none bothered me more than those for Roger Federer and Venus Williams. Both were former No. 1 players in the early stages of age-related decline. However, they still had a lot of results logged from when they were closer to their peak level. Even though older results receive less weight, both still ended up ranked a little higher than they should have been. The problem with waiting for recent results to pull their ranks down "naturally" is that they don't play as many tournaments as they used to. The best way to get ahead of the problem is to anticipate how Federer and Williams would decline with age and incorporate that into their ranks.
Solving this problem raises a couple of interesting questions along the way. What does a tennis aging curve look like, and how does it compare to those in other sports? How does the curve differ across surfaces? How do the men's and women's curves differ?
To start answering these questions, I went back through historical AB win probabilities and recorded the age of each player in every match. I rounded each player's age to the nearest half-year and summed the relative performance at each age across all players to see whether a pattern emerged. (This is essentially the delta method used to construct aging curves for baseball and basketball players.) Here are graphs of the relative performance across all players at each age:
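To make the bucketing step concrete, here is a minimal sketch. It assumes that "relative performance" for one player in one match means the actual result (1 for a win, 0 for a loss) minus the forecast win probability, and it uses a hypothetical input format of `(age, won, p_win)` tuples. The function names are illustrative, not part of Advanced Baseline. Note that the classic baseball delta method pairs a player's consecutive seasons; this sketch only follows the simpler description above of summing forecast residuals by age.

```python
import math
from collections import defaultdict


def round_to_half_year(age: float) -> float:
    """Round an age in years to the nearest half-year (ties round up)."""
    # math.floor(x + 0.5) avoids Python's round-half-to-even behaviour.
    return math.floor(age * 2 + 0.5) / 2


def relative_performance_by_age(matches):
    """Sum (actual result - forecast win probability) per half-year age bucket.

    `matches` is a hypothetical input format: an iterable of
    (age, won, p_win) tuples, one per player per match, where `won` is 1 or 0
    and `p_win` is the pre-match forecast probability that this player wins.
    Positive totals mean players at that age beat their forecasts.
    """
    totals = defaultdict(float)
    for age, won, p_win in matches:
        totals[round_to_half_year(age)] += won - p_win
    # Return buckets in ascending age order.
    return dict(sorted(totals.items()))


# Toy data for illustration only (not real match results).
matches = [(20.1, 1, 0.6), (19.9, 0, 0.3), (30.1, 0, 0.5), (29.9, 1, 0.7)]
print(relative_performance_by_age(matches))
```

With real data, a positive bucket suggests the forecasts tend to underrate players at that age, which is what you would expect if ratings lag behind players who are still improving.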
The graphs are a little noisy, but both still have the basic shape of a line sloping downward, with younger players over-performing and older players under-performing. If we sum the values from left to right (or, for the calculus-minded among you, take the integral of each graph), the result is an average aging curve for all tennis players.
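The left-to-right summing step is just a running (cumulative) sum. Below is a small sketch, assuming the per-age values arrive as a dict from half-year age to summed relative performance. `aging_curve` and `peak_age` are hypothetical helper names, and the numbers are toy values chosen only to show the shape.

```python
from itertools import accumulate


def aging_curve(per_age):
    """Turn per-age relative performance into a cumulative aging curve.

    `per_age` maps a half-year age to the summed relative performance at
    that age. Summing from the youngest age upward is the discrete
    analogue of integrating the per-age graph.
    """
    ages = sorted(per_age)
    cumulative = accumulate(per_age[age] for age in ages)
    return list(zip(ages, cumulative))


def peak_age(curve):
    """Return the age at which the cumulative curve is highest."""
    return max(curve, key=lambda point: point[1])[0]


# Toy values for illustration only: positive at young ages, negative later.
per_age = {20.0: 0.3, 20.5: 0.2, 21.0: -0.1, 21.5: -0.4}
curve = aging_curve(per_age)
print(curve)
print(peak_age(curve))  # 20.5
```

Only the shape of the resulting curve matters, since its starting level depends on which age you begin summing from. The peak lands at the last age where the per-age values are still positive, just before they turn negative.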
These curves align pretty well with what conventional wisdom says about how players perform at different ages. Men peak just after age 23, and women peak somewhere between 20 and 21. This roughly matches the age distribution of Grand Slam winners in the Open Era. For men, that is a much earlier peak than in football/baseball (27) and basketball (25). The most common explanation for tennis's earlier peak is that most players turn pro younger and so start accumulating wear and tear much earlier, but I'm not entirely sold on that idea (more on that in a future post). I'd love to compare the women's curve to other sports, but to my knowledge, no one has constructed an aging curve for a women's sport before. So hooray for Advanced Baseline maybe making the first one.
While these curves look nice and smooth for the general case, there are plenty of high-profile players whose careers don't follow this arc at all (Serena Williams and Tommy Haas come to mind). As Nate Silver will remind you, trying to fit any one particular player to the general case is hopeless most of the time. Nate's signature forecasting system, PECOTA, took a great approach: it grouped players by similarity scores across a range of stats and constructed a separate aging curve for each type of player. I think there's a lot of potential in grouping tennis players by something like playing style (big servers versus counter-punchers, baseliners versus serve-and-volleyers if any still exist, and so on) and seeing how the aging curve differs among groups, but the real challenge is deducing each player's style from data rather than from the eyeball test.
In the next post, I'll take a very simple approach to grouping players and seeing how each group's aging curve differs from the others.