Surfaces are one of the most interesting things to analyze in tennis. Most sports try to keep playing conditions as homogeneous as possible across all venues, and those that don't at least have variations that are predictable on balance. You'll rarely see hitters in baseball consistently underperform in ballparks with high park factors, for example. Tennis embraces the variation and invites wildly different playing conditions, even arranging the tour schedule so each surface gets its own mini-season during the year. I think there is plenty of progress to be made with respect to dealing with surface variation at all levels of tennis (Tamira Paszek is 165th in Advanced Baseline's clay rankings, and she was seeded 28th at the French Open), and a good place to start is understanding surfaces at a general level.
Why do certain players play better or worse on certain surfaces? Advanced Baseline can empirically measure a player's surface variation pretty well, but it can't explain why it happens. There are a lot of factors in play when coming up with an explanation -- playing style, country of origin, and even changes in racket technology. In theory, you could come up with something like a linear regression equation that would find all the significant variables and produce surface factors for each player, but coming up with meaningful quantifications for that list is a lot easier said than done. The better way to start understanding surfaces, in my opinion, is taking one factor at a time and looking for very general patterns around that factor. Country of origin is as good a place as any to start.
There are a couple of accepted truisms about surfaces and countries. Americans generally do better on hard courts, and Europeans and South Americans tend to do better on clay courts, because these are the most common surfaces found in those regions. But how much do Europeans prefer clay compared to South Americans based on their results? And are the surface preferences for each country the same for men's and women's tennis?
To explore the conventional wisdom a little better, I wanted to see if I could quantify exactly how much better each country does on each type of surface. For easier visualization, I limited this analysis to clay and hard surfaces, since they make up 70% of all tennis matches and are the only two that would yield statistically significant results.
I took the full list of players' AB-generated surface factors and pared it down to players with at least 100 career matches in the database. I subtracted each player's clay surface factor from their hard surface factor to show each player's relative surface preference, and averaged relative preferences for countries with at least 16 players. Here are color-coded maps of the world's collective surface preferences for men's tennis. Red indicates a preference for clay courts, and blue indicates a preference for hard courts.
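For anyone who wants to try the same calculation, here is a minimal sketch of the steps above in Python. It is not AB's actual code: the record layout (`country`, `matches`, `hard_factor`, `clay_factor`) and the function name are assumptions made up for illustration, and only the two thresholds (100 matches, 16 players) and the hard-minus-clay subtraction come from the description above.

```python
from collections import defaultdict
from statistics import mean

MIN_MATCHES = 100  # career-match cutoff described in the text
MIN_PLAYERS = 16   # minimum qualifying players per country described in the text


def country_surface_preferences(players, min_matches=MIN_MATCHES,
                                min_players=MIN_PLAYERS):
    """Average (hard factor - clay factor) per country.

    `players` is an iterable of dicts with the hypothetical keys
    'country', 'matches', 'hard_factor' and 'clay_factor'.
    Positive result = hard-court preference (blue on the maps),
    negative result = clay-court preference (red on the maps).
    """
    prefs_by_country = defaultdict(list)
    for player in players:
        # Drop players without enough career matches in the database.
        if player["matches"] < min_matches:
            continue
        relative_pref = player["hard_factor"] - player["clay_factor"]
        prefs_by_country[player["country"]].append(relative_pref)

    # Keep only countries with enough qualifying players, then average.
    return {
        country: mean(prefs)
        for country, prefs in prefs_by_country.items()
        if len(prefs) >= min_players
    }
```

The thresholds are parameters so you can see how sensitive the country averages are to them. Loosening the 16-player cutoff brings more countries onto the map, at the cost of averages built on very few players.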
The amateur sociologist in me could stare at these maps for hours. They look like they could inspire a hundred terrible term papers ("Why Venezuela's Lack of Clay Court Players Demonstrates A Failure Of Hugo Chavez's Socialist Policies"). The colors match conventional wisdom for the most part, with a couple of interesting exceptions. And while the maps don't explain much by themselves, they're useful as a starting point for more interesting questions. Here are two that popped into my head:
How much does common court type by country really explain the above map?
Surface preferences are probably explained by more than just your home country; they also have to do with things like playing style. Why aren't there many big hard-court servers from Spain? Is it possible that the potential Spanish big servers don't do well at a young age because the clay courts don't suit their game well, and they swap their Nadal posters for Sergio Ramos ones? And are all the potential American clay-courters getting blown off the hard courts as kids and playing basketball instead? There are all sorts of interesting questions about where tennis players come from, including selection bias (and for that matter, plain old luck).
Is it fair to give a player a surface preference before they've played a match?
AB starts everyone's preference at zero and adjusts it based on their results. It's one thing to give everyone something like a head start on their surface preferences based on these maps for predictive purposes. But what if you did something like incorporate surface factors for seeding purposes? Would it be fair to automatically give new Colombians a boost to their clay rating early on just because most Colombians are better on clay? That makes me a little queasy.
These are just the ones that came to me. Hopefully people who know tennis better than I do can look at these and come up with interesting questions of their own.