Marching along the statistical color spectrum

From hits and runs batted in to WAR and wOBA, the past and future evolution of baseball statistics may be explained through an old theory involving linguistics and color.

Jared Wickerham

Baseball has loved its statistics for nearly as long as the game has existed. Batting average, RBI, ERA, and the like have been part of the game since before Ty Cobb, Cy Young, or Babe Ruth. And yet, if you plopped any of these legends into today's world and asked them to check out the leaderboards on FanGraphs or Baseball Prospectus, there's a good chance they wouldn't understand half of what they saw. Like everything else, our understanding of a ballplayer's performance has evolved greatly through the decades.

But how far along are we in that evolution? Is there any more to learn? Believe it or not, a theory involving language and all the colors in the rainbow might give us an answer ...

According to a theory initially put forth in 1969 by anthropologist Brent Berlin and linguist Paul Kay, the first words for describing color in any language are for white and black. That is, every culture on earth, no matter how primitive, can distinguish between the light and dark sides of the color spectrum. As the culture and language evolve, we begin to see evidence that its speakers can break down the color spectrum into more discrete chunks. After white and black, languages develop a word for the color red. This is a universal truth, says the theory. If a language has only three words for color, they are "white," "black," and "red." Next, societies learn to recognize yellow and green (first one, then the other) before developing the word for blue. Finally, languages develop a word for brown before the final grouping of purple, pink, orange, and gray can be distinguished.

The Berlin/Kay Basic Color Term evolutionary diagram
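
For readers who think in code, the ordering described above can be sketched as a simple staged list. This is only a minimal illustration of the sequence as summarized in this article, not a faithful model of Berlin and Kay's full theory. The names `STAGES`, `is_consistent`, and `next_terms` are hypothetical and exist only for this example.

```python
# Stages in the order the article describes them (an illustrative simplification).
STAGES = [
    {"white", "black"},
    {"red"},
    {"green", "yellow"},  # the article says these arrive one at a time, in either order
    {"blue"},
    {"brown"},
    {"purple", "pink", "orange", "gray"},
]


def is_consistent(terms):
    """Return True if a color vocabulary fits the staged ordering.

    A vocabulary fits when it contains no term from a later stage
    while any earlier stage is still incomplete.
    """
    terms = set(terms)
    earlier_complete = True
    for stage in STAGES:
        if terms & stage and not earlier_complete:
            return False
        earlier_complete = earlier_complete and stage <= terms
    return True


def next_terms(terms):
    """Return the terms the ordering says could be added next."""
    terms = set(terms)
    for stage in STAGES:
        missing = stage - terms
        if missing:
            return missing
    return set()


print(is_consistent({"white", "black", "red"}))   # a three-word vocabulary
print(is_consistent({"white", "black", "blue"}))  # skips red, green, and yellow
print(next_terms({"white", "black", "red"}))
```

Under these assumptions, a three-word vocabulary of white, black, and red passes the check, while one that jumps straight from white and black to blue does not.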

The beauty of the Berlin/Kay theory is not that it gives us a nifty little chart describing the growth of languages. No, the beauty is that it creates a map of how the human brain works in breaking down something as dense as the color spectrum. First, we see the entire spectrum in just two shades. As we get used to the spectrum and interact with it in more involved ways, the brain adapts; it begins to distinguish details that it never noticed before. The details were always there, of course; our brains were just unable to recognize them. "Black and white" turns to "black and white and red". After growing accustomed to that breakdown, even more detail comes through. "Black and white and red" is now "black and white and red and green and yellow". And so on.

It's a process that we all recognize from other everyday endeavors: finding a book or DVD among the many on the shelf, watching the stars emerge at dusk and trying to decipher the constellations, hunting for a contact lens on the floor, spotting loved ones walking through a packed crowd at Six Flags ... The Berlin/Kay theory finally gives us a scientific framework for talking about this phenomenon, a framework that we can easily bring over to baseball statistics.

Think about how statistics have evolved throughout the history of baseball. In the earliest box scores, things were much simpler. The bookkeepers tracked runs scored and outs made. It's only later that "hits" and "errors" were devised. Same with "runs batted in", "assists", "total bases" and everything else we see on the back of a baseball card or on a page.

from Baseball Legends of Brooklyn's Green-Wood Cemetery

In that athletic infancy, Henry Chadwick and other early statisticians looked at the spectrum of a base baller's performance and saw two tones: runs and outs. After all, in order to win a game of baseball, one team must score more runs than the other before its supply of outs is exhausted. Chadwick, the father of baseball statistics, perfectly encapsulated the essence of baseball's statistics with that division, much the same way a primitive culture recognizing "light" and "dark" can capture the essence of the color spectrum.

But things wouldn't stay two-toned for long. As Chadwick and others watched more games, they began to recognize that there was more detail to be captured. They began to see "red". Soon, "hits" and "at-bats" gained their less-than-straightforward definitions, leading to the hue that would dominate so much of the twentieth century: batting average. With these three distinctions -- these three colors -- baseball fans of the time could track how a game was won and lost while also following the hot and cold streaks of their favorite players. Black, white and red gave early baseball fans everything they needed to know about the game.

The metaphor isn't perfect, of course. Berlin and Kay purposely narrowed the color categories down to 11 distinct shades as a way to simplify comparisons. Meanwhile, there are a dozen super-basic statistics alone that have been around for nearly as long as the base hit (singles, doubles, stolen bases, etc.). We can't break up the performance spectrum for each of these for the same reason Berlin and Kay couldn't break up the color spectrum for eggshell, ivory, or cream. And we mustn't forget the RBI, which was a major driving force in those early years too. Is that a distinct enough concept to break into its own shade? It's hard to say. As with most analogies, things begin to break down when you dive too deep into the nitty-gritty. On a broader level, however, it is safe to say that early-twentieth-century baseball was in the "red" zone when it came to statistics.

The brain will never stop working, however, and it soon became evident that these three shades weren't enough to truly describe the performance spectrum. What about all those "hits" that were successful only because the fielder flubbed the play? Or the new class of sluggers who could accomplish with one mighty wallop what others might need a ground ball and some creative baserunning to manage? Batting average just wasn't enough anymore; new colors were beginning to emerge. Now a baseball fan needed to consider a player's fielding percentage and his power (or "slugging") before truly being able to determine his worth. In essence, baseball fans had seen beyond red and discovered the warring shades of green and yellow.

One interpretation of the full color spectrum.
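
To make the "green and yellow" step concrete, here is a small sketch using the standard definitions of batting average (hits per at-bat), slugging (total bases per at-bat), and fielding percentage (putouts plus assists, divided by total chances). The two stat lines are invented for illustration and do not belong to any real player. The function names are likewise just for this example, and the code assumes at-bats and total chances are greater than zero.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def fielding_percentage(putouts, assists, errors):
    """Successfully handled chances divided by total chances."""
    return (putouts + assists) / (putouts + assists + errors)


# Two hypothetical hitters with identical hit totals (invented numbers).
hitters = {
    "singles hitter": dict(singles=150, doubles=0, triples=0, home_runs=0, at_bats=500),
    "slugger": dict(singles=90, doubles=30, triples=5, home_runs=25, at_bats=500),
}

for name, line in hitters.items():
    hits = line["singles"] + line["doubles"] + line["triples"] + line["home_runs"]
    avg = batting_average(hits, line["at_bats"])
    slg = slugging(**line)
    print(f"{name}: AVG {avg:.3f}, SLG {slg:.3f}")
```

Both invented hitters collect 150 hits in 500 at-bats, so batting average alone cannot tell them apart. Slugging separates them, which is the kind of extra shade the paragraph above describes.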

This is where baseball statistics stayed for most of the twentieth century, in that strange spot on the Berlin/Kay scale that shows a society has learned to distinguish a number of unique colors without yet exploring the full range of the spectrum. As the years went on, other minor distinctions came about. On-base percentage, for one, gave more color to a player's performance. However, it wasn't until the recent shift toward sabermetrics that we truly advanced on the Berlin/Kay scale.

Yes, we've recognized slugging and on-base percentage as a more nuanced view of batting average for years, and, yes, we've started to be more serious about measuring defense. Those steps are important, but they mostly delve into something we've already seen before, like discovering the differences between forest green and olive. What has us seeing blue on the Berlin/Kay scale are the more holistic statistics. WAR, for example, tries to account for a player's entire performance, while wOBA recognizes the amazing level of nuance among even the simplest offensive contributions. This is a new way of looking at an old game. In the same way blue brings a deeper hue to yellow and green, these stats provide definition and depth to the basic numbers that we know so well. It's an evolutionary step in our quest to fully understand the performance spectrum, but it is far from the last.

For Berlin and Kay, a society on the blue stage of language development has one more major color to learn before the rainbow's final smorgasbord of color is unleashed on them. So what will be the brown on the baseball performance spectrum? Is it catcher framing? FIELDf/x? Pitch-spin analysis? Or is it so far down the road that we can't even grasp the concept yet? After all, most people who can see only as far as blue on the rainbow just wouldn't be able to recognize the pink and purple that we all know are there.

The good news is that our eyes have been opened to the beauty and complexity of the performance spectrum, with all of its rich hues and dark shades. Even better, we are well on our way to unlocking more color before maybe, finally, glimpsing the full prism. Seeing those pinks and purples and oranges will be a wondrous thing. You think Mother Nature paints a beautiful sunset? Just wait until we have the tools to see exactly what Mike Trout and Bryce Harper can do with their brushes.