Not all shots are the same, you know.
The stat community relies heavily on shot differential measures (Corsi and Fenwick) that don't make any effort to account for the quality of those shots. But of course shot quality also has to matter a little bit, so why not try to factor it in?
Michael Parkatti has published a series of articles that led up to the creation of a stat that he calls Expected Goals. The approach is pretty simple: instead of treating all shots as equal, Michael puts a weight on each shot based on how often shots of that type and location go in.
It's something that makes a lot of intuitive sense, an idea that's come up several times over the years.
In 2006, Ken Krzywicki published a detailed regression analysis of the likelihood that any given shot would go in. Tom Awad built on that to produce a stat that he called Delta, which was almost identical to Michael's expected goals.
Tom also added corrections for situation, opponents, and teammates in DeltaSOT. A couple of years later, Sports Analytics Institute was selling something very similar to Delta to the Penguins. Michael Schuckers added in plays other than shots, and also accounted for quality of teammate and competition in his Total Hockey Rating (THoR).
So why haven't any of these stats really caught on?
Putting stats to the test
It's easy to dream up a new stat, but how do you decide whether it's meaningful? Simple intuition isn't enough -- lots of people guess wrong about which stats will be important (think about the popularity of hits, plus/minus, and goalie wins).
As we discussed recently, there's a certain standard of proof required for a new stat to catch on widely. One basic expectation is that its advocates look at whether it is a repeatable talent. This is important because it tells us how to interpret strong results. Think about the two possible scenarios:
- If the stat is highly repeatable (Corsi, for example), then a player posting good numbers for a stretch is good news -- it means they're probably genuinely good at it and we should expect them to continue to do well.
- If the stat is poorly repeatable (on-ice save percentage, for example), then a player posting good numbers for a stretch is likely bad news -- it means we should bet on them fading as their performance regresses toward the mean.
That's a pretty important difference; it's hard to be enthusiastic about a new stat before we even know whether to be excited or concerned if a player we root for is at the top of the rankings.
There's another important question for the analyst to answer: if the stat is repeatable, does it correlate with future results? The number of hits a player is credited with is a highly repeatable stat, but in every corner of the internet you can find evidence that it doesn't bear much relation to success. So readers should be skeptical of any stat that doesn't come with an assessment of correlation to future results.
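For readers who want to run those two checks themselves, here's a minimal sketch of both calculations, assuming a hypothetical player-season table; the column names (player, season, stat, goal_pct) and the toy numbers are purely illustrative, not from any real dataset.

```python
# Minimal sketch: repeatability and predictive validity from player-season data.
# Assumes one row per player per season; all names here are hypothetical.
import pandas as pd

def year_over_year_correlations(df, stat_col="stat", outcome_col="goal_pct"):
    """Pair each player's season with his following season, then report:
    - repeatability: correlation of the stat with itself the next year
    - predictive validity: correlation of the stat with next year's outcome
    """
    nxt = df.copy()
    nxt["season"] = nxt["season"] - 1          # shift so year N+1 lines up with year N
    paired = df.merge(nxt, on=["player", "season"], suffixes=("_y1", "_y2"))

    repeatability = paired[f"{stat_col}_y1"].corr(paired[f"{stat_col}_y2"])
    predictive = paired[f"{stat_col}_y1"].corr(paired[f"{outcome_col}_y2"])
    return repeatability, predictive

# Toy input, just to show the expected shape:
seasons = pd.DataFrame({
    "player":   ["A", "A", "B", "B", "C", "C"],
    "season":   [2010, 2011, 2010, 2011, 2010, 2011],
    "stat":     [55.0, 53.0, 48.0, 49.5, 51.0, 47.0],   # e.g. a Corsi% or Delta/60
    "goal_pct": [54.0, 52.0, 47.0, 50.0, 52.0, 46.0],
})
print(year_over_year_correlations(seasons))
```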
Until recently, that's never really been done for any of these stats.
Evaluating shot quality measures
Two weeks ago, Parkatti looked at this for his expected goals measure at the team level, finding that in one season expected goals became more predictive than Corsi after about 35 games. It's a really interesting article, and you should give it a read -- but be aware that Corsi significantly outperformed shot quality measures in other years.
A month ago, Schuckers looked more closely at THoR and showed that it had a higher year-over-year repeatability at the player level than Corsi. He subsequently observed that within a game, a team's total THoR was a better predictor of the outcome than their shots on goal, though not as good as Corsi.
I'm really excited to see the analysts testing their new statistics. This is a significant step forward, one that allows us to assess the utility of their metrics.
However, I think there's still a hole in the analysis.
The key question for me is "should we expect a team that does well by this metric over one period of time to outscore their opponents in the future?" And relatedly, "if the team does well by this metric with a certain player on the ice over one period of time, should we expect them to outscore their opponents in his future minutes?"
Parkatti and I answered the first one but not the second for Expected Goals. Schuckers published information from which you can infer implications about the second, but didn't directly answer either.
In this article, I'll dig into how a shot-quality measure performs in player evaluation.
A closer look at Delta
Few of these metrics have been published for public scrutiny, but Awad published four years' worth of tabulated Delta, which allows us to do the legwork to assess its utility. And by comparing it to Corsi, we can look at how much value the shot location and type data has added.
The correlation between a player's Delta one year and his Delta in the next year is 0.35. That's a modest figure -- enough that a player's Delta in one year tells us something about what to expect next year, but players obviously bounce up and down a fair bit from year to year.
If the repeatability were really high, the dots on the above plot would fall on a straight line -- once you know how a guy did in year 1, you could predict very accurately how he'd do in year 2. Here, the result isn't completely random; there's definitely a bit of a slope from bottom-left to top-right.
But it's not a strong relationship -- players who are near the bottom of the league in year 1 are all over the map in year 2, so our points scatter into more of a blob than a line.
Is Corsi more repeatable? Yes, it is; the repeatability is 0.56, meaning the points are clustered appreciably more tightly towards a line:
There's still a fair bit of spread in the data, but the dots are more tightly grouped; the blob is clearly elongated into a shape with a distinct slope to it. That's how a more repeatable stat looks -- the range of possible year 2 outcomes is smaller, so the blob is compressed to more closely resemble a line.
Of course, repeatability isn't everything. As we discussed with hit totals, a stat can be highly repeatable and still not predict future outcomes. The question is really whether factoring in shot quality leads to a stronger correlation to future scoring -- is the added information more important than the added noise?
There are lots of different ways to use the data for this assessment. Do we express Delta and Corsi as cumulative (counting) statistics or as rate statistics? Do we include all players or just ones who get significant playing time? It turns out that it doesn't matter in this case:
| Data type | Correlation of Delta to next year's goal% | Correlation of Corsi to next year's goal% |
| --- | --- | --- |
| Cumulative, all players | 0.15 | 0.17 |
| Per 60 mins, all players | 0.17 | 0.24 |
| Cumulative, >500 min TOI | 0.20 | 0.24 |
| Per 60 mins, >500 min TOI | 0.20 | 0.25 |
No matter what form we put the data in, Corsi does a better job of predicting next year's goal differential with the player on the ice than Delta does.
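For anyone who wants to run this comparison on their own data, here's a rough sketch of the four treatments in the table, assuming a hypothetical frame with each player's cumulative Delta, cumulative Corsi, time on ice, and next season's on-ice goal percentage already joined in. The column names (delta, corsi, toi_min, next_goal_pct) and the toy numbers are illustrative only.

```python
# Rough sketch: correlate each stat with next year's goal%, as a counting stat
# and as a per-60 rate, for all players and for players above a TOI cutoff.
# All column names and numbers here are hypothetical.
import pandas as pd

def correlations_by_treatment(df, min_toi=500):
    results = {}
    for stat in ("delta", "corsi"):
        rate = df[stat] / df["toi_min"] * 60            # per-60-minute version
        heavy = df["toi_min"] > min_toi                 # significant playing time only
        results[stat] = {
            "cumulative, all players":     df[stat].corr(df["next_goal_pct"]),
            "per 60, all players":         rate.corr(df["next_goal_pct"]),
            f"cumulative, >{min_toi} min": df.loc[heavy, stat].corr(df.loc[heavy, "next_goal_pct"]),
            f"per 60, >{min_toi} min":     rate[heavy].corr(df.loc[heavy, "next_goal_pct"]),
        }
    return pd.DataFrame(results)

# Toy input, just to show the expected shape:
players = pd.DataFrame({
    "delta":         [6.0, -2.0, 3.5, -5.0],
    "corsi":         [40, -15, 25, -30],
    "toi_min":       [900, 300, 1200, 700],
    "next_goal_pct": [53.0, 48.0, 51.5, 47.0],
})
print(correlations_by_treatment(players))
```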
This is because the shot quality factor in Delta has a lot of randomness in it. In fact, the variability is so bad that not only is Delta a worse predictor than Corsi of future goal differential; Delta is even a slightly worse predictor of future Delta.
Remember, the only difference between Delta and Corsi is that Delta accounts for the location and type of each shot. So if it's doing significantly worse at predicting the future, that implies that including this shot quality factor really isn't helping.
THoR is more repeatable than Corsi or Delta, but unless Schuckers has markedly improved on Krzywicki's shot quality assessment, my guess would be that this arises from the inclusion of plays like penalties and faceoffs and not from the inclusion of shot location data.
The shot quality factor just appears to add more noise than value over sample sizes of ~82 games, which is why these metrics have never really caught on.
Extracting value from shot quality
Added information should never make analysis worse. But when people try to incorporate shot quality, it often does, because they fail to account for variance.
Shot quality measures have a lot of random fluctuations, but they can add value in some instances as Parkatti and others have shown. However, to get the most out of them, they must be regressed properly -- we have to pull our estimates in towards the average so that those random fluctuations don't have a large influence on our assessments.
As long as the sample sizes are large enough that shot quality factors aren't completely dominated by randomness, including them with the proper regression will improve the quality of the analysis a bit. Here's what I'd recommend for someone who wants to put together the very best evaluations they can:
Step 1: Include shooting talent too. Shot type and location is easier to pull from the scoresheet than shooting skill, so it's what most people focus on. But shooting talent is nearly as large a factor in a player's shooting percentage as shot location is, and we might expect shot location to be an even smaller factor for on-ice shooting percentage. (The guy who plays in front of the net will have a lot of shots from in close, but if every line has one of those guys, we won't necessarily see a difference between lines in average shot distance the way we do for individuals.) So do the extra work to figure out not just what type of shots a player takes, but also whether he scores on more of those shots than the average player.
Step 2: Account for scorer bias. We know that rinks don't record shot location very accurately, and that some rinks tend to record shots as being closer than others. This can have a disastrous impact on the results -- over the four years that Tom tabulated Delta, Colorado had a middling 8.18% 5-on-5 shooting percentage and 99.8 PDO, yet they had 16 players among the top 10% in the league in Delta's shot quality factor. Without even looking, I'd be willing to bet they had a huge home/road split indicative of a biased scorer.
Step 3: Separate shot quality for and shot quality against, and separate forwards from defensemen. We know that forwards drive shooting percentage a lot more than defensemen do, and that save percentage differences are more heavily driven by variance than shooting percentage differences. So we don't want to lump everything together; some shot quality factors will need to be regressed more heavily than others.
Step 4: Regress the data. For each shot quality factor you're looking at, calculate the variance across the league and the variance contributed by simple random chance. The difference between the two is the amount of variance due to some factor of skill or usage; the smaller that is, the more you should pull each player's observed results in towards the mean to account for the role of chance (there's a sketch of this calculation after these steps).
Step 5: Show that your data is better. Now that you've gone through these steps to calculate something that should be better, don't stop here -- prove it to the reader. Dot the i's by calculating the correlation between your measure and future results and show that you've actually produced a better estimate of value.
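For the regression in Step 4, here is a minimal sketch of the kind of calculation I have in mind, assuming the shot quality factor is an on-ice shooting percentage and treating the shot-to-shot randomness as binomial. The function name and the toy numbers are mine, purely for illustration, not from any of the published metrics.

```python
# Minimal sketch of Step 4: regress an observed on-ice shooting percentage
# toward the league mean, using "observed variance = talent variance + random
# variance" to set the shrinkage weights. All names and numbers are illustrative.
import numpy as np

def regress_to_mean(observed_pct, n_shots):
    """Pull each player's observed percentage toward the league mean.

    observed_pct : observed on-ice shooting percentages (0-1 scale)
    n_shots      : number of shots behind each percentage
    """
    observed_pct = np.asarray(observed_pct, dtype=float)
    n_shots = np.asarray(n_shots, dtype=float)

    league_mean = np.average(observed_pct, weights=n_shots)

    # Variance we actually see across players, and the part of it that simple
    # binomial luck would produce on these sample sizes.
    var_observed = observed_pct.var()
    var_random = np.mean(league_mean * (1 - league_mean) / n_shots)

    # Whatever is left over is (roughly) the spread in true talent and usage.
    var_talent = max(var_observed - var_random, 0.0)

    # Each player's weight on his own observation: more shots, less shrinkage.
    weight = var_talent / (var_talent + league_mean * (1 - league_mean) / n_shots)
    return league_mean + weight * (observed_pct - league_mean)

# Toy example: a hot 12% on 100 shots gets pulled in much harder than the
# same 12% sustained over 1500 shots.
print(regress_to_mean([0.12, 0.12, 0.07], [100, 1500, 800]))
```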
I like the simplicity of just using shot differential for my writing -- it's very nearly as precise and much simpler to explain. Still, there are occasions where we want the very best accuracy we can get. In those cases, including all of the available information makes sense, but it needs to be done carefully, and readers should expect to see the result tested.