Neil Paine had a nice piece last week at FiveThirtyEight that's a basic primer on goalie variability.
In the footnotes, Neil suggested (based on this comment thread) that a replacement-level goaltender would have a .908 save percentage. In other words, if a team went looking to add a goalie on the cheap -- a free agent or waiver-wire add with a near-minimum salary -- they should expect to be able to find someone with something like .908 talent.
This seems high. As Twitter user @draglikepull observed, if you sort this season's goalies by minutes played, 15 of the top 60 had a save percentage below .908. Is it plausible that 1/4 of the goalies in the league are worse than freely available replacements?
No, it isn't, but we need to differentiate between observed performance and actual talent before we go any further.
Variability and the replacement threshold
Goaltending is variable. When the difference between elite and mediocre is as small as a save per hundred shots, it will take many hundreds of shots for us to reliably tell how good someone is -- several thousand, really.
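To put a rough number on "several thousand," here is a minimal sketch that treats every shot as an independent coin flip with the same save probability. That is a simplifying assumption, not how real shots work. The function name and the shot totals other than 630 are made up for illustration.

```python
import math

def save_pct_standard_error(talent: float, shots: int) -> float:
    """Binomial standard error of an observed save percentage.

    Assumes every shot is an independent trial with the same
    save probability (a simplification).
    """
    return math.sqrt(talent * (1.0 - talent) / shots)

# Hypothetical workloads, from a backup's season up to a multi-year sample.
for shots in (630, 1800, 5000, 10000):
    se = save_pct_standard_error(0.910, shots)
    print(f"{shots:>6} shots: +/- {se:.4f} (one standard error)")
```

Under these assumptions, one standard error at a backup's workload is already wider than the one-save-per-hundred gap between elite and mediocre, and it takes several thousand shots before the noise shrinks well below that gap.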
Let's imagine we are omniscient beings who know that a certain backup's true talent level is .910 (of course, as omniscient beings we'd know exactly how much that would be affected by his team's defense and how often his team is shorthanded and so forth, but for now let's ignore those factors).
A typical lightly-used backup might face 630 shots in a season (the average of goalies #46 through 60 in minutes played this year was 633). So we'd expect our goalie, if he performs right at his .910 talent, to finish with .910 * 630 = 573 saves.
What if he has just one fewer save -- one ping off a post and in, one shot where he gets screened, one shot where he just doesn't react as well as usual, or whatever. His 572 saves on 630 shots would be a sub-.908 save percentage.
So if Neil is right that replacement level is .908, then this goalie's 572 saves mean he had a (very slightly) below-replacement performance this year, but it doesn't mean he's a below-replacement-level goalie. He just had a (very slightly) off year.
This can happen to anyone, of course. The better a goalie is -- and the more shots he faces -- the less likely it is that random chance would have him below .908 in any given year. But the point is that we shouldn't dismiss the .908 figure just because a bunch of goalies fall below it; given how easy it is for a tiny bit of bad luck to put a competent backup below that line, we should expect to see a fair number of guys below that threshold.
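Here is a rough sketch of how easy it is to land below the line. It uses a plain binomial model (every shot independent, same save probability), which is my assumption rather than anything from Neil's piece, and the helper names are made up. The .915 and .920 talent levels are just illustrative comparisons.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k saves on n shots) at save probability p, computed in logs."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )
    return math.exp(log_pmf)

def prob_below(threshold: float, talent: float, shots: int) -> float:
    """P(observed save percentage < threshold) for a goalie of given talent."""
    return sum(
        binom_pmf(saves, shots, talent)
        for saves in range(shots + 1)
        if saves / shots < threshold
    )

shots = 630
# The one-save difference from the example above.
print(f"573 saves: {573 / shots:.4f}")
print(f"572 saves: {572 / shots:.4f}")

# Chance of finishing under .908 for a few true-talent levels.
for talent in (0.910, 0.915, 0.920):
    p = prob_below(0.908, talent, shots)
    print(f"talent {talent:.3f}: P(below .908 on {shots} shots) = {p:.2f}")
```

In this simple model the .910 goalie finishes under .908 a little less than half the time, and the chance drops as talent rises.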
Replacement-level goaltending: A first cut
So OK, I'm open to the idea that .908 is the right answer, but it still sounds high to me. Let's back up and look at where this number came from.
Let's suppose we think the average team's third-string goalie is replacement level. Some teams might have a number three goalie who's better than that and some will be worse, but on average, I think it's reasonable to say you could get a third-string goalie for darn close to free. (Obviously some of them will be prospects who carry a higher value than their current performance, but that's beside the point here.)
So what's the save percentage of a third-string goalie? Well, we can take all the goalies who played this year, sort them by minutes played, and look at the aggregate save percentage of the goalies who fall below 60th on that list. Collectively this year, those goalies stopped 5843 of 6478 shots that they faced, for a .902 save percentage.
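For clarity on what "aggregate" means here: it's total saves over total shots, not the average of each goalie's individual percentage. A small sketch follows. The function name and the three individual stat lines are hypothetical, while the 5843-of-6478 total is the one quoted above.

```python
def pooled_save_pct(stat_lines):
    """Aggregate save percentage: total saves divided by total shots.

    `stat_lines` is an iterable of (saves, shots_faced) pairs.
    """
    stat_lines = list(stat_lines)
    total_saves = sum(saves for saves, _ in stat_lines)
    total_shots = sum(shots for _, shots in stat_lines)
    return total_saves / total_shots

# The combined line for goalies below 60th in minutes, as quoted above.
print(f"{pooled_save_pct([(5843, 6478)]):.3f}")

# Hypothetical individual lines, to show pooling vs. a simple average.
lines = [(185, 209), (40, 42), (560, 615)]
simple_mean = sum(s / n for s, n in lines) / len(lines)
print(f"pooled: {pooled_save_pct(lines):.3f}  simple mean: {simple_mean:.3f}")
```

Pooling weights each goalie by his workload, so a two-game call-up doesn't count as much as a guy who played twenty.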
Historically, we've gotten to this point of the analysis and said "ok, replacement level is something like .902". But that's not quite right, because there's a selection bias at work.
Managing selection bias
Suppose our goalie who's a .910 talent gets signed to be a team's backup goalie, but butchers the start of the season.
He's still a .910 guy in the long run, but in the short run everyone has good games and bad games -- even the great Henrik Lundqvist was at just .895 over his first eight games this year, so our .910 guy is certainly capable of throwing up a putrid stretch to start the year.
And unlike Lundqvist, our .910 guy hasn't earned enough trust from the coaches to stick around when he struggles; they make a roster move of some sort and replace him after his terrible start. So at the end of the year, his stat line is, I dunno, let's say 8 games played, 185 saves on 209 shots faced, a .885 save percentage and 3.00 GAA.
When we go to work out what replacement-level goaltending is, his minimal workload makes him look like a replacement-level goalie. But really he's a .910 guy who just had a tough start -- when we count him as a .885 guy in our estimate of replacement-level goaltending, we're making the replacement-level goalies look worse than they really are.
So what do we do?
Tom Tango suggested looking at how the guys who played third-string minutes in one year did in the following year. So if our .910 goalie gets a shot again next year, he'll presumably be at around .910 on average -- obviously better some years and worse in others, but not selectively stuck down at .885 the way he was this year.
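A toy simulation shows both the bias and why the next-year trick removes it. Everything here is an assumption for illustration: every goalie is a true .910, the coach's leash is 200 shots, and anyone under .900 at that point loses his job. The numbers it prints depend entirely on those made-up settings, so don't read them as estimates of anything real.

```python
import random

TALENT = 0.910           # every simulated goalie has the same true talent
TRIAL_SHOTS = 200        # hypothetical leash before the coach decides
NEXT_SEASON_SHOTS = 630  # backup workload used earlier in the article
BENCH_BELOW = 0.900      # hypothetical cutoff for losing the job

def simulate_saves(shots, rng):
    """Number of saves on `shots` independent shots at TALENT."""
    return sum(rng.random() < TALENT for _ in range(shots))

def run(num_goalies=1000, seed=0):
    """Return (same-season, next-season) pooled save pct of benched goalies."""
    rng = random.Random(seed)
    benched_saves = benched_shots = 0
    next_saves = next_shots = 0
    for _ in range(num_goalies):
        early = simulate_saves(TRIAL_SHOTS, rng)
        if early / TRIAL_SHOTS < BENCH_BELOW:
            # Bad start: his season ends here with a tiny workload.
            benched_saves += early
            benched_shots += TRIAL_SHOTS
            # Following season: a fresh sample, not selected on results.
            next_saves += simulate_saves(NEXT_SEASON_SHOTS, rng)
            next_shots += NEXT_SEASON_SHOTS
    return benched_saves / benched_shots, next_saves / next_shots

same_year, next_year = run()
print(f"benched goalies, same season: {same_year:.3f}")
print(f"same goalies, next season:    {next_year:.3f}")
```

The same-season number is dragged down because a bad stretch is exactly what put those goalies in the low-minutes group. The next-season number isn't selected on anything, so it drifts back toward their true talent.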
Commenter Michael Cheyne did that and found that the third-string-minutes guys saw their save percentage bump up about .007 on average the next year, to a total that was .006 below the league average. So Neil noted that the league average save percentage this year was .914 and concluded replacement level would be about .006 lower, or .908.
Over the last four years, the league average save percentage has been .913, so we'd say that a goalie who was at .907 over that span was a replacement-level goalie.
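The arithmetic is just league average minus a fixed gap. As a tiny sketch (the function name is mine, and treating the .006 gap as constant from year to year is an assumption):

```python
GAP_BELOW_LEAGUE = 0.006  # next-season gap reported in the comment thread

def replacement_level(league_avg: float, gap: float = GAP_BELOW_LEAGUE) -> float:
    """Replacement-level save percentage as league average minus a fixed gap."""
    return round(league_avg - gap, 3)

print(replacement_level(0.914))  # this season's league average
print(replacement_level(0.913))  # four-year league average
```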
A quick double-check
Let's look at actual goalie performances and see if that makes sense. Here's every goalie who had a save percentage between .905 and .909 and played in at least three of those four years:
| Goalie | Games played | Shots faced | Save percentage |
One can't help but notice how many of them played for the Flyers, but that's neither here nor there. Is this really a list of replacement-level goalies? I'm not so sure.
Montoya and MacDonald seem to fit the bill, as players on one-year cheap contracts. Many of the others are less clear.
Mason and Pavelec are their teams' clear starting goalies, and both currently have contracts that reflect a perceived value much greater than replacement level. Ray Emery is a backup, but a highly-paid one, again reflecting significantly more perceived value than replacement level.
The other four are out of the NHL, so if we think they are roughly .907 talent then their availability would support setting replacement level at around .907. However, three (Kiprusoff, Hedberg, and Boucher) are at least 37 years old and the fourth (Biron) turns 37 this summer.
At that age, they're on a steep part of the aging curve -- if they averaged .907 over the last four years, they're probably appreciably below .907 now. Last year, Kiprusoff posted a .882 over 24 games and Hedberg a .883 over 19 games. Boucher only got four games (at .891) last year, but was also .881 the year before that. I don't think these are guys who teams think could step in today and post a .905.
In the end, I don't think this group's current situation is what I'd expect to find for a group of nine guys who are near replacement level talent.
Another issue is how many goalies performed considerably worse than this but remained in the NHL. Of the top 60 goalies in games played over this span, 15 had save percentages below .907. Eleven of them played in the NHL all four seasons.
Are we really prepared to say that a quarter to a third of all teams are sticking, game after game and year after year, with guys who are performing worse than a freely available replacement?
I'm not, personally. I think you have to conclude at least one of three things: a) save percentage alone is a bad way to evaluate a goalie, b) teams are bad at evaluating goalies, or c) replacement level is lower than we think.
I'm inclined to guess it's a little of all three. I think save percentage does pretty well but isn't perfect, I think teams do make some clear mistakes with goalies, and I think replacement level is probably a tick or two lower than where Tom and Michael suggested it was.