- Joined: May 5, 2010
- Last Login: Oct 14, 2021, 10:21am EDT
- Posts: 43
- Comments: 6,064
We’re gonna get to this, somewhere around part 4 or 5.
I give this a lot more points if Chenoweth is wearing a Yankees cap at the end.
Appreciate the kind words. Regarding BBA — would be open to other suggestions for what to call this thing…
Oh, for the second one about umpiring errors, I don't think it exists.
The components are definitely there already. There are good catcher framing metrics available now on both FanGraphs and Baseball Prospectus, and that’s the data that would be needed to do it: for each pitch, what was the probability of it being called a strike vs a ball, what was it called, and then the translation of that into run values.
But I don’t think anyone is actually folding that into their pitching value metrics. Which yes, that means that there’s something being double counted. When a pitcher gets a strikeout on a called strike that probably should have been a ball, the pitcher gets full credit for the run value of the strikeout and also the catcher gets partial credit for the run value of the strikeout.
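To make the arithmetic concrete, here is a rough sketch of that per-pitch calculation. Everything in it is an assumption for illustration: the `CalledPitch` fields, the strike probabilities, and the run values are made up, and this is not how FanGraphs, BP, or anyone else actually computes it. It also lumps catcher framing and umpire noise into one number, so splitting the credit between them would be a separate step.

```python
from dataclasses import dataclass


@dataclass
class CalledPitch:
    p_strike: float          # modeled chance this pitch gets called a strike (hypothetical input)
    called_strike: bool      # what the umpire actually called
    strike_run_value: float  # runs saved for the pitching side by a strike instead of a ball (hypothetical)


def call_luck_runs(pitches):
    """Runs the pitching side gained from calls, relative to what the model expected."""
    total = 0.0
    for p in pitches:
        actual = 1.0 if p.called_strike else 0.0
        total += (actual - p.p_strike) * p.strike_run_value
    return total


# Two made-up called pitches: a borderline ball that went the pitcher's way,
# and a likely strike that got called a ball.
pitches = [
    CalledPitch(p_strike=0.25, called_strike=True, strike_run_value=0.12),
    CalledPitch(p_strike=0.80, called_strike=False, strike_run_value=0.10),
]
luck = call_luck_runs(pitches)
```

Subtracting a number like `luck` from a pitcher's run value would be one way to avoid crediting the same called strike twice, assuming the framing portion is already being handed to the catcher.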
But if anyone in the public domain is doing this, it would be Jonathan Judge and Co. with their DRA- over at BP. What goes into that is complicated and somewhat opaque (not a criticism, per se; I think it’s really good work they’re doing, but it’s harder to evaluate what exactly it is than a FIP/xFIP type thing), so I wouldn’t swear that they aren’t.
The other place where it is being incorporated is with the pitching metrics Ethan Moore created and that Eno is always quoting (Pitching+/Stuff+/Command+). Those metrics don’t know or care whether an individual pitch was called a ball or a strike, so they’re building from the ground up in a way that’s not affected by SSS umpire luck.
In the raw data set, it’s actually broken out by each number of outs. I first tried to graph and analyze it that way, with separate columns for 6, 7, 8, etc. outs. That would be truer to what’s actually going on, but in the end I just couldn’t ever turn that into something visually appealing that a human could actually understand. After that, I did play around with those buckets, so I might be guilty of p-hacking a little bit, but I settled on 7-12 outs because I do think it’s representing a particular type of pitcher. In most cases, 7 outs means he’s coming out for a third inning after completing two, and 12 means that he’s completed four innings, and that number of up/downs was what I was trying for (I’m missing out on players who came out for their third inning but got no outs, and I’m erroneously including players who came out for their fifth inning but got no outs).
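A quick sketch of that 7-12 outs filter, mostly to show where the two edge cases come from. The function names are made up for illustration, and it assumes the pitcher entered at the start of an inning, which is the "in most cases" caveat.

```python
def in_bulk_bucket(outs, low=7, high=12):
    """True if an appearance falls in the 7-12 outs bucket described above."""
    return low <= outs <= high


def min_innings_started(outs):
    """Lower bound on innings started, assuming a clean-inning entry: ceil(outs / 3)."""
    return -(-outs // 3)


# 7 outs: two full innings plus one out, so at least a third up/down.
# 12 outs: four full innings.
# 6 outs: could be a pitcher who came back out for a third inning and got nobody out,
#         but outs alone cannot tell us that, so he is missed.
# 12 outs: could also be a pitcher who came out for a fifth and got nobody out,
#          and he is included anyway.
checks = {outs: (in_bulk_bucket(outs), min_innings_started(outs)) for outs in (6, 7, 12, 13)}
```

Outs recorded only gives a floor on innings started, which is why both errors are invisible in this kind of data.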
As for why I chose to focus on 3-4 up/downs, it’s partly arbitrary, but I do think it’s a rough way to identify a type. There are lots of pitchers who commonly go either 1 IP or 2 IP, but almost never 3 IP, and then it’s a different group who go 2 IP or more based on situation, and those don’t have a ton of overlap with the group that’s usually going at least 5 IP. But also the Rays have been more fluid than that over the course of the season, and it’s especially messy at the 4/5 IP level and at the 2 IP level.
I was using Baseball Reference for this, but I think that if I were working off of Retrosheet data, there’s probably a way to target the pitching types more precisely, both by sorting based on schedule (how many rest days do they commonly have between appearances) and also with some sort of clustering algorithm to help group pitchers based on how often they work which lengths over a set period of time (shorter than the whole season, because roles change over the whole season).
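Something like the following is what I have in mind for the inputs to that grouping. It is only a sketch: the rows are invented rather than pulled from Retrosheet, the bucket cutoffs (6 and 12 outs) are my own guesses, and it stops at building the per-pitcher features without doing the clustering itself.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import median

# (pitcher, game date, outs recorded), made-up rows for one short window
appearances = [
    ("A", date(2021, 6, 1), 3),
    ("A", date(2021, 6, 3), 6),
    ("A", date(2021, 6, 4), 3),
    ("B", date(2021, 6, 1), 9),
    ("B", date(2021, 6, 6), 12),
    ("B", date(2021, 6, 11), 8),
]


def length_bucket(outs):
    """Hypothetical cutoffs: up to 2 IP, 7-12 outs, and anything longer."""
    if outs <= 6:
        return "short"
    if outs <= 12:
        return "bulk"
    return "long"


def pitcher_features(rows):
    """Share of appearances in each length bucket plus median rest days, per pitcher."""
    by_pitcher = defaultdict(list)
    for name, day, outs in rows:
        by_pitcher[name].append((day, outs))

    features = {}
    for name, games in by_pitcher.items():
        games.sort()
        counts = Counter(length_bucket(outs) for _, outs in games)
        shares = {b: counts[b] / len(games) for b in ("short", "bulk", "long")}
        # Rest days = full days off between consecutive appearances.
        rests = [(later[0] - earlier[0]).days - 1 for earlier, later in zip(games, games[1:])]
        features[name] = {**shares, "median_rest": median(rests) if rests else None}
    return features


features = pitcher_features(appearances)
```

With vectors like these for each pitcher over a few weeks at a time, any standard clustering method could be tried on top, and the window could slide through the season to catch role changes.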
That’s a long and not entirely coherent answer, sorry, but I think it’s what you’re asking.