
slash12
Nov 10, 2009 May 22, 2012 7 18
A 31-year-old software developer in the Chicagoland area, with a wife and a 6-month-old son. I have a strong passion for fantasy baseball, and I'm enjoying learning more and more about the numbers behind the game.
Website: The Fantasy Baseball Advantage
A fan of the Chicago Cubs.
Context Neutral Run and RBI projections
Projecting a player's runs and RBIs is a pain, because both are largely considered contextual. If you've got good hitters batting in front of you, you're going to get more RBI opportunities; if you've got good hitters batting behind you, you're going to get more run opportunities.
The problem is that context changes often. A team that previously stunk may have a guy or two break out, and suddenly another player is thrust into a situation where he can generate more runs. A key guy might get injured, traded, or simply moved around in the lineup.
For this reason, I've been working on a way to project a player's runs and RBIs based on his skills alone. I've tweaked my process a bit over the years, and here's the method I've found works best:
xRuns = HR - .218 + .191 * (BB - .333 * CS) + .273 * (HBP + 1B - .666 * CS) + .363 * 2B + 1.366 * 3B + .505 * SB
To summarize: extra-base hits generate more runs, with triples generating bonus runs (because they indicate a speedy player who's going to score on more singles than other guys), and net stolen-base gains also improve your chances of scoring.
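As a quick illustration, here is the xRuns formula transcribed into a small Python function. The function name, the parameter names, and the sample stat line are hypothetical (made up for illustration, not a real player's season); the inputs are assumed to be season counting stats.

```python
def x_runs(hr, bb, cs, hbp, singles, doubles, triples, sb):
    """Context-neutral run estimate, transcribed from the xRuns formula.

    Inputs are season counting stats. Caught stealing (CS) is charged
    against both the walk term and the single/HBP term, as in the formula.
    """
    return (
        hr
        - 0.218
        + 0.191 * (bb - 0.333 * cs)
        + 0.273 * (hbp + singles - 0.666 * cs)
        + 0.363 * doubles
        + 1.366 * triples
        + 0.505 * sb
    )


# Hypothetical stat line, for illustration only.
print(round(x_runs(hr=30, bb=60, cs=8, hbp=5, singles=110,
                   doubles=30, triples=3, sb=20), 1))  # prints 95.8
```

Note that a home run counts as a full run on its own (the batter always scores himself), which is why HR carries a weight of 1 while the other events carry fractional weights.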
xRBIs = 2 * HR + .640 + .004 * (BB + HBP) + .234 * 1B + .427 * (2B + 3B)
Again, hits generate RBIs, with extra-base hits generating more. Home runs generate bonus RBIs because you're knocking yourself in as well as anyone on base, and home run hitters also generate more sacrifice flies.
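The xRBI formula translates the same way. This is a sketch with hypothetical function and parameter names, reusing the same made-up stat line as an example:

```python
def x_rbi(hr, bb, hbp, singles, doubles, triples):
    """Context-neutral RBI estimate, transcribed from the xRBI formula."""
    return (
        2 * hr  # the batter drives himself in, plus an average of one more
        + 0.640
        + 0.004 * (bb + hbp)
        + 0.234 * singles
        + 0.427 * (doubles + triples)
    )


# Hypothetical stat line, for illustration only.
print(round(x_rbi(hr=30, bb=60, hbp=5, singles=110,
                  doubles=30, triples=3), 1))  # prints 100.7
```

The walk and HBP weight is tiny (.004), which fits the idea that a walk only produces an RBI with the bases loaded.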
Let’s look at some sample results from my 2012 projections:
name xRuns xRBI
Kemp 109 108
Ellsbury 103 97
Bautista 100 113
Bautista 100 113
Kemp 109 108
P Sandoval 89 105
Ellsbury is an interesting one on this list. Sitting in the #1 hole, he historically had very few RBIs, but this system picked him for an RBI increase last year based on his budding power (and his lineup position changed to fit his new skill set).
Sandoval is also an interesting inclusion: he's shown budding HR and 2B power, and a change in his context could put him in line for a lot of RBIs.
Obviously this method is not perfect; context does exist. I just find that it's so fluid throughout the year that it's fun to ignore it and project based on a batter's skills. I find this particularly satisfying in fantasy baseball, because it's a fun way to identify breakout players. A player with budding power (Ellsbury, Granderson) will eventually have his lineup position improved to take advantage of that power. These are two guys I specifically drafted last year based on my projections, and both had their context improve to match their skills.
2011 Pitcher BABIP calculator
We all know that pitcher BABIP is a difficult thing to predict. In fact, we've got metrics like FIP, xFIP, and tRA to help mitigate its unpredictability. For certain uses, however, it's beneficial to pay attention to it. Fantasy baseball is one case where FIP doesn't necessarily do us a lot of good; there, we'd rather get an idea of what a pitcher's actual ERA is going to look like. To that end, I've developed a way to predict a pitcher's BABIP from a few other statistics that a pitcher has some control over.
Specifically, I've taken 3 years of team data to estimate BABIP for each batted ball type. This helps factor in things like a slow infield, a high outfield wall, and other park-based factors. It also factors in infield defense (on ground balls) and outfield defense (on outfield fly balls). Using 3 years of data isn't perfect, as teams change over time, but I think you'll find that it does a pretty good job. Another limitation of my implementation is that I'm assuming all IFFBs are outs, since I don't have a statistic for the BABIP of infield fly balls.
So when is this useful? It's extremely useful when you've got a pitcher with a small sample of data (or none at all) with a particular team. When a pitcher switches teams, this gives you a fairly good idea of how his BABIP will be affected. For instance, a ground ball pitcher will be helped greatly by moving to a more ground-ball-friendly environment (better park, better defense).
Let's use an example to illustrate. Let's say Ricky Nolasco gets traded to the Rays. Using his career batted ball profile, the calculator gives him a .305 BABIP with his current team. Switch his team to the Rays, and suddenly he's a .291 BABIP pitcher. Now let's delve into why this happened. Ground balls for Rays pitchers have a .230 BABIP, while for the Marlins it's .252. Outfield fly balls with the Rays are at .127, while with the Marlins they're at .147. A lot of this is probably due to the Rays having a better defensive team, but park factors could come into play as well. The bottom line: Nolasco's batted ball luck should improve with a change in teams, and with the calculator we can make a good guess at by how much.
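The spreadsheet's exact formulas aren't reproduced here, so the following is only a sketch of the idea as described above. It assumes the calculator weights each batted ball type by the team's BABIP on that type, treats infield fly balls as outs, and removes home runs from the denominator (they aren't balls in play). The GB and outfield fly ball rates for the Rays and Marlins are the ones quoted above; the line drive rates and the pitcher's batted ball profile are hypothetical placeholders, so the output will not reproduce Nolasco's .305/.291.

```python
from dataclasses import dataclass


@dataclass
class TeamRates:
    """Team BABIP by batted ball type (3-year team data in the real calculator)."""
    ld: float     # BABIP on line drives
    gb: float     # BABIP on ground balls
    of_fb: float  # BABIP on outfield (non-infield) fly balls


def pitcher_babip(ld_pct, gb_pct, iffb_pct, hr_fb_pct, team):
    """Estimate BABIP from a batted ball profile and a team's rates.

    ld_pct and gb_pct are shares of all batted balls; iffb_pct and hr_fb_pct
    are shares of fly balls (FanGraphs convention). FB% is whatever remains.
    """
    fb_pct = 1.0 - ld_pct - gb_pct
    hr = fb_pct * hr_fb_pct
    iffb = fb_pct * iffb_pct      # assumed to always be outs
    of_fb = fb_pct - hr - iffb    # fly balls that stay in the park and leave the infield
    hits = ld_pct * team.ld + gb_pct * team.gb + of_fb * team.of_fb
    balls_in_play = 1.0 - hr      # home runs are not balls in play
    return hits / balls_in_play


# GB and OF fly ball rates are from the text; the .720 LD rates are hypothetical.
rays = TeamRates(ld=0.720, gb=0.230, of_fb=0.127)
marlins = TeamRates(ld=0.720, gb=0.252, of_fb=0.147)

# Hypothetical profile: 19% LD, 44% GB, 10% IFFB, 10% HR/FB.
for name, team in (("Marlins", marlins), ("Rays", rays)):
    print(name, round(pitcher_babip(0.19, 0.44, 0.10, 0.10, team), 3))
```

With identical line drive rates, the whole gap between the two teams comes from the ground ball and outfield fly ball rates, which mirrors the explanation in the Nolasco example.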
How about another quick, more relevant example: Matt Garza. With the Rays, he comes out at about a .270 BABIP. With his move to the Cubs, he's shown as a .287 BABIP pitcher, still well below league average, but not quite as elite as with the Rays. This, of course, isn't the entire picture of the move to the Cubs. His strikeouts and walks will probably improve, and there could be a change in his HR/FB ratio as well. That's all beyond the scope of this particular article, but still important to keep in mind.
How to Use it:
Step 1: Using one of the links below, download the spreadsheet.
Open Office Link: http://www.mediafire.com/?hme601igqqu1215
Excel Link: http://www.mediafire.com/?z0vk84vhr5ug23r
Step 2: Open the spreadsheet, and input the LD%, GB%, IFFB%, and HR/FB% for your pitcher (these stats are easily obtained from www.fangraphs.com)
Step 3: Set the pitcher's team, using the following lookup table:
ARI -> Diamondbacks
ATL -> Braves
BAL -> Orioles
BOS -> Red Sox
CHC -> Cubs
CWS -> White Sox
CIN -> Reds
CLE -> Indians
COL -> Rockies
DET -> Tigers
FLA -> Marlins
HOU -> Astros
KCR -> Royals
LAA -> Angels
LAD -> Dodgers
MIL -> Brewers
MIN -> Twins
NYM -> Mets
NYY -> Yankees
OAK -> Athletics
PHI -> Phillies
PIT -> Pirates
SDP -> Padres
SEA -> Mariners
SFG -> Giants
STL -> Cardinals
TBR -> Rays
TEX -> Rangers
TOR -> Blue Jays
WSN -> Nationals
Predicting HR/FB Rates
A big part of knowing how a pitcher should do the following year is knowing what his HR/FB rate will look like. It's understood in the sabermetrics community that a pitcher's HR/FB rate is mostly out of his control; that is, it's mostly a function of luck and park factors.
There are equations out there that try to normalize ERA based on a league-average home run rate. This is not an accurate way to predict a future HR/FB rate: someone pitching in a home-run-friendly ballpark is obviously going to allow more home runs than someone pitching in a park that suppresses them.
Likewise, some equations take park factors into account as well. This is better, but it's still not perfect, because there are player-based factors to consider too. Consider Ryan Howard switching from the NL to the AL. Now a pitcher in the AL has to face Ryan Howard a few games a year, and Howard's likelihood of launching the ball over the fence is much higher than the average player's. Now consider a change like that being made to one of your division opponents, or better yet, consider that every opponent's roster is likely to change quite a bit. In reality, this is the case, and it probably accounts for a lot of the year-to-year variance in pitchers' HR/FB rates.
I've attempted to determine an HR/FB rate for each ballclub. In theory, plugging this estimate in for each pitcher on a given club should give you a good idea of what his HR/FB rate will look like next year.
I did this by putting together a sample of data and running some statistics against it. First, I determined that weighting the previous 3 years at 100, 66, and 33 respectively (most recent first) yielded the best results; the relevance of HR/FB rate seems to fall off the further back you go. Then I took a group of pitchers who had a significant number of innings pitched and had played for the same club for the previous 3 years. Using this data, I tried to determine what the most accurate way to predict 2009 HR/FB would have been, using 2008 and older data.
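The weighting step can be sketched as a weighted average of three seasons of HR/FB rates, most recent first, using the 100/66/33 weights. How individual seasons are pooled into a club-level rate isn't spelled out here, so this hypothetical helper simply takes one HR/FB rate per season; the sample numbers are made up.

```python
def weighted_hr_fb(rates, weights=(100, 66, 33)):
    """Weighted average of HR/FB rates, most recent season first."""
    if len(rates) != len(weights):
        raise ValueError("need exactly one rate per weight")
    total = sum(w * r for w, r in zip(weights, rates))
    return total / sum(weights)


# Hypothetical club HR/FB rates (%) for 2009, 2008, 2007.
print(round(weighted_hr_fb([12.0, 9.0, 6.0]), 2))  # prints 10.01
```

Because the weights are normalized by their sum (199), a club with the same rate every season gets exactly that rate back; the weights only matter when the seasons differ.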
My conclusion was that a pitcher's 2008 HR/FB was a poor predictor, and his previous 3-year average was equally poor. My "club factor" proved to be significantly more accurate at predicting 2009 HR/FB.
Now, without further ado, here are the predicted 2010 HR/FB rates (%) by ballclub:
Brewers 11.54
Yankees 11.46
Astros 11.34
Orioles 11.29
Nationals 11.13
Phillies 11.04
Blue Jays 10.95
Tigers 10.81
Rays 10.78
Rangers 10.51
Diamondbacks 10.3
Indians 10.26
Marlins 10.24
Rockies 10.21
White Sox 10.2
Twins 9.96
Mariners 9.91
Padres 9.88
Cubs 9.86
Royals 9.8
Cardinals 9.79
Pirates 9.71
Angels 9.62
Red Sox 9.36
Braves 9.1
Mets 9.06
A's 8.81
Giants 8.77
Dodgers 8.62
Pitchers batted ball observations
I ran some comparisons of year to year data for pitchers, and came across some interesting observations that I thought I would share.
First off, I ran year to year correlations on batted ball data, and found the following correlations:
GB%: .75
FB%: .75
IFFB%: .24
LD%: .22
Conclusions: Pitchers have a great deal of control over their GB% and FB%, and very little control over their LD% and IFFB%.
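For anyone who wants to reproduce year-to-year numbers like these, the calculation is a Pearson correlation between each pitcher's rate in one season and his rate in the next. A minimal sketch, assuming the seasons have already been paired up pitcher by pitcher; the sample values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical GB% for the same five pitchers in consecutive seasons.
gb_year1 = [0.52, 0.41, 0.47, 0.38, 0.45]
gb_year2 = [0.50, 0.43, 0.45, 0.40, 0.47]
print(round(pearson(gb_year1, gb_year2), 2))
```

The same function works for the cross-stat correlations below (for example LD% against GB% within a season); only the pairing of the inputs changes.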
Next I did correlations to LD%:
GB%: -.31
Conclusion: Interestingly, there is a modest negative correlation between LD% and GB%. It would appear that ground ball pitchers induce fewer line drives, while fly ball pitchers tend to induce more. Inducing fewer line drives helps lower a ground ball pitcher's BABIP.
Next I did correlations to IFFB%:
FB%: .56
GB%: -.52
Conclusion: A higher percentage of a fly ball pitcher's fly balls go for infield fly balls (automatic outs). Thus, fly ball pitchers tend to lower their BABIP by inducing more infield fly balls.
Lastly, I did correlations to HR/FB%:
GB%: .08
FB%: -.10
Conclusion: It would appear that ground ball pitchers tend to post a slightly higher HR/FB rate than fly ball pitchers. I'm guessing this is related to the higher infield fly rate that fly ball pitchers tend to post. It is a pretty weak correlation, but an interesting one nevertheless.
Batted ball types year to year correlation
Sticking with my batted ball theme, I ran some year to year correlations of various batted ball statistics, and figured I would share the results:
GB%: .8117
FB%: .7804
HR/FB: .7414
IFH%: .5587
LD%: .3419
IFFB%: .1726
This seems to tell us that a hitter has strong control over his ground ball and fly ball tendencies, as well as his home run rate. To a lesser extent, a hitter controls his IFH% as well, while line drive percentage and IFFB% seem to be much less under the batter's control. Incidentally, IFFB% and LD% also make up the largest part of a player's BABIP. So a hitter not only has to worry about how lucky he has been with his hits falling between defenders; it would appear that whether or not he consistently makes solid contact is also fairly luck-based.
Which statistics relate to Line Drives?
In an effort to track down what makes a player hit for a better BABIP, I decided to run a correlation on a large number of statistics, as they pertain to LD%. I received the following results:
Fastball%: .23
Contact%: .23
BB/K: .19
wFastball/100: .17
HR: .11
Hits: .10
wIFFB: -.45
FlyBall%: -.28
wHR/FB: -.20
SL: -.20
ISO: -.15
O-Swing%: -.14
K: -.14
CT: -.13
IFH: -.12
CB: -.11
Now for some observations about the results, many of which are probably obvious to a lot of people, but which I found interesting.
First off, it seems there is a relationship between the pitches you are thrown and your LD%. If you're thrown more fastballs, you'll hit more line drives, while if you're thrown more sliders, curveballs, and cutters, you'll hit fewer. I found this interesting, because it's another factor that's really not in the batter's control, but will ultimately affect his BABIP.
It also seems there is a negative correlation between power (HR/FB and ISO) and line drive %. This seems to imply that if you're swinging for the fences, you're going to hit fewer line drives. At the same time, hitting more line drives can actually increase your SLG, because the added hits you'll get from those line drives outweigh the extra bases you've lost.
It would appear that a higher contact percentage tends to go hand in hand with higher line drive numbers as well. Along with this, swinging at fewer pitches outside the strike zone (a lower O-Swing%) tends to yield more line drives. So pitch selection plays a role in your ability to hit line drives as well.
I've found that IFFB% also has a very strong relationship to BABIP, and at the same time, IFFB% has a strong negative relationship with LD%. That is to say, hitting line drives and hitting fewer infield fly balls go hand in hand (and together make up the largest part of your BABIP).
I just figured I would share, and see if it sparked any interesting discussion. Up next: I'm going to do a correlation with IFFB% itself.
A new xBABIP calculator
I've been a big fan of The Hardball Times xBABIP calculator over the last 6 months or so, but there were a couple of things I didn't like about it. The first was having to enter exact numbers for ABs, HRs, etc. When dealing with projections, I much prefer to work in percentages; with percentages, it's much easier to see what a player's BABIP should be for a partial season, a span of several years, or a career. I'm also not so sure about the inclusion of stolen bases as a statistic.
I'm a big fan of the FanGraphs website, which provides a wide array of batted ball data for each player. I found that BABIP is very strongly determined by a combination of LD%, GB%, FB%, IFFB%, HR/FB%, and IFH% (that is, as much as BABIP can be determined by anything). This closely follows what The Hardball Times uses, except that I'm dealing strictly with percentages, and I've substituted IFH% for SBs. It's worth noting that I'm not taking ballpark factors into account (which surely have some effect on BABIP as well).
I came up with my numbers by plotting a large amount of data (3 years' worth of individual player statistics) and running a multi-variable regression analysis on it (I'm not sure if that's the right wording or not; I have no formal training in statistical analysis, just some stuff I've picked up).
Here's the equation I came up with:
xBABIP = 0.391597252 + (LD% * 0.287709436) + ((GB% - (GB% * IFH%)) * -0.151969035) + ((FB% - (FB% * HR/FB%) - (FB% * IFFB%)) * -0.187532776) + ((IFFB% * FB%) * -0.834512464) + ((IFH% * GB%) * 0.4997192)
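Here is the same equation as a Python function, which makes the structure easier to see: ground balls are split into infield hits and everything else, and fly balls are split into home runs, infield flies, and outfield flies. The function name and the sample profile are hypothetical. I've assumed the percentages are entered as decimals (.195 rather than 19.5), with IFFB% and HR/FB% as shares of fly balls and IFH% as a share of ground balls, matching how FanGraphs reports them; with decimals, a typical profile lands near a normal BABIP.

```python
def x_babip(ld, gb, fb, iffb, hr_fb, ifh):
    """xBABIP regression, transcribed from the equation above.

    All inputs are decimals. iffb and hr_fb are shares of fly balls;
    ifh is a share of ground balls.
    """
    non_ifh_gb = gb - gb * ifh            # ground balls that weren't infield hits
    of_fb = fb - fb * hr_fb - fb * iffb   # fly balls minus home runs and infield flies
    return (
        0.391597252
        + ld * 0.287709436
        + non_ifh_gb * -0.151969035
        + of_fb * -0.187532776
        + (iffb * fb) * -0.834512464
        + (ifh * gb) * 0.4997192
    )


# Hypothetical profile: 19% LD, 44% GB, 37% FB, 10% IFFB, 10% HR/FB, 7% IFH.
print(round(x_babip(0.19, 0.44, 0.37, 0.10, 0.10, 0.07), 3))  # prints 0.313
```

The large negative weight on infield flies and the positive weight on infield hits are what make speed and contact quality show up in the result.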
Here's a published view of a spreadsheet showing it in action:
http://spreadsheets.google.com/ccc?key=0AuaVTUnZda7fdFVpY2NoRC1zS1p0UlNPaDlVdlRhN1E&hl=en
Here's a download of the spreadsheet in OpenOffice format (forgive the lame hosting service; I wasn't sure where to upload it):
http://www.filefactory.com/file/a1a2d5a/n/public_xBABIP_Calculator_ods
I've been using this calculator (along with a number of other equations) to build my own projections for 2010, and here are a few of the interesting things I've noticed.
First off, LD% has a very strong correlation to BABIP (not exactly a revolutionary statement), but it also seems very hard to project. There's a lot of luck built into it, so even a career LD% still reflects some luck; I tend to trend it closer toward the league average (19.5%).
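Trending LD% toward the league average can be sketched as a simple weighted blend. How far to regress isn't specified here, so the weight below is an arbitrary placeholder, and the helper is hypothetical:

```python
LEAGUE_AVG_LD = 0.195  # league-average LD% cited above, as a decimal


def regress_ld(player_ld, weight=0.5, league_ld=LEAGUE_AVG_LD):
    """Blend a player's LD% toward the league average.

    weight is the share kept from the player's own rate; 0.5 is an
    arbitrary placeholder, not a fitted value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * player_ld + (1.0 - weight) * league_ld


print(round(regress_ld(0.24), 4))  # prints 0.2175
```

A larger sample (say, a full career) would justify a higher weight than a partial season.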
GB% is a little easier to predict. A higher GB% tends to yield a higher BABIP, but that depends on your IFH% as well. A player who posts a high IFH% with a lot of ground balls will greatly increase his BABIP, while a slow player with a terrible IFH% and a high GB% won't increase his BABIP nearly as much (makes sense).
FB% is also typically easier to predict than LD%, and a high FB% tends to yield a lower BABIP, since fly balls are more likely to be recorded as outs. But you've got to look at HR/FB and IFFB% as well to get an accurate picture. A player who hits a ton of fly balls but has a very high HR/FB rate and a very low IFFB% (Ryan Howard) can post more respectable BABIPs (his fly balls have a better shot of landing if they're getting out of the infield).
HR/FB is also a little easier to predict, and doesn't directly affect your BABIP; it's only used to take the home runs out of your fly balls (which in turn helps your BABIP). One thing that strikes me as problematic here is line drive home runs, since the equation only removes home runs from fly balls.
IFFB% seems somewhat player controlled, but also has a large luck component to it from year to year (probably largely due to sample size). This has a definite impact on your BABIP, as fly balls on the infield are automatic outs.
IFH% seems very speed-dependent. The more infield hits you have, the higher your BABIP. This can vary from year to year with luck, but speedy players will generally post better numbers (there are a few notable exceptions, like Jason Bay's abnormally high IFH%, which I chalk up to some luck). I'm sure ballpark factors play a role here as well (which I'm not accounting for).
So in the end, what we get is a way to take numbers directly from FanGraphs (over the course of a career, a full season, or even a partial season) and get a decent idea of what a player's BABIP should be, and how lucky he has been. As always, BABIP will still vary a lot from year to year (and BA, OBP, and SLG along with it), but this is an attempt to identify the middle value that a given player's BABIP fluctuates around. Outside of using a calculator like this one or The Hardball Times', the next best way to evaluate BABIP is probably to look at a player's career numbers, but even those are prone to being skewed by lucky streaks.
I'm very interested in any feedback or critique anyone has to offer, or any ideas for improving it. I've also got a number of other calculators (for batting average, xHR, xR, xRBI, xSB, xAvg, xOBP, and xSLG) that I'd be willing to throw out there as well, but I figured before I went through the trouble, I'd see what kind of buzz I get from this one.