
brackenthebox
Apr 17, 2008 Jun 02, 2012 9 1767
I'm a graduate student in Boston, studying computational biology. I moved here in 2004 and have hated all Boston sports with a passion barely matched by my love of STL teams ever since.
A vignette:
In my fall 2004 biomechanics class we spent a full 1.5 hr lecture discussing Curt Schilling's tendon.
a fan of
St. Louis Cardinals
St. Louis Rams
Liverpool
Tweaking FIP: now with even less defense!
Here's some work I did recently on a modification of FIP that has even less dependence on defense and luck. The result is a metric with better year-to-year correlation with itself, and better predictive power for ERA.
A lot of the comparison built off of a BTB post from earlier this year, so I just wanted to say thanks and to share the link for anyone who's interested.
29 days ago
brackenthebox
1 comment
1 recs
Tweaking FIP: now with even less defense!
Here's some work I did recently on a modification of FIP that has even less dependence on defense and luck. The result is a metric with even better year-to-year correlation with itself, and better predictive power for ERA.
29 days ago
brackenthebox
8 comments
7 recs
If anyone's looking for a Torty shirt, here's one that I pulled together. Enjoy.
8 months ago
brackenthebox
5 comments
5 recs
Alt-text for completeness: "Also, all financial analysis. And, more directly, D&D."
It's unclear whether this is disheartening or comforting, but it feels at home around here. Linky.
Ran across this church while traveling in Bavaria. I knew Albert was royalty, but I didn't realize he'd been canonized.
A run scored vs. a run saved
There's been some discussion of late about the often-repeated wisdom that "a run saved is equal to a run earned." For just about any reasonable decision-making process, I think that's a pretty safe rule to follow. Pedantic jerk that I am, however, I did a small amount of work to show how a simple and commonly used model (Pythagorean win expectation) doesn't actually subscribe to the "run saved = run scored" statement. Before going any further, I want to point out that the results I show here rely on a trust in Pythagorean expectation that is entirely unreasonable given the likely (in)accuracy of Pythagorean expectation. But why would I let that stop me?
Pythagorean expectation is a simple way to predict a team's winning percentage as a function of runs scored and allowed. There are plenty of places to read about it, but Wikipedia is probably good enough if you've never heard of it before. The resulting equation is very simple:
winning percentage = 1/(1+(RA/RS)^2)
where RA is the number of runs allowed and RS is the number of runs scored. The top left heatmap in the image shows the expected number of wins as a function of any reasonably achievable pair of runs scored and runs allowed. Hopefully the results of that plot aren't surprising: if you score the same number of runs as you allow, you have an expectation of winning 81 games (white); as you score more runs than you allow, you win more than 81 games (red) and vice versa if you are outscored (blue).
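For anyone who wants to poke at the numbers, here is a minimal Python sketch of the formula above. The function names, the 162-game season length, and the sample run totals are my own choices for illustration, not anything taken from the original plots.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage: 1 / (1 + (RA / RS) ** exponent)."""
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)


def expected_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a season of `games` games (162 is an assumption)."""
    return games * pythag_win_pct(runs_scored, runs_allowed)


# Illustrative run totals only; these are not real team seasons.
for rs, ra in [(800, 800), (900, 800), (800, 900)]:
    print(f"RS={rs} RA={ra} -> {expected_wins(rs, ra):.1f} expected wins")
```

A team that scores exactly as many runs as it allows comes out at 81 wins, matching the white diagonal in the heatmap.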
The rest of the plots show the results of a very simple experiment. Let's say that we are comparing the impact of upgrading the team by 10 runs. For simplicity's sake, let's pretend it's the AL and I'm deciding between upgrading a pitcher (and reducing my runs allowed by 10) or upgrading at DH (and increasing my runs scored by 10). The top right and lower left plots show the expected change in number of wins for these two options.
As you probably know, 10 runs generally means about 1 more win over the course of the season (white in both plots). For middle-of-the-road teams that score and allow around 800 runs, the 10 runs = 1 win rule holds up pretty well. For teams that score a lot of runs already, adding an extra 10 runs on offense doesn't help as much (blue region of the top right plot). Conversely, for teams that already allow fewer runs than is typical, saving 10 more runs is actually worth more than a win (red region of the bottom left plot).
The main point of this exercise is summarized in the bottom right plot. Here I've plotted the difference in wins added if you score 10 more runs rather than allowing 10 fewer. As you can see, for teams that score about the same as they allow, it doesn't really matter where you add the extra runs (white region on the diagonal). For good teams, however (ones that score more than they allow), a run saved is actually more valuable than a run scored (blue region). In contrast, for bad teams, a run saved is less valuable. To reiterate, Pythagorean expectation probably isn't accurate enough for these results to be meaningful when making real roster decisions, but if you trust it completely, good teams should get more bang for their buck by adding pitchers and defensive wizards, whereas bad teams should be targeting hitters (assuming all else is equal).
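The 10-run experiment can be sketched in a few lines of Python. This is a rough reconstruction under my own assumptions (a 162-game season, an exponent of 2, and made-up run totals), not the code behind the plots. As a sanity check, differentiating the formula suggests that a run saved is worth roughly RS/RA times a run scored, which points the same way as the bottom right plot.

```python
def expected_wins(rs, ra, games=162):
    """Pythagorean expected wins with exponent 2 (season length assumed)."""
    return games / (1.0 + (ra / rs) ** 2)


def wins_added_by_scoring(rs, ra, delta=10):
    """Change in expected wins from scoring `delta` more runs."""
    return expected_wins(rs + delta, ra) - expected_wins(rs, ra)


def wins_added_by_saving(rs, ra, delta=10):
    """Change in expected wins from allowing `delta` fewer runs."""
    return expected_wins(rs, ra - delta) - expected_wins(rs, ra)


def scoring_minus_saving(rs, ra, delta=10):
    """Positive: the bat helps more. Negative: the run prevention helps more."""
    return wins_added_by_scoring(rs, ra, delta) - wins_added_by_saving(rs, ra, delta)


# Hypothetical teams: average, good (outscores opponents), and bad.
for label, rs, ra in [("average", 800, 800), ("good", 900, 700), ("bad", 700, 900)]:
    print(f"{label:8s} +10 RS: {wins_added_by_scoring(rs, ra):.2f}  "
          f"-10 RA: {wins_added_by_saving(rs, ra):.2f}  "
          f"difference: {scoring_minus_saving(rs, ra):+.2f}")
```

Under these assumptions the good team gains more from saving runs and the bad team gains more from scoring them, while the average team sees almost no difference.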
43 comments
9 recs
Alright, whose work is this?
Unless this Corky Ramone meme has gotten further than I've realized, it must be a VEB member.
The clutchiness of the home team
In the comments to this post about the results of his/her simulator, Xeifrank recently mentioned that, due to the rules of baseball, the most likely outcomes for closely matched teams have the home team winning by a single run. In particular, even if the away team is favored, the most likely single scores still involve the home team winning by one run, because the scores in which the away team wins are spread more evenly. This is basically a result of the home team not tacking on additional runs in the 9th inning or later when they win (since the game ends). I asked if this trend is visible in real games and not just the simulations, and, on his/her suggestion, decided to take a look for myself. (Spoiler alert: the answer is yes.)
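To see the mechanism in isolation, here is a toy Python simulation. To be clear, this is not Xeifrank's simulator: the per-half-inning run distribution is invented for illustration, both teams are identical, and walk-off wins are simplified so that the home team scores exactly as many runs as it needs (which ignores walk-off home runs with runners on base).

```python
import random

RUNS = [0, 1, 2, 3, 4]
# Assumed chance of scoring 0-4 runs in a half inning; not fit to real data.
WEIGHTS = [0.73, 0.15, 0.07, 0.03, 0.02]


def half_inning(rng):
    """Draw the runs scored in one half inning."""
    return rng.choices(RUNS, weights=WEIGHTS)[0]


def simulate_game(rng):
    """Return (home_runs, away_runs) for one game between identical teams."""
    home = away = 0
    inning = 1
    while True:
        away += half_inning(rng)
        if inning >= 9 and home > away:
            return home, away          # home team skips its last at-bat
        runs = half_inning(rng)
        if inning >= 9:
            # Simplified walk-off: play stops once the winning run scores.
            runs = min(runs, away - home + 1)
        home += runs
        if inning >= 9 and home != away:
            return home, away
        inning += 1


def one_run_share(n_games=20000, seed=0):
    """Fraction of each side's wins that come by exactly one run."""
    rng = random.Random(seed)
    home_wins = home_one = away_wins = away_one = 0
    for _ in range(n_games):
        h, a = simulate_game(rng)
        if h > a:
            home_wins += 1
            home_one += int(h - a == 1)
        else:
            away_wins += 1
            away_one += int(a - h == 1)
    return home_one / home_wins, away_one / away_wins
```

Calling `one_run_share()` returns the one-run share of home wins and of away wins. Even with no home field advantage built in, the home side's share should come out larger, since every win in the bottom of the 9th or later is by a single run under this simplification.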
I took as my data the scores of all of the games from 2009 (through last night, 9/24), yielding a total of 2288 games. Overall, the home team had a winning percentage of .546, in line with the average over the history of baseball of 54%. To be honest, I wasn't aware prior to this analysis that home field advantage was actually that strong in baseball (even though it is much weaker than in most sports, from what I understand). In any event, here's a heatmap of the joint histogram of final scores:

Because the home team already has the slight edge, the results aren't as surprising in this case, but the most common scores all had the home team winning by one run (4-3, 3-2, 5-4). Because the home team wins more games overall, it's a little hard to interpret the above plot in terms of the one-run bias for which we are looking. One way to account for this is to normalize the home team and away team victories separately. The result is a plot that shows the probability of a given score, assuming that the home team wins (upper left triangle) or that the away team wins (lower right triangle). The resulting plot is:

If there were no bias towards one-run games, the above plot would look symmetric about the diagonal. While the effect isn't particularly strong, there clearly is an asymmetry in the data. Outcomes of 4-3 and 3-2 are more likely for the home team than the away team, meaning that home teams win a higher percentage of games by these scores than away teams do. To offset this, the away team has more density further away from the diagonal. Note, for example, that 3-1, 4-1, and 5-1 victories are enriched in away team victories relative to home team victories. There also seems to be enrichment for away teams blowing out the home team (10+ runs vs 1-6 runs), but that might just be noise.
Finally, I wanted to look at the same data but focusing on the margin of victory instead of the exact score. This plot shows the total winning percentage of home teams given a particular margin of victory (but not conditioned on a home team victory):
This plot really shows the dramatic difference in one-run wins. In one-run games, the home team won almost 61% of the time; that's equivalent to a record of 99-63. In contrast, the winning percentage in 2+ run games is closer to 52% (84-78). On average, obviously, this still comes out to the 54% winning percentage that home teams have overall. While I didn't demonstrate it here, this bias towards 1-run home-team victories is likely a result of the rules of baseball, and not some psychological lift that the home team gets in close games. This idea is supported by the fact that it still appears in the simulator, which obviously doesn't have any psychology or anything of the sort built in.
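The margin-of-victory breakdown is easy to reproduce from a list of final scores. Here is a small Python sketch, assuming the games are available as (home runs, away runs) pairs; the function names and the handful of toy scores are hypothetical and are not the 2009 data.

```python
from collections import defaultdict


def home_win_pct_by_margin(games):
    """Map each margin of victory to the home team's winning percentage.

    `games` is an iterable of (home_runs, away_runs) final scores.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for home, away in games:
        if home == away:
            continue                  # finished games have no ties
        margin = abs(home - away)
        totals[margin] += 1
        if home > away:
            wins[margin] += 1
    return {m: wins[m] / totals[m] for m in sorted(totals)}


def as_162_game_record(win_pct, games=162):
    """Translate a winning percentage into a (wins, losses) season record."""
    wins = round(win_pct * games)
    return wins, games - wins


# Toy scores for illustration only.
toy_games = [(4, 3), (3, 2), (2, 3), (1, 5), (6, 2)]
print(home_win_pct_by_margin(toy_games))
print(as_162_game_record(0.61), as_162_game_record(0.52))
```

The second helper is just the arithmetic behind the records quoted above: 61% and 52% of 162 games round to 99-63 and 84-78.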
So, what's the point of all of this? As my title alludes to, one way to view this result is as a caution against selection bias. Winning percentage in one-run games is often thrown around as some measure of how "clutch" a team is. While I know I'm preaching to the choir, this is just one example of how such discrepancies in results can arise without any human element at all. Next time you hear about how well a team has performed in one-run games, I'd at least take a look at how many of those games were won on their home turf.
22 comments
14 recs
Here's a little php script I wrote to randomly generate combinations of C--- R--- names to represent that center fielder whose name we can never quite remember. The script outputs a png which means it can be embedded as an image (e.g. in comments) and will show a different combination every time the image is reloaded. Just use http://cbblitz.com/cr.php as the link in an image to embed it.
I'm pulling from a pretty small list of names right now, but if anyone has names they'd like added, just add them in a comment to this FanShot, and I'll update the script.
Enjoy.

