
Some numbered game open thread. Cards and Brewers
Hopefully the pitching will be better than last night.
Some Albert Observations
I must admit that I was not able to catch much of the game last night, so my commentary on it is pretty much nil. What I do have for you is what seems like an annual "What is Albert doing differently this year so far?" article. Last year I penned such an article for ESPN Insider. This year you guys will get the privilege of that article. The clear irony of the situation is that after I decided to pen this article 10 or so days ago, Pujols decided to blow up against the Cubs and 'Stros. A similar thing happened after I decided to write last year's piece (Cardinals and Albert, I'm available to be on the payroll next year if you want me to think of said article in April). All that aside, I thought it still may be interesting and insightful to dig deeper into Albert's year so far using Pitch f/x. Results after the jump.
K-Mac and His Bullpen Replacements
First, a public service announcement: Tango is running his playing time community projections over at the Book Blog. If you are so inclined, head on over and fill it out.
Now back to your regularly, errr not regularly I guess, scheduled programming.
There has already been a large amount of ink spilled on Kyle McClellan and his move to the starting rotation. Azru discussed it on this very site here, I've dealt with it at GHG here, and my colleague Andy Beard has been keeping a running account of the 5th starter battle over at GHG. That said, I think there's still a little room for discussion on the topic. I'd like to expand on Kyle's repertoire a little bit using pitch fx and see what insights that can give us into his transition. I'd also like to spend a little time on his likely replacements in the bullpen.
As I see it, there are a few primary differences between being a starter and a right handed reliever. Clearly the biggest is the need for within-outing stamina. One of the parts of stamina is being able to maintain velocity from the beginning of a start through many innings. Clearly we don't have any data on how Mac's velocity will stand up after 5 or 6 innings, but we can still look at how it has held up over his relief outings in the past 3 years.
Cards Starters and Velocity
We have all become accustomed to the Cardinals' pitching philosophy of ground ball inducing fastballs. In that context, velocity would seem to take a backseat to both movement (preferably down) and location (also down). This thought is backed up somewhat by the data, as the Cards starters ranked in the middle of the pack in both 2010 and 2009 in average fastball velocity. Despite those facts, I was curious to see what kind of impact velocity has on the Cards starting pitchers. Mike Fast showed that, in a general sense, increased velocity means increased effectiveness; however, how that result applies to individual pitchers is a separate question. In order to answer it I'm going to look at a couple of different things:
- Run value per 100 pitches (rv100) vs. fastball velocity
- How different fastball speeds affect an entire at-bat
PAH9: Scouts and Stats
Inspired by a lot of the discussion in TPG's recent posts detailing mechanics and the pushback from some stats guys, I looked at ways that the stat community could better leverage the scouting community.
Quick Aging Study
I'm a baseball guy trying my hand at a little hockey analysis. This likely falls short of the work that Gabe has already done on aging, but it does have a new wrinkle in that I look at ATOI. It's all done using the delta method with no correction for survivor bias yet.
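For readers who have not run into the delta method before, here is a minimal Python sketch of the general idea: take every player with seasons at consecutive ages, average the year-to-year change at each age, and chain those averages into a curve. The function name, the data layout, and the ATOI numbers are all hypothetical, and the plain unweighted average is an assumption (published versions often weight by playing time). Like the study described above, it makes no correction for survivor bias.

```python
from collections import defaultdict

def delta_aging_curve(seasons):
    """seasons: iterable of (player, age, value) tuples.

    Returns {age: cumulative average change}, measured relative to the
    youngest age that has a following season in the data.
    """
    by_player = defaultdict(dict)
    for player, age, value in seasons:
        by_player[player][age] = value

    # Collect the change from age N to age N+1 for every player who has both.
    deltas = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:
                deltas[age + 1].append(ages[age + 1] - value)

    # Chain the average deltas together (unweighted: an assumption).
    curve, total = {}, 0.0
    for age in sorted(deltas):
        total += sum(deltas[age]) / len(deltas[age])
        curve[age] = total
    return curve

# Hypothetical ATOI (minutes per game) values, for illustration only.
demo = [
    ("P1", 22, 14.0), ("P1", 23, 15.0), ("P1", 24, 15.5),
    ("P2", 23, 12.0), ("P2", 24, 13.5),
]
curve = delta_aging_curve(demo)
```

Only players who stick around for a following season contribute a delta, which is exactly where the survivor bias comes from.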
Colby, Fastballs and Pitch FX
Everyone's favorite robot is taking in some minor league action this weekend, so he asked if I would step in for him.
About a week and a half ago commenter stl522 asked about Colby Rasmus's swing being geared for the inside fastball and mentioned that the scouting report on him was to attack with the breaking ball. To me that feels like a two part discussion
- Is that really the scouting report?
- How well does perception match results?
| Hitter | FB% |
|---|---|
| Colby Rasmus | 54% |
| Joe Mauer | 62% |
| Brian McCann | 54% |
| Kelly Johnson | 56% |
| Carl Crawford | 61% |
| Billy Butler | 56% |
| David DeJesus | 63% |
| Shin-Soo Choo | 58% |
Comments on the table and the data relating to question 2 are after the "read more" thingy
Adam Wainwright's Curve
I did a guest post at VEB using Pitch FX to break down Waino's curve
Anatomy of a Curveball
Hello everybody, with Dan on vacation the VEB higher ups were kind enough to ask if I would be willing to pinch hit for a day. I jumped at the chance to put some analysis in front of you guys.
As a quick comment on yesterday’s game, here is a chart on run distributions comparing 2009 and 2010 actual runs scored.
Clearly these are actual data points and don’t necessarily speak to true talent; however, they do confirm the common talking points. Yes, the offense is scuffling. Yes, the Cards have been shut out or held to one run more frequently than last year. That being said, that graph even looks "flukey" (and I'd venture to say not representative of the true talent of our offense). Hopefully the guys will get it figured out.
Being the optimist that I am, I wanted to talk about something a little more uplifting though: Adam Wainwright’s curveball. I’ve always been fascinated by a good 12-6 curveball, be it Darryl Kile’s, Matt Morris’, Wainwright’s, or Carpenter's, but I haven’t put much analytical effort behind what makes them effective.
Quantifying the Impact of Defensive Uncertainty
Recently in the sabermetric community there has been a lot of discussion about fielding stats and their inclusion in WAR (see for example this thread, or this one at The Book blog) given the uncertainty behind the data (batted ball type, hit location etc.). With that in mind I thought it would be an interesting exercise to see how applying uncertainty to the defensive runs above average (DRAA) numbers affects the 2009 fWAR leaderboard. My method for applying the uncertainty is pretty simple; I just ran a Monte Carlo simulation using a normal distribution for the simulated DRAA with a mean of the DRAA reported by Fangraphs and a standard deviation of 5 runs. The following table looks at how often the top 10 players in fWAR fell into each of the top 10 slots after running the simulation 10000 times.
| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Albert Pujols | 62% | 23% | 9% | 4% | 1% | 0% | 0% | 0% | 0% | 0% |
| Ben Zobrist | 22% | 36% | 22% | 11% | 5% | 2% | 1% | 1% | 0% | 0% |
| Joe Mauer | 12% | 24% | 29% | 17% | 9% | 5% | 2% | 1% | 0% | 0% |
| Chase Utley | 3% | 8% | 16% | 23% | 19% | 13% | 9% | 5% | 3% | 1% |
| Derek Jeter | 1% | 4% | 10% | 16% | 18% | 17% | 13% | 9% | 5% | 3% |
| Hanley Ramirez | 0% | 2% | 7% | 12% | 16% | 18% | 16% | 11% | 8% | 5% |
| Evan Longoria | 0% | 2% | 4% | 10% | 14% | 17% | 17% | 14% | 9% | 6% |
| Prince Fielder | 0% | 0% | 2% | 5% | 8% | 12% | 15% | 16% | 15% | 10% |
| Ryan Zimmerman | 0% | 0% | 1% | 2% | 4% | 7% | 11% | 15% | 16% | 15% |
| Adrian Gonzalez | 0% | 0% | 0% | 1% | 3% | 6% | 8% | 13% | 16% | 15% |
So if you buy my 5 run SD assumption then the impact on ordinal ranking is the above. Clearly the impact on overall WAR (and thus $/WAR) isn't captured in the above analysis.
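For anyone who wants to play with the SD assumption themselves, here is a rough Python sketch of the kind of simulation described above. It is not the code behind the tables: the function name, the three-player input, and the 10 runs per win conversion are assumptions made for illustration, and the WAR/DRAA values are placeholders rather than Fangraphs data.

```python
import random
from collections import Counter

def rank_distribution(players, sd=5.0, runs_per_win=10.0, n_sims=10_000, seed=0):
    """players: dict of name -> (war, draa).

    Returns dict of name -> {rank: share of simulations at that rank}.
    """
    rng = random.Random(seed)
    counts = {name: Counter() for name in players}
    for _ in range(n_sims):
        sim_war = {}
        for name, (war, draa) in players.items():
            # Redraw the defensive runs around the reported value.
            sim_draa = rng.gauss(draa, sd)
            # Swap the reported defense for the simulated one
            # (runs_per_win = 10 is an assumed conversion).
            sim_war[name] = war + (sim_draa - draa) / runs_per_win
        ordered = sorted(sim_war, key=sim_war.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
    return {
        name: {rank: n / n_sims for rank, n in ctr.items()}
        for name, ctr in counts.items()
    }

# Placeholder inputs, NOT real 2009 numbers.
demo = {"Player A": (8.0, 5.0), "Player B": (7.5, 10.0), "Player C": (5.0, -3.0)}
result = rank_distribution(demo, sd=5.0, n_sims=2000, seed=1)
```

Rerunning with `sd=10.0` is the one-argument change that corresponds to widening the uncertainty.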
This is just a quick look at the subject, but I think there may be more to uncover like looking at different fielding metrics in place of UZR. Either way it answered one of my questions, "What orders of magnitude are we talking about?"
Update: Here's the same table with a SD of 10 runs
| Player | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Albert Pujols | 38% | 21% | 14% | 9% | 6% | 4% | 3% | 2% | 1% | 1% |
| Ben Zobrist | 22% | 20% | 16% | 11% | 9% | 6% | 4% | 3% | 2% | 2% |
| Joe Mauer | 15% | 17% | 15% | 12% | 10% | 8% | 6% | 4% | 3% | 2% |
| Chase Utley | 8% | 11% | 12% | 11% | 10% | 9% | 8% | 6% | 5% | 4% |
| Derek Jeter | 5% | 8% | 10% | 10% | 10% | 9% | 8% | 7% | 6% | 5% |
| Hanley Ramirez | 4% | 7% | 8% | 9% | 10% | 9% | 8% | 7% | 6% | 5% |
| Evan Longoria | 3% | 6% | 7% | 9% | 9% | 9% | 8% | 7% | 7% | 5% |
| Prince Fielder | 2% | 4% | 5% | 7% | 7% | 8% | 8% | 7% | 7% | 6% |
| Ryan Zimmerman | 1% | 2% | 4% | 5% | 6% | 7% | 7% | 7% | 7% | 6% |
| Adrian Gonzalez | 1% | 2% | 3% | 5% | 5% | 6% | 6% | 7% | 6% | 6% |
Bigger version here. The ranges represent the rank order of players by RC (i.e. 1-3 are the top 3 players for that team in current RC). I personally thought the Rockies breakdown was interesting. Another, probably more useful, chart would be to rank by preseason projection (instead of current RC) and see how the percents change
ESPN Insider TMI: Pujols' "Struggles"
I have an article up on the ESPN.com TMI blog (Insider subscription required) that deals with Pujols not exactly being Pujolsian. Of course I wrote this the morning of the 2 HR game, so he's hit 3 since its writing.
Full 5 Year Projection Data
Here's the full 5 year data behind the BtB 50 best series for all of the players that CHONE projected before the season. Would the community be interested in having something like this that is interactive, where you can change a player's wOBA and defense and his 5 year WAR automatically changes?
I realized I didn't properly explain the data I posted yesterday (thanks to a couple of readers). The data is the top ~150 position players and ~100 pitchers. It's not the top 250 players (Have to wait a bit on that as I recreate the data). There are some position players that were better than the top 100 pitchers, but not included in the list. Sorry for the confusion.
BtBs 50 Best of the Next 5 Years: Wrap Up
With the BtB top 50 of the next 5 years wrapping up I thought now was a good opportunity to do a few things
1. Display some summary stats about the list.
2. Provide the data set so that all of you can look at it however you want.
3. Address some specific players brought up in the comments that didn't make the list.
4. Open it up for discussion about improvements / tweaks you guys would like to see if/when we do this again next year
Before we get to those, here's the set of links in case you missed any of the pieces
BtBs 50 Best of the Next 5 Years - Intro and Methodology
Over the next two weeks we will unveil our (and when I say our I really mean the data's) list of the 50 Best Players of the Next Five Years. Your first question is probably, "How is this going to be any different than Fangraphs' Trade Value Series?" The simple answer is that this list will ignore contractual status. This list approaches the problem from an "all the contracts in MLB have been ripped up and we're picking teams playground style" angle. With that premise in mind, the goal was to come up with a data driven list rather than an author's opinion list. The particular data in question is 5-year projected WAR using inputs I will outline for everyone now.
For position players I used pre-season CHONE projections to derive wOBA, which I then aged over the 5 years using results from MGL's aging study. For the defensive component I used my own defensive projections when available and CHONE for players that I hadn't projected. I did not account for position switches over the course of the five years that were projected. Playing time was the most difficult component to project. Since the list is meant to be a "playground style list" it made sense to me to give all position players "starter caliber" playing time. To that end I found the average of the top ten in PA by position over the last two years. I then averaged each player's last two years of PA and regressed towards the positional average. The modeled PA was the maximum of the positional average and the regressed average. Applying that playing time across all of the WAR components leads to the overall WAR value.
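The playing time step can be sketched in a few lines of Python. This is only an illustration of the rule as described: the function name is made up, and the 50/50 regression weight is an assumption, since the amount of regression is not stated above.

```python
def modeled_pa(pa_last_two, positional_avg, regression_weight=0.5):
    """Average the last two seasons of PA, regress toward the positional
    average, then take the larger of the positional average and the
    regressed figure.

    regression_weight (share given to the positional average) is assumed.
    """
    player_avg = sum(pa_last_two) / len(pa_last_two)
    regressed = (1 - regression_weight) * player_avg + regression_weight * positional_avg
    return max(positional_avg, regressed)

# Hypothetical inputs: a part-timer and an everyday player at a position
# whose top-ten average is 650 PA.
part_timer = modeled_pa((400, 500), positional_avg=650)
everyday = modeled_pa((700, 720), positional_avg=650)
```

Because of the `max`, nobody is modeled below the positional average, which is what gives every player "starter caliber" playing time.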
For pitchers I used CHONE's context neutral ERA and an innings pitched projection that mirrors the PA projection above. The only difference is I looked at the top 80 starters and relievers to get the positional averages. For aging I leveraged this blog post from MGL. The cliff notes are that the curve is flat from 21-26 and then goes up 0.2 runs allowed per season.
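As a quick illustration of that pitcher aging curve, here is a small Python sketch. The function name and the boundary handling are assumptions: it treats the 0.2 runs as being added on the ERA scale for every season played past age 26, starting from the current-year projection.

```python
def aged_era(base_era, age, years=5, flat_until=26, step=0.2):
    """Project ERA for each of the next `years` seasons.

    Flat through age `flat_until`, then `step` runs worse per season.
    Treating the step as an ERA-scale adjustment is an assumption.
    """
    projections = []
    era = base_era
    for offset in range(years):
        season_age = age + offset
        # The first season is the projection itself; later seasons
        # decline only once the pitcher is past the flat part of the curve.
        if offset > 0 and season_age > flat_until:
            era += step
        projections.append(round(era, 2))
    return projections

young = aged_era(3.50, age=24)    # ages 24 through 28
veteran = aged_era(4.00, age=30)  # ages 30 through 34
```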
A couple caveats worth mentioning
- The projections are the mean projections and do not address the uncertainty levels. This is especially important since we are projecting a lot of young players where the uncertainty level is going to be very high.
- The same aging curves were applied to all players.
- The playing time estimations were not aged (i.e. same number of PAs/IPs over the 5 years).
Justin Morneau is sporting a wOBA right around 0.500 and plate discipline may be the driving factor. With that in mind I again steal an idea from Dave Allen. The red squares mean more swings in 2010, blue means less. Bolded numbers on the periphery are the pitch fx location values. The chart is from the catcher's perspective.
My attempt at a poor man's Dave Allen chart. It's 2010 swing percentage - 2009 swing percentage for Albert Pujols. Red means more swings, blue less. Full article here at PAH9.
Mariners at Rays BtB Series Preview
The Rays and Mariners both have teams that were projected to be phenomenal defensively. Unfortunately for the Mariners, that's about where the similarities end. The Rays have a pythagorean record of 25-9 (one better than their actual record) and were 3rd in the most recent BtB Power Rankings. On the other hand the Mariners have a pythagorean record of 14-20 (also one better than their actual) and were 25th in the BtB Rankings (one spot ahead of the Royals!)
The best place to see the divergence of the two teams is to compare the offenses. The following chart looks at projected wOBA by lineup slot for the respective teams. The lineups used were what I could glean from recent lineups, and the projections combine preseason projections with this year's data.
Clearly the Rays hold the edge (often substantially) at most spots in the order. These projections are mostly indicative of reality as well, as the Rays have scored 75 more runs than the Mariners so far and have a 44 run edge in runs created (RC).
Yanks vs Sox Preview and Discussion Thread - Updated
Hey folks, right now this is mainly a placeholder for a series preview post that will be finished up after my little one goes to sleep tonight. I'd like to start a thread like this each week where we preview one series from the upcoming week using some sabermetric principles and a simple simulation tool I've been working on developing. This week I picked the Yanks-Sox series, but I think in coming weeks I'll let you, the reader, vote on the series you want to see previewed. Ok, enough introductions. To hold you over until I get the full preview, I present a graphic courtesy of our resident graphic guru Justin Bopp.
The graph compares the lineups of the two teams using my home-brewed version of an updated CHONE projection. Sorry to have to cut this short right now, but feel free to use the comments as an open discussion thread for all things baseball, and I'll get back with the rest of the preview in a bit.
Ok the rest of the article will describe what the simulation tool does, and then look at the inputs and outputs for the Yanks - Sox series.
Updating Playoff Probabilities - CHONE
So the title is a little misleading, as I never got around to publishing the preseason CHONE based playoff probabilities. That being the case, you get a two for one on new information today: both the preseason CHONE playoff probabilities and an updated version based on games already played. I'm calculating the probabilities using a simulation I described at Fangraphs as:
The simulation is a simple Monte Carlo that determines the winner of each game using random draws bounced up against log5 based winning percentages. For example, if we want to simulate the outcome of a game between Team A that has a 0.600 true talent win percentage and Team B that has a 0.450 win percentage, we first calculate the probability that A beats B using the log5 equation linked above. That calculation says that Team A should have a 0.647 winning percentage against Team B.
To simulate a game between these teams then, the simulation draws a random number between 0 and 1 and if the number is less than or equal to 0.647 then Team A wins, otherwise Team B wins. This process is repeated for all of the games for the entire season. Run the simulation for 10,000 such seasons and you have your results. Also built into the simulation is some up front uncertainty about the true talent win percentage. Before each of the 10,000 simulated seasons, the true talent win percentages for each team are varied slightly by using a random draw from a normal distribution centered at the input win percentage (which is based off of the projected standings) with a standard deviation of 0.030. For example, some seasons the Yankees will simulate as a 0.605 team, sometimes a 0.600 team and sometimes a 0.610 team. The standard deviation was derived through testing (read trial and error) and some of the comments in this thread at The Book Blog.
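The description above translates fairly directly into code. What follows is a minimal Python sketch, not the actual simulation tool: the function names, the toy schedule, and the clamping of the perturbed talent to the 0.01 to 0.99 range are assumptions, and it reports average wins per team rather than playoff odds.

```python
import random

def log5(p_a, p_b):
    """Probability that team A beats team B under the log5 formula."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

def simulate_seasons(talent, schedule, n_seasons=10_000, talent_sd=0.030, seed=0):
    """talent: dict of team -> true talent win percentage.
    schedule: list of (team_a, team_b) games.
    Returns the average number of wins per team across simulated seasons.
    """
    rng = random.Random(seed)
    total_wins = {team: 0 for team in talent}
    for _ in range(n_seasons):
        # Up-front uncertainty: jitter each team's talent once per season.
        season_talent = {
            team: min(max(rng.gauss(p, talent_sd), 0.01), 0.99)  # clamp is assumed
            for team, p in talent.items()
        }
        for team_a, team_b in schedule:
            p_a = log5(season_talent[team_a], season_talent[team_b])
            # Draw a number in [0, 1); A wins if the draw is at or below p_a.
            winner = team_a if rng.random() <= p_a else team_b
            total_wins[winner] += 1
    return {team: wins / n_seasons for team, wins in total_wins.items()}

# The worked example from the text: a 0.600 team against a 0.450 team.
p_example = round(log5(0.600, 0.450), 3)

# Toy 18-game schedule between two hypothetical teams.
avg_wins = simulate_seasons({"A": 0.600, "B": 0.450}, [("A", "B")] * 18, n_seasons=2000)
```

Updating for games already played would amount to passing only the remaining schedule and adding each team's current wins to the simulated totals.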
All that was needed to do the updating was an updated schedule and each team's current wins. For this iteration I kept the preseason true talent levels; however, in the future I would like to adjust those either using updated in-season projections or some weighted mean with the BtB power rankings.
Tango's Playing Time Survey
Tango is asking fans to fill out a community playing time projection for their teams. If you want to help out and haven't already, click the link above.
A Couple of Questions for Sig Mejdal
Sig Mejdal, the lead quantitative analyst for the Cards, was kind enough to answer a few questions about his job for me over at PAH9. Thought the readership here would be interested as well.
Playoff Probabilities Simulation - PECOTA Edition
With some of the projection systems also coming out with projections of the standings now's probably a good time to attach some playoff probabilities to the projections. I'll start with PECOTA.
My methodology is fairly simple. I created a Monte Carlo simulation that modeled team wins as a normal random variable with a mean of the projected wins and a variable standard deviation (not variable within a set of simulation runs but across unique sets of runs).
I ran the simulation using a 9 win SD, an 8 win SD, and an 8 win SD with the caveat that total team wins had to fall within projected + or - 20.
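Here is a small Python sketch of that setup, for illustration only. The names are hypothetical, the plus or minus 20 cap is implemented by simply redrawing out-of-range values (one of several ways to do it, and an assumption here), and teams are drawn independently, so league-wide wins are not forced to balance. It also only picks a division winner; a wild card step would have to be added on top.

```python
import random

def simulate_wins(projected, sd=9.0, max_dev=None, rng=random):
    """Draw a win total from a normal centered on the projection.

    If max_dev is given, redraw until the total is within
    projected +/- max_dev (rejection sampling: an assumed approach).
    """
    while True:
        wins = rng.gauss(projected, sd)
        if max_dev is None or abs(wins - projected) <= max_dev:
            return wins

def division_title_odds(division, sd=9.0, max_dev=None, n_sims=10_000, seed=0):
    """division: dict of team -> projected wins. Returns team -> title share."""
    rng = random.Random(seed)
    titles = {team: 0 for team in division}
    for _ in range(n_sims):
        sim = {team: simulate_wins(w, sd, max_dev, rng) for team, w in division.items()}
        titles[max(sim, key=sim.get)] += 1
    return {team: n / n_sims for team, n in titles.items()}

# Hypothetical three-team division, 8 win SD with the +/- 20 cap.
odds = division_title_odds({"A": 95, "B": 81, "C": 70}, sd=8.0, max_dev=20,
                           n_sims=5000, seed=3)
```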
The results using a 9 win standard deviation are after the jump and a spreadsheet containing all of the results is linked at the end.
BtB Sabermetric Writing Award Results: Best Sabermetric Research or Writing Website
One of the beauties of the sabermetric community is the ongoing dialog amongst the vast majority of contributors. In that context, the community needs forums to have that dialog, and there's no better forum than the various sabermetric blogs/websites in existence. With that in mind we present the Best Sabermetric Research or Writing Website which was defined as
The best sabermetric blog or website of the year. Define "best" as you wish, though it should be focused on writing and/or research contributions.
Enough babbling by me... your winners are (drum roll please)....
Experimenting With Clustering - Offense
This post originated out of me asking myself, "Self, if you were going to delve into the world of projecting offense, how would you go about it?" My answer was that I’d take a basic Marcels approach and add in some additional regression/weighting based on batted ball (plus a little extra) profiles. That approach would require me to bin players based on batted ball profiles, so I immediately thought of k-means clustering using R. The rest of this post is my brief exploration of batted ball profile clustering.
Using Fangraphs' 2009 stats (filtered to just the qualifiers), I created clusters based on the following sets of statistics.
| Set | Statistics |
|---|---|
| 1 | LD, GB, FB, IFF |
| 2 | LD, GB, FB, IFF, HR |
| 3 | LD, HR, BB |
| 4 | HR, BB, K |
| 5 | GB, FB, ISO, SPD |
| 6 | BB, K |
IFF = Infield Fly, HR = HR/FB%
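To show what clustering on one of these stat sets looks like mechanically, here is a bare-bones k-means (Lloyd's algorithm) in plain Python. To be clear, the post used R; this is a swapped-in sketch, not that code. The four hitters and their LD/GB/FB/IFF percentages are invented, the farthest-first seeding is a simplification chosen to keep the result deterministic, and no scaling of the stats is done, which a real analysis would probably want.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    # Seed: first point, then repeatedly the point farthest from all seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Invented batted ball profiles: (LD%, GB%, FB%, IFF%).
profiles = {
    "Hitter A": (19.0, 50.0, 31.0, 5.0),
    "Hitter B": (20.0, 48.0, 32.0, 6.0),
    "Hitter C": (18.0, 35.0, 47.0, 12.0),
    "Hitter D": (17.0, 33.0, 50.0, 14.0),
}
labels, centroids = kmeans(list(profiles.values()), k=2)
clusters = dict(zip(profiles, labels))
```

In this toy input the two ground ball hitters land in one cluster and the two fly ball hitters in the other.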
The full lists of clusters can be found here, and I’ll discuss some of the things I found interesting after the jump
Free Agent $/Win Based on Playoff Probability Added
There has been a lot of discussion this off season dealing with spending relative to where a team is on the win curve. My goal here is simply to do some back of the envelope math to set some ranges on the dollar values that teams should pay for a win based on where they are on the win curve.
ASSUMPTIONS
- Teams should pay a different cost per win based on how valuable that win will be to them. For this analysis I'm using playoff probability added (PPA) to define valuable. If you disagree with either of these then the rest of the article is probably not for you.
- I used $4.4M per win as the average market value. Changing this assumption wouldn't change the shape of the curves, just the peaks and valleys.
- The model needs some type of salary floor in order to more closely model reality and take into consideration off the field issues. I'll be setting the floor as a percentage of the maximum suggested salary (i.e. if I set the floor at 75%, then the teams on the low and high end of the win curve will be modeled at no less than 75% of the $/win of the teams at the inflection point). I'll create curves for multiple values as I am uncertain what this number should be.
- This analysis uses historical playoff probabilities for projected/3rd order wins as the guiding metric. The next step would be to use division strength to assist in the PPA calculations.
- Since PPA is the guiding metric, improving a high win team for performance in the playoffs themselves is ignored. (The Crapshoot Corollary perhaps?)
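A back of the envelope version of these assumptions can be sketched in Python. Treat it loosely: the function name and the PPA values by win total are made up, and scaling so that an average-PPA win costs $4.4M is one possible reading of the second assumption, not necessarily the calculation used for the curves.

```python
def dollars_per_win(ppa_by_wins, avg_cost=4.4e6, floor_pct=0.75):
    """ppa_by_wins: dict of projected wins -> playoff probability added
    by one extra win at that point on the win curve.

    Scales cost with PPA so that an average-PPA win costs avg_cost
    (assumed scaling), then applies a floor set as a share of the peak.
    """
    mean_ppa = sum(ppa_by_wins.values()) / len(ppa_by_wins)
    raw = {wins: avg_cost * ppa / mean_ppa for wins, ppa in ppa_by_wins.items()}
    floor = floor_pct * max(raw.values())
    return {wins: max(value, floor) for wins, value in raw.items()}

# Hypothetical PPA curve peaking near the playoff bubble.
ppa_curve = {70: 0.005, 80: 0.02, 88: 0.05, 95: 0.02, 100: 0.005}
costs = dollars_per_win(ppa_curve, floor_pct=0.75)
```

Rerunning with different `floor_pct` values gives the family of curves that the floor assumption calls for.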
Sabermetric Primer (an assortment of links)
In a thread yesterday there was a request for a sabermetrics primer, so I thought I'd take the lazy way out and just link some of the extraordinary work people have done in other places. I'll break this into two primary sections 1) A Saber 101 set of links for those that want to understand sabermetrics better 2) A Saber 201 for those that want to start doing some sabermetric research on their own.
Saber 101 Links
1) Alex Remington's series at yahoo sports. I must admit I haven't read through the entire series, but it has been highly recommended by people I respect a great deal. Up to this point he's covered the following
- BABIP Batting Average on Ball in Play
- OPS+ Adjusted On Base Plus Slugging compared to league average (the B-Ref way)
- FIP Fielding Independent Pitching
- wOBA Weighted On Base Average
- WPA Win Probability Added
- WAR Wins Above Replacement
Alex's series isn't done, so I'll update with additional links as he continues
2) Michael Jong's Sabermetrics 101 blog at fanhuddle. I especially recommend the piece on Linear Weights. Throughout his articles and glossaries there are a ton of other very good links.
3) Tango's stuff. Tango just recently answered sets of questions from folks that aren't convinced that sabermetrics is all it's cracked up to be. Both are good introductions to various topics, and create good discussion. First were ten questions from Mike Silva, and then there was a set of questions from a BCB member. Additionally, Tango hosts a wiki that would be a good resource to peruse.
4) Fangraphs' value series. Dave Cameron describes how Fangraphs goes about calculating WAR for hitters and pitchers, including defining replacement value and looking at position adjustments.
5) Pitch F/X. Our very own vivaelpujols had a great primer on it as a fanshot on this very site.
I'm sure there's a bunch more that folks will add in the comments, but this will be plenty to get you started.
Now for the Saber 201 stuff.
You probably need 2 things to start doing your own sabermetric research 1) data 2) analytical tools and I'll try to provide a set of links for both. First the data question
1) Fangraphs can provide a lot of data that the aspiring saberist needs. It's got its version of WAR, wOBA, UZR, pitch type linear weights, batted ball profiles, various projection systems, and even summary type pitch f/x data. It's a great place to start while you're getting your feet wet.
2) Rally's historical WAR data. Want to compare Pujols to Musial? Here's where to start. You can purchase the whole database in csv format or search out the guys you want to look at for free. A lot of the Hall of Fame analysis on the saber side has been done using Rally's data.
3) An actual database. Colin describes the process for a PC, and Sky for a Mac. These methods both require you to learn SQL along the way, but are very valuable tools of the trade if you want to do any sort of complex querying of your data.
4) Pitch F/X. For the non-database inclined, you can get individual game information from Brooks Baseball or do some more complex querying using this new tool and get an excel type output. [As RJ points out in the comments, I missed an important source here]. The Texas Leaguers tool gives you the opportunity to generate reports for a specific pitcher. For those that want their own database, follow vivaelpujols' primer found here (make sure to read through the comments as Mike Fast comes by to help out).
That's probably enough (or even too much) to get you started. Now you need some tools to do the analysis
1) Spreadsheets. Probably the most basic tool in the toolbox of the saberist. Use excel or open office versions, whatever floats your boat. Most places allow you to download excel friendly data, so it's a fairly seamless transition.
2) Statistical packages. If excel doesn't have enough horsepower to do what you want, then there are open source statistical packages that you can download and use. Both R and gretl are good places to start. R is more powerful, but has a slightly steeper learning curve. Gretl is a little easier, but not as powerful (I'm going on others' opinions here as I haven't extensively used either: some R, not much gretl at all).
I think that's all I've got for now. Feel free to add your own links in the comments.
Also, I should mention H/Ts all around, notably Tango, the BtB crew (I grabbed a bunch from the nominations in the sabermetric writing awards).
The Braves Off Season - Playoff Probability Edition
The Atlanta Braves have had a busy (and sometimes controversial) offseason as they revamped their bullpen, subtracted an arm from their surplus of starters, and picked up a few average-ish bats. The individual moves have been covered in depth by our outstanding group of writers here at BtB (Vasquez, Soriano) and by some of the heavy hitters in the sabr community (Cameron, Tango). My goal in this article is to wrap all of the moves together and examine the offseason as a whole. My metric of choice is playoff probability added/subtracted, and I am going to look at two separate approaches to estimate the gains/losses. First I will leverage the work Nick did here that looked at historic playoff probabilities given certain true talent level wins. Second I will take a more contextual look at the problem by seeing how the moves affected the division race as it currently stacks up.
Background Work
Critical to both approaches is an estimate of the Braves' true talent level in terms of wins both before the moves and after the moves. I used current CHONE projections for offense and pitching, my own defensive projections, and playing time estimates based on the Fangraphs fans' projections to come up with WAR projections. Using this method I estimated 92.8 wins before the moves and 89.4 wins after them.
The results are after the jump