stevesommer05

Jul 18, 2009 Mar 30, 2012 37 360

Viva El Birdos Some numbered game open thread. Cards and Brewers


Hopefully the pitching will be better than last night.

533 comments  | 

Viva El Birdos Some Albert Observations

I must admit that I was not able to catch much of the game last night, so my commentary on it is pretty much nil.  What I do have for you is what seems like an annual "What is Albert doing differently this year so far?" article.  Last year I penned such an article for ESPN Insider.  This year you guys get the privilege.  The clear irony of the situation is that after I decided to pen this article 10 or so days ago, Pujols decided to blow up against the Cubs and 'Stros.  A similar thing happened after I decided to write last year's piece (Cardinals and Albert, I'm available to be on the payroll next year if you want me to think of said article in April).  All that aside, I thought it still might be interesting and insightful to dig deeper into Albert's year so far using Pitch f/x.  Results after the jump.

Continue reading this post »

69 comments  |  5 recs | 

Viva El Birdos K-Mac and His Bullpen Replacements

First a public service announcement, Tango is running his playing time community projections over at the Book Blog. If you are so inclined head on over and fill it out.

Now back to your regularly, errr, not regularly I guess, scheduled programming.

There has already been a large amount of ink spilled on Kyle McClellan and his move to the starting rotation. Azru discussed it on this very site here, I've dealt with it at GHG here, and my colleague Andy Beard has been keeping a running account of the 5th starter battle over at GHG. That said, I think there's still a little room for discussion on the topic. I'd like to expand on Kyle's repertoire a little bit using Pitch f/x and see what insights that can give us into his transition. I'd also like to spend a little time on his likely replacements in the bullpen.

As I see it, there are a few primary differences between being a starter and a right-handed reliever. Clearly the biggest is the need for within-outing stamina. One part of stamina is being able to maintain velocity from the beginning of a start through many innings. Clearly we don't have any data on how Mac's velocity will stand up after 5 or 6 innings, but we can still look at how it has held up over his relief outings in the past 3 years.

Continue reading this post »

723 comments  |  8 recs | 

Viva El Birdos Cards Starters and Velocity

Dave Duncan instructs Adam Wainwright to throw sinking fastballs down in the zone.

We have all become accustomed to the Cardinals' pitching philosophy of ground ball inducing fastballs.   In that context, velocity would seem to take a backseat to both movement (preferably down) and location (also down).  This thought is backed up somewhat by the data, as the Cards starters ranked in the middle of the pack in both 2010 and 2009 in average fastball velocity.  Despite those facts, I was curious to see what kind of impact velocity has on the Cards starting pitchers.  Mike Fast showed that, in a general sense, increased velocity means increased effectiveness; however, how that result applies to individual pitchers is a separate question.  In order to answer it I'm going to look at a couple of different things:

 

  1. Run value per 100 pitches (rv100)  vs. fastball velocity
  2. How different fastball speeds affect an entire at-bat

Continue reading this post »

424 comments  |  9 recs | 

Inspired by a lot of the discussion in TPG's recent posts detailing mechanics, and the pushback from some stats guys, I looked at ways that the stat community could better leverage the scouting community.

over 1 year ago stevesommer05 0 comments 2 recs

I'm a baseball guy trying my hand at a little hockey analysis. This likely falls short of the work that Gabe has already done on aging, but it does have a new wrinkle in that I look at ATOI. It's all done using the delta method with no correction for survivor bias yet.

over 1 year ago stevesommer05 2 comments

Viva El Birdos Colby, Fastballs and Pitch FX

Everyone's favorite robot is taking in some minor league action this weekend, so he asked if I would step in for him.

About a week and a half ago commenter stl522 asked about Colby Rasmus's swing being geared for the inside fastball and mentioned that the scouting report on him was to attack with the breaking ball.   To me that feels like a two part discussion 

  1. Is that really the scouting report?
  2. How well does perception match results?
I'll make an attempt to provide data/insight into both questions and hopefully we'll be able to come up with some answers.  

To truly answer the first question I'd need access to scouts and unfortunately I have none (I find it hard to get access to scouts never leaving my mother's basement and all).  As a proxy to actual scouts I thought the best course of analysis would be to compare how many fastballs Colby is seeing compared to other notable LH hitters with similar wOBAs.  The following table summarizes that research

Hitter FB%
Colby Rasmus 54%
Joe Mauer 62%
Brian McCann 54%
Kelly Johnson 56%
Carl Crawford 61%
Billy Butler 56%
David DeJesus 63%
Shin-Soo Choo 58%

Comments on the table and the data relating to question 2 are after the "read more" thingy

Continue reading this post »

220 comments  |  7 recs | 

I did a guest post at VEB using Pitch FX to break down Waino's curve

almost 2 years ago stevesommer05 0 comments

Viva El Birdos Anatomy of a Curveball

Hello everybody, with Dan on vacation the VEB higher ups were kind enough to ask if I would be willing to pinch hit for a day.  I jumped at the chance to put some analysis in front of you guys.

As a quick comment on yesterday’s game, here is a chart on run distributions comparing 2009 and 2010 actual runs scored.

[Chart: run distributions, 2009 vs. 2010 actual runs scored]

Clearly these are actual data points and don’t necessarily speak to true talent; however, they do confirm the common talking points.  Yes, the offense is scuffling.  Yes, the Cards have been shut out or held to one run more frequently than last year.  That being said, the graph even looks "flukey" (and I'd venture to say it is not representative of the true talent of our offense).  Hopefully the guys will get it figured out.

Being the optimist that I am, I wanted to talk about something a little more uplifting though: Adam Wainwright’s curveball.  I’ve always been fascinated by a good 12-6 curveball, be it Darryl Kile’s, Matt Morris’, Wainwright’s, or Carpenter's, but I haven’t put much analytical effort behind what makes them effective.

Continue reading this post »

634 comments  |  8 recs | 

Beyond the Box Score Quantifying the Impact of Defensive Uncertainty

Recently in the sabermetric community there has been a lot of discussion about fielding stats and their inclusion in WAR (see for example this thread, or this one at The Book blog) given the uncertainty behind the data (batted ball type, hit location etc.). With that in mind I thought it would be an interesting exercise to see how applying uncertainty to the defensive runs above average (DRAA) numbers affects the 2009 fWAR leaderboard. My method for applying the uncertainty is pretty simple; I just ran a Monte Carlo simulation using a normal distribution for the simulated DRAA with a mean of the DRAA reported by Fangraphs and a standard deviation of 5 runs. The following table looks at how often the top 10 players in fWAR fell into each of the top 10 slots after running the simulation 10000 times.

 

Player 1 2 3 4 5 6 7 8 9 10
Albert Pujols 62% 23% 9% 4% 1% 0% 0% 0% 0% 0%
Ben Zobrist 22% 36% 22% 11% 5% 2% 1% 1% 0% 0%
Joe Mauer 12% 24% 29% 17% 9% 5% 2% 1% 0% 0%
Chase Utley 3% 8% 16% 23% 19% 13% 9% 5% 3% 1%
Derek Jeter 1% 4% 10% 16% 18% 17% 13% 9% 5% 3%
Hanley Ramirez 0% 2% 7% 12% 16% 18% 16% 11% 8% 5%
Evan Longoria 0% 2% 4% 10% 14% 17% 17% 14% 9% 6%
Prince Fielder 0% 0% 2% 5% 8% 12% 15% 16% 15% 10%
Ryan Zimmerman 0% 0% 1% 2% 4% 7% 11% 15% 16% 15%
Adrian Gonzalez 0% 0% 0% 1% 3% 6% 8% 13% 16% 15%

 

So if you buy my 5 run SD assumption then the impact on ordinal ranking is the above.  Clearly the impact on overall WAR (and thus $/WAR) isn't captured in the above analysis.

 

This is just a quick look at the subject, but I think there may be more to uncover like looking at different fielding metrics in place of UZR.  Either way it answered one of my questions, "What orders of magnitude are we talking about?"

Update:  Here's the same table with a SD of 10 runs

Player 1 2 3 4 5 6 7 8 9 10
Albert Pujols 38% 21% 14% 9% 6% 4% 3% 2% 1% 1%
Ben Zobrist 22% 20% 16% 11% 9% 6% 4% 3% 2% 2%
Joe Mauer 15% 17% 15% 12% 10% 8% 6% 4% 3% 2%
Chase Utley 8% 11% 12% 11% 10% 9% 8% 6% 5% 4%
Derek Jeter 5% 8% 10% 10% 10% 9% 8% 7% 6% 5%
Hanley Ramirez 4% 7% 8% 9% 10% 9% 8% 7% 6% 5%
Evan Longoria 3% 6% 7% 9% 9% 9% 8% 7% 7% 5%
Prince Fielder 2% 4% 5% 7% 7% 8% 8% 7% 7% 6%
Ryan Zimmerman 1% 2% 4% 5% 6% 7% 7% 7% 7% 6%
Adrian Gonzalez 1% 2% 3% 5% 5% 6% 6% 7% 6% 6%
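
For anyone who wants to try the exercise themselves, here is a minimal sketch of the kind of simulation described above. It is not my original code. The player names and WAR/DRAA inputs are made-up placeholders, and the 10 runs per win conversion is an assumption I'm using to turn a change in DRAA into a change in WAR.

```python
import random
from collections import Counter


def rank_distribution(players, sd=5.0, runs_per_win=10.0, n_sims=10000, seed=0):
    """Estimate how often each player lands in each rank slot.

    players: dict of name -> (reported WAR, reported DRAA).
    sd: standard deviation (in runs) applied to the defensive number.
    runs_per_win: assumed conversion from runs to wins (not from the post).
    """
    rng = random.Random(seed)
    counts = {name: Counter() for name in players}
    for _ in range(n_sims):
        simulated = {}
        for name, (war, draa) in players.items():
            # Draw a defensive value centered on the reported DRAA,
            # then shift WAR by the difference, converted to wins.
            sim_draa = rng.gauss(draa, sd)
            simulated[name] = war + (sim_draa - draa) / runs_per_win
        ordered = sorted(simulated, key=simulated.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
    return {
        name: {rank: n / n_sims for rank, n in cnt.items()}
        for name, cnt in counts.items()
    }


# Hypothetical inputs, purely for illustration.
example = {"Player A": (8.0, 2.0), "Player B": (7.0, 10.0), "Player C": (5.0, -3.0)}
shares = rank_distribution(example, sd=5.0)
```

Rerunning with `sd=10.0` gives the flatter kind of distribution shown in the update table, since a bigger defensive error bar lets more players trade places.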

8 comments  | 

[Chart: RC by player rank range, by team]

Bigger version here. The ranges represent the rank order of players by RC (i.e. 1-3 are the top 3 players for that team in current RC). I personally thought the Rockies breakdown was interesting. Another, probably more useful, chart would be to rank by preseason projection (instead of current RC) and see how the percentages change.

almost 2 years ago stevesommer05 0 comments

I have an article up on the ESPN.com TMI blog (Insider subscription required) that deals with Pujols not exactly being Pujolsian. Of course I wrote this the morning of the 2 HR game, so he's hit 3 since its writing.

almost 2 years ago stevesommer05 9 comments 1 recs

Here's the full 5 year data behind the BtB 50 best series for all of the players that CHONE projected before the season. Would the community be interested in having something like this that is interactive, where you can change a player's wOBA and defense and his 5 year WAR automatically changes?

almost 2 years ago stevesommer05 1 comment

I realized I didn't properly explain the data I posted yesterday (thanks to a couple of readers). The data is the top ~150 position players and ~100 pitchers. It's not the top 250 players (Have to wait a bit on that as I recreate the data). There are some position players that were better than the top 100 pitchers, but not included in the list. Sorry for the confusion.

almost 2 years ago stevesommer05 0 comments

Beyond the Box Score BtBs 50 Best of the Next 5 Years: Wrap Up

 

With the BtB top 50 of the next 5 years wrapping up I thought now was a good opportunity to do a few things

1.      Display some summary stats about the list.

2.      Provide the data set so that all of you can look at it however you want.

3.      Address some specific players brought up in the comments that didn't make the list.

4.      Open it up for discussion about improvements / tweaks you guys would like to see if/when we do this again next year

Before we get to those, here's the set of links in case you missed any of the pieces

Methodology

50-41

40-31

30-21

20-11

10-6

5-1

Continue reading this post »

16 comments  | 

Beyond the Box Score BtBs 50 Best of the Next 5 Years - Intro and Methodology


Over the next two weeks we will unveil our (and when I say our I really mean the data's) list of the 50 Best Players of the Next Five Years.  Your first question is probably, "How is this going to be any different than Fangraphs' Trade Value Series?"  The simple answer is that this list will ignore contractual status.  This list approaches the problem from an "all the contracts in MLB have been ripped up and we're picking teams playground style" angle.  With that premise in mind, the goal was to come up with a data driven list rather than a list based on the authors' opinions.  The particular data in question is 5-year projected WAR, using inputs I will outline for everyone now.

For position players I used pre-season CHONE projections to derive wOBA, which I then aged over the 5 years using results from MGL's aging study.  For the defensive component I used my own defensive projections when available and CHONE for players that I hadn't projected.  I did not account for position switches over the course of the five years that were projected.  Playing time was the most difficult component to project.  Since the list is meant to be a "playground style list" it made sense to me to give all position players "starter caliber" playing time.  To that end I found the average of the top ten in PA by position over the last two years.  I then averaged each player's last two years of PA and regressed towards the positional average.  The modeled PA was the maximum of the positional average and the regressed average.  Applying that playing time across all of the WAR components leads to the overall WAR value.
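
The playing time step is easier to see in code. This is only a sketch: the function name is made up for illustration, and the 50/50 regression weight is an assumption, since the amount of regression isn't spelled out above.

```python
def modeled_pa(pa_last_two, positional_avg, regression_weight=0.5):
    """Model plate appearances as described: average the last two seasons,
    regress toward the positional average, then take the larger of the
    positional average and the regressed figure.

    regression_weight is the share given to the positional average; the
    0.5 default is an assumed value, not the one used for the list.
    """
    two_year_avg = sum(pa_last_two) / len(pa_last_two)
    regressed = (regression_weight * positional_avg
                 + (1 - regression_weight) * two_year_avg)
    return max(positional_avg, regressed)


# A part-timer gets bumped up to starter-caliber playing time,
# while an iron man keeps some of his extra plate appearances.
part_timer = modeled_pa([300, 400], positional_avg=650)
iron_man = modeled_pa([700, 720], positional_avg=650)
```

Taking the maximum is what makes this a "playground style" list: nobody is docked for having been a backup or hurt, but durable players still get a little credit.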

For pitchers I used CHONE's context neutral ERA and an innings pitched projection that mirrors the PA projection above.  The only difference is that I looked at the top 80 starters and relievers to get the positional averages.  For aging I leveraged this blog post from MGL.  The Cliff's Notes version is that the curve is flat from 21-26 and then goes up 0.2 runs allowed per season.
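
A rough sketch of that pitcher aging rule, under a couple of assumptions of mine: the 0.2 is treated as an increase in the pitcher's run average, the first projected season is the pitcher's current age, and the bump is applied for each season in which he is older than 26. Ages below 21 aren't covered by the curve as described, so the sketch simply treats them as flat too.

```python
def aged_run_average(base_ra, age_now, years=5, flat_until=26, step=0.2):
    """Project a run average over the next `years` seasons.

    The curve is flat through age `flat_until`, then rises by `step`
    for each additional season. Where exactly the first bump lands is
    an assumption of this sketch.
    """
    projections = []
    ra = base_ra
    for offset in range(years):
        age = age_now + offset
        # No adjustment in the first season: the base projection
        # is assumed to already describe the current age.
        if offset > 0 and age > flat_until:
            ra += step
        projections.append(round(ra, 2))
    return projections


young_arm = aged_run_average(4.00, age_now=25)   # ages 25 through 29
veteran = aged_run_average(4.00, age_now=30)     # ages 30 through 34
```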

A couple caveats worth mentioning

  1. The projections are the mean projections and do not address the uncertainty levels.  This is especially important since we are projecting a lot of young players where the uncertainty level is going to be very high.
  2. The same aging curves were applied to all players.
  3. The playing time estimations were not aged (i.e. same number of PAs/IPs over the 5 years).
Even with those caveats, I still think the method will generate a pretty solid list that will be well worth discussing, so please stop by and discuss what the computer has spit out.  I doubt anyone (even me) will agree with the list in its entirety, so come back with good points and counterpoints about who should be included and excluded.  The schedule for posts will look something like this:

Monday May 31 - Intro and Methodology
Tuesday June 1 - Players 50-41
Wednesday June 2nd - Players 40-31
Thursday June 3rd - Players 30-21
Monday June 7th - Players 20-11
Tuesday June 8th - Players 10-1
Wednesday June 9th - Wrap Up (Interesting tidbits and a full data dump)

6 comments  | 

Morneau

Justin Morneau is sporting a wOBA right around 0.500 and plate discipline may be the driving factor. With that in mind I again steal an idea from Dave Allen. The red squares mean more swings in 2010, blue means less. Bolded numbers on the periphery are the pitch fx location values. The chart is from the catcher's perspective.

about 2 years ago stevesommer05 0 comments 1 recs

Pujols

My attempt at a poor man's Dave Allen chart. It's 2010 swing percentage - 2009 swing percentage for Albert Pujols. Red means more swings, blue less. Full article here at PAH9.

about 2 years ago stevesommer05 2 comments 2 recs

Beyond the Box Score Mariners at Rays BtB Series Preview


The Rays and Mariners both have teams that were projected to be phenomenal defensively.  Unfortunately for the Mariners, that's about where the similarities end.  The Rays have a pythagorean record of 25-9 (one better than their actual record) and were 3rd in the most recent BtB Power Rankings.  On the other hand the Mariners have a pythagorean record of 14-20 (also one better than their actual) and were 25th in the BtB Rankings (one spot ahead of the Royals!)

 

The best place to see the divergence of the two teams is to compare the offenses.  The following chart looks at projected wOBA by lineup slot for the respective teams.  The lineups are what I could glean from recent games, and the projections combine preseason projections with this year's data.

[Chart: projected wOBA by lineup slot, Mariners vs. Rays]

Clearly the Rays hold the edge (often substantially) at most spots in the order.  These projections are mostly indicative of reality as well, as the Rays have scored 75 more runs than the Mariners so far and have a 44 run edge in runs created (RC).

Continue reading this post »

15 comments  | 

Beyond the Box Score Yanks vs Sox Preview and Discussion Thread - Updated


Hey folks, right now this is mainly a placeholder for a series preview post that will be finished up after my little one goes to sleep tonight.  I'd like to start a thread like this each week where we preview one series from the upcoming week using some sabermetric principles and a simple simulation tool I've been working on developing.  This week I picked the Yanks-Sox series, but I think in coming weeks I'll let you, the reader, vote on the series you want to see previewed.  Ok, enough introductions.  To hold you over until I get the full preview up, I present a graphic courtesy of our resident graphic guru Justin Bopp.

[Chart: lineup comparison, Yankees vs. Red Sox, 05-06-10]

The graph compares the lineups of the two teams using my home-brewed version of an updated CHONE projection.  Sorry to have to cut this short right now, but feel free to use the comments as an open discussion thread for all things baseball, and I'll get back with the rest of the preview in a bit.

Ok the rest of the article will describe what the simulation tool does, and then look at the inputs and outputs for the Yanks - Sox series.

Continue reading this post »

1 comment  |  1 recs | 

Beyond the Box Score Updating Playoff Probabilities - CHONE

So the title is a little misleading, as I never got around to publishing the preseason CHONE based playoff probabilities.  That being the case, you get a two for one on new information today: both the preseason CHONE playoff probabilities and an updated version based on games already played.  I'm calculating the probabilities using a simulation I described at Fangraphs as follows:

The simulation is a simple Monte Carlo that determines the winner of each game using random draws bounced up against log5 based winning percentages. For example, if we want to simulate the outcome of a game between Team A that has a 0.600 true talent win percentage and Team B that has a 0.450 win percentage, we first calculate the probability that A beats B using the log5 equation linked above. That calculation says that Team A should have a 0.647 winning percentage against Team B.

To simulate a game between these teams then, the simulation draws a random number between 0 and 1 and if the number is less than or equal to 0.647 then Team A wins, otherwise Team B wins. This process is repeated for all of the games for the entire season. Run the simulation for 10,000 such seasons and you have your results. Also built into the simulation is some up front uncertainty about the true talent win percentage. Before each of the 10,000 simulated seasons, the true talent win percentages for each team are varied slightly by using a random draw from a normal distribution centered at the input win percentage (which is based off of the projected standings) with a standard deviation of 0.030. For example, some seasons the Yankees will simulate as a 0.605 team, sometimes a 0.600 team and sometimes a 0.610 team. The standard deviation was derived through testing (read trial and error) and some of the comments in this thread at The Book Blog.
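
Here's a minimal sketch of that description in code, not the actual tool. The log5 formula is the standard one; the function names and the tiny two-team schedule are placeholders I made up, and the sketch ignores home field advantage.

```python
import random


def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's
    true talent win percentage, using the log5 formula."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)


def simulate_season(schedule, talent, rng, talent_sd=0.030):
    """Simulate one season and return wins per team.

    schedule: list of (team_a, team_b) games.
    talent: dict of team -> input true talent win percentage.
    """
    # Up-front uncertainty: vary each team's talent once per season.
    season_talent = {t: rng.gauss(p, talent_sd) for t, p in talent.items()}
    wins = {t: 0 for t in talent}
    for team_a, team_b in schedule:
        p_a = log5(season_talent[team_a], season_talent[team_b])
        # Random draw between 0 and 1 decides the winner.
        if rng.random() <= p_a:
            wins[team_a] += 1
        else:
            wins[team_b] += 1
    return wins


# The worked example from the text: a 0.600 team against a 0.450 team.
p = log5(0.600, 0.450)  # about 0.647

# Hypothetical 18-game season series, repeated for 10,000 seasons.
rng = random.Random(0)
talent = {"Team A": 0.600, "Team B": 0.450}
schedule = [("Team A", "Team B")] * 18
seasons = [simulate_season(schedule, talent, rng) for _ in range(10000)]
avg_a_wins = sum(s["Team A"] for s in seasons) / len(seasons)
```

Updating for games already played then just means starting each team's win total at its current wins and simulating only the remaining schedule.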

 

All that was needed to do the updating was an updated schedule and each team's current wins.  For this iteration I kept the preseason true talent levels; however, in the future I would like to adjust those using either updated in-season projections or some weighted mean with the BtB power rankings.  

Continue reading this post »

0 comments  | 

Tango is asking fans to fill out a community playing time projection for their teams. If you want to help out and haven't already, click the link above.

about 2 years ago stevesommer05 0 comments

Sig Mejdal, the lead quantitative analyst for the Cards was kind enough to answer a few questions about his job for me over at PAH9. Thought the readership here would be interested as well.

over 2 years ago stevesommer05 3 comments 2 recs

Beyond the Box Score Playoff Probabilities Simulation - PECOTA Edition

With some of the projection systems also coming out with projections of the standings, now's probably a good time to attach some playoff probabilities to the projections.  I'll start with PECOTA.

My methodology is fairly simple.  I created a Monte Carlo simulation that modeled team wins as a normal random variable with a mean of the projected wins and a variable standard deviation (not variable within a set of simulation runs but across unique sets of runs).  

I ran the simulation using a 9 win SD, an 8 win SD, and an 8 win SD with the caveat that total team wins had to fall within projected + or - 20.
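
To make the setup concrete, here is a stripped-down sketch rather than the actual simulation. It only estimates how often each team finishes first in a single made-up division (no wild card), it draws each team's wins independently, and it enforces the plus or minus 20 cap by redrawing, which is just one assumed way to apply that caveat. The team names and projected win totals are placeholders.

```python
import random


def draw_wins(projected, sd, rng, max_dev=None):
    """Draw a win total from a normal distribution centered on the
    projection. If max_dev is given, redraw until the result falls
    within projected +/- max_dev (an assumed way to apply the cap)."""
    while True:
        wins = rng.gauss(projected, sd)
        if max_dev is None or abs(wins - projected) <= max_dev:
            return wins


def first_place_odds(projections, sd=9.0, max_dev=None, n_sims=10000, seed=0):
    """Share of simulated seasons in which each team has the most wins."""
    rng = random.Random(seed)
    firsts = {team: 0 for team in projections}
    for _ in range(n_sims):
        sims = {t: draw_wins(w, sd, rng, max_dev) for t, w in projections.items()}
        firsts[max(sims, key=sims.get)] += 1
    return {t: n / n_sims for t, n in firsts.items()}


# Hypothetical division, not actual PECOTA numbers.
division = {"Team A": 92, "Team B": 86, "Team C": 81, "Team D": 75}
odds_sd9 = first_place_odds(division, sd=9.0)
odds_sd8_capped = first_place_odds(division, sd=8.0, max_dev=20)
```

The smaller the SD, the more the projected favorite runs away with things, which is why it's worth looking at more than one value.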

The results using a 9 win standard deviation are after the jump and a spreadsheet containing all of the results is linked at the end. 

Continue reading this post »

23 comments  | 

Beyond the Box Score BtB Sabermetric Writing Award Results: Best Sabermetric Research or Writing Website

 

One of the beauties of the sabermetric community is the ongoing dialog amongst the vast majority of contributors.  In that context, the community needs forums to have that dialog, and there's no better forum than the various sabermetric blogs/websites in existence.  With that in mind we present the Best Sabermetric Research or Writing Website which was defined as

The best sabermetric blog or website of the year. Define "best" as you wish, though it should be focused on writing and/or research contributions.

Enough babbling by me...  your winners are (drum roll please)....

Continue reading this post »

3 comments  | 

Beyond the Box Score Experimenting With Clustering - Offense

 

This post originated out of me asking myself, "Self, if you were going to delve into the world of projecting offense, how would you go about it?"  My answer was that I’d take a basic Marcels approach and add in some additional regression/weighting based on batted ball (plus a little extra) profiles.  That approach would require me to bin players based on batted ball profiles, so I immediately thought of k-means clustering using R.  The rest of this post is my brief exploration of batted ball profile clustering.

Using Fangraphs’ 2009 stats (filtered to just the qualifiers) I created clusters based on the following sets of statistics.  

 

LD GB FB IFF
LD GB FB IFF HR
LD HR BB
HR BB K
GB FB ISO SPD
BB K

 

IFF = Infield Fly, HR = HR/FB%
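
I did the clustering in R, but for the curious here is a bare-bones sketch of what k-means does under the hood, written in plain Python. The batted ball profiles below are invented for illustration, and the sketch skips things a real run would want, such as standardizing the columns and trying several random starts.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means (Lloyd's algorithm).

    points: list of equal-length tuples, e.g. (LD%, GB%, FB%).
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen players
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each player joins the nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical (LD%, GB%, FB%) profiles: three ground ball hitters
# followed by three fly ball hitters.
profiles = [(20, 45, 35), (21, 44, 35), (19, 46, 35),
            (18, 30, 52), (17, 31, 52), (19, 29, 52)]
labels, centers = kmeans(profiles, k=2)
```

Swapping in a different stat set (say BB and K) is just a matter of changing which columns go into each tuple.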

The full lists of clusters can be found here, and I’ll discuss some of the things I found interesting after the jump

Continue reading this post »

12 comments  | 

Beyond the Box Score Free Agent $/Win Based on Playoff Probability Added

 

There has been a lot of discussion this off season dealing with spending relative to where a team is on the win curve.  My goal here is simply to do some back of the envelope math to set some ranges on the dollar values that teams should pay for a win based on where they are on the win curve. 

ASSUMPTIONS

  1. Teams should pay a different cost per win based on how valuable that win will be to them.  For this analysis I'm using playoff probability added (PPA) to define valuable.  If you disagree with either of these then the rest of the article is probably not for you.
  2. I used $4.4M per win as the average market value.  Changing this assumption wouldn't change the shape of the curves, just the peaks and valleys.
  3. The model needs some type of salary floor in order to more closely model reality and take into consideration off the field issues.  I'll be setting the floor as a percentage of the maximum suggested salary (i.e. if I set the floor at 75%, then the teams on the low and high end of the win curve will be modeled at no less than 75% of the $/win of the teams at the inflection point).  I'll create curves for multiple values as I am uncertain what this number should be.
  4. This analysis uses historical playoff probabilities for projected/3rd order wins as the guiding metric.  The next step would be to use division strength to assist in the PPA calculations.
  5. Since PPA is the guiding metric, improving a high win team for performance in the playoffs themselves is ignored.  (The Crapshoot Corollary perhaps?) 
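
As a back of the envelope illustration of assumptions 1 through 3, here is one way the curve could be put together. To be clear, this is a sketch and not the model from the article: the PPA values are invented, and scaling $/win in proportion to PPA so that the unfloored average equals the market rate is my own simplifying assumption.

```python
def dollars_per_win(ppa_by_wins, market_rate=4.4, floor_pct=0.75):
    """Suggested $/win (in millions) at each projected win total.

    ppa_by_wins: dict of projected wins -> playoff probability added
                 by one extra win (hypothetical inputs).
    market_rate: average market value of a win, in millions.
    floor_pct: floor as a share of the peak suggested $/win.
    """
    avg_ppa = sum(ppa_by_wins.values()) / len(ppa_by_wins)
    # Assumed scaling: pay in proportion to PPA, centered on the market rate.
    raw = {w: market_rate * ppa / avg_ppa for w, ppa in ppa_by_wins.items()}
    floor = floor_pct * max(raw.values())
    return {w: max(value, floor) for w, value in raw.items()}


# Invented PPA values that peak around 90 wins.
ppa = {75: 0.005, 80: 0.020, 85: 0.050, 90: 0.070, 95: 0.040, 100: 0.010}
curve_75 = dollars_per_win(ppa, floor_pct=0.75)
curve_50 = dollars_per_win(ppa, floor_pct=0.50)
```

Note that once the floor kicks in, the average across the curve drifts above the market rate, which is one reason to look at several floor values rather than trusting any single one.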

Continue reading this post »

25 comments  |  4 recs | 

Viva El Birdos Sabermetric Primer (an assortment of links)




In a thread yesterday there was a request for a sabermetrics primer, so I thought I'd take the lazy way out and just link some of the extraordinary work people have done in other places.  I'll break this into two primary sections 1) A Saber 101 set of links for those that want to understand sabermetrics better 2) A Saber 201 for those that want to start doing some sabermetric research on their own.

 

Saber 101 Links

1)  Alex Remington's series at yahoo sports.  I must admit I haven't read through the entire series, but it has been highly recommended by people I respect a great deal.  Up to this point he's covered the following

  • BABIP Batting Average on Ball in Play
  • OPS+ Adjusted On Base Plus Slugging compared to league average (the B-Ref way)
  • FIP Fielding Independent Pitching
  • wOBA Weighted On Base Average
  • WPA Win Probability Added
  • WAR Wins Above Replacement

Alex's series isn't done, so I'll update with additional links as he continues

 

2)  Michael Jong's Sabermetrics 101 blog at fanhuddle.  I especially recommend the piece on Linear Weights.  Throughout his articles and glossaries there are a ton of other very good links.

 

3)  Tango's stuff.  Tango just recently answered sets of questions from folks that aren't convinced that sabermetrics is all it's cracked up to be.  Both are good introductions to various topics, and create good discussion.  First were ten questions from Mike Silva, and then there was a set of questions from a BCB member.  Additionally Tango hosts a wiki that would be a good resource to peruse.

 

4)  Fangraphs' value series.  Dave Cameron describes how Fangraphs goes about calculating WAR for hitters and pitchers, including defining replacement value and looking at position adjustments.

 

5)  Pitch F/X.  Our very own vivaelpujols had a great primer on it as a fanshot on this very site.

 

I'm sure there's a bunch more that folks will add in the comments, but this will be plenty to get you started.

 

Now for the Saber 201 stuff.

You probably need 2 things to start doing your own sabermetric research 1) data 2) analytical tools and I'll try to provide a set of links for both.  First the data question

 

1)  Fangraphs can provide a lot of data that the aspiring saberist needs.  It's got its version of WAR, wOBA, UZR, pitch type linear weights, batted ball profiles, various projection systems, and even summary type pitch f/x data.  It's a great place to start while you're getting your feet wet.

 

2)  Rally's historical WAR data.  Want to compare Pujols to Musial?  Here's where to start.  You can purchase the whole database in csv format or search out the guys you want to look at for free.  A lot of the Hall of Fame analysis on the saber side has been done using Rally's data.

 

3)  An actual database.  Colin describes the process for a PC, and Sky for a Mac.  These methods both require you to learn SQL along the way, but are very valuable tools of the trade if you want to do any sort of complex querying of your data.

 

4)  Pitch F/X.  For the non-database inclined, you can get individual game information from Brooks Baseball or do some more complex querying using this new tool and get an Excel type output.  [As RJ points out in the comments, I missed an important source here.]  The Texas Leaguers tool gives you the opportunity to generate reports for a specific pitcher.  For those that want their own database, follow vivaelpujols' primer found here (make sure to read through the comments, as Mike Fast comes by to help out).

 

That's probably enough (or even too much) to get you started.  Now you need some tools to do the analysis

 

1)  Spreadsheets.  Probably the most basic tool in the toolbox of the saberist.  Use excel or open office versions, whatever floats your boat.  Most places allow you to download excel friendly data, so it's a fairly seamless transition.

 

2)  Statistical packages.  If excel doesn't have enough horsepower to do what you want, then there are open source statistical packages that you can download and use.  Both R and gretl are good places to start.  R is more powerful, but has a slightly steeper learning curve.  Gretl is a little easier, but not as powerful (I'm going on others' opinions here as I haven't extensively used either: some R, not much gretl at all).

 

I think that's all I've got for now.  Feel free to add your own links in the comments.

 

Also, I should mention H/Ts all around, notably to Tango and the BtB crew (I grabbed a bunch from the nominations in the sabermetric writing awards).

31 comments  |  11 recs | 

Beyond the Box Score The Braves Off Season - Playoff Probability Edition

 

The Atlanta Braves have had a busy (and sometimes controversial) offseason as they revamped their bullpen, subtracted an arm from their surplus of starters, and picked up a few average-ish bats.  The individual moves have been covered in depth by our outstanding group of writers here at BtB (Vazquez, Soriano) and by some of the heavy hitters in the sabr community (Cameron, Tango).  My goal in this article is to wrap all of the moves together and examine the offense as a whole.  My metric of choice is playoff probability added/subtracted, and I am going to look at two separate approaches to estimate the gains/losses.  First I will leverage the work Nick did here that looked at historic playoff probabilities given certain true talent level wins.  Second I will take a more contextual look at the problem by seeing how the moves affected the division race as it currently stacks up.

 

Background Work

Critical to both approaches is an estimate of the Braves' true talent level in terms of wins both before the moves and after the moves. I used current CHONE projections for offense and pitching, my own defensive projections, and playing time estimates based on the Fangraphs fans' projections to come up with WAR projections.  Using this method I estimated 92.8 wins before the moves and 89.4 wins after them.

The results are after the jump

Continue reading this post »

15 comments  |  1 recs |