<rss version="2.0">
  <channel>
    <title>SB Nation User Blog:  David Gassko</title>
    <link>http://www.sbnation.com/users/David%20Gassko</link>
    <description>Posts made by David Gassko on SB Nation</description>
    <item>
      <title>Hardball Times Annual 2007</title>
      <link>http://www.beyondtheboxscore.com/2006/11/13/35617/404</link>
      <author>David Gassko</author>
      <pubDate>Mon, 13 Nov 2006 08:56:17 -0000</pubDate>
      <description type="html">
&lt;p&gt;David Studenmund has posted &lt;a href="http://www.hardballtimes.com/main/article/the-hardball-times-annual-2007/"&gt;an article detailing the essays and statistics contained in The Hardball Times Annual 2007.&lt;/a&gt; I have seen the final product, and it's spectacular (I know I'm biased, but honestly, it is). If you like The Hardball Times website, or if you simply like great baseball writing and research, I suggest you buy the Annual...now.&lt;/p&gt;



  

  


      </description>
    </item>
    <item>
      <title>Is David Ortiz Clutch?</title>
      <link>http://www.beyondtheboxscore.com/2006/8/6/132647/3529</link>
      <author>David Gassko</author>
      <pubDate>Sun, 06 Aug 2006 17:26:47 -0000</pubDate>
      <description type="html">
&lt;p&gt;I have an article up &lt;a href="http://stats.mostvaluablenetwork.com/general/is-david-ortiz-a-clutch-hitter/"&gt;on my blog&lt;/a&gt; on what the numbers say.&lt;/p&gt;



  

  


      </description>
    </item>
    <item>
      <title>Determining the Best Runs/Win Formula</title>
      <link>http://www.beyondtheboxscore.com/2006/6/25/105053/348</link>
      <author>David Gassko</author>
      <pubDate>Sun, 25 Jun 2006 15:01:03 -0000</pubDate>
      <description type="html">
&lt;p&gt;At some point Marc is going to start yelling at me for posting these technical, boring posts on BtB. Until he does, I'll continue writing about things like the best runs/win estimators, which is the topic of today's post. In a nutshell, the point of these methods (and there are a bunch) is to best convert a player's marginal runs (like runs above average or runs above replacement) into wins (above average or above replacement). Runs are a nice measurement, but baseball is all about wins and losses.&lt;/p&gt;



  &lt;p&gt;The most famous and most often used runs-to-wins converter is Pete Palmer's, published in The Hidden Game of Baseball and used in his Batter-Fielder Wins system. That converter is runs/win = 10*SQRT(RPG/9). RPG is runs per game, the average number of runs scored in a game by both teams combined. For example, in last year's American League, the average team scored 4.76 runs per game (and allowed the same), meaning that the RPG in the 2005 AL was 9.52. Plugging that into Palmer's runs/win converter, we get 10.28, meaning that the average team would need to score 10.28 extra runs to get one extra win, according to Palmer.&lt;/p&gt;&lt;p&gt;Is that right? We'll get to that in a second. First, let's quickly run down the other runs/win converters out there. There are three in particular that I am familiar with, one from BaseRuns creator David Smyth, and two from noted baseball stat guy Tangotiger. Smyth's formula is the simplest: runs/win = RPG. It's not meant to be exactly correct, just easy to use. In fact, if we look at last year's AL, it tells us that we would need 9.52 marginal runs to add an extra win, which isn't all that different from Palmer's formula (though even if the difference between the two won't be more than half a win for any player, half a win is still worth almost $2 million on the free agent market).&lt;/p&gt;&lt;p&gt;Tangotiger's two formulas are pretty similar. The first is runs/win = .8*RPG + 2.4. That gives us a runs/win value of 10.02 for last year's AL, right between Smyth and Palmer, though closer to the latter. The second is runs/win = .7*(RPG + 5), which gives us a runs/win value of 10.16. So both of Tangotiger's formulas are closer to Palmer, though they both give lower values than the Hidden Game of Baseball author.&lt;/p&gt;&lt;p&gt;Now what is the correct value? How about I tell you first and then I explain? The correct value happens to be 9.97, which is best approximated by Tango's first formula. 
How did I determine that? Simple (or not so simple, depending on how you feel about calculating Pythagorean records with custom exponents).&lt;/p&gt;&lt;p&gt;The Pythagorean record is a team's expected record based on the number of runs it scores and allows. It takes the following form: W% = Runs Scored^Exponent/(Runs Scored^Exponent + Runs Allowed^Exponent). When Bill James developed the Pythagorean formula, he simply used an exponent of two; since then, however, it has been shown that there are better exponents, and that the best exponent in fact depends on the run environment. To determine the correct exponent based on the run environment, Smyth and a stat wonk who goes by US Patriot developed the Pythagenpat formula, which is Exponent = RPG^.287. Powers between .278 and .287 have all been shown to work as well, but I like to stick to .287. It doesn't really change your answer very much, no matter which one you use.&lt;/p&gt;&lt;p&gt;Anyways, what's cool is that using this formula, we can determine the correct number of runs it takes to gain a marginal win in any run environment. Here's the simplest (though not quite mathematically exact, but more than close enough) way to do so: Take an average team in that run environment. For last year's American League, that would be a team that scores 4.76 runs a game and allows the same. Now add a very small number of runs, say .001 per game, to its offense. How many more games will it win? Well, doing the math, we expect the team to have a .5001 W%, rather than .500 (we actually need to use more decimal places for maximum accuracy). So that's .0001 more wins per game than expected. So how many runs would we have to add to get &lt;b&gt;one&lt;/b&gt; more win than expected? Well, simply divide 1 by .0001 and you get 10,000. Then multiply that by .001 (because remember, we added .001 runs to the offense, so really what this is saying is that we would need to add .001 runs 10,000 times for an extra win). 
The answer is 10. You would need to score 10 extra runs last year in the American League to win one extra game. Remember, the actual answer is 9.97, if you don't do any rounding.&lt;/p&gt;&lt;p&gt;Using this method, and the other estimators mentioned, I've done the math for every run environment between 1 and 20 RPG. Here is a graph of how the estimators compare:&lt;/p&gt;&lt;p&gt;&lt;img src="http://beyondtheboxscore.com/images/admin/RPW.GIF" /&gt;&lt;/p&gt;&lt;p&gt;You can see that they are all very close when it comes to run environments that baseball is actually played in, which is why they are all usable. Nevertheless, their weaknesses are obvious if we look at the graph. Every estimator except for Smyth is too high at very low RPG levels, while Smyth is way too low. On the other hand, Smyth's estimator over-predicts once we get past 11 RPG. This is because his formula is linear, while the number of runs it takes to get a marginal win is not. Nevertheless, at least his formula is simple.&lt;/p&gt;&lt;p&gt;Palmer's formula is also terrible for weird RPG ranges. It over-predicts the number of runs needed to add a marginal win at the low RPG ranges, and way under-predicts at high RPG ranges. Essentially, it is only usable in normal ranges (though on the other hand, that really is the only place we ever use these formulas anyways).&lt;/p&gt;&lt;p&gt;Tangotiger's two formulas hold up better, though they have their own problems. His first formula gets very close to the true number at 4 or 5 RPG, but it begins to drift away at around 13. 
His second formula over-predicts badly at low RPG ranges, but is very close to the truth at high RPG ranges.&lt;/p&gt;&lt;p&gt;However, since you're really only going to use any of these formulas to evaluate players playing in real contexts, let's look at how closely these formulas track the truth in real run environments, between 8 and 11 RPG:&lt;/p&gt;&lt;p&gt;&lt;img src="http://beyondtheboxscore.com/images/admin/RPW2.GIF" /&gt;&lt;/p&gt;&lt;p&gt;First, it's interesting to note how all the formulas converge at 11 RPG. You can see that Smyth's formula, while the simplest, is also the worst. Palmer's formula isn't as good as I would have thought it to be either. Tango's second formula is better, but his first takes the cake. It tracks the true number almost exactly. So when you want to convert a player's marginal runs into wins, it's best to use runs/win = .8*RPG + 2.4 as a stand-in for the true number.&lt;/p&gt;&lt;p&gt;In reality, however, none of these runs/win formulas is going to give you a very exact answer, and here's why. Each of these formulas (including the correct one) is based on an average context. They answer the question, "how many runs do you need to score to give an average team one extra win?" However, a player affects his own context, and if it's a good player, his effect is large enough to screw up these calculations. We added .001 runs to determine the correct formula; Albert Pujols adds 70 runs. Pedro Martinez &lt;i&gt;takes away&lt;/i&gt; 30 runs. By virtue of being themselves, Pujols and Pedro change their teams' runs/win converters. In reality, we need to account for that fact as well. But that's the topic of a whole other article.&lt;/p&gt;
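&lt;p&gt;As a rough check on the numbers in this post, here is a minimal Python sketch of the four estimators and of the "add .001 runs to an average team" trick applied to Pythagenpat. It assumes the 2005 AL figure of 9.52 RPG and the .287 power quoted in the post; the function names are made up for illustration and are not anyone's published code.&lt;/p&gt;

```python
import math

def palmer(rpg):
    # Pete Palmer: runs/win = 10*SQRT(RPG/9)
    return 10 * math.sqrt(rpg / 9)

def smyth(rpg):
    # David Smyth: runs/win = RPG
    return rpg

def tango_one(rpg):
    # Tangotiger, first formula: runs/win = .8*RPG + 2.4
    return 0.8 * rpg + 2.4

def tango_two(rpg):
    # Tangotiger, second formula: runs/win = .7*(RPG + 5)
    return 0.7 * (rpg + 5)

def pythagenpat_wpct(runs_scored, runs_allowed, power=0.287):
    # Pythagorean W% with the exponent set to RPG^power (Pythagenpat).
    exponent = (runs_scored + runs_allowed) ** power
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def marginal_runs_per_win(rpg, step=0.001):
    # Average team: scores and allows RPG/2 runs per game.
    # Add a tiny number of runs per game and see how much W% moves.
    avg = rpg / 2
    gain = pythagenpat_wpct(avg + step, avg) - 0.5
    return step / gain

rpg = 9.52  # 2005 American League, as quoted in the post
estimates = {
    "Palmer": palmer(rpg),
    "Smyth": smyth(rpg),
    "Tango 1": tango_one(rpg),
    "Tango 2": tango_two(rpg),
}
truth = marginal_runs_per_win(rpg)
# Pick the estimator with the smallest absolute gap to the marginal value.
closest = min(estimates, key=lambda name: abs(estimates[name] - truth))

for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
print(f"Pythagenpat marginal value: {truth:.2f}")
print(f"Closest estimator: {closest}")
```

&lt;p&gt;Changing rpg to other values gives a quick way to look at how each formula drifts from the marginal value outside normal run environments.&lt;/p&gt;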


  


      </description>
    </item>
    <item>
      <title>BABIP and DIPS 3.0</title>
      <link>http://www.beyondtheboxscore.com/2006/5/22/01353/5180</link>
      <author>David Gassko</author>
      <pubDate>Mon, 22 May 2006 04:13:53 -0000</pubDate>
      <description type="html">
&lt;p&gt;Marc wants me to introduce myself, so I guess I should start by saying that my name is David Gassko, as my handle here might have given away. I write a weekly column for &lt;a href="http://hardballtimes.com/"&gt;The Hardball Times&lt;/a&gt;, and also have been blogging at &lt;a href="http://stats.mostvaluablenetwork.com/"&gt;Statistically Speaking&lt;/a&gt; for almost a year now. I'll be writing here at BtB now as well, semi-regularly. I hope you enjoy my writing, and if not, I hope you enjoy my wrath.&lt;/p&gt;



  &lt;p&gt;Last August, I published an article updating Voros McCracken's Defense Independent Pitching Statistics (DIPS), calling my system DIPS 3.0. To recap quickly, what McCracken found was that individual pitchers seemed to show little control over what happens to balls put into play. Instead, he found, pitchers only seem to have much control over the &lt;i&gt;defense-independent&lt;/i&gt; categories--strikeouts, walks, and to a lesser extent, home runs.&lt;/p&gt;&lt;p&gt;I took his idea one step further (a step that was actually originally suggested by Voros), and based DIPS on batted ball information. Basically, I take a pitcher's batted ball line (ground balls, line drives, bunts, and outfield and infield fly balls) and transform it into a "regular" line--singles, doubles, triples, home runs, reached on error, as well as strikeouts, walks, and hit-by-pitch.&lt;/p&gt;&lt;p&gt;Here's a very simple rundown of how I do it. I take the number of batted balls the pitcher allowed, and assign him a league average line drive percentage, because based on my research with JC Bradbury in the &lt;a href="http://www.actasports.com/detail.html?&amp;amp;id=076"&gt;Hardball Times Annual&lt;/a&gt;, it seems that pitchers have little control over how many line drives they allow. I then split the rest of their batted balls based on their actual batted ball percentages. Take, for example, Jarrod Washburn. He allowed 586 batted balls last year, and since the average line drive percentage in the AL last year was 19.9%, I assign him 117 line drives. He actually allowed 120. Washburn also allowed 206 outfield flies. So his "new" outfield fly number would be 206/(586-120)*(586-117) = 207. And so on for every batted ball type. 
A better explanation of the whole method is available &lt;a href="http://www.hardballtimes.com/main/article/another-look-at-batted-balls-and-dips/"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;To transform this translated batted ball line into "normal" statistics, I just take the average outcome of each type of batted ball, and multiply it by the translated line. So, for example, 50.8% of all line drives became singles in the American League last year. Since Washburn had 117 translated line drives, based on his liner numbers alone, he would be expected to have 59 singles. I do this for every batted ball type and every hit type (as well as reached-on-errors).&lt;/p&gt;&lt;p&gt;I then plug that all into BaseRuns, and find how many runs the pitcher would be expected to allow.&lt;/p&gt;&lt;p&gt;Okay, 375 words in, and I'm finally getting to the point of this post. One criticism my system has received is that it zeroes out too many things. To some extent, this is true, though the point of DIPS is to zero out the things that don't really matter. But here's the real question: how great is that extent?&lt;/p&gt;&lt;p&gt;&lt;a href="http://beyondtheboxscore.com/story/2006/4/4/133141/3292"&gt;On this very site&lt;/a&gt;, John Beamer wrote an article arguing that there is some skill involved in preventing line drives, which is certainly true. However, John's argument (not to put words in his mouth!) seemed to extend beyond that: Like many others, John was arguing that disregarding one year's worth of line drives is incorrect, that there is some information contained in that data. He is not the only person I respect to have said so, and some have said it specifically in regard to DIPS 3.0. 
A poster who goes by the tag GuyDM &lt;a href="http://mb3.scout.com/fbaseballfrm8.showMessage?topicID=1163.topic"&gt;posted the following&lt;/a&gt; on the &lt;a href="http://mb3.scout.com/fbaseballfrm8"&gt;Strategy and Sabermetrics&lt;/a&gt; board a while ago:&lt;/p&gt;
&lt;blockquote&gt;Not to quarrel with the central importance of K and BB rates, but to some extent the correlation of your metric and DIPS 3.0 is inevitable. David is imposing league average LD%, and standard run values for every BIP type. So the only source of variance left is GB/FB ratio, which translates into roughly +-.25 R/G given the range in GB/FB.&lt;/blockquote&gt;
&lt;p&gt;Essentially, his point was that the process I go through for DIPS 3.0 does not leave much room for variance, less than there should be. Is that true? Well, luckily, the process is set up in such a way that we can actually check whether or not that is true. Using my expected batting lines against, we can calculate pitchers' expected batting average on balls in play (BABIP). In this case, I include expected reached on error in the numerator, because they are defense independent: they are based only on batted ball distribution.&lt;/p&gt;&lt;p&gt;For example, Johan Santana was expected to allow 127 singles, 39 doubles, 4 triples, and 6 reached on error last year, according to DIPS 3.0. He also had 601 BIP, for a .294 BABIP.&lt;/p&gt;&lt;p&gt;Doing the math for every pitcher with at least 350 BFP in 2005 (171 in all), how much variance is there among these players? Well, the answer is .009. That's our standard deviation, which is a measure of spread. It means that 68% of all the pitchers would be expected to be within +/- .009 points of average, and 95% would be expected to be within +/- .018 points of the mean. How much is that? Since the average pitcher in our sample had about 500 BIP, that would make one standard deviation +/- 4.5 hits, or about .15 points of ERA.&lt;/p&gt;&lt;p&gt;So is that a lot or a little or what? Actually, it's just right. According to a &lt;a href="http://www.tangotiger.net/solvingdips.pdf"&gt;research paper by Erik Allen and Arvin Hsu&lt;/a&gt;, the true standard deviation of BABIP is supposed to be, you guessed it, .009 points. DIPS 3.0 is perfectly capturing the true spread in BABIP, further support for disregarding a pitcher's line drive percentage. 
That extra batted ball information actually plays a big role in understanding the subtler differences between pitcher seasons, because it allows us to capture the true spread in BABIP, which Voros' two versions of DIPS (which have a spread of zero, since they assume every pitcher would have the same BABIP, though the second did make some small adjustments based on handedness) could not. And once again, we have even more reason to believe that line drive percentage really means nothing over the course of one season.&lt;/p&gt;&lt;p&gt;So vive DIPS 3.0! Oh, and it's nice to join the BtB staff.&lt;/p&gt;
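&lt;p&gt;For readers who want to follow the arithmetic, here is a minimal Python sketch of the batted ball translation and the expected BABIP calculation described in this post. It is only an illustration under stated assumptions: the function names are hypothetical, and it uses the rounded figures quoted in the post (586 batted balls, a 19.9% league line drive rate, and so on), so the last digit can differ from results computed on unrounded data.&lt;/p&gt;

```python
def translate_batted_balls(total_batted_balls, actual_line_drives,
                           league_ld_rate, other_counts):
    # Assign a league-average line drive count, then rescale every other
    # batted ball type so the pitcher keeps his actual non-liner mix.
    expected_ld = round(total_batted_balls * league_ld_rate)
    scale = ((total_batted_balls - expected_ld)
             / (total_batted_balls - actual_line_drives))
    translated = {kind: count * scale for kind, count in other_counts.items()}
    translated["line_drives"] = expected_ld
    return translated

def expected_babip(singles, doubles, triples, reached_on_error, balls_in_play):
    # Expected reached-on-error counts in the numerator, as in the post.
    return (singles + doubles + triples + reached_on_error) / balls_in_play

# Jarrod Washburn, 2005: 586 batted balls, 120 actual liners, 206 outfield flies.
washburn = translate_batted_balls(586, 120, 0.199, {"outfield_flies": 206})
# 50.8% of AL line drives became singles (figure quoted in the post).
liner_singles = 0.508 * washburn["line_drives"]

# Johan Santana's expected line, using the rounded counts from the post.
santana = expected_babip(127, 39, 4, 6, 601)

# One standard deviation of BABIP over about 500 balls in play, in hits.
hits_per_sd = 0.009 * 500

print(f"Washburn translated line drives: {washburn['line_drives']}")
print(f"Washburn translated outfield flies: {washburn['outfield_flies']:.1f}")
print(f"Washburn singles from liners: {liner_singles:.1f}")
print(f"Santana expected BABIP: {santana:.3f}")
print(f"Hits per standard deviation: {hits_per_sd:.1f}")
```

&lt;p&gt;A full version would repeat the last step for every batted ball type and hit type before feeding the totals into BaseRuns, which this sketch does not attempt.&lt;/p&gt;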


  


      </description>
    </item>
  </channel>
</rss>
