(Photo by George Walker/Icon Sportswire)
Welcome back to my series on plate discipline/pitching metrics. This time, I’m going to focus on a close examination of the core principle behind my work – the so-called “strikeout rate discrepancies” between the K% predicted by plate discipline metrics and the actual rate. At this point, roughly two months of baseball are complete, which is as good a time as any to pause and look back. That way we can compare the data after one month to the current data and see how the predictions hold up on a month-to-month basis. I’m sure there are still a lot of doubts surrounding this whole theory, and honestly I can’t blame anyone. This is fairly new stuff, and after all, I’m just some guy on the internet. Why should you believe me? So let’s put the numbers to the test.
To get things started, here are the qualified starting pitchers who currently sport a discrepancy greater than +/- five points. If you recall, five points is the upper end of the margin of error, so under my theory any discrepancy beyond that amount should be considered a red flag. This table is sorted by the K% difference, with the over-achievers on top and the under-achievers on the bottom.
|Name|Actual K%|Predicted K%|K% Difference|
|---|---|---|---|
Right off the bat, we can easily see just how far this list has been reduced since it was first introduced towards the end of April. At that point, there were 29 pitchers with a discrepancy larger than five points. Now we are down to just eleven. Already, this goes a long way towards validating my theory that these small-sample discrepancies should mostly work themselves out as more innings are compiled. The size of the largest discrepancies is also coming down. In that first list, the largest outlier was +13.6, and there were a handful over ten points. Now there are no qualified SP over ten points; Gerrit Cole is posting the largest discrepancy at -9. Furthermore, of those eleven pitchers, half are right around the five-point line. There are really only 4-5 qualified guys left in the league who are meaningfully separated from it. So it seems very clear that at least part of my theory is holding up – the discrepancies are indeed shrinking and vanishing as more time goes by.
The fact that the discrepancies are shrinking doesn’t tell the full story, though. We also need to look at which side is moving. It could be that the strikeout rates are changing to match the metrics, or vice versa. If the strikeout rates are what’s moving, that would support my theory that the metrics are the truest reflection of the underlying strikeout skills. But if it’s the opposite, the PD metrics would be less useful; my theory would be more like the tail wagging the dog, and not predictive in the slightest. So let’s drill down into that original set of outliers and see how things have played out since then.
To examine this, I am going to take the “predicted K%” from a month ago, and see how it compares to changes in the actual K% from that time up to now. My goal is to measure two things:
1) Each “predicted K%” value indicated a direction that the K% should move going forward from that point in time. For example, if the predicted value was higher than the actual K% during that first update, the predicted direction would be “up”. The first thing I want to measure is just what percentage of players at least got the direction right. That should serve as a good starting point. If this is a low percentage…well, I guess it’s back to the drawing board for me.
2) It’s not just the direction that matters. If the model predicts a +10% change in strikeouts, but you just get +1%, that’s not really validating the model. Obviously some discrepancies still remain, so I’m accepting that the changes in strikeout rates are going to fall short of the predicted changes. Extreme outliers just need more data to stabilize; that’s the nature of statistics. But the second thing I want to measure is the predominant factor involved in the “gap closure”. Basically, I want to know: was it the strikeout rates that changed to meet the predicted rates, or was it the metrics that changed to meet the strikeout rates? My assumption here is that both sets of data will move a bit, but if my theory holds up, we should clearly see the strikeout rate movement as the predominant factor.
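To make those two checks concrete, here’s a minimal sketch in Python of how they could be computed from per-pitcher snapshots. The function names and inputs are my own for illustration; this is not the exact code behind the tables below.

```python
# Minimal sketch (not the actual code behind the tables) of the two checks:
# (1) did the actual K% move in the direction the earlier prediction implied,
# and (2) what share of any gap closure came from the K% side moving?
# All inputs are K% values in points, e.g. 24.5 for 24.5%.

def direction_correct(actual_then, predicted_then, actual_now):
    """Check whether actual K% moved toward the earlier predicted K%."""
    predicted_up = predicted_then > actual_then
    moved_up = actual_now > actual_then
    return predicted_up == moved_up

def k_share_of_closure(actual_then, predicted_then, actual_now, predicted_now):
    """Fraction of the gap closure attributable to actual K% movement.

    The gap is predicted K% minus actual K%. Returns None when the gap
    did not shrink, matching the "N/A (no closure)" rows in the table.
    """
    gap_then = predicted_then - actual_then
    gap_now = predicted_now - actual_now
    closure = abs(gap_then) - abs(gap_now)
    if closure <= 0:
        return None
    # K% movement toward the old prediction, capped at the total closure.
    k_move = actual_now - actual_then if gap_then > 0 else actual_then - actual_now
    return max(0.0, min(k_move, closure)) / closure
```

For example, a pitcher at 20% actual / 25% predicted who moves to 23% actual / 24% predicted closed four points of gap, three of which (75%) came from the strikeout side.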
Here are the results:
|Name|Original Discrepancy|Current Discrepancy|K% Direction Predicted|Direction Moved|K% Actual Change|Gap Closure from K%|
|---|---|---|---|---|---|---|
|Gerrit Cole|-7.40%|-9.15%|Down|Up|1.40%|N/A (no closure)|
|Hyun-Jin Ryu|-6.20%|-6.50%|Down|Up|0.60%|N/A (no closure)|
|Masahiro Tanaka|5.30%|5.25%|Up|Down|-0.50%|N/A (no closure)|
|Tyler Anderson|5.30%|3.75%|Up|Down|-3.90%|N/A (no closure)|
First off, just a quick note: there were 29 pitchers in the original discrepancy list but only 25 here. The explanation is that four of those guys were injured or moved to the bullpen after that update. Only starting innings are included in the data, so these four pitchers were essentially given no chance to close their gaps, and it doesn’t make sense to include them. For the record, those pitchers are Robbie Ray, Miguel Gonzalez, Steven Brault, and Josh Tomlin.
To quantify my earlier observation that the size of the discrepancies is coming down, we can take a quick look at the averages. In the first article, the average discrepancy of all listed outliers was around seven points. Now, that’s down to just four points, within the margin of error. That is another data point in support of the idea that the discrepancies are being reduced over time. It also gives us some clues as to the time frame it takes for these things to stabilize: on average, the discrepancies are being reduced by almost half over a month or so.
Moving on, let’s try to answer the two questions posed above.
Q1) Did the model predict that strikeout rates would move in the right direction?
A1) Yes, it would appear so. 21 of the 25 pitchers got the predicted direction correct, which is a solid 84% success rate. Furthermore, of the four pitchers whose predictions weren’t correct, three moved only a tiny amount in the wrong direction. Tyler Anderson is the sole major outlier here.
Q2) What about the amounts? Are the strikeouts changing, or have we been wasting our time reading this nonsense??
A2) Again, it does look like we pass the test here. The last column of the table shows (for pitchers who did close their gap at least partially) what portion of that closure came from the strikeout side. As you can see, most pitchers (81%) are over 50%, and many are at 90+%. There is again only a single major outlier: Mike Fiers and his 20%. The average contribution towards the gap closure from strikeout rates is 76%. In retrospect, this fits nicely with the statistical data from my original piece. As a reminder, here are the two graphs showing the correlations between strikeouts and the two PD metrics I use to predict them:
Do those R-squared values seem familiar? R-squared measures how much of the variation in the data the model explains, so seeing roughly 75% of the gap closure come from the strikeout side is pretty much exactly what we should have expected.
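As a quick refresher on what R-squared actually is, it can be computed by hand from paired per-pitcher values. The numbers below are made up for illustration, not actual league data:

```python
# Hypothetical example: R-squared between a plate-discipline metric
# (e.g. SwStr%) and actual K%. These values are invented, not real data.

swstr = [8.0, 10.5, 12.0, 13.5, 15.0]   # SwStr%, assumed values
k_pct = [17.0, 21.0, 24.0, 26.5, 30.0]  # K%, assumed values

n = len(swstr)
mean_x = sum(swstr) / n
mean_y = sum(k_pct) / n

# Pearson correlation; for a simple linear fit, R-squared = r ** 2.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(swstr, k_pct))
var_x = sum((x - mean_x) ** 2 for x in swstr)
var_y = sum((y - mean_y) ** 2 for y in k_pct)
r = cov / (var_x * var_y) ** 0.5
r_squared = r ** 2
```

An R-squared near 0.75, like in the graphs above, would mean the metric explains about three-quarters of the pitcher-to-pitcher variation in strikeout rate.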
Overall I am certainly encouraged by these results. It’s only two months of data, but 75-84% accurate predictions (depending on the measurement) seems pretty solid and is in line with what we know from my previous work. There is one VERY important caveat though, in that I am only analyzing the most extreme outliers here. This is important to understand. The metrics just aren’t going to be very useful for analyzing every pitcher, since the vast majority of them are operating within their normal expected range. But the evidence does seem to support my methodology for identifying the largest outliers, and for predicting what will happen to them going forward.
So this is good. I’m happy I get to keep working on this and not throw it all out the window. But these results don’t come without downsides. If a system is designed to catch outliers, but the outliers are disappearing, at some point the system stops being useful. Going forward, I’ll need to take this into account in order to stay relevant. With that being said, and also incorporating some feedback from readers, I’m going to start by splitting the rankings into two tables. I’ll present the qualified SP first in one table, and then present another table for the guys with fewer innings. Here you go:
Table A – Qualified Starters
Table B – 10 IP Minimum
Data is current through games of Tuesday, May 29. Methodologies for the calculations remain unchanged from last week. Pitcher score is weighted 50/50 between PD score and xSLG. PD score is weighted as 3 points O-Swing%, 3 points Contact%, 3 points SwStr%, 1 point F-Strike%.
Predicted K% is calculated only from Contact% and SwStr%.
For both “K% difference” and “SLG-xSLG” columns, a negative number indicates good luck that should regress negatively going forward. A positive number indicates bad luck that should regress positively.
ERA, BB%, and SLG-xSLG are included for reference only and not included in any calculations.
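For anyone who wants to reproduce the weighting, here’s a minimal sketch of the score formulas described above. It assumes each component has already been converted to a comparable 0-100 score (e.g. a percentile rank among the pitcher pool); those conversion details aren’t restated here.

```python
# Sketch of the scoring weights described above. Each argument is assumed
# to already be a comparable 0-100 score (e.g. a percentile rank), not the
# raw percentage itself.

def pd_score(o_swing, contact, swstr, f_strike):
    """PD score: weighted 3 parts O-Swing%, 3 Contact%, 3 SwStr%, 1 F-Strike%."""
    return (3 * o_swing + 3 * contact + 3 * swstr + 1 * f_strike) / 10

def pitcher_score(pd, xslg):
    """Overall pitcher score: weighted 50/50 between PD score and xSLG score."""
    return 0.5 * pd + 0.5 * xslg
```

Note that F-Strike% carries only a tenth of the PD score’s weight, so a pitcher’s swing-and-miss components dominate the ranking.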