A few weeks after the conclusion of the truncated 2020 season, I published xRV: The best pitches and pitchers of 2020 while working for BaseballCloud. There, I formally introduced my pitch quality metric, Expected Run Value (xRV). Since then, I’ve been using the model’s results to tell the stories of pitchers’ pitch types and arsenals, and to isolate the two major components that went into xRV 1.0: stuff and command. For example:
Hendriks' fastball was in the 98th percentile of all pitch types by xRV in '20 – the 32nd best pitch overall (min. 50 pitches).
His SL, in a vacuum, was in the 49th %ile, but it undoubtedly plays off his elite FB. It doesn't have a .149 xwOBA over the last 2 years by accident. https://t.co/qDyI6KENw1
— Luke Smailes (@CoeSoxMetrics) January 12, 2021
xRV indicates that Ryan Thompson likely benefited the most from this arm slot differential, but Diego Castillo probably did as well.
Their combined 29 pitches collectively outperformed their xRV (a metric that doesn’t consider arm slot). https://t.co/2R5XYYoFYy
— Luke Smailes (@CoeSoxMetrics) October 14, 2020
Through these examples, I’m alluding to something being missed here: some factor that plays a substantial role in the art of pitching. That something is deception, a factor that I hoped would account for the residuals of my initial xRV model in this new version.
In a nutshell, xRV is built using a random forest algorithm that takes count, quality of contact, and pitcher/hitter handedness into account, and these qualities remain the same. Below, I address each of the attributes included in the model and also explain the 2.0 additions in more detail.
My first step was to figure out how to quantify this phenomenon of deception, which by definition (in a baseball context) is “causing [a hitter] to believe something that is not true.” I then came across a post by Jake Sauberman (@jakesaub), a former Indiana University baseball analyst and now a Los Angeles Angels analyst, in which he outlines a three-pronged approach to quantifying deception: indistinguishability, unpredictability, and unexpectedness. I now had my methodological inspiration, as Sauberman did a great job theorizing what exactly makes a pitcher deceptive. Rather than drum up new terminology only to end up saying the same thing, I reached out to Jake for his feedback on my application before moving forward with essentially his framework, after some tweaks.
To better understand the foundation of xRV, which is still the same for the 2.0 version, I recommend giving 1.0 a read.
While Sauberman estimated indistinguishability as release point tunneling, I wanted to focus on the commit point of hitters to create a tunneling metric. In The Physics of Baseball, Yale physicist Robert Adair identifies the commit point as roughly 24 feet before home plate, which, on a 90 mph fastball, equates to about 175 milliseconds.
In order to estimate tunneling, I needed the probable trajectory of every pitch at every ten-millisecond (.01 second) interval. To do this, I utilized the pitchRx package in R and the getSnapshots function within. The function does what you’d expect given its name: it takes individual pitch metrics such as release point, velocity, and acceleration, plugs them into the kinematic equations from renowned baseball physicist David Kagan, and spits out a “snapshot” vector of the estimated coordinates at every ten-millisecond interval. I then pulled out the point closest to y = 24 ft for each pitch thrown in 2020, and I had my estimated commit points.
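The snapshot idea can be sketched with constant-acceleration kinematics, the same equations getSnapshots applies. This is a minimal Python illustration, not the pitchRx code itself, and the release point, velocity, and acceleration values below are hypothetical Statcast-style numbers:

```python
import numpy as np

def commit_point(release_pos, release_vel, accel, y_commit=24.0, dt=0.01):
    """Step a pitch trajectory in ten-millisecond snapshots under constant
    acceleration and return the (x, y, z) snapshot nearest the commit plane."""
    t = np.arange(0.0, 1.0, dt)[:, None]           # up to one second of flight
    # Kinematics, per axis: p(t) = p0 + v0*t + 0.5*a*t^2
    pos = release_pos + release_vel * t + 0.5 * accel * t**2
    idx = np.argmin(np.abs(pos[:, 1] - y_commit))  # snapshot closest to y = 24 ft
    return pos[idx]

# Hypothetical release data (ft, ft/s, ft/s^2); y is measured from home plate
p0 = np.array([-1.5, 55.0, 6.0])     # release point
v0 = np.array([2.0, -132.0, -5.0])   # roughly 90 mph toward the plate
a  = np.array([-8.0, 25.0, -16.0])   # drag plus Magnus plus gravity
print(commit_point(p0, v0, a))
```

With a ten-millisecond step, the nearest snapshot sits within about two-thirds of a foot of the 24-foot plane, which is plenty of precision for a commit-point estimate.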
Back in 2017, Harry Pavlidis, Jonathan Judge, and Jeff Long wrote about pitch tunnels for Baseball Prospectus where they outline a Break:Tunnel Ratio that shows “the ratio of post-tunnel break to the differential of pitches at the Tunnel Point.” This was my foundational inspiration for how exactly to model what pitch tunneling is.
To effectively tunnel pitches, the goal is to minimize the distance between two pitches at the commit point (or tunnel point, point of no return, etc.) while maximizing the distance between the pitches’ final coordinates. This simulates the “late break” that’s theoretically happening as the hitter is attempting to make contact. For the sake of my model, I simply took the Euclidean distance between the two pitches at their commit points and divided it by the Euclidean distance between their final locations to create my tunnel metric. Thus, a smaller value is more favorable.
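That ratio is a one-liner; here is a Python sketch with hypothetical (x, z) coordinates for a fastball/slider pair:

```python
import math

def tunnel_score(commit_a, commit_b, final_a, final_b):
    """Ratio of commit-point separation to final separation: small values mean
    the pitches look alike at the commit point but end up far apart."""
    d_commit = math.dist(commit_a, commit_b)  # Euclidean distance at y = 24 ft
    d_final = math.dist(final_a, final_b)     # Euclidean distance at the plate
    return d_commit / d_final

# Hypothetical (x, z) coordinates in feet for a well-tunneled pair
score = tunnel_score((-1.0, 5.0), (-1.1, 4.9), (-0.5, 3.2), (-1.8, 1.4))
```

Two pitches separated by less than two inches at the commit point but more than two feet at the plate score near zero, exactly the “late break” being rewarded.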
But how would I actually compare the pitches? Sauberman compares every pitch to the one thrown immediately before it. While I don’t necessarily disagree with the functionality of this method, I think comparing each pitch to a broader scope of the past better models hitter perception. A pitcher could get a whiff on a slider based on a well-tunneled fastball that he threw to that hitter two pitches prior, or even two at-bats prior. Plus, advance scouting and coaching staffs are feeding hitters information about the pitcher (including video) leading up to the at-bat, especially on his main pitch, and this affects their perceptions.
Therefore, I decided to compare each pitcher’s secondary pitches (all pitches he threw except for the most often-used pitch type) to the average commit and final locations of their main pitch type. The kicker is that I also took all the “main pitches” (roughly 80% of these are fastballs) and divided them up into 13 zones as defined by MLB’s heart, shadow, chase and waste zones to get more precise with the relationships. From there, I calculated how well each secondary pitch would tunnel with the pitcher’s main pitch when thrown in each of the 13 zones. Finally, I took a weighted average of those estimated tunnels based on where the pitcher located his main pitch in 2020 to construct my tunnel metric.
There’s no good way to explain that mouthful of a process, so maybe these visuals will help.
This is the breakdown of the zones from the catcher’s perspective:
This pitch is in zone eight (shadow, low left from the catcher’s perspective), and according to the model, it best tunnels with his four-seamers thrown in zone four, followed by (in order) zones three, one, two, and seven. Since Hendriks spots his fastball in zone four 6% of the time, that’s effectively how much “tunnel credit” he gets for that slider thrown in zone eight. Furthermore, his fastball is thrown in zone three 10% of the time and zone one 14% of the time (and so on), so these zones are weighted more heavily and have more of an effect on the TotalTunnel metric.
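The weighting step amounts to a frequency-weighted average of the per-zone tunnel estimates. In this Python sketch, the 14%, 10%, and 6% frequencies for zones one, three, and four echo the Hendriks example; the remaining frequencies and all of the per-zone tunnel values are hypothetical:

```python
# Hypothetical tunnel estimates for the slider vs. the four-seamer thrown in
# each zone (smaller = better tunnel)
zone_tunnels = {1: 0.42, 2: 0.48, 3: 0.45, 4: 0.40, 7: 0.55}

# How often the four-seamer is actually located in each zone; zones 1, 3, and 4
# use the frequencies cited above, zones 2 and 7 are hypothetical
fastball_freq = {1: 0.14, 2: 0.08, 3: 0.10, 4: 0.06, 7: 0.05}

# Frequency-weighted average of per-zone tunnels gives the slider's TotalTunnel
weight = sum(fastball_freq[z] for z in zone_tunnels)
total_tunnel = sum(zone_tunnels[z] * fastball_freq[z] for z in zone_tunnels) / weight
```

Zones where the fastball rarely goes contribute almost nothing, so a secondary pitch only earns tunnel credit against fastball locations the hitter actually has to respect.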
For each pitcher’s primary pitch type, each pitch was compared to a usage-weighted average of the tunnel metrics of his secondary pitches. For instance, if my model tells me that Hendriks’ slider tunnels with his fastball in the 66th percentile of all secondaries, then it’s reasonable to conclude that his fastball’s tunnel metric is in the 66th percentile for the 22% of his pitches that are sliders. Finally, each primary pitch is scaled based on how often it is thrown (1 – Usage%). Otherwise, pitchers who threw their primary pitches extremely often would be boosted, as their low-usage secondary pitches would over-credit the primary pitch type’s overall tunnel.
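As a sketch of that two-step credit in Python, where the slider’s 66th percentile and 22% usage follow the Hendriks example and the curveball numbers are hypothetical:

```python
# Hypothetical tunnel percentiles and usage rates for one pitcher's secondaries
secondaries = {"slider": (66, 0.22), "curveball": (90, 0.05)}  # (pctile, usage)
primary_usage = 0.73  # the four-seam fastball

# Step 1: usage-weighted average of the secondaries' tunnel percentiles
sec_usage = sum(u for _, u in secondaries.values())
primary_pctile = sum(p * u for p, u in secondaries.values()) / sec_usage

# Step 2: scale by (1 - Usage%) so heavy primary usage isn't over-credited
# by thin secondary-pitch samples
primary_tunnel = primary_pctile * (1 - primary_usage)
```

A pitcher who throws his primary 90% of the time would see its tunnel credit scaled by only 0.10, keeping a handful of well-tunneled secondaries from inflating the whole arsenal.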
Enough nuts and bolts. Here are some leaderboards:
**Top-15 Tunneled Breaking Balls – min. 100 pitches**

| Rank | Name | Pitch Type | Team | Total Tunnel | Pitches |
|---|---|---|---|---|---|

Note: It’s personal preference for me to consider cutters to be breaking balls, not fastballs.
**Top-15 Tunneled Changeups & Splitters – min. 100 pitches**

| Rank | Name | Pitch Type | Team | Total Tunnel | Pitches |
|---|---|---|---|---|---|
| 3 | Martin Perez | Changeup | Red Sox | .435 | 249 |
| 12 | Evan Marshall | Changeup | White Sox | .462 | 125 |
| 14 | Gio Gonzalez | Changeup | White Sox | .469 | 194 |
**Top-15 Tunneled Arsenals – min. 500 pitches**
For this facet of the model, I made no tweaks to Sauberman’s process. As he puts it,
“The less a pitcher deviates from his usual pitch type frequencies no matter the count, the harder it will be for the batter to guess which pitch is coming.”
Having count-neutral confidence in your pitch types seems like a valuable aspect to a pitcher’s success, provided those pitch types are good enough to get major league hitters out.
For each pitch type, the process was to calculate its usage percentage in each count and then subtract that usage from the pitcher’s overall usage of the pitch. Finally, these calculated deviations are weighted by how often each count appeared in MLB in 2020.
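A toy Python version of that calculation, with two pitch types and two counts (a full version covers every pitch type and all twelve counts; every number here is hypothetical):

```python
# Hypothetical usage data: overall pitch-type usage vs. usage in each count
overall_usage = {"four-seam": 0.55, "slider": 0.45}
usage_by_count = {
    "0-0": {"four-seam": 0.60, "slider": 0.40},
    "0-2": {"four-seam": 0.35, "slider": 0.65},
}
count_freq = {"0-0": 0.70, "0-2": 0.30}  # hypothetical league count frequencies

# Count-frequency-weighted mean absolute deviation from overall usage;
# smaller values mean more count-neutral, i.e. less predictable, selection
unpredictability = sum(
    count_freq[c] * abs(usage_by_count[c][p] - overall_usage[p])
    for c in usage_by_count
    for p in overall_usage
) / len(overall_usage)
```

A pitcher whose per-count mix matches his overall mix scores exactly zero, which is why the most unpredictable arsenals in the table below sit near .02.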
Another way to think about the value of count unpredictability is to look at it through the lens of game theory, specifically a mixed-strategy Nash equilibrium. By being more diverse with pitch types in any given count, the pitcher is effectively reducing the hitter’s probability of maximizing his payoff by guessing the pitch type correctly. While hitting success is not purely based on guessing correctly, it’s reasonable to assume that when a hitter does guess the pitch type correctly, it has a substantial positive effect on the quality of contact, and thus the run value of that pitch.
Neil Paine dove into this idea more thoroughly back in 2015 for FiveThirtyEight where he asserts that we should expect a pitcher’s arsenal to “settle into the optimal mix for retiring opposing hitters: a mix of fastballs and change-ups that’s impossible for a batter to exploit” … “assuming the batter adapts accordingly.”
The existence of more tools in the toolkit for any given pitcher only compounds the difficulty of exploiting his pitch type selection.
With a minimum qualifier of 100 pitches of any pitch type, the most unpredictable pitch types in 2020 were, in order: Lance Lynn’s sinker, Zack Greinke’s slider, Alex Young’s four-seamer, Johnny Cueto’s curveball, and Aaron Civale’s slider.
For an entire arsenal weighted by usage, here are the top-15 most unpredictable arms from 2020. A minimum of 500 pitches gives me a good subset of the mostly healthy starters from last season.
**Top-15 Unpredictable Arsenals – min. 500 pitches**

| Rank | Name | Team | Unpredictability | Pitches |
|---|---|---|---|---|
| 5 | Daniel Ponce de Leon | Cardinals | .019 | 558 |
| 7 | Lance McCullers Jr. | Astros | .019 | 807 |
| 13 | Lucas Giolito | White Sox | .022 | 1,089 |
The final addition to the model was unexpectedness, and Sauberman defines it as:
“…the deviation between the actual pitch movement (horizontal and vertical) based on the pitcher’s release point, and the expected pitch movement based on the pitcher’s release point”
This is a fascinating concept to me, and it makes perfect sense. Hitting has a lot to do with our brains making predictions, and everything the pitcher does prior to and during the delivery of the pitch generally helps in making those predictions more accurate. Major league hitters have seen millions of pitches in their lifetimes, and they know (whether consciously or subconsciously) that release point is one of the clues that helps them predict what exactly that pitch will do.
In examples that Eno Sarris points out in his Attempt to Quantify Deception (a piece that Sauberman also references), Brad Ziegler’s release point clearly screams horizontal movement while Tyler Clippard gives off an impression of vertical movement. As a movement profile deviates from what is expected given release point, we find ourselves with another factor of deception.
Science Insider illustrates this process really well in this video, and it can also be applied to the indistinguishability section above.
To model what is “expected” by a hitter, Sauberman utilized a k-nearest neighbors (KNN) regression algorithm, and I followed suit. While Sauberman used season averages for the movement and release point data, my initial goal was to build this model on pitch-level data, because I wanted to capture the subtleties that some high-level major league hitters can detect in release point deviation from pitch to pitch. However, within the KNN algorithm (say, with k = 500), each pitch was compared mostly to the same pitch type from that same pitcher, which didn’t model the release point/movement deception factor I was going for. It specifically failed to do the “unicorns” justice, such as James Karinchak and Tyler Rogers, who have such unique release points.
I therefore followed Sauberman in running the algorithm with the pitch type season averages, and as a result, my unexpectedness model’s RMSE was reduced by one-third.
Sauberman also showed that separate KNN models for each pitch type were necessary. For each model, with populations between 15,662 pitches (cutters) and 79,590 (four-seam fastballs), I found the optimal value of k (150 for four-seamers, 75 for sinkers and sliders, 55 for curveballs, 50 for cutters, and 25 for changeups and splitters), feature-scaled the variables, and ran each algorithm.
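My actual models were built in R on pitcher-season averages; as an illustration of the mechanics, here is a from-scratch KNN regression in Python on synthetic data, predicting movement from release point and taking the residual as unexpectedness. The k = 150 matches the four-seam model above, but the data and the relationship between slot and movement are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one pitch type's pitcher-season averages:
# release point (x, z) -> movement (horizontal, vertical)
release = rng.normal([-2.0, 5.8], 0.5, size=(400, 2))
movement = np.column_stack([
    -4.0 * release[:, 0] + rng.normal(0.0, 1.0, 400),  # slot drives horizontal break
    2.0 * release[:, 1] + rng.normal(0.0, 1.0, 400),   # higher slot, more ride
])

# Feature-scale release points so both axes contribute equally to distance
X = (release - release.mean(axis=0)) / release.std(axis=0)

def knn_expected(X, y, k):
    """For each pitch, average the movement of the k nearest release points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    nearest = np.argsort(dists, axis=1)[:, :k]                     # k neighbors
    return y[nearest].mean(axis=1)

expected = knn_expected(X, movement, k=150)                   # as for four-seamers
unexpectedness = np.linalg.norm(movement - expected, axis=1)  # residual = deception
```

A Ziegler- or Rogers-type outlier would sit far from his neighbors’ average movement and earn a large residual, which is precisely the deception signal the model is after.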
**Top-15 “Unexpected” Movement Pitch Types – min. 100 pitches**

| Rank | Name | Pitch Type | Team | Unexpectedness | Pitches |
|---|---|---|---|---|---|
| 14 | Derek Holland | Knuckle Curve | Pirates | .447 | 250 |
Adding These Factors to xRV
With these three additions, plus spin direction, my xRV model had 13 attributes: velocity, horizontal movement, vertical movement, raw spin rate, spin direction, x location, z location, the unpredictability, predicted movement differential, and total tunnel metrics, plus dummy variables for batter handedness, pitcher handedness, and whether or not the pitch was a fastball.
Here are the top-15 pitch types and top-25 arsenals according to xRV 2.0:
**2020 Top-15 Pitch Types by xRV 2.0 – min. 100 pitches**

| Rank | Name | Pitch Type | Team | xRV | xwOBA | Usage | Pitches |
|---|---|---|---|---|---|---|---|
| 11 | A.J. Cole | Slider | Blue Jays | -2.22 | .193 | 45.4% | 163 |
**2020 Top-25 Arsenals by xRV 2.0 – min. 300 pitches**
The “Deception Factor”
To isolate a “Deception Factor” from the three metrics, which sit on different scales, I took a weighted average of their percentile rankings, with the weights determined by how important each metric was to the model. The breakdown turned out as: .457 × Tunnel + .382 × PredMov + .162 × Unpred.
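The blend is just a weighted average of percentiles. Plugging in the figures quoted for Hendriks’ slider later in this piece (66th percentile tunneling, 74th percentile movement unexpectedness, 57th percentile unpredictability):

```python
# Weights from the model's attribute importances, as given in the text
WEIGHTS = {"tunnel": 0.457, "pred_mov": 0.382, "unpred": 0.162}

def deception_factor(tunnel_pctile, pred_mov_pctile, unpred_pctile):
    """Weighted average of the three deception percentile rankings."""
    return (WEIGHTS["tunnel"] * tunnel_pctile
            + WEIGHTS["pred_mov"] * pred_mov_pctile
            + WEIGHTS["unpred"] * unpred_pctile)

# Hendriks' slider: 66 / 74 / 57
score = deception_factor(66, 74, 57)  # about 67.7, i.e. 67th percentile overall
```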
With a stricter pitch minimum, let’s look at the top-15 pitch types in terms of overall deception.
**2020 Top-15 Pitch Types by Overall Deception – min. 200 pitches**

| Rank | Name | Pitch Type | Team | Mean Deception %ile | xRV %ile | xwOBA | Usage | Pitches |
|---|---|---|---|---|---|---|---|---|
| 5 | Ryan Weber | Sinker | Red Sox | 85.7 | 57.1 | .337 | 50.9% | 313 |
| 7 | Matt Barnes | 4-Seam | Red Sox | 84.2 | 12.0 | .337 | 54.1% | 223 |
… and in terms of pitcher arsenals (weighted by usage):
**Top-15 Deceptive Arsenals – min. 500 pitches**

| Rank | Name | Team | Mean Deception %ile | Pitches |
|---|---|---|---|---|
| 8 | Ryan Weber | Red Sox | 69.6 | 608 |
To revisit Liam Hendriks one final time: back in January, I asserted that his slider “undoubtedly plays off of his elite fastball” based solely on his results. As it turns out, Hendriks’ slider is in the 66th percentile of pitch tunneling, the 67th percentile of overall deception (74th percentile movement unexpectedness and 57th percentile unpredictability), and the 57th percentile of xRV, which is eight percent better than where xRV 1.0 had it. While not as elite as I originally predicted, his slider is still in the top third in overall deception. xRV 2.0 better tells the story of why Hendriks’ slider was so effective.
What’s interesting is that his curveball, the pitch that better mirrors his fastball’s spin and has more robust vertical movement (70th percentile), was the best-tunneled pitch in baseball in 2020 when the pitch minimum is dropped to just 25. It had a 60% whiff rate, fifth best in the league, along with movement unexpectedness in the 84th percentile. xRV has it as a 78th percentile pitch, or -.456 xRV/100.
While his slider has been lethal over the past two seasons, there’s some evidence that his curveball should be used more often given its relationship with his fastball. The GIF below shows Hendriks’ second best vertically breaking curveball of 2020 and correspondingly, his third best tunneled curve when compared to his fastball profile (high vertical break, steep vertical approach angle) and where he typically locates it.
That’s absolutely filthy.
The new model returned a mere two percent RMSE improvement over version 1.0, which is somewhat disappointing on the surface, but this effort has helped further explain what exactly is going on in the art of pitching. I’ve been able to improve my estimation, but even with these additions, I can’t explain 100% of the residuals from xRV 1.0, nor is it realistic to try. The model features I selected are clearly not perfect representations of stuff, command, and deception, and some of the variation within the game of baseball is simply inexplicable.
From the Boruta library in R, I was able to determine the importance of each of the model’s attributes which helped me confirm my weights for the “Deception Factor” stat.
| Rank | Attribute |
|---|---|
| 9 | Pred. Movement Diff |
| 12 | Pitcher Handedness Dummy |
| 13 | Hitter Handedness Dummy |
Moving forward, I plan to further interpret the implications of this model: for example, looking at how certain pitchers can better optimize their pitch locations based on pitch tunneling, or expanding on the game theory application regarding pitcher unpredictability. Finally, stuff, command, and deception each play a certain role in pitching success, so figuring out the exact necessary trade-offs in this three-dimensional realm should be valuable.
For any further results of the model for a certain pitcher or any questions on my process, shoot me a mention or DM on Twitter @CoeSoxMetrics.
Photos from Icon Sportswire | Adapted by Doug Carlin (@Bdougals on Twitter)