Generally speaking, we’re pretty far past using FIP — or any single stat — to make broad, conclusive statements about a pitcher’s performance. With all the information available to us, it’s easier than ever to break down on a granular level what went right or wrong in any given season. Stats are nothing without their context, and we have the ability to quantify a lot more context than we used to. That’s what this whole genre of article is for!
Nonetheless, most of us don’t have time to do deep dives on every pitcher with a disparity between their ERA and FIP. Whether you’re preparing for a fantasy draft, hunting for preseason predictions, or just looking at your favorite team’s pitchers, seeing a substantial gap between the two is always enough to give pause. When the results say one thing but the fancy math says another, it can be tough to know what to believe.
With this in mind (and in search of article topics), I went to take a look at some of FIP’s biggest over- and under-performers for this truncated season. The former will be with you next week, but with half of the league being done for the year, it seemed like a good time to see who might wind up being a lot better than what we saw this year. A quick search brings us the largest ERA-FIP gaps among all qualified pitchers in 2020:
Frankly, that’s not a super interesting list, at least for our purposes! Sometimes, even when advanced metrics tell us that there’s something to a pitcher that’s not meeting the eye, they still don’t really tell us anything new. We’ll get to Rick Porcello in a minute, but that FIP-ERA differential isn’t likely to change many people’s perception of Zack Greinke or Johnny Cueto at this point in their careers. Few people aren’t already excited by the Framber Valdez breakout, and Matthew Boyd and Andrew Heaney under-perform their peripherals about as frequently as they get haircuts. Which, I don’t know, but I presume it’s pretty frequent! Finally, with the logic of a short season, even casual fans with a somewhat broad outlook likely understand that Aaron Civale and Patrick Corbin are almost certainly better than what their run prevention was through 10 starts in 2020.
This isn’t to say that there’s not plenty of interesting stuff about these pitchers to dissect, but the attraction of FIP is that it can tell us when we need to do a wholesale reevaluation of a player’s skill level, and that list doesn’t give us too many prompts. What we’re looking for is the intersection between a pitcher who is perceived to not be particularly good and underlying numbers that suggest otherwise.
The first problem is that in a season like this, using qualified pitchers as an arbitrary cutoff just doesn’t give us much of a sample to work with. It invites survivor bias and excludes a number of pitchers who were legitimate pillars of their team’s pitching staff. Things start to get a bit more interesting when we lower the threshold to 30 innings pitched:
There’s still a good amount of obviousness — Michael Wacha and Tommy Milone are pretty washed; Jon Gray and Vince Velasquez aren’t as good as their strikeouts suggest; everything fell apart for Luke Weaver this year — but the appearance of a few FIPs in the threes with an ERA in the fives says we’re getting closer to what we’re looking for.
It’s still not quite it, though, so it’s time to get super arbitrary. Again, I’m interested partly in perception, and perception can be pretty arbitrary. It may differ from fan to fan, but generally speaking, we know an ERA that starts with 5 to be bad, and an ERA that starts with 3 to be pretty good. Consistently giving up five-plus earned runs per nine innings will typically earn a pitcher a one-way ticket out of the rotation, but bringing that number down to three or four can make one a staff building block. So let’s narrow the search even more. Of all 158 pitchers with at least 30 innings on the year, four of them posted an ERA above 5 while keeping their FIP at 4 or below:
If we’re going to be arbitrary, I’m also going to use my better judgment on players who just barely missed the cut, hence the inclusion of Logan Webb. (Velasquez is the only other on the bubble, and he’s likely run out of chances to be a big league starter.) Anyway, as I said earlier, stats are nothing sans context, and such an all-encompassing stat as FIP requires a lot of context to properly unpack. All of these pitchers are bonded by their “unlucky” ERAs, but as you probably know, they’re also very different kinds of pitcher, on the whole:
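For anyone who wants to run the same kind of search on their own leaderboard export, here is a minimal sketch of that filter. The `PitcherLine` type, the `unlucky_candidates` helper, and every number in `sample` are hypothetical placeholders made up for illustration, not the actual 2020 data; only the three cutoffs (30 innings, ERA above 5, FIP at 4 or below) come from the search described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitcherLine:
    """One pitcher's season line (hypothetical container for this sketch)."""
    name: str
    innings: float  # true innings, e.g. 30.333 rather than box-score "30.1"
    era: float
    fip: float


def unlucky_candidates(lines, min_ip=30.0, era_floor=5.0, fip_ceiling=4.0):
    """Keep pitchers with enough innings, an ERA above the floor, and a FIP
    at or below the ceiling, sorted by largest ERA-minus-FIP gap first."""
    picked = [
        p for p in lines
        if p.innings >= min_ip and p.era > era_floor and p.fip <= fip_ceiling
    ]
    return sorted(picked, key=lambda p: p.era - p.fip, reverse=True)


# Made-up sample rows, NOT real 2020 numbers.
sample = [
    PitcherLine("Pitcher A", 55.0, 5.60, 3.40),  # passes every cutoff
    PitcherLine("Pitcher B", 42.0, 5.10, 4.30),  # FIP too high
    PitcherLine("Pitcher C", 28.0, 6.20, 3.10),  # not enough innings
    PitcherLine("Pitcher D", 61.0, 3.20, 3.90),  # ERA is already fine
    PitcherLine("Pitcher E", 47.0, 5.30, 3.90),  # passes every cutoff
]

for p in unlucky_candidates(sample):
    print(f"{p.name}: ERA {p.era:.2f}, FIP {p.fip:.2f}, gap {p.era - p.fip:.2f}")
```

Loosening `era_floor` or `fip_ceiling` a touch is how a near-miss on the bubble would sneak onto the list.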
(Source: Baseball Savant)
We’re still a few steps away from drawing any concrete conclusions, but there are still a few things we can take away. An absence of loud contact and dearth of strikeouts make Porcello and Webb’s case for having actually “beaten” FIP considerably more suspect, while the same numbers indicate that there might be something more to Josh Lindblom and Jordan Montgomery. Yusei Kikuchi falls somewhere in the middle, offsetting premier velocity and good overall contact management with too many walks and too many hard-hit balls. All of this speaks to the context-dependency of a stat as broad as FIP. There’s certainly more than one way to over- or under-perform it.
That being said, we do know where the large majority of variance between ERA and FIP comes from, thanks to some super interesting work from FanGraphs’ Craig Edwards. The entire piece is well worth reading, but the big takeaway is that roughly 75% of the difference between a pitcher’s ERA and FIP can be explained by two factors: BABIP and strand rate (LOB%).
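Since the whole exercise hinges on the gap between ERA and FIP, it may help to see how the two numbers are built. The sketch below uses the standard public definitions: ERA is earned runs per nine innings, and FIP weights only home runs, walks, hit batters, and strikeouts. The counting stats in the example are invented, and the `constant` default of 3.10 is a placeholder assumption; the real FIP constant is recalculated every season so that league-average FIP matches league-average ERA.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def fip(home_runs, walks, hbp, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching.

    `constant` is a placeholder here (assumption); the actual value is
    set each season to put FIP on the same scale as league ERA.
    """
    core = 13 * home_runs + 3 * (walks + hbp) - 2 * strikeouts
    return core / innings + constant


def era_minus_fip(earned_runs, home_runs, walks, hbp, strikeouts, innings,
                  constant=3.10):
    """Positive values mean the ERA looks worse than the peripherals."""
    return era(earned_runs, innings) - fip(
        home_runs, walks, hbp, strikeouts, innings, constant
    )


# Invented season line, for illustration only.
print(f"ERA: {era(35, 60.0):.2f}")                            # ERA: 5.25
print(f"FIP: {fip(10, 20, 2, 60, 60.0):.2f}")                 # FIP: 4.37
print(f"Gap: {era_minus_fip(35, 10, 20, 2, 60, 60.0):.2f}")   # Gap: 0.88
```

Notice that nothing about balls in play or sequencing appears in the FIP function, which is exactly why BABIP and strand rate end up carrying most of the difference.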
It tracks logically. The way that hits are clustered on balls in play, whether with the bases empty or runners on, is not something that the pitcher has much control over on a year-to-year basis, and both — particularly the latter — have a significant impact on the number of runs that actually score. Let’s take a look at how those pitchers fared on the things they (maybe) didn’t have much of a say in:
(Source: Baseball Savant; FanGraphs)
I included wOBA and expected wOBA on contact (wOBAcon) because they give a somewhat more detailed picture of balls in play than BABIP. When relying on BABIP to make a point, even in cases like this where it’s proven to be relevant, it’s good to remember that at the end of the day, it’s still a batting average. I don’t think I need to say how we feel about batting average!
Anyhow, once again, it’s a mixed bag, but an enlightening mixed bag. As was to be expected, every one of these pitchers has been haunted by an abysmal LOB%, sometimes extraordinarily so. This season is heaven for Small Sample Sizes™, but for reference, the last qualified pitcher to strand less than 60% of their runners was Derek Lowe back in 2004. All four of the players up there with meaningful big league experience have strand rates near league average for their career, and it would be pretty reasonable to guess that they’ll all see some improvement in their ERA when a few more of those runners on base end up stuck there.
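For reference, here is a rough sketch of how the two "luck" stats are calculated, using the commonly published formulas: BABIP is a batting average on non-homer balls in play, and LOB% is the FanGraphs-style strand rate estimated from season totals rather than tracked runner by runner. The input numbers are invented for illustration and do not belong to any pitcher discussed here.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: non-homer hits divided by
    at-bats that ended with a ball in play (sac flies count as in play)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


def lob_pct(hits, walks, hbp, runs, home_runs):
    """Strand rate estimated from totals: the share of baserunners who
    did not score. The 1.4 * HR term is part of the published formula."""
    baserunners = hits + walks + hbp
    return (baserunners - runs) / (baserunners - 1.4 * home_runs)


# Invented totals, for illustration only.
print(f"BABIP: {babip(60, 10, 230, 60, 2):.3f}")    # BABIP: 0.309
print(f"LOB%:  {lob_pct(60, 20, 2, 35, 10):.1%}")   # LOB%:  69.1%
```

Both are ratios over fairly small denominators in a short season, so a handful of bloop singles with men on can swing either one by a lot.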
Of course, that’s also giving the pitchers less credit — positive or negative — than they probably deserve. Worse pitchers are going to strand fewer runners, so that’s not necessarily super useful in determining whether any of those guys might actually be good in the near future. The combination of BABIP and contact quality tells us a little more about how much of those rates are luck, and how much is actually deserved. Conventional wisdom, for example, dictates that Porcello’s sky-high BABIP ought to come down next year, but with contact quality metrics that turn opposing hitters into DJ LeMahieu, one begins to suspect that he’s simply making his own (bad) luck.
For the other pitchers, the mystery deepens. Webb appears to have had the worst batted ball luck, running a BABIP 50 points worse than league average despite having fared much better than league average on contact. Lindblom’s high x/wOBAcon is inflated by being a bit more home run happy than the rest of the league, but it also elicits concern that his .320 BABIP (30 points above league average) is more the norm than the exception. It also can’t be ignored that, if you go back to the previous chart, Lindblom is the only one in the group who strikes out a meaningful number of hitters above average. Rick Porcello’s arsenal might preclude him from pitching his way out of jams, but the ceiling is higher for Lindblom. The same goes for Montgomery, whose ability to limit walks (92nd percentile BB%) while maintaining a league-average strikeout rate should theoretically allow him to do just fine with slightly worse than average contact metrics.
That leaves Kikuchi, who’s got quite a few conflicts going on. Despite having just about everything else go his way, he’s still on pace for that record-low LOB%. FIP puts a lot of stock into home runs, and while Porcello’s home run suppression looks pretty fluky, Kikuchi’s contact metrics and groundball tendencies (52.8% GB rate, highest of the five) make his ability to avoid the longball appear a bit more sustainable. Ultimately, all five pitchers have their warts, but we’re beginning to see who’s pulling away from the pack as far as expectations for next year should be concerned.
Before we make any final judgments, there’s one more crevice left to explore. Like the vast majority of these metrics, LOB% doesn’t exist in a vacuum. At its core, it’s just a measure of how successful one is at avoiding hits and walks with men on base. Cluster luck and fielder ability play a big part in that number, but overall pitcher quality has something to do with it too. Either way, we need to do a little more work to see how fluky some of those strand rates actually are. For our last chart, here’s how all of those pitchers performed with runners on base this season:
(Source: Baseball Savant)
Now we can start drawing some real conclusions! Here’s one: if you ever find that you’re trying to talk yourself into a bounce-back year from Rick Porcello, well, you shouldn’t! The guy just gets hit hard in all phases of the game. To put Edwards’ findings into practice, we know that his high BABIP and low strand rate are what’s making his FIP so much more attractive than his ERA. But even if he’s done worse with runners on base than he’s probably deserved, what he actually deserved wasn’t very good either. The combination of loud contact and an inability to strike hitters out makes it a hell of a lot easier for things to break the wrong way than the right way.
That lack of strikeouts with runners on base is a common theme among these underperformers. The issues they have out of the stretch are probably unique to each individual pitcher, and unfortunately, this isn’t the article for that kind of breakdown. Anyhow, it isn’t promising that despite a Porcello-esque dearth of punchouts (with nearly double the walk rate), Webb has actually gotten better results with runners on base than expected. On one hand, this means that his abnormally low LOB% might be the flukiest of the bunch. But on the other, it also tells us that even if the strand rate corrects itself, it may well be negated by the regression looming in other, more important metrics.
As the only member of this list with a truly above-par strikeout rate in all circumstances, Lindblom theoretically has a bit more leeway with his contact quality. Unfortunately, hitters have consistently gotten the best possible outcomes when they do make contact, both with runners on base and otherwise. Any pitcher is prone to homer fluctuations on a year-to-year basis (hey there, Gerrit Cole!), but some are simply better at avoiding the dinger than others. The strikeouts make Lindblom perhaps the most attractive name here for a projected bounceback, but as Robbie Ray and Matthew Boyd remind us on a yearly (if not monthly) basis, strikeouts ultimately don’t mean much if the pitcher just can’t stop serving up gopher balls. Lindblom’s 26.9% ground ball rate is the second-lowest in the majors this season, and it’ll be tough to curb the home runs in the future if he continues to be unable to keep the ball on the ground.
That leaves us with Kikuchi and Montgomery, who now really appear to be getting the short end of the stick in somewhat divergent ways. Montgomery has done just about everything he’s supposed to do. Despite having some difficulty reaching back for the strikeout, the massive gap between his actual results and expected stats says that he’s truly been the recipient of some terrible luck with men on base. He won’t be a true front-of-the-rotation starter, but given his gains in the command department (aided by some mechanical tweaks), Montgomery strikes me as the kind of FIP-beater who’s likely to bounce back with a far better 2021 than some might anticipate.
Kikuchi, meanwhile, remains an enigma. He’s become one of the hardest throwing lefties in baseball, with three of his four offerings drawing whiffs at above a 30% clip. And measured purely by the gap between expected and actual stats, he’s had far and away the worst luck of any pitcher discussed thus far. Kikuchi’s .351 BABIP with runners on base seems to be fueling a massive disparity in true-talent level; whereas Porcello’s monstrous BABIP just takes him from pretty bad to really bad, Kikuchi’s bad luck — much as I hate using luck to describe these relationships — is turning what might be a pretty damn good pitcher into a straight-up bad one.
To be fair, it’s not all luck working against him. You’re going to let a lot of runs score when you put up a 12% walk rate with men on base, and that’s what makes me slightly less sure of positive regression than in Montgomery’s case. Kikuchi has the stuff to be an elite pitcher, but as it stands, his control out of the stretch has been poor enough (for reasons that are as-yet unknown to me, so feel free to chime in, Mariners fans!) to do the double damage of adding more free baserunners on top of the poor luck he’s already experiencing. Poor control not only begets walks, it also begets hittable pitches that turn into runs. Kikuchi’s ground ball tendencies make me think he’ll continue to keep the home runs in check, but he’ll likely have to bring his walk rates back to 2019 levels, perhaps even at the expense of some velocity, to fully make good on that juicy FIP.
I’ll end by quoting the conclusion to Edwards’ article last season:
“It’s only natural to want to find a reason why a pitcher’s ERA and FIP are so different, and for that reason to be related to something the pitcher is or isn’t doing. Unfortunately, that isn’t always likely to be the case. In any single season, there are going to be outliers due to the relatively small sample of plate appearances we are dealing with, and almost all of the difference between ERA and FIP can be explained by BABIP and LOB%. While not all of a pitcher’s BABIP and LOB% are due to a pitcher’s defense, sequencing luck, and just general good fortune, a decent amount is just that. Baseball is a team sport and defenses play a large role in run prevention. While it isn’t always easy to admit, luck plays a role as well.”
As he usually is, Edwards is right. It’s not easy to admit when luck plays a part in the game, largely because it’s just about impossible to say for sure how big that part really is. All we can do is go case by case and use our best judgment to balance the big-picture facts with interpreting the individual components of any given player’s season. Sometimes the two line up, sometimes they don’t. As we love to say, baseball is a game of tiny margins, and many of those margins are gained by having a discerning eye between the exception and the rule. That’s all we’re trying to do here. Regardless, it’ll be exciting to see which way the pendulum swings for these five come next season.
(Photo by Frank Jansky/Icon Sportswire)