Going Deep: Interpreting BABIP and Players with Positive Regression on the Horizon

One of the best parts of working at Pitcher List is that 9.9 times out of 10, if you have something you want to write about, the answer is going to be yes. It’s such an empowering way to get my first taste of writing about fantasy baseball, and I really try to take advantage of that. I have written about everything from player deep dives to the weather to Hall of Fame arguments to rewriting the rules on how we think about baseball. Usually, these ideas come to me through observation and, oftentimes, random curiosity.

It still amazes me that sometimes the most interesting and well-received things I have written so far have been born from a random tweet or a throwaway line on a broadcast. Nowhere is this more apparent than in my Weekly Musings on Baseball column. At the end of every week, I present a collection of my thoughts on a variety of baseball-related ideas with a healthy mix of fantasy and real-life topics, and it is a ton of fun to write. This past week I wrote about how I approach BABIP this early in the season, and I received a ton of requests to expand on the topic and provide more player examples. Thus, behold before you, my first reader-requested piece!

I’m really excited to dive further into the topic and hopefully both give solid advice about specific players and provide you the tools for accurately interpreting BABIP on your own (give a man a fish, he eats today. Teach a man to fish blah, blah, blah). This is something I’m really big on. When I started making a leap in terms of my skill and understanding of fantasy baseball, many fantastic writers like Eno Sarris, Keith Law and Mike Podhorzer didn’t just give me fantasy advice but also provided me with the knowledge to start developing my own fantasy opinions. I hope I am able to do the same.

OK, on to BABIP itself. Let’s start by defining the term. If you read most fantasy baseball sites or listen to many fantasy baseball podcasts, you’ll hear the term thrown around pretty much willy-nilly without any real context or explanation. BABIP stands for batting average on balls in play: how often a ball a hitter puts in play (home runs and strikeouts excluded) falls for a hit. One really important thing to remember about BABIP is that it is, at its core, an index statistic. It’s really useful for providing a big-picture view of how lucky a player is getting based on their batted-ball data, but because it is a combination of multiple statistics, it doesn’t tell you specifically what is causing a deviation from the player’s normal BABIP. Finding that a player’s BABIP radically deviates from either their norm or the league average can be really useful information, but it doesn’t tell you WHY the player’s BABIP has strayed from those numbers.
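
The article never writes the formula out, so as a reference point here is a minimal sketch using the commonly cited definition, BABIP = (H - HR) / (AB - K - HR + SF). The function name and the sample stat line are illustrative assumptions, not numbers for any player discussed here.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play yet, so BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical early-season line: 30 H, 5 HR, 100 AB, 20 K, 0 SF.
# That is 25 non-homer hits on 75 balls in play.
example = babip(hits=30, home_runs=5, at_bats=100, strikeouts=20)
print(f"{example:.3f}")
```

Note how few balls in play are left once strikeouts and home runs come out of a 100 at-bat sample; that shrinking denominator is a big part of why April BABIPs swing so wildly.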

Our goal is to try to figure out why a player’s BABIP might be different, what can cause that to happen and what conclusions we can start to draw about what may happen in the future. We need to define three terms before we jump into the heart of this article: regression, the law of large numbers and Bayesian inference. We often view regression only in a negative context, but it’s really important to understand that a low number (in this case BABIP) can regress positively to a player’s normal BABIP. Regression cares not for your starting point; it just indicates the possibility that said starting point will move toward the player’s norm. The law of large numbers essentially states that if you were to simulate a player’s season multiple times from whatever point you choose, the more times you perform the simulation, the more closely the average result will resemble the player’s average season.
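
To make the law of large numbers concrete, here is a rough simulation sketch. It assumes every ball in play is an independent coin flip with a fixed "true" BABIP of .300, which is a simplification of real baseball, and the function names, trial count and seed are all hypothetical choices for illustration.

```python
import random
import statistics


def simulate_babip(true_babip, balls_in_play, rng):
    """Simulate one stretch of balls in play; each is a hit with probability true_babip."""
    hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_babip)
    return hits / balls_in_play


def spread_of_samples(true_babip, balls_in_play, trials=1000, seed=0):
    """Return (mean, standard deviation) of observed BABIP across many simulated stretches."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    samples = [simulate_babip(true_babip, balls_in_play, rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.pstdev(samples)


# Assumed true talent of .300; compare an April-sized sample with a season-sized one.
for bip in (50, 500):
    mean, spread = spread_of_samples(0.300, bip)
    print(f"{bip:>3} balls in play: mean {mean:.3f}, spread {spread:.3f}")
```

The averages land near .300 in both cases, but the spread around that average is several times wider at 50 balls in play than at 500, which is the whole reason an ugly April BABIP is weak evidence on its own.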

It is essential to keep these ideas in mind as you take a look at BABIP this early in the season. You can’t dictate what will happen for the rest of the season. Injuries, weather, when that player woke up that day, the opponent’s defensive ability and thousands of other variables all affect how well a player’s season will go, but by using BABIP plus the law of large numbers to project how a player’s season SHOULD go, we can start to draw conclusions on how to handle a player from here on out. Bayesian inference dictates that we adjust our expected results based on new or different information. We’ll dive into this more in a minute, but it is essential to get a feel for when one set of data ends and a new one begins based on a change in the player.

With all of this in mind, all BABIP regression falls into three categories: positive regression, neutral regression, and negative regression. There is plenty of nuance within these categories, and this is where Bayes’ theorem comes into play. It’s incredibly useful to regress a player’s BABIP to their career norm, but if the player has made a change in his approach, batted-ball data, swing or batting stance, you have to take that into consideration and alter your projected final results appropriately. This brings up a quintessential difficulty in interpreting BABIP. This is not about guaranteeing what will happen; this is about trying to determine what SHOULD happen.
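
One simple way to picture "regress to the norm, then adjust for new information" is a shrinkage estimate that blends the observed results with a baseline. This is a stand-in technique chosen for illustration, not the method used in this article; `prior_weight`, the hit counts and both baselines below are hypothetical.

```python
def regressed_babip(hits_in_play, balls_in_play, baseline, prior_weight=400):
    """Blend observed BABIP with a baseline.

    prior_weight is a hypothetical tuning value: the number of balls in play
    at which the observed data and the baseline count equally.
    """
    if balls_in_play < 0 or prior_weight <= 0:
        raise ValueError("balls_in_play must be >= 0 and prior_weight must be > 0")
    return (hits_in_play + prior_weight * baseline) / (balls_in_play + prior_weight)


# Hypothetical cold start: 11 hits on 50 balls in play (.220) against a .310 baseline.
early = regressed_babip(11, 50, baseline=0.310)

# Same results, but suppose new information (say, a swing change) moves the baseline to .280.
updated = regressed_babip(11, 50, baseline=0.280)

print(f"estimate with old baseline: {early:.3f}")
print(f"estimate with new baseline: {updated:.3f}")
```

With only 50 balls in play, the estimate barely moves off the baseline, so the choice of baseline matters far more than the cold streak itself. That is why a real change in the player deserves more attention than the raw number.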

It’s like predicting the weather, but baseball players are human beings and so even less predictable. Sometimes a player figures out something in the swing that was out of whack, and once they fix it, they break out in a big way (think J.D. Martinez or Francisco Lindor hitting the ball in the air way more and pulling the ball more). Sometimes an aging player falls off a cliff (think an older player suddenly hitting way more ground balls or a spike in infield fly balls). This is the key. New information must inform your conclusions. With this in mind, let’s dive into the three major categories of BABIP regression and see if we can’t identify a few players who fall into each category.

Positive Regression

No matter which category you suspect a specific player falls into, the first thing you have to do is establish a norm for the player, keeping in mind a .300 BABIP is roughly league average. It’s important to remember that if you see a specific range (within reason) that comes to represent a player’s BABIP over at least a three-year period, it’s fair to assume that the player’s BABIP is skill-based rather than luck-based. If I have a player who does not have a three-year BABIP history (think rookies off to a hot start or a breakout player) or a player who has had a wide variety of BABIPs over the years, I usually try to compare FanGraphs’ Depth Charts projections (since it is a combo of Steamer and ZiPS) to a player’s BABIP. Once I have an established baseline, I subtract that from their current 2019 BABIP. If it departs radically in either direction from their norm, then it likely warrants further investigation. That’s the key part. You have to dig deeper. You can’t just throw out, “Well, their BABIP is well off their normal BABIP, so clearly there is regression coming!” If you stop there, you could easily be missing something. In this section, I’ll talk about what I do when I see a BABIP radically lower than their norm and then walk through a few examples. Here’s what I look for in that situation:

1. First thing I do is look for an injury. I’ll start with a quick Google search and then do the same on Twitter. If I don’t find anything, then I’ll move on to scanning recent game recaps from the team’s beat writers. Usually, if a player is playing through an injury or is banged up, they are the folks who will know. If they seem pretty responsive on Twitter, it might be worth reaching out to them there as well, but don’t count on a response, as they get a lot of things to reply to on any given day. Still, I’ve had some success with it. Finally, I’ll go and check out the player’s Statcast data. If I see anything like a rapid dip in exit velocity or sprint speed, it could also be indicative of an injury.
2. Next, I’ll look at the player’s batted-ball data, specifically line-drive rate, fly-ball rate, ground-ball rate, pull rate, center rate, and opposite-field rate. If I see any deviation from the player’s career norms in those stats, I begin to wonder if the player is making a change in approach or trying something different that might take some time to figure out. Note this isn’t always positive.
3. If I haven’t found anything yet, I check a player’s Statcast data over at Baseball Savant, specifically their xBA, xwOBA, and xSLG. These numbers aren’t perfect or predictive, but they can give you a quick look at whether the batted-ball data says they have been getting lucky. If there is a massive difference between his xBA and xSLG and his actual average and slugging percentage, it could indicate the player is getting unlucky. It’s also worth seeing how high a player’s exit velocity (EV), launch angle (LA), barrel percent (BBL%), and hard-hit rate (HH%) are. This can tell you if the player is hitting the ball well but it just isn’t falling at the moment. Opponents’ defensive skill and positioning can play a huge part in this as well. I also check his heat map over at FanGraphs to see if pitchers are throwing to him differently and whether he is getting more of a specific pitch type that he has historically struggled with.
4. It’s also worth looking at the quality of his opponents over that time period as well as the weather. If the player has played in a lot of cold-weather games, that can have a huge effect. An easy way to see this is to check if the player is typically a slow starter. Also, if he got caught in a run of elite pitchers, defenses or bullpens, it could help explain the outlier as well.

If I get through these steps and don’t find any real deviation from the norm in any of these numbers, then my usual method is to wait for more data. It sounds like a copout, and to a certain degree, it is. Sometimes you just need to see more. It’s possible it could just be noisy data, a cold streak or simply bad luck. It is also incredibly important to understand that there is a decent amount of nuance at work. There is a big difference between a player I expect radical positive regression from and a player I expect to rebound but whose stats let me know it won’t be a full rebound. I will try to indicate that nuance whenever possible.
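
The first move in this process, subtracting the baseline from the current BABIP and deciding whether the gap deserves a closer look, can be sketched as below. The BABIP pairs are the current and Depth Charts figures quoted in the player write-ups in this article; the 40-point `threshold` and the function names are assumptions made for illustration, since the article only says the gap should be "radical."

```python
def babip_gap(current, baseline):
    """Current BABIP minus the established baseline (negative means below the norm)."""
    return current - baseline


def triage(current, baseline, threshold=0.040):
    """Label a gap; threshold is an assumed cutoff, not a value from the article."""
    gap = babip_gap(current, baseline)
    if gap <= -threshold:
        return "well below baseline: dig deeper for positive regression"
    if gap >= threshold:
        return "well above baseline: dig deeper for negative regression or a new norm"
    return "near baseline: nothing to chase yet"


# (current BABIP, Depth Charts BABIP) pairs quoted in this article's player write-ups.
players = {
    "Jose Abreu": (0.220, 0.303),
    "Anthony Rizzo": (0.170, 0.281),
    "Niko Goodrum": (0.350, 0.307),
}

for name, (current, baseline) in players.items():
    gap_points = round(babip_gap(current, baseline) * 1000)
    print(f"{name}: {gap_points:+d} points, {triage(current, baseline)}")
```

A label like this only tells you where to start digging. The injury, batted-ball, Statcast and schedule checks are what decide whether the gap is luck or a real change.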

Here’s an example of my methodology at work:

Jose Abreu, 1B, White Sox

Normal BABIP – Somewhere between .300 and .320

Depth Charts BABIP – .303

Current BABIP – .220

1. There appears to be no injury on record. Admittedly, I mostly conducted a Google search, but I also checked Scott Merkin’s Twitter (he’s the White Sox beat writer), and there is no mention of an injury or anything nagging at Jose Abreu.
2. The first thing I see is Abreu’s line-drive percentage, which has dropped below 20.0% to 17.4%, while his ground-ball rate dropped nearly a full percentage point and his fly-ball percentage went up nearly five points. Right here, this could explain a decent chunk of his BABIP drop, as fly balls will often suppress BABIP, while ground balls will prop it up. Ideally, we want line drives, as those both stabilize BABIP and are the ideal hits. More on this when we get to the Statcast data. In addition, Abreu’s pull rate is nearly eight percentage points higher than in 2018. I don’t think this is a change in approach, but we should watch this trend to see if it continues. The AL Central is pretty shift-heavy, so I could also see an increased pull rate holding him back a bit. It would stand to reason that his batted-ball profile is affecting his output so far, especially when you consider that he has reduced his infield fly-ball percentage as well. It is also worth noting that Abreu’s K rate has risen over six percentage points this season. Considering that he has actually dropped his O-Swing rate by over five percentage points while holding steady in terms of contact and swinging-strike percentages, that seems more like an outlier than a trend. Either way, let’s keep an eye on that as well.
3. Abreu’s exit velocity holds steady with past years, but his launch angle, BBL% and hard-hit percentage really stand out. So far, Abreu has put together an astonishing 20.0 BBL%, which ranks 15th in MLB. He has improved his HH% by 6.9 percentage points to 52.8%, while his LA has increased 4.6 degrees. To be hitting the ball that hard and with that launch angle and only have three home runs and a .189 batting average to show for it screams that something hinky is going on here. In fact, his xStats data backs that up. His xBA says he should be hitting closer to .244 with over 130 more points of slugging and a wOBA closer to .360, which is well above average. With better batted-ball luck, we go from a .639 OPS hitter to someone who has hit the ball more like a hitter with an .849 OPS, and that’s a hitter well worth rostering in all formats. Definitely some bad luck in play so far.
4. It’s Chicago in April. It is obscenely cold there on Lake Michigan during this time of the year, and while we can’t fully quantify the effects the weather has had on Abreu, it’s unlikely it hasn’t affected the Cuban so far this season. For his career, Abreu is a pretty slow starter, hitting nearly 34 points lower in average in the first half compared to the second half. In fact, here are his batting averages and BABIPs from March 28 to April 20 over the last three years:

Year  AVG   BABIP
2018  .273  .280
2017  .200  .268
2016  .204  .222

It certainly seems like early-season slumps are par for the course for Abreu until things warm up in the Midwest. Historical precedent absolutely counts for something here.
Three of the teams Abreu has faced so far are the Indians, Rays, and Yankees, whose pitching staffs rank as follows:

Team     K%   ERA  AVG   HR
Indians  1st  4th  4th   1st
Rays     2nd  2nd  3rd   5th
Yankees  5th  9th  15th  7th

So it seems reasonable to assume that part of Abreu’s struggles has also been a product of facing three of the best pitching staffs in baseball so far. This early in the season, that can have a huge effect on such a small sample size.

In conclusion, I expect a huge rebound ROS from Abreu across all categories, especially if he continues to barrel the ball as well as he has so far in 2019. As the weather warms up, he starts to face weaker pitching and he gets better batted-ball luck, I expect we’ll see that BABIP skyrocket back up to his career norms, and we should see the Abreu we know and love.

Recommendation: BUY while you can! If you have Abreu, hang tight; he should be back to his old self soon.
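
The "points" comparisons in the Abreu write-up are just differences in thousandths between an expected stat and the actual one. A small sketch of that arithmetic, using only figures quoted above (the helper name `points` is a made-up convenience):

```python
def points(actual, expected):
    """Gap between an expected and an actual rate stat, in 'points' (thousandths)."""
    return round((expected - actual) * 1000)


# Figures quoted in the Abreu write-up above.
abreu_avg, abreu_xba = 0.189, 0.244
abreu_ops, abreu_expected_ops = 0.639, 0.849

print(f"xBA minus AVG: {points(abreu_avg, abreu_xba):+d} points")
print(f"expected OPS minus OPS: {points(abreu_ops, abreu_expected_ops):+d} points")
```

That works out to 55 points of batting average and 210 points of OPS, which puts a number on how much of the slow start the batted-ball data attributes to luck.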

Anthony Rizzo, 1B, Cubs

Normal BABIP – Somewhere between .270 and .300

Depth Charts BABIP – .281

Current BABIP – .170
1. I don’t see any articles or news citing an injury, and after thoroughly scanning Cubs beat writer Carrie Muskat’s Twitter, I did not find any report of an injury there either.
2. Much like Abreu, Anthony Rizzo’s LD% is way down from his norm and his FB% is up. He is also hitting way more infield fly balls than last year. Interestingly, he is going the other way when he makes contact, with an Oppo% about 10 percentage points higher than his career norm. It will be interesting to see if that continues all season or is just small-sample noise. It’s also interesting because going the other way typically tends to create more base hits and limit fly balls, yet he is hitting more fly balls so far this season and hitting for an abysmal .169 average. It is also worth noting that his BB% and K% are both WAY up from last year without any significant change in his plate discipline stats. That could be having an effect as well, since neither of those events counts as a ball in play, so they can warp results to a certain degree.
3. Rizzo’s EV and LA are pretty much right in line with his career numbers, but his BBL% is up from last year and his HH% is way, way down. Other than that last number, this seems like business as usual for Rizzo. xStats say that he should actually be hitting around .212, which doesn’t sound great, but it’s over 40 points higher than his .169, so luck is definitely having a hand in his results so far. They also say he is missing out on just over 50 points of slugging and 34 points of wOBA. That would put Rizzo back up closer to an .800 OPS hitter, which makes his early-season struggles way more palatable.
4. Another Chicago-in-April player; anyone who follows the Cubs knows that on any given day the weather can have a huge effect on the game. Is the wind blowing in or out? Is it cold outside? Has the lake jacked up the humidity? These all can mess with a Chicago player. The Cubs have only played six home games, but that’s enough for the weather to potentially skew things. Also, playing the majority of your games on the road can have a negative effect on any player’s numbers. Just like Abreu, Rizzo can be a bit of a slow starter as well, if you remember his awful April last year. The moment the calendar hit May and things warmed up, he started mashing the ball like good ole dependable Rizzo. Here are Rizzo’s numbers over the last three years for March 28 through April 20:
  
  Year  AVG   BABIP
  2018  .150  .161
  2017  .293  .333
  2016  .189  .154
  
  Outside of 2017, when he got off to a hot start, his 2016 and 2018 starts look exactly like his 2019 season so far. In both of those seasons, Rizzo went on to be a fantasy stud. Especially since there’s nothing in his other numbers to indicate that anything has changed or is wrong, I would expect nothing different this year.
Recommendation: BUY as quickly as you can! If you already own him, hold on to him; he’s going to be just fine.
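
The slow-starter argument for Abreu and Rizzo can be checked with a few lines of arithmetic: average each player’s early-season BABIPs from the tables above and compare that with his current 2019 BABIP. The plain three-year average and the helper name are assumptions made for illustration; three Aprils is a tiny sample, so treat this as a sanity check and nothing more.

```python
import statistics

# Early-season (March 28 to April 20) BABIPs from the tables in this article.
early_babip = {
    "Jose Abreu": {2018: 0.280, 2017: 0.268, 2016: 0.222},
    "Anthony Rizzo": {2018: 0.161, 2017: 0.333, 2016: 0.154},
}
# Current 2019 BABIPs quoted in each player's write-up.
current_babip = {"Jose Abreu": 0.220, "Anthony Rizzo": 0.170}


def early_season_norm(by_year):
    """Plain average of a player's early-season BABIPs (three seasons is a tiny sample)."""
    return statistics.mean(by_year.values())


for name, by_year in early_babip.items():
    norm = early_season_norm(by_year)
    gap = round((current_babip[name] - norm) * 1000)
    print(f"{name}: early-season norm {norm:.3f}, 2019 is {gap:+d} points away")
```

Measured against their Depth Charts BABIPs, Abreu and Rizzo are 83 and 111 points low; measured against their own April histories, the gaps shrink to roughly 37 and 46 points. That is what "par for the course" looks like in numbers.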

Niko Goodrum, UTIL, Tigers

Normal BABIP – This is harder to determine since he only has one full season under his belt, but in 2018 he had a BABIP of .312

Depth Charts BABIP – .307

Current BABIP – .350

The first two hitters I went over are pretty straightforward examples, but before we wrap things up I wanted to give an example of how BABIP can sometimes hide potential positive regression and disguise a young player growing as a hitter. Niko Goodrum is having himself a season so far, and as our own Nick Gerli pointed out, the breakout looks legit. With that being said, it would be hard to look at Goodrum’s BABIP and not expect some heavy regression to come his way. Here’s the thing: That BABIP might actually represent a new baseline, or at least something close to one. One of the hardest things to do when interpreting BABIP is deciding what to do about a young player who appears to be making the leap. So let’s walk through the steps and see if we think Goodrum’s season so far represents good fortune or potentially the new norm for a budding star.
1. No injuries that I can find. It’d be weird if he were hurt and breaking out, but if he were dealing with a nagging injury and still doing this, I would absolutely lean more toward determining that it was luck-based. Since that isn’t the case, we can rule that out.
2. Goodrum’s batted-ball data is nuts. He’s hitting roughly 40.0% line drives. That’s obviously unsustainable, but it gives a great indication of his approach, and when you combine that with a 52.0% pull rate, it’s easy to see why he’s absolutely crushing the ball and why his BABIP is so high. Given that last year he put up a 22.0 LD%, I don’t see why he couldn’t keep at least two-thirds of that as the season wears on. His BB% is way up and his K% is down, which indicates a hitter seeing the ball better than ever, and improvements in his O-Swing% and SwStr% could indicate that a big chunk of those improvements might be here to stay.
3. Here’s where things get real. So far Goodrum has improved his BBL% to over 14.0% and has improved his EV and LA while adding nearly 20.0% to his HH%. All of that supports both a large chunk of the BABIP growth and the results he has gotten so far. The xStats are where things get really intriguing. They actually say he’s gotten UNLUCKY so far. In 2019 Goodrum has a .281/.388/.897 line with a .383 wOBA. His xStats indicate he really should be hitting for a .320 average with a .625 slugging percentage and a .442 wOBA! That’s nuts. It’s key to understand that I don’t think this means that Goodrum is going to go all Mookie Betts on us, but I do think it indicates that both Goodrum’s new BABIP and his numbers might be here to stay. Like I said before, you wouldn’t be faulted if you saw the BABIP leap and immediately dismissed what Goodrum has done so far, but the deeper you dive in, the more it looks like Goodrum might be establishing a new baseline BABIP for us to judge him by. His Statcast and xStats data are things we should keep an eye on as we see fluctuations in his BABIP so we can get a sense of just where that new BABIP line ends up.
4. Despite Detroit being pretty cold this time of year, it doesn’t seem to have affected Goodrum at all, so I don’t think we have to take it into consideration too much. While Goodrum has certainly faced some weak opponents such as the Blue Jays and Royals, he has also faced the Yankees and Indians, so he’s seen both the best and worst so far and found success against both.
Recommendation: Unless you get an offer you can’t refuse (perhaps one of the elite guys we talked about earlier), I would say hold on to Goodrum, as he absolutely looks like the real deal so far through the 2019 season. I’m sure he will slump at some point as all players do, but his new high BABIP may be an indicator of the new normal for Goodrum, and his skills so far back that up. He’s earned his success, and that’s really exciting.

So that’s how we take a look at players whose BABIPs indicate they are in for positive regression. I hope this article gives you a line of reasoning for looking at your own players and interpreting their data for yourselves. I apologize if I didn’t take a look at a specific player you may be curious about. If you would like some specific interpretation on a player, please feel free to name one in the comments or reach out to me on Twitter @DanielJPort and I will try to give my take on as many as I can! In the meantime, I’ll be back next week to take a look at players with neutral BABIPs and hitters who should expect negative regression based on their BABIPs!

(Photo by Quinn Harris/Icon Sportswire)

7 responses to “Going Deep: Interpreting BABIP and Players with Positive Regression on the Horizon”

Dave Cherman says:

April 20, 2019 at 4:37 PM

https://www.reddit.com/r/fantasybaseball/comments/87v9va/babip_a_primer_for_hitters/
This statistical study is very helpful when looking at BABIP to note the specific batted ball factors and their relationship with BABIP.

Mike P says:

April 20, 2019 at 11:09 PM

Any insight on Kris Bryant? Do you see him turning it around? Or did I blow my second-round pick on an average to below-average guy?

- Daniel Port says:
  
  April 22, 2019 at 2:02 PM
  
  I definitely have my concerns on Bryant. After looking at his numbers though I’m starting to suspect a large part of his down numbers might be due to an approach change thanks to the shift.
  
  1. I don’t see any reports that Bryant is hurt or feeling any lingering effects from his shoulder injury but I’m skeptical that he is 100%. Could be lingering weakness from the injury but there are some worrying signs.
  
  2. His LD% is down 5.3% from last year, but it’s actually now more in line with his previous seasons, so that’s not that problematic. What is troubling is the drop in his FB% to a career-low 36.4%. It’s not his worst number for the month of April (that was last year’s 31.8%), so it’s getting better. This is the number I’d watch the most as we head into May; if we see that number get back up above 40.0%, then I think he’ll be fine.
  It’s when we get to his spray chart, though, that this all starts to make more sense to me. Traditionally Bryant is known as a heavy pull hitter, and while he is still pulling the ball a ton this year, we’ve actually seen his Oppo% go up 12.7% to 32.7%, which seems like a more likely explanation for the drop in his FB%, especially if he is deliberately trying to go the other way. Why would he do that? I haven’t watched a ton of Cubs games yet this season so I can’t say for sure, but my suspicion is he’s trying to beat the shift. According to his FanGraphs splits, Bryant has faced the shift on 49 of his 87 plate appearances. Balls hit to the opposite field tend to be hit less hard and on the ground, so it might not be that worrying after all. This is the other thing I would keep an eye on, but there’s definitely some reason for optimism if it’s true. I’ll try to watch some of his at-bats once I get some time and see if I can suss out if that’s what he’s trying to do.
  
  3. If he is trying to go the other way more often that would also help explain the drop in most of his Statcast numbers. Other than his EV, almost all of his numbers are down. Unfortunately, his xStats back up his results so far but he might still be trying to figure out the new approach. Hopefully it’ll fall into place for him soon.
  4. Just like with Rizzo, Bryant has played in a ton of cold weather so far, so there’s reason to suspect that it has had some effect on his play; it’s just hard to tell how much.
  
  Recommendation: Hold for now. Keep an eye on his FB% and Oppo% for any changes. Once we get to the middle of May, I think we’ll have a much surer idea of what the rest of his season will look like. I’ll try to watch some of his at-bats either today or tomorrow and see if I can get a better sense of whether it is a deliberate change to his approach or just random early-season noise.
  
Luke says:

April 21, 2019 at 9:53 AM

Any comments on Winker and Puig? Both have relatively low BABIP.

- Daniel Port says:
  
  April 22, 2019 at 3:39 PM
  
  WINKER
  
  1. I don’t see any reported injuries on Winker anywhere; in fact, I’m seeing a lot of reports that his shoulder feels great. There could be lingering weakness in his shoulder, but there’s not a great way to know if that’s the case.
  2. Most of Winker’s batted-ball data actually lines up with his historical norms. Much like Kris Bryant, though, we are seeing an uptick in Winker’s Oppo%, with it climbing all the way up to 32.7%. Just like with Bryant, this would make a lot of sense if he is trying to beat the shift, since he has faced the shift on 32 of his 64 PAs so far this season. Here’s the thing: when he has gone oppo so far this season, he’s absolutely crushed the ball. He’s batting .353 when he hits the ball the other way, including five home runs to the opposite field, while he currently has a 77.0 GB% when he pulls the ball, which plays right into the heart of the shift. If he continues to find that much success when he goes the other way, teams won’t shift him for very long, which should open the field back up for Winker. Normally when he pulls the ball he has hit for closer to a 63.0 GB%, which is less than ideal but certainly shows there is room for improvement. One last note on his batted-ball data: he has an astronomical HR/FB% right now at 31.6%. It’s good to remember that a HR doesn’t count as a ball in play and so can absolutely affect his BABIP. Winker tends to hit doubles at a rate of roughly one per every 17 ABs, which is illuminating since he hasn’t hit one yet. Given that he has 64 ABs on the season, he should have hit roughly 3 to 4 more doubles than he has. All of his home runs that I have distance data for traveled 375 feet or less, which makes me think a lot of these home runs could have been doubles instead, which would have definitely boosted his BABIP. He also has been walking less without any real change in his plate discipline metrics, so his walk rate should go back up, which means fewer balls in play as well.
  
  3. The Statcast data is very encouraging. BBL% and EV are both up, while his LA is right around the same as last year. xBA says he should be hitting closer to .259 with a .342 wOBA, which definitely shows he’s been getting unlucky so far this season, and his BABIP backs that up as well.
  
  4. It’s been super cold in Cincy (and most of the rest of the NL Central), so I wouldn’t be shocked if this has had an effect on his performance as well.
  
  I’m actually really encouraged by these numbers. I am still all aboard the Winker train, and these numbers show we should be patient. Keep an eye on his Oppo% and see whether or not he continues to have the same level of success and whether or not teams keep shifting on him, but for now I recommend holding on to Winker, and he’s a great buy-low candidate as well. I’m running to lunch now but I’ll get the Puig take up later today!
  
TheKraken says:

April 21, 2019 at 11:12 AM

I just assume that this year should be more like an entire career than not. No need to make a process of it.

Dr. Tobias Funke says:

April 21, 2019 at 10:01 PM

I love articles like this that help us understand the process behind interpreting data. Great work, Daniel. Thanks so much.


Daniel Port
