In this golden age of baseball analytics, the data resources available to us for our analyses are anything but lacking. But what if we wanted to focus on a more obscure test that would require raw, unfiltered data?
Our focus today will be on generating that data with a simple Python script and saving it to our machine. We want to emphasize quick access, so we’ll limit each script execution to a single player and year. The resulting file will contain the Statcast values of every pitch thrown by a pitcher, or faced by a batter, in a season. Now that the plan is set, let’s jump in.
Before we start, make sure the requests package is installed (see https://docs.python-requests.org/en/latest/user/install/#install for instructions).
The first thing we’ll want to do is set the parameters for our importer. Since we’ll be fetching a batter or pitcher’s yearly data, we’ll need a player name and a year.
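As a rough sketch of what that could look like, here is one way to read those parameters with argparse. The --year, --batter and --pitcher flags come from the run examples further down; the parse_args name and the choice to make the two player flags mutually exclusive are assumptions on my part.

```python
import argparse


def parse_args(argv=None):
    """Read the season and the player name (batter or pitcher) from the command line."""
    parser = argparse.ArgumentParser(
        description="Download one player's Statcast data for one season."
    )
    parser.add_argument("--year", type=int, required=True, help="season, e.g. 2021")
    # Assumption: exactly one of --batter / --pitcher is given per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--batter", help='batter name, e.g. "mookie betts"')
    group.add_argument("--pitcher", help='pitcher name, e.g. "max scherzer"')
    args = parser.parse_args(argv)
    # Derive the two values the later steps need.
    args.player_type = "batter" if args.batter is not None else "pitcher"
    args.player_name = args.batter if args.batter is not None else args.pitcher
    return args
```

Passing argv=None makes argparse read the real command line, while passing a list makes the function easy to try out from an interactive session.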
Next, we’ll define a function that asks us to pick from a list of players whose names match the one we provided. We’re doing it this way because BaseballSavant relies on a player’s unique numeric ID (assigned by Major League Baseball) to generate the data. Just like website domain names and their associated IP addresses, numbers are much harder to keep track of than words, in this case the player’s name.
It’s important to note that using common names will result in a bigger list of matches, making the player selection longer if the one we’re looking for is at the bottom of that list. Instead of using the name Jones, opt for something more precise like Adam Jones.
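To give an idea of the selection step, here is a minimal sketch. It assumes the list of matches has already been fetched and is a list of (name, ID) pairs; the choose_player name and the ask parameter are hypothetical, and the name lookup itself is left out.

```python
def choose_player(matches, ask=input):
    """Let the user pick one player from a list of (name, mlb_id) pairs.

    Assumption: `matches` was already built by a name search done elsewhere.
    """
    if not matches:
        raise ValueError("no player matched that name")
    if len(matches) == 1:
        # Nothing to choose from, so skip the prompt.
        return matches[0]
    for number, (name, mlb_id) in enumerate(matches, start=1):
        print(f"{number}. {name} (ID {mlb_id})")
    while True:
        answer = ask("Select a player by number: ").strip()
        try:
            index = int(answer)
        except ValueError:
            index = 0  # not a number, so ask again
        if 1 <= index <= len(matches):
            return matches[index - 1]
        print("Please enter one of the numbers listed above.")
```

The longer the list of matches, the more scrolling and typing this prompt takes, which is why a precise name pays off.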
And now for the most important step: data retrieval. Other than the hfSea, player_id and player_type parameters, we can set our own constraints directly in the code. For example, if we only want data for a pitcher who threw at least 100 pitches, set min_pitches to 100 instead of 0. Or if we only want regular season data, set hfGT to R| (NOTE: the value associated with hfGT must end with |).
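Here is a hedged sketch of how the request could be put together. The parameter names are the ones mentioned above; the endpoint URL, the trailing | on the hfSea value and both function names are assumptions, and the real request may need additional parameters, so check it against the BaseballSavant search page before relying on it.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Statcast Search CSV export; verify before use.
SAVANT_CSV_URL = "https://baseballsavant.mlb.com/statcast_search/csv"


def build_query_url(year, player_id, player_type, min_pitches=0, game_types=""):
    """Return the request URL for one player's pitch-level data in one season."""
    params = {
        "hfSea": f"{year}|",         # season; assumed to need a trailing | like hfGT
        "player_id": player_id,      # MLB's numeric ID for the player
        "player_type": player_type,  # "batter" or "pitcher"
        "min_pitches": min_pitches,  # 0 means no minimum
        "hfGT": game_types,          # e.g. "R|" for regular season only
    }
    # urlencode escapes the | characters (as %7C) for us.
    return f"{SAVANT_CSV_URL}?{urlencode(params)}"


def fetch_statcast_csv(url, timeout=60):
    """Download the CSV text. Needs the requests package and network access."""
    import requests  # imported here so the URL builder works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on an HTTP error
    return response.text
```

Keeping the URL builder separate from the download makes it easy to print the URL and open it in a browser when a query returns something unexpected.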
Once that data is received, we’ll want to save it to a csv file. The file we’ll write to will be named after the player and saved in the same directory as the Python file. However, if we want to save it elsewhere or use a different name, simply modify the player_name, filename and/or filepath variables.
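A minimal sketch of the saving step might look like this. The player_name, filename and filepath variables are the ones named above; the save_csv name, the optional directory argument and the lowercase-with-underscores file name are assumptions for illustration.

```python
import os


def save_csv(csv_text, player_name, directory=None):
    """Write the CSV text to a file named after the player and return its path."""
    if directory is None:
        # Default: the folder that contains this Python file.
        directory = os.path.dirname(os.path.abspath(__file__))
    # Assumed naming scheme: "Mookie Betts" becomes "mookie_betts.csv".
    filename = player_name.lower().replace(" ", "_") + ".csv"
    filepath = os.path.join(directory, filename)
    # newline="" keeps the line endings exactly as they were received.
    with open(filepath, "w", encoding="utf-8", newline="") as csv_file:
        csv_file.write(csv_text)
    return filepath
```

Returning the path lets the script print where the file ended up once the download is done.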
Finally, we’ll combine all of those functions into one file. For the sake of not overpopulating this page with code, I’ll only use the names of the functions surrounded by square brackets. Make sure to replace them with the actual function code before running the script.
Save that file and voilà, we’re all set!
Here are a few examples of how to run the importer (make sure to replace [file] with the name of the Python file):
- Mookie Betts batting data for 2021:
python3 [file] --year 2021 --batter "mookie betts"
- Max Scherzer pitching data for 2019:
python3 [file] --year 2019 --pitcher "max scherzer"
- Madison Bumgarner batting data for 2018:
python3 [file] --year 2018 --batter "madison bumgarner"
The data we’re pulling comes from BaseballSavant, so always remember to credit them accordingly. For the full documentation of what each column in the csv file represents, refer to https://baseballsavant.mlb.com/csv-docs.
Happy hacking to all you number junkies out there!
Featured Image by Justin Paradis (@JustParaDesigns on Twitter)