We found a really fun dataset to play around with: every Major League Baseball pitch thrown so far in 2014, as tracked by the PITCHf/x system. It includes everything you’d want to know about a pitch: its speed, the type of pitch, how much it broke on each spatial axis, how many men were on base at the time, who threw it, and much more.
We’ve grabbed all the pre-All-Star break data from 2014* and popped it into Statwing.
*A few very small notes about the dataset:
- Okay, it’s not technically every pitch in 2014, since (1) some pitchers changed teams, which for esoteric reasons made some of the data work unnecessarily difficult, so we threw out a very small number of data points, and (2) a few percent of pitches don’t appear to have been tracked by the cameras, so for those only game context like outs and count is available, not pitch details like speed.
- Here’s a good explanation of break distance, in case you look at the dataset and wonder what it means.
- The pitch types are estimated by an algorithm based on trajectory and speed, so the classification isn’t perfect. More on this.