Every MLB pitch thrown in 2014

We found a really fun dataset to play around with: every Major League Baseball pitch thrown so far in 2014, as tracked by the PITCHf/x system. It includes everything you’d want to know about a pitch: its speed, the type of pitch, how much it broke along each spatial axis, how many men were on base at the time, who threw it, and much more.

We’ve grabbed all the pre-All Star break data from 2014* and popped it into Statwing.

Check it out.


Hat tip to Jeff Zimmerman of Baseball Heat Maps for organizing the data neatly into a convenient PITCHf/x download (MySQL). Here’s the heavily transformed version we ended up dropping into Statwing.

*A few very small notes about the dataset:

  • Okay, it’s not technically every pitch in 2014: (1) some pitchers changed teams, which for esoteric reasons made some of the data work unnecessarily difficult, so we threw out a very small number of data points; and (2) a few percent of pitches don’t appear to have been tracked by the cameras, so for those only game context like outs and count is available, not pitch details like speed.
  • Here’s a good explanation of break distance, if you looked at the dataset and were wondering.
  • The pitch types are estimated by algorithm based on trajectory and speed. It’s not a perfect classification. More on this.
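If you’d rather poke at the raw numbers outside Statwing, here’s a minimal Python sketch of one summary the dataset supports: average speed per pitch type, skipping the untracked pitches from note (2) while still counting them. The `Pitch` record, its field names, and the sample values are made up for illustration; the real export has many more columns and its own naming. The type codes follow PITCHf/x conventions (FF for four-seam fastball, CU for curveball), which, as noted above, are algorithmic guesses.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Pitch(NamedTuple):
    """Hypothetical, trimmed-down pitch record (not the real export's schema)."""
    pitcher: str
    pitch_type: Optional[str]   # algorithmic classification, e.g. "FF"
    outs: int                   # game context survives even when tracking fails
    speed_mph: Optional[float]  # None when the cameras didn't track the pitch


def mean_speed_by_type(pitches):
    """Average speed per pitch type, ignoring untracked pitches.

    Returns (averages, untracked_count) so the number of dropped pitches
    stays visible instead of silently disappearing.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    untracked = 0
    for p in pitches:
        if p.speed_mph is None or p.pitch_type is None:
            untracked += 1
            continue
        totals[p.pitch_type] += p.speed_mph
        counts[p.pitch_type] += 1
    averages = {t: totals[t] / counts[t] for t in totals}
    return averages, untracked


if __name__ == "__main__":
    # Made-up sample values, not real 2014 data.
    sample = [
        Pitch("A", "FF", 0, 95.0),
        Pitch("A", "FF", 1, 93.0),
        Pitch("B", "CU", 2, 78.0),
        Pitch("B", None, 2, None),  # untracked: only game context survives
    ]
    averages, untracked = mean_speed_by_type(sample)
    print(averages, untracked)
```

Returning the untracked count alongside the averages is a deliberate choice: with a few percent of pitches missing details, it’s worth seeing how many rows each summary quietly left out.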

Statwing Pricing Change

We’re changing our pricing structure:

  • We’re eliminating the free Public plan.
  • The Silver plan will now be $50/month, up from $25/month.
  • Silver plan users will remain at $25/month until August 1, at which point any new charges will be $50/month. We’ll email again at that time to remind you.
  • Existing Public plan users will also continue to have free access.

Results of Stack Overflow survey of 20,000 software developers

There’s some fun stuff in this dataset. We’ll skip the pre-analysis and let you dive right in yourself: https://www.statwing.com/demos/dev-survey-2 Enjoy!

Visualizing crime in Chicago

We just put up a fun guest blog post over on Socrata’s blog: http://www.socrata.com/blog/crime-time-visualizing-crime-data-chicago/ The post includes a look at the daily rhythms of various crimes in Chicago. Enjoy the post!

Results of HN poll: Half think bootcamp grads as good as fresh CS majors

We wanted to get a sense of how well code bootcamps prepare their graduates for work, so we asked Hacker News readers who had worked with a graduate to evaluate them (see methodology). Most importantly, we asked, “Would you rather work with the graduate, or the median fresh-out-of-college Computer Science major you’ve worked with?” While on average…

Visualize any public CSV on GitHub in a few clicks

1. Find a random public CSV on GitHub. If you don’t have any public CSVs on GitHub yourself, click this preloaded, semi-randomized search to find some CSVs to upload: Google search for GitHub CSVs. (Click a result from the search, then grab the URL; copying a link directly from the results page doesn’t work.)

Heartbleed vulnerability fixed

A weakness was recently discovered in OpenSSL, the secure cryptographic communication library that just about every site on the web relies on. A patch has since been issued, and we’ve upgraded our infrastructure so that we’re no longer vulnerable. That is all, carry on.

8 tips and tricks for getting the most out of Statwing

1. When you first load a dataset, get an overview of it and look for dirty data by selecting all the variables and then selecting Describe.
2. You can also relate one variable to every other variable in the dataset by putting the key by that variable and then selecting Relate.

Statwing can now handle gigabytes of data

Most data isn’t big data; very few companies have terabytes of data on hand. But quite a few companies have what we would call “medium data”, the lower bound of which we define as “data that won’t fit into Excel without a whole lot of pain”. Depending on your computer, that limit could be reached…
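For a rough sense of where Excel stops outright: a single worksheet in Excel 2007 and later holds at most 1,048,576 rows and 16,384 columns (older .xls files stop at 65,536 rows and 256 columns), and the pain usually starts well before that. Here’s a minimal Python sketch, using only the standard `csv` module, that streams a CSV once and reports whether it would fit on one sheet. The function name `fits_in_excel` is hypothetical, and it checks only the hard worksheet limits, not how your particular machine will cope.

```python
import csv
import io

# Worksheet limits for Excel 2007 and later (.xlsx).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(csv_file) -> bool:
    """Stream a CSV once and report whether it fits on a single worksheet.

    Counts every row, including the header, and never holds more than one
    row in memory, so it works on files far larger than Excel can open.
    """
    rows = 0
    for record in csv.reader(csv_file):
        rows += 1
        if rows > EXCEL_MAX_ROWS or len(record) > EXCEL_MAX_COLS:
            return False
    return True


if __name__ == "__main__":
    small = io.StringIO("name,score\nada,1\nbob,2\n")
    wide = io.StringIO(",".join(["x"] * (EXCEL_MAX_COLS + 1)) + "\n")
    print(fits_in_excel(small))  # True
    print(fits_in_excel(wide))   # False
```

For a real file, open it with `newline=""` as the `csv` docs recommend, e.g. `with open("pitches.csv", newline="") as f: print(fits_in_excel(f))`, where `pitches.csv` is a placeholder name.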

New Feature: Survey Weighting

We’re happy to announce the deployment of survey weighting. If you have a dataset with a column of weights, here’s how to use it: First, go to the variable settings by clicking the name of the dataset in the top left of the interface. Then, select which variable you would like to use as weights, …
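To show what a weights column does to a simple estimate, here’s a minimal Python sketch of a weighted mean, sum(w · x) / sum(w). This is the textbook point estimate, not necessarily how Statwing computes its statistics internally (weighted significance tests involve more than this), and the answers and weights below are made up.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(w * x) / sum(w).

    Each respondent counts in proportion to their weight, so a respondent
    with weight 2.0 pulls the estimate as much as two with weight 1.0.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    # Made-up example: 1 = answered "yes", 0 = answered "no".
    answers = [1, 0, 1]
    weights = [0.5, 2.0, 1.5]
    print(sum(answers) / len(answers))      # unweighted share of "yes": ~0.667
    print(weighted_mean(answers, weights))  # weighted share of "yes": 0.5
```

Here the “no” respondent carries half of the total weight, so the weighted share of “yes” drops from about two thirds to one half.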