Dataset: Ten Years of NFL Plays Analyzed, Visualized, Quizzified (Downloadable)


Statwing is an easy-to-use data analysis tool, available for individual use or embedded into other products.

It’s third-and-3 and you desperately need a first down. What do you do, run or pass?

We’ve structured ten years of NFL play-by-play data (raw data compliments of Advanced NFL Stats), then
uploaded it into Statwing for analysis.
Now you can test your coaching instincts against the data.

Early in the game, the score is tied. You have fourth-and-goal at the 2-yard line. What should you do?



Correct

Wrong. You should go for it.


When teams go for it on fourth-and-goal from the 2, they get a touchdown 45% of the time. So on average teams get 3.1 points when they go for it—roughly the same amount they’d expect if they kicked a field goal, since only 2% of field goals are missed at that range.
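The arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough back-of-the-envelope check, not the calculation Statwing ran: it assumes a touchdown is worth 7 points (that is, the extra point is always made), treats a failed attempt as 0 points, and ignores field position. The variable names are our own.

```python
# Rough expected-points check for fourth-and-goal from the 2.
# Assumptions (not from the dataset): a touchdown is worth 7 points
# (extra point always good) and a failed attempt is worth 0 points.
TOUCHDOWN_POINTS = 7
FIELD_GOAL_POINTS = 3

p_touchdown = 0.45        # success rate quoted above
p_field_goal = 1 - 0.02   # only 2% of kicks miss at this range

expected_go_for_it = p_touchdown * TOUCHDOWN_POINTS
expected_field_goal = p_field_goal * FIELD_GOAL_POINTS

print(f"Go for it:  {expected_go_for_it:.2f} points")
print(f"Field goal: {expected_field_goal:.2f} points")
```

Under these assumptions the two options come out at roughly 3.15 and 2.94 points. The small gap from the 3.1 quoted above presumably reflects rounding or the occasional missed extra point.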

Outcome of going for it on fourth and goal

Click the image to explore this analysis. If you want your analyses to save, though, you’ll need to use the link at the top of the page to play with the dataset.

But that’s not all. If you’re stopped on fourth-and-goal, the opponent starts with terrible field position. You’ll even get a safety about 5% of the time. By comparison, you can expect the opponent to start from the 23-yard line after a kickoff following a made field goal, and they will even have a 0.5% chance of returning the kickoff for a touchdown.

Distribution of field positions immediately after kickoff

Click the image to see the full NFL 2012 regular season dataset in Statwing. It contains all the analyses cited above.

Going for it and kicking a field goal both yield about 3 points on average, and the field position is much better if you go for it.
Despite this, coaches usually kick a field goal on fourth-and-goal. During the last ten seasons coaches went for the touchdown about 20% of the time in this situation.


It’s third-and-1. Which type of run is most likely to result in a first down?


Correct

Wrong. Side story: your author’s mom always yelled at the Chiefs not to go up the middle on third-and-short.
It saddens your author to find out that she was most likely leading the Chiefs astray.

Going up the gut just barely beats running around the end.

Run direction on third and one

Click the image to see an even more detailed breakdown of running plays (e.g., off-center versus off-guard).


On third-and-3, are you more likely to pick up a first down by running the ball or passing it?



Sort of. Good enough.

Running was not statistically significantly more likely to yield a first down, but it did trend slightly above passing (51% vs 49%), so we’ll give it to you.

Wrong

Running actually trends towards being more effective than passing, at 51% vs 49% (though the difference isn’t statistically significant, so the best answer is “equally likely to work”).

Correct

Running was not statistically significantly more likely to yield a first down, though it did trend slightly above passing (51% vs 49%).

In case you’re curious, here are the odds of picking up a first down on third-and-x, split by running versus passing:

Likelihood of getting a first down on third by run vs pass

Runs are statistically significantly more effective with 1 yard to go, and passes are more effective with 4+ yards to go.
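For readers wondering what “statistically significant” means for a gap like 51% vs 49%, here is a minimal sketch of one common check, a two-proportion z-test. The post does not say which test Statwing ran, so treat this as an illustration rather than a reproduction. The function name is our own, and the counts in the example are hypothetical, picked only to produce 51% and 49%; they are not taken from the dataset.

```python
import math

def two_proportion_z_test(successes_a, attempts_a, successes_b, attempts_b):
    """Two-sided z-test for the difference between two conversion rates."""
    rate_a = successes_a / attempts_a
    rate_b = successes_b / attempts_b
    # Pooled rate under the null hypothesis that both play types convert equally often.
    pooled = (successes_a + successes_b) / (attempts_a + attempts_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (rate_a - rate_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts chosen only to give 51% vs 49%; NOT taken from the dataset.
z, p_value = two_proportion_z_test(510, 1000, 2450, 5000)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With these made-up counts the p-value lands well above the conventional 0.05 cutoff, which is the sense in which a two-point gap can fail to be significant. The real answer depends on how many third-down runs and passes are actually in the data.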

As an aside, coaches tend to pass on third with more than a yard to go.

Third down play selection by yardage

Coaches very rarely run on third down with three or more yards to go.


You need a two-point conversion. What kind of play should you call?



Correct

Wrong


During the last ten years running has succeeded 62% of the time, versus 46% for passing.

This seems odd because we just found out that running was only microscopically better than passing on third-and-2. But a two-point conversion is different from a typical third-and-2; the defense isn’t spread out, so it’s hard for receivers to find gaps in the coverage.

Run vs pass on 2-point conversion

Click the image to see statistical data, explore this analysis, and play with the rest of the dataset in Statwing.

This suggests that coaches should run more often than they currently do.

Coaches pass on 2-point conversion

Click the image to see the confidence intervals and play with the rest of the dataset in Statwing.


Would you like to see pretty data visualizations about punting?


Correct. You would, it seems.

Sorry, that’s incorrect, you would love to see pretty data visualizations about punting. Surprised you didn’t know that.

Punts have gotten roughly half a yard longer per season over the ten-year period.

Punt lengths over time

Click the image to see statistical data, explore this analysis, and play with the rest of the dataset in Statwing.

But does that mean punters are increasingly outpunting their coverage? After all, longer punts beget longer returns.

Punt length and return length are positively correlated

Like the other binned scatterplot, this visualization was made automatically in Statwing with 3 clicks.

Nope! We looked into it, and while returns have gotten longer, they grew by only about 0.15 yards per season, and a pretty similar number of punts aren’t returned at all.


Thanks for playing

When you try this quiz with a sorry quiz-taker like you, that’s the result you’re going to get. [That’s a joke, we think you did fine :)]
You’re the best quiz-taker in the game.





Discussion on Hacker News

A special thanks to David Laughlin, who wrote most of the copy for this post. David is available for freelance work at davidclaughlin@gmail.com.


See notes below to download the data.

Update: burntsushi from Hacker News created some tools that make it easy to query this kind of data. We haven’t looked at them in depth but they look much more efficient than the datasets we link to below.

Notes


The original data from Advanced NFL Stats is mostly free-text play descriptions, which we interpreted into structured data using Excel. The original data does have a few errors here and there, not all of which could be cleaned up. Some plays are missing, and some plays have inaccurate data, maybe 0.5%.

We’re very confident that you’ll have a much easier time exploring the data in Statwing than in Excel or another tool. So we encourage you to try analyzing it in Statwing first. To save your analyses, use this link, not the ones linked to in the images above.

But, if you don’t believe us or you want to modify the data, you can download the raw CSV of our version of the data.

In 2003, 2004, and 2005, the data doesn’t discriminate between a QB scramble and a run. So for many analyses (like many of the above), you’ll want to filter out those years.
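If you work from the downloaded CSV, that filter might look something like the sketch below. The column names (`season`, `down`, `togo`, `play_type`) and the sample rows are hypothetical stand-ins; check the header of the real file before relying on them.

```python
import csv
import io

# Seasons in which QB scrambles and designed runs are not separated.
AMBIGUOUS_SEASONS = {2003, 2004, 2005}

def drop_ambiguous_seasons(rows, season_field="season"):
    """Keep only plays from seasons where runs and scrambles are distinguishable."""
    return [row for row in rows if int(row[season_field]) not in AMBIGUOUS_SEASONS]

# Tiny made-up sample standing in for the downloaded CSV (hypothetical columns).
sample_csv = io.StringIO(
    "season,down,togo,play_type\n"
    "2004,3,1,run\n"
    "2006,3,1,run\n"
    "2011,3,3,pass\n"
)
plays = list(csv.DictReader(sample_csv))
kept = drop_ambiguous_seasons(plays)
print([row["season"] for row in kept])
```

To run it against the real file, replace the `io.StringIO` sample with an `open(...)` call on the CSV you downloaded and pass the actual name of the season column.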


We make the following assumptions throughout:

  • You are coaching the “average” team. Individual teams would vary, but the data we use is the average of all teams’ behaviors in whatever situation we are analyzing.
  • There are more than five minutes left in the half or game.
  • Unless otherwise noted, you’re between your own 10-yard line and your opponent’s 20-yard line.
  • Yardage from penalties committed during a play is included in the outcome.
  • The hypothetical coach does not always call the same type of play in the same situation.
    That is, the coach randomizes play calling enough to be unpredictable while still favoring the more advantageous plays.
    The very awesome Brian Burke at Advanced NFL Stats does a great job of
    describing randomization and game theory.


In the last ten years, there have only been a few instances where teams went for it on fourth-and-goal at the 2-yard line. It turns out, though, that going for the end zone on third-and-short is about as successful as going for it on fourth-and-short, so we used both types of plays in this analysis. For example, the 45% figure was calculated using both third- and fourth-and-2. Here, as with the rest of this fourth-and-2 analysis, we were inspired by a 2002 paper by David Romer. Romer is a notable Berkeley economist, and his wife chaired the White House Council of Economic Advisers in 2009 and 2010.


If objectives other than just picking up a first down are considered, there is evidence that running is better. Brian at Advanced NFL Stats does a great job of diving into that question, though you might want to learn about the concept of expected points before reading Brian’s analysis.
