We can’t tell you which teams will win this weekend’s AFC and NFC championship games, but we can tell you how many people we think will be arrested at each. In Atlanta, if the Falcons win, 1 fan will be arrested. If Green Bay wins, 3 fans will be arrested. In Foxborough, if the Patriots win, 11 fans will be arrested. And if the Steelers win, 13 fans will be arrested.
How We Made These Predictions
In a grad-level econometrics class at Purdue, my classmates and I were tasked with presenting statistical findings from socioeconomic data. While most economists are busy figuring out how to predict the next financial crisis, solve poverty, or forecast how many people will buy their company’s next product, I was more interested in what I might be able to learn about my favorite rowdy NFL fans from the Washington Post’s NFL arrests dataset.
Kent Babb and Steven Rich of the Washington Post organized public record requests from police departments that oversaw NFL stadium security between 2011 and 2015, and they provided the total number of arrests made at each game along with other game-specific stats like the time of day that the game was played, the final scores of the home and away team, and whether or not it was a division game or went into overtime. Of the 31 jurisdictions in which there is an NFL stadium (note that the Giants and the Jets reside in the same jurisdiction), Cleveland and New Orleans were the only precincts that did not submit any data at all. Buffalo, Miami, and Oakland provided only partial records and had to be omitted from the data set, which honestly kind of sucks, because we know Oakland would have the GOAT arrest totals. Precincts in Detroit, Minneapolis, and Atlanta also excluded parking lot arrests, which means we might be missing a few booze-fueled tailgating arrests from this study.
I wanted to create a model to predict the chances of getting arrested at an NFL game. The Washington Post dataset gave me a few pieces of the puzzle, such as whether the home team won and what time the game started, but I had a few other questions.
- Did higher attendance relative to stadium capacity lead to rowdier fans and, in turn, more arrests?
- Did people tend to drink more on a hot day and find themselves in the tank more often than on cold days?
- Did controlling for local crime rates change the probability of being arrested?
To answer these questions, I merged the Washington Post dataset with game attendance from Pro-Football-Reference.com, NFL stadium seating capacity from Wikipedia, game-time weather from NFLSavant.com, and crime rates from the FBI’s Uniform Crime Report. The goal was the best overall look at local conditions during an NFL game that I could assemble.
The model I generated from this data tells us that the probability of being arrested at an NFL game is higher if:
- The home team loses
- The game starts later
- Attendance is low relative to stadium capacity
- The weather is hot
- The local violent crime rate is high
- The local property crime rate is low
If the model relied on home losses, low relative attendance, and hot weather alone, Jacksonville would have this one in the bag!
I’ll explain the math in broad strokes here, but if you really want the details, here’s the original term paper I submitted to Purdue. It should be noted that I received a 98% on the paper, which is enough of a confidence boost for me to publicly reveal this model. It should also be noted that my mom has yet to hang this term paper on the fridge.
The model was estimated using linear regression, and here is the primary equation:
arrest2attend = 0.0023167 - 0.000024 × hometeamwin + 0.0001295 × gametime - 0.000265 × attend2capac + 0.0000514 × ltemp + 0.0000508 × lhomeviocrmrt - 0.0003217 × lhomepropcrmrt
Let’s also take a second to explain each variable in the model:
- arrest2attend = total arrests at an NFL game divided by the total attendance for that game
- hometeamwin = 1 if the home team won, 0 if they lost
- gametime = the local time that the game started, expressed as a fraction of the day (e.g. a game that started at noon would be represented as .50 because noon is halfway through the day)
- attend2capac = total game attendance divided by the stadium’s capacity
- ltemp = the natural log of the temperature (in Fahrenheit) during game play
- lhomeviocrmrt = the natural log of the local violent crime rate
- lhomepropcrmrt = the natural log of the local property crime rate
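To make these definitions concrete, here is a minimal Python sketch that builds the six independent variables from raw game facts. The helper name `build_regressors`, its argument names, and the example values are hypothetical and not part of the original analysis. Only the transformations themselves (time of day as a fraction of 24 hours, attendance over capacity, natural logs) come from the definitions above.

```python
import math


def build_regressors(kickoff_hour, kickoff_minute, attendance, capacity,
                     temp_f, violent_rate, property_rate, home_team_won):
    """Turn raw game facts into the model's independent variables.

    Hypothetical helper for illustration; argument names are assumptions.
    kickoff_hour is 24-hour local time; crime rates are per 100,000 residents.
    """
    return {
        "hometeamwin": 1 if home_team_won else 0,
        # Fraction of the day elapsed at kickoff (noon -> 0.5).
        "gametime": (kickoff_hour + kickoff_minute / 60) / 24,
        "attend2capac": attendance / capacity,
        "ltemp": math.log(temp_f),
        "lhomeviocrmrt": math.log(violent_rate),
        "lhomepropcrmrt": math.log(property_rate),
    }


# Made-up example: a noon kickoff, a stadium at 90% of capacity, a 70-degree day.
example = build_regressors(12, 0, 54_000, 60_000, 70, 400.0, 3_000.0, True)
print(example["gametime"], example["attend2capac"])
```

Passing a 3:05pm kickoff (hour 15, minute 5) gives a gametime of about .628, and 6:40pm gives about .778.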
For the stats nerds out there, this model had an R-squared of .2885, and the heteroskedasticity-robust standard errors implied that each independent variable was statistically significant at the 10% level or better. For those who hated stats, this model was, scientifically speaking, not bad.
The implications of the model actually make sense. If the home team loses, and we assume the majority of the crowd at any game is there to support the home team, we can expect a few upset fans to act irrationally or even criminally. If the game starts later, there’s more time for tailgating and all the fun that comes with that.
The fact that lower relative attendance implied a higher probability of arrests may reveal less about social conditions at a game than about the nature of statistics. Let’s say 5 people are arrested at each of 2 different games attended by 40,000 and 50,000 people, respectively. The chance of being arrested (arrests divided by attendance) is 0.0125% at the smaller game and 0.0100% at the larger one, so it is 25% higher at the game with lower attendance even though the total number of arrests was the same.
The weather is another interesting factor. Anecdotally, sun and heat lead to day drinking (naturally, Green Bay residents may be the exception). Heat can lead to more dehydration, which may lead to more cases of public intoxication. But this could also be a testament to regional cultures. Perhaps police forces in the south, where it’s warmer, are more likely to make arrests than those in the north. Or maybe fans in warmer regions really are just rowdier than fans in the north (though, back to our data omissions, we’d really like to officially verify this against Oakland and Buffalo).
With crime rates, one might attribute the opposite signs on the violent and property crime rates to police force allocation. In a region with a relatively high violent crime rate, precincts may be more likely to send officers into stadiums in an attempt to curb violence (our model thanks Philadelphia for not making the playoffs this year). In an area with a relatively high property crime rate, precincts may be more likely to keep officers on the streets to deter theft and vandalism while the city is otherwise preoccupied with the game.
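The arithmetic behind that 25% figure is quick to check. This is just a sketch of the hypothetical two-game example, not a calculation on the real dataset:

```python
arrests = 5

# Arrest rate = arrests divided by attendance, as in arrest2attend.
rate_small_crowd = arrests / 40_000   # 0.000125
rate_large_crowd = arrests / 50_000   # 0.0001

# Relative difference between the two rates.
relative_increase = rate_small_crowd / rate_large_crowd - 1
print(f"{relative_increase:.0%}")
```

Same number of arrests, smaller denominator, higher rate.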
It should be noted that while the Washington Post’s data set included data from 2011-2015, I was only able to retrieve weather data from 2011-2013. So, this model is based on game data that is several years old, and should be taken with a grain of salt. That said, we used this model to predict the number of arrests that might occur this weekend at each conference championship game.
Predicting the Number of Arrests at This Weekend’s AFC and NFC Championship Games
We had to look at two outcomes for each game: one where the home team won, and one where it lost. From there, we just filled in the blanks in the model above. Green Bay at Atlanta is set to start at 3:05pm local time (.628 in our time units), and Pittsburgh at New England is set to start at 6:40pm local time (.778). Because these are conference championship games, we expect sellout crowds, so we set the attendance-to-capacity variable to 1. The game-time forecast (as of the morning of January 19th, 2017) is 68 degrees Fahrenheit for Atlanta and 47 for Foxborough. In Atlanta, the average violent crime rate over the last 4 years has been 399.3 per 100,000 residents, and the property crime rate has been 3,374.4 per 100,000. In Boston, the violent crime rate has been 507.3 per 100,000, and the property crime rate has been 2,208 per 100,000.
Plugging all of these numbers into the model above, and multiplying the expected probability of being arrested at each game by the expected attendance, we arrived at the predictions stated in the intro.
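For anyone who wants to follow along, here is a rough Python sketch of that plug-and-multiply step. The coefficients and game inputs are the ones given in this post. The attendance figures are an assumption: the post does not say which numbers were used, so the sketch takes a sellout to mean approximate stadium capacity (about 71,228 for the Georgia Dome and about 66,829 for Gillette Stadium).

```python
import math

# Coefficients copied from the regression equation in this post.
COEF = {
    "intercept": 0.0023167,
    "hometeamwin": -0.000024,
    "gametime": 0.0001295,
    "attend2capac": -0.000265,
    "ltemp": 0.0000514,
    "lhomeviocrmrt": 0.0000508,
    "lhomepropcrmrt": -0.0003217,
}


def arrest_rate(home_win, gametime, attend2capac, temp_f, violent_rate, property_rate):
    """Predicted arrests per attendee (arrest2attend) from the linear model."""
    return (COEF["intercept"]
            + COEF["hometeamwin"] * home_win
            + COEF["gametime"] * gametime
            + COEF["attend2capac"] * attend2capac
            + COEF["ltemp"] * math.log(temp_f)
            + COEF["lhomeviocrmrt"] * math.log(violent_rate)
            + COEF["lhomepropcrmrt"] * math.log(property_rate))


# Inputs from the post; "attendance" is an ASSUMED sellout at approximate capacity.
GAMES = {
    "Atlanta": dict(gametime=0.628, temp_f=68, violent_rate=399.3,
                    property_rate=3374.4, attendance=71_228),
    "Foxborough": dict(gametime=0.778, temp_f=47, violent_rate=507.3,
                       property_rate=2208.0, attendance=66_829),
}

predictions = {}
for city, g in GAMES.items():
    for home_win in (1, 0):
        rate = arrest_rate(home_win, g["gametime"], 1.0, g["temp_f"],
                           g["violent_rate"], g["property_rate"])
        # Expected arrests = predicted rate times expected attendance.
        predictions[(city, home_win)] = round(rate * g["attendance"])
        outcome = "home win" if home_win else "home loss"
        print(f"{city}, {outcome}: {predictions[(city, home_win)]} arrests")
```

With those assumed attendance figures, the rounded results come out to 1 and 3 for Atlanta and 11 and 13 for Foxborough, in line with the predictions in the intro. A somewhat different attendance number would shift the unrounded values slightly.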
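The results look something like this:
- Green Bay at Atlanta: 1 arrest if the Falcons win, 3 arrests if the Packers win
- Pittsburgh at New England: 11 arrests if the Patriots win, 13 arrests if the Steelers win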
More than likely, the arrest figures we predicted here will not hold since we’re using regular season data obtained from 2011-2013 to predict events at postseason games played in 2017. In actuality, arrests may be higher due to the playoff atmosphere and likely larger security and police presence. Further, linear regression is not a perfect science, and this model cannot account for unexpected events like fan rioting, or some goofball in a hat and a red shirt showing up. Regardless, since the variables in the model make sense from a socioeconomic standpoint, it will be interesting to see how our predictions for this weekend’s game time arrests hold up in practice!
Latest posts by Patrick Brown (see all)