# The math behind March Madness

## A Q&A with statistician Shane Jensen, who discusses the math behind sports team rankings, why March Madness has so many underdog victories, and how technology might change how analysts study sports teams in the future.

March Madness kicks off this week. With an estimated average viewership of 5.6 million people per game for the Round of 32 matches back in 2017, the NCAA men’s basketball tournament has become a springtime phenomenon for both casual and serious fans of college sports.

Now that the top teams in the country are preparing for their first matches, many might be wondering how to go about filling in their March Madness bracket. Last year, Warren Buffett raised the stakes well beyond those of a typical office pool by offering his employees \$1 million a year for life if they could correctly guess all of the teams that reached the Sweet 16.

No one won the grand prize due to a number of upsets in the early rounds, which shouldn’t be surprising as there are more than 9.2 quintillion ways to fill in a bracket. Even if a billion people filled out a March Madness bracket every year, the chance that anyone would produce a perfect bracket would remain very low, even over hundreds of years of tournaments.
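The 9.2 quintillion figure can be checked with a little arithmetic. The sketch below assumes a 64-team single-elimination field (63 games) and, as a deliberate simplification, treats every game as a coin flip so that all brackets are equally likely. Real games are not 50/50, so this is a rough illustration rather than a true estimate. The function name and the 500-year horizon are hypothetical choices for the example.

```python
# A 64-team single-elimination tournament needs 63 games to crown a champion,
# and each game has two possible winners.
GAMES = 63
TOTAL_BRACKETS = 2 ** GAMES  # 9,223,372,036,854,775,808


def chance_of_a_perfect_bracket(entries_per_year: int, years: int) -> float:
    """Chance that at least one entry is perfect.

    Assumptions (not from the article): every bracket is equally likely
    to be the correct one, and no two entries are identical, which is the
    most favorable case for the entrants.
    """
    distinct_entries = min(entries_per_year * years, TOTAL_BRACKETS)
    return distinct_entries / TOTAL_BRACKETS


if __name__ == "__main__":
    print(f"Possible brackets: {TOTAL_BRACKETS:,}")
    chance = chance_of_a_perfect_bracket(entries_per_year=10**9, years=500)
    print(f"Chance of any perfect bracket in 500 years: {chance:.2e}")
```

Even under these generous assumptions, a billion distinct entries a year for five centuries would cover only a tiny fraction of the possible brackets.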

Penn Today sat down with Shane Jensen, associate professor of statistics in the Wharton School, to talk about sports analytics, how to pick a “better” bracket, and what the future holds for sports rankings and matchup predictions.

There are 351 Division I men’s basketball teams. How do sports analysts rank such a large number of teams, given that they will not all play one another during the season?

Sports teams are generally ranked using the Elo rating system. It was initially developed for chess and allows you to rank players that haven’t all played each other. For team sports, Elo looks at who a team won or lost against, as well as who their opponents won or lost against, and uses these pair-wise outcomes to infer a team’s strength.

With Elo, if a good team plays what the model thinks is a bad team, and the good team loses, the good team’s ranking is hurt more than if they lost to another good team. But there are some disadvantages to this system because it is only based on who won and doesn’t account for the strength of victory; for example, it doesn’t take into account if a team won by 20 points or one point.
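To make the idea concrete, here is a minimal sketch of the standard Elo update as it is usually written for chess. The 400-point scale and the K-factor of 20 are conventional illustrative values, not numbers from this interview, and the team ratings are made up. Sports adaptations often add extras such as home-court adjustments that are left out here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Win probability the Elo model assigns to team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return both teams' new ratings after one game.

    Only the win/loss result is used; the margin of victory never enters
    the formula, which is the limitation described above.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical ratings: a strong team (1700) loses to a weak team (1300)...
    after_upset, _ = update(1700, 1300, a_won=False)
    # ...versus the same strong team losing to an equally strong opponent.
    after_peer_loss, _ = update(1700, 1700, a_won=False)
    print(f"After losing to a weak team:   {after_upset:.1f}")
    print(f"After losing to a strong team: {after_peer_loss:.1f}")
```

The loss to the weaker opponent costs more rating points because the model expected the strong team to win that game with high probability.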

There are more sophisticated ranking systems that take into account strength of victory or other more detailed aspects of in-game performance. Cade Massey and Rufus Peabody publish their football rankings each week in The Wall Street Journal. The Massey-Peabody index is one of the most sophisticated football ranking systems out there: it takes into account score differences as well as play-by-play data to infer things like offensive and defensive efficiency.

Only 64 teams (around 18 percent) make it to the first round of March Madness. How are teams selected?

Because there are many considerations that must be taken into account when selecting teams for March Madness, such as guaranteed entries for certain conferences, there is a Selection Committee that ranks and selects teams for the NCAA tournament. There will always be a few teams that are no-brainers, but teams that have similar records will need to be looked at more closely. The committee will usually try to make some distinction for these teams based on the strength of opposition.

Using a ranking system like Elo, if a team piles up a really good record against teams that end up mostly losing otherwise, it’s not going to improve that team’s ranking as much as winning against other good teams. With basketball, the nice thing is that there are so many games in the season, so each team is able to play a greater diversity of opponents. You still end up having some imbalance though, since each team would have to play hundreds of games in a season to compete against all of the teams in Division I.

What’s the best way to select teams for a bracket? Are there other metrics you can look at besides a team’s ranking?

Individual matchups of teams can be informative, especially if you have a lot of knowledge about basketball. You could look at a specific game and see something like, “This team is really good defensively, but I think that’s a bad matchup against this particular opponent.” There are also matchups that you can predict without a lot of knowledge about college basketball, like that a No. 1 seed will usually beat a No. 16 seed.

The problem is that if you have to set your entire bracket ahead of time, looking at individual matchups is something you can only really do for the first couple of rounds. After that, there’s enough randomness in teams getting knocked out that you probably can’t even predict who’s going to play each other after the second round.

What is it about March Madness that makes the tournament so exciting?

It’s the perfect combination of a sport where games can be played relatively frequently and that also has a large number of teams. While college football fans would love a giant tournament at the end of the season that involved 16 or 32 teams, you can’t play football more than once a week, and so a large tournament would take too much time to complete.

The tournament structure itself is also unique. The fact that you have so many teams, and each team can be knocked out by a single game, makes each of the games really exciting. Each individual game is random enough that you see upsets all the time. If you’ve got an imbalanced matchup, where you’ve got a not-so-good team playing a good team, the best chance that the not-so-good team has is to just beat them in one game. Playoff formats like the NBA’s are not nearly as unpredictable as March Madness because the chance of an upset in a seven-game series is much lower than in a single game.

If you want to set up something where the better team actually wins more often, then longer game series are ideal. But if you are a casual fan and you want to see upsets, or you want to see cool and unexpected things happening, a March Madness-style tournament is ideally suited for that sort of drama.
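The gap between a single game and a seven-game series can be worked out directly. The sketch below assumes the underdog wins each game independently with the same probability, which is a simplification (it ignores home court, injuries, and momentum), and the 30 percent figure is a made-up example value.

```python
from math import comb


def series_win_probability(p: float, wins_needed: int = 4) -> float:
    """Probability that a team wins a first-to-`wins_needed` series.

    Assumes each game is independent and the team wins any single game
    with probability p. The team clinches on its final win after
    dropping k games along the way, for k = 0 .. wins_needed - 1.
    """
    q = 1.0 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    underdog_p = 0.30  # hypothetical single-game upset chance
    print(f"Single game:       {underdog_p:.1%}")
    print(f"Best-of-seven:     {series_win_probability(underdog_p):.1%}")
```

Under these assumptions, an underdog with a 30 percent chance in any one game wins a best-of-seven series only about 13 percent of the time, which is why single-elimination formats produce so many more upsets.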

How do you think analytics will change sports in the future?

Most areas of human endeavor are becoming more data-oriented, and sports analytics is a very active front in this data revolution. There are already a lot of fascinating analyses using new sources of sports data, and I think this progress will continue for the foreseeable future.

Basketball is the sport with the most progressive development in the area of individual-player and ball-tracking technology. For the past few years, the NBA has been using video data to track the positional coordinates of the ball and each individual player on the court throughout every game. This high-resolution data allows us to move beyond historical metrics like shooting efficiency or number of rebounds in order to examine more detailed aspects of game play, such as the accuracy of individual passes or how much space players create around themselves.

Ranking systems will also probably improve as more detailed game-play data get incorporated into ranking models. However, there is an upper bound on how much we will be able to improve our predictions, as there is inherent randomness and luck in every game outcome. We’re never going to get to a point where we can perfectly predict an upcoming game, and we wouldn’t want that to happen because that would be a lot less exciting.

Who will you be rooting for during March Madness?

I like to cheer for whoever ends up in the tournament from the Ivy League conference. I also like to cheer for the little guys, the teams from places that I never would have heard of if not for basketball.