Is Your Bracket Busted? You Should Have Used Data.

Let me start off by saying that this blog has no direct tie to our business... but at Utopia Global, data is always on our minds and there’s no better time to “think data” than during the NCAA March Madness tournament.

The odds of picking a perfect NCAA tournament bracket (that’s 63 correct winners) are about 1 in 9.2 quintillion if you treat every game as a coin flip. Even with the relatively predictable outcomes of many matchups, perfection doesn’t last long. According to NCAA.com, after Round 1, 265 perfect brackets were recorded across the 6 largest tournament game websites. By the end of the second round, there were 0.
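For the curious, here is where that quintillion figure comes from. This is a minimal sketch, assuming every game is an independent 50/50 guess (real picks are better than coin flips, so the true odds are less brutal). The names `perfect_bracket_odds` and `GAMES_IN_BRACKET` are ours, made up for illustration.

```python
from fractions import Fraction

# A 64-team bracket has 32 + 16 + 8 + 4 + 2 + 1 = 63 games.
GAMES_IN_BRACKET = 63


def perfect_bracket_odds(games=GAMES_IN_BRACKET, p_correct=Fraction(1, 2)):
    """Chance of picking every game correctly.

    Assumption: each pick is independent and correct with the same
    probability p_correct (a coin flip by default).
    """
    return p_correct ** games


odds = perfect_bracket_odds()
print(f"1 in {odds.denominator:,}")  # 1 in 9,223,372,036,854,775,808
```

Pass a different `p_correct`, such as `Fraction(2, 3)`, to see how much the odds improve for a picker who gets two out of three games right.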

If you look at the dataset of tournament results for the last 32 years, there’s been an average of 9 first round upsets per year. 2015 had a record 15 upsets in the first 32 games, but in 2017, there were just 7. However, before you could deem this year a “boring and predictable tournament”, the second and third rounds hit everyone like a wall of bricks. 11-seed Xavier beat 3-seed Florida St., and 8-seed Wisconsin beat Villanova, the defending national champion and a number 1-seed. Then, in one of the first games of the Sweet Sixteen, Xavier upset Arizona, a legitimate 2-seed, destroying what was left of anyone’s West bracket.

And then there will always be twists in the tournament storyline: unforeseen circumstances, emotions, adrenaline or a renewed sense of appreciation for life (e.g., Michigan’s airplane crash). There’s no stat sheet that can account for those situations, but when it comes to playing the bracket straight by the numbers, the data doesn’t lie.

In a fun, interactive tool built by NCAA.com, you can explore different tournament matchups using historical data from all tournament games from 1985 to the present. You’ll immediately notice some clear trends. A number 1-seed wins only 53.2% of the time over a 2-seed. Teams wearing orange have defeated teams wearing red 59% of the time. And a 16-seed has NEVER beaten a 1-seed. EVER.

Why does it seem like 12-seeds beat 5-seeds more often than your typical underdog wins? Because according to the data, they do, almost 40% of the time! If you analyze the matchups, they point to the following reason: the best teams from the small conferences (Ivy League, Mountain West, etc.) who earn bids typically reach their seed ceiling at 12, mostly due to their conference schedules and lack of play against AP-ranked teams. The 5-seeds, on the other hand, are likely to have a few conference losses and may even have been on the bubble at one point in the season, yet the selection committee, in all of its mysterious wonder, gave them the benefit of the doubt.
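If you want to check a number like that yourself, the calculation is simple once you have a list of results. Below is a rough sketch, assuming each game is recorded as a (winner seed, loser seed) pair. The function name `upset_rate_by_matchup` is hypothetical, and the sample rows are made up purely to show the mechanics; they are not real tournament results.

```python
from collections import defaultdict


def upset_rate_by_matchup(games):
    """Share of games won by the worse (higher-numbered) seed.

    games: iterable of (winner_seed, loser_seed) pairs.
    Returns {(better_seed, worse_seed): upset_rate}.
    """
    totals = defaultdict(int)
    upsets = defaultdict(int)
    for winner_seed, loser_seed in games:
        if winner_seed == loser_seed:
            continue  # equal seeds: no underdog to speak of
        key = (min(winner_seed, loser_seed), max(winner_seed, loser_seed))
        totals[key] += 1
        if winner_seed > loser_seed:
            upsets[key] += 1
    return {key: upsets[key] / totals[key] for key in totals}


# Invented placeholder rows, NOT real results.
sample = [(5, 12), (12, 5), (5, 12), (1, 16), (12, 5)]
rates = upset_rate_by_matchup(sample)
print(rates[(5, 12)])  # 0.5 for this toy sample
```

Swap the toy list for real historical results and the same function gives you the upset rate for every seed pairing at once.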

If people are not using historical data to predict their winners, then what is their strategy? If you ask your group of friends, family or colleagues, you’re likely to hear a variety of things. There is the traditionalist who picks based on seed rankings. Some give priority to their alma mater. Maybe it’s jersey colors. Or do you always pick whoever is playing Duke? (Hint: Unless you’re from Durham, North Carolina, you probably have at least a little hatred towards Duke.)

If you’re a data geek like we are at Utopia, you might want to dig into the datasets that are available through awesome platforms like Kaggle and Data.World. You just might end up finding something that could propel you to tournament bracket glory. One thing you can be sure of: more often than not, better data leads to better decisions. And Utopia believes that “Perfect Data is Perfectly Possible”. Best of luck.

Topics: Data Quality