Data, Statistics & Fantasy Football

Posted by Katie Mowery on August 30, 2018

It’s that time again (no, we’re not talking about the early release of Starbucks Pumpkin Spice Lattes): it’s American football season and, arguably more interesting, the start of fantasy football season.

For those who partake in the chest-thumping, smack-talking, armchair-quarterbacking display of pure football passion, the next 16-18 weeks are a master class in a level of data analysis and advanced statistics you thought was only possible for MIT grads.

Acronyms for the different data points available are flying across the message boards – ADP, ADP for a non-PPR, PRJ, YPP, EPAPP (I’m not making this up – it’s Expected Points Added Per Play). Whether you’re using Yahoo, ESPN, NFL.com or a homegrown fantasy football platform, you’re likely relying on data and algorithms to help you get to the championship. Unless you're one of those "go with your gut" kind of people. (That's not me.)

A few years ago, NYT writer Boris Chen wrote an article about the advanced statistics behind fantasy football analysis. Instead of taking expert rankings at face value to determine the value and associated rank of a player, he applied a clustering algorithm called a Gaussian mixture model to find natural tiers and groups within the data. This helps address a limitation found across most ranking systems: a monotonic ordering tells you who is ahead of whom, but not by how much.

Example: You need to fill a few slots on your roster and are looking at who is available on the website, ranked in order. RB#1 is ranked 18th. RB#2 is ranked 30th. And the third guy, RB#3, is ranked 32nd. What the rankings do not reflect is the gap that could exist between players who may only be ranked a few slots apart. The rankings would lead you to believe that RB#2 and RB#3 are fairly comparable, since they are ranked only 2 apart, but the reality is that there could be a huge talent gap between RB#2 and RB#3 which isn’t obvious based on their rankings. To put this mathematically, the rankings tell us only that RB#1 > RB#2 > RB#3, but it could be that RB#1 > RB#2 >> RB#3.

This is where it gets really crazy… Boris Chen does a full analysis each year (don’t be turned off by the underwhelming site design), tiering players by applying the Gaussian mixture model to the expert consensus data from FantasyPros.com (a selected pool of consistently accurate FFBall analysts). The result is a tiered ranking system for each position, with some data visualization to help you pick your lineups each week.
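To make the tiering idea a little more concrete, here is a minimal sketch in plain Python. It is not Boris Chen's code. It assumes each player is summarized by a single number (an average expert rank), that the number of tiers is picked by hand, and that a bare-bones one-dimensional EM fit is good enough for illustration. The player names, the rank values and the function names (fit_gmm_1d, assign_tiers) are all made up for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, k, iterations=100):
    """Fit a 1-D Gaussian mixture with k components using plain EM.

    Simplified sketch: no convergence check and no guard against a
    component losing all of its points.
    """
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: spread the initial means across the sorted data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    # Start with fairly narrow components so the starting means matter.
    variances = [max(grand_var / (k * k), 1e-3)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: how responsible is each component for each point?
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, 1e-3)  # floor avoids a zero variance
    return weights, means, variances


def assign_tiers(avg_ranks, k):
    """Map each player to a tier; tier 1 has the best (lowest) average rank."""
    weights, means, variances = fit_gmm_1d(list(avg_ranks.values()), k)
    order = sorted(range(k), key=lambda j: means[j])
    tier_of = {component: tier + 1 for tier, component in enumerate(order)}
    tiers = {}
    for player, x in avg_ranks.items():
        dens = [w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        tiers[player] = tier_of[dens.index(max(dens))]
    return tiers


# Hypothetical average expert ranks, invented for illustration only.
hypothetical_avg_ranks = {
    "RB A": 1.2, "RB B": 1.9, "RB C": 2.8,
    "RB D": 10.1, "RB E": 10.8, "RB F": 11.4,
    "RB G": 20.3, "RB H": 21.0, "RB I": 21.9,
}
tiers = assign_tiers(hypothetical_avg_ranks, k=3)
for player, tier in tiers.items():
    print(f"{player}: tier {tier}")
```

The point of the toy data is the one made earlier: RB C and RB D sit right next to each other in a ranked list, but the mixture model puts them in different tiers because of the gap between them.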

Speaking of setting your lineups…

Data experts and all-around cool guys Greg Reda and Mike Stringer (seriously, check them out) wrote a fun analysis back in 2014 on the Datascope blog about the accuracy of ESPN’s fantasy football projections. Their quote pretty much sums up my thoughts every single fantasy week for the last 8 years.

“ESPN’s fantasy football projections are way off. They're projecting Kelvin Benjamin will score 18.4 points in my league. He's only scored over 18 points twice, and one of those times was 18.2 points. He averages 13 points a game. How do they come up with this stuff?!”

So when you’re tinkering with your lineup 10 minutes before kick-off, how do you know what to trust? Does ESPN have inflated projections?

Leave it to the data scientists at Datascope (now part of IDEO) to come up with an analysis (and code) to get the answer. The team looked at the absolute error, meaning the absolute difference between projected points and actual points scored, for “fantasy relevant” players whom ESPN projected to score at least 1 fantasy point. They then took the mean absolute error and bootstrapped it by randomly sampling the errors with replacement. I’ll let you check it all out for yourself, but hint… they did get to an answer.
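If you want a feel for what that calculation looks like, here is a rough sketch using only the Python standard library. It is not the Datascope code. The projected and actual point totals are invented, the function names are mine, and turning the bootstrap samples into a 95% percentile interval is my own assumption about how you might use them.

```python
import random
import statistics


def absolute_errors(projected, actual, min_projection=1.0):
    """Absolute error per player, keeping only players projected for at
    least `min_projection` points (the "fantasy relevant" filter)."""
    return [abs(p - a) for p, a in zip(projected, actual) if p >= min_projection]


def bootstrap_mae(errors, n_resamples=2000, seed=0):
    """Return the mean absolute error and a rough 95% percentile interval,
    built by resampling the errors with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    resampled_maes = sorted(
        statistics.fmean(rng.choices(errors, k=len(errors)))
        for _ in range(n_resamples)
    )
    lower = resampled_maes[int(0.025 * n_resamples)]
    upper = resampled_maes[int(0.975 * n_resamples) - 1]
    return statistics.fmean(errors), (lower, upper)


# Hypothetical projections and results, invented for illustration only.
projected_points = [18.4, 12.0, 0.5, 9.3, 15.1, 7.8]
actual_points = [13.0, 14.5, 2.0, 4.1, 15.9, 11.2]

errors = absolute_errors(projected_points, actual_points)
mae, (lower, upper) = bootstrap_mae(errors)
print(f"MAE: {mae:.2f} points (bootstrap 95% interval: {lower:.2f} to {upper:.2f})")
```

The player projected for 0.5 points is dropped by the filter, so the error is averaged over five players. The interval gives a sense of how much that average could move around with a different sample of players.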

And if you REALLY like digging into the data (because here at Utopia we speak data all day, every day), Fantasy Football Analytics has an API (the FFA API, subscription required) that allows developers to access fantasy data in JSON format. The API has two endpoints, “proj” and “adp”, which return averaged projections and ADP data, respectively. Requests are made as simple HTTP GETs, with users specifying various query parameters. The site claims that its projections incorporate more sources than any other site. There is also a gold mining chart for ease of use and reference during a draft, plus the ability to compare ADP to projections to find undervalued players.
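As a rough idea of what "a simple HTTP GET with query parameters" means in practice, here is a sketch that only builds the request URL and does not send anything. Only the two endpoint names, “proj” and “adp”, come from the description above. The base URL is a placeholder and the parameter names (season, week) are hypothetical, so check the FFA API documentation for the real ones.

```python
from urllib.parse import urlencode

# Placeholder only: this is NOT the real FFA API address.
HYPOTHETICAL_BASE_URL = "https://api.example.com"

# The two endpoint names mentioned in the post.
KNOWN_ENDPOINTS = ("proj", "adp")


def build_ffa_url(endpoint, **params):
    """Build a GET URL for one of the two endpoints.

    The query parameter names passed in are up to the caller; the ones
    used below are hypothetical examples, not documented FFA parameters.
    """
    if endpoint not in KNOWN_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    query = urlencode(sorted(params.items()))  # sorted for a stable URL
    url = f"{HYPOTHETICAL_BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url


# Hypothetical parameters, for illustration only.
projections_url = build_ffa_url("proj", season=2018, week=1)
adp_url = build_ffa_url("adp", season=2018)
print(projections_url)
print(adp_url)
```

From there, any HTTP client can issue the GET and parse the JSON that comes back.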

So whether you rely on the rankings or the projections to set your rosters, there is a vast amount of data available out there. How are YOU using data to crush your opponents this year?

Topics: Analytics