Whoops, The Numbers Were Off

Our numbers have looked a little...off...this season, and yesterday I discovered why. I believe I have fixed the issue, so if you've been tracking the numbers all season long, you may want to go back to the players page (or to your favorite team) and re-check them (spoiler alert: LeBron James is still at the top).

I've been getting emails about the fact that there were way too many centers in the top 50. At first, I dismissed this as an artifact of small sample size (during the first month or so), but as the season went on, it became an obvious problem: since our rankings adjust for position, it's hard for half of the NBA's starting centers to be 2-3 times better than the average center, because that's not really how the bell curve works.

So my next suspicion was that the way we combine power forwards and centers into one "big man" position was flawed, because nowadays teams tend to play a lot of players at power forward who can shoot 3s but not do very much else that big men need to do -- in other words, if nearly half of the league's "big men" are really just small forwards playing out of position, then the top half of centers is going to look really good by comparison. We've beaten this drum a lot, but small ball lineups tend to work best when your "small" players are really good at things big men tend to do, like rebound or defend the paint. Or when your "small ball lineup" center is really seven feet tall.

But that wasn't it either. When I took a look at the distribution of minutes played at every position, I noticed that things weren't adding up. At the time I was debugging, Atlanta, for instance, had 16825 player minutes, but only 16715 minutes from players' total playing time, and only 9754 minutes from all the "time spent playing X position" minutes.
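For anyone who wants to see the kind of check that exposes this, here's a rough sketch using the Atlanta numbers above. The method and key names are made up for illustration; this isn't the site's actual code, just the subtraction spelled out.

```ruby
# Hypothetical consistency check (illustrative names, not the real codebase).
# Compares the expected team total against the two other ways of summing
# playing time, and reports how many minutes each one is missing.
def minute_gaps(team_minutes:, player_minutes:, position_minutes:)
  {
    missing_from_players: team_minutes - player_minutes,
    missing_from_positions: team_minutes - position_minutes
  }
end

# Atlanta's totals at the time of debugging, taken from the paragraph above.
gaps = minute_gaps(
  team_minutes: 16_825,
  player_minutes: 16_715,
  position_minutes: 9_754
)

gaps.each { |name, minutes| puts "#{name}: #{minutes}" }
```

If everything were in sync, both gaps would be zero. Here the first is small and the second is enormous, which points at two separate problems.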

First I discovered that Andrew White was missing. That explained the missing 110 minutes from total time. Then, I discovered that the math that we use to distribute playing time by position, which we do down to the second in granularity, had a problem:

if seconds_played == 0
  self.seconds_played = self.minutes * 60
end

Well -- it looks like we only convert minutes into seconds when seconds_played is 0. That can't be good. And indeed, it turns out that there used to be a line in the daily parsing that converted minutes into seconds_played, and somewhere around a month into the season I deleted that line (why would I delete that line!?), meaning that minutes kept updating, but seconds_played did not. I should probably change seconds_played to just be a derived field -- I frankly have no idea why there are two fields that both measure time played (I blame my past self, he's a bit of an idiot), but that's a refactor for a future date. So, since the minutes (seconds) played at each position didn't add up to the team's (and therefore the league's) total minutes played, a lot of the averages were skewed.
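For the curious, a derived seconds_played might look something like the sketch below. StatLine is a made-up stand-in for the real model, and I'm assuming minutes is the field that gets updated by the daily parsing, so treat this as an illustration of the idea rather than the eventual refactor.

```ruby
# Hypothetical stand-in for the real stat-line model.
class StatLine
  attr_accessor :minutes

  def initialize(minutes)
    @minutes = minutes
  end

  # Computed from minutes on every call instead of being stored,
  # so the two values can never drift apart.
  def seconds_played
    minutes * 60
  end
end

line = StatLine.new(30)
puts line.seconds_played

# Simulate a later daily update that only touches minutes.
line.minutes = 35
puts line.seconds_played
```

With only one stored value, there is nothing left to forget to update. The tradeoff is that this only makes sense if minutes carries all the precision we need; if the source data is really in seconds, it would be better to store seconds and derive minutes instead.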

So...we're sorry about that. I'm in the process of building some interactive tools that give you better insight into how our numbers are generated, which will hopefully make mistakes like this a lot more obvious in the future.
