Thoughts On DataBall

A few weeks ago, Kirk Goldsberry wrote an article called DataBall on Grantland. The Box Score Geeks were talking about this in an email exchange, and our colleague Jeremy Britton had a few thoughts that I will paraphrase and plagiarize for you here.

Let's start with this very illustrative quote from the piece, which underlines the power of storytelling:

The box score reduces that sequence to a few basic integers; Kawhi Leonard is credited with one field goal attempt, one field goal made, and three points scored. Tim Duncan’s screen goes undocumented, and the totality of Parker’s catalytic undertakings gets recorded as one measly assist.

I'm reminded of a researcher who said human brains are story processing machines, not logic processing machines. We crave stories first. We reason second. 

Mr. Goldsberry's line of research is fine. I like a good story. I also think there is social value somewhere in all these romantic ideas about "the totality of catalytic undertakings", but it's not quite the kind of value Goldsberry thinks it is (i.e., I don't think it is going to tell us much about how players help their teams win, or lose).

What is interesting, but also a shame, is that somewhere along the way we lost sight of what the box score data was for and what it purported to explain. I think this is partly due to bad design. A box score packs in a lot of information, but it emphasizes point totals as the main metric of player value. In fact, we'll probably never know just how great Bill Russell or Wilt Chamberlain really were because they played at a time when the NBA didn't record blocks, steals or turnovers. If you told me that Wilt averaged six blocks a game, given the footage I've seen, I would probably believe it. (For perspective, only nine players with qualifying numbers have averaged more than four blocks a game in a season since blocks were introduced to the box score, and only two have had more than five.* But remember that Wilt was an iron man; he holds the top seven minutes-per-game seasons, and has what is probably the most unbreakable record in pro sports, averaging a ridiculous 48.5 minutes per game in 1961-62.)

Here at Box Score Geeks, we often refer to "the turnover era" as the era when NBA box scores started including steals, turnovers, and blocks, and therefore contained enough information to measure individual performance (1973-74, right after Wilt retired, dammit). Regardless, basketball culture has grown up with some big misconceptions about how the game actually works, and it is still blind to them. It's what we mock when we refer to the "YAY! POINTS!" metric.

I also think that today, access to tools and techniques for data analysis is outpacing the learning it takes to do science properly. This creates a glut of storytelling rationalized with sciency-sounding stuff: a lot of recent articles on "advanced metrics" are all very wibbly-wobbly, timey-wimey. All too easily, these metrics get built on top of a pile of flawed assumptions about how the game works.

At the end of the day, our criticism of almost every metric for evaluating an individual's contribution boils down to wins. How does the thing being measured contribute to wins? Most models or formulas simply gloss over this step and assign value to things (like the way PER places value on shots taken, even if they don't go in) based simply on the assumption that they are important.

In other words, it's not going to matter that you have hundreds of times more data – and are getting many more answers to your questions – if you're still asking the wrong questions. It will be interesting to see if all this data experimentation one day stumbles onto the insights that Professor Berri's Wins Produced model gave us from a more humble dataset; I have my doubts. I think it will lead to lots of analysis along the lines of "player X is extremely good at skillset Y," where no one bothers to ask what effect skillset Y has on winning games.


* Those nine players? Mark Eaton (four times), Manute Bol (twice), Elmore Smith, Kareem (twice), Hakeem (three times), Mutombo (twice), Tree Rollins, Patrick Ewing, and David Robinson. Eaton and Bol were the only two to average five or more.