I'm a bit late to comment on this, but several days ago Benjamin Morris wrote an article on FiveThirtyEight about the hidden value of steals:
To illustrate this, I created a regression using each player’s box score stats (points, rebounds, assists, blocks, steals and turnovers) to predict how much teams would suffer when someone couldn’t play. The results:
Yes, this pretty much means a steal is “worth” as much as nine points. To put it more precisely: A marginal steal is weighted nine times more heavily when predicting a player’s impact than a marginal point.
Uh...wow? First, an aside. I'm also late to this bandwagon, but so far my opinion of FiveThirtyEight is very similar to that of many blogging economists like Noah Smith, Paul Krugman, and Tyler Cowen: most of the articles suffer from incredibly sloppy writing, sloppy analysis, or both. I think Krugman expresses it best:
Unfortunately, Silver seems to have taken the wrong lesson from his election-forecasting success. In that case, he pitted his statistical approach against campaign-narrative pundits, who turned out to know approximately nothing. What he seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.
But not all fields are like that — in fact, even political analysis isn’t like that, if you talk to political scientists instead of political reporters. So, for example, before glancing at some correlation and asserting causation, you really should talk to the researchers.
So let's return to this article. There are many problems with it, but let's start with the premise. I think what Morris is trying to say is that, in general, points and assists are easier to replace than steals, so additional points and assists are worth much less to wins and point margin than steals are. He just does an extremely poor job of getting that point across. And if you work out the marginal values, they do in fact line up decently with measures like Wins Produced.
And sure, an increase in a player's steals per game correlates more strongly with wins than an increase in points per game does (and points per game has a fantastically crappy correlation with winning).
Morris completely misses the boat on offensive rebounds, which are clearly a good candidate for one of the "kinds of things" that are relatively unmarred by replaceability, by lumping them together with defensive rebounds. He has a natural test for his thesis right there, but he overlooks it due to a lack of familiarity with the sport. A few years ago, David Berri adjusted the weighted value of a defensive rebound because he found that there are diminishing returns: if you put two great defensive rebounders on the floor, they "take away" rebounds from each other. This is easy and intuitive to understand once you realize that some defensive rebounds, like those after a missed free throw, are relatively uncontested. But no such effect exists for offensive rebounds at all. The two are really different skills, and one of them is far more "irreplaceable" than the other (which, incidentally, is why it isn't a coincidence that one has twice the impact on Wins Produced as the other).
Next, there's the issue of this regression. The upshot of these articles so far is that all of these alluded-to-but-undisclosed models and regressions feel suspect. I find myself very skeptical that these guys are accounting for all of the big caveats that affect how closely their models track reality. But by not disclosing those models, they are implicitly asking us to take it for granted that they are. I've seen enough screwups in real scientific studies not to afford them that.
It's immensely important that someone disclose what is in a regression. You really can't tell whether a regression is "reasonable" or not (notice, I'm not using the word "right") if you don't even know what is in the model. Without that disclosure, Morris has little hope of persuading anyone that he knows what he's talking about. It's not clear what regression he is actually running. Is he running a series of separate regressions like this:
Wins (or point margin) = f(PTS)
Wins (or point margin) = f(STL)
Or is it something like this:
Wins (or point margin) = f(PTS, STL, other stuff)
The first approach is simply wrong, because each one-variable regression credits a stat with the effects of everything correlated with it; the second approach only works if you specify the other stuff correctly. Since he doesn't disclose the methodology, we're left to guess. For instance, it sure looks like Morris is conflating points and shot attempts into a single variable. What he's seeing is that 9 points corresponds, on average, to a +1 in Win Score terms. This seems like a reasonable, albeit meaningless, estimate.
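To see why the choice between those two specifications matters, here is a minimal sketch on made-up numbers. The seven player lines are hypothetical, not real data, and the sketch assumes, purely for illustration, that wins are generated exactly by the point and shot-attempt weights quoted in the next paragraph (+0.0032 per point, -0.0032 per attempt). Regressing wins on points alone then badly understates what a point is worth, because points and shot attempts move together; adding shot attempts to the model recovers both weights.

```python
# Hypothetical per-48-minute lines for seven players (illustrative only, not real data).
FGA = [10, 14, 16, 18, 20, 22, 24]
PTS = [8, 15, 14, 20, 18, 30, 22]

# Assumed "true" model for this toy example, with no noise:
# wins = 0.0032 * PTS - 0.0032 * FGA
WINS = [0.0032 * p - 0.0032 * f for p, f in zip(PTS, FGA)]


def mean(xs):
    return sum(xs) / len(xs)


def simple_slope(x, y):
    """OLS slope of y on a single regressor x (intercept included via centering)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_var_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), solving the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


naive = simple_slope(PTS, WINS)                # wins = f(PTS)
b_pts, b_fga = two_var_slopes(PTS, FGA, WINS)  # wins = f(PTS, FGA)
print(f"points only:    {naive:.5f} wins per point")
print(f"points + shots: {b_pts:.5f} per point, {b_fga:.5f} per attempt")
```

With this toy data the points-only slope comes out around 0.0013, well under half the assumed 0.0032, while the two-variable fit recovers +0.0032 and -0.0032. Leave shot attempts out and "points" look cheap; that is the kind of distortion an undisclosed specification can hide.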
It's meaningless because, as we say over and over and over, shooting efficiency matters. A point is worth 0.0032 wins, but a shot attempt is worth negative 0.0032 wins, which means a player who scores exactly one point per shot attempt breaks even. If you boil all of this down to "points" without accounting for that, you grossly misunderstand the sense in which "points" are replaceable. Let me put it this way: scoring is easily replaceable, but efficient scoring is incredibly hard to replace. It's easy to find guys who score in double digits, but very hard to find guys who can reliably score 34 points on 22 shots per 48 minutes. So...are LeBron's points per game easily "replaceable"?
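As a back-of-the-envelope sketch of that arithmetic, the function below uses only the two weights quoted above (+0.0032 wins per point, -0.0032 per shot attempt). It deliberately ignores free throws, rebounds, turnovers and everything else, so treat it as a toy rather than a full Wins Produced calculation; the name `scoring_wins` and the 22-on-26 line are hypothetical.

```python
POINT_VALUE = 0.0032     # wins per point (figure quoted in the text)
ATTEMPT_VALUE = -0.0032  # wins per shot attempt (figure quoted in the text)


def scoring_wins(points, shot_attempts):
    """Net wins from scoring alone, using only the two weights above (toy model)."""
    return POINT_VALUE * points + ATTEMPT_VALUE * shot_attempts


efficient = scoring_wins(34, 22)   # the 34-on-22 line from the text
break_even = scoring_wins(20, 20)  # one point per attempt
volume = scoring_wins(22, 26)      # hypothetical volume scorer: more shots than points

for label, value in [("34 on 22", efficient), ("20 on 20", break_even), ("22 on 26", volume)]:
    print(f"{label}: {value:+.4f} wins per 48 minutes")
```

The 34-on-22 line nets roughly +0.038 wins per 48 minutes, the 20-on-20 line nets zero, and the 22-on-26 volume scorer comes out negative even though he "scores" more than the break-even player.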
And where are offensive efficiency and defensive efficiency, or, hell, point margin, in all of this? I don't think anyone really suffers under the illusion that points per game, in a vacuum, correlates with winning (ask the 90/91 Nuggets, right?). It's points per possession that matters.
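Here is a small sketch of why the per-possession view is the one to use. It relies on the common box-score approximation possessions ≈ FGA - ORB + TOV + 0.44 × FTA (an estimate, not an exact count), and the two team-game lines are invented for illustration: a fast-paced team that outscores a slow-paced one in raw points but gives up even more per possession.

```python
def possessions(fga, orb, tov, fta):
    """Common box-score estimate of possessions (an approximation, not an exact count)."""
    return fga - orb + tov + 0.44 * fta


def per_100(points, poss):
    """Points per 100 possessions."""
    return 100 * points / poss


# Hypothetical single-game lines: (points, FGA, ORB, TOV, FTA)
fast_team = (120, 100, 12, 15, 25)
fast_opp = (125, 98, 10, 14, 30)
slow_team = (98, 80, 10, 12, 20)
slow_opp = (94, 82, 9, 13, 15)


def net_rating(team, opp):
    """Offensive minus defensive efficiency, in points per 100 possessions."""
    off = per_100(team[0], possessions(*team[1:]))
    dfn = per_100(opp[0], possessions(*opp[1:]))
    return off - dfn


print("fast team net rating:", round(net_rating(fast_team, fast_opp), 1))
print("slow team net rating:", round(net_rating(slow_team, slow_opp), 1))
```

The fast team scores more raw points (120 vs. 98) yet finishes about 3 points per 100 possessions behind its opponent, while the slow team finishes about 6 ahead. Raw points per game mostly measures pace.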
There's a lot of subtlety in the box score that is missed even by lifelong fans (team rebounds, anybody?). Silver's mission statement for 538 says he believes he can take his doctor's bag of stats tools to any domain and tease out actionable causal relationships. But so far, I don't think the writing on the blog has supported that claim. There's a shoddiness to the analysis and a lack of familiarity with the subject matter. At this point, it seems like they're missing things that are fairly obvious to experts in many fields, and they have not shown a mastery of the domains they cover.
Again, to Krugman's point, just because Silver encountered a bunch of idiot pundits in one area (and showed them up in spectacular fashion, for which I will love him forever) does not mean that there are no experts anywhere, or that you never have to show your work.