Nba nerd

Why a Steal Isn't Really Worth Nine Points

I'm a bit late to comment on this article, but several days ago, Benjamin Morris wrote an article on FiveThirtyEight about the hidden value of steals:

To illustrate this, I created a regression using each player’s box score stats (points, rebounds, assists, blocks, steals and turnovers) to predict how much teams would suffer when someone couldn’t play. The results:

[Figure: Morris-Predictive-Ability-1]

Yes, this pretty much means a steal is “worth” as much as nine points. To put it more precisely: A marginal steal is weighted nine times more heavily when predicting a player’s impact than a marginal point.

Uh...wow? First, an aside. I'm also late to this bandwagon, but so far my opinion of FiveThirtyEight is very similar to that of many blogging economists like Noah Smith, Paul Krugman, and Tyler Cowen: most of the articles represent incredibly sloppy writing, sloppy analysis, or both. I think Krugman expresses it best:

Unfortunately, Silver seems to have taken the wrong lesson from his election-forecasting success. In that case, he pitted his statistical approach against campaign-narrative pundits, who turned out to know approximately nothing. What he seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.

But not all fields are like that — in fact, even political analysis isn’t like that, if you talk to political scientists instead of political reporters. So, for example, before glancing at some correlation and asserting causation, you really should talk to the researchers.

So let's return to this article. There are many problems with it, but let's start with the premise. I think what Morris is trying to say is that, in general, points and assists are easier to replace than steals. So additional points and assists are worth much less to wins and point margin than are steals. He just does an extremely poor job of getting that point across. And if you work out the marginal values, they do in fact line up decently with things like wins produced.

And sure, the impact of an increase in a player's steals per game has greater correlation to wins than points per game (which has a fantastically crappy correlation with winning).

But...

Morris completely misses the boat on offensive rebounds – which are clearly a good candidate for one of the "kinds of things" that is relatively unmarred by replaceability – by lumping them together with defensive rebounds. He has a natural test for his thesis, but fails to use it due to lack of familiarity with the sport. A few years ago, David Berri adjusted the weighted value of a defensive rebound because he found that there are diminishing returns: if you put two great defensive rebounders on the floor, they "take away" rebounds from each other. This is easy and intuitive to understand if you realize that some defensive rebounds, like those after a free throw, are relatively uncontested. But no such effect exists for offensive rebounds at all. The two are really different skills, and one of them is far more "irreplaceable" than the other (and, incidentally, that's why it isn't a coincidence that one has twice the impact on wins produced as the other).

Next, there's the issue of this regression. The upshot of these articles so far is that all of these alluded-to-but-undisclosed models and regressions feel suspect. I find myself very skeptical that these guys are accounting for all of the big caveats that affect how closely their models track reality. But by not disclosing those models, they are implicitly asking us to take it for granted that they are. I've seen enough screwups in real scientific studies not to afford them that.

It's immensely important that someone disclose what is in a regression. You really can't tell if a regression is "reasonable" or not (notice, I'm not using the word "right") if you don't even know what is in the model. By not doing this, Morris should have no hope of persuading anyone that he has a clue about what he's talking about. It's not clear what regression he is actually running. Is he running a series of separate regressions like this:

Wins (or point margin) = f(PTS)

Wins (or point margin) = f(STL)

Or is it something like this:

Wins (or point margin) = f(PTS, STL, other stuff)

The first approach is really wrong, and the second approach will not work if you did not specify the other stuff correctly. Since he doesn't disclose the methodology, we're left to guess. For instance, it sure looks like Morris is conflating points and shot attempts into a single variable. What he's seeing is that 9 points corresponds to, on average, a +1 in Win Score terms. This seems like a reasonable, albeit meaningless, estimate.
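
To make the difference between the two specifications concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the "true" effects (+1 per point, -1 per shot attempt), the distributions, and the helper names `simple_slope` and `two_var_slopes` are invented here, and none of it is Morris's model or data.

```python
import random

def simple_slope(x, y):
    """OLS slope of y on one regressor x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_var_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (intercept included), via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

rng = random.Random(0)          # fixed seed so the sketch is reproducible
TRUE_PTS, TRUE_FGA = 1.0, -1.0  # made-up "true" effects on the outcome

shots, points, outcome = [], [], []
for _ in range(2000):
    fga = rng.uniform(5, 25)            # hypothetical shot attempts
    pts = fga * rng.gauss(1.0, 0.1)     # roughly one point per shot
    shots.append(fga)
    points.append(pts)
    outcome.append(TRUE_PTS * pts + TRUE_FGA * fga + rng.gauss(0, 1))

# Approach 1: outcome = f(PTS) alone, with shot attempts left out.
naive_pts = simple_slope(points, outcome)
# Approach 2: outcome = f(PTS, FGA) with the missing control included.
joint_pts, joint_fga = two_var_slopes(points, shots, outcome)

print(f"points coefficient, points-only regression: {naive_pts:.2f}")
print(f"points coefficient, joint regression:       {joint_pts:.2f}")
print(f"shots coefficient, joint regression:        {joint_fga:.2f}")
```

In this toy setup the points-only regression reports a coefficient close to zero even though each point is "worth" 1 by construction, because points and shot attempts move together and the omitted shots drag the estimate down. That is the sense in which a points coefficient from an undisclosed specification is hard to interpret.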

It's meaningless because, as we say over and over and over, shooting efficiency matters. A point is worth 0.0032 wins, but a shot attempt is worth negative 0.0032 wins. Meaning that if you only score one point per shot attempt, you break even. If you boil this all down to "points" without accounting for this, you are grossly misunderstanding in what sense "points" are replaceable. Let me put it this way: scoring is easily replaceable, but efficient scoring is incredibly hard to replace. It's easy to find guys who score double digits, but very hard to find guys who can reliably score 34 points on 22 shots per 48 minutes. So...are LeBron's points per game easily "replaceable"?
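
The break-even arithmetic can be written out in a few lines. This is only a sketch using the two per-unit values quoted above; the function name `net_scoring_wins` is made up here, and how free throws count toward "shot attempts" is not specified in the text, so the sketch ignores that detail.

```python
# Per-unit values as quoted in the paragraph above.
WINS_PER_POINT = 0.0032
WINS_PER_SHOT_ATTEMPT = -0.0032

def net_scoring_wins(points, shot_attempts):
    """Net win value of a scoring line: credit for points, debit for attempts."""
    return WINS_PER_POINT * points + WINS_PER_SHOT_ATTEMPT * shot_attempts

break_even = net_scoring_wins(20, 20)  # one point per shot attempt
efficient = net_scoring_wins(34, 22)   # the 34-points-on-22-shots line
volume = net_scoring_wins(34, 34)      # same points, one point per shot

print(f"20 points on 20 shots: {break_even:.4f}")
print(f"34 points on 22 shots: {efficient:.4f}")
print(f"34 points on 34 shots: {volume:.4f}")
```

Under these values the two 34-point lines are not remotely the same thing: the efficient one nets 12 points' worth of credit (about 0.038 wins), while the volume one nets nothing.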

And where is offensive efficiency and defensive efficiency, or, hell, point margin, in all of this? I don't think anyone is really suffering under the illusion that points-per-game, in a vacuum, correlates with winning (ask the 90/91 Nuggets, right?). It's points-per-possession that matters.

There's a lot of subtlety in the boxscore that is missed by even lifelong fans (team rebounds anybody?). Silver's mission statement for 538 says that he thinks that he can take his doctor's bag of stats tools to any domain and tease out actionable causal relationships. But so far, I think that the writing on the blog has not supported this claim. There's a shoddiness to the analysis and a lack of familiarity with subject matters. At this point, it seems like they're missing things that are fairly obvious to experts in many fields and have not shown a mastery of the domains that they cover.

Again, to Krugman's point, just because Silver encountered a bunch of idiot pundits in one area (and showed them up in spectacular fashion, for which I will love him forever) does not mean that there are no experts anywhere, or that you never have to show your work.

Good read, was hoping someone would comment on that article.

Edit:
"And sure, the impact of an increase in a player's STEALS per game has greater correlation to wins than points per game (which has a fantastically crappy correlation with winning)."
Part of the problem with 538 is that it's trying to publish at the same volume a non-analysis site would. It takes a lot longer to do an analysis (to say nothing of a good one) and then write about it, than it does to just write an article. The editorial schedule doesn't fit well with the content.
I also had a hard time figuring out the linked post by Benjamin Morris. In particular, I don't understand how he measures the "difference in SRS (simple rating system, or average margin of victory/defeat adjusted for strength of schedule) with or without" each particular player. Is this like plus/minus? Or is he just measuring the "without" when a player happens to be injured?

That said, Silver's response to Krugman was pure gold:
http://fivethirtyeight.com/datalab/for-columnist-a-change-of-tone/
To Al_S:

I found Silver's response extremely underwhelming based on the (in my opinion excellent) opposition to some of his journalists. I think Silver has given a lot of support to people who have more in common with the political news hacks (Morning Joe) than with what he was doing in political journalism. I think this site came out trying to make a splash and has failed spectacularly. It's strange: Silver's analysis is still pretty good, but he seems to have zero ability to recognize his own talents in others outside of the political field (I thought his columnists for 538 while at the Times were pretty good). It's too bad. If he clears ship and actually focuses on his site's "mission statement" then things might get better, but it is a pretty bad site right now except for Silver's own work.
Great article. Steals are good. Possessions are good, esp. when opp. is denied a good shot opportunity: a steal w/ 1 sec remaining on the shot clock is not worth too much. But teams often win games with 0 steals or almost no steals. Krugman is a tool tho
When someone makes a grandiose claim like this, I've got to wonder about statistical significance in addition to interpretive value. And if players' points scored per game have a low correlation to SRS strength, then even a small confidence interval on the value could correspond to a gigantic one if you use it as the denominator in a ratio. (Using ratios seems doubly strange when we consider that SRS is already in MoV points.)
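
A rough sketch of the ratio problem, using crude interval arithmetic rather than a proper statistical method (it is not a delta-method or Fieller interval). The coefficient intervals below are invented purely for illustration; Morris's article does not report any, and `ratio_bounds` is a hypothetical helper.

```python
def ratio_bounds(numerator, denominator):
    """Range of n/d when n and d each lie anywhere in their (lo, hi) interval."""
    n_lo, n_hi = numerator
    d_lo, d_hi = denominator
    if d_lo <= 0 <= d_hi:
        # If the denominator's interval touches zero, the ratio is unbounded.
        raise ValueError("denominator interval contains zero; ratio is unbounded")
    candidates = [n / d for n in (n_lo, n_hi) for d in (d_lo, d_hi)]
    return min(candidates), max(candidates)

# Hypothetical intervals: a steals coefficient of 9 +/- 1 and a
# points coefficient of 1 +/- 0.8 (small in absolute terms, but close to zero).
steals_interval = (8.0, 10.0)
points_interval = (0.2, 1.8)

low, high = ratio_bounds(steals_interval, points_interval)
print(f"'a steal is worth' between {low:.1f} and {high:.1f} points")
```

With these made-up inputs the headline ratio of 9 sits inside a range running from roughly 4 to 50, and it blows up entirely if the points interval reaches zero.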
When you're missing really important controls (like, oh, shots taken) then there's every reason to believe that your point estimates are biased.

If all you're trying to do is make predictions about what is going to happen, that is ok! All you care about for prediction is model fit and whether you are interpolating or extrapolating that model.

Directly interpreting the coefficients of a biased model, well...suffice to say that biases in your model translate to false or misleading interpretations. Forget statistical significance; this doesn't even pass the smell test.
76ers lead the league in steals and give up the second-most steals. Miami ranks high in both as well. Seems highly dependent on pace and coaching philosophy. Seems hard to separate the value of gambling for a steal from the effect it has on defense. A steal almost always leads to a high-percentage shot. Still hard to see who is "good" at stealing and who just likes to go for steals. We have stats for failed shot attempts and so we can easily see who is good at scoring and who just likes to shoot. I'm inclined to call them "yay steals" for now.
My comment on that article was that points per game is not a stat. You miss so much by multiplying together FGA and TSP instead of treating them as separate. TSP is repeatable and valuable, and this analysis missed that. I appreciate the disparaging of ppg, but it's done from an unsound foundation: ignoring players' shooting efficiency.
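
As a small illustration of why volume and efficiency should be kept apart, the sketch below uses the commonly cited true shooting formula, TS% = PTS / (2 × (FGA + 0.44 × FTA)), where 0.44 is the conventional free-throw weight. The two stat lines are hypothetical, not real players, and `true_shooting` is a name chosen here.

```python
def true_shooting(points, fga, fta):
    """True shooting percentage: points per two 'true' shot attempts."""
    true_attempts = fga + 0.44 * fta
    return points / (2 * true_attempts)

# Two hypothetical per-game lines with identical points per game.
player_a = {"points": 20, "fga": 14, "fta": 4}
player_b = {"points": 20, "fga": 20, "fta": 4}

ts_a = true_shooting(**player_a)
ts_b = true_shooting(**player_b)

print(f"Player A: 20 ppg on TS% of {ts_a:.3f}")
print(f"Player B: 20 ppg on TS% of {ts_b:.3f}")
```

Both lines show up as "20 points" in a points-only regression, even though one of them needs six more field goal attempts to get there.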

In a larger sense though, this kind of article is what seems to me to be characterizing the new site, which has really disappointed me. Nick is spot on when he says that they're trying to do the same volume of work as other sites and therefore being low on quality quantitative analysis. They are using numbers, but not doing enough model building. So thanks to you guys for building a great model, constantly tweaking it, building off of it, and doing rigorous analysis every time you go out on a limb. It's not about quantity in the field of statistical research, except when it comes to sample size, and FiveThirtyEight seems to have forgotten both of those things.
When I read the original article, I found it somewhat odd that Morris used raw point totals without at least considering field goal attempts (fgas, ftas, and tos would be better). Also, his writing made it seem like he ran separate correlations for each stat, which would be very wrong because the amount of error involved would be very large. I can only hope that he used better methodology but did not bother explaining it because it would bore readers. Even entertainment writing regarding quantitative analysis needs some rigor. There is a danger that this sort of sloppiness will turn people away from numbers.
Sorry to double post, but I also wanted to point out that assists may not be as easily replaceable if assists/turnover ratios are considered.
Steals (and blocks and presses/traps) are high variance strategies. Philly may be leading the league in steals but they're also leading the league in opponents FTA's and near the top in opponents offensive rebounds. Those numbers indicate failed steal attempts more often than not. The Heat successfully make the trade-off because their pps and afg% are through the roof vis a vis transition baskets.
Good analysis. Stats are hard without an underlying model to guide you.
On that subject, I feel like your Lebron James quote could be truncated and still remain true, regardless of efficiency:

"but [it's] very hard to find guys who can reliably score 34 points"

Why is that, do you think? How does the underlying model you use to guide you explain it?

And while we're on the subject of properly building causation into our statistical analysis: I don't feel like your "Professor Berri cleverly realised the difference between offensive and defensive rebounds and built it into his model so as not to overvalue uncontested rebounds" is a particularly fair summary of the history there. Perhaps "eventually listened to the critics he had previously dismissed and..." would be a better opening clause.
@Lacemaker, you can't simply truncate that Lebron quote. Ultimately there are dozens of guys in the NBA that could score 34+ points several times a season, and at least a dozen who could score 34+ppg for an ENTIRE season, if you force fed them the ball and played at a fast pace. All you need is a willingness to take bad shots and a coach to let you. Monta Ellis had a similar approach for YEARS in Golden State. Off the top of my head: Rudy Gay, DeMarcus Cousins, Jamal Crawford, LaMarcus Aldridge, Melo, Dirk, Kyrie Irving, OJ Mayo, Damian Lillard, Blake Griffin, Steph Curry, Klay Thompson, Derrick Rose, all to go with the usual suspects of Lebron, Wade, Durant, Westbrook, Harden, and Love.
I agree. And yet none of them do. Which suggests that coaching determines shot allocation and that marginal field goal percentage doesn't equal average field goal percentage. Which is bad news for any model which assumes the contrary.
Yes yes, but I haven't seen your formula for wins produced either. I see that it differs from the one used by www.basketball-reference.com/. Maybe there was an article on that some time, but I haven't seen it. I would be very curious to see your formula, and an explanation of why in your opinion it is better than other formulas for the same thing.
