An Argument for Transparent Player Analytics

We have a great column from an amazing special guest. Ari Caroline (@aricaroline) is the Chief Analytics Officer at Memorial Sloan Kettering Cancer Center. Ari is no stranger to applying analytics and stats to difficult problems. He's been looking at basketball statistics and models for years, and he makes the great point that, when we analyze individual players, it's important to be able to see how the analytics work.

Analysis Paralysis

All types of complex analytic methodologies are currently in vogue in the NBA. SportVU systems are ubiquitous in NBA arenas and capture minute play-by-play data, from the number of isolation plays to the ratio of contested to uncontested shots. NBA coaches are fed detailed plus/minus stats for all possible lineup combinations. Heck, there are even balls that measure the arc of every shot.

Now, I’m not a Luddite. In fact, my whole life has been devoted to technology and analytics. At Memorial Sloan Kettering Cancer Center, where I work, we’ve devoted more resources to analytics than just about any academic medical center in the world. We too have embraced many forms of complex analytics, including machine learning, natural language processing, discrete-event simulation, and complex statistical clustering models for combing through terabytes of genomic sequencing data. I should emphasize that all of this is separate from our work with IBM to train their Watson system in oncology.

However, we’ve also learned how to estimate a “return on investment” for our analytic efforts. If a complex model is overkill for a simple question, we handle it with a basic regression model instead. To put this another way, in many cases the marginal return from adding complexity to an analysis diminishes. At some point, in fact, the marginal return of greater complexity actually becomes negative. The current generation of NBA analysts, in my opinion, has not yet arrived at this conclusion. Furthermore, I fear that analytics overkill, to some degree, risks undermining what is an otherwise positive trend.

The Beauty of Transparency

When looking at how basketball is played, I would argue that a compelling case can be made for what I call "Transparent Player Analytics". The transparent part is relatively straightforward. Black-box analytic methodologies, or even those that are just so complex that they defy logical explanation, may help in building an aura of wizardry around their creators. However, they fail on just about every other level. Even when a model proves to have strong predictive power, it's nearly impossible to act on its output without understanding what underlies that predictive power.

For example, if rosters remain relatively static, a model could show great predictive power in terms of wins and still mistakenly attribute the success of a team to players who are actually negative contributors. Similarly, statistics like minutes played and usage % could boost the superficial performance of a model. This too, however, is not actionable information for the GM (unless he is planning on a trip to Vegas). Certainly, playing time is linked to a player's overall contribution. Ball hogging also works to pad stats. But any model that includes these two stats as positive contributors implies that you increase a player's value just by playing him more or allowing him to take bad shots with impunity. By that measure, Mo Cheeks did exactly the right thing by giving Josh Smith so much playing time.
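A toy sketch makes the playing-time problem concrete. Everything here is hypothetical: the function names, the `minutes_weight` of 0.5, and the player's per-minute production are made up for illustration and are not taken from any real rating system. The point is only that once minutes carry a positive weight, a player who hurts his team on a per-minute basis still rates higher simply by playing more.

```python
def rating_with_minutes(per_minute_production, minutes, minutes_weight=0.5):
    """Toy rating that (wrongly) rewards raw playing time.

    per_minute_production: hypothetical net contribution per minute played.
    minutes_weight: hypothetical positive weight on minutes played.
    """
    production = per_minute_production * minutes
    playing_time_bonus = minutes_weight * minutes
    return production + playing_time_bonus


def transparent_rating(per_minute_production, minutes):
    """Toy rating that counts only what the player produced on the floor."""
    return per_minute_production * minutes


# A hypothetical player who costs his team 0.25 units per minute on the floor.
NEGATIVE_PLAYER = -0.25

for minutes in (20, 40):
    opaque = rating_with_minutes(NEGATIVE_PLAYER, minutes)
    clear = transparent_rating(NEGATIVE_PLAYER, minutes)
    print(f"{minutes} min: with minutes bonus = {opaque}, production only = {clear}")
```

In this sketch the rating with the minutes bonus rises as the player's minutes double, while the production-only rating falls. The first number would tell a GM to play him more; the second says the opposite.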

To be actionable for an NBA GM, an analytics model must explain much more than simply which player ranks over another. To be useful, the model must transparently explain WHY this is the case. Even more important, the logical basis of the analysis, when properly explained, should jibe with the subjective knowledge of the experts: in this case, the coaches and GMs.

Person vs. Thing

The argument for focusing on player analytics in basketball is more subtle, but equally compelling. In other sports, particularly football, complex interactions between all the players drive the success or failure of every play. This is not the case in basketball. It is true that a really good point guard improves the performance of the players around him. However, we can isolate most of this contribution using assists as a proxy variable and subtracting that contribution from the shooters. If you attempt to delve any further and analyze factors like lineup combinations and play design, the result is sharply diminishing marginal returns on your analytic efforts. The complexity of the models increases exponentially the further you deviate from straight player analytics. As we discussed before, that complexity can make the output unintelligible and unactionable. More importantly, it also offers little in return. That is because, in basketball, you can tell 90% of the story just by using each player's box score statistics. It’s your basic 80/20 (or in this case 90/10) Pareto Principle. Sure, you can probably milk another 2-3% of the story using SportVU, adjusted +/-, and the other complex statistical methodologies that are currently in vogue. However, you do so at the risk of distracting from the main story: Good players generally play consistently well (and intelligently) from year to year, and the value of the team is pretty much just the sum of the value of the players on the floor.
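As a rough illustration of the two ideas above (shifting assist credit from shooter to passer, and treating team value as the sum of player values), here is a minimal sketch. The weights, the `ASSIST_SHARE` of 0.25, the `BoxLine` fields, and the two sample stat lines are all hypothetical placeholders chosen for illustration; they are not the author's model or any published metric.

```python
from dataclasses import dataclass

# Hypothetical per-stat weights; placeholders for illustration only.
POINT_W = 1.0
REBOUND_W = 0.5
STEAL_W = 1.0
TURNOVER_W = -1.0
MISS_W = -0.5
# Assumed fraction of an assisted basket's points credited to the passer.
ASSIST_SHARE = 0.25


@dataclass
class BoxLine:
    name: str
    points: int
    assisted_points: int  # points scored on baskets a teammate assisted
    points_created: int   # teammates' points scored off this player's assists
    rebounds: int
    steals: int
    turnovers: int
    missed_shots: int


def player_value(p):
    """Return every component of the rating so the 'why' stays visible."""
    return {
        "scoring": POINT_W * p.points,
        "credit moved to passers": -ASSIST_SHARE * p.assisted_points,
        "credit for assists": ASSIST_SHARE * p.points_created,
        "rebounds": REBOUND_W * p.rebounds,
        "steals": STEAL_W * p.steals,
        "turnovers": TURNOVER_W * p.turnovers,
        "missed shots": MISS_W * p.missed_shots,
    }


def total(p):
    """Sum a player's components into a single number."""
    return sum(player_value(p).values())


def team_value(lineup):
    """Team value as the plain sum of the players' values."""
    return sum(total(p) for p in lineup)


# Made-up stat lines: the guard's assists created 8 of the shooter's points.
guard = BoxLine("guard", 10, 0, 8, 3, 2, 2, 4)
shooter = BoxLine("shooter", 20, 8, 0, 5, 1, 1, 6)

for player in (guard, shooter):
    print(player.name, player_value(player), "total:", total(player))
print("team:", team_value([guard, shooter]))
```

Because each rating is a dictionary of named components, a coach or GM can see exactly why one player outranks another. The assist adjustment only moves credit between teammates, so it changes individual ratings without changing the team total.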

Even injury analysis, which certainly has its place, offers little marginal return on complexity if you are already doing good player analysis. Player analysis, when done correctly, actually empowers behaviors that reduce the number of injuries. To understand why this is so, it's helpful to borrow a tool from the medical world and do a quick Cochrane-style review of the analyses that have already been done on injuries in the NBA. Analyses done to date yield a short list of predominant risk factors associated with injuries. Near the top of the list are playing time and usage. This should make sense. The more you run around and bang into people, the more likely you are to get injured. However, if you are doing your player analytics correctly, it's very easy to develop a deep bench of positive contributors. The Spurs don't burn out their star players because they have 12 players who can contribute meaningful minutes. With a deep bench, there is no need to play your starters for 3.5 quarters every game, and there is certainly no need to have your "superstars" force up 90% of the team's shots through traffic.

The Last Word from the Editors

Ari kindly said I could wrap up the piece. As we've harped on for years, the complexity of a stat should not be its selling point. If a stat tells you something, but you can't act on it, it's no good. Even Stan Van Gundy agrees! I am fully behind the "Transparent Player Analytics" movement, because honestly it's the only way forward. Coaches and front offices who are skeptical of numbers are not going to get less skeptical as those numbers become more complex and opaque. Fans will not gain a deeper understanding of the game if they can't see why the "advanced stats" like or dislike a player. If we add more transparent/testable models, though? Well, I'd like to see how that goes.
