Teachable Moments #1: Sample Size and Confounding Variables with Jusuf Nurkic

The Nuggets jettisoned Jusuf Nurkic and acquired Mason Plumlee. On paper, we like this move. Nurkic has yet to be an above-average big, whereas we’ve thought Plumlee rightfully deserved a spot on the U.S. National Squad. Kevin Pelton also provided some analysis, which had a different take. And if you want to hear in depth why I disagree, tune into this week’s Boxscore Geeks Podcast. However, in reviewing his analysis I found two common analytics issues, and I figured I’d take this as a teachable moment to discuss them. Today let's talk about sample size and confounding variables!

You can view Kevin Pelton’s Insider Article here. It is behind a paywall.

Sample Size

Pelton notes that it makes sense for the Nuggets to trade Nurkic, which we agree with. Then his argument goes off the rails when he says this:

The shortcomings of the Jokic-Nurkic duo are inarguable. According to NBA.com/Stats, Denver was outscored by an incredible 15.6 points per 100 possessions with Jokic and Nurkic on the court together, ghastly no matter the sample size.

His argument centers around Nurkic being a bad complementary piece to Jokic, and he uses on/off statistics (how well the Nuggets do with both Nurkic and Jokic on the court versus how well they do with both of them off the court). We’re not fans of on/off statistics, a topic for another day. But let’s discuss sample size. If we are recording a statistic (in this case, +/- statistics for Nurkic/Jokic on/off combinations), our sample size is how many times we’ve tested or observed the behavior behind that statistic. And obviously, this is important. If we flip a coin one time, our sample is one, and we can’t derive much. If we flip it 100 times, we get a much better gauge on how biased the coin is. As Pelton brings up sample size here, let’s examine that.
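To put rough numbers on the coin example, here is a minimal sketch. It assumes a fair coin and uses the standard normal-approximation margin of error for a proportion, which is crude for very small samples; the function name margin_of_error is just for illustration.

```python
import math

def margin_of_error(n_flips, p=0.5, z=1.96):
    """Approximate 95% margin of error for an observed heads rate.

    Assumes independent flips and the normal approximation, which is
    only a rough guide when n_flips is very small.
    """
    return z * math.sqrt(p * (1 - p) / n_flips)

# Compare how tightly different sample sizes pin down the heads rate.
for n in (1, 100, 10_000):
    print(f"{n:>6} flips: heads rate known to within about +/- {margin_of_error(n):.3f}")
```

With one flip the margin is nearly the whole range, so the flip tells us almost nothing. At 100 flips it shrinks to roughly ten percentage points, and it keeps shrinking with the square root of the sample size.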

The Nuggets played 2,607 minutes before trading Nurkic. Using the site NBAWowy, we can see that Nurkic and Jokic have played together … 108 minutes. Now, the art of picking the right sample size can vary based on the test you are doing, etc. That said, a simple comparison here is if we pretended the Nuggets’ season were a single game, Jokic and Nurkic’s time together would account for a minute and fifty-nine seconds! The fact that Pelton hand-waves away sample size here is on its own egregious (sample size always matters, and the line “no matter the sample size” belongs nowhere in proper analysis), but in this case it’s even worse, as the sample size is ridiculously small. As an example, Steph Curry and Kevin Durant can both be on the court and each miss a shot, and the opponent can score three times. Their +/- for this period will look dreadful, but would it be the right move to bench them for five bad possessions?
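The "single game" comparison is simple arithmetic, sketched below. The 2,607 and 108 minute totals come from this post; the 48-minute regulation game length is the only added assumption, and the function name is made up for the example.

```python
TEAM_MINUTES = 2607  # Nuggets minutes before the trade (from the post)
PAIR_MINUTES = 108   # minutes with Nurkic and Jokic together (from the post)
GAME_MINUTES = 48    # assumption: one regulation NBA game

def scaled_to_one_game(pair_minutes, team_minutes, game_minutes=GAME_MINUTES):
    """Return the pair's share of team minutes, and that share of a single
    game expressed as whole minutes plus leftover seconds."""
    share = pair_minutes / team_minutes
    total_seconds = share * game_minutes * 60
    minutes, seconds = divmod(total_seconds, 60)
    return share, int(minutes), seconds

share, minutes, seconds = scaled_to_one_game(PAIR_MINUTES, TEAM_MINUTES)
print(f"Share of team minutes: {share:.1%}")
print(f"Equivalent in one game: {minutes} min {seconds:.0f} sec")
```

The pairing covers only about 4% of the team's minutes, which is where the minute and fifty-nine seconds figure comes from.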

The lesson I want to take away here is to never get lulled by the notion that sample size doesn’t matter and, more importantly, to always gauge sample size when making your assessments. Nikola Jokic looks amazing right now, for example. But it has only been 14 games since he was “promoted” to a starting role. While I can view his stats and say they’re impressive, the sample size should temper my confidence. Pelton confidently using the on/off stats to say Nurkic and Jokic don’t work, to the point of ignoring sample size? That’s a flaw I see all too often in sports.

Confounding Variables

On/off and +/- have a bevy of issues with regard to variables and causality, but Pelton’s example brings up an even better one - confounding variables. In the example above, Pelton is trying to use the variable Nurkic + Jokic to explain the outcome of a bad team performance. I think there are some confounding variables. A confounding variable, according to the Wikipedia entry, is:

In statistics, a confounding variable (also confounding factor, a confound, a lurking variable or a confounder) is a variable in a statistical model that correlates (directly or inversely) with both the dependent variable and an independent variable, in a way that "explains away" some or all of the correlation between these two variables.

Or put another way, you have a confounding variable when you can’t be sure whether the variable you think explains the outcome really explains it, or whether another variable that moves along with it does. Let’s get back to NBAWowy and Nurkic/Jokic. Here’s a rundown of the minutes played by the Nuggets while Jokic and Nurkic were on the court together.

Player	Minutes Played	Possessions
Nikola Jokic	108	224
Jusuf Nurkic	108	224
Danilo Gallinari	97	202
Emmanuel Mudiay	97	201
Will Barton	49	103
Gary Harris	33	65
Jamal Murray	25	54
Jameer Nelson	16	31
Juancho Hernangómez	2	6
Wilson Chandler	3	4

Let’s break it down. In the 108 minutes Nurkic and Jokic played together, Mudiay and Gallinari were each on the court for 97. That means it’s really hard to tell whether the issue is the Nurkic/Jokic pairing or something related to Mudiay and/or Gallinari. And in fact, if you take Nurkic/Jokic on the court with Mudiay and Gallinari off the court, the Nuggets played well … for 8 minutes! Please see above about sample size for why this line of thinking is spurious.
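To see how tangled the pairing is with its teammates, the table can be turned into shares of the 108 shared minutes. This is only a sketch using the minutes listed above; the variable and function names are made up for the example.

```python
PAIR_MINUTES = 108  # minutes Jokic and Nurkic shared the court (from the table)

# Minutes each teammate played alongside the pairing, copied from the table.
teammate_minutes = {
    "Danilo Gallinari": 97,
    "Emmanuel Mudiay": 97,
    "Will Barton": 49,
    "Gary Harris": 33,
    "Jamal Murray": 25,
    "Jameer Nelson": 16,
    "Juancho Hernangómez": 2,
    "Wilson Chandler": 3,
}

def share_of_pair_minutes(minutes, pair_minutes=PAIR_MINUTES):
    """Fraction of the pairing's minutes in which a teammate was also on court."""
    return minutes / pair_minutes

shares = {name: share_of_pair_minutes(m) for name, m in teammate_minutes.items()}

# Print teammates from most to least overlap with the pairing.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {share:.0%}")
```

Gallinari and Mudiay each overlap with roughly 90% of the pairing's minutes, so the on/off number can barely distinguish "Nurkic + Jokic" from "Nurkic + Jokic + Gallinari + Mudiay".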

Conclusion

The funny thing about sites like stats.nba.com and NBAWowy is they’ve improved the ability to access data. The hard part is that there are many easy traps to fall into when analyzing that data. While they are not the only issues with on/off and +/- stats, sample size and confounding variables are two major ones, and ones I see conveniently ignored when explaining why a player is responsible for their team’s woes or successes. Hope it helped!

P.S. Dre Rant

I’ll be honest that I want to be careful in how often I bash other analysts’ work. In large part because, sadly, a lot of the “analytics” in sports are poor or done poorly. That said, cases like these do provide “teachable moments.” I’m not planning on regularly bashing various mainstream outlets’ analysis; I wouldn’t sleep. However, I will occasionally take the chance to point out general flaws I notice. And as I’ve mentioned on the Podcast, if I do criticize an article, I will do my due diligence to read it thoroughly and possibly vet my criticisms (I had Dave Berri review this post, e.g.). As a final note, saying an article contains bad analysis should not be taken as an insult to the author. Statistics can be difficult, as are many things. And the demands of being a writer with a deadline can make the work that much more difficult to do properly. That said, bad math is bad math.