The Nuggets jettisoned Jusuf Nurkic and acquired Mason Plumlee. On paper, we like this move. Nurkic has yet to be an above-average big, whereas we’ve thought Plumlee rightfully deserved a spot on the U.S. National Squad. Kevin Pelton also provided some analysis, which had a different take. If you want to hear in depth why I disagree, tune into this week’s Boxscore Geeks Podcast. However, in reviewing his analysis I found two common analytics issues, and I figured I’d take this as a teachable moment to discuss them. Today let's talk about sample size and confounding variables!

You can view Kevin Pelton’s Insider Article here. It is behind a paywall.

## Sample Size

Pelton notes that it makes sense for the Nuggets to trade Nurkic, which we agree with. Then his argument goes off the rails when he says this:

> The shortcomings of the Jokic-Nurkic duo are inarguable. According to NBA.com/Stats, Denver was outscored by an incredible 15.6 points per 100 possessions with Jokic and Nurkic on the court together, ghastly no matter the sample size.

His argument centers on Nurkic being a bad complementary piece to Jokic, and he uses on/off statistics (how well the Nuggets do with both Nurkic and Jokic on the court versus how well they do with both of them off the court). We’re not fans of on/off statistics, a topic for another day. But let’s discuss sample size. If we are recording a statistic (in this case, +/- for Nurkic/Jokic on/off combinations), our sample size is how many times we’ve tested or observed the behavior behind that statistic. And obviously, this is important. If we flip a coin one time, our sample is one, and we can’t derive much. If we flip it 100 times, we get a much better gauge of how biased the coin is. As Pelton brings up sample size here, let’s examine that.
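To put a rough number on the coin example, here is a minimal sketch, assuming independent flips and a fair coin. The function name `proportion_standard_error` is made up for illustration; the formula is the textbook standard error of a proportion.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of an observed heads rate after n independent flips."""
    return math.sqrt(p * (1.0 - p) / n)


# Assumption: a fair coin (p = 0.5). Two standard errors is a rough
# "plus or minus" band around the observed heads rate.
for flips in (1, 100, 10_000):
    band = 2 * proportion_standard_error(0.5, flips)
    print(f"{flips:>6} flips: heads rate pinned down to about +/- {band:.3f}")
```

The band shrinks with the square root of the number of flips, so a hundred times more data only buys about ten times more precision.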

The Nuggets played 2,607 minutes before trading Nurkic. Using the site NBAWowy, we can see that Nurkic and Jokic have played together … 108 minutes. Now, the art of picking the right sample size can vary based on the test you are doing, etc. That said, here is a simple comparison: if we pretended the Nuggets’ NBA season were a single 48-minute game, Jokic and Nurkic’s time together would account for a minute and fifty-nine seconds! The fact that Pelton hand-waves away sample size here is egregious on its own (sample size always matters! The line “no matter the sample size” belongs nowhere in proper analysis), but in this case it’s even worse, as the sample size is ridiculously small. As an example, Steph Curry and Kevin Durant can both be on the court and each miss a shot, and the opponent can score three times. Their +/- for this period will look dreadful, but would it be the right move to bench them for five bad possessions?
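The "season as a single game" comparison is just a proportion. A quick sketch of the arithmetic, using the two minute totals quoted above and assuming a 48-minute regulation game (the variable names are only for illustration):

```python
TEAM_MINUTES = 2607   # Nuggets minutes before the trade (from the post)
PAIR_MINUTES = 108    # Jokic and Nurkic on the court together (from the post)
GAME_MINUTES = 48     # assumption: regulation game, no overtime

# Share of the team's minutes that the pairing accounts for.
share = PAIR_MINUTES / TEAM_MINUTES

# Scale that share down to one game and express it as minutes:seconds.
scaled_seconds = share * GAME_MINUTES * 60
minutes, seconds = divmod(round(scaled_seconds), 60)

print(f"share of team minutes: {share:.1%}")
print(f"equivalent in one game: {minutes}:{seconds:02d}")
```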

The lesson I want you to take away here is to never get lulled by the notion that sample size doesn’t matter and, more importantly, to always gauge sample size when making your assessments. Nikola Jokic looks amazing right now, for example. But it’s only been 14 games since he was “promoted” to a starting role. While I can view his stats and say they’re impressive, the sample size should temper my confidence. Pelton confidently using the on/off stats to say Nurkic and Jokic don’t work, to the point of ignoring sample size? That’s a flaw I see all too often in sports.

## Confounding Variables

On/off and +/- have a bevy of issues with regard to variables and causality, but Pelton’s example brings up an even better one - confounding variables. In the example above, Pelton is trying to use the variable Nurkic + Jokic to explain the outcome of a bad team performance. I think there are some confounding variables. A confounding variable, according to the Wikipedia entry, is:

> In statistics, a confounding variable (also confounding factor, a confound, a lurking variable or a confounder) is a variable in a statistical model that correlates (directly or inversely) with both the dependent variable and an independent variable, in a way that "explains away" some or all of the correlation between these two variables.

Or put another way, a confounding variable is in play when you can’t be sure whether the variable you think explains the outcome really does, or whether another variable is responsible. Let’s get back to NBAWowy and Nurkic/Jokic. Here’s a rundown of the minutes played by the Nuggets while Jokic and Nurkic were on the court together.

Player | Minutes Played | Possessions |
---|---|---|
Nikola Jokic | 108 | 224 |
Jusuf Nurkic | 108 | 224 |
Danilo Gallinari | 97 | 202 |
Emmanuel Mudiay | 97 | 201 |
Will Barton | 49 | 103 |
Gary Harris | 33 | 65 |
Jamal Murray | 25 | 54 |
Jameer Nelson | 16 | 31 |
Juancho Hernangómez | 2 | 6 |
Wilson Chandler | 3 | 4 |

Let’s break it down. In the 108 minutes Nurkic and Jokic played together, 97 of those minutes had Mudiay and Gallinari on the court as well. That means it’s really hard to have any idea if the issue is the Nurkic/Jokic pairing and not something related to Mudiay and/or Gallinari. And in fact, if you take Nurkic/Jokic on the court with Mudiay and Gallinari off the court, the Nuggets played well … for 8 minutes! Please see above about sample size for why this line of thinking is spurious.
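One rough way to see how entangled the lineups are is to express each teammate's minutes as a share of the pairing's 108 minutes. The sketch below only restates the table above; `overlap_share` is a name made up for illustration.

```python
PAIR_MINUTES = 108  # Jokic and Nurkic together (from the table)

# Minutes each teammate shared the court with the pairing (from the table).
minutes_with_pair = {
    "Danilo Gallinari": 97,
    "Emmanuel Mudiay": 97,
    "Will Barton": 49,
    "Gary Harris": 33,
    "Jamal Murray": 25,
    "Jameer Nelson": 16,
}

# Fraction of the pairing's minutes in which each teammate was also on the court.
overlap_share = {
    player: minutes / PAIR_MINUTES
    for player, minutes in minutes_with_pair.items()
}

for player, share in overlap_share.items():
    print(f"{player:<18} {share:.0%} of the pairing's minutes")
```

When a teammate is present for roughly nine of every ten minutes, the pairing's +/- and that teammate's +/- within those minutes are close to the same number, so the data cannot separate them.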

## Conclusion

The funny thing about sites like stats.nba.com and NBAWowy is that they’ve made data much easier to access. The hard part is that there are many easy traps to fall into when analyzing that data. While they are not the only issues with on/off and +/- stats, sample size and confounding variables are two major ones, and ones I see conveniently ignored when explaining why a player is responsible for their team’s woes or successes. Hope it helped!

## P.S. Dre Rant

I’ll be honest that I want to be careful in how often I bash other analysts’ work, in large part because, sadly, a lot of the “analytics” in sports are poor or done poorly. That said, cases like these do provide “teachable moments.” I’m not planning on regularly bashing various mainstream outlets’ analysis; I wouldn’t sleep. However, I will occasionally take the chance to point out general flaws I notice. And as I’ve mentioned on the Podcast, if I do criticize an article, I will do my due diligence to read it thoroughly and possibly vet my criticisms (I had Dave Berri review this post, e.g.). As a final note, saying an article contains bad analysis should not be taken as an insult to the author. Statistics can be difficult, as are many things. And the demands of being a writer with a deadline can make the work that much more difficult to do properly. That said, bad math is bad math.

Zaza Pachulia currently has the highest +/- in the NBA (per minute, min 100 minutes).

I say that's because Zaza is the best player in the NBA, and there are no confounding variables. How about you?

I've done a little work on estimating the standard deviation of offensive and defensive ratings, which suggests it's about 7.0 points per 100 possessions over the 224 possessions Jokic and Nurkic have played together. Even a two-SD confidence interval tops out at 107.7 for their offensive rating together. Given that the Nuggets score 117.6 points per 100 possessions with Jokic on and Nurkic off -- and that this has a much smaller 2.1 standard deviation, meaning 107.7 is outside a similar confidence interval -- I think we can confidently say the Nuggets don't score as well with Jokic and Nurkic together and likely score much worse.

I also find the argument that Mudiay and Gallinari might be the problem unconvincing given that Jokic has played plenty of minutes with those players with Nurkic on the bench. In fact, according to NBAwowy, that particular combo has scored 126.3 points per 100 possessions in 835 possessions this season, as compared to 96.9 in 135 possessions with Nurkic on the court.

So I think there is ample evidence that the Nurkic-Jokic pairing was the problem.
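The interval arithmetic in the comment above can be checked with a short sketch. The standard deviations (7.0 and 2.1) and the ratings (107.7 upper bound, 117.6) are the commenter's figures; the pairing's own offensive rating is not stated, so the value below is back-derived from the quoted upper bound and should be read as an assumption. The helper `two_sd_interval` is hypothetical.

```python
def two_sd_interval(rating: float, sd: float) -> tuple[float, float]:
    """Return (low, high) bounds two standard deviations around a rating."""
    return rating - 2 * sd, rating + 2 * sd


PAIR_SD = 7.0         # commenter's estimate over 224 possessions
PAIR_UPPER = 107.7    # commenter's quoted two-SD upper bound
# Assumption: the pairing's rating implied by the quoted upper bound.
pair_rating = PAIR_UPPER - 2 * PAIR_SD

pair_low, pair_high = two_sd_interval(pair_rating, PAIR_SD)
solo_low, solo_high = two_sd_interval(117.6, 2.1)  # Jokic on, Nurkic off

# The intervals overlap only if the pairing's upper bound reaches
# the lower bound of the Jokic-only interval.
intervals_overlap = pair_high >= solo_low

print(f"pairing:    {pair_low:.1f} to {pair_high:.1f}")
print(f"Jokic only: {solo_low:.1f} to {solo_high:.1f}")
print(f"intervals overlap: {intervals_overlap}")
```

This only confirms that the quoted numbers are internally consistent. It says nothing about whether the standard deviation estimate is right or about who else was on the floor.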

Also, looking at Nurkic's negative stats, I could easily see him becoming a good player if he's paired with a good PG (which he now is). He turns the ball over and he takes bad shots.

Wouldn't the turnovers and bad shots go away if he had to touch/hold the ball less with a point guard who isn't the worst player in the NBA getting him the ball in bad positions on the floor?

His other negative area is fouls, which isn't terribly off at 5.2 (less than fouling out) per 48.

Kevin Pelton's comment above brings up an interesting point. What is the point at which we can say the minutes are sufficient to feel confident in the conclusions you may draw? (Assuming you believe conclusions may be validly drawn from raw +/- at all, which I really don't.) My suspicion is, contra Pelton, that 108 minutes is not sufficient. But that's just a suspicion.

In fact all of this should be utterly unsurprising, since Nurkic's plain-old +/- on the year is -11.6.

Dre makes perhaps too fine a point. We aren't even in the realm of clean hypothesis testing with clear null hypotheses that everything you learned about p-values applies to. If you want to reference player *combinations* as being particularly noteworthy, then you're in the realm of hierarchical modeling, if not multiple hypothesis testing (because we're not just fishing for extreme results, right?), and getting your inferences right when playing on that field is *NOT* easy.

It's a combination the Nuggets would need to play in order to start Nurkic going forward now that Jokic has firmly established himself as their star. There were questions (and curiosity) about its effectiveness from the very first time the two players played together, and reporting over the summer that the front office wanted Michael Malone to play them together despite his reservations. There's also been plenty of reporting that Nurkic was unhappy going to the bench, and insinuations that his effort suffered as a result.

All of which is to say it's hardly picking out a poor lineup combination at random.

The expected value of the population mean is the sample mean, and it's not clear that we should have any strong priors that two centres can play together effectively, which is, as Kevin points out, the question the team needed to answer.

Now, you can argue that the certainty of that answer wasn't high enough, but Denver is a team fighting for its playoff life and we can estimate the costs of gathering additional data based on the expected +/- of any additional minutes they played, which we know to be -15.6/100 possessions. It's probably not unreasonable for them to conclude that they have enough information to decide that the two can't productively play together, even if it doesn't reach an arbitrary threshold for statistical significance.

I also don't think there's much value in trying to frame the sample size as small relative to the number of minutes in a season. The only real question is absolute sample size, which amounts to a bit over three games at normal mins/game. That still sounds small, obviously, but makes more sense than saying "if it was one game it would be two minutes"...

He may or may not also be just bad in general - that's now a question for the Blazers, though not a high risk one, since they've bought low on him. The available evidence suggests he is, though to a lesser extent.

You see that there's weak evidence that Jokic - Nurkic is a weak pairing? Ok, great. Is that because Nurkic sucks in isolation? Maybe because Jokic sucks? Well, we have some information on that, that Nurkic sucks and Jokic is good. Great, put that into the model. Put in your estimate (and uncertainty) about how good Jokic is and how bad Nurkic is, and then tell me whether the 108 minute sample tells you much of anything about how the two play together.

It doesn't. Even adjusting +/- over the whole season usually doesn't give enough information to properly evaluate players. There's unlikely to be much power left to dig deeper than that.

The question of whether weak evidence is strong enough to act on depends on the expected gain from acting (large, since the expected benefit of keeping Nurkic is small) and the costs of gathering more information (large, since you would have to allocate more minutes to a pairing which you expect to play terribly). So you're not using "enough information" correctly, given the context.