Teachable Moments #1: Sample Size and Confounding Variables with Jusuf Nurkic

The Nuggets jettisoned Jusuf Nurkic and acquired Mason Plumlee. On paper, we like this move. Nurkic has yet to be an above-average big, whereas we’ve thought Plumlee rightfully deserved a spot on the U.S. National Squad. Kevin Pelton also provided some analysis, which had a different take. And if you want to hear in depth why I disagree, tune into this week’s Boxscore Geeks Podcast. However, in reviewing his analysis I found two common analytics issues, and I figured I’d use this as a teachable moment to discuss them. Today let's talk about sample size and confounding variables!

You can view Kevin Pelton’s Insider Article here. It is behind a paywall.

Sample Size

Pelton notes that it makes sense for the Nuggets to trade Nurkic, which we agree with. Then his argument goes off the rails when he says this:

The shortcomings of the Jokic-Nurkic duo are inarguable. According to, Denver was outscored by an incredible 15.6 points per 100 possessions with Jokic and Nurkic on the court together, ghastly no matter the sample size.

His argument centers on Nurkic being a bad complementary piece to Jokic, and he uses on/off statistics (how well the Nuggets do with both Nurkic and Jokic on the court, versus how well they do with both of them off the court). We’re not fans of on/off statistics, a topic for another day. But let’s discuss sample size. If we are recording a statistic (in this case +/- statistics for Nurkic/Jokic on/off combinations), our sample size is how many times we’ve tested or observed the behavior behind that statistic. And obviously, this is important. If we flip a coin one time, our sample is one, and we can’t derive much. If we flip it 100 times, we get a much better gauge on how biased the coin is. As Pelton brings up sample size here, let’s examine that.
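To put rough numbers on the coin example, here is a minimal Python sketch. It assumes independent flips and uses the textbook standard error of a proportion, sqrt(p(1-p)/n). The function name is made up for illustration.

```python
import math


def heads_fraction_standard_error(p_heads: float, flips: int) -> float:
    """Standard error of the observed heads fraction after `flips` independent flips."""
    return math.sqrt(p_heads * (1.0 - p_heads) / flips)


# Assumption: a fair coin (p = 0.5), used only to show how the noise shrinks.
for flips in (1, 10, 100, 1000):
    se = heads_fraction_standard_error(0.5, flips)
    print(f"{flips:>4} flips: heads fraction is typically off by about {se:.3f}")
```

The noise falls with the square root of the number of flips, so going from 1 flip to 100 cuts it by a factor of ten.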

The Nuggets played 2,607 minutes before trading Nurkic. Using the site NBAWowy, we can see that Nurkic and Jokic have played together … 108 minutes. Now, the art of picking the right sample size can vary based on the test you are doing, etc. That said, a simple comparison here is that if we pretended the Nuggets’ NBA season were a single game, Jokic and Nurkic’s time together would account for a minute and fifty-nine seconds! The fact that Pelton hand-waves away sample size here is egregious on its own. Sample size always matters, and the line “no matter the sample size” belongs nowhere in proper analysis. But in this case it’s even worse, as the sample size is ridiculously small. As an example, Steph Curry and Kevin Durant can both be on the court and each miss a shot, and the opponent can score three times. Their +/- for this period will look dreadful, but would it be the right move to bench them for five bad possessions?
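The single-game comparison is simple proportional scaling. Here is a short Python sketch of the arithmetic, using only the two minute totals quoted above and assuming a 48-minute regulation game:

```python
TEAM_MINUTES = 2607  # Nuggets minutes before the trade (from the post)
PAIR_MINUTES = 108   # minutes with Jokic and Nurkic on the court together
GAME_MINUTES = 48    # assumption: one regulation game, no overtime

share_of_season = PAIR_MINUTES / TEAM_MINUTES
scaled_seconds = share_of_season * GAME_MINUTES * 60
minutes, seconds = divmod(int(scaled_seconds), 60)

print(f"Share of team minutes: {share_of_season:.1%}")
print(f"Scaled to one game: {minutes} min {seconds} s")
```

The pairing accounts for roughly 4 percent of the team's minutes, which is where the minute and fifty-nine seconds comes from.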

The lesson I want to take away here is to never get lulled by the notion that sample size doesn’t matter and, more importantly, to always gauge sample size when making your assessments. Nikola Jokic looks amazing right now, for example. But it’s been only 14 games since he was “promoted” to a starting role. While I can view his stats and say they’re impressive, the sample size should temper my confidence. Pelton confidently using the on/off stats to say Nurkic and Jokic don’t work, to the point of ignoring sample size? That’s a flaw I see all too often in sports.

Confounding Variables

On/off and +/- have a bevy of issues with regard to variables and causality, but Pelton’s example brings up an even better one: confounding variables. In the example above, Pelton is trying to use the variable Nurkic + Jokic to explain the outcome of a bad team performance. I think there are some confounding variables. A confounding variable, according to the Wikipedia entry, is:

In statistics, a confounding variable (also confounding factor, a confound, a lurking variable or a confounder) is a variable in a statistical model that correlates (directly or inversely) with both the dependent variable and an independent variable, in a way that "explains away" some or all of the correlation between these two variables.

Or put another way, a confounding variable is in play when you can’t be sure whether the variable you think explains the outcome really explains it, or whether another variable does. Let’s get back to NBAWowy and Nurkic/Jokic. Here’s a rundown of the minutes played by the Nuggets while Jokic and Nurkic were on the court together.

Player | Minutes Played | Possessions
Nikola Jokic | 108 | 224
Jusuf Nurkic | 108 | 224
Danilo Gallinari | 97 | 202
Emmanuel Mudiay | 97 | 201
Will Barton | 49 | 103
Gary Harris | 33 | 65
Jamal Murray | 25 | 54
Jameer Nelson | 16 | 31
Juancho Hernangómez | 2 | 6
Wilson Chandler | 3 | 4

Let’s break it down. In the 108 minutes Nurkic and Jokic played together, 97 of those minutes had Mudiay and Gallinari on the court as well. That means it’s really hard to tell whether the issue is the Nurkic/Jokic pairing or something related to Mudiay and/or Gallinari. And in fact, if you take Nurkic/Jokic on the court with Mudiay and Gallinari off the court, the Nuggets played well … for 8 minutes! Please see above about sample size for why this line of thinking is spurious.
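One way to see how tangled the lineups are is to express each teammate's minutes as a share of the 108 Nurkic/Jokic minutes. Here is a minimal Python sketch using the minutes from the table above. The variable names are just for illustration.

```python
PAIR_MINUTES = 108  # minutes with both Jokic and Nurkic on the court

# Minutes each teammate shared the floor with the pair (from the table above).
minutes_alongside_pair = {
    "Danilo Gallinari": 97,
    "Emmanuel Mudiay": 97,
    "Will Barton": 49,
    "Gary Harris": 33,
    "Jamal Murray": 25,
    "Jameer Nelson": 16,
    "Juancho Hernangómez": 2,
    "Wilson Chandler": 3,
}

overlap_share = {
    player: minutes / PAIR_MINUTES
    for player, minutes in minutes_alongside_pair.items()
}

# Sort from most to least overlap with the pairing.
for player, share in sorted(overlap_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player:<20} {share:.0%} of the pair's minutes")
```

Gallinari and Mudiay each show up in about 90 percent of the pair's minutes, so the data can barely separate their effect from the pairing's.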


The funny thing about sites like and NBAWowy is they’ve improved the ability to access data. The hard part is that there are many easy traps to fall into when it comes to data analysis. While not the only issues with on/off and +/- stats, sample size and confounding variables are two major ones, and ones I see conveniently ignored when explaining why a player is responsible for their team’s woes or successes. Hope it helped!

P.S. Dre Rant

I’ll be honest that I want to be careful in how often I bash other analysts’ work. In large part because, sadly, a lot of the “analytics” in sports are poor or done poorly. That said, cases like these do provide “teachable moments.” I’m not planning on regularly bashing various mainstream outlets’ analysis; I wouldn’t sleep. However, I will occasionally take the chance to point out general flaws I notice. And as I’ve mentioned on the Podcast, if I do criticize an article, I will do my due diligence to read it thoroughly and possibly vet my criticisms (I had Dave Berri review this post, e.g.). As a final note, saying an article contains bad analysis should not be taken as an insult to the author. Statistics can be difficult, as are many things. And the demands of being a writer with a deadline can make the work that much more difficult to do properly. That said, bad math is bad math.

Confounding variables are the best.

Zaza Pachulia currently has the highest +/- in the NBA (per minute, min 100 minutes).

I say that's because Zaza is the best player in the NBA, and there are no confounding variables. How about you?
How does the deal look from the Blazers' perspective---has anyone tried to calculate the value of a draft pick (i.e., was Plumlee the proverbial bird in the hand)? The Blazers now have three 1st round picks in the coming draft. How has stockpiling picks worked out for teams like the Sixers and Celtics? And it wasn't that long ago that Portland was a WOW darling. What happened?
Hey Dre, you're right that "no matter the sample size" is a poor choice of phrasing. If it were eight minutes, it certainly wouldn't tell us anything about whether Jokic and Nurkic can play well together. However, I think you're underestimating how much we can learn from 108 minutes.

I've done a little work on estimating the standard deviation of offensive and defensive ratings, which suggests it's about 7.0 points per 100 possessions over the 224 possessions Jokic and Nurkic have played together. Even a two-SD confidence interval tops out at 107.7 for their offensive rating together. Given that the Nuggets score 117.6 points per 100 possessions with Jokic on and Nurkic off -- and that this has a much smaller 2.1 standard deviation, meaning 107.7 is outside a similar confidence interval -- I think we can confidently say the Nuggets don't score as well with Jokic and Nurkic together and likely score much worse.

I also find the argument that Mudiay and Gallinari might be the problem unconvincing given that Jokic has played plenty of minutes with those players with Nurkic on the bench. In fact, according to NBAwowy, that particular combo has scored 126.3 points per 100 possessions in 835 possessions this season, as compared to 96.9 in 135 possessions with Nurkic on the court.

So I think there is ample evidence that the Nurkic-Jokic pairing was the problem.
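As a rough check on the interval arithmetic in the comment above, here is a short Python sketch. The pair's point estimate is not quoted in the comment; it is inferred here by subtracting two SDs from the quoted 107.7 upper bound. The z-score for the gap additionally assumes the two estimates are independent, which the comment does not claim.

```python
import math

# Figures quoted in the comment (offensive rating, points per 100 possessions).
pair_sd = 7.0               # SD over the 224 Jokic+Nurkic possessions
pair_upper = 107.7          # top of the two-SD interval for the pair
jokic_only_estimate = 117.6
jokic_only_sd = 2.1

# Inferred, not quoted: the pair's point estimate implied by the upper bound.
pair_estimate = pair_upper - 2 * pair_sd

# Lower end of the two-SD interval for Jokic on, Nurkic off.
jokic_only_lower = jokic_only_estimate - 2 * jokic_only_sd

# Assumption: independent estimates, so the SDs combine in quadrature.
gap = jokic_only_estimate - pair_estimate
gap_sd = math.hypot(pair_sd, jokic_only_sd)
z_score = gap / gap_sd

print(f"Pair estimate (inferred): {pair_estimate:.1f}")
print(f"Jokic-only two-SD lower bound: {jokic_only_lower:.1f}")
print(f"Intervals overlap: {pair_upper >= jokic_only_lower}")
print(f"Gap of {gap:.1f} is about {z_score:.1f} SDs")
```

Under those assumptions the two intervals do not overlap, which matches the comment's claim. The sketch says nothing about who else was on the floor during those possessions.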
The bigger problem is that Nurkic is still a well-below-average NBA player. He's only 22, so there's certainly room for growth, but he is so poor with regards to efficient scoring, turnover, and foul rates that . He was accurately evaluated by the Blazers to extract a first round pick for a player who is about to be properly paid. And they're gonna need it to find somebody to take on $44 million of Allen Crabbe, Evan Turner, and Meyers Leonard.
While Plumlee is clearly a good player, I don't like that trade for the Nuggets. I feel like they traded a position of strength for a position of strength when they probably could have traded that same package to upgrade their woeful PG position.

Also, looking at Nurkic's negative stats, I could easily see him becoming a good player if he's paired with a good PG (which he now is). He turns the ball over and he takes bad shots.

Wouldn't the turnovers and bad shots go away if he had to touch/hold the ball less with a point guard who isn't the worst player in the NBA getting him the ball in bad positions on the floor?

His other negative area is fouls, which isn't terribly off at 5.2 (less than fouling out) per 48.
todd2 - from the Portland side, I don't think Plumlee was a bird in the hand, seeing as how it was unlikely they'd want to re-sign him in the offseason given their cap situation. A team that's well under the cap probably would have kept Plumlee and wouldn't have done this deal!

Kevin Pelton's comment above brings up an interesting point. What is the point at which we can say the minutes are sufficient to feel confident in the conclusions you may draw? (Assuming you believe conclusions may be validly drawn from raw +/- at all. Which I really don't.) My suspicion is, contra Pelton, 108 minutes is not sufficient. But that's just a suspicion.
Dre addresses the 108 minutes in the new podcast. It's up on YouTube.
I would say there is no particular minutes threshold: statistical significance depends both on the size of the effect (which is enormous in this case) and the size of the sample. Looking at confidence intervals showcases this, I think.
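The trade-off between effect size and sample size can be sketched in Python. This is a rough illustration only: it takes the 7.0 SD at 224 possessions quoted earlier in the thread, assumes possessions are independent so the SD shrinks with the square root of the count, and uses hypothetical effect sizes of 15, 5 and 3 points per 100 possessions.

```python
import math

QUOTED_SD = 7.0           # rating SD quoted earlier in the thread
QUOTED_POSSESSIONS = 224  # sample that SD refers to

# Assumption: independent possessions, so SD scales as 1 / sqrt(possessions).
single_possession_sd = QUOTED_SD * math.sqrt(QUOTED_POSSESSIONS)


def rating_sd(possessions: int) -> float:
    """Implied SD of a per-100 rating measured over `possessions` possessions."""
    return single_possession_sd / math.sqrt(possessions)


def possessions_for_two_sd(effect_per_100: float) -> int:
    """Smallest sample at which `effect_per_100` is at least two SDs wide."""
    return math.ceil((2 * single_possession_sd / abs(effect_per_100)) ** 2)


# Hypothetical effect sizes, in points per 100 possessions.
for effect in (15.0, 5.0, 3.0):
    needed = possessions_for_two_sd(effect)
    print(f"{effect:>4.0f} points per 100 needs about {needed} possessions")
```

Under these assumptions a 15-point gap clears two SDs in roughly 200 possessions, while a 3-point gap needs several thousand, so no single minutes threshold fits every question.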
At best this is a case of extremely misleading cherry picking. Nurkic-Jokic had a -16.5 +/- over 108 minutes, sure...but Nurkic-Gallinari had a -14.2 +/- over 518 minutes. Nurkic-Mudiay? -18.9 over 500 minutes. Pairing him with another star? Nurkic-Faried has a -13.2 over 286 minutes. Why would you pick out the Nurkic-Jokic combination of all of the above, when it's the least statistically relevant?

In fact all of this should be utterly unsurprising, since Nurkic's plain-old +/- on the year is -11.6.

Dre makes perhaps too fine a point. We aren't even in the realm of clean hypothesis testing with clear null hypotheses that everything you learned about p-values applies to. If you want to reference player *combinations* as being particularly noteworthy, then you're in the realm of hierarchical modeling, if not multiple hypothesis testing (because we're not just fishing for extreme results, right?), and getting your inferences right when playing on that field is *NOT* easy.
I don't even need to look up stats to know that nearly everyone on the Nuggets likely has a negative +/-. They are a lottery team. Unless they have a star who plays very few minutes, it is not going to look pretty. Without even looking, I'd bet that no combination of theirs can beat any combination of Warriors players that play a significant number of minutes.
"Why would you pick out the Nurkic-Jokic combination of all of the above, when it's the least statistically relevant?"

It's a combination the Nuggets would need to play in order to start Nurkic going forward now that Jokic has firmly established himself as their star. There were questions (and curiosity) about its effectiveness from the very first time the two players played together, and reporting over the summer that the front office wanted Michael Malone to play them together despite his reservations. There's also been plenty of reporting that Nurkic was unhappy going to the bench, and insinuations that his effort suffered as a result.

All of which is to say it's hardly picking out a poor lineup combination at random.

There seems to be a bit of a "cult of significance" thing going on here. This is really better viewed as a Value of Information question.

The expected value of the population mean is the sample mean, and it's not clear that we should have any strong priors that two centres can play together effectively, which is, as Kevin points out, the question the team needed to answer.

Now, you can argue that the certainty of that answer wasn't high enough, but Denver is a team fighting for its playoff life and we can estimate the costs of gathering additional data based on the expected +/- of any additional minutes they played, which we know to be -15.6/100 possessions. It's probably not unreasonable for them to conclude that they have enough information to decide that the two can't productively play together, even if it doesn't reach an arbitrary threshold for statistical significance.
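The cost side of that argument is straightforward arithmetic. Here is a minimal Python sketch, assuming the pairing would keep performing at the observed -15.6 points per 100 possessions (which is the premise of the argument, not an established fact). The function name is illustrative.

```python
OBSERVED_NET_PER_100 = -15.6  # net rating quoted for the Jokic+Nurkic pairing


def expected_point_margin(extra_possessions: int,
                          net_per_100: float = OBSERVED_NET_PER_100) -> float:
    """Expected scoring margin over extra possessions at a given net rating."""
    return extra_possessions * net_per_100 / 100.0


# Doubling the existing 224-possession sample is one of the cases shown.
for extra in (100, 224, 500):
    margin = expected_point_margin(extra)
    print(f"{extra} more possessions: expected margin {margin:+.1f} points")
```

Doubling the sample would be expected to cost about 35 points of scoring margin if the observed rate held.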

I also don't think there's much value in trying to frame the sample size as small relative to the number of minutes in a season. The only real question is absolute sample size, which amounts to a bit over three games at normal mins/game. That still sounds small, obviously, but makes more sense than saying "if it was one game it would be two minutes"...
I've read the comments, and I still don't see why Jokic-Nurkic pairing is important compared to just how bad Nurkic has been in any lineup. Nurkic is bad, full stop. Saying he's bad with player X is hiding the real issue. It suggests that he would be good with other players. Evidence for that is not readily available.
tgt, Denver doesn't need to know if Nurkic is bad in any conceivable lineup, only if he is bad in the lineup he will realistically be playing in following the emergence of Jokic. The available evidence suggests he is.

He may or may not also be just bad in general - that's now a question for the Blazers, though not a high risk one, since they've bought low on him. The available evidence suggests he is, though to a lesser extent.
Apparently we are watching Street basketball with these 2 man lineups but it shouldn't be surprising considering that it's all star weekend today.
Iceman, being bad in general, which there's more evidence for, is less important than being bad with a given player... because injuries and trades never happen. Also, second unit 12 minutes a game players grow on trees. Ugh.
Lacemaker - sure, but the difficulty is that constructing proper inferences on whether a particular player is good or not in general is *much* easier, and takes a lot less data, than figuring that out for specific line-up effects.

You see that there's weak evidence that Jokic - Nurkic is a weak pairing? Ok, great. Is that because Nurkic sucks in isolation? Maybe because Jokic sucks? Well, we have some information on that, that Nurkic sucks and Jokic is good. Great, put that into the model. Put in your estimate (and uncertainty) about how good Jokic is and how bad Nurkic is, and then tell me whether the 108 minute sample tells you much of anything about how the two play together.

It doesn't. Even adjusting +/- over the whole season usually doesn't give enough information to properly evaluate players. There's unlikely to be much power left to dig deeper than that.
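To make the "put that into the model" idea concrete, here is a toy normal-normal shrinkage calculation in Python. It is not the commenter's model. It borrows three figures from the post and thread (-15.6 for the pair, -11.6 for Nurkic overall, and an SD of 7.0 for an offensive or defensive rating at 224 possessions) and adds two loud assumptions: offensive and defensive noise are independent, and the prior SD for a pairing-specific effect is a hypothetical 5 points per 100.

```python
import math

PAIR_NET = -15.6        # pair's net rating quoted in the post
NURKIC_NET = -11.6      # Nurkic's overall +/- quoted in the thread
OFF_OR_DEF_SD = 7.0     # SD quoted for one side of the ball at 224 possessions

# Assumption: independent offensive and defensive noise.
net_sd = OFF_OR_DEF_SD * math.sqrt(2)

# Hypothetical prior: pairing effects are centred on zero with this SD.
prior_sd = 5.0

# Crude residual: how much worse the pair is than Nurkic lineups in general.
residual = PAIR_NET - NURKIC_NET

# Normal-normal update with a zero prior mean.
weight_on_data = prior_sd**2 / (prior_sd**2 + net_sd**2)
posterior_mean = weight_on_data * residual
posterior_sd = math.sqrt(weight_on_data) * net_sd

print(f"Weight on the 108-minute sample: {weight_on_data:.2f}")
print(f"Posterior pairing effect: {posterior_mean:+.1f} ± {posterior_sd:.1f} per 100")
```

With these made-up inputs the sample gets only about a fifth of the weight and the posterior barely moves from the prior. A different prior or residual would change the numbers, so treat this as a sketch of the mechanics rather than a result.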
BPS, there's some reasonable stuff in there, but it's all chained to conventional calculations of statistical power which just don't make much sense outside an academic context.

The question of whether weak evidence is strong enough to act on depends on the expected gain from acting (large, since the expected benefit of keeping Jokic is small) and the costs of gathering more information (large, since you would have to allocate more minutes to a pairing which you expect to play terribly). So you're not using "enough information" correctly, given the context.
