
Confidence Intervals and you

One of the truths that sports pundits ignore is that there are times in the year when it is exceedingly hard to make predictions. Right now is very much such a time in the 2014-15 NBA season. Given how few games have been played, it would be silly to make predictions or draw conclusions with the same level of confidence that we would have in, say, January. The smaller the sample, the more error your chest-thumping affirmation on television will have embedded in it.

You'll notice that this doesn't really stop anyone from getting on their bully pulpit.

One of the advantages of a sport like basketball is that, eventually, the number of games in a season allows us to draw pretty accurate conclusions about the relative strength of a team. This leads to a certain level of assurance in those who write about the sport that is (again, after a certain point in the season) a lot more warranted. Five or six games into the season is not that time.

Here's an illustration of just why this is so.

One of the fun projects I took on in the offseason was building a database of every NBA game ever played. The graph above shows the standard deviation of point margin per game before and after a given game number. I looked at every 82-game season played in the NBA (all non-strike years from 1974-75 onwards) and calculated the difference in point margin for each team up to and including game number X versus afterwards (PM Delta). For example, after 41 games the standard deviation of this point margin delta is 3.3 points per game over the last ten seasons. After 6 games, that number is 5.3.
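For readers who want to see the mechanics, here is a minimal sketch of that PM Delta calculation in Python. The data below is made up (the full game database obviously isn't reproduced here); the point is just the before/after split and the standard deviation across team-seasons.

```python
# Sketch of the PM Delta calculation: for each team-season, compare the
# average point margin through game X with the average over the remaining
# games, then take the standard deviation of that difference across all
# team-seasons. Inputs here are toy data, not the real game database.
import statistics

def pm_delta_sd(seasons, x):
    """seasons: list of 82-game point-margin lists, one per team-season.
    Returns the std dev of (avg PM through game x) - (avg PM after game x)."""
    deltas = []
    for margins in seasons:
        before = statistics.mean(margins[:x])
        after = statistics.mean(margins[x:])
        deltas.append(before - after)
    return statistics.stdev(deltas)

# Two made-up team-seasons:
good = [5] * 41 + [7] * 41     # improves in the second half
bad = [-3] * 41 + [-6] * 41    # gets worse
print(pm_delta_sd([good, bad], 41))
```

On the real database this would be run for every X from 1 to 81 to trace out the curve in the graph.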

Right now, your margin of error is about 60% higher than it will be at midseason (5.3 vs. 3.3 points per game).

Does that mean we need to recuse ourselves from offering an opinion until there is enough information to provide one? Nope. Having to provide a usable and useful opinion without all the relevant facts is a common problem. Think of your local weatherman -- he wishes he could get his predictions into the 95% range all the time, but he simply lacks the resources or the processing time to get anything beyond the typical "80% chance of thundershowers" on a given day.

Data costs money. Sampling and processing take time. I would love to run my models in the real world with an unlimited number of samples, but the reality is that ten or fewer samples will have to satisfy me most of the time. Scientists, engineers and economists all have to make do with the data that is available, drawing conclusions that are less certain than we would like.

Given that reality, it's not surprising that there are some well-worn concepts for dealing with the uncertainty brought about by small samples. Let's talk about confidence intervals.

In statistics, a confidence interval is a way to provide an interval estimate of a population parameter. Confidence intervals give a range of values that act as a good estimate of the unknown quantity (for example, projected wins). The confidence level of the interval indicates the calculated probability that the range captures the true population parameter, given the samples we have handy. Basically, we are able to provide a maximum and minimum value for the number in question based on the observed behavior of the sample.

For example, we could look at a 5-0 NBA team with an average margin of victory of 10 points per game and determine the maximum and minimum number of wins we would expect at a certain confidence level. In applied practice, confidence intervals are typically stated at the 95% confidence level. In layman's terms, if I said team A would be expected with 95% confidence to win between 40 and 62 games in an 82-game season, I am saying that if the season were played 1000 times, 950 of them would end with between 40 and 62 wins.
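As a sanity check on that "played 1000 times" reading, here is a quick simulation sketch (mine, not the tool from this post): give a hypothetical team a fixed true win probability, build the normal-approximation 95% interval for its 82-game win total, and count how many simulated seasons land inside it. The 0.62 win rate and the normal approximation are illustrative assumptions.

```python
import random

random.seed(7)

# Hypothetical team that truly wins 62% of its games (illustrative number).
p, n_games = 0.62, 82

# Normal-approximation 95% interval for wins in an 82-game season.
mu = p * n_games
sd = (n_games * p * (1 - p)) ** 0.5
lo, hi = mu - 1.96 * sd, mu + 1.96 * sd

# Play the season 1000 times and count how often wins land in [lo, hi].
seasons = []
for _ in range(1000):
    seasons.append(sum(1 for _ in range(n_games) if random.random() < p))
inside = sum(lo <= w <= hi for w in seasons) / len(seasons)
print(f"interval {lo:.0f}-{hi:.0f} wins; {inside:.1%} of seasons inside")
```

The fraction inside comes out near 95%, which is exactly what the confidence level claims.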

The trick is that we need to have a clue as to what the expected error and variation is. If you were paying attention at the top, you realize that we kind of do. That lets us do some fun things.

If I had a team with a 6-1 record and an 11-point margin of victory, I could build a tool to estimate confidence intervals for their expected win totals. That would look like so:

[Embedded interactive Tableau tool: win-total confidence interval estimator]


That purely hypothetical Northern Atlantic dinosaur-themed team would be expected with 95% confidence to win between 45 and 79 games this season.
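For what it's worth, here is one plausible reconstruction of that estimator (my guess at the mechanics, not the actual Tableau workbook): treat the team's true per-game margin as normal around the observed +11 with the 5.3-point standard deviation from the graph, convert margin to a per-game win probability using a normal curve with a roughly 12-point single-game standard deviation (an assumption), and add the resulting expected wins over the 75 remaining games to the 6 already banked.

```python
from statistics import NormalDist

def win_total_ci(pm_obs, pm_sd, wins_now, games_left, game_sd=12.0, z=1.96):
    """95% CI for a season win total from an observed point margin per game.
    game_sd is the assumed sd of a single game's margin (~12 points)."""
    std = NormalDist()
    lo_pm, hi_pm = pm_obs - z * pm_sd, pm_obs + z * pm_sd
    # Margin -> per-game win probability -> expected wins the rest of the way.
    lo = wins_now + games_left * std.cdf(lo_pm / game_sd)
    hi = wins_now + games_left * std.cdf(hi_pm / game_sd)
    return round(lo), round(hi)

print(win_total_ci(11, 5.3, wins_now=6, games_left=75))
```

With these assumed constants the interval comes out to roughly 45-78 wins, close to the 45-79 quoted above; the real tool presumably uses slightly different numbers.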

If I wanted to look at the actual odds of a specific win total for that selfsame team, let's say 48 wins, I'd build a tool like this:

[Embedded interactive Tableau tool: win-total probability calculator]

96.3% of the time, that's a tasty over.
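Under the same assumed margin-to-wins model (again, a sketch of what such a tool might do, not the actual workbook), the over/under calculation is a couple of lines of normal-distribution algebra: find the true per-game margin needed to hit the target win total, then ask how likely the team is to be at least that good.

```python
from statistics import NormalDist

def prob_at_least(target, pm_obs, pm_sd, wins_now, games_left, game_sd=12.0):
    """P(season wins >= target), assuming true margin ~ N(pm_obs, pm_sd)
    and a ~12-point single-game margin sd (both assumptions)."""
    std = NormalDist()
    # Win rate needed over the remaining games...
    need_rate = (target - wins_now) / games_left
    # ...and the true per-game margin that would deliver it.
    need_pm = game_sd * std.inv_cdf(need_rate)
    # Probability the team's true margin is at least that good.
    return 1 - NormalDist(pm_obs, pm_sd).cdf(need_pm)

print(f"{prob_at_least(48, 11, 5.3, wins_now=6, games_left=75):.1%}")
```

This lands around 96%, in the neighborhood of the 96.3% quoted, so the real tool likely uses similar though not identical constants.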

There are, of course, some more factors to consider when projecting the season (most notably the schedule). We'll cover that in our upcoming rankings (which will of course incorporate our shiny new confidence intervals) a bit later in the week.





Arturo: Interesting discussion of the issue. I certainly agree that assessing teams after 5 games is very problematic.

But at the same time, what strikes me about your graph at the top is that it takes only about 15 games to get very accurate reads on teams. At that point, you know nearly as much as you will at mid-season. That you can learn so much from n=15 is another sign that the NBA season is far, far longer than it needs to be (in terms of determining true team strengths).
Speaking of confidence intervals, I'd love to see them for WP48 or ADJP48 -- or perhaps the standard errors, at least.
I'm not certain, but it seems like this post is using a common misinterpretation of confidence intervals. According to my understanding, from a frequentist perspective, you aren't allowed to say that there is a 95 percent probability that the confidence interval contains the population parameter, because the population parameter is a number that's not subject to probability, so after you draw your sample, the population parameter is either in it or not. Attempting to model your knowledge of the population parameter using confidence intervals is a mixture of frequentist and Bayesian thinking that is disapproved™. Only credible intervals can be interpreted in such a manner.

Of course, that doesn't mean your analysis is actually wrong, it's most likely good. And as far as I know, nothing bad has ever happened to someone because they interpreted a confidence interval as having a 95% probability of containing the population parameter.
I don't understand what's going on in that top graph. Why is that metric skyrocketing at the end of the season?
BPS: Because the sample size of the *remaining* games is getting very small. The reason the curve is lowest at 41 games is because that reduces the random error in both samples. Before and after 41 games, one of the samples is smaller.
58 home-and-home games (two against each of the other 29 teams), 8 extra home-and-home games against division teams, no more conferences, and the season starting at the same point or slightly later -- but you don't want the games compressed together. I think everyone would enjoy that. It could also open the door for a midseason tournament, since they are discussing one.
I get your concern. I also agree with your conclusion :-)

Guy gets that right.

I wouldn't be opposed to a 58-game (2 times 29) season, an in-season FA Cup-like tournament (something like double elimination) and a play-in round with more playoff teams.
My point was less about advocating for a shorter season (though it's an interesting idea) than marveling at the huge amount of information provided by relatively few NBA games. After just another 8-9 games, you will be able to start drawing tentative but meaningful conclusions about teams.

Whereas, trying to evaluate baseball teams after 15 games would be pointless.
That probability app is a fun tool to play around with (when you have made a few over/under bets :) ). It needs strength of schedule too to be more accurate. My Lakers under 31.5 wins bet looks too rosy currently (91% of the time), or does it :giggle:.
@ThatBJTerry: Strictly speaking, you're right, but I think in terms of communicating results to people who aren't professional statisticians there's nothing wrong with using the looser, probability-based definition. While there are real technical reasons why confidence intervals are not interpreted that way, on an intuitive level it is a perfectly valid way to communicate a result generated from a CI. If you are interested in stats (as it seems you are), I highly recommend reading Andrew Gelman's blog; he talks a lot about this kind of thing. Despite what some hardliner Bayesians and frequentists would like you to think, in the vast majority of simple computational situations (like this one) the results of a confidence and credible interval are indistinguishable anyway.

Also, to be fair, a couple of paragraphs down he uses a stricter definition of confidence interval: "In layman's terms, if I said team A would be expected with 95% confidence to win between 40 and 62 games in an 82-game season, I am saying that if the season were played 1000 times, 950 of them would end with between 40 and 62 wins."
Aha, that's an unorthodox, but pretty cool, estimator. Cheers for the quick explanation!
Wait, a greater than 50% probability that the Rapt, er, dinosaur-themed team wins at least 68 games??? Uh, I'm taking the under.

(It's entirely possible, of course, that this simple lawyer with no real stats background is reading that last chart entirely incorrectly.)
In theory, you could just look up their adjusted point margin per game (say, here) and correct for that.
You could also wait for me to do it in my rankings and Sim :-)
I'm impatient! Sorry, I just looked at the estimator again after BPS's post and that struck me. Even though when I read the post the first time around I understood perfectly well that we needed to adjust for schedule strength.
Al_S - you're not reading it wrong. What you're intuitively seeing is that, based on a long history of NBA seasons, a +11 point margin over a whole season would be historic, and the Raptors are much more likely to post a lower point margin going forward than a higher one.

The model here doesn't take any of that into account though. It just sees +11 now, and a +5.3 standard deviation the rest of the way, and without any other information and assumptions of normality it says ~ +0.4 to +21.6 at a 95% CI. Intuitively that upper bound is laughable, but the model doesn't know that and is telling you only what you know based on what was given.

You'd have to put those population assumptions into the model (equivalent to a Bayesian prior, or through estimates of regularization coefficients if you prefer) if you wanted otherwise.
And the season sim accounts for just that :-)
It's disingenuous to suggest that this is a good model for the dinosaurs. While I will assume that it's accurate for an NBA team in a vacuum, once you start naming specific teams you've got prior information -- in this case, that the dinosaurs are probably not as good as these results suggest, based on player data from prior seasons and the preseason predictions.
Interestingly enough, the slider on the tool bottoms out at a -15 point margin, and the Sixers are at -16.4.
