One of the truths that pundits in sports ignore is that there are times in the year when it is exceedingly hard to make predictions. Right now is very much such a time in the 2014-15 NBA season. Given the number of games that have been played, it would be silly to make predictions or draw conclusions with the same level of confidence that we would have in, say, January. The smaller the sample, the larger the error embedded in your chest-thumping affirmation on television.

You'll notice that this doesn't really stop anyone from getting on their bully pulpit.

One of the advantages of a sport like basketball is that, eventually, the number of games in a season allows us to draw pretty accurate conclusions about the relative strength of a team. This leads to a certain level of assurance in those who write about the sport that is (again, after a certain point in the season) a lot more warranted. Five or six games into the season is not that time.

Here's an illustration of just why this is so.

One of the fun projects I took on in the offseason was to build a database of every NBA game ever played. The graph above shows the standard deviation of Point Margin per game before and after a certain game. I looked at every 82-game season played in the NBA (all non-strike years from 1974-75 onwards) and calculated, for each team, the difference between its Point Margin up to and including game number X and its Point Margin afterwards (PM Delta). For example, after 41 games the standard deviation of this point margin delta is 3.3 points per game over the last ten seasons. After 6 games, that number is 5.3.
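To make the PM Delta calculation concrete, here is a minimal sketch in Python. It assumes each team-season is just a list of per-game point margins in schedule order; the function names and the choice of a population standard deviation are my assumptions for illustration, not necessarily how the graph was built.

```python
from statistics import mean, pstdev

def pm_delta(margins, x):
    """Average margin after game x minus average margin through game x."""
    if not 0 < x < len(margins):
        raise ValueError("x must leave at least one game on each side")
    return mean(margins[x:]) - mean(margins[:x])

def pm_delta_spread(team_seasons, x):
    """Standard deviation of PM Delta across team-seasons at game x.

    Assumption: population standard deviation (pstdev); a sample
    standard deviation would differ slightly.
    """
    return pstdev([pm_delta(margins, x) for margins in team_seasons])

# Toy data (made up): two four-game "seasons", split after game 2.
toy = [[10, 10, 0, 0], [0, 0, 10, 10]]
print(pm_delta_spread(toy, 2))
```

Run against real seasons for each X from 1 to 81, this would trace out a curve like the one described: wide after a handful of games, narrower by midseason.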

Right now, your margin of error is roughly 60% higher than it would be at midseason (5.3 versus 3.3 points per game).

Does that mean we need to recuse ourselves from providing an opinion until such a time as there is enough information to provide it? Nope. Having to provide a usable and useful opinion without actually having all the relevant facts is actually a common problem. Think of your local weatherman -- he wishes he had enough time to get his weather predictions in the 95% range all the time, but he simply lacks the resources or the processing time to get anything above the typical "80% chance of thundershowers" on a typical day.

Data costs money. Sampling and processing take time. I would love to run my models in the real world with an unlimited number of samples, but the reality is that ten or fewer samples might have to satisfy me most of the time. Scientists, engineers and economists all have to make do with the data that is available and draw conclusions that are less than optimal and less certain than we would like.

Given that reality, it's not surprising that there are some well-worn concepts around dealing with the uncertainty brought about by small samples. Let's talk about confidence intervals.

In statistics, a confidence interval is a way to provide an interval estimate of a population parameter. Confidence intervals are meant as a range of values that act as good estimates of the unknown variable (for example, projected wins). The level of confidence of the interval indicates the calculated probability that the range captures the true population parameter given the samples we have handy. Basically, we are able to provide a maximum and minimum value for the number in question based on the observed behavior of the sample.

For example, we could look at a 5-0 NBA team with an average margin of victory of 10 points per game and make a determination as to the maximum and minimum number of wins we would expect at a certain confidence level. In applied practice, confidence intervals are typically stated at the 95% confidence level. In layman's terms, if I said team A would be expected with 95% confidence to win between 40 and 62 games in an 82-game season, I am saying that I expect that if the season were played 1000 times, 950 of them would fall between 40 and 62 wins.

The trick is that we need to have a clue as to what the expected error and variation are. If you were paying attention at the top, you realize that we kind of do. That lets us do some fun things.

If I had a team with a 6-1 record and an 11-point margin of victory, I could build a tool to estimate confidence intervals for their expected win totals. That would look like so:

That purely hypothetical Northern Atlantic dinosaur-themed team would be expected with 95% confidence to win between 45 and 79 games this season.

If I wanted to look at the actual odds of a specific win total for that selfsame team, let's say 48 wins, I'd build a tool like this:

96.3% of the time, that's a tasty over.
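For anyone who wants to play along at home, here is a rough sketch of how an over/under probability like that could be computed. To be clear, this is not the tool itself: it assumes the rest-of-season point margin is normally distributed around the current +11 with the 5.3 standard deviation from the top of the post, and it converts margin to win percentage with a linear rule of thumb (about 2.7 wins per point of margin over 82 games). Both the conversion and the function name are assumptions made for illustration, so the result will land near, but not exactly on, the number above.

```python
from statistics import NormalDist

# Assumed conversion: one point of margin is worth ~2.7 wins per 82 games.
WIN_PCT_PER_POINT = 2.7 / 82

def prob_at_least(target_wins, wins_so_far, games_left, margin, sd):
    """Chance of reaching target_wins, under the assumptions stated above."""
    needed_pct = (target_wins - wins_so_far) / games_left
    if needed_pct <= 0:
        return 1.0  # target already banked
    if needed_pct > 1:
        return 0.0  # not enough games left
    # Rest-of-season margin that corresponds to the needed win percentage.
    needed_margin = (needed_pct - 0.5) / WIN_PCT_PER_POINT
    # Probability that the true rest-of-season margin clears that bar.
    return 1.0 - NormalDist(margin, sd).cdf(needed_margin)

# 6-1 team, +11 margin, 75 games left, line set at 48 wins.
p_over = prob_at_least(48, wins_so_far=6, games_left=75, margin=11.0, sd=5.3)
print(f"{p_over:.1%}")
```

Swapping in a different margin-to-wins conversion, or adjusting for schedule, would move the answer by a point or two either way.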

There are, of course, some more factors to consider when projecting the season (the schedule, for one). We'll cover that in our upcoming rankings (which will of course incorporate our shiny new confidence intervals) a bit later in the week.

-Arturo

But at the same time, what strikes me about your graph at the top is that it takes only about 15 games to get very accurate reads on teams. At that point, you know nearly as much as you will at mid-season. That you can learn so much from n=15 is another sign that the NBA season is far, far longer than it needs to be (in terms of determining true team strengths).

Of course, that doesn't mean your analysis is actually wrong, it's most likely good. And as far as I know, nothing bad has ever happened to someone because they interpreted a confidence interval as having a 95% probability of containing the population parameter.

58 home and home games against inter league games, 8 extra home and home games against division teams, no more conferences, and the season starts at the same point or slightly move it back but you don't want the games compressed together. I think everyone would enjoy that. That could open up for a midseason tournament since they are discussing it.

I get your concern. I also agree with your conclusion :-)

BPS,

Guy gets that right.

Guy,

I wouldn't be opposed to a 58 game (2 times 29) season, an inseason FA like tournament (something like double elimination) and a play in round 1 with more playoff teams.

Whereas, trying to evaluate baseball teams after 15 games would be pointless.

Also, to be fair, a couple of paragraphs down he uses a more strict definition of confidence interval: "In layman's terms, if I said team A would be expected with a 95% confidence to win between 40 and 62 games in an 82 game season I am saying that I expect that is the season were played 1000 times, 950 of them would fall within 40 and 62 wins."

(It's entirely possible, of course, that this simple lawyer with no real stats background is reading that last chart entirely incorrectly.)

In theory, you could just look for their adjusted point margin per game (say here http://bkref.com/tiny/bqp1j) and correct for that.

You could also wait for me to do it in my rankings and Sim :-)

The model here doesn't take any of that into account though. It just sees +11 now, and a +5.3 standard deviation the rest of the way, and without any other information and assumptions of normality it says ~ +0.4 to +21.6 at a 95% CI. Intuitively that upper bound is laughable, but the model doesn't know that and is telling you only what you know based on what was given.
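The arithmetic behind that interval is easy to check. A minimal sketch, assuming only what is stated above (a +11 observed margin, a 5.3 standard deviation, and normality); the function name is made up for illustration:

```python
from statistics import NormalDist

def margin_interval(observed_margin, sd, confidence=0.95):
    """Symmetric normal interval around the observed point margin."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return observed_margin - z * sd, observed_margin + z * sd

low, high = margin_interval(11.0, 5.3)
print(f"{low:+.1f} to {high:+.1f}")
```

With the exact critical value of about 1.96 this gives roughly +0.6 to +21.4; rounding to two standard deviations gives the +0.4 to +21.6 quoted above.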

You'd have to put those population assumptions into the model (equivalent to a Bayesian prior, or through estimates of regularization coefficients if you prefer) if you wanted otherwise.

And the season sim accounts for just that :-)