
RPM and a Problem with "Advanced" Stats

From the organization that brought you PER comes RPM! ESPN has decided to pick a "new" "advanced" stat (that's right, I used quotes TWICE) and went with an adjusted plus-minus variant. I have multiple issues with this, but let's start with the most fundamental one. Many of us at Boxscore Geeks and Wages of Wins come from academic backgrounds. That means we're pretty big fans of the scientific method. Lots of textbooks cover it, but Richard Feynman summed it up best for me:

  • Guess.
  • Compute the consequences.
  • Compare it to reality.

Fun times! Now, an important part of this process is being able to replicate it. In short, if I have a theory (guess) and I make a model (compute the consequences), then you should be able to do the same thing and verify if I'm right or not (we both compare it to reality). ESPN hopped on the RPM bandwagon quickly. Here's the original post introducing RPM to the world. And here's about as in-depth as it gets to how RPM is calculated:

RPM stats are provided by Jeremias Engelmann in consultation with Steve Ilardi. RPM is based on Engelmann's xRAPM (Regularized Adjusted Plus-Minus). Play-by-play data provided by

Right after this post came out, Kevin Pelton followed up with a post – and because this is ESPN, it's hidden behind a paywall – showing the RPM All-Stars. Kostya Medvedovsky had a good question about this:

In short, ESPN is using a model their own analysts don't understand, which is based on very complicated math by some people who have done iffy analysis before. Steve Ilardi was behind APM, which Arturo deconstructed here. And as an outsider, it's even harder to understand. After this went up, we were asked on Twitter about writing a piece on it. OK, well, how does it work? I was told it was similar to RAPM or xRAPM, but even Pelton, who works for ESPN, doesn't know!

The Calculating Wins Produced page reads like it was written by a college professor. It doesn't have the effusive explanations of how it handles the things we know matter in basketball. It doesn't use an example of a player to prove how right it is. It does, however, provide the means to redo the work. RPM does not.

For those who want the "background" on RAPM: initially this work was started in a paper, written by Joe Sill, that was presented at Sloan. The paper was called "Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing", and it won the grand prize. You may notice that the Sloan site no longer has a copy of this up. There is a site up that has RAPM data, and a site with players' cumulative xRAPM. Trying to find a site with explicit how-to instructions was difficult, but I got some Twitter feedback. Here's a description of RAPM from someone who made a boxscore variant of it. Here's an APBR discussion about it.
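From those threads, the core mechanic of RAPM (as I understand it, and with no guarantee this matches ESPN's version) is a ridge regression over lineup stints: one column per player, one row per stint, with the regression penalized so ratings shrink toward zero. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical stint data: each row is one stint, one column per player.
# +1 = player on court for one side, -1 = on court for the other side.
players = ["A", "B", "C", "D"]
X = np.array([
    [ 1,  1, -1, -1],   # A & B vs C & D
    [ 1, -1,  1, -1],   # A & C vs B & D
    [ 1, -1, -1,  1],   # A & D vs B & C
    [-1,  1,  1, -1],   # B & C vs A & D
], dtype=float)
# Point margin per 100 possessions for each stint (invented numbers).
y = np.array([8.0, 4.0, 6.0, -2.0])

def rapm(X, y, lam=100.0):
    """Regularized adjusted plus-minus via ridge regression:
    beta = (X'X + lam*I)^-1 X'y. The penalty lam shrinks every
    player's rating toward 0."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

ratings = rapm(X, y, lam=10.0)
for name, r in zip(players, ratings):
    print(f"{name}: {r:+.2f}")
```

Note that the penalty lam is a free parameter: crank it up and every rating collapses toward zero, turn it down and you're back at noisy vanilla APM. Which value gets used, and how it's chosen, is exactly the kind of detail we don't have for RPM.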

And that's what we have: a paper not available online (I do have a copy of it, but as it's not freely available, I give it little credit), a site with the raw numbers, and some discussion threads about it. And of course, this isn't guaranteed to be the same metric used at ESPN. This isn't a data revolution. This isn't a step forward for advanced stats. This is the same mindset as PER. The difference is that it is popular among a group of "advanced statisticians".

As I mentioned, Arturo and I were pinged almost immediately after. Were we going to write a piece on this? Well, my answer now is simple. Send me a vetted (as in ESPN agrees it is the method) set of steps as to how to calculate RPM. Then I'll be happy to look at it and give my opinions. Until then, I see this as more of the common problem in the stats community right now. As more and more data becomes proprietary, and more and more metrics are complicated black boxes, the less we'll advance.

Ok, but how many points is a steal worth?
This is not unusual at all in today's science. Look at climate change - in many cases, both the models AND the data are proprietary. It's black boxes all the way down. Yet everybody still believes climate scientists, because, you know, there's a "consensus" among the scientific "community".
I'm going to write about this a little more, but one of my key problems is that since we have no way of knowing how the numbers were generated, we also have no way of testing the claims of greater predictability. We do not know if, say, the 03-04 data set is fit using 04-05 data to influence the coefficients or the priors. That would be juicing the numbers and would lead to artificially inflated prediction results.

The other problem is that, having tested similar models before, the correlation is very poor. As Dre says, if I can get a method for calculation, I can do the math and test the result. In the interim, I am left with my initial conclusion.
> We do not know if, say, the 03-04 data set is fit using 04-05 data to influence the coefficients or the priors.

This was precisely my thought when I started looking into RPM. The two biggest red flags for me are that there's only 1 year of data for some reason, and there aren't any huge surprise results compared to conventional wisdom. To me, that just smells a bit off. I don't *know* whether they took a bunch of data and fiddled with their formula until they got something that looked right, but they haven't exactly given anyone much of a reason to think otherwise.
I'm pretty sure RPM or xRAPM is just a love child of RAPM and PER (not PER itself, but boxscore weights). Its creator's belief in it seems to be based on better predictive ability than RAPM. But for reasons Arturo has written about, predictive power and RAPM should be taken with a grain of salt. At the least, these guys have given me no reason to trust them. Any APM attempt is destroyed when its creator cannot control his confirmation bias, or is unable to see the small-sample-size and correlation-isn't-causation elephant and gorilla in the room. To put it bluntly, my list of basketball bloggers I trust not to get tripped up by these is a short one.
And it doesn't help that the APBR community is full of guys auditioning to get hired by NBA teams. The NBA has set a precedent where if you show you can calculate your own APM or RAPM or SPM, then post it on the APBR board to oohs and aahs from people who can't see how it's calculated, the chance of getting hired is legit. So what is the incentive, even if subconscious? Is it to make the most sound stat? Or to make it look as predictive and impressive as possible, no matter the cost? I bet some of these guys would give a pinky to get hired by Morey, so what's a little statistical fudging compared to that?

I am not sure you could have picked a worse example to prove your point (which, by the way, I do not concede).
Al_S, what you've said is kind of case-in-point regarding my biggest frustration with climate skeptics: a failure to actually study available data and models. If you had, you'd know that virtually none of it is proprietary. But this is off topic.

With the full intent of piling on... pretty much all of the scholarly research on climate change is publicly available. Patrick more or less said it perfectly.
This is getting a bit off topic, but a good article on the lack of availability of model code and data (a couple of years old now, alas) is here:

In any event, the point is that this is unfortunately not uncommon. That's going to be even more the case when we're talking about a for-profit enterprise like the NBA.
Dre (or anyone else interested), if you want a copy of Joe Sill's original Sloan paper on rapm, I can send it along.

I can't speak as to whether you'll ever get a step-by-step guide on how to calculate RPM, but you could certainly do well to reconstruct it if you were willing to put in some time and effort. It's essentially a RAPM model that uses a blend of box-score and plus-minus data from previous years as a Bayesian prior, with out-of-sample testing to determine weights.
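For what that description is worth, here's a minimal sketch of the prior-anchored ridge idea, with invented numbers; I'm not claiming these are the actual RPM mechanics:

```python
import numpy as np

# Ordinary RAPM shrinks every rating toward 0; the xRAPM-style variant
# described above instead shrinks toward a box-score-based prior.
# Minimizing ||Xb - y||^2 + lam * ||b - prior||^2 gives
#   b = (X'X + lam*I)^-1 (X'y + lam*prior).

def rapm_with_prior(X, y, prior, lam=100.0):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n),
                           X.T @ y + lam * prior)

X = np.array([[1, -1], [1, -1], [-1, 1]], dtype=float)  # 2 players, 3 stints
y = np.array([5.0, 3.0, -4.0])                          # stint margins
prior = np.array([2.0, -1.0])    # pretend box-score-based estimates

# The penalty controls how much the play-by-play data can pull a player
# away from the prior; a huge lam returns the prior almost unchanged.
print(rapm_with_prior(X, y, prior, lam=1e8))
```

Even in this two-player toy, everything hinges on how the prior is built and how lam is chosen, and those are precisely the undisclosed parts.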
I have the paper (which I did mention at the bottom of the post), my concern is its lack of availability.

And no, I couldn't recreate RPM with time and effort, as I don't know how it's calculated. I have people (not Ilardi or Engelmann) sending me links to forum threads that may be how it's done. There's no "essentially" here. To our point about Sloan not being academic: if you tried to submit a paper to any legitimate conference linking a forum on the net and saying "essentially that, kinda, we don't know," it would have no hope of being accepted.
I think the majority of consumers of "advanced stats" are interested in trivia rather than statistics. ESPN doesn't care what goes into the box as long as sports media product keeps coming out of it.
"The Calculating Wins Produced page reads like it was written by a college professor. It doesn't have the effusive explanations of how it handles the things we know matter in basketball. It doesn't use an example of a player to prove how right it is. It does, however, provide the means to redo the work. RPM does not."

No, you don't have a full explanation either. I tried to calculate it step-by-step but many details are missing.

"This isn't a step forward for advanced stats. This is the same mindset as PER."

What does that even mean? They were developed in entirely different ways by different people with a different means of evaluation.

RPM *does* use the scientific method: it's actually tested out of sample. What matters, then, is how well the metric can explain point differential in new situations that the data have not seen.

What's the consequence? It's a powerful tool. RAPM destroyed the Wins Produced metric in predicting 2013 wins from players' 2012 values. I believe it's doing the same again this season (and it did better over about a decade when Neil Paine looked at it).

Unless it doesn't matter that a metric is a lot better when predicting out of sample data sets?
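For concreteness, the kind of out-of-sample comparison described above looks something like this toy sketch (all numbers invented, not actual results for any real metric):

```python
import numpy as np

# Two hypothetical metrics are scored by how well their season-N values
# line up with season-N+1 results that neither fit ever saw.
rng = np.random.default_rng(1)

strength = rng.normal(0, 4, size=300)             # underlying team strength
margin_next = strength + rng.normal(0, 2, 300)    # next season's point margin

metric_a = strength + rng.normal(0, 1, 300)       # tracks strength closely
metric_b = strength + rng.normal(0, 5, 300)       # much noisier metric

def oos_r2(pred, actual):
    """Squared correlation between this-season values and next-season results."""
    return float(np.corrcoef(pred, actual)[0, 1] ** 2)

print(f"metric A out-of-sample r^2: {oos_r2(metric_a, margin_next):.2f}")
print(f"metric B out-of-sample r^2: {oos_r2(metric_b, margin_next):.2f}")
```

The comparison itself is easy; the sticking point in this whole thread is that you can only trust the reported r-squared if you can verify the "never saw" part.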
I've always wondered, how does +/- count FTs? Often, teams substitute a player between FTs. If it's a make, why should that count against the current lineup? And if it's a miss, does the rebound count but not the miss?

Of the matchup data that I've seen, the free throws are always assigned to the units on the court when the foul happened.
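That convention can be sketched in code; the event format below is made up for illustration, not any real play-by-play schema:

```python
# Free throws are charged to the lineup on the floor when the foul
# occurred, even if a substitution happens between attempts. (A real
# parser would also reset the frozen lineup once the trip to the line
# ends; this sketch skips that.)

def attribute_points(events):
    """events: list of (kind, lineup, points) tuples in game order.
    kind is 'foul', 'ft', or 'fg'; lineup is a frozenset of player ids.
    Returns total points credited to each lineup."""
    credited = {}
    foul_lineup = None
    for kind, lineup, points in events:
        if kind == "foul":
            foul_lineup = lineup          # freeze the lineup for the FTs
        elif kind == "ft":
            target = foul_lineup if foul_lineup is not None else lineup
            credited[target] = credited.get(target, 0) + points
        else:                             # field goals go to the current lineup
            credited[lineup] = credited.get(lineup, 0) + points
    return credited

before_sub = frozenset({"A", "B", "C", "D", "E"})
after_sub = frozenset({"A", "B", "C", "D", "F"})  # E subbed out between FTs
events = [
    ("foul", before_sub, 0),
    ("ft", before_sub, 1),   # first free throw
    ("ft", after_sub, 1),    # second free throw, after the substitution
]
print(attribute_points(events))  # both points charged to the pre-sub lineup
```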
Essentially, I agree with everything Justin said and would love to see a counter argument.

Also, for what it's worth, Ilardi is a renowned academic himself. That doesn't automatically make someone good or bad at stats (and the same goes for you all and Berri), but I figured I'd mention it since you made the point that the WP community is "academic."
