From the organization that brought you PER comes RPM! ESPN has decided to pick a "new" "advanced" stat (that's right, I used quotes TWICE) and has gone with an adjusted plus-minus variant. I have multiple issues with this, but let's get to the most fundamental. Many of us at Boxscore Geeks and Wages of Wins have more academic backgrounds. That means we're pretty big fans of the scientific method. Lots of textbooks will have this, but Richard Feynman summed it up best for me.

- Guess
- Compute the consequences.
- Compare it to reality.

Fun times! Now, an important part of this process is being able to replicate it. In short, if I have a theory (guess) and I make a model (compute the consequences), then you should be able to do the same thing and verify if I'm right or not (we both compare it to reality). ESPN hopped on the RPM bandwagon quickly. Here's the original post introducing RPM to the world. And here's about as in-depth as it gets to how RPM is calculated:

RPM stats are provided by Jeremias Engelmann in consultation with Steve Ilardi. RPM is based on Engelmann's xRAPM (Regularized Adjusted Plus-Minus). Play-by-play data provided by Basketball-Reference.com.

Right after this post came out, Kevin Pelton followed up with a post – and because this is ESPN, it's hidden behind a paywall – showing the RPM All-Stars. Kostya Medvedovsky had a good question about this:

@kmedved I'm not capable of speaking for them about the evolution from those versions of RAPM to real plus-minus.

— Kevin Pelton (@kpelton) April 7, 2014

In short, ESPN is using a model their own analysts don't understand, which is based on very complicated math by some people who have done iffy analysis before. Steve Ilardi was behind APM, which Arturo deconstructed here. And as an outsider, it's even harder to understand. On Twitter after this went up, we were asked about writing a piece on it. Ok, well, how does it work? I was told it was similar to RAPM or xRAPM, but even Pelton, who works for ESPN, doesn't know!

The Calculating Wins Produced page reads like it was written by a college professor. It doesn't have effusive explanations of how it handles the things we know matter in basketball. It doesn't use an example of a player to prove how right it is. It does, however, provide the means to redo the work. RPM does not.

For those who want the "background" on RAPM: initially this work was started in a paper, written by Joe Sill, that was presented at Sloan. The paper was called "Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing", and it won the grand prize. You may notice that the Sloan site no longer has a copy of this up. There is a site up that has RAPM data -- http://stats-for-the-nba.appspot.com -- and a site with players' cumulative xRAPM. Trying to look for a site with explicit how-to instructions was difficult, but I got some Twitter feedback. Here's a description of RAPM from someone who made a boxscore variant of it. Here's an ABPR discussion about it.

And that's what we have, a paper not available on-line (I do have a copy of it, but as it's not freely available, I give that little credit), a site with the raw numbers, and some discussion threads about it. And of course, this isn't guaranteed to be the same metric used at ESPN. This isn't a data revolution. This isn't a step forward for advanced stats. This is the same mindset as PER. The difference is it is popular among a group of "advanced statisticians".

As I mentioned, Arturo and I were pinged almost immediately after. Were we going to write a piece on this? Well, my answer now is simple. Send me a vetted (as in ESPN agrees it is the method) set of steps as to how to calculate RPM. Then I'll be happy to look at it and give my opinions. Until then, I see this as more of the common problem in the stats community right now. The more data becomes proprietary, and the more metrics become complicated black boxes, the less we'll advance.

The other problem is that when I have tested similar models before, the correlation was very poor. As Dre says, if I can get a method for calculation, I can do the math and test the result. In the interim, I am left with my initial conclusion.

This was precisely my thought when I started looking into RPM. The two biggest red flags for me are that there's only 1 year of data for some reason, and there aren't any huge surprise results compared to conventional wisdom. To me, that just smells a bit off. I don't *know* whether they took a bunch of data and fiddled with their formula until they got something that looked right, but they haven't exactly given anyone much of a reason to think otherwise.

I am not sure you could have picked a worse example to prove your point (which, by the way, I do not concede).

With the full intent of piling on... pretty much all of the scholarly research on climate change is publicly available. Patrick more or less said it perfectly.

http://www.nature.com/nclimate/journal/v1/n1/full/nclimate1057.html

In any event, the point is that this is unfortunately not uncommon. That's going to be even more the case where we're talking about a for-profit enterprise like the NBA.

I can't speak as to whether you'll ever get a step-by-step guide on how to calculate RPM, but you could certainly do well to reconstruct it if you were willing to put in some time and effort. It's essentially a RAPM model that uses a blend of box score and plus-minus data from previous years as a Bayesian prior, and out-of-sample testing to determine weights.
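For readers wondering what "a RAPM model with a Bayesian prior" could look like in practice, here is a minimal sketch, not ESPN's method. It assumes plain ridge regression on stint data, shrunk toward a per-player prior instead of toward zero. The function name `ridge_with_prior`, the toy stint matrix, the prior values, and the penalty `lam` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_with_prior(X, y, prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - prior||^2 over b."""
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    # Solve for the adjustment away from the prior, then add the prior back.
    return prior + np.linalg.solve(A, X.T @ (y - X @ prior))

# Hypothetical toy data, not real NBA stints.
# Rows are stints; +1 = on court for the home team, -1 = away, 0 = on the bench.
rng = np.random.default_rng(0)
n_stints, n_players = 50, 4
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players))
true_value = np.array([3.0, -1.0, 0.5, -2.5])
# Noisy point margin for each stint.
y = X @ true_value + rng.normal(0.0, 5.0, size=n_stints)

# Assumed prior, standing in for a blend of box score and past plus-minus data.
prior = np.array([2.0, 0.0, 0.0, -2.0])

ratings = ridge_with_prior(X, y, prior, lam=20.0)
print(ratings)
```

A large `lam` pins every rating to its prior, and `lam = 0` reduces to ordinary least squares. How the prior is built and how `lam` is chosen are exactly the details a published method would need to state.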

I have the paper (which I did mention at the bottom of the post); my concern is its lack of availability.

And no, I couldn't recreate RPM if I had time and effort, as I don't know how it's calculated. I have people (not Ilardi or Engelmann) sending me links to forum threads that may be how it's done. There's no "essentially" here. To our Sloan not being academic point: if you tried to submit a paper to any legitimate conference linking a forum on the net and said "Essentially that, kinda, we don't know," it would have no hope of being accepted.

No, you don't have a full explanation either. I tried to calculate it step-by-step but many details are missing.

"This isn't a step forward for advanced stats. This is the same mindset as PER."

What does that even mean? They were developed in entirely different ways by different people with a different means of evaluation.

RPM *does* use the scientific method: it's actually tested out of sample. What matters, then, is how well the metric can explain point differential in new situations that the data have not seen.
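To make "tested out of sample" concrete, here is a rough sketch of the general idea, again not the actual RPM procedure. It assumes a plain ridge model fit on one set of stints and scored by RMSE on stints the fit never saw. The helper names, the synthetic data, and the candidate penalties are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression: solve (X'X + lam*I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def holdout_rmse(X_train, y_train, X_test, y_test, lam):
    """Fit on the training stints, report RMSE on the held-out stints."""
    beta = fit_ridge(X_train, y_train, lam)
    resid = y_test - X_test @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

# Hypothetical synthetic stints, not real play-by-play data.
rng = np.random.default_rng(0)
n_stints, n_players = 200, 10
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players))
true_beta = rng.normal(0.0, 2.0, size=n_players)
y = X @ true_beta + rng.normal(0.0, 10.0, size=n_stints)

# The first 150 stints train the model; the last 50 are never seen by the fit.
train, test = slice(0, 150), slice(150, 200)
lams = [0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: holdout_rmse(X[train], y[train], X[test], y[test], lam)
          for lam in lams}
best_lam = min(scores, key=scores.get)
print(scores, best_lam)
```

The penalty with the lowest held-out error is the one that gets picked. Anyone holding the same data and the same steps could rerun this and check the claim, which is the replication point the post is making.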

What's the consequence? It's a powerful tool. RAPM destroyed the Wins Produced metric in predicting 2013 wins from the player's 2012 values. I believe it's doing the same again this season (and did do better over like a decade when Neil Paine looked at it.)

Unless it doesn't matter that a metric is a lot better when predicting out of sample data sets?

Of the match-up data that I've seen, the free-throws always get assigned to the units on the court when the foul happened.

Also, for what it's worth, Ilardi is a well-renowned academic himself. That doesn't automatically make someone good or bad at stats (and the same goes for you all and Berri), but I figured I'd mention it as you made the point that the WP community is "academic."