RPM and a Problem with "Advanced" Stats

From the organization that brought you PER comes RPM! ESPN has decided to pick a "new" "advanced" stat (that's right, I used quotes TWICE) and went with an adjusted plus-minus variant. I have multiple issues with this, but let's get to the most fundamental. Many of us at Boxscore Geeks and Wages of Wins have more academic backgrounds. That means we're pretty big fans of the scientific method. Lots of textbooks cover it, but Richard Feynman summed it up best for me:

  • Guess.
  • Compute the consequences.
  • Compare it to reality.

Fun times! Now, an important part of this process is being able to replicate it. In short, if I have a theory (guess) and I make a model (compute the consequences), then you should be able to do the same thing and verify if I'm right or not (we both compare it to reality). ESPN hopped on the RPM bandwagon quickly. Here's the original post introducing RPM to the world. And here's about as in-depth as it gets to how RPM is calculated:

RPM stats are provided by Jeremias Engelmann in consultation with Steve Ilardi. RPM is based on Engelmann's xRAPM (Regularized Adjusted Plus-Minus). Play-by-play data provided by Basketball-Reference.com.

Right after this post came out, Kevin Pelton followed up with a post – and because this is ESPN, it's hidden behind a paywall – showing the RPM All-Stars. Kostya Medvedovsky had a good question about this:

In short, ESPN is using a model their own analysts don't understand, which is based on very complicated math by some people who have done iffy analysis before. Steve Ilardi was behind APM, which Arturo deconstructed here. And for an outsider, it's even harder to understand. On Twitter after this went up, we were asked about writing a piece on it. OK, well, how does it work? I was told it was similar to RAPM or xRAPM, but even Pelton, who works for ESPN, doesn't know!

The Calculating Wins Produced page reads like it was written by a college professor. It doesn't have the effusive explanations of how it handles the things we know matter in basketball. It doesn't use an example of a player to prove how right it is. It does, however, provide the means to redo the work. RPM does not.

For those who want the "background" on RAPM: this work started with a paper by Joe Sill that was presented at Sloan. The paper was called "Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing", and it won the grand prize. You may notice that the Sloan site no longer has a copy of this up. There is a site up that has RAPM data -- http://stats-for-the-nba.appspot.com -- and a site with players' cumulative xRAPM. Finding a site with explicit how-to instructions was difficult, but I got some Twitter feedback. Here's a description of RAPM from someone who made a boxscore variant of it. Here's an ABPR discussion about it.
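For readers who only want the general flavor of the "regularization" in that paper's title: regularized adjusted plus-minus is commonly described as a ridge regression of a stint's point margin on indicators for who was on the floor. The sketch below shows that textbook idea on made-up data. It is an assumption about the general technique, not ESPN's RPM or anyone's published recipe; the function name `ridge_apm`, the toy stints, and the penalty values are all invented for illustration, and real versions typically weight stints by possessions and use full five-man lineups.

```python
import numpy as np

def ridge_apm(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) beta = X'y.

    Hypothetical helper for illustration only, not ESPN's method.
    """
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)

# Made-up stints: one row per stint, one column per player.
# +1 = on the floor for the home team, -1 = for the away team, 0 = on the bench.
X = np.array([
    [ 1,  1, -1, -1],
    [ 1,  0, -1,  0],
    [ 0,  1,  0, -1],
    [ 1,  1, -1,  0],
], dtype=float)

# Made-up home-minus-away scoring margin (per 100 possessions) for each stint.
y = np.array([10.0, 4.0, -2.0, 8.0])

# A mild and a strong penalty; the penalty size is a modeling choice.
mild = ridge_apm(X, y, lam=1.0)
strong = ridge_apm(X, y, lam=100.0)

print("lam=1:  ", np.round(mild, 3))
print("lam=100:", np.round(strong, 3))
```

The larger penalty pulls every player's rating closer to zero. That choice of penalty, along with the data cleaning and any box-score priors layered on top, is exactly the kind of detail that has to be published before anyone else can reproduce a specific metric.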

And that's what we have: a paper not available online (I do have a copy of it, but as it's not freely available, I give that little credit), a site with the raw numbers, and some discussion threads about it. And of course, none of this is guaranteed to be the same metric used at ESPN. This isn't a data revolution. This isn't a step forward for advanced stats. This is the same mindset as PER. The difference is that it is popular among a group of "advanced statisticians".

As I mentioned, Arturo and I were pinged almost immediately after. Were we going to write a piece on this? Well, my answer now is simple. Send me a vetted (as in ESPN agrees it is the method) set of steps as to how to calculate RPM. Then I'll be happy to look at it and give my opinions. Until then, I see this as more of the common problem in the stats community right now. The more data becomes proprietary, and the more metrics become complicated black boxes, the less we'll advance.