The Geeks at Sloan, Part 2: The Data Should be Free

For the second year in a row, the Box Score Geeks went to Sloan! Click here to read part one.

“My concern is that we’re at the end of a golden age; there’s real value to having the crowdsourcing model. You have this large community that’s passionate about sports. It’s important for leagues to tap into this energy. [SSAC] shows there’s demand for this.” - Arturo Galletti

That's a quote from an interview I had with Peter Dizikes for MIT News (click here to read the article), hitting on what I see as the key issue facing sports analytics going forward: who controls the data? Seven of the eight research papers presented at the conference were prepared using proprietary data.

Why is this a problem? If you think about some of the seminal figures in the sports analytics movement (Bill James being the classic example), the outsider working on publicly available data is a key player in the process. By not having a stake, the outsider is able to go where the insider wouldn't. There are also the advantages that come with crowdsourcing (which is something that we take advantage of constantly). I understand why teams want to put up some walls, but I feel that doing so significantly hinders progress. It also leads to another problem for teams: they are simply not prepared to pay the market rate for work that they've been getting for free. Once they restrict the flow of information to the highly motivated, highly prepared, and highly passionate analysts who are out there doing analytics pro bono, they will be forced to rely on their own in-house teams and academia. To illustrate the problem with this, allow me to paraphrase Bryan Colangelo: "Teams should be able to find $250,000 to spend on analytics." Mr. Colangelo: $250,000 will not get you very far.

The other concern is a journalistic and scientific one. As the data goes into silos, we won't be able to test and replicate results. Transparency and accountability will be harder to come by. I take my journalistic and scientific work very seriously. I take the time to evaluate and analyze what teams do, and the barriers going up make this work harder. There is a lesson to be learned from the NFL here: the NFL has actually made as much of its information public as possible. If you want access to every single shred of tape the NFL has, you simply need to pony up and pay for it. I'm not arguing that professional sports leagues or teams need to give away their information – if they sold access at reasonable prices I'd be willing to pay for it – but there needs to be a way for outsiders to gain access to it.

The first day of the conference started with an all-star panel featuring Bill James, Daryl Morey, Nate Silver, Kevin Kelley (coach of no-punting Pulaski Academy), and George Karl. This was a fun and informative panel, even if some of the panelists looked a little like they got up on the wrong side of the bed (I won't name names, but one of them had bed hair). There were so many nice little tidbits.

For one, everyone seemed to be bewildered by the state of analytics in the NFL. The feeling was that the NFL was moving one step forward and two steps back. Nate Silver summed it up nicely: the NFL has little incentive to innovate because it has such a profitable and popular product. When it comes to football statistics, high schools and colleges are more likely to innovate than the NFL. Brian Burke even built them a nice, publicly available tool for working out fourth-down probabilities, and no one seems to be using it. Bill James and Daryl Morey don't get playcalling on fourth down either. Kevin Kelley loves the fourth down bot. And I quote: "Going for it on fourth down makes us more likely to score." Simple logic, no? Charming, smart, and innovative, Kevin Kelley was the star of a panel with Daryl Morey, Bill James, and Nate Silver. Seriously, why is this guy not coaching a major college or an NFL team again?

Bill James hit on some crucial points. "A gimmick's simply an innovation so ahead of its time that there's no foundation of knowing whether it's going to work." In essence, Bill hit on the key fact that, in general, innovation is an outsider's game. He had nothing to lose when he started and no stake in the establishment, so he was willing to take the road less traveled and arrive at a novel conclusion. Teams and institutions are always pulled to dwell in the past, and this is doubly so in baseball. Major League Baseball needs to innovate and fix the perception that baseball is a boring sport. This is supremely hard for baseball because of who they are ("people still upset about the DH rule. It's been 41 years – get over it"). Oh, and Bill James wants to measure player potential statistically. I love Bill James.

George Karl was curmudgeonly. He said that, in the NBA, the best team and the worst team aren't that far apart, and then devolved into coach speak ("teams that know how to win") while making googly eyes at GMs in the room (Hollinger: protect your maidenly virtue!). Yeah, GK, that's not right. I did like his suggestion of a single-elimination tournament after the All-Star break. And his impassioned plea to the statheads to look for a formula to explain love had me picturing a basketball Dumbledore ("it's love, Andres..."). There is something sublime about sitting next to Andres Alvarez while you hear George Karl say that he was just trying to play his best players.

Nate Silver did get over his bad hair day and made some very nice points as well, like how statistics show that, of all the major North American sports leagues, the NBA's champ is the closest to being the best team in the league (I've done the math and agree wholeheartedly). Oh, and Silver also mentioned one of my favourite observations about pro sports: that the typically capitalist US has a socialist sports model, whereas typically socialist Europe has a capitalist sports model. Looking forward to the new website, Nate (call me :-) ).

Daryl Morey was the first to bring up what would be a recurring theme of the conference: getting rid of the marginal incentive to lose. We've talked tanking to death here, so I won't bore you too much. Suffice it to say, a significant change appears to be coming. Mike Zarren's draft wheel is on its way, and it appears that the league is working out the finer points. Be prepared for at least ten thousand words from me on this subject during the summer.

Stay tuned for Part 3: Lies, Damn Lies, and Statistics