The All-Singing, All-Dancing Boxscore Geeks 2013-14 NCAA Tournament Bracket Buster

I've done this dance for you before. In 2012, I took some existing models and made a nice and quick cheat sheet for our readers. Last year, I added a model built around Wins Produced and some knowledge gained from projection of NBA games.

This year, I went whole hog. I downloaded every game and built a nicely tuned projection model using actual game data for this season (Division 1 teams only) and tournament data from the past three years. I think the results are very good. To illustrate, I was able to shave about a point and a half off the individual game projection error for the season I used for tuning.

As always, a huge shout out goes to College Basketball Reference for compiling all the data in one place and to Ken Pomeroy for providing a great, great measuring stick and a place to validate my assumptions.

Let's get you started with the play-in games:

School              Region    Tourney Rk  Grid Order  Point Margin  Win Prob. vs. Opponent
Cal Poly            1Midwest  16          2           -0.2          60.7%
Texas Southern      1Midwest  16          2           -4.1          39.3%
North Carolina St.  1Midwest  12          6            6.8          42.3%
Xavier              1Midwest  12          6            9.6          57.7%
Iowa                1Midwest  11          10          15.6          54.4%
Tennessee           1Midwest  11          10          14.0          45.6%
Albany              3South    16          2           -1.3          54.4%
Mount St. Mary's    3South    16          2           -2.9          45.6%

I feel like putting up the Power Rank next:

This table was built by taking the point margins for every game and the KenPom numbers for 2014 and converting them to a projected margin of victory in the tournament versus an average NCAA team. I did this three ways: using the KenPom numbers, using straight point margin numbers (again, only Division 1 games), and doing a composite model. In all three cases, I adjusted for pace, schedule, and opponents as required, and I then adjusted for depth (i.e. shorter tournament rotations, as per the Half Baked Notion) and injuries using individual player Win Score numbers.

So that is your Tournament Power Rank, with everyone’s expected point margin against average competition. In simple terms: you want to project the margin for a game? Look up the teams' point margins and work out the difference using simple subtraction.
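The subtraction can be sketched in a few lines of Python. This is only an illustration: the ratings are the point margins listed for four of the play-in teams above, and the names `power_rank` and `projected_margin` are made up for the example.

```python
# Expected point margin vs. an average team, copied from the play-in table.
power_rank = {
    "Iowa": 15.6,
    "Tennessee": 14.0,
    "Xavier": 9.6,
    "North Carolina St.": 6.8,
}

def projected_margin(team, opponent, ratings=power_rank):
    """Projected margin for `team` over `opponent` (negative means underdog)."""
    return ratings[team] - ratings[opponent]

# Iowa vs. Tennessee: 15.6 - 14.0, i.e. Iowa by about a point and a half.
print(round(projected_margin("Iowa", "Tennessee"), 1))
```

Swapping the arguments flips the sign, so the same lookup gives the projection from either team's point of view.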

Now, as mentioned, I have a particular and unique method that I use for working out the win probabilities. This year I actually took the time to refine it with actual NCAA data. I used that method to project the win probability for every matchup (which I give to you here as a Poster).

So what are the team-by-team odds from the composite simulation?

The model thinks that Louisville has been seriously disrespected. Louisville is the favourite to win in the Midwest, but other than that, the model likes the number one seeds to win in the other three regions. As for Cinderellas? Pittsburgh, Iowa, OK St., Tennessee, Stanford, and Harvard (fair Harvard) are the bets to make if you're feeling lucky.

Oh, and you might like to have everything as a spreadsheet. You can see the other simulations in the sheet as well.

And don't worry, there is more to come. Feel free to make suggestions; I'll work in what I can.

Have fun and good luck!

Some longshots with interesting value:

Tennessee 200-1
Pittsburgh 100-1
Creighton 40-1
UConn 95-1
Cinci 95-1
Are any of the box score geeks doing a billion dollar bracket?
Forgive me for possibly asking a stupid question, but why are you adjusting KenPom's numbers for schedule, pace, and opponents? Doesn't he already include that in his ratings? Additionally, how much more accurate is the model by including the non-KenPom point margins? Aren't KenPom's ratings (inputs are adjusted offensive and defensive efficiencies) more predictive than using just plain old point differential?
I will be doing multiple brackets which I will publish using the Models.

Patience. ESPN and Yahoo close on the 20th
One last question.

"Now, as mentioned, I have a particular and unique method that I use for working out the win probabilities."

How is this method better than the log5 formula KenPom uses for win probabilities (with a 10.25 exponent, according to his site)?
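For readers who have not met log5, here is a rough Python sketch of the approach the question refers to. It assumes the 10.25 exponent is applied to a Pythagorean-style rating built from adjusted offensive and defensive efficiency, and that log5 then compares the two ratings. The efficiencies are invented, and this is not the method used for the projections in this post.

```python
def pythag(adj_o, adj_d, exponent=10.25):
    """Pythagorean-style expected win rate from offensive/defensive efficiency."""
    return adj_o ** exponent / (adj_o ** exponent + adj_d ** exponent)

def log5(a, b):
    """Bill James's log5: chance a team rated `a` beats a team rated `b`."""
    return (a - a * b) / (a + b - 2 * a * b)

# Hypothetical efficiencies (points per 100 possessions), not real teams.
strong = pythag(115.0, 95.0)
decent = pythag(105.0, 100.0)
print(f"P(strong beats decent) = {log5(strong, decent):.3f}")
```

Two equally rated teams come out at 50% each, and the two sides of any matchup always add up to 100%.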

Poochman,
I probably wasn't clear. I use KenPom's AdjO and AdjD numbers for each team (which are per 100 possessions). I then adjust to average pace (about 67) and then put in an adjustment for projected changes in rotation due to injury and shorter rotations (using WP and WinScore).

Oh, and I am not using any non division 1 games.
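One plausible reading of the pace step in that reply, sketched in Python: take the per-100-possession efficiency gap and scale it to a 67-possession game. The efficiencies below are hypothetical, the function name is made up, and the rotation and injury adjustments are not modelled here.

```python
AVERAGE_PACE = 67.0  # possessions per game, the "about 67" from the reply

def margin_at_average_pace(adj_o, adj_d, pace=AVERAGE_PACE):
    """Convert a per-100-possession efficiency gap into points per game."""
    return (adj_o - adj_d) * pace / 100.0

# Hypothetical team: scores 115 and allows 95 per 100 possessions.
# A 20-point gap per 100 possessions is 13.4 points over 67 possessions.
print(round(margin_at_average_pace(115.0, 95.0), 1))
```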
Poochman,
I use a different method than log5 for working out predicted PM and Win %. It does well in general and I retroactively tested it on the data. I use the same method for NBA as well.

Don't worry. I will put up some straight versions of the Models as well.
By your numbers, it seems that there is a 30% chance that a 16 seed will knock off a 1 seed this year (1 - (.904)(.923)(.924)(.913)). That seems awfully high to me (considering it hasn't ever happened). Are this year's 1 seeds particularly weak, or is something off with the model?
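The arithmetic in that question can be checked with a short Python sketch. It uses the four 1-seed win probabilities quoted in the comment and assumes the four games are independent; the function name is made up for the example.

```python
from math import prod

# First-round win probabilities for the four 1 seeds, as quoted in the comment.
one_seed_win_probs = [0.904, 0.923, 0.924, 0.913]

def prob_at_least_one_upset(favorite_win_probs):
    """1 minus the chance that every favorite wins (games assumed independent)."""
    return 1.0 - prod(favorite_win_probs)

print(f"{prob_at_least_one_upset(one_seed_win_probs):.1%}")
```

The product of the four favorites' chances is about 0.704, so the complement is just under 30%, which matches the figure in the comment.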
Lance,
I checked the numbers. Those 16 seeds are in the ballpark (particularly Weber State). Given how the talent gap has closed, we're kind of overdue for a 16-over-1 upset.
I gathered all of the moneyline betting odds from Las Vegas and added the EV of most of the first round games into a spreadsheet:

https://docs.google.com/file/d/0ByUIUqLzGhwhaC1jRkVKWHJEeEU/edit

Looks like there's a lot of value betting on the underdog moneylines in the first round.
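For anyone who wants to reproduce that kind of EV column, a minimal Python sketch follows. It assumes American-style moneylines and a model win probability for the side being bet. The +300 line and 30% probability are invented for illustration and are not taken from the spreadsheet.

```python
def moneyline_profit(moneyline, stake=100.0):
    """Profit on a winning bet at an American moneyline (+300, -150, ...)."""
    if moneyline == 0:
        raise ValueError("a moneyline of 0 is not a valid price")
    if moneyline > 0:
        return stake * moneyline / 100.0
    return stake * 100.0 / abs(moneyline)

def expected_value(win_prob, moneyline, stake=100.0):
    """Expected profit of the bet: win_prob * profit - (1 - win_prob) * stake."""
    return win_prob * moneyline_profit(moneyline, stake) - (1.0 - win_prob) * stake

# Hypothetical underdog priced at +300 that the model gives a 30% chance.
print(round(expected_value(0.30, 300), 2))
```

A positive result means the model thinks the price is generous. At +300 the break-even win probability is 25%, so anything above that shows up as value.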

85.7% chance a 1 seed is eliminated by the end of the 2nd round!
Arturo,

I am sure you guys have checked out Nate Silver's new analytics-based journalism site, fivethirtyeight.com.

I used Silver's interactive bracket to fill out my personal bracket. Essentially he did a similar projection to yours. Probably for no one's interest except my own, I posted Silver's projections next to yours.

My biggest takeaways are your substantially higher likelihood of upsets in general. Your top seeded teams have much lower chances of getting through the first round or two than Silver's do.

The middle-of-the-pack seeds (6-8) are on par with Silver's.

At the lower seeds, you have the 10, 11, and 12 seeds on average close to three times more likely to pull off an upset than Silver does (not saying it's your error or his, just an observation). Your 14, 15, and 16 seeds are as much as 7 times more likely to upset their opponents than Silver's 14, 15, and 16 seeds.

Your numbers are giving very reasonable chance for these high (13,14,15 and 16) seeds to even win in the second round, another distinction.

You also highlighted the Final Four on your table. You have a substantially more distributed chance for many teams to make the Final Four. Silver's Final Four odds give a heavy majority of the available chance to the top seeded teams and very little to the middle seeds. One example of this is Silver's 3% chance of Pittsburgh making the Final Four, where you have 8.8%.

I put a sum of the "rounded percentages" beneath each team's championship odds. This sum gives an estimate of the team's total expected wins in the tournament. For example, the sum of Silver's percentages on Florida is 326%, which estimates that Florida will win 3.26 games, his highest total of estimated wins. This may serve as another way to look at power rankings.

Your top 4 teams in estimated total wins:
1. Florida 2.49 wins
2. Arizona 2.48 wins
3. Virginia 2.44 wins
4. Louisville 2.43 wins
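The "estimated wins" figure is just the sum of a team's chances of winning each round. A small Python sketch, using Florida's round-by-round figures from the comparison list in this comment (the variable and function names are made up):

```python
# Florida's chance of winning each successive round, in percent.
florida_silver = [99, 84, 62, 41, 26, 14]
florida_arturo = [92.3, 62.2, 41.1, 27.3, 16.7, 9.2]

def expected_wins(round_win_pcts):
    """Sum of per-round win chances (in percent), expressed in games."""
    return sum(round_win_pcts) / 100.0

print(round(expected_wins(florida_silver), 2))  # Silver's Florida total
print(round(expected_wins(florida_arturo), 2))  # Arturo's Florida total
```

This works because a team's win total is the number of rounds it wins, and the expected value of that count is the sum of the probabilities of winning each round.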

Midwest 4 Louisville √
Silver-Arturo
93%-81.4%
78%-63.6%
54%-41.0%
38%-27.6%
24%-17.5%
15%-11.4%
3.02wins-2.43wins
South 1 Florida √
99%-92.3%
84%-62.2%
62%-41.1%
41%-27.3%
26%-16.7%
14%-9.2%
3.26wins-2.49wins
West 1 Arizona √
98%-91.4%
73%-59.9%
58%-43.5%
42%-27.7%
23%-16%
13%-9.9%
3.07wins-2.48wins
South 2 Kansas √
92%-78.7%
67%-45.6%
42%-24.2%
21%-10.9%
12%-5.2%
6%-2.2%
2.4wins-1.67wins
East 1 Virginia √
96%-92.4%
71%-65.7%
39%-39.9%
23%-24.5%
12%-14.1%
6%-7.4%
2.47wins-2.44wins
East 4 Michigan State √
91%-80.3%
66%-48.3%
39%-25.6%
24%-14.6%
12%-7.8%
6%-3.7%
2.39wins-1.80wins
Midwest 3 Duke √
93%-78.8%
70%-48.3%
43%-30.7%
18%-15.3%
9%-8.3%
5%-4.6%
2.38wins-1.86wins
Midwest 1 Wichita State √
98%-90.5%
59%-58.4%
24%-30.7%
14%-18.2%
8%-10.1%
5%-5.7%
2.08wins-2.14wins
East 2 Villanova √
95%-87%
64%-55.3%
41%-34.9%
21%-18.9%
9%-10.3%
4%-5%
2.34wins-2.07wins
West 2 Wisconsin √
93%-78.8%
72%-49.3%
40%-26.2%
16%-13.1%
7%-6.2%
3%-3.1%
2.31wins-1.78wins
Midwest 2 Michigan √
95%-85.9%
74%-54.8%
37%-27.3%
14%-11.8%
6%-5.6%
3%-2.7%
2.29wins-1.88wins
West 3 Creighton √
88%-84.5%
54%-57.9%
30%-36.1%
12%-20%
5%-10.5%
3%-5.9%
1.92wins-2.15wins
Midwest 8 Kentucky √
74%-61.6%
34%-27%
14%-11.6%
8%-5.7%
4%-2.6%
2%-1.2%
1.36wins-1.10wins
South 6 Ohio State √
75%-62.4%
40%-35.2%
20%-19.8%
9%-9.1%
4%-4.4%
2%-1.9%
1.5wins-1.33wins
South 4 UCLA √
87%-69%
63%-38.5%
19%-17.1%
8%-9.4%
4%-4.6%
1%-2%
1.82wins-1.41wins
South 3 Syracuse √
88%-77.3%
50%-42.1%
22%-23.5%
9%-10.8%
4%-5.1%
1%-2.2%
1.76wins-1.61wins
East 3 Iowa State √
81%-68.9%
46%-39.4%
20%-19.3%
8%-9%
3%-4.2%
1%-1.7%
1.59wins-1.43wins
West 9 Oklahoma State √
52%-53.1%
14%-21.2%
9%-12.6%
5%-6.3%
2%-2.8%
1%-1.3%
.83wins-.97wins
East 6 North Carolina √
68%-55.9%
36%-28.7%
16%-12.7%
6%-5.3%
2%-2.3%
1.28wins-1.05wins
West 6 Baylor √
70%-59.2%
34%-24.2%
14%-11.5%
4%-4.7%
2%-1.8%
1.24wins-1.01wins
South 5 Virginia Commonwealth √
76%-66.7%
29%-37.1%
10%-16.4%
5%-9%
2%-4.4%
1.21wins-1.36wins
West 4 San Diego State √
75%-64%
43%-37.1%
14%-14.4%
8%-6.6%
2%-2.7%
1.39wins-1.26wins
East 7 Connecticut √
67%-61.7%
26%-28.3%
14%-15.3%
6%-6.9%
2%-3.1%
1.15wins-1.17wins
West 8 Gonzaga √
48%-46.9%
13%-17.4%
8%-9.8%
4%-4.6%
1%-1.9%
.74wins-.81wins
South 7 New Mexico √
64%-52.9%
22%-26.4%
10%-12.3%
3%-4.8%
1%-2%
1wins-.98wins
West 5 Oklahoma √
64%-60.2%
33%-30.8%
8%-10.8%
3%-4.6%
1%-1.7%
1.09wins-1.08wins
South 9 Pittsburgh √
72%-70.3%
14%-29.3%
7%-16%
3%-8.8%
1%-4.3%
.97wins-1.3wins
East 5 Cincinnati √
58%-56.7%
20%-27.8%
8%-12.8%
3%-6.4%
1%-2.9%
.89wins-1.08wins
West 7 Oregon √
65%-63.2%
19%-30.7%
10%-13.9%
4%-5.9%
1%-2.3%
.99wins-1.17wins
Midwest 11b Tennessee
52%-46% *play-in game (essentially divides win% by 2)
36%-29.4%
12%-14.1%
6%-7.8%
2%-3.3%
1.04wins (sum of Tenn+Iowa)-1.34wins
Midwest 11a Iowa
48%-54% *play-in game (essentially divides win% by 2)
32%-37.2%
9%-19.4%
5%-11.6%
2%-5.3%
1.04wins (sum of Tenn+Iowa)-1.34wins
East 8 Memphis √
55%-50.7%
17%-16.8%
6%-6.6%
2%-2.7%
.80wins-.78wins
Midwest 5 Saint Louis √
58%-54.7%
12%-16.1%
4%-5.9%
2%-2.3%
.76wins-.79wins
East 12 Harvard √
42%-43.3%
12%-18.5%
4%-7.4%
2%-3.2%
.60wins-.74wins
East 9 George Washington √
45%-49.3%
12%-16%
4%-6.2%
1%-2.5%
.62wins-.74wins
Midwest 7 Texas √
50%-49.7%
13%-20.7%
3%-7.5%
.66wins-.80wins
East 11 Providence √
32%-44.1%
12%-20.1%
4%-7.8%
1%-2.8%
.49wins-.76wins
South 10 Stanford √
36%-47.1%
9%-22.2%
3%-9.7%
.48wins-.85wins
Midwest 10 Arizona State √
50%-50.3%
12%-21.1%
3%-7.7%
.66wins-.82wins
Midwest 9 Kansas State √
26%-38.4%
7%-12.8%
1%-4.1%
.34wins-.57wins
West 11 Nebraska √
30%-40.8%
9%-13.3%
3%-5.1%
.42wins-.61wins
East 10 Saint Joseph's √
33%-38.3%
8%-13.5%
3%-5.7%
.44wins-.59wins
South 11 Dayton √
25%-37.6%
8%-16.7%
2%-7.3%
.35wins-.64wins
West 10 Brigham Young √
35%-36.8%
7%-13.3%
3%-4.4%
.45wins-.56wins
Midwest 12a North Carolina State √
42%-41.2%
7%-9.3%
2%-2.6%
.51wins-.53wins
Midwest 6 Massachusetts √
32%-33.4%
6%-11.6%
2%-4.8%
.40wins-.51wins
West 12 North Dakota State √
36%-39.8%
15%-16.3%
2%-4.3%
.53wins-.62wins
South 8 Colorado √
28%-29.7%
2%-7.1%
.30wins-.39wins
South 13 Tulsa √
13%-31%
4%-11.4%
.17wins-.47wins
West 13 New Mexico State √
25%-36%
9%-15.8%
1%-4.2%
.38wins-.57wins
South 12 Stephen F. Austin √
24%-33.3%
4%-13%
.28wins-.51wins
East 14 North Carolina Central √
19%-31.1%
5%-11.7%
1%-3.6%
.25wins-.47wins
Midwest 13 Manhattan √
7%-18.6%
3%-8.5%
.10wins-.30wins
East 13 Delaware √
9%-19.7%
2%-5.4%
.11wins-.26wins
Midwest 14 Mercer √
7%-21.2%
2%-6.6%
.09wins-.30wins
South 14 Western Michigan √
12%-22.7%
2%-6%
.13wins-.30wins
West 14 Louisiana-Lafayette √
12%-15.5%
3%-4.6%
.15wins-.21wins
South 15 Eastern Kentucky √
8%-21.3%
2%-5.9%
.10wins-.27wins
West 15 American University √
7%-21.2%
2%-6.8%
.09wins-.30wins
East 15 Milwaukee √
5%-13%
.05wins-.16wins
West 16 Weber State √
2%-8.6%
.02wins-.10wins
Midwest 15 Wofford √
5%-14%
1%-3.3%
.06wins-.17wins
South 16a Albany √
1%-4.6%
.01wins-.05wins
East 16 Coastal Carolina √
4%-7.6%
.04wins-.09wins
Midwest 16a Cal Poly
56%-61%
1%-6.8%
.01wins-.08wins
Midwest 16b Texas Southern
44%-39%
0wins-.08wins
South 16b Mount St. Mary's
Midwest 12b Xavier

If you stuck through this, congrats.

-@Nikolapekovic
If you're actually Nikola Pekovic, whoa, hi. I doubt that though. If you have checked it out, Nate Silver's model is pretty crappy. It takes seven metrics, including preseason expectations of people who don't understand winning in basketball (I like the idea of preseason expectations. Just not preseason guys' belief that the Harrison twins could be the next Allen Iversons and that would be a good thing). Every holistic metric gets a vote is not a good evaluation technique, as these guys have said many times. And Nate's using Win Shares to evaluate the impact of injuries. Well that's gonna underestimate the impact of a guy like Joel Embiid's injury, because win shares undervalue offensive rebounding. So not every analysis anyone does of the tournament is equally valid, and say a method that takes every metric and averages it like Nate's, isn't the best way to do it. I'm gonna take a closer look at Arturo's methodology and see if I should switch a couple picks (although everyone who's looking at this data is coming to the same conclusions-Wichita State's point margin underperforms what a top team would do with their strength of schedule, Kentucky's sick on the offensive glass, Kansas is not the same until Embiid comes back, Louisville outscored opponents by 21.1 points, 6 more than Wichita, and is indisputably a top 3 team, etc.). But I think that Nate's methodology is pretty weak, and Arturo's might be stronger than my simple metric based on SOS, point margin, orbr, drbr, tsp, opp tsp, tovr, and opp tovr, especially because he actually took the time to calculate individual performance! Thanks Arturo!
Thanks Nathan. And to reiterate, I am not saying that Silver's or Arturo's projections are better or worse than each other, simply noting the differences.

I do think Arturo's analysis favors the lower seeded teams too much. I understand how this could come up in the model especially if Arturo highlighted the past few years. I'm just interested to hear if Arturo really believes or would be willing to wager that Eastern Kentucky or American actually have a 21% chance at winning in the opening round.

I do think dogs are more valuable now in NCAA. The world is a flatter place.
So, it looks like Vegas moneylines were a lot more accurate than your win percentages, mainly because of your high likelihood of upsets. I think Nate Silver's projections were also a lot closer to actual results. Are you still confident that your model is appropriate for teams with a wide disparity in talent?
