1. ## On calculating zero-vig NFL win totals

Hey everyone...I'm new here. I'm working on a project about the accuracy rates for various prognosticators of NFL win totals. Obviously, one of the "types" of prognosticators I'm looking at is handicappers, so I inevitably came across this forum during the course of my research. Needless to say, I've gotten some valuable knowledge about quant-based NFL handicapping in the short time I've been here.

In the handicapper realm, what I wanted to do was to figure out how accurate the Vegas win totals were. Given that there's juice attached to each Vegas win total, and that Vegas is trying to even out the money on both sides, it became obvious that I needed to calculate zero-vig win totals to see what Vegas' true win predictions were. Unfortunately, the only thing I could find around the web was an SBR forum reply by the apparently unparalleled Ganchrow himself. In his infinite wisdom and charitability, he provided a link to an Excel spreadsheet that uses Solver to solve for the probability of success on a single trial, when only the number of trials, the number of successes, and an estimate of the cumulative binomial probability are known. This was exactly what I was looking for, i.e., deriving a team's single-game winning percentage from # of trials = 16, # of successes = Vegas win total, and estimated cumulative probability = implied winning probability given the Vegas total and price. Solving for single-game winning percentage allows you to then simply multiply by 16 to get a "zero-vig win total." Predictably, Ganchrow's link to his spreadsheet is also broken.

So, basically, the point here is that, unable to find any shortcuts on the web despite hours and hours of googling, I was forced to figure this out myself using only Ganchrow's brief explanation of the mathematical theory behind what his now-defunct spreadsheet was accomplishing. And precisely because I couldn't find any Excel shortcuts/instructions on the web, I'm posting this so that anyone who attempts what I have these past few days will not suffer the same fate.

I'll preface the remainder of this post by saying that I'm not exactly sure how to share spreadsheets on here, so I'm just going to give written instructions. If anyone replies with some quick pointers on how to share spreadsheets in the forum, I'll be glad to share mine with everyone. Also, if I'm mathematically off-base anywhere, please let me know. After all, like I said, I'm new to these specific stat applications in the context of handicapping. OK, here goes...

Ganchrow said the following on that other thread:

Originally Posted by Ganchrow:

> If you think that the fair value of Chicago Under 10 is -133, this implies a conditional probability of 133/(133+100) = 57.082% for winning fewer than 10 games and a conditional probability of 1-57.082% = 42.918% for winning more than 10 games.
>
> Assuming all games have equal win probabilities (a highly specious assumption to be sure) this implies a single-game win probability of 60.4858% (because =(binomdist(16,16,60.4858%,0) + binomdist(15,16,60.4858%,0) + binomdist(14,16,60.4858%,0) + binomdist(13,16,60.4858%,0) + binomdist(12,16,60.4858%,0) + binomdist(11,16,60.4858%,0) ) / (1 - binomdist(10,16,60.4858%,0) ) ≈ 42.918%)...
>
> You need to solve for the single game win probability that yields a probability of winning X or more games equal to that implied by the fair total and over/under lines.

Putting aside the caveat about faulty assumptions, I took this to mean that we basically find the implied win probability of betting over X wins, and then reverse-engineer for the single-trial success probability that produces an over-X cumulative binomial probability equal to the implied win probability of over X wins.
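For anyone without Excel handy, Ganchrow's Chicago example can be checked in a few lines of Python. This is only a minimal sketch under his equal-win-probability assumption: the function names are my own, and plain bisection stands in for Solver (it is not what his spreadsheet did, as far as I know).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); same as Excel's BINOMDIST(k, n, p, 0)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def over_prob_no_push(p, total, n=16):
    """P(wins > total | wins != total) for a whole-number total."""
    over = sum(binom_pmf(k, n, p) for k in range(total + 1, n + 1))
    return over / (1 - binom_pmf(total, n, p))

def solve_p(target, total, n=16, tol=1e-12):
    """Bisection: find the single-game p whose no-push over probability hits target."""
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if over_prob_no_push(mid, total, n) < target:
            lo = mid          # over probability too small, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

fair_under = 133 / (133 + 100)            # Chicago Under 10 at a fair -133
p = solve_p(1 - fair_under, total=10)     # conditional over probability, about 42.918%
print(f"single-game win probability: {p:.4%}")
```

If the sketch is right, the printed value should land very close to the 60.4858% in the quote.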

For instance, as of this writing, Pinnacle has the 49ers 8.5o (-169) 8.5u (+144). So, the implied win probabilities are calculated as

IWP(O) = 169/(100+169) = 62.8253%
IWP(U) = 100/(100+144) = 40.9836%

The two implied probabilities sum to 103.8089%, so there's 3.8089% of vig (overround) baked in. Dividing each implied probability by that sum gives the zero-vig probabilities:

FairP(O) = 60.5201%
FairP(U) = 39.4799%
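The same arithmetic as a small Python sketch. I'm assuming the vig is removed by proportional normalization (dividing each implied probability by their sum), which is what reproduces the numbers above; the function names are just illustrative.

```python
def implied_prob(american_odds):
    """Implied win probability (vig included) from an American price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def zero_vig(over_odds, under_odds):
    """Return (fair_over, fair_under, overround) via proportional normalization."""
    iwp_over = implied_prob(over_odds)
    iwp_under = implied_prob(under_odds)
    book_sum = iwp_over + iwp_under       # exceeds 1 by the overround
    return iwp_over / book_sum, iwp_under / book_sum, book_sum - 1

# 49ers 8.5o (-169) 8.5u (+144)
fair_over, fair_under, overround = zero_vig(-169, +144)
print(f"{fair_over:.4%} {fair_under:.4%} {overround:.4%}")
# -> 60.5201% 39.4799% 3.8089%
```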

Given the 60.5201% fair probability for the over, we then have to find the single-game win probability that produces a cumulative binomial probability P(X > 8.5) = 60.5201%. If you attempt this through trial and error using a binomial probability calculator (in Excel or on the internet), you find that the correct answer is approximately 56.3242%. In other words, a team that wins a single game 56.3242% of the time will win more than 8.5 games in a 16-game season 60.5201% of the time; and that 60.5201% equals the fair probability for the 8.5o bet. Obviously, this information is good to know in itself. If my system spits out that the Niners are going to be a .688 team this season (aka an 11-5 team), I'm going to hammer that 8.5o (-169), which is offering them to me as a fair-price .563 team.
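The trial and error can be automated. Here's a minimal sketch that uses bisection (my substitute for guessing by hand) to find the single-game win probability for a half-win total; `over_prob` and `solve_single_game_p` are made-up names, and the 16-game season and equal per-game probability are the same assumptions as above.

```python
from math import comb, floor

def over_prob(p, total, n=16):
    """P(wins > total) for X ~ Binomial(n, p); total may be a half-number like 8.5."""
    first_winning_count = floor(total) + 1        # 8.5 -> 9 wins or more
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(first_winning_count, n + 1))

def solve_single_game_p(target, total, n=16, tol=1e-12):
    """Bisection on p; over_prob is increasing in p, so one crossing exists."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if over_prob(mid, total, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = solve_single_game_p(0.605201, total=8.5)      # 49ers fair over probability
print(f"single-game p = {p:.4%}, season wins = {16 * p:.2f}")
```

The result should come out close to the 56.3242% found by hand.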

Now, here's where things got really complicated mathematically (at least for me). Remember that my main purpose here was to turn these single-game winning percentages into adjusted, zero-vig win totals so that I could assess how accurately Vegas's win-total futures predict actual win totals at the end of the season. Unfortunately, Ganchrow's instructions fell short in this regard; and this is where I'm going to fill in the gaps.

Basically, there were 3 major problems related to the same underlying issue: Ganchrow (understandably) assumed no pushes. The first way that this assumption reared its ugly head was that, in solving for the conditional probability given no push, Ganchrow's procedure removed around 10% of the NFL's team-win probability distribution. Specifically, when I solved for all 32 teams' zero-vig, single-game win probabilities, and then multiplied these by 16 to get a full-season, zero-vig win total, I ended up with approximately 234 total wins; about 22 fewer than the 256 that are necessitated by the NFL schedule. Clearly, this presents a less-than-ideal replication of the actual NFL season.

What's worse, however, is that the "missing wins" were systematically taken away from the teams with Vegas win totals that were whole numbers. That's because, whereas the theoretical push probabilities for teams with half-wins in their win totals are essentially zero, the theoretical push probabilities for teams without a half-win in their win totals are in the 15-20% range. For instance, the Lions, who Pinnacle is currently offering at 5o (-118) 5u (+101), have zero-vig probabilities of 52.1069% for the over and 47.8931% for the under. Using Ganchrow's method, these data correspond to a fair, no-push, single-game win probability of 26.4068%; which in turn results in a push probability of 19.2343%. In other words, a team that wins a single game 26.4068% of the time will win exactly 5 games in a 16-game season (i.e., push on the over) 19.2343% of the time. [To see this for yourself, plug in .264068, 16, and 5 into a binomial probability calculator, and look at the exact (not cumulative) binomial probability, P(X = 5).]
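If you'd rather not hunt for an online calculator, the push probability is a one-liner in Python. A minimal sketch, assuming a 16-game season and the 26.4068% figure quoted above; `binom_pmf` is just my own helper name, equivalent to Excel's BINOMDIST with the cumulative flag set to 0.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Lions: chance of landing exactly on a total of 5 wins
push_lions = binom_pmf(5, 16, 0.264068)
print(f"P(X = 5) = {push_lions:.4%}")

# Push probability for a whole-number total of 8 at a few single-game win rates
for p in (0.40, 0.50, 0.60):
    print(f"p = {p:.2f}: P(X = 8) = {binom_pmf(8, 16, p):.4f}")
```

The first line should print roughly 19.23%. A half-win total such as 8.5 never needs this calculation, since no whole number of wins can equal it.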

This does not occur when using Ganchrow's method for deriving fair, single-game win probabilities for teams whose win totals include a half-point. Why? Well, aside from the empirical data suggesting ties (aka half-wins) are incredibly rare in the NFL -- to the point of utter statistical insignificance -- the theoretical reason for a lack of push probability is that "successes" in binomial terms are themselves whole numbers. To derive single-game win probabilities for half-win teams -- according to Ganchrow's method -- you simply move up or down to a neighboring integer to get around the problem.

For the purposes of illustration, let's return to the Lions. Suppose their Vegas win total was 5.5 instead of 5.0, with the same prices for the over and under. To solve for the single-game win probability, let's say we chose to round down to 5. Now, when you plug .264068, 16, and 5 into the binomial calculator, and get that same P(X = 5) = 19.2343%, it's no longer the push probability because, in 5.5o/u world, X = 5 is a loss on the over. Thus, it's actually part of the under's win probability, the push probability = 0, and we've thereby satisfied Ganchrow's no-push assumption.

My aim here is not to quibble with Ganchrow's method. The guy is obviously a legend, 100 times smarter than me, and 1 million times the handicapper I am. Indeed, when your aim is to calculate theoretical holds, implied win probabilities, or zero-vig win totals for teams with win totals involving half-wins, Ganchrow's method works just fine because there is no theoretical push probability to violate his no-push assumption. However, when a team's Vegas win total does not include a half-win, there does exist a theoretical push probability; and its existence makes about 15-20% of the team's season escape into the ether, such that the NFL is no longer a 256-win season.

Up until now, I've focused on how Ganchrow's method was inadequate for the purposes of my specific research. The third problem -- in addition to the 234-win-season and integer-total-discrimination problems -- that arises out of Ganchrow's no-push assumption is more likely to be vexing for handicappers like yourselves who might see the financial value in knowing how much a team's Vegas win total differs from its zero-vig win total. If I were into naming things, I might call it The True Blood Paradox. If I were actually good at naming things, I'd probably name it something else.

Basically, the issue is this. Recall that, using Ganchrow's method, the 2010 Lions have a zero-vig, single-game win probability of 26.4068% when the Vegas line is 5o (-118) 5u (+101). When you extrapolate that winning percentage out to the full 16-game season, you end up with a zero-vig win total of 4.23. Anyone notice anything peculiar in the relationship between the total (i.e., the output of our math) and the line (i.e., the input of our math)? They're totally at odds with each other (pun intended). Vegas has listed the over as the favorite, the fair win probability suggests it's a favorite, and yet the zero-vig total says it's an underdog.

Now, like I said, I'm new at this, so there could easily be some math-based handicapping reason for this counterintuitive result. If there is, I'm eager to learn. However, I'm under the impression that, when going from vig to zero-vig lines/totals/probabilities, the favorite remains the favorite and the underdog remains the underdog because the math requires it that way. And, as I understand it, the Lions' zero-vig probabilities of 52.1069% for the over and 47.8931% for the under work out to a zero-vig price of about -108.80/+108.80 (100 x 52.1069/47.8931), i.e., the over is still the favorite. In other words, even for a team without a half-win in their Vegas win total, every complicated handicapping-related mathematical transformation of the input results in the favorite remaining the favorite; but when we do the simplest transformation in the entire process -- multiplying by 16 -- the favorite shapeshifts into a dog (hence, the True Blood reference).
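A quick sanity check on the 26.4068% figure may help here. The sketch below (my own helper names, 16-game season assumed) evaluates the no-push over probability for a total of 5 two ways: with strictly more than 5 wins in the numerator, as in Ganchrow's quoted formula, and with exactly 5 wins also counted in the numerator.

```python
from math import comb

def pmf(k, p, n=16):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p, total = 0.264068, 5
push = pmf(total, p)
strict_over = sum(pmf(k, p) for k in range(total + 1, 17))   # 6 to 16 wins
loose_over = strict_over + push                               # 5 to 16 wins

print(f"expected wins              : {16 * p:.2f}")
print(f"P(X > 5 | X != 5)          : {strict_over / (1 - push):.4f}")
print(f"P(X >= 5) / (1 - P(X = 5)) : {loose_over / (1 - push):.4f}")
```

By my arithmetic the strict version comes out near 0.28 rather than 0.521, while the looser version lands on roughly 0.521. That suggests the 26.4068% figure counts the push outcome as an over win, which by itself would make a favorite look like a dog. This is an inference from the numbers, so treat it as something to verify rather than a settled diagnosis.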

Rather than chalking this up to complex math, my intuition is that the True Blood Paradox happens because of the ~19% of Detroit's season that disappeared when we derived their no-push, zero-vig, single-game win probability of 26.4068%. Therefore, I'm guessing that, in order to solve the dilemma, we have to allocate that 19% of team wins somewhere; presumably, half to the win total and half to the loss total. My intuition is/was that, by doing so, we'll solve all of the push-related problems I've exhaustively detailed so far. Below is my revised method for deriving zero-vig, single-game win probabilities, and extrapolating them into zero-vig win totals. It preserves Ganchrow's invaluable method, but extends and improves it where necessary.

Step 1: Calculate implied win probabilities for each team's over and under.
Step 2: Convert implied win probabilities for each team's over and under into zero-vig probabilities.
Step 3: Divide the Vegas win totals by 16 to get starting values for each team's single-game win probability.
Step 4: Use each team's Vegas win total and starting value (from Step 3) to calculate a preliminary value for Ganchrow's no-push cumulative binomial probability for the over.
Step 5: Calculate the sums of squared errors (SSEs) for the preliminary discrepancy between each team's zero-vig win probability for the over (from Step 2) and their no-push cumulative binomial probability for the over (from Step 4).
Step 6: Use Excel Solver to minimize the SSEs (from Step 5) by changing the starting values for each team's single game win probability (from Step 3). This will give you the precise single-game win probabilities that result from Ganchrow's no-push method.

If you were to multiply by 16 here, you'd get the same confusing results that led me on my journey to figure this whole thing out. The rest of the procedure is my fix, and it essentially uses the results of the Ganchrow method to redo Steps 1-6 only for teams without half-wins in their Vegas win total.

Step 7: Use each team's Vegas win total and Ganchrow's single-game win probabilities to calculate the binomial probability, P(X = win total) for each team, i.e., their push probability. Set push probabilities equal to zero for teams with half-wins in their Vegas win total.
Step 8: Recalculate each team's zero-vig probabilities for the over and under after subtracting out half of the push probability on each side. You should end up with values akin to what's displayed in the Half-Point calculator, i.e., a push probability, a fair win probability for the over, and a fair win probability for the under that, when added together, equals 100%. Obviously, for half-win teams, these zero-vig probabilities will be the same as what you got in Step 2.
Step 9: Repeat Step 3.
Step 10: Use each team's Vegas win total and starting value (from Step 9) to calculate a preliminary value for my push-allowed cumulative binomial probability for the over. Do this for both half-win and integer-win teams.
Step 11: Calculate the sums of squared errors (SSEs) for the preliminary discrepancy between each team's zero-vig win probability for the over (from Step 8) and their push-allowed cumulative binomial probability for the over (from Step 10). Do this for both half-win and integer-win teams.
Step 12: Use Excel Solver to minimize the SSEs (from Step 11) by changing the starting values for each team's single game win probability (from Step 9). Do this for both half-win and integer-win teams. This will give you the push-allowed, zero-vig, single-game win probabilities.
Step 13: Multiply each team's push-allowed, zero-vig, single-game win probabilities (from Step 12) by 16. Do this for both half-win and integer-win teams. This will give you true, zero-vig, win totals. Adding up team wins will give you approximately 256. Any error is due to rounding, so you can randomly add or subtract .01 (or .001 or .0001, etc.) if you really want to get it to add up to exactly 256. No favorites will shapeshift into dogs.
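For anyone who wants to try Steps 1-13 outside Excel, here is a minimal Python sketch. Everything in it is an assumption on my part: the function names are hypothetical, bisection replaces Solver's SSE minimization (driving the SSE to zero for one team is the same as finding the root), and the `push_counts_as_over` flag is my own addition. With the flag off, Step 4 uses strictly more wins than the total, as in Ganchrow's quoted formula. With it on, a win count equal to the total is also counted in the numerator, which is the variant that appears to reproduce the 26.41% / 19.23% / 32.51% / 5.20 figures for a 5o (-118) 5u (+101) line.

```python
from math import comb, floor

N_GAMES = 16

def implied_prob(odds):
    """Vig-included probability from an American price."""
    return -odds / (100 - odds) if odds < 0 else 100 / (100 + odds)

def pmf(k, p, n=N_GAMES):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_over(p, total, n=N_GAMES):
    """P(wins > total); works for whole and half-win totals."""
    return sum(pmf(k, p, n) for k in range(floor(total) + 1, n + 1))

def no_push_over(p, total, push_counts_as_over=False):
    """Step 4: over probability conditional on no push."""
    if total != floor(total):                 # half-win total: no push possible
        return prob_over(p, total)
    push = pmf(int(total), p)
    over = prob_over(p, total) + (push if push_counts_as_over else 0.0)
    return over / (1 - push)

def solve(f, target, tol=1e-12):
    """Bisection stand-in for Solver; assumes f crosses target once on (0, 1)."""
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return (lo + hi) / 2

def zero_vig_win_total(total, over_odds, under_odds, push_counts_as_over=False):
    # Steps 1-2: implied probabilities, then the zero-vig over probability
    iwp_o, iwp_u = implied_prob(over_odds), implied_prob(under_odds)
    fair_over = iwp_o / (iwp_o + iwp_u)
    # Steps 3-6: no-push single-game win probability
    p1 = solve(lambda p: no_push_over(p, total, push_counts_as_over), fair_over)
    # Step 7: push probability (zero for half-win totals)
    push = pmf(int(total), p1) if total == floor(total) else 0.0
    # Step 8: take half the push probability off the fair over
    adj_over = fair_over - push / 2
    # Steps 9-12: push-allowed single-game win probability
    p2 = solve(lambda p: prob_over(p, total), adj_over)
    # Step 13: scale up to a season
    return {"p_no_push": p1, "push": push, "p_push_allowed": p2,
            "wins": N_GAMES * p2}

lions = zero_vig_win_total(5, -118, +101, push_counts_as_over=True)
niners = zero_vig_win_total(8.5, -169, +144)
print(lions)
print(niners)
```

Running it with the flag off for a whole-number total is a useful comparison, since the no-push and push-allowed probabilities then come out much closer to each other.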

To end this tome, I'll just go through a quick example that shows you how my method fixed the Lions (the win total, not the team...the team's beyond fixing).

Step 1: As mentioned earlier, the Pinnacle line for their team win total is 5o (-118) 5u (+101). This corresponds to IWP(O) = 54.1284% & IWP(U) = 49.7512%.
Step 2: As mentioned earlier, we get zero-vig probabilities of 52.1069% for the over and 47.8931% for the under.
Step 3: Starting value = .3125.
Steps 4-6: Excel Solver reduces SSE to zero, and results in a Ganchrow single-game win probability of 26.41%.
Step 7: P(X = 5) = BINOMDIST(5,16,26.41%,0) = 19.2343%.
Step 8: Half of 19.2343% is 9.6172%. P(O|zero vig) = 52.1069% - 9.6172% = 42.4897%. P(U|zero vig) = 47.8931% - 9.6172% = 38.2759%. So, push probability + fair over probability + fair under probability = 19.2343% + 42.4897% + 38.2759% = 100.00% (ignore the rounding error).
Step 9: Starting value = .3125.
Steps 10-12: Excel Solver reduces SSE to zero, and results in a push-allowed, zero-vig, single-game win probability of 32.5112%.
Step 13: Zero-vig win total = 32.5112% x 16 = 5.2018 wins. Magically, the Lions win total fits into a 256-game NFL season. Even better, the favorite of 5o is still the favorite. The True Blood Paradox has been solved.

I'm sure there'll be questions & comments, so fire away. Like I said, I'm here to learn more about the math/stats side of handicapping, so criticisms of my math are welcome. And, of course, I'm happy to delve deeper into the Excel side of things because (a) I know there's more to detail in the method than I probably discussed here, and (b) I'm sure there are more savvy ways to automate most of the procedure in Excel. To close, here are the zero-vig win totals I came up with based on Pinnacle's lines as of this writing (MIN is off the board):

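| Team | Pinnacle win total | Fair (zero-vig) win total |
|------|--------------------|---------------------------|
| ARI | 7.5 | 7.77 |
| ATL | 9 | 9.38 |
| BAL | 10 | 9.97 |
| BUF | 5 | 5.13 |
| CAR | 7 | 6.87 |
| CHI | 8 | 7.88 |
| CIN | 8 | 8.26 |
| CLE | 5.5 | 5.73 |
| DAL | 10 | 10.30 |
| DEN | 7.5 | 7.06 |
| DET | 5 | 5.20 |
| GB | 9.5 | 9.88 |
| HOU | 8 | 8.43 |
| IND | 11 | 10.59 |
| JAX | 7 | 6.94 |
| KC | 6.5 | 6.56 |
| MIA | 8.5 | 8.59 |
| NE | 9.5 | 9.60 |
| NO | 10.5 | 10.30 |
| NYG | 8.5 | 8.89 |
| NYJ | 9.5 | 9.43 |
| OAK | 6 | 6.55 |
| PHI | 8 | 8.21 |
| PIT | 9 | 8.39 |
| SD | 11 | 10.59 |
| SEA | 7.5 | 6.96 |
| SF | 8.5 | 9.01 |
| STL | 5 | 4.82 |
| TB | 6 | 5.83 |
| TEN | 8 | 8.22 |
| WAS | 8 | 7.44 |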

2. interesting thought process, i've been trying to tackle this lately too, with half points everything works out fine and the combination of win future, no-vig probability, and predicted wins after the fact are in accordance with a rational scale

but when it's a whole number it's a difficult situation, and what I did to get to a total number from the interval 246-248 (to allow for the Vikings total to be added later), I divided the PMF (probability mass function) value at the integer win total by 8 and added it to the newly constructed winning percentage

it works out, with the accumulation and the proportion of wins and vig up and down the scale

i'm sure there is some reason for it to be working, i have yet to find a reason to justify other than instantiation through trial and error