Originally posted on 07/20/2010:

Quote Originally Posted by Ganchrow View Post
Let's start off by making a very general observation about betting markets: Betting markets are inherently competitive -- they reward winners and penalize losers. This creates a dynamic fundamentally different from what we see in, say, internet polling where there's no self-reinforcing drive to succeed.

Outside the betting world and without money on the line there's little incentive for accuracy.

Within the betting world, however, people make their opinions known with a signal more telling than their vote -- their money. Within the competitive marketplace it's democracy be damned and capitalism rules the roost.

(Now certainly this signal can get corrupted by such forces as recreational bettors who'll blindly bet their home team at any odds or by games thrown by dishonest referees or players. Market economists account for these factors by assuming that the smart money (including both books and players betting for profit and with superior information flow and processing) will in totality eventually overwhelm the rest.)

This leads to a framework known as (profit-wise) "market efficiency".

Conceptually, the way we'll be framing market efficiency as applied to sports betting is this: without "extraordinary knowledge" (including meta-knowledge), a player's expected profit from betting at the market no-vig line is on average zero.

This definition still allows for many interpretations of market efficiency, running the continuum from so-called "strong form" (most restrictive in its assumptions as to market behavior) to "weak form" (least restrictive). But at its most fundamental the defining parameter comes down to little more than what exactly is meant by the characterization of knowledge as "extraordinary".

Following is an indicative list of examples of types of knowledge going from those likely deemed extraordinary by all but adherents to the strongest forms of efficiency to those only so deemed (possibly) by proponents of the weakest:
  1. first-hand knowledge of game fixing
  2. insider information regarding a change in directive for officials
  3. insider information regarding a key player's health
  4. faster receipt and processing of new information
  5. faster response time following receipt of new information
  6. access to non-readily available (but still public) information
  7. novel quantitative processing of existing information
  8. any proprietary processing of existing information
  9. information gained by watching ESPN
  10. deeper fundamental understanding of motivations and factors facing one's home team or university

Some strong form advocates would say that while they wouldn't in theory doubt the efficacy of some of the items on the above list, they would call into question their general existence (or might just point to the infrequency with which such knowledge is found, and so tend to consider them little more than pathological cases).

Personally, I'll just leave my own characterization as purposely vague with the eventual intention of providing some form of weak parameterization of the concept. I will say that given the above listing I'd probably be somewhere between the middle and the bottom.

Another question one should ask is what exactly represents a market line? Here there's a lot less debate. A market line should be one accessible to (and already widely bet by) the betting public as a whole, and one that won't be artificially held in place by a book limiting the number of bets by individual players (in other words, after a reasonable period of time a market line should either move out of the way or accept another bet at the same price).

The best examples for many sports would probably be Pinnacle, BetFair, and Matchbook. For certain geographically disparate sports, however, one would probably tend to find a better indication of the true market line with some of the Asian books. Generally speaking, very high limits, low juice, market prominence, and attitude towards professionals will jointly determine how well a given sports book's lines are indicative of the market as a whole.

For the sake of argument we'll just say "the market" is represented by the inside line across geographically-appropriate "Pinnacle-like" books and exchanges. (Within what markets and under what circumstances Pinnacle itself would actually represent a "Pinnacle-like" book would be very much up for debate.)

Getting right down to it, recall that we declared the defining characteristic of market efficiency to be that a non-extraordinary player's expected profit from betting at the market no-vig line would on average be zero (or equivalently, that on average E[player profit] = -theoretical hold, regardless of market side chosen).

Recall that for a true win probability of p and a no-vig decimal line of d:
E[profit] = p*d - 1 = 0
d = 1/p
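The fair-line identity above is simple enough to sketch in a few lines of code (the helper names are my own, purely illustrative):

```python
# A minimal sketch of the identity above: at the no-vig decimal line
# d = 1/p, the expected profit of a unit bet is zero by construction.
def no_vig_decimal_line(p):
    """Fair decimal odds for true win probability p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1 / p

def expected_profit(p, d, stake=1.0):
    """Expected profit of a bet of size stake at decimal odds d, win probability p."""
    return stake * (p * d - 1)

d = no_vig_decimal_line(0.55)      # ≈ 1.8182
print(expected_profit(0.55, d))    # 0.0 up to floating-point rounding
```

Betting at any price better than the fair line (here, anything above 1.8182 decimal) makes the same expression positive.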
But this definition of market odds is intuitively rather unsatisfying as it presupposes exact knowledge of win probability. Remember that the only defining characteristic of market efficiency to which we've limited ourselves is that (absent extraordinary information, which we discount for the time being) expected profit on bets placed at the zero-vig line will always be zero. But does this really require that the market knows the true win probability exactly all the time every time without margin for error?

What we're trying to get at here are some determinations about how the market might set its lines. (And although we keep referring to "the market" as a single entity, understand that even though the market line might be reflected at only a single book, the market as a whole represents a conglomeration of opinions across professional players and books. The mathematical mechanics of how "the market" might handle such a conglomeration we won't delve into.)

I expect that anyone who's already gotten this far is already familiar with the concept of expectation as it relates to, say, value. Specifically, the expected value of a bet could be described as the theoretical average profit per game were the game to be repeated an infinite number of times (at least that's how a "frequentist" would describe it).

So we'll write:
E[p̂] = 1/d
d = 1/E[p̂]
Where the caret-like symbol lurking over the p is known by the highly complex term "hat". In probability, a hatted variable almost always represents an estimate of a distribution parameter.

But how exactly should we interpret the seemingly innocuous term "E[p̂]"? Well all we're doing is applying the concept of expectations to a random variable.

It's just that in this case that random variable happens not to be an estimate of value but rather of probability ... more specifically the market's estimate of the true win probability of the game.

Think about it in a manner similar to expected value. If we repeated the game an infinite number of times and then repeated that set of infinite repetitions an infinite number of times what we'd see is that across each one of those infinite sized game-groups, the frequency of wins was on average the same as the expected win probability (note that this abstraction to an infinite number of infinitely sized game groupings isn't necessary and is only presented with the hope that it might aid in visualization).

This is an extremely important concept. We're not forcing the assumption that the market is always correct in its probability estimates ... just that its probability estimates are correct on average.

And there's another way to consolidate the above:
E[p̂] = p
So in other words, the expected value of the market estimate of probability equals the true probability. In more technical terms this statement can be read to say that p̂ is an unbiased estimator of p. So while a given probability estimate might be wrong in any single case it's still correct on average. This is perfectly analogous to expected value.

(FWIW, note that this is not an equivalent statement to the similar E[1/p̂] = 1/p, so in other words while we’ve concluded based on our assumptions that the zero-vig implied probability is an unbiased estimator of true probability, we can't in general conclude that the zero-vig line will be an unbiased estimator of the true line.)
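A hypothetical two-point example makes the caveat concrete: suppose p̂ takes the values 0.4 and 0.6 with equal probability, so it is exactly unbiased for p = 0.5, yet the implied line 1/p̂ overshoots the fair line 1/p = 2.0 (this is Jensen's inequality at work):

```python
# Two-point illustration (my own numbers): an unbiased probability
# estimator whose reciprocal is a biased estimator of the fair line.
p_hat_values = [0.4, 0.6]
E_p_hat = sum(p_hat_values) / 2                # = 0.5, unbiased for p
E_line = sum(1 / v for v in p_hat_values) / 2  # = 25/12 ≈ 2.083 > 1/0.5 = 2.0
print(E_p_hat, E_line)
```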

So now that we've looked at the expected value of the market probability estimate, the next logical meta-statistic would be the variance of the market probability estimate. Once again we're not deviating from well-trod territory -- this is analogous to not only talking about the expected value of a bet, but also talking about the variance of the value of a bet.

For a reason that should soon become apparent, I'm going to switch over to time subscripts and propose a very simple model of market efficiency such that the market estimate of win probability at ordinal time t, p̂t (where t = 0 represents the present and t = Τ represents the market close), obeys:
  1. E(p̂t) = p ∀ 0 ≤ t ≤ Τ
  2. Var(p̂s) > Var(p̂t) > 0 ∀ 0 ≤ s < t ≤ Τ

So what this is saying is that at any moment in time, the reciprocal of the market line (which is to say, p̂t) is an unbiased estimator of true win probability and that the variance of that estimate is strictly decreasing as game time approaches.

As a corollary to 1) above we can write:
E(p̂t) = E(p̂0) ∀ 0 ≤ t ≤ Τ
Or in other words the expectation of the implied market probability at any point in the future is just the current implied market probability. This follows directly from the unbiasedness of the probability estimator across time.

Let's step back and consider this all for a moment. We're saying that at any point in time between now and game start the market estimate of probability will on average be correct and will become continuously more accurate (meaning less variance) as time progresses.

A good way to intuit this would be in terms of weather forecasting, say the probability of precipitation on a given day. In theory we might say that every forecast we hear we believe to be unbiased (correct on average) and that we further expect the reliability of each forecast to increase (lower variance) as we get closer to the day for which we're forecasting.
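One toy way to realize conditions 1) and 2) (purely my own illustration; the model itself commits to no particular mechanism) is to treat the market estimate as the average of n independent Bernoulli(p) information signals, with n growing as game time approaches. Each such estimate is unbiased, and its variance p*(1-p)/n shrinks as more signals arrive:

```python
import random

# Toy realization of the two conditions: the market estimate is the mean
# of n Bernoulli(p) signals, so E[p_hat] = p and Var(p_hat) = p*(1-p)/n.
random.seed(0)
p = 0.6

def p_hat(n):
    """One draw of the market estimate built from n information signals."""
    return sum(random.random() < p for _ in range(n)) / n

def sample_stats(n, trials=2_000):
    xs = [p_hat(n) for _ in range(trials)]
    m = sum(xs) / trials
    v = sum((x - m) ** 2 for x in xs) / trials
    return m, v

for n in (5, 50, 500):
    mean, var = sample_stats(n)
    print(f"n={n:3d}  mean≈{mean:.3f}  var≈{var:.5f}  (theory {p*(1-p)/n:.5f})")
```

Every row stays centered on p = 0.6 while the variance column falls, mirroring a forecast that is always unbiased but grows more reliable over time.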

Moving back to statement 1) above, let's try to find some sort of additional functional bound on the market estimator. One particularly apparent bound should be fairly easy to spot. If we're estimating some probability p, and know that that estimate will be correct on average what’s the noisiest possible probability estimate that will still be unbiased?

It should eventually become apparent that the maximum variance estimator p̂, would simply be the estimator that takes on the value p̂=100% with probability=p, and the value of p̂=0% with probability=(1-p).

This is just another way of saying that p̂ is Bernoulli distributed with parameter p. (Once again think of weather forecasting ... a Bernoulli forecast of this type would always offer either a 100% or 0% probability of rain on a given future day, and while it might be correct on average -- and remember that the true probability of rain on a future day is never really 0% or 100% -- it'll certainly be less valuable in picking a future day for a picnic than a lower variance probability estimate.)

We know that a Bernoulli-distributed variable with probability parameter p has variance = p*(1-p), so when taken in conjunction with condition 2), which says that the future variance of a given estimate will always be lower than the current variance, this gives another bound on Var(p̂t). Specifically, 2) implies that the maximum variance of p̂t could only be achieved at t = -∞, which means that for all finite t:
Var(p̂t) < p*(1-p)
Recalling that for any random variable X:
Var(X) = E[X²] - (E[X])²
We could also say:
E[p̂²] < p*(1-p) + p² = p
These are by necessity rather weak statements, but they nevertheless lend a bit more probabilistic rigor to the notion of market efficiency by allowing us to place functional bounds on the estimator p̂:
E(p̂) = p
Var(p̂) < p*(1-p)
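The Bernoulli bound is easy to verify by direct arithmetic (a quick check of my own, not part of the original derivation):

```python
# The 0%/100% "forecast" takes value 1 with probability p and 0 with
# probability 1-p.  Check that it is unbiased with variance p*(1-p).
p = 0.3
E_p_hat = 1 * p + 0 * (1 - p)                # = p, so the estimator is unbiased
E_p_hat_sq = 1 ** 2 * p + 0 ** 2 * (1 - p)   # = p
variance = E_p_hat_sq - E_p_hat ** 2         # = p - p^2 = p*(1-p) = 0.21
print(E_p_hat, variance)
```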
But this upper bound on Var(p̂) is the result of assuming a maximum variance estimator of p, namely the one obtained from the Bernoulli distribution, and a high-variance estimator like that just doesn't pass the smell test. A probability of 0%/100% would yield a bet at +∞ on one team or the other. Whatever assumptions we might accept, that's just a flat-out absurd result. So we'd really hope we can improve upon it.

But where to turn? Given that the Beta distribution is the conjugate prior of the binomial it certainly might be convenient to look there.

Recalling our vast knowledge of probability distributions (or Wikipedia) we can say that if p̂ is Beta distributed with mean p and Beta shape parameter ν, then the variance of p̂ is given by:
Var(p̂) = p*(1-p) / (1+ν)
It's via this parameter ν (the Greek letter nu) that we can start thinking of providing a parameterization of the concept of market efficiency.

Specifically ν → 0, represents the maximum variance unbiased estimator of p, while ν → ∞ would yield constant win probability p̂ ≡ p. And anywhere else in between is fair game.

This is actually a meaningful result. What we've done is use a parameter from a well-known distribution to pin down the variance of the market probability estimate, effectively parameterizing the degree of efficiency for a given market. Of course we're still assuming a magically unbiased line that's even more magically taking into account all market knowledge ... but patience is a virtue.
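Spelling out the ν parameterization explicitly (the shape parameters α = p*ν and β = (1-p)*ν giving a Beta with mean p are my own concrete choice):

```python
# Mean and variance of a Beta with mean p and shape parameter nu,
# computed from the standard alpha/beta formulas.
def beta_mean_var(p, nu):
    alpha, beta = p * nu, (1 - p) * nu
    mean = alpha / (alpha + beta)                                 # = p
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var                                              # = p*(1-p)/(1+nu)

for nu in (0.01, 2, 100, 10_000):
    mean, var = beta_mean_var(0.55, nu)
    print(f"nu={nu:>8}: mean={mean:.4f}  var={var:.6f}  (bound {0.55*0.45:.6f})")
# nu -> 0 approaches the Bernoulli bound p*(1-p);
# nu -> infinity collapses the estimate onto p itself.
```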

To continue further, we’ll consider our model from the perspective of Bayesian inference.

What is Bayesian inference you might ask?

Well it's really a different way of looking at probability that allows for a different methodology in forecasting. Recall that I earlier wrote that a frequentist would view the expected value of a bet as the average profit per game were the bet to be repeated an infinite number of times. Well that's just not how a Bayesian sees the world.

A Bayesian considers the probability of an outcome as a (possibly subjective) measure of the degree of informed belief in that outcome. We talk about "informed" belief because the process of Bayesian inference involves updating one's prior beliefs based on the availability of new evidence.

A Bayesian doesn't in general think about hypothetical frequencies of an event given a hypothetical infinite number of repetitions, because in general events can't be repeated an infinite number of times. To estimate outcome probability, what a Bayesian does is gauge prior knowledge of that event and then update that knowledge as new information becomes available.

Note that this methodology doesn't hold value for all types of experiments, as for some events we can know everything there is to know a priori. These events (take the game of Craps, for instance) are frequentist in nature, meaning that there is no value to new information (although Craps could be deemed otherwise were we to suspect either an unfair game or the presence of a skilled dice roller).

This can be a particularly convenient tool when looking at the progression of a betting line. One can build a forecast in any way one chooses and then continually reevaluate that forecast based upon the availability of new evidence (e.g., a change in the market line).

By way of contrast, a non-Bayesian might place a bet at a line of +3, and then after observing the line move to +5 simply declare his original bet "bad" in that the market hadn't backed up his opinion. A Bayesian, on the other hand would realize that his bet was made conditioned only on the information he had available at the time the bet was made (which would have only included the then current line), and while he would almost certainly view the bet as "unfortunate", he could accept that the bet was still a good bet at the time it was placed. Certainly he'd use the new information to revise his current opinions on game probabilities, and going a step further might even use it to discount the value of his model, but a Bayesian can accept that a decision might be perfectly valid at the time it was made, even as new information sheds doubt on it in hindsight.

At the heart of all Bayesian inference is what's known as Bayes' theorem. This is stated as follows:
Let H = A hypothesis (for example the hypothesis that win probability = a specific value p̂)
Let E = The evidence

P(H|E) = P(E|H) * P(H) / P(E)
What this says is that the probability of a hypothesis (which we can extend to a probability distribution) given some set of evidence is proportional to how likely we'd be to view that evidence assuming our hypothesis true times the probability of the hypothesis based solely on prior information (i.e., without benefit of any new evidence).
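As a toy numeric illustration (mine, not from the original post), suppose only two hypotheses about a team's true win probability are entertained, each equally likely a priori, and the evidence observed is a single win:

```python
# Discrete Bayes' theorem: two candidate win probabilities, updated on
# the evidence of one observed win.
priors = {0.40: 0.5, 0.60: 0.5}        # P(H) for each hypothesized p
likelihood = {h: h for h in priors}    # P(E|H): P(win | p = h) = h

p_evidence = sum(likelihood[h] * priors[h] for h in priors)      # P(E) = 0.5
posterior = {h: likelihood[h] * priors[h] / p_evidence for h in priors}
print(posterior)   # {0.4: 0.4, 0.6: 0.6} -- the win shifts belief upward
```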

Armed with this added understanding, we can meaningfully back off from our prior view of market efficiency by relaxing the global assumption of p̂ being a strictly unbiased estimator of p and replace it with the weaker assumption:
p̂(K) = E(p | K)
where K represents a given "knowledge space", comprised not only of timely and historical event and market data but also of specific prior knowledge of relevant deterministic functional relationships between that data and p̂, the variable of interest.

Note a couple of things about this equation.

Firstly, we've moved our p̂ estimator outside of the expectation and moved the parameter p representing "true" market probability inside the expectation. This is consistent with a Bayesian framework. The actual probability value is itself viewed as a random variable, while the estimator of p is viewed as a dependent variable. This is because Bayesian true probability isn't viewed as a discrete measurable quantity and can only be viewed in terms of the information upon which estimates are conditioned.

Secondly, note that we're now making explicit the functional relationship between our estimator p̂ and the knowledge space. This reflects the notion that the estimator is only as valid as the knowledge upon which it's based. Also included in K would be ν(K), which once again represents a measure of the inherent "value" of the knowledge space, only in this case relating that value not only to the variance of the estimate but also to its mean.

This is virtually akin to stating that p̂ is a minimum mean square error (MMSE) estimator of p conditioned on "prior knowledge". It isn't exactly the same because we're necessitating explicit inclusion of "relevant deterministic functional relationships" within the space. A true MMSE estimator would need to already take into account all possible functional relationships between information pieces.

Still what this allows for is an explicit decomposition of "market probability" into any number of discrete conditionally independent knowledge packets. What’s the relevance of this? Well it’s actually quite substantial. Remember our vague "extraordinary knowledge" constraint? Well now that concept has some statistical teeth. Extraordinary knowledge with respect to a given estimator is simply any knowledge not included in the Bayesian inference used to derive that estimator.

It might at first seem like this is just an issue of semantics, but it's really a whole lot more. We're now no longer confined to talking about a single market line that has the property of "efficiency" to some specified degree, but rather we can talk about any number of market estimators each gauged by specific p and ν parameters. These estimators could represent current lines at any number of sportsbooks as well as any given bettor's own forecasts (the proviso being the conditional independence of the estimates or at least a gauge as to how the conditional joint probabilities of the estimates relate).

Let's consider a hypothetical example of such a partition.

I'll assume that readers are already familiar with the Binomial distribution, which gives the probability of occurrence of a specified number of "wins" over a specified number of bets each with a constant probability.

One feature of the Binomial distribution as it relates to Bayesian inference is that it has a conjugate prior, specifically the Beta distribution. A conjugate prior is really just a convenience when applying Bayesian inference. What it tells us in this case is that if we model our prior distribution of the Binomial probability parameter (which here is to say P(H) in the statement of Bayes' Theorem above) as a Beta distribution, then all our posterior distributions (the probability of the hypothesis given the evidence, P(H|E)) will also be Beta distributed (only reparameterized to take account of the new information).

This is especially convenient for the Binomial distribution because the uniform distribution is itself a special case of the Beta. This means that if, absent any information, we can fairly judge all win probabilities equally likely, then we can use additional information to continuously create new Beta distributions of win probabilities.
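The whole conjugate-update machinery reduces to one line of arithmetic; here's a minimal sketch (the helper name is my own):

```python
# Beta-Binomial conjugate update: wins add to alpha, losses add to beta.
def beta_update(alpha, beta, wins, losses):
    """Posterior Beta parameters after observing wins and losses."""
    return alpha + wins, beta + losses

# Uniform prior Beta(1, 1), i.e. nu = alpha + beta = 2, then 55 wins in 100:
alpha, beta = beta_update(1, 1, wins=55, losses=45)
print(alpha, beta, alpha / (alpha + beta))   # 56 46, posterior mean ≈ 0.5490
```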

Here's how it might work in practice (you can obtain specific values of the Beta distribution using the BETADIST() function in Excel).

  1. Consider a sporting event so new that bettors are completely ignorant of the distribution of win probabilities and so can do no better than assuming all win probabilities equally likely (i.e., prior win probability estimates are drawn from the uniform distribution over the interval (0, 1)). This by definition is equivalent to the Beta distribution with parameters mean = 50% and our value parameter ν = 2 (as explained above).

    To make this compatible with Excel, there's a slightly different parameterization of the Beta that we'll be using.

    Specifically we'll consider parameters α and β (the usage of "β" being by convention and unrelated to the name of the Beta distribution itself).

    The mean and variance of the prior Beta distribution for probability parameter p are then given by:
    E(p) = α / (α + β)
    Var(p) = α*β / ((α + β)^2 * (α + β + 1))
    The previously utilized ν parameter can now be reinterpreted as the sum α + β, yielding:
    E(p) = α / ν
    Var(p) = E(p) * (1-E(p)) / (ν + 1)


    So by assuming a uniformly distributed prior (encapsulated by α = β = 1, with ν = α + β = 2) this gives us:
    E(p) = α / (α + β) = 1/2 = 50%
    Var(p) = E(p)*(1-E(p)) / (ν + 1) = (50% * 50%) / 3 = 1/12
    Which are just the mean and variance of the uniform distribution over support p ∈ (0%, 100%). The value parameter for this initial prior is ν = 2.
  2. Now let's say that after 100 games have been observed (all between different teams), the only information available is that out of those 100 games the home team won 55.

    Consistent with our beta conjugate prior to the binomial, we update our distribution as follows:
    αnew = αold + successes = 56
    βnew = βold + failures = 46
    νadded = 100
    νcumulative = 102

    This means that our new (posterior) distribution for home team win probability is distributed as:
    p ~ Beta(56, 46)
    This then implies:
    E(p) = α / (α + β) = 56 / (56 + 46) ≈ 54.9020%
    The result for E(p) above is often referred to as the "rule of succession".

    Var(p) = (56*46) / (102^2 * (56 + 46 + 1)) ≈ 0.2404%
    This would then imply a fair line of ~ -121.739 on the home team in the next match up.
  3. Now let's imagine that some additional information comes to light regarding the record of a specific team. Let's say that we discover that this team has won 17 of its last 20 games against fully typical opponents. If this team were next playing on the road what would the fair line on that game be?

    To answer this we'll need to perform another round of Bayesian estimation to yield a posterior distribution that takes into account this new information. Looking at it from the perspective of the home team:
    αnew = αold + successes = 56+3 = 59
    βnew = βold + failures = 46+17 = 63
    p ~ Beta(55+1+3, 45+1+17)
    νadded = 20
    νcumulative = 122

    E(p) = α/ν = 59/122 ≈ 48.3607%
    Var(p) = (59/122) * (1 - 59/122) / 123 ≈ 0.2030%
    This would then yield a fair line of +106.780 on the home team.
  4. Up to this point we've been solely considering historical categorical win frequencies as components to this framework. There's no reason that this need be the case in general, however. As long as a model may be used to estimate binary probabilities, then once we assign a "value" for those estimates in the form of our ν parameter (this can be obtained by considering the volatility of probability estimates) we can input these figures directly into our Bayesian inference model.

    This could be especially useful in inferring the value of a book's line based upon how they react to new information.

    So let's go back to the original 55/45 home away record and assume that new information has now come to light ... specifically regarding the composition of one of the teams.

    Once this information is released, Pinnacle now offers a no-vig line of even odds (and we'll assume for the sake of simplicity that they aren't biasing their market in any way).

    What can we infer from Pinnacle's actions? Well if we assume they're working off the same priors as we are, then we know that whatever the form of their new model, it managed to bring α and β up to equal footing. This will be useful in the next illustration.
  5. Let's say the next game was won by the home team (offered by Pinnacle at even odds) and the two teams were then slated to play another game following on the same home field.

    Further assume that this time around Pinnacle hangs a -100.6579 zero-vig line (p= 50.1639%) on the home team.

    What can we infer this go-around?

    Well this time there's actually quite a lot there.

    First we'll update our priors for the home/away model:
    αnew = αold + successes = 57
    βnew = βold + failures = 46
    νadded = 1
    νcumulative = 103

    E(p) = α/ν = 57/103 ≈ 55.3398%
    But now we're better poised to utilize Pinnacle's line signal.

    We already determined that Pinnacle's old αpinnacle-old and βpinnacle-old values were equal. We can further state that Pinnacle's new αpinnacle and βpinnacle values must obey:
    αpinnacle / (αpinnacle + βpinnacle) = 50.1639%
    So by incorporating updates to the home away model (although admittedly without considering updates to the proprietary Pinnacle model) we also have:
    αpinnacle = αpinnacle-old + 1
    So putting it all together:
    αpinnacle / (αpinnacle + αpinnacle - 1) = 50.1639%
    αpinnacle = 50.1639% / (2 * 50.1639% - 1) ≈ 153
    βpinnacle = 153 - 1 = 152
  6. This time, let's say we beat Pinnacle to market having just learned of a last moment game change. The upcoming game, it turns out, will for the first time in the history of the sport be played on neutral ground.

    Now obviously we’d want to jump on what had been the previous away line of +100.6579, but the trickier question here is what would be our estimate of the edge on the away team bet?

    In this case all we'd need to do would be to subtract out the alpha and beta additions provided by the home/away model from those inferred as Pinnacle's proprietary model parameters. This gives us:
    αnew = 153 - 56 = 97
    βnew = 152 - 45 = 107

    E(p) = 47.5490%
    This yields an edge on the away bet at +100.6579 of 5.2%.
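The six-step walk-through above can be reproduced end to end in a short script (same numbers, same simplifying assumptions of shared priors and an unbiased Pinnacle line; variable names are mine):

```python
# End-to-end sketch of the worked example: conjugate Beta updates plus
# the inference of Pinnacle's implied parameters.
def beta_mean(a, b):
    return a / (a + b)

def american_fair(p):
    """Fair American line for win probability p."""
    return -100 * p / (1 - p) if p >= 0.5 else 100 * (1 - p) / p

# Steps 1-2: uniform Beta(1, 1) prior, then 55 home wins in 100 games.
a, b = 1 + 55, 1 + 45                        # Beta(56, 46)
p2 = beta_mean(a, b)                         # ≈ 0.549020, fair line ≈ -121.74

# Step 3: the visitor has won 17 of its last 20, so from the home side's
# perspective add 3 "wins" and 17 "losses".
a3, b3 = a + 3, b + 17                       # Beta(59, 63)
p3 = beta_mean(a3, b3)                       # ≈ 0.483607, fair line ≈ +106.78

# Step 5: after one more home win Pinnacle posts p = 50.1639%; if its old
# alpha and beta were equal and its alpha also rose by 1:
p_pin = 0.501639
alpha_pin = round(p_pin / (2 * p_pin - 1))   # 153
beta_pin = alpha_pin - 1                     # 152

# Step 6: subtract the home/away contributions (56 and 45) to isolate the
# proprietary signal, now applicable on neutral ground.
a6, b6 = alpha_pin - 56, beta_pin - 45       # 97, 107
p6 = beta_mean(a6, b6)                       # ≈ 0.475490 for the former home side
away_edge = (1 - p6) * (1 + 100.6579 / 100) - 1   # ≈ 0.052, i.e. ~5.2%
print(p2, p3, p6, away_edge)
```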


Now obviously the above relies on a very simplified set of assumptions, but it's presented to demonstrate the methodology of Bayesian inference. Even with a relatively paltry model, if one properly conditions one's prior forecasts on the information made implicitly available via publicly available lines, profit opportunity may abound.

The important takeaway from all this (and hopefully this will be the start of an ongoing discussion) is that new information needs to be continuously used to condition prior knowledge. To that end, it's imperative when building a model that one attempt to incorporate timely, broad-based market information (e.g., widely available market-indicative lines) into one's forecasts.

Hopefully, I've provided a good starting point that yields food for additional thought.
Is all of this just another way of saying, "no edge no bet"?