1. #1
    RockyV
    Join Date: 09-11-10
    Posts: 26

    Conceptually Understanding Sports Betting from a Bayesian pov

    Hi guys,

    I'm pretty much a newbie to SBR, trying to understand how this stuff works. I understand probability/statistics pretty well. It seems to me that you can understand SBR pretty well through Bayesian statistics.

    Let's focus on spread betting. The basic idea is that you want to predict as accurately as possible (before the game starts) what the final spread will be. If you had some magical genie that would tell you the exact spread beforehand, life would be great, you'd win every bet.

    Of course, this is probably too much to ask for. Suppose instead you had a slightly weaker genie. He cannot predict the future, but instead he can build 100,000 (or some other suitably large number, say 1 billion if you think 100K is too small) perfect copies of the real world, and then play tomorrow's game in each simulated world.

    You wouldn't know the final spread for tomorrow's game, but instead you'd have a probability distribution describing it. For simplicity's sake, let's say that the 100K margins the genie gave you are very well approximated by a normal distribution (http://en.wikipedia.org/wiki/Normal_distribution) with, say, mean -5 and standard deviation 8.

    In other words, from the 100k simulations the genie gave you, you feel that the true spread should be -5.

    If the Pinnacle spread for the game were say -2, you could then calculate (or consult a Z-scores table, for example) that the probability of the favorite beating the spread is roughly 64.6%.

    So in other words, this would be a highly, highly favorable bet for you....you'd beat the spread 64.6% of the time.
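
    As a rough check on the 64.6% figure, here is a minimal sketch using only Python's standard library. It assumes the genie's margins really are normal, and it restates the spreads as the favorite's margin of victory (a spread of -5 becomes an average margin of +5). The variable names are made up for illustration.

```python
from statistics import NormalDist

# Numbers from the example above, restated as the favorite's margin of victory.
genie_mean_margin = 5.0   # genie's "true spread" of -5: favorite wins by 5 on average
genie_sd = 8.0            # spread of the genie's simulated margins
pinnacle_line = 2.0       # a -2 spread: the favorite must win by more than 2

# Assumption: the simulated margins are well described by a normal distribution.
margin = NormalDist(mu=genie_mean_margin, sigma=genie_sd)

# P(margin > 2) is one minus the CDF at the line, i.e. Phi(0.375).
p_cover = 1.0 - margin.cdf(pinnacle_line)
print(f"P(favorite covers -2) = {p_cover:.3f}")  # about 0.646
```

    The z-score here is (2 - 5) / 8 = -0.375, which is the value you would look up in a Z-table.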

    Of course, all of this is highly dependent on two factors:
    1) How far away is the Pinnacle spread from the "Average" spread the genie gives you? For example, if the Pinnacle spread were changed from -2 to -5 in the example I did above, then you'd have absolutely no advantage...you'd only beat the spread 50% of the time.
    2) How big is the standard deviation of your genie's estimate? If the standard deviation changed from 8 to, say, 100 in the above example, then your chance of beating the spread drops down to 51.197% (you can get this value yourself by using a Z-score calculator.)
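
    Both factors can be explored with one small function. This is only a sketch under the same normality assumption; `cover_probability` and its argument names are hypothetical, and spreads keep the usual sign convention (-5 means the favorite is expected to win by 5).

```python
from statistics import NormalDist

def cover_probability(genie_spread, book_spread, sd):
    """P(favorite beats the book's spread), assuming a normal margin.

    The favorite's margin is taken to be normal with mean -genie_spread
    and standard deviation sd; the bet wins when the margin exceeds
    -book_spread.
    """
    z = (genie_spread - book_spread) / sd  # negative when the genie likes the favorite more
    return NormalDist().cdf(-z)

# Factor 1: distance between the book's spread and the genie's spread (sd fixed at 8).
for book_spread in (-2, -3.5, -5):
    p = cover_probability(-5, book_spread, 8)
    print(f"book {book_spread:>5}: P(cover) = {p:.3f}")

# Factor 2: size of the standard deviation (book spread fixed at -2).
for sd in (8, 20, 100):
    p = cover_probability(-5, -2, sd)
    print(f"sd {sd:>3}: P(cover) = {p:.5f}")
```

    The edge shrinks toward 50% either as the book's spread approaches the genie's spread or as the standard deviation grows.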

    Does the above make sense as a way to understand sports betting conceptually? Of course, you can make this simple setup more precise by progressively weakening the power of your genie. But I'm just trying to figure out if I have a high-level understanding of what is going on here (at least from a Bayesian sort of perspective.)

    Thanks in advance for any clarification/comments.
    Last edited by RockyV; 09-11-10 at 10:52 AM. Reason: .

  2. #2
    Peregrine Stoop
    Join Date: 10-23-09
    Posts: 869
    Betpoints: 779

    yes, this makes sense.
    just a note: most things are not normally distributed.

  3. #3
    RockyV
    Join Date: 09-11-10
    Posts: 26

    Yeah, I chose the normal just as an example, since you then only need two parameters (mean and standard deviation) to calculate (well, to be more precise, accurately estimate) the probability of beating the spread. If the 100k spreads were not normally distributed, this simplification would probably be a terrible idea. But there are other things you could do (for example, just count the fraction of those 100k spreads which beat Pinnacle, and use that to figure out probabilities.)
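
    The counting approach needs no distributional assumption at all. A minimal sketch follows. The "genie" here is a made-up, deliberately non-normal mixture used purely as a stand-in, and `empirical_cover` is a hypothetical helper name. Because real margins are whole numbers, it also tracks pushes (margin exactly on the line).

```python
import random

def empirical_cover(margins, book_spread):
    """Return (win, push, loss) fractions for a bet on the favorite.

    margins: simulated favorite-minus-underdog final margins.
    book_spread: e.g. -2 means the favorite must win by more than 2.
    """
    line = -book_spread
    n = len(margins)
    wins = sum(m > line for m in margins)
    pushes = sum(m == line for m in margins)
    return wins / n, pushes / n, (n - wins - pushes) / n

# Stand-in for the genie's 100k simulated worlds (illustrative numbers only):
# most games are close, a minority are blowouts, so the result is skewed.
rng = random.Random(0)
margins = [
    round(rng.gauss(3, 4)) if rng.random() < 0.7 else round(rng.gauss(14, 10))
    for _ in range(100_000)
]

win, push, loss = empirical_cover(margins, book_spread=-2)
print(f"win {win:.3f}  push {push:.3f}  loss {loss:.3f}")
```

    With real simulation output you would simply pass the genie's list of margins in place of the made-up mixture.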

  4. #4
    Sportslover
    Join Date: 06-04-09
    Posts: 860
    Betpoints: 320

    Am I correct in saying that a normal distribution would give you the number of "scores" and then you'd have to do a separate calculation to work out what percentage of those scores were touchdowns, field goals, safeties?

  5. #5
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    Perhaps a cleverer genie would know that his distributions would be at least two Poisson distributions, one for TDs and one for FGs (instead of normals), and perhaps three lesser ones: conversions, safeties, and defensive TDs?

  6. #6
    RockyV
    Join Date: 09-11-10
    Posts: 26

    Hrm...someone else who is probably more of an expert should chime in...but googling around a bit, it appears that you should probably model scoring events with a Poisson process (http://en.wikipedia.org/wiki/Poisson_process). So then the total number of scores would be drawn from the Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution). The Wikipedia article also discusses when you can approximate a Poisson distribution with a Gaussian...basically, the expected number of scoring events (the rate times the length of the game) needs to be high.
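
    To get a feel for when the Gaussian approximation is reasonable, the sketch below compares the exact Poisson CDF with a normal CDF of matching mean and variance (with a continuity correction) for a few expected counts. The helper names and the chosen counts are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def poisson_cdf(k, lam):
    """Exact P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def normal_approx_cdf(k, lam):
    """Normal approximation: mean lam, variance lam, continuity correction."""
    return NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(k + 0.5)

def max_error(lam):
    """Largest CDF gap over counts from 0 to roughly ten sds above the mean."""
    upper = int(lam + 10 * math.sqrt(lam)) + 1
    return max(
        abs(poisson_cdf(k, lam) - normal_approx_cdf(k, lam))
        for k in range(upper + 1)
    )

# Expected scoring events per game: small counts versus large counts.
for lam in (2, 8, 50, 200):
    print(f"expected count {lam:>3}: max CDF error = {max_error(lam):.4f}")
```

    The gap shrinks as the expected count grows, which suggests the normal shortcut is safer in high-scoring sports than in ones with only a handful of scoring events per game.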

    Once you've modeled the number of scores with a Poisson process (if you care about times between scoring) or more simply a Poisson distribution (you only care about total number of scoring events, and thus don't care about the amount of time between scoring events), then my intuition is that you should then model each scoring event separately. For the sake of simplicity, assume scoring events behave roughly the same, then calculate what percentage of the time scoring events are touchdowns, FGs and safeties for each team.

    Anyway, these two modeling steps allow you to model scoring in sports as a Compound Poisson Process (http://en.wikipedia.org/wiki/Compound_Poisson_process).
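
    A rough simulation of that two-step idea, using only the standard library, might look like the sketch below. Every number in it (scoring rates, point values, event mix) is invented for illustration, and it assumes the two teams score independently, which real games may well violate.

```python
import math
import random

# Hypothetical point values and mix of scoring events (not fitted to any data).
EVENT_POINTS = (7, 3, 2)          # touchdown with extra point, field goal, safety
EVENT_PROBS = (0.60, 0.38, 0.02)

def sample_poisson(rng, lam):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_team_points(rng, scores_per_game):
    """Step 1: Poisson number of scoring events. Step 2: a point value for each event."""
    n_events = sample_poisson(rng, scores_per_game)
    return sum(rng.choices(EVENT_POINTS, weights=EVENT_PROBS, k=n_events))

def simulate_margins(fav_rate, dog_rate, n_games, seed=0):
    """Favorite-minus-underdog margins, assuming the teams score independently."""
    rng = random.Random(seed)
    return [
        sample_team_points(rng, fav_rate) - sample_team_points(rng, dog_rate)
        for _ in range(n_games)
    ]

margins = simulate_margins(fav_rate=4.6, dog_rate=3.8, n_games=20_000)
line = 2  # favorite must win by more than 2 to cover a -2 spread
p_cover = sum(m > line for m in margins) / len(margins)
print(f"mean margin {sum(margins) / len(margins):.2f}, P(cover -2) = {p_cover:.3f}")
```

    Counting the simulated margins directly avoids having to assume the margin is normal, at the cost of having to trust the made-up rates and event mix.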

    To be honest though...I don't know if any of these modeling assumptions hold true for something like sports betting. So I think you'd need to build a model yourself and test the validity of each assumption. You don't need them to hold exactly...approximately is probably going to be enough.
    Last edited by RockyV; 09-12-10 at 09:54 AM. Reason: .

  7. #7
    splash
    Join Date: 05-25-09
    Posts: 38

    Isn't what you're describing the frequentist pov, not the Bayesian one? Frequentists look at distributions over hypothetical infinite sample sizes and try to draw conclusions about the population. Bayesians realize that infinite sample sizes do not exist, so they take a prior hypothesis, gather data, and update that hypothesis based on the new information. Right?
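
    The prior-then-update loop can be illustrated with a standard Beta-Binomial update. This is a minimal sketch, not anything proposed in the thread: the prior, the batches of results, and the function names are all assumptions made up for the example.

```python
def update_beta(prior_alpha, prior_beta, wins, losses):
    """Conjugate update: Beta prior plus win/loss counts gives a Beta posterior."""
    return prior_alpha + wins, prior_beta + losses

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior: a fairly firm belief that bets cover about 50% of the time.
alpha, beta = 50.0, 50.0
print(f"prior mean: {beta_mean(alpha, beta):.3f}")

# Hypothetical batches of observed results (wins, losses), applied one at a time.
for wins, losses in [(6, 4), (11, 9), (30, 20)]:
    alpha, beta = update_beta(alpha, beta, wins, losses)
    print(f"after {wins}-{losses}: posterior mean = {beta_mean(alpha, beta):.3f}")
```

    The posterior mean moves away from 50% only slowly, because the prior counts as 100 earlier observations; a weaker prior would move faster.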

  8. #8
    gman2114
    SBR Rocks
    Join Date: 10-20-09
    Posts: 418
    Betpoints: 1279

    Football is emotion which can't be factored in. Stick to horse racing for numbers.

  9. #9
    RockyV
    Join Date: 09-11-10
    Posts: 26

    Splash: Thanks for catching that.... There is nothing really Bayesian about the setup above, it is just pure parameter estimation (in this case, the parameter is the probability that the final game outcome beats the spread.)

    Bayesian stuff would sort of come afterwards in the modeling process, incorporating prior knowledge about injuries, etc.

  10. #10
    jgilmartin
    Join Date: 03-31-09
    Posts: 1,119

    Originally posted by gman2114:
        "Football is emotion which can't be factored in. Stick to horse racing for numbers."

    So, mathematical analysis in football is useless and one should just bet on instinct?

  11. #11
    Sean81
    Join Date: 12-31-09
    Posts: 281
    Betpoints: 756

    Originally posted by Peregrine Stoop:
        "yes, this makes sense.
        just a note: most things are not normally distributed."

    but the distribution of sample means is (approximately)....something like that.
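
    That is the central limit theorem. A small sketch of it: draws from a strongly skewed distribution (an exponential, chosen arbitrarily here) are far from normal, but averages of 50 such draws are much closer to symmetric. Note that a single game's margin is not a sample mean, so this only loosely supports using a normal for margins.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: about 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)

rng = random.Random(0)

# Raw draws from a skewed distribution (exponential with rate 1).
raw = [rng.expovariate(1.0) for _ in range(20_000)]

# Sample means: each value is the average of 50 fresh exponential draws.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(20_000)
]

print(f"skewness of raw draws:    {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```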
