Modelling standard deviation

**yak merchant** · 05-12-14, 10:09 AM

Originally posted by brettd

I like to think of myself as an experienced sports modeler that is well versed in all manner of regression and classification techniques, however I've drawn somewhat limited inspiration as to how to model standard deviation.

The 'true' standard deviation of the underlying game is not even known at the end of the game, as compared to usual regression and/or classification targets such as the margin of victory or who covered ATS.

The only way I can think of to model standard deviation is through Monte Carlo simulation. Bayesian techniques may also work, but there still has to be some manner of characterizing the distribution of a mean for the likelihood function.

Has anyone thought about this themselves?

I've thought about it, and I'm sure smart people are doing it with Markov chains and other things. I talked to a PhD Bayesian guy once and he spoke Klingon about crazy things like Inverse Gamma and Half-Cauchy priors. Maybe some day I can get one of those guys to help me. Basically I decided I wasn't smart enough, so I did what I always do and went back to full brute-force Monte Carlo everything.
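For readers wondering what an Inverse Gamma prior has to do with standard deviation, here is a minimal sketch, not anything a poster in this thread describes using. It assumes the residuals (actual margin minus predicted margin) are normal with a known mean of zero, in which case an Inverse-Gamma prior on the variance updates in closed form. The prior parameters and the residuals are hypothetical numbers chosen for illustration.

```python
import math


def update_inverse_gamma(alpha, beta, residuals):
    """Conjugate update for the variance of a zero-mean normal.

    Prior:      sigma^2 ~ Inverse-Gamma(alpha, beta)
    Likelihood: each residual ~ N(0, sigma^2)  (mean treated as known)
    Posterior:  Inverse-Gamma(alpha + n/2, beta + sum(r^2)/2)
    """
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    return alpha + n / 2.0, beta + sum_sq / 2.0


def posterior_mean_variance(alpha, beta):
    """Mean of an Inverse-Gamma(alpha, beta); only defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean of an Inverse-Gamma needs alpha > 1")
    return beta / (alpha - 1.0)


# Hypothetical prior whose mean variance is 36^2 (beta / (alpha - 1)).
prior_alpha, prior_beta = 3.0, 2.0 * 36.0 ** 2
# Hypothetical residuals from a handful of games.
residuals = [12.0, -40.0, 5.0, 33.0, -21.0, 48.0, -9.0, 17.0]

post_alpha, post_beta = update_inverse_gamma(prior_alpha, prior_beta, residuals)
# Square root of the posterior mean variance; a rough SD summary,
# not the posterior mean of the SD itself.
print(math.sqrt(posterior_mean_variance(post_alpha, post_beta)))
```

The point of the sketch is only that the width of the distribution gets its own prior and its own posterior, separate from the mean.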

**brettd** · 05-12-14, 11:02 AM

Yeah I'm hoping there's some other way than Monte Carlo simulation.

Simulation is manageable when modelling discrete-type sporting events like baseball or American football: you just tackle building one distribution at a time, form your event chain, and iterate. But it's much more difficult with continuous-action sports like ice hockey, soccer, etc.
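The "one distribution at a time, form your event chain, and iterate" idea can be sketched roughly as follows. This is a toy under loud assumptions: the number of drives per team, the per-drive outcome probabilities, and the names `DRIVE_OUTCOMES` and `simulate_margins` are all hypothetical, and both teams are treated as identical. The standard deviation of the margin then falls out of the simulated results rather than being specified up front.

```python
import random
import statistics

# Hypothetical per-drive outcome distribution: (points, probability).
DRIVE_OUTCOMES = [(0, 0.65), (3, 0.15), (7, 0.20)]


def simulate_team_score(n_drives, rng):
    """Second link in the chain: draw an outcome for each drive and sum."""
    points = [p for p, _ in DRIVE_OUTCOMES]
    weights = [w for _, w in DRIVE_OUTCOMES]
    return sum(rng.choices(points, weights=weights, k=n_drives))


def simulate_margins(n_sims=10000, seed=0):
    """Simulate home-minus-away margins for two identical teams."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_sims):
        # First link in the chain: how many drives each team gets.
        n_drives = rng.randint(10, 13)
        home = simulate_team_score(n_drives, rng)
        away = simulate_team_score(n_drives, rng)
        margins.append(home - away)
    return margins


margins = simulate_margins()
print("mean margin:", statistics.mean(margins))
print("SD of margin:", statistics.pstdev(margins))
```

A real model would replace each link with a fitted distribution conditioned on the teams and game state; the structure of the loop stays the same.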

**flsaders85** · 05-12-14, 11:23 AM

Do you use Poisson for hockey/soccer?

**brettd** · 05-12-14, 10:50 PM

I don't model hockey or soccer, but Poisson would be the distribution that I'd initially work with, if my chosen modelling objective was a continuous variable.
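As a rough illustration of where Poisson leads for the standard deviation question: if each team's goal count is modelled as an independent Poisson variable (independence is an assumption here, and the scoring rates below are hypothetical), the variance of the goal margin is simply the sum of the two rates. The sketch checks that against a small simulation using only the standard library.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for small rates such as goals per game."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_margin(lam_home, lam_away, n_sims=20000, seed=0):
    """Return (mean, SD) of the simulated home-minus-away goal margin."""
    rng = random.Random(seed)
    margins = [
        sample_poisson(lam_home, rng) - sample_poisson(lam_away, rng)
        for _ in range(n_sims)
    ]
    return statistics.mean(margins), statistics.pstdev(margins)


def analytic_margin_sd(lam_home, lam_away):
    """For independent Poissons, Var(X - Y) = lam_home + lam_away."""
    return math.sqrt(lam_home + lam_away)


# Hypothetical scoring rates.
mean_margin, sd_margin = simulate_margin(3.0, 2.5)
print("simulated SD:", sd_margin)
print("analytic SD:", analytic_margin_sd(3.0, 2.5))
```

Under these assumptions the width of the margin distribution is tied directly to the predicted scoring rates, so it does not need to be modelled separately.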

**antonyp22** · 05-12-14, 11:46 PM

By standard deviation do you mean the SD of a team's output/performance? Can you give an example in an Aussie Rules context?

**brettd** · 05-13-14, 01:08 AM

If you are predicting the margin of victory of any game outcome, that prediction should really be the mean of a distribution.

My point is: how does one model the width (standard deviation) of this distribution? How can one be more confident in the mean estimate for one game than for another?
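One non-simulation way to frame this question is to treat the size of the residual as its own regression target. The sketch below is not something anyone in the thread reports using. It assumes normal residuals, for which E|r| = sigma * sqrt(2/pi), and a single hypothetical game-level feature that drives the spread. The data are synthetic, generated so that the true SD is 10 + 2 * x.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_sd_model(features, margins, predicted_means):
    """Return (a, b) such that sigma(x) is approximated by a + b * x.

    Regresses |residual| on the feature, then rescales by sqrt(pi / 2),
    which is only valid if the residuals are roughly normal.
    """
    abs_resid = [abs(y - m) for y, m in zip(margins, predicted_means)]
    a, b = fit_line(features, abs_resid)
    scale = math.sqrt(math.pi / 2.0)
    return a * scale, b * scale


def make_synthetic_games(n, seed=0):
    """Hypothetical data: the true SD grows with the feature as 10 + 2 * x."""
    rng = random.Random(seed)
    features = [rng.uniform(0.0, 10.0) for _ in range(n)]
    predicted = [rng.uniform(-20.0, 20.0) for _ in range(n)]
    margins = [
        m + rng.gauss(0.0, 10.0 + 2.0 * x) for x, m in zip(features, predicted)
    ]
    return features, margins, predicted


features, margins, predicted = make_synthetic_games(20000, seed=1)
a, b = fit_sd_model(features, margins, predicted)
print(f"estimated SD at x=5: {a + b * 5:.1f} (true value in this toy data: 20.0)")
```

With real games the hard part is choosing features that plausibly explain the spread; the fitting step itself is ordinary regression.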

**ceder** · 05-13-14, 01:35 AM

I use Monte Carlo simulation for modelling standard deviation.
There are some add-ins for Excel that are very good for that.

**brettd** · 05-13-14, 01:59 AM

Yeah I could do that, but just wondering if there is an alternative way.

**antonyp22** · 05-13-14, 02:03 AM

This sounds tough. Would it help to assign a percentage probability to a specific MOV, or perhaps to split the MOV into two factors: one that is pre-determined by recent form, venue, etc., and another that is random and generally unexplainable, such as performance on any given day?

**bihon** · 05-13-14, 08:05 AM

Some time ago I picked up a mathematical way of doing this, which is fairly simple.
The site doesn't resolve anymore, however here is a snapshot:

How to Calculate Standard Deviation from Probability & Samples | ncalculators.com

https://web.archive.org/web/20110714025843/http://ncalculators.com/math-worksheets/calculate-standard-deviation-from-probability-samples.htm

Edit: After rereading the thread, it's probably not what you're looking for.
Anyway...

**lamichaeljames** · 06-02-14, 02:27 PM

Originally posted by brettd

I like to think of myself as an experienced sports modeler that is well versed in all manner of regression and classification techniques, however I've drawn somewhat limited inspiration as to how to model standard deviation.

The 'true' standard deviation of the underlying game is not even known at the end of the game, as compared to usual regression and/or classification targets such as the margin of victory or who covered ATS.

The only way I can think of to model standard deviation is through Monte Carlo simulation. Bayesian techniques may also work, but there still has to be some manner of characterizing the distribution of a mean for the likelihood function.

Has anyone thought about this themselves?

Have you found a way other than the Monte Carlo simulation?

**brettd** · 06-02-14, 09:41 PM

Originally posted by lamichaeljames

Have you found a way other than the Monte Carlo simulation?

No I haven't.

**antonyp22** · 06-08-14, 09:25 PM

brettd this has nothing to do with your original question, but what figures in terms of beating the closing line by x average amount or x% of the time would you say are an indicator of long term success in Aussie Rules football betting?

**brettd** · 06-16-14, 10:29 AM

You beat the close in AFL on average by 1.5 points, you got a 55% model, 1.75 points = 57% model, 2 points = 60% model. I model the AFL so I know about this in detail.

**magyarsvensk** · 07-25-14, 03:52 PM

What does the standard deviation have to do with whether the game is continuous or discrete? Also, what would you mean by 'true standard deviation'? Population rather than sample?

In my opinion, statistics will give you a very rough approximation of sporting event outcomes because statistics assumes uniform distribution of results. Every sports team is made up of humans who have emotional swings, and all bettors are humans with emotional swings as well, so neither the data nor the odds are going to be uniformly distributed.

Also, after a few years of modeling sports -- especially sports with irregular scoring dynamics like baseball -- I have found that standard deviation is more or less useless for scaling scores. Any type of distribution that uses the median would work better.

**smoke a bowl** · 07-25-14, 05:56 PM

Originally posted by brettd

You beat the close in AFL on average by 1.5 points, you got a 55% model, 1.75 points = 57% model, 2 points = 60% model. I model the AFL so I know about this in detail.

This cannot be correct. An AFL game lined -2 has a median moneyline generally around -110 (52.38%), so a 2-point difference, even assuming the model was absolutely perfect, would only hit roughly 52-53% of the time.
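The 52.38% figure is the probability implied by American odds of -110. A small sketch of that conversion (the function name is a made-up helper, and the result ignores the bookmaker's margin):

```python
def implied_probability(american_odds):
    """Convert American odds to the implied win probability (vig not removed)."""
    if american_odds < 0:
        stake = -american_odds
        return stake / (stake + 100.0)
    return 100.0 / (american_odds + 100.0)


print(round(implied_probability(-110) * 100, 2))  # 52.38
```

The same number is also the break-even win rate when betting at -110.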

**yak merchant** · 07-26-14, 01:55 PM

Originally posted by magyarsvensk

What does the standard deviation have to do with whether the game is continuous or discrete? Also, what would you mean by 'true standard deviation'? Population rather than sample?

In my opinion, statistics will give you a very rough approximation of sporting event outcomes because statistics assumes uniform distribution of results. Every sports team is made up of humans who have emotional swings, and all bettors are humans with emotional swings as well, so neither the data nor the odds are going to be uniformly distributed.

Also, after a few years of modeling sports -- especially sports with irregular scoring dynamics like baseball -- I have found that standard deviation is more or less useless for scaling scores. Any type of distribution that uses the median would work better.

I may be wrong but I think his whole point is using component stats and modeling in game situations and not just "scaling scores". Also, statistical dispersion is just as relevant when using medians. Median absolute deviation can provide a great deal of insight.

**magyarsvensk** · 07-28-14, 09:26 AM

What are mean and standard deviation if not scaling tools? The results of baseball games are not uniformly distributed.

Median absolute deviation is still symmetrical. The distribution of baseball scores is very asymmetrical. When a team is getting blown out, they put benchwarmers on the mound and give up more runs. When the score is close or tied, they put their bullpen aces on the mound and keep the score low.

Runs Per Game

http://www.hardballtimes.com/runs-per-game/


**yak merchant** · 07-29-14, 10:07 PM

Originally posted by magyarsvensk

What are mean and standard deviation if not scaling tools? The results of baseball games are not uniformly distributed.

Median absolute deviation is still symmetrical. The distribution of baseball scores is very asymmetrical. When a team is getting blown out, they put benchwarmers on the mound and give up more runs. When the score is close or tied, they put their bullpen aces on the mound and keep the score low.

http://www.hardballtimes.com/runs-per-game/

I never said they weren't scaling tools. You keep bringing up the distribution of "results". If you are modeling component stats (e.g. Derek Jeter's contact percentage when Joe Nathan throws a 3-2, 4-seam fastball from the stretch), then the shape of the distribution might be one of your problems, but whether or not a scrub pitcher is on the mound is not one of the problems (well, I'm sure with Nathan's performance this year the "Scrub" title might also be up for debate).

Yes, MAD is still symmetrical, and I'm sure you could use interquartile range or something else to understand the distribution more accurately, but you were the one who recommended using medians in your first post and, one sentence earlier, stated that statistical dispersion (standard deviation) is useless.
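To make the symmetry point concrete, here is a small sketch comparing median absolute deviation (a single symmetric number) with the lower and upper quartile distances from the median (which can differ when the data are skewed). The runs-per-game sample is made up for illustration and is not taken from the linked article.

```python
import statistics


def median_abs_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def quartile_spreads(values):
    """Return (median - Q1, Q3 - median); unequal values indicate asymmetry."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2 - q1, q3 - q2


# Hypothetical right-skewed runs-per-game sample.
runs = [0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 9, 12]

print("MAD:", median_abs_deviation(runs))
lower, upper = quartile_spreads(runs)
print("lower spread:", lower, "upper spread:", upper)
```

A single dispersion number, whether SD or MAD, cannot show the difference between the two tails; reporting the two spreads separately can.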

**magyarsvensk** · 07-30-14, 02:11 AM

It's wishful thinking that one could model such a "component stat", and even if you could, what would be the point? Winning in sports betting consists of picking one outcome over another and either getting paid based on the line and/or the spread or losing. That is the output. You can make the inputs whatever you want, but if you introduce intermediates like "Derek Jeter's contact percentage when Joe Nathan throws a 3-2, 4-seam fastball from the stretch", you are making an inference, which will increase the error in your calcs which will decrease your chances of winning.

The more steps it takes to get from point A to point B, the greater chance you have of tripping.

Anyway, my original questions were posed because it is not clear at all what stat OP is trying to model for what sample.... It didn't really need to get heated.

**brettd** · 07-31-14, 03:44 AM

Originally posted by smoke a bowl

This cannot be correct. An AFL game lined -2 has a median moneyline generally around -110 (52.38%), so a 2-point difference, even assuming the model was absolutely perfect, would only hit roughly 52-53% of the time.

I'm talking the Australian Football League.

Empirically, if I beat the closer by 1.5 or better on sides, I'm winning at 55% (over a large sample size).
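The disagreement over these percentages comes back to the thread's own topic, since how often a given points edge wins depends on the standard deviation of the margin around the line. The sketch below assumes, purely for illustration, that the final margin relative to the closing line is normally distributed and that a points edge shifts its mean; neither poster states this model, and the SD values passed in are hypothetical.

```python
from statistics import NormalDist


def cover_probability(edge_points, margin_sd):
    """P(cover) when the margin vs. the line is N(edge_points, margin_sd^2)."""
    return NormalDist(0.0, margin_sd).cdf(edge_points)


def implied_margin_sd(edge_points, win_rate):
    """SD of the margin that would make a given edge win at a given rate."""
    return edge_points / NormalDist().inv_cdf(win_rate)


# Edge / win-rate pairs quoted earlier in the thread.
for edge, rate in [(1.5, 0.55), (1.75, 0.57), (2.0, 0.60)]:
    print(edge, rate, round(implied_margin_sd(edge, rate), 1))

# The same 2-point edge under two hypothetical margin SDs.
print(cover_probability(2.0, 12.0), cover_probability(2.0, 36.0))
```

Under this model a wider margin distribution means the same points edge converts into a smaller win-rate advantage, which is why an estimate of the standard deviation matters.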

**brettd** · 07-31-14, 03:46 AM

Originally posted by magyarsvensk

What does the standard deviation have to do with whether the game is continuous or discrete?

I didn't say anything about this. Frame this question with any distribution you want, the question about standard deviation is still the same.

**brettd** · 07-31-14, 03:52 AM

Originally posted by magyarsvensk

In my opinion, statistics will give you a very rough approximation of sporting event outcomes because statistics assumes uniform distribution of results.

Um.... statistics can deal with any type of result distribution. Name me a result distribution where it cannot.

Originally posted by magyarsvensk

I have found that standard deviation is more or less useless for scaling scores. Any type of distribution that uses the median would work better.

Z-scaling is one of the most useful things you can do to analyze a variable, especially when examining interactions with other variables in higher-dimensional space. I agree though, medians often work better in sports modelling. Adopting a median as the central tendency when generating a standard deviation is something I have not done though.
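For comparison, a minimal sketch of ordinary z-scaling next to a median-based variant. The robust version is one common convention rather than something described in this thread: it centres on the median and scales by the median absolute deviation times 1.4826, the constant that makes MAD match the standard deviation for normal data. The sample values are hypothetical.

```python
import statistics


def z_scores(values):
    """Classic z-scaling: centre on the mean, scale by the standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def robust_z_scores(values):
    """Median-centred scaling using MAD * 1.4826 in place of the SD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (1.4826 * mad) for v in values]


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 100]
print(z_scores(sample))
print(robust_z_scores(sample))
```

With an outlier in the sample, the mean and SD are both pulled toward it, while the median-based scores leave the typical values near zero and flag the outlier strongly.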

**brettd** · 07-31-14, 03:55 AM

Originally posted by yak merchant

I may be wrong but I think his whole point is using component stats and modeling in game situations and not just "scaling scores". Also, statistical dispersion is just as relevant when using medians. Median absolute deviation can provide a great deal of insight.

Yeah pretty much what I meant. I assume 'scaling scores' means some method of normalization, yeah I'm not interested in that.

If you're building a simulation, you need to craft chains of distributions. If you don't know what a standard deviation should be for a given distribution, how do you derive it?

**magyarsvensk** · 07-31-14, 09:44 AM

Originally posted by brettd

Um.... statistics can deal with any type of result distribution. Name me a result distribution where it cannot.

No, it cannot. Statistics deals with uniform distributions. Sports teams get hot and cold, trade players, injuries, weather, and probably a million other things. More importantly, for statistics to work as theorized, the sample needs to be part of the population. When you try to use statistics of the past to predict the future, the landscape changes dramatically.

Originally posted by brettd

Z-scaling is one of the most useful things you can do to analyze a variable, especially when examining interactions with other variables in higher-dimensional space. I agree though, medians often work better in sports modelling. Adopting a median as the central tendency when generating a standard deviation is something I have not done though.

Z-scaling is one of the most useful things you can do for variables that you know obey a normal distribution.

If you are serious about making predictions in sports, it is best to limit the assumptions and inferences to zero -- or as few as is feasible. The results are discrete, so you don't need to muck around with continuous distributions -- you can just throw the results of the past x years into buckets, and there is your distribution. Why make it more complicated than that? Finding out how to limit the error of that measurement is the biggest hurdle. In some sports, it may not even be possible that the error of your estimate is less than the vig.
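The "throw the results into buckets" approach amounts to building an empirical distribution from observed frequencies. A rough sketch, using a made-up list of past margins: it also computes a binomial standard error for one bucket, which is one simple way to look at "the error of that measurement". All function names are hypothetical helpers.

```python
import math
from collections import Counter


def empirical_pmf(outcomes):
    """Relative frequency of each distinct outcome, keyed in sorted order."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {k: c / n for k, c in sorted(counts.items())}


def pmf_mean_sd(pmf):
    """Mean and standard deviation implied by a probability mass function."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum(p * (k - mean) ** 2 for k, p in pmf.items())
    return mean, math.sqrt(var)


def bucket_standard_error(p, n):
    """Binomial standard error of an observed bucket frequency p from n games."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical past margins.
past_margins = [-3, 7, 3, -7, 3, 10, -3, 3, 14, -10, 7, 3]
pmf = empirical_pmf(past_margins)
print(pmf_mean_sd(pmf))
print(bucket_standard_error(pmf[3], len(past_margins)))
```

With few observations per bucket the standard errors are large, which is the practical cost of not fitting a smoother curve.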

**brettd** · 07-31-14, 10:17 AM

Originally posted by magyarsvensk

No, it cannot. Statistics deals with uniform distributions. Sports teams get hot and cold, trade players, injuries, weather, and probably a million other things. More importantly, for statistics to work as theorized, the sample needs to be part of the population. When you try to use statistics of the past to predict the future, the landscape changes dramatically.

Statistical modelling can effectively deal with all these things you mention. Modelling these effects is absolutely necessary in beating liquid markets, and I expend effort modelling these things full time.

I don't understand why you keep talking about uniform distributions as if they were all that statistics deals with. This is the definition of a uniform distribution, whether discrete or continuous:

Discrete uniform distribution - Wikipedia

http://en.wikipedia.org/wiki/Uniform_distribution_(discrete)

Continuous uniform distribution - Wikipedia

http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)

Originally posted by magyarsvensk

Z-scaling is one of the most useful things you can do for variables that you know obey a normal distribution.

If you are serious about making predictions in sports, it is best to limit the assumptions and inferences to zero -- or as few as is feasible. The results are discrete, so you don't need to muck around with continuous distributions -- you can just throw the results of the past x years into buckets, and there is your distribution. Why make it more complicated than that? Finding out how to limit the error of that measurement is the biggest hurdle. In some sports, it may not even be possible that the error of your estimate is less than the vig.

What results are discrete? What sport? Or are you talking about binomial classification of win/loss?

**magyarsvensk** · 07-31-14, 11:35 AM

I should have specified discrete and finite. There are a limited number of possible outcomes for the scores, the odds, the spreads, etc. So if you want the distribution, just sum up all of the outcomes and there is your distribution. Why would you fit it to an approximate curve when you don't have to? That would just decrease the accuracy and inhibit your ability to beat the vig.

And yes, those definitions are what I mean. You seem to be stating hopeful assumptions as facts. Statistical theories that were developed under certain ideal assumptions cannot effectively handle non-uniform dynamic distributions. Sure, some people use them and may experience temporary limited success by sheer luck, but there are many more folks who think they are deriving mathematical truths when they are really just poking around in the dark.

To summarize: like other games such as bridge, chess, poker, etc. there is no axiomatic mathematical theory that you can correctly apply to sports betting because none of these games are axiomatic systems. They were not built from the bottom up, they were built from the top down.

**brettd** · 08-01-14, 12:47 AM

Originally posted by magyarsvensk

There are a limited number of possible outcomes for the scores, the odds, the spreads, etc. So if you want the distribution, just sum up all of the outcomes and there is your distribution.

That is not a distribution. Recording all possible permutations doesn't tell you about the frequency/probability density of the phenomenon you are investigating.

Originally posted by magyarsvensk

And yes, those definitions are what I mean. You seem to be stating hopeful assumptions as facts. **Statistical theories that were developed under certain ideal assumptions cannot effectively handle non-uniform dynamic distributions.**

Firstly, what definitions are you talking about? I'm not stating any 'hopeful assumptions' as facts.

What I have bolded is just plain incorrect.

There's a lot of statistical theory that doesn't rely on any assumptions (ML estimation comes to mind first). Heck, there's even a branch of statistics (Bayesian statistics) whereby you can begin to make assumptions about phenomena after a single observation combined with a prior distribution that is completely unknown.

Originally posted by magyarsvensk

To summarize: like other games such as bridge, chess, poker, etc. there is no axiomatic mathematical theory that you can correctly apply to sports betting because none of these games are axiomatic systems. They were not built from the bottom up, they were built from the top down.

I'm making a living just fine using 'axiomatic' mathematical theories in sports betting markets. I've also met others that have just done fine for decades and made a whole lot more money than me using 'axiomatic' mathematical theories in sports markets.

**magyarsvensk** · 08-01-14, 09:18 AM

brettd, it's clear from your first post that you are not "well versed" in mathematics. But now it's clear that you have no idea what you are talking about and you're also proud of it.

Good luck with that.