Making a model - median vs. mean

**yak merchant** · 08-22-15, 02:29 PM

I have always had much better results with the median. That being said, I am trying to build a model sampling from the second and third quartiles but am having issues. Just averaging after dropping the top and bottom X% of data points can also have merit if you have enough data points.
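A minimal sketch of the "drop the top and bottom X%" idea, assuming a plain list of per-game scores. The `trimmed_mean` helper and the `scores` sample are hypothetical, made up for illustration, not anyone's actual model or data.

```python
from statistics import mean, median


def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the points from EACH end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # points dropped per side
    return mean(ordered[k:len(ordered) - k])


# Hypothetical points-per-game sample with one blowout outlier (150).
scores = [88, 92, 95, 97, 99, 100, 101, 103, 105, 150]

print(mean(scores))                # 103, pulled up by the outlier
print(median(scores))              # 99.5
print(trimmed_mean(scores, 0.10))  # 99, after dropping 88 and 150
print(trimmed_mean(scores, 0.25))  # roughly the middle two quartiles
```

With only ten points, a 25% trim drops two values per side, so it only approximates "second and third quartiles"; the approximation gets better with more data.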

**KVB** · 08-22-15, 02:43 PM

Agreed, you will likely do better with medians than means.

Whichever you use, make sure you use medians when comparing the teams to each other and the league. Using the mean will only work when one team is above average and one below...and often produces a similar result to using medians.

But when comparing two teams that are above or two teams that are below average, using the mean will give a skewed result that may not present logically...use medians.

**akphidelt** · 08-23-15, 12:10 AM

Throw a little standard deviation in there. That is pivotal for understanding variances. Whether you use the mean or median you are not going to be telling much of a story. If there is a higher deviation you can use some different model to try to predict the efficiency.

**jeffjam_** · 08-23-15, 04:04 PM

Well, if results are very inconsistent and thus the standard deviation is high, I think it would be very difficult, near impossible, to predict anything. Anyway, when calculating the league average I think it is easier to use the mean because the median doesn't really change the results given the large dataset.
I also plan to use regression to compare the results with the first method, but it isn't that easy.

**akphidelt** · 08-24-15, 01:25 AM

Originally posted by jeffjam_

Well, if results are very inconsistent and thus the standard deviation is high, I think it would be very difficult, near impossible, to predict anything. Anyway, when calculating the league average I think it is easier to use the mean because the median doesn't really change the results given the large dataset.
I also plan to use regression to compare the results with the first method, but it isn't that easy.

That's statistics. If there are large variances, it's near impossible to predict anything regardless of what calculation you use. Using mean or median does not change the inconsistency of the data. That's why I was thinking to find the variance and, if it is too high, then come up with a different model for that team.

**antonyp22** · 08-25-15, 07:28 PM

Do some analysis on past data of team efficiency and try to find out whether any specific distribution can be assigned to it. Use the sample median/mean as the median/mean of that distribution, which can then be used as the basis for a simulation... something to get you thinking anyway.
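One way to read that suggestion, as a rough sketch only: assume each team's efficiency is roughly normal, estimate the mean and standard deviation from past games, and simulate. The normal assumption, the `simulate_win_prob` helper, and both samples are hypothetical; whether a normal fits real efficiency data is exactly what you would need to check first.

```python
import random
from statistics import mean, stdev


def simulate_win_prob(sample_a, sample_b, n_sims=20_000, seed=0):
    """Share of simulated games in which team A out-performs team B.

    Assumes (for illustration) independent normal distributions fitted
    to each team's past efficiency numbers.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    mu_a, sd_a = mean(sample_a), stdev(sample_a)
    mu_b, sd_b = mean(sample_b), stdev(sample_b)
    wins = sum(
        rng.gauss(mu_a, sd_a) > rng.gauss(mu_b, sd_b)
        for _ in range(n_sims)
    )
    return wins / n_sims


# Hypothetical points-per-100-possessions samples.
team_a = [104.1, 99.8, 110.3, 102.5, 97.9, 106.0]
team_b = [101.2, 95.4, 98.7, 103.9, 96.1, 99.5]

print(f"P(A beats B) ~ {simulate_win_prob(team_a, team_b):.3f}")
```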

**ball stopper** · 08-26-15, 10:24 PM

Originally posted by akphidelt

That's statistics. If there are large variances, it's near impossible to predict anything regardless of what calculation you use. Using mean or median does not change the inconsistency of the data. That's why I was thinking to find the variance and, if it is too high, then come up with a different model for that team.

I imagine modelling with means and variance is superior to using just medians... and medians are superior to only means.

But how do you project variance? Is past variance predictive of future variance? My guess for basketball is that pace, 3-point attempts, and TOs are the three variables that determine variance. Any thoughts on this?

**akphidelt** · 08-28-15, 04:59 AM

Originally posted by ball stopper

I imagine modelling with means and variance is superior to using just medians... and medians are superior to only means.

But how do you project variance? Is past variance predictive of future variance? My guess for basketball is that pace, 3-point attempts, and TOs are the three variables that determine variance. Any thoughts on this?

You don't project future variance. If you could, you wouldn't need statistics. You calculate the variance within your dataset to project the probability of future data points. For example, a team that scores 80, 100, and 120 points has the same mean and median as a team that scores 98, 100, and 102 points, yet the variance isn't even close. So you have to figure out a model to change the 80, 100, 120 points into something more correlated, whether that is weighting for defenses, weighting for starting lineups, etc. There is just nothing you can do mathematically with that data in its current state to give you any possible way of predicting future outcomes with any significant probability.

To have a serious model you have to have a calculation that decreases the variance to a point where the probability is significant enough to have confidence in the result. Using the mean or median has no effect on the variance and does not change the probability of your predicted result. That's why you need statistics to have any serious model.
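The 80/100/120 versus 98/100/102 example can be checked directly. The sketch below uses the sample standard deviation, plus a normal approximation to turn spread into a probability. The normal approximation and the 105-point threshold are assumptions added for illustration, and three games is far too few for a real estimate.

```python
from statistics import NormalDist, mean, median, stdev

team_a = [80, 100, 120]
team_b = [98, 100, 102]


def prob_over(points, threshold):
    """P(score > threshold) under a normal fitted to the sample."""
    dist = NormalDist(mean(points), stdev(points))
    return 1 - dist.cdf(threshold)


for name, pts in (("A", team_a), ("B", team_b)):
    print(
        f"team {name}: mean={mean(pts)} median={median(pts)} "
        f"stdev={stdev(pts):.1f} P(>105)={prob_over(pts, 105):.3f}"
    )
```

Both teams show a mean and median of 100, but the standard deviations are 20 and 2, so the same "100-point team" label hides very different chances of clearing any given total.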

**peacebyinches** · 08-28-15, 02:01 PM

For our purposes, I agree with the majority here and think using the median (for most measures, unless you just use total points scored, for example) is best.

On a side note, when it comes to variance... and this is something I've contemplated for a while... I feel like this is an underrated measure that could really give bettors an edge, especially when it comes to moneyline bets.
For example, say you figure out a certain team is essentially schizophrenic in their performance: they lose to craptacular teams again and again, but win once in a while (more than they should, keep that in mind) when they are large underdogs. Now, once you have determined this large variance (relative to comparable teams or even the rest of the league), isn't this team going to be a +EV wager whenever they are not favored to win?

**magyarsvensk** · 08-28-15, 05:07 PM

When I used to try to scale things, I would use median-based methods rather than average-based methods. You should be able to just collect all the data, look at a distribution graph, and see how it looks to find out if the median will work better.

I think the most obvious example in baseball is when a game gets out of control in the first few innings and then you see a parade of loser relievers come in and put the winning team on easy street. In college football, I think it's a given that when the first quarter score is high, the refs tend to call more PIs.

If you are talking basketball though, I think there is a lot of consistency there in total scores. The biggest effect on efficiency is going to come from free throws, and yes, those tend to come in clumps. If a team gets into the penalty in the first few minutes of play, you are guaranteed a very efficient output from then on. But then the game also slows down quite a bit as teams try to draw fouls instead of make baskets, so I'm not sure that efficiency is the way to go.

**magyarsvensk** · 08-28-15, 05:08 PM

I should mention that I have since given up trying to model anything. I've switched to a straight up traits-based program.

**jeffjam_** · 08-31-15, 05:34 PM

So basically, your goal is to find teams that don't have high variance in whatever data you use to create the system.
I, for example, try to use efficiency as a basic predictive element, so the question is what standard deviation is bad enough to say there is high variance.

**magyarsvensk** · 09-01-15, 02:26 PM

My algorithm has two basic goals: minimize assumptions and maximize statistical significance. It's closer to data mining than modeling.

You've got data, and you need to turn that data into predictions. In my trials and tribulations, I've come to the conclusion that the less stuff you do to that data to turn it into predictions, the better it's gonna work. For example, you are using efficiency as a predictive element, which might be a great way to do it, but what if it isn't? What if there is a better way right under your nose? You would never know, because you are locked into the efficiency thing. What are the chances that one statistic is going to be the secret to consistent positive returns?

So it's a bit of a Catch-22 (or maybe more like Heisenberg uncertainty). The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

Hope this helps. This was just my over-the-summer thinking on reworking my algorithm. Haven't had direct success with it yet, but I feel good about the test results.

**HeeeHAWWWW** · 09-02-15, 11:34 AM

Originally posted by magyarsvensk

You've got data, and you need to turn that data into predictions. In my trials and tribulations, I've come to the conclusion that the less stuff you do to that data to turn it into predictions, the better it's gonna work.

That's a fairly standard modelling dilemma indeed. There are plenty of measures you can use to penalise additional complexity though, for example the Akaike or Schwarz information criteria.
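For reference, the two criteria in their standard forms are AIC = 2k - 2 ln(L) and BIC (Schwarz) = k ln(n) - 2 ln(L), where k is the number of fitted parameters, n the number of observations, and L the maximised likelihood. Lower is better for both. The sketch below uses made-up log-likelihoods purely to show how the penalty works; the numbers are hypothetical, not from any real model.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Schwarz (Bayesian) information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fits on the same n = 82 games.
n_games = 82
simple = {"k": 2, "log_likelihood": -120.0}   # e.g. efficiency only
complex_ = {"k": 6, "log_likelihood": -118.5}  # four extra predictors

for name, m in (("simple", simple), ("complex", complex_)):
    print(
        f"{name:>7}: AIC={aic(m['log_likelihood'], m['k']):.1f} "
        f"BIC={bic(m['log_likelihood'], m['k'], n_games):.1f}"
    )
```

In this made-up case the complex model fits slightly better (higher log-likelihood) but scores worse on both criteria, because four extra parameters cost more than the small gain in fit.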

**magyarsvensk** · 09-04-15, 02:28 AM

Originally posted by HeeeHAWWWW

That's a fairly standard modelling dilemma indeed. There are plenty of measures you can use to penalise additional complexity though, for example the Akaike or Schwarz information criteria.

This sounds interesting. I'm going to read up on it. Thanks.

**Jayvegas420** · 09-04-15, 03:00 PM

The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

That's really interesting.
And really well said.

Question.

Is all this data supposed to be used to predict the results of the events, or could it be better used to predict the openers, line movements, public perception & beating the closing line?

**magyarsvensk** · 09-08-15, 12:59 PM

Originally posted by Jayvegas420

The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

That's really interesting.
And really well said.

Question.

Is all this data supposed to be used to predict the results of the events, or could it be better used to predict the openers, line movements, public perception & beating the closing line?

Thanks.

I use it to (try to) beat the closing lines, but I imagine it could be used to predict anything that is not a random event.

On the question of beating the closing line versus predicting the result of the event, I think the lines themselves contain a great deal of information in them, and that it would be unwise to ignore that information. The book's job is to predict the outcome of an event. They put a lot of effort into doing that. When you use the line in your predictions, it's like saying, "okay, books, this line represents all of the research and computing power that you have put into this game, now let's see if I can find a way to find faults in these lines using additional data and techniques that you may not have considered." So once you consider the line as an essential tool in predicting the outcome of the game, you are essentially making a prediction to beat the line regardless of how you express it.

The book has the advantage of setting and controlling the rules (the line, the vig, etc.). The capper's principal advantage is that the book must lay its cards on the table first. Not using the book's line to make your predictions would be like not using the blackjack dealer's hole card to make your decisions.

**Jayvegas420** · 09-08-15, 10:53 PM

OK so another crazy random question.
I got into a conversation in Vegas with a guy who told me that at a table where BJ pays 6-5, it's silly to double down on BJ if allowed.
We actually argued about it & I tried to use an analogy along the same lines you just spoke of, at the end of your post.
I said to the guy, "When you get your 1st card there's already a slight variance in the odds."
Now the dealer reveals his hole card & there is a huge variance. (Depending on what he pulls)
Then you get your 2nd card & there are fixed odds on how this hand should result.
If I thought my odds were 49%-51% before the hand started, when I was dealt that 1st ace I liked my odds better already. Then when the dealer pulls a 6 I put myself as a big favourite. Now there's, say, a Jack.
This is the same result as being dealt a 5 & 6, if I chose to use the ace as one value.
So if doubling on 11 vs a 6 is always a +ev proposition, why not double?
I tried to use the analogy of the fixed line we get pre kick off.....and how our perception of +ev increases as 1/2 time approaches & we can bet again (at the adjusted odds the book will set).
In this case at the BJ table we can see it's half time & we are huge favourites & can press our bet or split it at twice the value at the same line as when the game started.
And we're winning.
With odds in our favour that we will actually win this more often than not.

I know that sounds crazy but the simple question is:
If the house will allow it, should you double BJ at a 6-5 table?

**yak merchant** · 09-08-15, 11:50 PM

Originally posted by Jayvegas420

OK so another crazy random question.
I got into a conversation in Vegas with a guy who told me that at a table where BJ pays 6-5, it's silly to double down on BJ if allowed.
We actually argued about it & I tried to use an analogy along the same lines you just spoke of, at the end of your post.
I said to the guy, "When you get your 1st card there's already a slight variance in the odds."
Now the dealer reveals his hole card & there is a huge variance. (Depending on what he pulls)
Then you get your 2nd card & there are fixed odds on how this hand should result.
If I thought my odds were 49%-51% before the hand started, when I was dealt that 1st ace I liked my odds better already. Then when the dealer pulls a 6 I put myself as a big favourite. Now there's, say, a Jack.
This is the same result as being dealt a 5 & 6, if I chose to use the ace as one value.
So if doubling on 11 vs a 6 is always a +ev proposition, why not double?
I tried to use the analogy of the fixed line we get pre kick off.....and how our perception of +ev increases as 1/2 time approaches & we can bet again (at the adjusted odds the book will set).
In this case at the BJ table we can see it's half time & we are huge favourites & can press our bet or split it at twice the value at the same line as when the game started.
And we're winning.
With odds in our favour that we will actually win this more often than not.

I know that sounds crazy but the simple question is:
If the house will allow it, should you double BJ at a 6-5 table?

Huh? You want to double down on blackjack? Like A-10? First off, if you were my friend I would punch you in the face for being too lazy to find a 3-2 game, because playing 6-5 pretty much reduces your chances of winning to zero if you plan on playing more than a few minutes. But that being said, you absolutely never double soft 21, even against a 6 (the only caveat would be if you are a serious counter and the count is astronomically high).

Expected value of doubling is 0.667380, expected value of standing is 0.902837. Which means it is not even close. You should however take even money against a dealer ace in a 6-5 game if they will let you (which they probably won't).

**magyarsvensk** · 09-09-15, 12:02 AM

I actually had to look up the idea of doubling down with a Blackjack hand because I have never heard of that before. It's an interesting problem. It's been a while since I played a 6:5 Blackjack table. The reduced payout brings down the odds quite a bit, I understand.

Since I am a coder, my solution is always just to build a simulation bot to figure out which move will generate the best return. I did find a website with the probabilities though.

We're assuming that the dealer doesn't have blackjack. Otherwise, you wouldn't have the option to hit in the first place. It would be a push, so the probability of that happening doesn't factor into the relative expected value of the other two options (we only care about which of those two is better).

So option 1 is to take the blackjack. No non-blackjack hand can beat a blackjack, so 100% of the time the expected value is 6/5=+1.2

Option 2 is to double down. The possible returns are -2.0 (lose), 0.0 (push), and 2.0 (win).

Using the example of dealer has a 6 showing, here are the probabilities according to a website I found on google:

Win: 63.4%
Push: 6.7%
Lose: 29.9%

So to get the expected value of doubling down, we would calculate .634*2+.067*0+.299*-2=+0.67

So the expected value of keeping the blackjack (+1.2) is quite a bit greater than the expected value of doubling down (+0.67), and keeping the blackjack is the better play.
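The same comparison in a few lines of Python, as a quick check of the arithmetic. The probabilities are the ones quoted above for a dealer 6; the `expected_value` helper is just a hypothetical convenience function.

```python
def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Keep the blackjack: paid 6:5 on a 1-unit bet every time
# (dealer blackjack already ruled out, as assumed above).
stand_ev = expected_value([(1.0, 1.2)])

# Double down: bet is 2 units; win / push / lose vs a dealer 6.
double_ev = expected_value([(0.634, 2.0), (0.067, 0.0), (0.299, -2.0)])

print(f"keep blackjack: {stand_ev:+.2f}")   # +1.20
print(f"double down:    {double_ev:+.2f}")  # +0.67
```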

Here is the website that I am riding on the coattails of: https://www.blackjackinfo.com/double...probabilities/

**yak merchant** · 09-09-15, 12:16 AM

Yes I screwed up. EV on 6-5 blackjack is 1.2 not .902 (which is the EV if you have 21 (3 cards) and the dealer has a 6 and a chance to draw to 21 to tie you).

**Jayvegas420** · 09-09-15, 12:45 AM

Makes complete sense. I also agree about taking insurance on BJ in a 6-5 game.
They're all over the strip now. Higher limit games offer 3-2.

**yak merchant** · 09-09-15, 01:03 AM

Actually, insurance is still bad (-EV) as you have to put up more money. Even money is much different than insurance, and on 6:5 it is +EV, but not very many places offer it on 6:5 from what I understand (I don't play 6:5).
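A rough sketch of why the two differ at 6:5, under assumptions that are mine and not from the thread: single deck, you hold A + ten-value, the dealer shows an ace, and no other cards are seen, so 15 of the 49 unseen cards give the dealer blackjack. Insurance is taken as a half-unit side bet paying 2:1, and even money as a guaranteed 1 unit.

```python
from fractions import Fraction


def blackjack_vs_ace_evs(p_dealer_bj, bj_payout=Fraction(6, 5)):
    """EV (in units of the original bet) of each option when holding blackjack."""
    p, q = p_dealer_bj, 1 - p_dealer_bj
    return {
        # Dealer blackjack pushes; otherwise collect the blackjack payout.
        "decline": q * bj_payout,
        # Guaranteed 1:1, no extra money put up.
        "even_money": Fraction(1),
        # Dealer BJ: main bet pushes, side bet wins 0.5 * 2 = 1.
        # No dealer BJ: main bet pays out, side bet loses 0.5.
        "insurance": p * 1 + q * (bj_payout - Fraction(1, 2)),
    }


p = Fraction(15, 49)  # assumed single-deck probability, see lead-in
for name, ev in blackjack_vs_ace_evs(p).items():
    print(f"{name:>10}: {float(ev):+.3f}")
```

Under these assumptions insurance comes out below simply declining, while even money beats both. At a 3:2 payout the insurance line works out to exactly 1 unit, the same as even money, which is why the two only come apart on 6:5 tables.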