1. #1
    jeffjam_
    Join Date: 11-02-14
    Posts: 107
    Betpoints: 4483

    Making a model - median vs. mean

    Recently, I've built a basketball model that uses team efficiencies, and I need a little advice. When calculating a running average of a team's past efficiencies, is it better to use the median instead of the mean? My take is that almost all teams except the best produce something like 20-30% out-of-range ATS and total results (meaning an ATS result 10+ points off the line, and the same for the total) because they are highly inconsistent in their efforts. So calculating the median instead of the mean would seem to produce a more compact picture. On the other hand, using the median for the few teams that are consistent might hurt the overall numbers.
    I guess it all depends on the distribution of results, but at first sight it seems like the right approach. What do you think?
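    A minimal sketch of the running-average comparison, assuming a trailing five-game window and made-up efficiency numbers (points per 100 possessions) in which one game is a blowout outlier. The `rolling` helper is hypothetical, not from any library.

```python
from statistics import mean, median


def rolling(values, window, stat):
    """Apply `stat` to each trailing window of `values`.

    The first result covers games 1..window, so the output has
    len(values) - window + 1 entries.
    """
    return [stat(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


# Hypothetical efficiencies; game 4 (128.0) is the out-of-range result.
efficiencies = [104.0, 106.0, 103.0, 128.0, 105.0, 104.0, 107.0]

print(rolling(efficiencies, 5, mean))    # [109.2, 109.2, 109.4]
print(rolling(efficiencies, 5, median))  # [105.0, 105.0, 105.0]
```

    While the outlier sits in the window, the running mean stays about four points above the running median, which is the "more compact picture" argument in numbers.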

  2. #2
    yak merchant
    Join Date: 11-04-10
    Posts: 109
    Betpoints: 6170

    I have always had much better results with the median. That being said, I am trying to build a model that samples from the second and third quartiles, but I'm having issues. Just averaging after dropping the top and bottom X% of data points (a trimmed mean) can also have merit if you have enough data points.
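    The trimmed mean described above can be sketched as follows. This is an illustration under assumptions: `trim_fraction` is the X% dropped from each end, the game values are invented, and `trim_fraction=0.25` keeps only the second and third quartiles (the interquartile mean). SciPy users could reach for `scipy.stats.trim_mean` instead.

```python
import math
from statistics import mean


def trimmed_mean(values, trim_fraction):
    """Mean after dropping the lowest and highest `trim_fraction` of the sorted values."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim_fraction)  # number of points cut from each end
    return mean(ordered[k:len(ordered) - k])


# Hypothetical game efficiencies with one blowout on top.
games = [98, 101, 102, 103, 104, 105, 106, 130]

print(trimmed_mean(games, 0.0))   # 106.125 (plain mean)
print(trimmed_mean(games, 0.25))  # 103.5 (interquartile mean)
```

    With only eight games, a 25% trim leaves four data points, which is why having enough data matters.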

  3. #3
    KVB
    It's not what they bring...
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Agreed, you will likely do better with medians than means.

    Whichever you use, make sure you use medians when comparing the teams to each other and to the league. Using the mean only works when one team is above average and one is below... and then it often produces a result similar to using medians.

    But when comparing two teams that are both above average or both below average, using the mean can give a skewed result that doesn't make sense logically... use medians.
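    The post doesn't spell out the mechanics, so here is one minimal reading of it, offered as an assumption: rate each team as its median efficiency minus the league-wide median, and compare that with the mean-minus-mean version. Team names and numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical per-game efficiencies; Team A has one blowout, Team B one dud.
games = {
    "Team A": [108, 112, 110, 131, 109],
    "Team B": [104, 106, 103, 105, 90],
    "Team C": [100, 99, 102, 101, 98],
}

all_games = [e for effs in games.values() for e in effs]
league_median = median(all_games)
league_mean = mean(all_games)

for team, effs in games.items():
    by_median = median(effs) - league_median
    by_mean = mean(effs) - league_mean
    print(f"{team}: {by_median:+} vs league (median), {by_mean:+.1f} vs league (mean)")
```

    Team B's single 90 drags its mean-based rating several points below average, while its median-based rating stays at league level.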


  4. #4
    akphidelt
    Join Date: 07-24-11
    Posts: 1,228
    Betpoints: 640

    Throw a little standard deviation in there; it is pivotal for understanding variance. Whether you use the mean or the median, on its own it won't tell much of the story. If a team has a higher deviation, you can use a different model to try to predict its efficiency.

  5. #5
    jeffjam_
    Join Date: 11-02-14
    Posts: 107
    Betpoints: 4483

    Well, if results are very inconsistent, and the standard deviation is therefore high, I think it would be very difficult, nearly impossible, to predict anything. Anyway, when calculating the league average I think it is easier to use the mean, because the median doesn't really change the result given the large dataset.
    I also plan to use regression to compare the results with the first method, but it isn't that easy.

  6. #6
    akphidelt
    Join Date: 07-24-11
    Posts: 1,228
    Betpoints: 640

    Quote Originally Posted by jeffjam_
    Well, if results are very inconsistent, and the standard deviation is therefore high, I think it would be very difficult, nearly impossible, to predict anything. Anyway, when calculating the league average I think it is easier to use the mean, because the median doesn't really change the result given the large dataset.
    I also plan to use regression to compare the results with the first method, but it isn't that easy.

    That's statistics. If there are large variances, it's nearly impossible to predict anything regardless of what calculation you use. Using the mean or median does not change the inconsistency of the data. That's why I was thinking of finding the variance and, if it is too high, coming up with a different model for that team.
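    A minimal sketch of that idea: compute each team's game-to-game standard deviation and flag the teams well above the typical team. The 1.5x-the-median-team cutoff and the data are assumptions for illustration, not a recommended threshold; choosing the cutoff is the open question.

```python
from statistics import median, stdev


def high_variance_teams(games, ratio=1.5):
    """Teams whose sample standard deviation exceeds `ratio` times the
    median team's standard deviation (the 1.5 default is an arbitrary choice)."""
    spreads = {team: stdev(effs) for team, effs in games.items()}
    typical = median(spreads.values())
    return sorted(team for team, s in spreads.items() if s > ratio * typical)


# Hypothetical efficiencies: all three teams average 100.
games = {
    "Team A": [98, 100, 102, 99, 101],
    "Team B": [80, 100, 120, 95, 105],
    "Team C": [97, 103, 100, 96, 104],
}

print(high_variance_teams(games))  # ['Team B']
```

    Teams on the returned list would be the candidates for a separate model.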

  7. #7
    antonyp22
    Join Date: 01-12-14
    Posts: 78
    Betpoints: 2528

    Do some analysis on past team-efficiency data and try to find whether a specific distribution can be fitted to it. Then use the median/mean as the center of that distribution, which can serve as the basis for a simulation... something to get you thinking, anyway.
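    One way to turn that suggestion into code, sketched under stated assumptions: each team's efficiency is treated as normally distributed with the mean and standard deviation of its past games (for a normal distribution the mean and median coincide), defense is ignored, and both teams play the same number of possessions. The team data and the `simulate_win_prob` name are hypothetical.

```python
import random
from statistics import mean, stdev


def simulate_win_prob(team_a, team_b, possessions=100, n_sims=20_000, seed=1):
    """Monte Carlo estimate of P(team A outscores team B) under a normal model."""
    rng = random.Random(seed)  # seeded so results are reproducible
    mu_a, sd_a = mean(team_a), stdev(team_a)
    mu_b, sd_b = mean(team_b), stdev(team_b)
    wins = 0
    for _ in range(n_sims):
        # Efficiency is points per 100 possessions, so scale to the game's possessions.
        pts_a = rng.gauss(mu_a, sd_a) * possessions / 100
        pts_b = rng.gauss(mu_b, sd_b) * possessions / 100
        wins += pts_a > pts_b
    return wins / n_sims


team_a = [110, 106, 112, 108, 104]  # hypothetical, mean 108
team_b = [100, 104, 98, 102, 96]    # hypothetical, mean 100
print(simulate_win_prob(team_a, team_b))
```

    If the data fit a different distribution better, only the two `rng.gauss` calls need to change.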

  8. #8
    ball stopper
    Join Date: 08-26-15
    Posts: 1

    Quote Originally Posted by akphidelt
    That's statistics. If there are large variances, it's nearly impossible to predict anything regardless of what calculation you use. Using the mean or median does not change the inconsistency of the data. That's why I was thinking of finding the variance and, if it is too high, coming up with a different model for that team.

    I imagine modelling with means and variance is superior to using just medians... and medians are superior to using only means.

    But how do you project variance? Is past variance predictive of future variance? My guess for basketball is that pace, 3-point attempts, and turnovers are the three variables that determine variance. Any thoughts on this?
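    One common way to check whether past variance predicts future variance is a split-half test: compute each team's standard deviation in the first and second halves of a season and correlate the two across teams. The sketch below runs it on simulated seasons in which each team's true spread is fixed (an assumption that builds stability in), so it only demonstrates the mechanics; real game logs could give a very different answer.

```python
import random
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_stability(season):
    """Correlate each team's first-half and second-half standard deviations."""
    first = [stdev(games[: len(games) // 2]) for games in season.values()]
    second = [stdev(games[len(games) // 2 :]) for games in season.values()]
    return pearson(first, second)


# Simulated seasons: 40 games per team, true spreads from 3 to 15 points.
rng = random.Random(7)
season = {
    f"Team {i}": [rng.gauss(100, sd) for _ in range(40)]
    for i, sd in enumerate([3, 6, 9, 12, 15])
}
print(round(split_half_stability(season), 2))
```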

  9. #9
    akphidelt
    Join Date: 07-24-11
    Posts: 1,228
    Betpoints: 640

    Quote Originally Posted by ball stopper
    I imagine modelling with means and variance is superior to using just medians... and medians are superior to using only means.

    But how do you project variance? Is past variance predictive of future variance? My guess for basketball is that pace, 3-point attempts, and turnovers are the three variables that determine variance. Any thoughts on this?

    You don't project future variance. If you could, you wouldn't need statistics. You calculate the variance within your dataset to project the probability of future data points. For example, a team that scores 80, 100, and 120 points has the same mean and median as a team that scores 98, 100, and 102 points, yet the variances aren't even close. So you have to figure out a model that turns the 80, 100, 120 points into something more correlated, whether that means weighting for defenses, weighting for starting lineups, etc. There is simply nothing you can do mathematically with that data in its current state that gives you a way of predicting future outcomes with any significant probability.

    To have a serious model, you need a calculation that decreases the variance to the point where the probability is significant enough to have confidence in the result. Using the mean or median has no effect on the variance and does not change the probability of your predicted result. That's why you need statistics to have any serious model.
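    The 80/100/120 versus 98/100/102 example is easy to check. The second loop goes one step further, as an assumption on top of the post: if each team's scoring were roughly normal with those sample statistics, the chance that its next game lands within five points of 100 would differ enormously.

```python
from statistics import NormalDist, mean, median, stdev

teams = {"volatile": [80, 100, 120], "steady": [98, 100, 102]}

for name, scores in teams.items():
    print(f"{name}: mean={mean(scores)}, median={median(scores)}, stdev={stdev(scores)}")
# volatile: mean=100, median=100, stdev=20.0
# steady: mean=100, median=100, stdev=2.0

# Assumed normal model: P(95 <= next score <= 105) for each team.
for name, scores in teams.items():
    dist = NormalDist(mean(scores), stdev(scores))
    print(f"{name}: {dist.cdf(105) - dist.cdf(95):.2f}")
```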

  10. #10
    peacebyinches
    pull the trigger
    Join Date: 02-13-10
    Posts: 1,108
    Betpoints: 7790

    For our purposes, I agree with the majority here and think using the median is best (for most measures, unless you're just using total points scored, for example).

    On a side note, when it comes to variance (and this is something I've contemplated for a while), I feel it is an underrated measure that could really give bettors an edge, especially when it comes to moneyline bets.
    For example, say you figure out that a certain team is essentially schizophrenic in its performance: it loses to craptacular teams again and again, but wins once in a while (more than it should, keep that in mind) when it is a large underdog. Once you've determined this large variance (relative to comparable teams, or even the rest of the league), isn't this team going to be a +EV wager whenever it is not favored to win?
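    The intuition can be checked with a small sketch, assuming the final margin is normally distributed around the expected margin (a common simplification, not something stated in the thread). Holding a 6-point underdog's expected margin fixed, a wider spread raises its chance of winning outright; whether that makes the bet +EV still depends on the price, so the break-even probability for American moneyline odds is included. All numbers are hypothetical.

```python
from statistics import NormalDist


def win_prob(expected_margin, margin_sd):
    """P(final margin > 0) if the margin is Normal(expected_margin, margin_sd)."""
    return 1 - NormalDist(expected_margin, margin_sd).cdf(0)


def break_even_prob(american_odds):
    """Win probability needed for a moneyline bet at these odds to break even."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


price = 250  # hypothetical +250 moneyline on the underdog
for sd in (8, 12, 16):
    p = win_prob(-6, sd)
    print(f"sd={sd}: win prob {p:.3f}, +EV at +{price}: {p > break_even_prob(price)}")
```

    In this made-up case the same underdog is a losing bet at a margin spread of 8 points and a winning one at 12 or 16, which is the effect described above.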

  11. #11
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    When I used to try to scale things, I would use median-based methods rather than mean-based methods. You should be able to just collect all the data, look at a distribution graph, and see from its shape whether the median will work better.

    I think the most obvious example in baseball is when a game gets out of control in the first few innings and then you see a parade of loser relievers come in and put the winning team on easy street. In college football, I think it's a given that when the first-quarter score is high, the refs tend to call more pass interference (PI) penalties.

    If you are talking basketball, though, I think there is a lot of consistency in total scores. The biggest effect on efficiency is going to come from free throws, and yes, those tend to come in clumps. If a team gets into the penalty in the first few minutes of play, you are guaranteed a very efficient output from then on. But then the game also slows down quite a bit as teams try to draw fouls instead of making baskets, so I'm not sure that efficiency is the way to go.

  12. #12
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    I should mention that I have since given up trying to model anything. I've switched to a straight up traits-based program.

  13. #13
    jeffjam_
    Join Date: 11-02-14
    Posts: 107
    Betpoints: 4483

    So basically, your goal is to find teams that don't have high variance in whatever data you use to build the system.
    I, for example, use efficiency as the basic predictive element, so the question is how large the standard deviation has to be before you call it high variance.

  14. #14
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    My algorithm has two basic goals: minimize assumptions and maximize statistical significance. It's closer to data mining than modeling.

    You've got data, and you need to turn that data into predictions. In my trials and tribulations, I've come to the conclusion that the less stuff you do to that data to turn it into predictions, the better it's gonna work. For example, you are using efficiency as a predictive element, which might be a great way to do it, but what if it isn't? What if there is a better way right under your nose? You would never know, because you are locked into the efficiency thing. What are the chances that one statistic is going to be the secret to consistent positive returns?

    So it's a bit of a Catch-22 (or maybe more like Heisenberg uncertainty). The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

    Hope this helps. This was just my over-the-summer thinking on reworking my algorithm. Haven't had direct success with it yet, but I feel good about the test results.
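    The Catch-22 is what statisticians call the multiple-comparisons problem. A quick sketch of why, assuming independent tests at a 5% significance level: the chance that at least one useless idea looks significant grows fast with the number of ideas tried. The Bonferroni correction shown is one standard (and conservative) fix, swapped in here for illustration rather than taken from the post.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one of `num_tests` independent tests of useless ideas is 'significant')."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


for n in (1, 5, 20, 100):
    print(n, round(chance_of_false_positive(n), 3), bonferroni_alpha(n))
```

    Trying 20 independent ideas gives roughly a 64% chance that at least one of them looks significant by luck alone.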

  15. #15
    HeeeHAWWWW
    Join Date: 06-13-08
    Posts: 5,487
    Betpoints: 578

    Quote Originally Posted by magyarsvensk
    You've got data, and you need to turn that data into predictions. In my trials and tribulations, I've come to the conclusion that the less stuff you do to that data to turn it into predictions, the better it's gonna work.

    That's a fairly standard modelling dilemma indeed. There are plenty of measures you can use to penalise additional complexity, though, for example the Akaike or Schwarz information criteria.
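    For a least-squares model with roughly normal errors, both criteria can be computed from the residual sum of squares (RSS), the number of games n, and the number of fitted parameters k, dropping constants that cancel when models are compared on the same data: AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n). Lower is better. The two models below are hypothetical.

```python
import math


def aic(rss, n, k):
    """Akaike information criterion for a Gaussian least-squares fit (constants dropped)."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """Schwarz (Bayesian) information criterion, same conventions as aic()."""
    return n * math.log(rss / n) + k * math.log(n)


n = 100  # hypothetical number of games
simple = {"rss": 1000.0, "k": 2}   # hypothetical: efficiency only
complex_ = {"rss": 950.0, "k": 6}  # hypothetical: four extra inputs, slightly better fit

for name, m in (("simple", simple), ("complex", complex_)):
    print(name, round(aic(m["rss"], n, m["k"]), 1), round(bic(m["rss"], n, m["k"]), 1))
```

    Here the four extra inputs trim the RSS by 5%, which is not enough to pay for their complexity under either criterion; BIC penalizes them more heavily because ln(100) > 2.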

  16. #16
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    Quote Originally Posted by HeeeHAWWWW
    That's a fairly standard modelling dilemma indeed. There are plenty of measures you can use to penalise additional complexity, though, for example the Akaike or Schwarz information criteria.

    This sounds interesting. I'm going to read up on it. Thanks.

  17. #17
    Jayvegas420
    Vegas Baby!
    Join Date: 03-09-11
    Posts: 28,142
    Betpoints: 14967

    Quote Originally Posted by magyarsvensk
    The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

    That's really interesting.
    And really well said.

    Question.

    Is all this data supposed to be used to predict the results of the events, or could it be better used to predict the openers, line movements, public perception, and beating the closing line?

  18. #18
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    Quote Originally Posted by Jayvegas420
    The more stuff you try out, the harder it is to know whether you have found something significant. The less stuff you try, the easier it is to know whether you have found something significant, but the less likely it will be significant.

    That's really interesting.
    And really well said.

    Question.

    Is all this data supposed to be used to predict the results of the events, or could it be better used to predict the openers, line movements, public perception, and beating the closing line?

    Thanks.

    I use it to (try to) beat the closing lines, but I imagine it could be used to predict anything that is not a random event.

    On the question of beating the closing line versus predicting the result of the event, I think the lines themselves contain a great deal of information in them, and that it would be unwise to ignore that information. The book's job is to predict the outcome of an event. They put a lot of effort into doing that. When you use the line in your predictions, it's like saying, "okay, books, this line represents all of the research and computing power that you have put into this game, now let's see if I can find a way to find faults in these lines using additional data and techniques that you may not have considered." So once you consider the line as an essential tool in predicting the outcome of the game, you are essentially making a prediction to beat the line regardless of how you express it.

    The book has the advantage of setting and controlling the rules (the line, the vig, etc.). The capper's principal advantage is that the book must lay its cards on the table first. Not using the book's line to make your predictions would be like not using the blackjack dealer's upcard to make your decisions.

  19. #19
    Jayvegas420
    Vegas Baby!
    Join Date: 03-09-11
    Posts: 28,142
    Betpoints: 14967

    OK so another crazy random question.
    I got into a conversation in Vegas with a guy who told me that at a table where BJ pays 6-5, it's silly to double down on a BJ if allowed.
    We actually argued about it, and I tried to use an analogy along the same lines as what you said at the end of your post.
    I said to the guy, "When you get your 1st card, there's already a slight variance in the odds."
    Now the dealer reveals his upcard, and there's a huge variance (depending on what he pulls).
    Then you get your 2nd card, and there are fixed odds on how this hand should turn out.
    If I thought my odds were 49%-51% before the hand started, when I was dealt that 1st ace I already liked my odds better. Then when the dealer pulls a 6, I see myself as a big favourite. Now my 2nd card is, say, a Jack.
    This is the same result as being dealt a 5 & 6, if I choose to count the ace as one.
    So if doubling on 11 vs. a 6 is always a +EV proposition, why not double?
    I tried to use the analogy of the fixed line we get pre-kickoff... and how our perception of +EV increases as halftime approaches and we can bet again (at the adjusted odds the book will set).
    In this case, at the BJ table we can see it's halftime, we are huge favourites, and we can press our bet to twice the value at the same line as when the game started.
    And we're winning,
    with odds in our favour that we will actually win this more often than not.

    I know that sounds crazy but the simple question is:
    If the house will allow it, should you double BJ at a 6-5 table?

  20. #20
    yak merchant
    Join Date: 11-04-10
    Posts: 109
    Betpoints: 6170

    Quote Originally Posted by Jayvegas420
    OK so another crazy random question.
    I got into a conversation in Vegas with a guy who told me that at a table where BJ pays 6-5, it's silly to double down on a BJ if allowed.
    We actually argued about it, and I tried to use an analogy along the same lines as what you said at the end of your post.
    I said to the guy, "When you get your 1st card, there's already a slight variance in the odds."
    Now the dealer reveals his upcard, and there's a huge variance (depending on what he pulls).
    Then you get your 2nd card, and there are fixed odds on how this hand should turn out.
    If I thought my odds were 49%-51% before the hand started, when I was dealt that 1st ace I already liked my odds better. Then when the dealer pulls a 6, I see myself as a big favourite. Now my 2nd card is, say, a Jack.
    This is the same result as being dealt a 5 & 6, if I choose to count the ace as one.
    So if doubling on 11 vs. a 6 is always a +EV proposition, why not double?
    I tried to use the analogy of the fixed line we get pre-kickoff... and how our perception of +EV increases as halftime approaches and we can bet again (at the adjusted odds the book will set).
    In this case, at the BJ table we can see it's halftime, we are huge favourites, and we can press our bet to twice the value at the same line as when the game started.
    And we're winning,
    with odds in our favour that we will actually win this more often than not.

    I know that sounds crazy but the simple question is:
    If the house will allow it, should you double BJ at a 6-5 table?

    Huh? You want to double down on blackjack? Like A-10? First off, if you were my friend I would punch you in the face for being too lazy to find a 3-2 game, because playing 6-5 pretty much reduces your chances of winning to zero if you plan on playing more than a few minutes. That being said, you absolutely never double a soft 21, even against a 6 (the only caveat would be if you are a serious counter and the count is astronomically high).

    The expected value of doubling is 0.667380 and the expected value of standing is 0.902837, which means it is not even close. You should, however, take even money against a dealer ace in a 6-5 game if they will let you (which they probably won't).

  21. #21
    magyarsvensk
    Join Date: 07-25-14
    Posts: 193
    Betpoints: 378

    I actually had to look up the idea of doubling down with a Blackjack hand because I have never heard of that before. It's an interesting problem. It's been a while since I played a 6:5 Blackjack table. The reduced payout brings down the odds quite a bit, I understand.

    Since I am a coder, my solution is always just to build a simulation bot to figure out which move will generate the best return. I did find a website with the probabilities though.

    We're assuming that the dealer doesn't have blackjack. Otherwise, you wouldn't have the option to hit in the first place. It would be a push, so the probability of that happening doesn't factor into the relative expected value of the other two options (we only care about which of those two is better).

    So option 1 is to keep the blackjack. No non-blackjack hand can beat a blackjack, so it pays 6/5 = +1.2 100% of the time, and that is its expected value.

    Option 2 is to double down. The possible returns are -2.0 (lose), 0.0 (push), and 2.0 (win).

    Using the example of dealer has a 6 showing, here are the probabilities according to a website I found on google:

    Win: 63.4%
    Push: 6.7%
    Lose: 29.9%

    So to get the expected value of doubling down, we calculate 0.634*2 + 0.067*0 + 0.299*(-2) = +0.67.

    Since the expected value of keeping the blackjack (+1.2) is quite a bit greater than the expected value of doubling down (+0.67), you should keep the blackjack rather than double.

    Here is the website that I am riding on the coattails of: https://www.blackjackinfo.com/double...probabilities/
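    The arithmetic above as a short sketch. The win/push/lose probabilities are the ones quoted from the website for a dealer 6, and the 6:5 payout assumes the dealer has already checked for blackjack; payouts are in units of the original bet.

```python
def expected_value(outcomes):
    """Sum of payout * probability over (payout, probability) pairs."""
    return sum(payout * prob for payout, prob in outcomes)


keep_blackjack = expected_value([(1.2, 1.0)])  # 6:5 payout, always wins
double_down = expected_value([(2.0, 0.634), (0.0, 0.067), (-2.0, 0.299)])

print(f"keep: {keep_blackjack:+.2f}, double: {double_down:+.2f}")
# keep: +1.20, double: +0.67
```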

  22. #22
    yak merchant
    Join Date: 11-04-10
    Posts: 109
    Betpoints: 6170

    Yes, I screwed up. The EV of a 6-5 blackjack is 1.2, not .902 (which is the EV if you have a 3-card 21 and the dealer has a 6 and a chance to draw to 21 to tie you).

  23. #23
    Jayvegas420
    Vegas Baby!
    Join Date: 03-09-11
    Posts: 28,142
    Betpoints: 14967

    Makes complete sense. I also agree about taking insurance on BJ in a 6-5 game.
    They're all over the Strip now. Higher-limit games offer 3-2.

  24. #24
    yak merchant
    Join Date: 11-04-10
    Posts: 109
    Betpoints: 6170

    Actually, insurance is still bad (-EV), as you have to put up more money. Even money is much different from insurance, and at 6:5 it is +EV, but not many places offer it at 6:5 from what I understand (I don't play 6:5).
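    A sketch comparing the three choices with a blackjack against a dealer ace at a 6:5 table, assuming an infinite deck (so the dealer's hole card is a ten-value card with probability 4/13) and a standard insurance bet of half the original stake paying 2:1. Results are in units of the original bet.

```python
from fractions import Fraction

P_DEALER_BJ = Fraction(4, 13)  # infinite-deck approximation: 16 ten-value cards per 52
BJ_PAYOUT = Fraction(6, 5)


def ev_decline():
    """No insurance, no even money: push if the dealer has blackjack, else win 6:5."""
    return (1 - P_DEALER_BJ) * BJ_PAYOUT


def ev_even_money():
    """Take a guaranteed 1:1 payout immediately."""
    return Fraction(1)


def ev_insurance():
    """Insure for half the stake; insurance pays 2:1 if the dealer has blackjack."""
    stake = Fraction(1, 2)
    if_dealer_bj = 2 * stake       # main bet pushes (0), insurance wins 2 * 0.5 = 1
    if_no_bj = BJ_PAYOUT - stake   # main bet wins 1.2, insurance loses 0.5
    return P_DEALER_BJ * if_dealer_bj + (1 - P_DEALER_BJ) * if_no_bj


for name, ev in (("even money", ev_even_money()), ("decline", ev_decline()), ("insurance", ev_insurance())):
    print(f"{name}: {float(ev):.3f}")
# even money: 1.000
# decline: 0.831
# insurance: 0.792
```

    Under these assumptions even money beats declining, and insuring is worse than simply declining. At a 3:2 table an insured blackjack pays exactly 1 either way, which is why insurance and even money are equivalent there but not at 6:5.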
