bet sizing with an edge

  • nomadcrypto
    SBR Rookie
    • 12-02-18
    • 5

    #1
    bet sizing with an edge
    Hi, this is my first post here. First I'd like to say I wasn't a sports bettor until a few days ago, when I started live testing a model I developed for NBA spreads. I'm primarily a software developer and have been mainly focused on machine learning and statistical analysis. I was browsing Kaggle (a community of data scientists and machine learners) and found a rather messy NBA dataset which included moneylines and spreads for some 15k games. I cleaned the dataset up and scraped some more info, including player/team advanced stats as well as totals lines. After working on it for a while I developed a binary classifier that works really well in back testing. The output from the classifier is a list of probabilities, for example [0.3, 0.7]. The prediction is the index of the largest number (1 in the example case). The output probabilities aren't very well calibrated: over the validation data (the last 2500 games) the model's Brier score was 0.305, while Pinnacle's Brier score over that same sample (using the implied probabilities) was 0.251. Still, the higher the output probability, the higher the accuracy rate, just not at a 1:1 ratio. The back test results were pretty promising given the difficulty of the task. The accuracy rate over the 2500 games that I held out for validation/back testing was ~64%. Here is the "classification report" (sklearn) for the validation data:


    Code:
    validation report:
                  precision    recall  f1-score   support   
    
    
               0       0.64      0.67      0.65      1281   
               1       0.63      0.61      0.62      1219   
    
    
       micro avg       0.64      0.64      0.64      2500   
       macro avg       0.64      0.64      0.64      2500   
    weighted avg       0.64      0.64      0.64      2500
    I binned the output probabilities into 3 bins and then evaluated the accuracy rate of each bin on the validation data. I named them bronze, silver and gold:

    Code:
    
    total 2500 games in the back test/validation data
    bronze rating is 55.87% accurate over 1235.0 sample 
    silver rating is 66.20% accurate over 722.0 sample 
    gold rating is 78.82% accurate over 543.0 sample
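The per-bin evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: the cut points (0.60 and 0.70), the function names and the toy inputs are all assumptions, since the thread does not say where the bin boundaries were drawn.

```python
from collections import defaultdict


def bin_label(confidence, silver_cut=0.60, gold_cut=0.70):
    """Map the winning-class probability to a bin (cut points are hypothetical)."""
    if confidence >= gold_cut:
        return "gold"
    if confidence >= silver_cut:
        return "silver"
    return "bronze"


def accuracy_by_bin(probs, outcomes):
    """probs: list of [p0, p1] pairs; outcomes: list of true labels (0 or 1).

    Returns {bin: (accuracy, sample_size)} for every bin that received a pick.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for (p0, p1), actual in zip(probs, outcomes):
        predicted = 1 if p1 > p0 else 0      # index of the largest probability
        label = bin_label(max(p0, p1))
        totals[label] += 1
        hits[label] += int(predicted == actual)
    return {label: (hits[label] / totals[label], totals[label]) for label in totals}


# Toy inputs, made up purely to show the call shape.
toy_probs = [[0.30, 0.70], [0.45, 0.55], [0.62, 0.38], [0.20, 0.80], [0.52, 0.48]]
toy_outcomes = [1, 0, 0, 1, 1]
report = accuracy_by_bin(toy_probs, toy_outcomes)
for label, (accuracy, size) in sorted(report.items()):
    print(f"{label} rating is {accuracy:.2%} accurate over {size} sample")
```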

    Now I'm not really sure what the optimal bet sizes to use for each class/bin would be. I've read up on the Kelly criterion and found a wealth of academic papers on the subject, but from what I've read that is most likely not the best route. For example, full Kelly would say to bet roughly 50% of the bankroll on a pick with 78% accuracy given the standard 52.38% implied probability (-110), but intuitively that seems like a bad idea.
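The Kelly arithmetic behind that -110 example can be checked with a short sketch. It assumes a simple win/lose bet with no pushes and takes the back-test bin accuracies at face value as win probabilities; the function names are illustrative, not from any library.

```python
def american_to_net_odds(american):
    """Net profit per 1 unit staked, e.g. -110 -> 0.909..., +150 -> 1.5."""
    return 100.0 / -american if american < 0 else american / 100.0


def implied_probability(american):
    """Break-even win probability at the given American odds."""
    return 1.0 / (1.0 + american_to_net_odds(american))


def kelly_fraction(p, american):
    """Full-Kelly stake as a fraction of bankroll: (b*p - q) / b, floored at 0."""
    b = american_to_net_odds(american)
    return max(0.0, (b * p - (1.0 - p)) / b)


print(f"break-even at -110: {implied_probability(-110):.2%}")
# Bin accuracies from the back test, treated as win probabilities (an assumption).
for name, p in (("bronze", 0.5587), ("silver", 0.6620), ("gold", 0.7882)):
    print(f"{name}: full Kelly = {kelly_fraction(p, -110):.1%} of bankroll")
print(f"78% pick at -110: full Kelly = {kelly_fraction(0.78, -110):.1%}")
```

With these inputs the 78% case comes out near 54% of the bankroll, which is the "roughly 50%" figure quoted above.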


    I've thought about using a unit-based system where I base a unit on some fixed percentage of the bankroll (typical sports bettors range between 1% and 5%) and then bet more or less based on the accuracy rate of the output class. For example, 1 unit on bronze, 2.5 on silver and 5 on gold. But betting upwards of 25% of the bankroll on any one match seems excessive to me, and 5% on an almost 79% chance to win seems overly cautious. Basically I'm trying to minimize risk/volatility while maximizing gain.


    Hopefully some pros can shed some light on how they would structure their bet sizes given the back testing results
    Last edited by nomadcrypto; 12-03-18, 08:43 AM. Reason: typo
  • Dan Kelly
    SBR MVP
    • 02-19-11
    • 1332

    #2
    2 Things =

    1. The linemakers are very sharp, their work is constantly evolving so your work may have had an edge yesterday and none today.

    2. Make sure you set up multiple accounts and shop for the best lines.



    As for bet size, the smaller the better to start with - My guess is that .5%, 1%, & 1.5% for your bins should be max.

    BEST OF LUCK
    • HeeeHAWWWW
      SBR Hall of Famer
      • 06-13-08
      • 5487

      #3
      Specific answer: if your estimates are exaggerating your edge, Kelly will hurt you. Go beyond full-Kelly and you're already into counterproductive territory, go beyond twice and you'll actually lose money longterm regardless of your edge. One approach is to adjust your edge back to something realistic.

      General answer: as a new sports bettor, you probably shouldn't use Kelly. NBA is quite efficient, and if you're predicting 78% when the market thinks 50% ..... you've made an error somewhere :-) On that scale, it's highly likely to be leakage.


      I'd suggest working on the models for quite a while, then when you start, stake small - say 1% of your bankroll. Get yourself some practical experience for at least a year.
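The overbetting point above (beyond full Kelly is counterproductive, beyond twice Kelly loses money long term) can be illustrated with the expected log growth per bet. This is a sketch under stated assumptions: a hypothetical bettor with a true 55% win rate at even money, and helper names chosen for illustration.

```python
import math


def kelly_fraction(p, b):
    """Full-Kelly stake as a fraction of bankroll (b = net odds per unit staked)."""
    return (b * p - (1.0 - p)) / b


def log_growth(p, b, f):
    """Expected log growth per bet when staking fraction f of the bankroll."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical bettor: true win rate 55% at even money, so full Kelly is 10%.
P_TRUE, NET_ODDS = 0.55, 1.0
full = kelly_fraction(P_TRUE, NET_ODDS)

for multiple in (0.5, 1.0, 2.0, 3.0):
    g = log_growth(P_TRUE, NET_ODDS, multiple * full)
    print(f"{multiple:.1f}x Kelly: stake {multiple * full:.0%}, growth per bet {g:+.5f}")
```

Growth peaks at 1x, falls to roughly zero at 2x and turns clearly negative beyond that, even though the edge itself is real. Overestimating the edge has the same effect as picking a multiple above 1.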
      • nomadcrypto
        SBR Rookie
        • 12-02-18
        • 5

        #4
        I am pretty confident that the results are not overfitting. Here is the full report including validation data classification report, backtest results and classification reports for the test/train data. While I don't claim to have any sports betting experience I do, however, have a lot of experience in machine learning and with tackling hard problems with ML models. Out of the dataset I held back 2500x games(roughly the past 2 years) and the results were virtually identical to the cross validation results which is why I'm confident enough to live test it.

        Originally posted by Dan Kelly
        2 Things =

        1. The linemakers are very sharp, their work is constantly evolving so your work may have had an edge yesterday and none today.

        2. Make sure you set up multiple accounts and shop for the best lines.



        As for bet size, the smaller the better to start with - My guess is that .5%, 1%, & 1.5% for your bins should be max.

        BEST OF LUCK
        Those three pieces of advice make total sense. The main reason I think that I would continue to have an edge over a traditional book is that we have different motivations. My model is trying to predict if the home team will cover the spread and their models are trying to place odds in such a way that they get close to 50/50 bets on both sides(fairly big assumption on my part I admit). Overall I suspect you're absolutely correct about the markets evolving over time. Which is why I plan to retrain the model on a fairly regular basis.

        Originally posted by HeeeHAWWWW
        Specific answer: if your estimates are exaggerating your edge, Kelly will hurt you. Go beyond full-Kelly and you're already into counterproductive territory, go beyond twice and you'll actually lose money longterm regardless of your edge. One approach is to adjust your edge back to something realistic.

        General answer: as a new sports bettor, you probably shouldn't use Kelly. NBA is quite efficient, and if you're predicting 78% when the market thinks 50% ..... you've made an error somewhere :-) On that scale, it's highly likely to be leakage.


        I'd suggest working on the models for quite a while, then when you start, stake small - say 1% of your bankroll. Get yourself some practical experience for at least a year.
        These are also great points. When I show 78% I mean that in the 2500 games I held out for validation/back testing, the "gold" bin had a 78% accuracy rate. Since it is ~550 games, which is a relatively small sample size, variance could account for some of that accuracy for sure.

        When I initially trained the model and saw an overall 64% average accuracy rate I was suspicious and took a lot of steps to make sure I hadn't overfit the model or that I was introducing a look-ahead bias.
        • Alfa1234
          SBR MVP
          • 12-19-15
          • 2722

          #5
          Have you applied the model to the games being played each night this week? That % of accuracy will establish very quickly if you are right or wrong.
          • nomadcrypto
            SBR Rookie
            • 12-02-18
            • 5

            #6
            I've been live testing the model since the 29th of last month (24 picks in total) and the overall accuracy has been ~50% so far. I posted all predictions at pastebin.com under my account and will continue to do so until I have 300-500 picks. So far 2 days have been profitable and 2 days have been losses. I need to have a few hundred live testing results before I can be confident either way.
            • Alfa1234
              SBR MVP
              • 12-19-15
              • 2722

              #7
              As you already know yourself, your accuracy of 50% does not bode well for the future, even though the sample size is still tiny. After 100 picks (a few more days) you should start seeing some kind of a trend unless you've been extremely unlucky (considering the fact that due to variance, 90% of all samples that size should hit anywhere between 51 and 77% if you really had a 64% win rate).

              Not to put you down, but I'd wait a while before worrying about bet size...
              • nomadcrypto
                SBR Rookie
                • 12-02-18
                • 5

                #8
                Originally posted by Alfa1234
                As you already know yourself, your accuracy of 50% does not bode well for the future, even though the sample size is still tiny. After 100 picks (a few more days) you should start seeing some kind of a trend unless you've been extremely unlucky (considering the fact that due to variance, 90% of all samples that size should hit anywhere between 51 and 77% if you really had a 64% win rate).

                Not to put you down, but I'd wait a while before worrying about bet size...
                how did you come up with those numbers? afaik the confidence interval formula is const * sqrt( (error * (1 - error)) / n) where error is 1-accuracy. It is pretty well established that you need a minimum sample of 30 to estimate the confidence interval. Assuming I'm correct on the formula I could expect a confidence interval of

                1.96 * sqrt((0.36*0.64)/30)=0.1717
                upper accuracy = 1-(0.36-0.1717)=81.17%
                lower accuracy = 1-(0.36+0.1717)=46.82%

                50% over the 24x predictions is well within the 95% confidence interval and over a 100x sample I could expect a lower bound of 54.59% and an upper bound of 73.4%. This is also not taking into consideration that in back test the accuracy rate of the bronze bin was 55.87 and accounted for 49.4% of the games, silver was 66% accurate and accounted for 28.8% of the games, and gold was 78.82% accurate and accounted for 21.72% of the games.
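The interval arithmetic above can be reproduced with a few lines. This is a sketch of the normal-approximation formula quoted in the post, plus an exact binomial tail as a cross-check. Reading "~50% over 24 picks" as exactly 12 wins out of 24 is an assumption, and the function names are illustrative.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation interval for an observed accuracy over n picks."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


def prob_at_most(wins, n, p):
    """Exact binomial probability of seeing `wins` or fewer hits in n picks."""
    return sum(math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(wins + 1))


for n in (30, 100):
    low, high = accuracy_interval(0.64, n)
    print(f"n={n}: {low:.2%} to {high:.2%}")

# Assumption: "~50% over 24 picks" is read here as 12 wins out of 24.
tail = prob_at_most(12, 24, 0.64)
print(f"P(12 or fewer wins in 24 at a true 64%): {tail:.3f}")
```

The normal approximation gets rough for samples this small, which is why the exact tail is included alongside it.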
                Last edited by nomadcrypto; 12-03-18, 11:55 AM. Reason: typos
                • Alfa1234
                  SBR MVP
                  • 12-19-15
                  • 2722

                  #9
                  Originally posted by nomadcrypto
                  how did you come up with those numbers? afaik the confidence interval formula is const * sqrt( (error * (1 - error)) / n) where error is 1-accuracy. It is pretty well established that you need a minimum sample of 30 to estimate the confidence interval. Assuming I'm correct on the formula I could expect a confidence interval of

                  1.96 * sqrt((0.36*0.64)/30)=0.1717
                  upper accuracy = 1-(0.36-0.1717)=81.17%
                  lower accuracy = 1-(0.36+0.1717)=46.82%

                  50% over the 24x predictions is well within the 95% confidence interval and over a 100x sample I could expect a lower bound of 54.59% and an upper bound of 73.4%. This is also not taking into consideration that in back test the accuracy rate of the bronze bin was 55.87 and accounted for 49.4% of the games, silver was 66% accurate and accounted for 28.8% of the games, and gold was 78.82% accurate and accounted for 21.72% of the games.
                  You are correct with those numbers.

                  I just want to say a 64% accuracy is unheard of and extremely unlikely over the long term. I have very serious doubts the market is that incorrect and inefficient for you to be able to find that big an edge in a large market like the NBA. Those first 24 games are indeed a tiny sample, but it would have given me reason to doubt my assessment if you had indeed not been around 50% accurate in those games but closer to 64%... Time will tell, but I urge you not to get your hopes up yet.
                  • danshan11
                    SBR MVP
                    • 07-08-17
                    • 4101

                    #10
                    The NBA is super, super close; there is no 60% anything, variance or no variance. If you want to look at the "actual" range, look at the high or low lines versus the closing lines, and this will give you the room in an NBA game. It is very small, but not as small as in the NFL or MLB.
                    • nomadcrypto
                      SBR Rookie
                      • 12-02-18
                      • 5

                      #11
                      I just did the estimates and I need a sample of 200x games to get a real estimate of the model's true performance. Assuming the distribution of bronze, silver and gold bins are representative of the actual distribution anyway. The confidence intervals are as follows over a 200x game sample:
                      Code:
                      Upper/lower for bronze bin sample size 112.658295006 84.9417049938
                      Upper/lower for silver bin sample size 70.1518375515 45.0481624485
                      Upper/lower for gold bin sample size 54.8694777926 32.0105222074
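The three upper/lower pairs above appear to be 95% intervals for how many of 200 games would land in each bin, not intervals on accuracy. The sketch below reproduces them under that reading, using the bin shares quoted earlier in the thread (49.4%, 28.8%, 21.72%) and a normal approximation to the binomial count. The function name is illustrative.

```python
import math

# Bin shares as quoted in the thread (share of the 2500 back-test games).
BIN_SHARES = {"bronze": 0.494, "silver": 0.288, "gold": 0.2172}


def count_interval(share, n, z=1.96):
    """Normal-approximation interval for the number of games falling in a bin."""
    mean = n * share
    half_width = z * math.sqrt(n * share * (1.0 - share))
    return mean - half_width, mean + half_width


intervals = {name: count_interval(share, 200) for name, share in BIN_SHARES.items()}
for name, (low, high) in intervals.items():
    print(f"Upper/lower for {name} bin sample size {high:.4f} {low:.4f}")
```

With only about 32 to 55 gold picks expected in 200 games, the accuracy estimate for that bin alone would still be quite noisy at that point.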


                      Thank you all for the words of caution as well as thank you for bringing up confidence intervals. Confidence intervals go a long way towards answering my original question regarding optimal bet sizing(optimal in terms of maximizing gain while minimizing risk).
                      While you guys can't trust that I took the necessary steps to avoid overfitting and a look-ahead bias I am extremely confident that I did take those steps. The consistency of the results over a 2500x validation sample as well as the consistency of the test/train results on the full data and the stratified k-fold cross validation tells me that this model is a sound model. I'm not here to argue about the validity of the model. I came here to ask what would be the optimal bet sizing given that edge. While the answer wasn't given directly I was certainly pointed in the right direction.
                      • danshan11
                        SBR MVP
                        • 07-08-17
                        • 4101

                        #12
                        Fixed-profit betting is best; it's a cousin of Kelly.
                        1 unit is 1% of your bankroll and every wager is to win 1 unit no matter what the odds are.

                        odds +100 bet 1 to win 1
                        odds -130 bet 1.3 to win 1
                        odds +200 bet .5 to win 1
                        The idea is to have more on favorites, which are supposed to win more anyway. Using a star or color system to determine the strength of a bet is flawed, because we all know the actual probability of an event is the closing line, so we take that to be the closest thing to the actual implied probability of winning.
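The to-win-1-unit stakes listed above follow directly from the odds. A minimal sketch, with an illustrative function name and the unit left abstract (the post sets 1 unit at 1% of bankroll):

```python
def stake_to_win(american, target_profit=1.0):
    """Stake needed so that a winning bet returns `target_profit` units of profit."""
    net_odds = 100.0 / -american if american < 0 else american / 100.0
    return target_profit / net_odds


for odds in (100, -130, 200):
    print(f"odds {odds:+d}: bet {stake_to_win(odds):.2f} to win 1")
```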
                        • vampire assassin
                          SBR Sharp
                          • 03-09-18
                          • 296

                          #13
                          Are these predictions on who will win the game straight up? Just a moneyline play?

                          If you pick big favorites, you can get a high accuracy, but that is irrelevant. You must be more accurate than the market, at least in subsets.

                          Instead of looking at bins on absolute accuracy, compare your prediction versus the market prediction. You might make 3 bins based on the difference.

                          For example, if the difference were 4% or less, that might be marginal. 4-8% might be good, and your smallest subset of 8%+ would be small. When you test those bins, you might find that the first two lose money betting, and the last subset yields a small profit. Or who knows what you'll find.

                          You need a method to evaluate your edge on each individual play.
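The edge-based binning suggested above could look something like the sketch below. It is one possible reading, not a prescribed method: the market prediction is approximated by the break-even probability of the offered price (so the vig is left in), the 4% and 8% cut points come from the post's example, and the bin names and function names are made up for illustration.

```python
def implied_probability(american):
    """Break-even win probability at the given American odds (vig included)."""
    net_odds = 100.0 / -american if american < 0 else american / 100.0
    return 1.0 / (1.0 + net_odds)


def edge_bin(model_prob, american, cuts=(0.04, 0.08)):
    """Label a pick by the gap between the model and the market price.

    Returns (edge, label); label names are hypothetical.
    """
    edge = model_prob - implied_probability(american)
    if edge <= 0:
        return edge, "no bet"
    if edge <= cuts[0]:
        return edge, "marginal"
    if edge <= cuts[1]:
        return edge, "good"
    return edge, "strong"


for p in (0.50, 0.55, 0.60, 0.65):
    edge, label = edge_bin(p, -110)
    print(f"model {p:.0%} vs -110: edge {edge:+.2%} -> {label}")
```

Each bin can then be back-tested for profit at the actual prices rather than for raw accuracy.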
                          • TommieGunshot
                            SBR MVP
                            • 03-27-12
                            • 1601

                            #14
                            Originally posted by nomadcrypto
                            Now I'm not really sure what the optimal bet sizes to use for each class/bin would be. I've read up on the Kelly criterion and found a wealth of academic papers on the subject, but from what I've read that is most likely not the best route. For example, full Kelly would say to bet roughly 50% of the bankroll on a pick with 78% accuracy given the standard 52.38% implied probability (-110), but intuitively that seems like a bad idea.
                            The best description I've heard of Kelly Criterion is "maximally aggressive." Anything more and risk of ruin is too high. Anything less and you are limiting profits. Intuition is meaningless. You can run some simulations to easily prove that it is best.

                            Hell, in the first paragraph in the wikipedia article, they even say it can be counterintuitive. But the same article goes on to provide rigorous mathematical proof. What more do you want?

                            A lot of people like "half-Kelly" as a way to ensure overestimating the edge never causes you to overbet.
                            • HeeeHAWWWW
                              SBR Hall of Famer
                              • 06-13-08
                              • 5487

                              #15
                              Originally posted by TommieGunshot
                              A lot of people like "half-Kelly" as a way to ensure overestimating the edge never causes you to overbet.
                              That's certainly true that most people use it that way .... however, imo it's a mis-use. If the input is incorrect, adjust the input.

                              A Kelly multiplier should be used to manage variance to suit the individual's risk preferences. Full Kelly may be optimal for growth, but the bottom few percentiles of outcomes can be extremely damaging - yes, eventually full Kelly will emerge superior, but that can take many years, during which time your edge may disappear.

                              Going to half Kelly only sacrifices 1/4 of expected growth, but cuts the chance of heavy downswings enormously - for example, at full Kelly the chance of a 75% downswing from starting bank is 25%, but at half-Kelly it's 1.6%, at 1/3rd Kelly 0.1%. Meanwhile, expected growth doesn't fall anywhere near as dramatically:


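                              Kelly    EG        -50% roll    -75% roll
                              1.000    1.0000    50.0000%     25.0000%
                              0.667    0.8889    25.0000%      6.2500%
                              0.500    0.7500    12.5000%      1.5625%
                              0.400    0.6400     6.2500%      0.3906%
                              0.333    0.5556     3.1250%      0.0977%
                              0.286    0.4898     1.5625%      0.0244%
                              0.250    0.4375     0.7813%      0.0061%
                              0.222    0.3951     0.3906%      0.0015%
                              0.200    0.3600     0.1953%      0.0004%

                              (Kelly = Kelly multiplier, EG = expected growth relative to full Kelly, -50%/-75% roll = chance of the bankroll ever falling that far below the starting bank.)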
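The table is consistent with two standard continuous-time approximations for betting a multiple k of full Kelly: relative expected growth of 2k - k^2, and a chance of the bankroll ever falling to a fraction x of its starting value of x^(2/k - 1). The sketch below regenerates the rows under that assumption. Whether these are the formulas actually used for the table is a guess, though the numbers match.

```python
def relative_growth(k):
    """Expected growth at k times full Kelly, as a share of full-Kelly growth."""
    return 2.0 * k - k * k


def drawdown_probability(k, remaining):
    """Chance the bankroll ever falls to `remaining` (e.g. 0.25 = a 75% downswing)."""
    return remaining ** (2.0 / k - 1.0)


print("Kelly    EG        -50% roll    -75% roll")
for n in range(2, 11):
    k = 2.0 / n                      # 1, 2/3, 1/2, 2/5, 1/3, ... as in the table
    print(
        f"{k:.3f}    {relative_growth(k):.4f}    "
        f"{drawdown_probability(k, 0.50):9.4%}    {drawdown_probability(k, 0.25):9.4%}"
    )
```

Note that relative_growth(2.0) is exactly zero, which matches the earlier point that betting twice Kelly gives up all long-term growth.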
                              • oilcountry99
                                SBR Wise Guy
                                • 08-29-10
                                • 707

                                #16
                                Flat bet by taking your unit amount (1 in this example), and bet the amount that the risk + the win = double the unit size.

                                For example:

                                -120 fav: risk 1.09 to win .91
                                +150 dog: risk .80 to win 1.20
                                -200 fav: risk 1.33 to win .67
                                +180 dog: risk .71 to win 1.29

                                This way, every game has equal value and is true flat betting.
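The rule above (risk plus win equals two units) pins down the stake for any price. A minimal sketch, with an illustrative function name:

```python
def risk_and_win(american, unit=1.0):
    """Split 2 * unit into (risk, win) so that risk + win = 2 * unit at these odds."""
    net_odds = 100.0 / -american if american < 0 else american / 100.0
    risk = 2.0 * unit / (1.0 + net_odds)   # from risk + risk * net_odds = 2 * unit
    return risk, risk * net_odds


for odds in (-120, 150, -200, 180):
    risk, win = risk_and_win(odds)
    side = "fav" if odds < 0 else "dog"
    print(f"{odds:+d} {side}: risk {risk:.2f} to win {win:.2f}")
```

On a favorite the risk is always the larger of the two numbers, so at -120 the arithmetic gives risk 1.09 to win 0.91.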

                                This is a strategy I came across from a poster on another site, Vanzack.
                                • tsty
                                  SBR Wise Guy
                                  • 04-27-16
                                  • 510

                                  #17
                                  When ROR (risk of ruin) is zero regardless of the fraction, then it's stupid to go half

                                  Maximising money is the whole point