A way of evaluating the reasonableness of predictive models

  • Bsims
    SBR Wise Guy
    • 02-03-09
    • 827

    #1
    Many of us use programs, spreadsheets, data, etc. to build models that predict scores for contests we are interested in wagering on. We are faced with the question of how good the predictions are. The obvious answer is: do you make money with the model? But it can take a lot of bets before you are comfortable with the results. It can also be expensive if the model isn't very accurate.

    The next obvious approach is to backtest with previous data. This is free, but a lot of work. You must get the previous data into a usable form, then write your model to iterate through the previous games, generating the bets the model likes. This is the approach I mainly rely on. One problem with it is that I only keep the bets the model rates as profitable. That doesn't tell me anything about the other score predictions.

    Recently I've started using correlation analysis. I compute two correlations, one per team, between the predicted score and the actual score. This is a particularly useful tool for comparing different predictive models. I'm currently doing CBB analysis and looking at 4 predictive models: LV (implied scores from several books using the average spread and totals lines), Like games (my like-game system, described on my blog and in another topic here), KenPom predictions, and predictions from a power rating system I developed years ago.
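The LV implied scores can be backed out of a spread and a total. A minimal sketch, assuming the spread is quoted from the home team's side (negative when the home team is favored) and that, as described above, the lines are averaged across books before the scores are derived. The function name and input format are hypothetical.

```python
from typing import Iterable, Tuple

def implied_scores(lines: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (visitor, home) implied scores from (home_spread, total) pairs.

    home_spread is the home team's line, negative when home is favored
    (e.g. -5.0 means home is a 5-point favorite). Lines from several books
    are averaged first, then the total is split by the spread.
    """
    lines = list(lines)
    if not lines:
        raise ValueError("need at least one (spread, total) pair")
    avg_spread = sum(s for s, _ in lines) / len(lines)
    avg_total = sum(t for _, t in lines) / len(lines)
    home = (avg_total - avg_spread) / 2     # half the total plus half the home margin
    visitor = (avg_total + avg_spread) / 2  # half the total minus half the home margin
    return visitor, home

# Two books: home -5 / 145 and home -4 / 144 average to -4.5 / 144.5
print(implied_scores([(-5.0, 145.0), (-4.0, 144.0)]))  # (70.0, 74.5)
```

Team 1 (visitor) comes first in the result to match the team 1 / team 2 convention used in the correlation tables.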

    The CBB system I'm developing will rely on 3 of the 4 models (I won't be using the LV implied scores, since these are generated from the lines). The problem is how to weight the other three. Hopefully the correlation study will provide some guidance. I'll have the correlation results shortly.
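One simple way to turn a correlation study into weights is to make each model's weight proportional to its correlation. This is only a naive heuristic, not the author's method, and the model names and numbers below are placeholders. Because a model such as the power rating system may have no prediction for some games, the blend re-normalizes over whichever models are present.

```python
from typing import Dict

def correlation_weights(corrs: Dict[str, float]) -> Dict[str, float]:
    """Naive heuristic: weight each model in proportion to its (positive) correlation."""
    total = sum(corrs.values())
    return {model: c / total for model, c in corrs.items()}

def blend(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the models that actually produced a prediction.

    Models missing from `predictions` (e.g. no rating yet) are skipped and the
    remaining weights are re-normalized so they still sum to 1.
    """
    used = {m: weights[m] for m in predictions if m in weights}
    if not used:
        raise ValueError("no weighted model produced a prediction")
    total = sum(used.values())
    return sum(predictions[m] * w for m, w in used.items()) / total

# Placeholder correlations for three hypothetical models
w = correlation_weights({"like_games": 0.2, "kenpom": 0.2, "power": 0.4})
print(blend({"like_games": 70.0, "kenpom": 72.0}, w))  # power rating missing for this game
```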
  • Waterstpub87
    SBR MVP
    • 09-09-09
    • 4102

    #2
    I'd be wary of the KenPom predictions, especially when it comes to totals. Last year I got absolutely creamed at the end of the season and in the tournament using a KenPom-based system. Not that the same thing would happen to you, but it is something to keep in mind.

    I don't really bother to backtest much anymore. After so many times where stuff worked in the backtest and failed live, I don't waste a lot of time on it. To be honest, my models are all several years in, so much of my recent work has been on better operational stuff. Usually, when I am changing something, I will run through the first few months of a season. I focus more on how close I am to the line. If I am within a point or so of the closing line on 80+% of NBA games, I know I have a decent model. For new stuff, I will generate close to a season's worth of data; sometimes it works out really well, and sometimes it crashes and burns.
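The "within a point of the closing line on 80+% of games" check is easy to automate. A minimal sketch: the 1-point tolerance and 80% threshold come from the post, while the function names are hypothetical.

```python
from typing import Sequence

def share_within(model_lines: Sequence[float], closing_lines: Sequence[float],
                 tolerance: float = 1.0) -> float:
    """Fraction of games where the model's line is within `tolerance` points of the close."""
    if len(model_lines) != len(closing_lines) or not model_lines:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for m, c in zip(model_lines, closing_lines) if abs(m - c) <= tolerance)
    return hits / len(model_lines)

def looks_decent(model_lines: Sequence[float], closing_lines: Sequence[float],
                 tolerance: float = 1.0, required_share: float = 0.80) -> bool:
    """Rule of thumb from the post: close to the market on at least 80% of games."""
    return share_within(model_lines, closing_lines, tolerance) >= required_share
```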

    Most of the time though, I am not doing anything too crazy, so if I am close to the line, I am pretty sure my model is good.
    • HeeeHAWWWW
      SBR Hall of Famer
      • 06-13-08
      • 5487

      #3
      Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

      It's not perfect because of differing subsets, but it's quick.
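A minimal sketch of that comparison: compute the Brier score of your model's win probabilities and of the market's implied probabilities over the same games, and compare (lower is better). The decimal-odds input and the simple proportional normalization used to strip the vig are assumptions; the post does not say how the implied probability is derived.

```python
from typing import Sequence

def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def no_vig_prob(decimal_odds_a: float, decimal_odds_b: float) -> float:
    """Implied probability of side A with the vig removed by proportional normalization.

    This is only one of several de-vig methods.
    """
    inv_a, inv_b = 1 / decimal_odds_a, 1 / decimal_odds_b
    return inv_a / (inv_a + inv_b)

# Hypothetical games: model probabilities, market odds for side A, and results
model = [0.62, 0.45, 0.70]
market = [no_vig_prob(1.6, 2.4), no_vig_prob(2.1, 1.8), no_vig_prob(1.5, 2.7)]
results = [1, 0, 1]
print(brier_score(model, results), brier_score(market, results))
```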
      • Bsims
        SBR Wise Guy
        • 02-03-09
        • 827

        #4
        Correlation Summary

        Here are the results of the correlations between the predicted scores and the actual scores for CBB games in the 2017-18 season.

        Predictor                        # Predictions  T1 Actual  T1 Pred  Corr 1  T2 Actual  T2 Pred  Corr 2
        LV implied scores                        3,971       69.6     70.0   0.524       74.6     75.0   0.572
        Like games average scores                3,505       70.3     71.0   0.498       73.5     74.1   0.508
        KenPom predicted scores                  3,884       69.6     70.0   0.502       74.6     74.8   0.543
        Power ratings predicted scores           2,725       69.9     69.9   0.447       74.1     74.5   0.441
        (T1 = team 1, normally the visitor; T2 = team 2, normally the home team; Actual = average regulation score; Pred = average predicted score; Corr = correlation of predicted with actual score.)

        Total games on scores file               3,975
        Games at neutral sites                     590
        Percent of neutral-site games            14.8%
        Note that I only used the score at the end of regulation play, ignoring overtime. Normally team 1 is the visitor, and team 2 is the home team. For games at neutral sites, both are considered visitors. This may raise some questions. I’ll try to deal with some obvious ones in subsequent posts. I have put a spreadsheet with the source data and summary in the cloud. Hopefully you can access it via the following URL, bit.ly/2A1f8pE
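A minimal Python sketch of how Corr 1 and Corr 2 can be computed: one Pearson correlation per side of the game, between predicted and actual regulation scores. It assumes one record per game; the `Game` record and function names are hypothetical, and the `exclude_neutral` switch corresponds to the neutral-site filter applied in a later post.

```python
from math import sqrt
from typing import List, NamedTuple, Sequence, Tuple

class Game(NamedTuple):
    pred1: float    # team 1 predicted regulation score (normally the visitor)
    actual1: float  # team 1 actual regulation score (overtime ignored)
    pred2: float    # team 2 predicted regulation score (normally the home team)
    actual2: float  # team 2 actual regulation score
    neutral: bool   # True for neutral-site games

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one side has no variance")
    return sxy / sqrt(sxx * syy)

def side_correlations(games: List[Game], exclude_neutral: bool = False) -> Tuple[int, float, float]:
    """Return (games used, Corr 1, Corr 2), optionally dropping neutral-site games."""
    rows = [g for g in games if not (exclude_neutral and g.neutral)]
    corr1 = pearson([g.pred1 for g in rows], [g.actual1 for g in rows])
    corr2 = pearson([g.pred2 for g in rows], [g.actual2 for g in rows])
    return len(rows), corr1, corr2
```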
        • Bsims
          SBR Wise Guy
          • 02-03-09
          • 827

          #5
          Originally posted by HeeeHAWWWW
          Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

          It's not perfect because of differing subsets, but it's quick.
          Interesting, I'll have to learn more about this. I can think of some other applications.
          • HeeeHAWWWW
            SBR Hall of Famer
            • 06-13-08
            • 5487

            #6
            Originally posted by Bsims
            Interesting, I'll have to learn more about this. I can think of some other applications.
            Other possibilities are the other proper scoring rules: log loss and spherical loss. Log loss is usually the most practical of the three for most purposes, but given that most bets are in the middle of the probability range, Brier is likely best for most people.
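For reference, the other two proper scoring rules mentioned here work on the same (probability, 0/1 outcome) pairs as the Brier score. A sketch: the clipping constant `eps` is an assumption that keeps log(0) out of the calculation, and the spherical score as written is a reward (higher is better), so a "spherical loss" would be its negative. Log loss penalizes confident misses far more heavily than Brier does.

```python
from math import log, sqrt
from typing import Sequence

def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the 0/1 outcomes (lower is better)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(o * log(p) + (1 - o) * log(1 - p))
    return total / len(probs)

def spherical_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean spherical score for binary forecasts (higher is better)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p_outcome = p if o == 1 else 1 - p  # probability given to what actually happened
        total += p_outcome / sqrt(p * p + (1 - p) * (1 - p))
    return total / len(probs)
```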
            • nash13
              SBR MVP
              • 01-21-14
              • 1122

              #7

              I guess what's here is enough to evaluate your betting process.
              • yak merchant
                SBR High Roller
                • 11-04-10
                • 109

                #8
                Originally posted by HeeeHAWWWW
                Brier scores work well for this: calculate it for your model's predictions, and for the implied prob of the market.

                It's not perfect because of differing subsets, but it's quick.
                So how do you deal with interval/ratio data types with Brier scores? Do you convert everything to moneyline probabilities, or are you binning results? Every example I've ever seen analyzes predicted probabilities against actual outcomes for nominal or ordinal types.
                • HeeeHAWWWW
                  SBR Hall of Famer
                  • 06-13-08
                  • 5487

                  #9
                  Originally posted by yak merchant
                  So how do you deal with interval/ratio data types with Brier scores? Do you convert everything to Moneyline probabilities or are you binning results?
                  No need for binning, it inherently calibrates across the whole range.

                  All you need is the (binary) outcome and the prediction %. This is a better metric than traditional ones that compare binary outcomes with binary predictions (e.g. accuracy, Kappa, AUC), because those throw away a lot of information about the prediction.
                  Last edited by HeeeHAWWWW; 12-17-18, 05:10 PM.
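To see why thresholding throws information away, here is a small illustration with made-up forecasts: both forecasters pick every game correctly, so accuracy cannot tell them apart, while the Brier score rewards the one whose probabilities were sharper. The 0.5 cutoff for turning a probability into a pick is an assumption.

```python
from typing import Sequence

def accuracy(probs: Sequence[float], outcomes: Sequence[int], cutoff: float = 0.5) -> float:
    """Share of games where thresholding the probability at `cutoff` matches the outcome."""
    return sum(1 for p, o in zip(probs, outcomes) if (p > cutoff) == (o == 1)) / len(probs)

def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between probability and 0/1 outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

outcomes = [1, 1, 0, 0]
timid = [0.51, 0.51, 0.49, 0.49]      # barely leans the right way
confident = [0.90, 0.90, 0.10, 0.10]  # leans strongly the right way

print(accuracy(timid, outcomes), accuracy(confident, outcomes))        # identical
print(brier_score(timid, outcomes), brier_score(confident, outcomes))  # very different
```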
                  • yak merchant
                    SBR High Roller
                    • 11-04-10
                    • 109

                    #10
                    Originally posted by HeeeHAWWWW
                    No need for binning, it inherently calibrates across the whole range.

                    All you need is the (binary) outcome and the prediction %. This is a better metric than traditional ones that compare binary outcomes with binary predictions (e.g. accuracy, Kappa, AUC), because those throw away a lot of information about the prediction.
                    Well, I guess that is my question: the model in question compares predicted scores to actual scores, not a binary outcome.
                    • peacebyinches
                      SBR MVP
                      • 02-13-10
                      • 1112

                      #11
                      I look forward to seeing how this works out, Bsims.
                      • HeeeHAWWWW
                        SBR Hall of Famer
                        • 06-13-08
                        • 5487

                        #12
                        Originally posted by yak merchant
                        Well, I guess that is my question: the model in question compares predicted scores to actual scores, not a binary outcome.
                        Ahh, gotcha. I suppose you could use traditional regression metrics (mean squared error, etc.) on your predicted line and on the market's midpoint. That's problematic in lower-scoring sports, though, or in those with irregular scoring distributions.

                        Binary over/under or a particular handicap also has the nice advantage of focusing your prediction efforts on improving accuracy in the area that matters, i.e. exactly the thing you're trying to predict and bet on.
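Both options can be sketched briefly. First, regression errors (MSE/MAE) for your predicted numbers and for the market's midpoint against the actual results. Second, a conversion of actual totals into binary over/under outcomes that can then be scored with Brier. The names are hypothetical, reading "the market's middle point" as the midpoint of two quoted lines is an assumption, and so is treating a push as "no outcome".

```python
from typing import List, Optional, Sequence

def mse(preds: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

def mae(preds: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

def midpoint(low_line: float, high_line: float) -> float:
    """Middle of two quoted lines, used here as the market's point estimate."""
    return (low_line + high_line) / 2

def over_outcomes(actual_totals: Sequence[float], lines: Sequence[float]) -> List[Optional[int]]:
    """1 if the game went over the line, 0 if under, None for a push."""
    return [None if t == line else int(t > line) for t, line in zip(actual_totals, lines)]
```

Comparing `mse(model_totals, actual_totals)` with `mse(market_midpoints, actual_totals)` on the same games gives the regression-style comparison; the output of `over_outcomes` (pushes dropped) pairs with your over probabilities for a Brier score.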
                        • danshan11
                          SBR MVP
                          • 07-08-17
                          • 4101

                          #13
                          I think closing line predictions are more predictive than the actual scores of past games. The big issue I see with the idea is injuries, rest, and suspensions of players, which actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of superstar one-and-dones.
                          • Waterstpub87
                            SBR MVP
                            • 09-09-09
                            • 4102

                            #14
                            Originally posted by danshan11
                            I think closing line predictions are more predictive than the actual scores of past games. The big issue I see with the idea is injuries, rest, and suspensions of players, which actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of superstar one-and-dones.
                            If you are actually testing realistically, you should account for injuries. When I was testing NBA models, I set up a scraper that would scrape the game lineups for a particular day. All I had to do was hit 2 buttons: one to pull the lineups and one to process the results.

                            If you are testing CBB it is a little different. But you should account for returning starters when projecting next year. I calculated returning minutes, and went from there.
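Returning minutes, as described above, is a simple ratio. A minimal sketch; the player names and minute totals are hypothetical inputs, and how the ratio feeds into next year's projection is left open, as it is in the post.

```python
from typing import Dict, Iterable

def returning_minutes_share(last_season_minutes: Dict[str, float],
                            returning_players: Iterable[str]) -> float:
    """Share of last season's minutes played by players who are back this season."""
    total = sum(last_season_minutes.values())
    if total <= 0:
        raise ValueError("no minutes recorded")
    back = set(returning_players)
    returning = sum(m for player, m in last_season_minutes.items() if player in back)
    return returning / total

# Hypothetical roster: only player "A" returns
print(returning_minutes_share({"A": 1000, "B": 500, "C": 500}, ["A"]))
```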
                            • danshan11
                              SBR MVP
                              • 07-08-17
                              • 4101

                              #15
                              I don't think his model is doing that, and to do it successfully you need an algorithm for player worth. I use a team weight system: I give each player a value and compare that to the total team value!
                              • Bsims
                                SBR Wise Guy
                                • 02-03-09
                                • 827

                                #16
                                Originally posted by danshan11
                                I think closing line predictions are more predictive than the actual scores of past games. The big issue I see with the idea is injuries, rest, and suspensions of players, which actually change the line. A perfect example: the Rockets without Harden are a different team. Also consider that CBB teams are very different from one season to the next, especially with the loss of superstar one-and-dones.
                                Agree. The problem with any handicapping or predictive model is that unknown information like injuries will make some wagers look too good. Somehow one must account for these and be leery of such wagers. I tend to compute a return per dollar and bet on those with returns above $1.00. If the return is something like $1.25, be very careful.

                                Your second point is also good. CBB is a good example of where a team might change significantly from year to year. Of the 4 models, the LV one and like games (since it comes from LV) are probably the best early on. KenPom probably considers player changes; I'm skeptical about how well this can be done. The power rating system won't generate ratings for a team until it has scores for at least 3 games at the appropriate site. That's why it has about a thousand fewer games than the others.

                                I'm planning a follow-up study that will look at correlations by month. I would expect the power rating system to improve the most; in a previous study, the ratings got better with more data.
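A correlations-by-month study amounts to grouping the (predicted, actual) pairs by month before correlating. A sketch, assuming months are labeled as "YYYY-MM" strings and that months with fewer than `min_games` games are skipped; both are hypothetical choices.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one side has no variance")
    return sxy / sqrt(sxx * syy)

def correlation_by_month(rows: Iterable[Tuple[str, float, float]],
                         min_games: int = 3) -> Dict[str, float]:
    """rows: (month, predicted, actual). Returns {month: correlation} in month order."""
    preds: Dict[str, List[float]] = defaultdict(list)
    actuals: Dict[str, List[float]] = defaultdict(list)
    for month, pred, actual in rows:
        preds[month].append(pred)
        actuals[month].append(actual)
    return {m: pearson(preds[m], actuals[m])
            for m in sorted(preds) if len(preds[m]) >= min_games}
```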
                                • Bsims
                                  SBR Wise Guy
                                  • 02-03-09
                                  • 827

                                  #17
                                  One issue I always face is how to account for home court advantage. Three of the four models take this into account; only the power rating system faces this problem. One approach is to adjust the predicted scores by some home court advantage. I don't like this approach.

                                  Since basketball teams play lots of games, I look at each team as two different teams, one on the road and one at home. Thus I have two ratings for Duke, one for vDuke and the other for hDuke.
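The idea of rating each team twice can be sketched with separate home and road profiles. This is not the author's power rating formula, which the thread doesn't give: it simply averages points scored and allowed per split, combines them naively, and applies the 3-game minimum mentioned earlier. Keys follow the post's naming (`hDuke`, `vDuke`); the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

class SplitRatings:
    """Hypothetical sketch: track each team separately at home ('h') and on the road ('v')."""

    def __init__(self, min_games: int = 3) -> None:
        self.min_games = min_games
        self.scored: Dict[str, List[float]] = defaultdict(list)
        self.allowed: Dict[str, List[float]] = defaultdict(list)

    def add_game(self, visitor: str, home: str, visitor_pts: float, home_pts: float) -> None:
        v, h = "v" + visitor, "h" + home
        self.scored[v].append(visitor_pts)
        self.allowed[v].append(home_pts)
        self.scored[h].append(home_pts)
        self.allowed[h].append(visitor_pts)

    def profile(self, key: str) -> Optional[Tuple[float, float]]:
        """(avg points scored, avg points allowed) for a split, or None if too few games."""
        scored = self.scored.get(key, [])
        if len(scored) < self.min_games:
            return None
        allowed = self.allowed[key]
        return sum(scored) / len(scored), sum(allowed) / len(allowed)

    def predict(self, visitor: str, home: str) -> Optional[Tuple[float, float]]:
        """Naive (visitor, home) prediction: average each side's scoring with the
        opponent's points allowed in the matching split."""
        v, h = self.profile("v" + visitor), self.profile("h" + home)
        if v is None or h is None:
            return None
        return (v[0] + h[1]) / 2, (h[0] + v[1]) / 2
```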
                                  • tsty
                                    SBR Wise Guy
                                    • 04-27-16
                                    • 510

                                    #18
                                    You can do regression with past odds instead of results? Lol
                                    • Waterstpub87
                                      SBR MVP
                                      • 09-09-09
                                      • 4102

                                      #19
                                      Originally posted by Bsims
                                      One issue I always face is how to account for home court advantage. Three of the four models take this into account; only the power rating system faces this problem. One approach is to adjust the predicted scores by some home court advantage. I don't like this approach.

                                      Since basketball teams play lots of games, I look at each team as two different teams, one on the road and one at home. Thus I have two ratings for Duke, one for vDuke and the other for hDuke.
                                      You have to consider it per possession, not as a flat number. Much of the home vs. away difference comes from things like penalties and fouls. If a team commits 0.25 fewer fouls per possession, 60 vs. 100 possessions makes a large difference.
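To put the per-possession point in numbers: if the home edge is treated as points per possession, its flat-point equivalent scales with tempo. A sketch; the 3.0-point, 75-possession calibration in the example is a made-up assumption, not a measured value.

```python
def hca_per_possession(flat_hca_points: float, avg_possessions: float) -> float:
    """Convert a flat home edge (points per game) into points per possession."""
    return flat_hca_points / avg_possessions

def hca_for_game(edge_per_possession: float, expected_possessions: float) -> float:
    """Home edge in points for a game played at a given tempo."""
    return edge_per_possession * expected_possessions

# Hypothetical calibration: a 3.0-point edge measured in games averaging 75 possessions
edge = hca_per_possession(3.0, 75.0)
for possessions in (60, 100):
    print(possessions, round(hca_for_game(edge, possessions), 2))
```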

                                      I've always been the opposite on home vs. away. By the time you get to 10 home and 10 away games, most of the season is gone. At this point in the season, you are probably somewhere around 4 home, 2 neutral, and 2 away, or something similar. Any results you get, especially extreme ones, are much more likely to be random and not an actual signal.

                                      If instead you use a constant, you can use thousands of games to generate the home vs. away advantage, meaning the number is much more likely to be valid. In CBB this may not be exact, because many teams play weaker teams at home, like Duke playing Abilene Christian in the first game of the season. Also, some teams (Denver comes to mind) benefit extra because the conditions there are more extreme. But in general, this is a much cleaner and more accurate approach.
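A sketch of estimating one league-wide constant, together with its standard error. It assumes each non-neutral game comes with some model's neutral-court margin, so that weaker home opponents (the Duke vs. Abilene Christian case) are not credited to home court; passing 0.0 instead lets schedule strength leak into the estimate. The standard error shrinks with the square root of the number of games, which is the point being made: a constant fitted to thousands of games is far less noisy than a team's own split from a handful of home games.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Iterable, Tuple

def estimate_constant_hca(games: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Return (home edge estimate, standard error).

    games: (home_pts, away_pts, neutral_margin) for non-neutral-site games, where
    neutral_margin is a model's home-minus-away margin as if on a neutral court.
    """
    residuals = [(h - a) - neutral for h, a, neutral in games]
    if len(residuals) < 2:
        raise ValueError("need at least two games")
    return mean(residuals), stdev(residuals) / sqrt(len(residuals))
```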
                                      • Bsims
                                        SBR Wise Guy
                                        • 02-03-09
                                        • 827

                                        #20
                                        If I were to use home court advantage, I'd probably use KenPom's values instead of a constant. Currently his biggest HCAs are Colorado at 4.5 and Iowa State at 4.4. His lowest are Grambling St. and Navy at 1.6. His median is 3.2.
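A sketch of a team-specific lookup using only the values quoted in the post (a snapshot, not KenPom's full table), with the quoted median as a fallback for teams not listed and zero at neutral sites. Splitting the edge evenly between a bump for the home team and a cut for the visitor is an assumption.

```python
from typing import Dict, Tuple

# Snapshot of the values quoted in the post, not the full table.
HCA_EXAMPLES: Dict[str, float] = {
    "Colorado": 4.5,
    "Iowa State": 4.4,
    "Grambling St.": 1.6,
    "Navy": 1.6,
}
MEDIAN_HCA = 3.2  # median quoted in the post, used as a fallback

def home_court_advantage(team: str, neutral: bool = False,
                         table: Dict[str, float] = HCA_EXAMPLES,
                         default: float = MEDIAN_HCA) -> float:
    """Team-specific home edge in points; 0 at neutral sites, median if the team is unknown."""
    if neutral:
        return 0.0
    return table.get(team, default)

def apply_hca(visitor_pred: float, home_pred: float, hca: float) -> Tuple[float, float]:
    """Return adjusted (visitor, home): half the edge added to home, half taken from the visitor."""
    return visitor_pred - hca / 2, home_pred + hca / 2
```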
                                        • HeeeHAWWWW
                                          SBR Hall of Famer
                                          • 06-13-08
                                          • 5487

                                          #21
                                          Originally posted by Bsims
                                          I tend to compute a return per dollar and bet on those with returns above $1.00. If the return is something like $1.25, be very careful.
                                          Strongly agree with this (at least in any liquid market). You can prove it with sufficient betting history too: your edge estimates have errors, and as the edge increases, those errors typically become asymmetrical, i.e. the real edge will be well below your estimate.

                                          There's a good logical explanation: very large edges represent where the market knows something your model doesn't.

                                          For anyone using Kelly this all becomes rather important :-)
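For Kelly bettors, one way to reflect that asymmetry is to shrink the model's probability toward the market's before sizing the bet, so very large apparent edges get trimmed. A sketch; the shrink weight of 0.5 is an arbitrary assumption, not a recommended value, and the example probabilities are made up.

```python
def kelly_fraction(win_prob: float, decimal_odds: float) -> float:
    """Full-Kelly stake as a fraction of bankroll (0 when there is no edge)."""
    b = decimal_odds - 1.0  # net profit per unit staked on a win
    if b <= 0:
        raise ValueError("decimal odds must be greater than 1.0")
    f = (b * win_prob - (1.0 - win_prob)) / b
    return max(f, 0.0)

def shrink_toward_market(model_prob: float, market_prob: float, weight: float = 0.5) -> float:
    """Keep only `weight` of the gap between model and market (hypothetical weight)."""
    return market_prob + weight * (model_prob - market_prob)

# Model says 60%, market (vig removed) says 50%, even-money odds
raw = kelly_fraction(0.60, 2.0)
trimmed = kelly_fraction(shrink_toward_market(0.60, 0.50), 2.0)
print(round(raw, 3), round(trimmed, 3))
```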
                                          • tsty
                                            SBR Wise Guy
                                            • 04-27-16
                                            • 510

                                            #22
                                            Originally posted by HeeeHAWWWW
                                            Strongly agree with this (at least in any liquid market). You can prove it with sufficient betting history too: your edge estimates have errors, and as the edge increases, those errors typically become asymmetrical, i.e. the real edge will be well below your estimate.

                                            There's a good logical explanation: very large edges represent where the market knows something your model doesn't.

                                            For anyone using Kelly this all becomes rather important :-)
                                            Selectively following your model is wrong imo

                                            Either 100 or nothing
                                            • Bsims
                                              SBR Wise Guy
                                              • 02-03-09
                                              • 827

                                              #23
                                              I've eliminated the neutral-site games, and all the correlations went up a bit. Each model does a better job of predicting the home score than the visitor's, except the power rating system. Maybe I need to rethink my home court advantage.

                                              Predictor scores (eliminating neutral-site games)
                                              Predictor                        # Predictions  T1 Actual  T1 Pred  Corr 1  T2 Actual  T2 Pred  Corr 2
                                              LV implied scores                        3,381       69.6     70.0   0.530       74.9     75.1   0.585
                                              Like games average scores                2,956       70.4     71.1   0.502       73.7     74.1   0.516
                                              KenPom predicted scores                  3,311       69.7     69.9   0.510       74.9     74.9   0.554
                                              Power ratings predicted scores           2,411       70.2     70.0   0.455       74.3     74.6   0.445
                                              (T1 = team 1, the visitor; T2 = team 2, the home team; Actual = average regulation score; Pred = average predicted score; Corr = correlation of predicted with actual score.)
                                              • danshan11
                                                SBR MVP
                                                • 07-08-17
                                                • 4101

                                                #24
                                                Originally posted by tsty
                                                You can do regression with past odds instead of results? Lol
                                                What is more accurate as a predictor of future scores: a closing team total of 31 points, or an actual score of 67 because the opponent's starting center had the worst night of his career?

                                                If the books have Yale with team totals of
                                                31, 33, 35, 41, 39
                                                and the actual scores were
                                                39, 20, 33, 29, 65,
                                                which do you think is more indicative of their next game score:
                                                37.2, the average actual score, or
                                                35.8, the average line?
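The two averages in the Yale example check out. Here is the arithmetic on the poster's numbers, plus the game-by-game difference, which shows that the single 65-point game is what pushes the actual average above the line.

```python
from statistics import mean

closing_team_totals = [31, 33, 35, 41, 39]
actual_scores = [39, 20, 33, 29, 65]

avg_line = mean(closing_team_totals)  # 179 / 5
avg_actual = mean(actual_scores)      # 186 / 5
misses = [a - t for a, t in zip(actual_scores, closing_team_totals)]

print(f"average closing team total: {avg_line:.1f}")    # 35.8
print(f"average actual score:       {avg_actual:.1f}")  # 37.2
print("actual minus line, game by game:", misses)       # [8, -13, -2, -12, 26]
```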
                                                • danshan11
                                                  SBR MVP
                                                  • 07-08-17
                                                  • 4101

                                                  #25
                                                  Really, I don't see the idea or the edge in doing this; you are not doing anything more advanced than even a basic model. I don't see how this system could give you any edge. Do you think it is possible to use this to predict more accurately than the closing line does?
                                                  • vampire assassin
                                                    SBR Sharp
                                                    • 03-09-18
                                                    • 296

                                                    #26
                                                    If you look at the set of wagers where your projected ROR is >10%, these will typically do worse than your 3-6% range. As you said, there is an injury or other big change, and your +EV bet has turned into a coin flip.

                                                    If you have a large data set, you can flag matches >10% (or <-10%), or find the sweet spot where you discard matches due to informational disadvantage. If you do this when betting, you'll save a fortune. I lost a 6-fig fortune on the sum of these small positives.
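A sketch of both ideas: flagging bets whose projected return is beyond ±10%, and searching past results for the cap that would have maximized realized profit. The history format (projected ROR, realized profit per unit staked) and the candidate caps are assumptions; with a small sample, the "best" cap will itself be noisy.

```python
from typing import Iterable, Sequence, Tuple

def flag_extreme(projected_ror: float, cap: float = 0.10) -> bool:
    """True when the projected return on risk is beyond +/-cap (10% by default)."""
    return abs(projected_ror) > cap

def best_cap(history: Sequence[Tuple[float, float]], caps: Iterable[float]) -> float:
    """Pick the cap that would have maximized realized profit.

    history: (projected_ror, realized_profit_per_unit) for past bets.
    Only bets with 0 < projected_ror <= cap count toward a cap's profit.
    Ties go to the first cap listed.
    """
    def profit(cap: float) -> float:
        return sum(p for ror, p in history if 0 < ror <= cap)
    return max(caps, key=profit)
```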
                                                    • u21c3f6
                                                      SBR Wise Guy
                                                      • 01-17-09
                                                      • 790

                                                      #27
                                                      Originally posted by HeeeHAWWWW
                                                      ...
                                                      There's a good logical explanation: very large edges represent where the market knows something your model doesn't. ...
                                                      Ding, ding, ding!!! We have a winner! (From my point of view)

                                                      The above is in large part the focus of what I look for when making selections. You see this phenomenon mentioned in various forms in many threads (think "lock" threads for one form) but not many actually try to use this to their advantage IMO.

                                                      Joe.
                                                      • ChuckyTheGoat
                                                        BARRELED IN @ SBR!
                                                        • 04-04-11
                                                        • 37237

                                                        #28
                                                        Good work, Bsims. Best of luck.
                                                        • tsty
                                                          SBR Wise Guy
                                                          • 04-27-16
                                                          • 510

                                                          #29
                                                          Originally posted by danshan11
                                                          What is more accurate as a predictor of future scores: a closing team total of 31 points, or an actual score of 67 because the opponent's starting center had the worst night of his career?

                                                          If the books have Yale with team totals of
                                                          31, 33, 35, 41, 39
                                                          and the actual scores were
                                                          39, 20, 33, 29, 65,
                                                          which do you think is more indicative of their next game score:
                                                          37.2, the average actual score, or
                                                          35.8, the average line?
                                                          How do you write a model without using past results? It's literally the only way lol

                                                          Using past odds makes no sense, since they were less accurate in the past
                                                          • danshan11
                                                            SBR MVP
                                                            • 07-08-17
                                                            • 4101

                                                            #30
                                                            Originally posted by tsty
                                                            How do you write a model without using past results? It's literally the only way lol

                                                            Using past odds makes no sense, since they were less accurate in the past
                                                            Because past results are not indicative of future performance; past lines are better.
                                                            If a team wins 10 straight games by 40 points, is that more indicative of their power ranking as -40 favorites, or is the average line of -8 more accurate for future performance? Also, past lines are a compilation of past game results.

                                                            I think the Yankees' average of 12 runs over the last 10 games is less indicative of their offensive power than their average team-total line of 7.5 over those 10 games.
                                                            I would use the 7.5, not the 12; the 7.5 is a better indicator of future performance than the 12.

                                                            For example, take Kluber: in his last game there were 9 runs scored.
                                                            Do you think that 9 is a better number than the total of 6.5 for future games? Which is more indicative of future performance, the line or the result?

                                                            When I say past results, I mean the last 10 games up to a season, not the last 25 years.
                                                            • tsty
                                                              SBR Wise Guy
                                                              • 04-27-16
                                                              • 510

                                                              #31
                                                              Lol, you just completely ignored my question, but whatever.

                                                              I'll ask a different one, then.

                                                              How did the bookies make those odds? Where were they derived from?
                                                              • danshan11
                                                                SBR MVP
                                                                • 07-08-17
                                                                • 4101

                                                                #32
                                                                Lines are made with power rankings, weather, and injuries, and I believe books adjust for teams and situations they have tons of data on. For example, the Patriots at home probably get a little extra push from the books: even though the rankings say X, they are probably X plus a dash of salt.
                                                                • danshan11
                                                                  SBR MVP
                                                                  • 07-08-17
                                                                  • 4101

                                                                  #33
                                                                  Originally posted by tsty
                                                                  Lol, you just completely ignored my question, but whatever.

                                                                  I'll ask a different one, then.

                                                                  How did the bookies make those odds? Where were they derived from?
                                                                  you did not answer any of my questions
                                                                  • danshan11
                                                                    SBR MVP
                                                                    • 07-08-17
                                                                    • 4101

                                                                    #34
                                                                    Originally posted by danshan11
                                                                    Because past results are not indicative of future performance; past lines are better.
                                                                    If a team wins 10 straight games by 40 points, is that more indicative of their power ranking as -40 favorites, or is the average line of -8 more accurate for future performance? Also, past lines are a compilation of past game results.

                                                                    I think the Yankees' average of 12 runs over the last 10 games is less indicative of their offensive power than their average team-total line of 7.5 over those 10 games.
                                                                    I would use the 7.5, not the 12; the 7.5 is a better indicator of future performance than the 12.

                                                                    For example, take Kluber: in his last game there were 9 runs scored.
                                                                    Do you think that 9 is a better number than the total of 6.5 for future games? Which is more indicative of future performance, the line or the result?

                                                                    When I say past results, I mean the last 10 games up to a season, not the last 25 years.
                                                                    I bolded the question to help you see it
                                                                    • danshan11
                                                                      SBR MVP
                                                                      • 07-08-17
                                                                      • 4101

                                                                      #35
                                                                      I also just read that some books are now focusing more on line history than on power rankings, to try to keep the line more stable from start to finish.