Statistically Modelling the NBA

  • Rhuidean
    SBR Rookie
    • 06-02-11
    • 6

    #1
    Hello folks,

    I've developed a statistical model for the NBA that is a variant of the adjusted +/- (APM) regression model. I've tested it on the 2010-2011 dataset, and it seems to substantially outperform both APM and the home court advantage predictor (at least on the different chunks of the 2010-2011 season I evaluated it on).
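    For readers who haven't seen APM, here is a minimal sketch of plain adjusted +/-, not the variant described in this post (its details are in the linked APBRmetrics thread). Assumptions: each stint (a stretch with the same ten players on court) is one row with +1 for home players and -1 for away players; the target is the home margin per 100 possessions; rows are weighted by possessions; the intercept plays the role of home court advantage; and a small ridge penalty (a common APM variant, used here so the solve stays stable) shrinks player ratings toward zero. `Stint`, `solve`, and `fit_apm` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical stint record: (home players, away players,
#                             home margin per 100 possessions, possessions)
Stint = Tuple[List[str], List[str], float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_apm(stints: List[Stint], ridge: float = 1.0) -> Tuple[float, Dict[str, float]]:
    """Return (home-court intercept, per-player ratings) from weighted ridge regression."""
    players = sorted({p for h, a, _, _ in stints for p in h + a})
    idx = {p: i + 1 for i, p in enumerate(players)}  # column 0 is the intercept
    k = len(players) + 1
    xtx = [[0.0] * k for _ in range(k)]  # accumulates X^T W X
    xty = [0.0] * k                      # accumulates X^T W y
    for home, away, margin, poss in stints:
        row = [0.0] * k
        row[0] = 1.0
        for p in home:
            row[idx[p]] = 1.0
        for p in away:
            row[idx[p]] = -1.0
        for i in range(k):
            if row[i] == 0.0:
                continue
            xty[i] += poss * row[i] * margin
            for j in range(k):
                xtx[i][j] += poss * row[i] * row[j]
    for i in range(1, k):  # penalize player ratings only, not the intercept
        xtx[i][i] += ridge
    beta = solve(xtx, xty)
    return beta[0], {p: beta[idx[p]] for p in players}
```

    A larger `ridge` pulls the ratings of rarely seen players harder toward zero; with `ridge=0` this reduces to ordinary weighted least squares, which can become unsolvable when lineups are highly collinear.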

    Anyway, there's a bit more information in a post I made on the APBRmetrics forum: http://sonicscentral.com/apbrmetrics...0b32b89a020a76

    I developed the algorithm for academic reasons, but I think it might be of interest to sports bettors too. This is mostly guesswork on my part, though, since I don't know exactly what quantitative tools you NBA sports bettors use to model possessions (for all I know, APM is completely useless for gambling).

    However, if +/- style tools are useful for NBA sports bettors, then I think the model and algorithm I've developed might be of interest.

    I'd love to hear any feedback you guys might have to offer. Thanks in advance.
  • Justin7
    SBR Hall of Famer
    • 07-31-06
    • 8577

    #2
    Have you done any backtesting? Such as tracking imaginary bets in the last year or two using your approach?
    • Blax0r
      SBR Wise Guy
      • 10-13-10
      • 688

      #3
      Originally posted by Justin7
      Have you done any backtesting? Such as tracking imaginary bets in the last year or two using your approach?
      His paper seems to focus on "player rankings", possibly geared toward NBA front offices rather than bettors. I guess the only way to "bet" this would be through unique props.

      I just finished the paper (although I only skimmed the details of the exact optimization methodology), and the model definitely attempts to predict game outcomes from player rankings; my fault for posting without reading.

      My initial comment is that you may need to retrain your model after the in-season trade deadline passes, but I imagine the impact may be minimal.

      Along with Justin7, I imagine others would definitely be interested in the results of a backtest. Great read!
      Last edited by Blax0r; 06-02-11, 08:53 PM.
      • IBT_Sports
        Restricted User
        • 05-31-11
        • 5

        #4
        From the pilot testing you've performed, do you have any results you can share with us?
        • Rhuidean
          SBR Rookie
          • 06-02-11
          • 6

          #5
          Yeah, the player ratings at least in theory should be useful for making bets too. Just as you know that home court advantage in the NBA is worth roughly 3-4 points, it would probably be helpful to know that LeBron is worth roughly 9 points for every 48 minutes he plays.
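          To make that concrete, here's a minimal sketch of how per-48-minute player ratings and a home-court edge could be combined into a predicted margin. Assumptions: ratings are points per 48 minutes relative to an average player, each player's contribution scales linearly with expected minutes, players without a rating count as average (0), and 3.5 points is just the midpoint of the 3-4 point range mentioned above. `predicted_margin` is a hypothetical helper, not part of the model discussed in this thread.

```python
from typing import Dict

# Assumed home-court edge in points: the midpoint of "roughly 3-4 points".
HOME_COURT = 3.5


def predicted_margin(home_minutes: Dict[str, float],
                     away_minutes: Dict[str, float],
                     ratings: Dict[str, float],
                     home_court: float = HOME_COURT) -> float:
    """Predicted home margin in points.

    Each player's rating (points per 48 minutes vs. an average player) is
    scaled by expected minutes / 48; players missing from `ratings` count as 0.
    """
    home = sum(ratings.get(p, 0.0) * m / 48.0 for p, m in home_minutes.items())
    away = sum(ratings.get(p, 0.0) * m / 48.0 for p, m in away_minutes.items())
    return home_court + home - away
```

          For example, a +9 player expected to log 36 minutes at home gives 3.5 + 9 × 36/48 = 10.25 points.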

          I'd love to do some sort of testing, even if it's just for the 2010-2011 season I considered.

          What would be a good way of going about this? I guess if I had a sportsbook database of the 1230 NBA regular season games, with the final line (before the game takes place) recorded, I could come up with a simple heuristic that makes a bet whenever it finds what it thinks is a mispriced line.
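          One minimal version of that heuristic, sketched under loudly stated assumptions: the line is quoted as the home team's point spread (negative when the home team is favored), so the book's implied home margin is `-home_spread`; a bet is placed only when the model's predicted margin differs from that by more than a hypothetical `threshold` of 3 points; and each bet is graded against the spread, with an exact landing counted as a push. `pick_side` and `grade` are illustrative names.

```python
from typing import Optional


def pick_side(model_margin: float, home_spread: float,
              threshold: float = 3.0) -> Optional[str]:
    """Return "home", "away", or None (no bet).

    Assumed convention: home_spread = -5 means the home team is a 5-point
    favorite, so the book's implied home margin is -home_spread.
    """
    edge = model_margin - (-home_spread)  # model margin minus the book's implied margin
    if edge > threshold:
        return "home"
    if edge < -threshold:
        return "away"
    return None


def grade(side: str, actual_margin: float, home_spread: float) -> str:
    """Grade a spread bet as "win", "loss", or "push" from the final home margin."""
    cover = actual_margin + home_spread  # > 0 means the home team covered
    if cover == 0:
        return "push"
    home_covered = cover > 0
    return "win" if (side == "home") == home_covered else "loss"
```

          The threshold trades volume for selectivity: raising it means fewer bets, each on a larger disagreement with the book.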

          Is this the sort of thing you guys have in mind as far as backtesting goes?

          I guess I could also just track the average absolute error the sportsbook makes (if we take the line for the game as its prediction) and the error my technique makes. If my technique has a smaller average absolute error, then I guess it does a better job of modelling the NBA than the sportsbooks do.
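          That comparison is just mean absolute error computed twice over the same games. A minimal sketch, assuming the line is quoted as the home team's spread (negative when the home team is favored), so the book's predicted home margin is the negated spread, and that margins are final home score minus away score. `mean_absolute_error` and `compare_to_book` are hypothetical helpers written out here, not library calls.

```python
from typing import Sequence, Tuple


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| over paired games."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def compare_to_book(home_spreads: Sequence[float],
                    model_margins: Sequence[float],
                    actual_margins: Sequence[float]) -> Tuple[float, float]:
    """Return (book MAE, model MAE); the book's predicted home margin is -spread."""
    book_margins = [-s for s in home_spreads]
    return (mean_absolute_error(book_margins, actual_margins),
            mean_absolute_error(model_margins, actual_margins))
```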

          If so, does anyone actually know of a site that would have information like this I can download? Basically a list of the 1230 games in the NBA season along with what the line was before the game?
          • TomG
            SBR Wise Guy
            • 10-29-07
            • 500

            #6
            You will need to either write your own data scraper, or enter an agreement with someone to provide it for you. You may have a lot of success working with groups where they feed you data, and you share plays if you generate a successful model (and they bear the risk of giving you data that you are unable to develop into a useful model).
            • Rhuidean
              SBR Rookie
              • 06-02-11
              • 6

              #7
              Googling around a bit, I found some historical line data here:

              covers.com/pageLoader/pageLoader.aspx?page=/data/nba/teams/pastresults/2010-2011/team404169.html

              I guess it will take a bit of code to extract the final lines, but I haven't seen anything cleaner than this (for example, a nice, processed CSV file somewhere with the final lines for the 1230 games of this past season).

              Once I extract these lines, I'll see how my technique compares.
              Last edited by Rhuidean; 06-03-11, 12:26 AM.
              • EasyHustlin
                SBR Wise Guy
                • 07-15-10
                • 633

                #8
                Would be more than happy to share the 1230 lines you're looking for.
                Last edited by EasyHustlin; 06-03-11, 10:45 AM.
                • Rhuidean
                  SBR Rookie
                  • 06-02-11
                  • 6

                  #9
                  If you have the 1230 lines in a CSV of some sort, I'd love to see it; it'd save me a bit of time. I guess your CSV looks something like:

                  Game Date | Home Team | Away Team | Lines

                  If so, it won't take me more than a few minutes to process it a bit further and see how well this technique is doing.
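                  Assuming the file really does follow that layout, loading it takes only a few lines with Python's standard `csv` module. This is a sketch: it assumes a header row with exactly those column names, `|` as the delimiter, ISO dates such as 2010-10-26, and a `Lines` column holding the home team's spread; `GameLine` and `load_lines` are hypothetical names, and the sample rows are placeholders, not real lines. Sorting by date matters because a backtest trains on earlier games and bets on later ones.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, TextIO


@dataclass
class GameLine:
    game_date: date
    home: str
    away: str
    home_spread: float  # assumed: negative when the home team is favored


def load_lines(f: TextIO, delimiter: str = "|") -> List[GameLine]:
    """Parse 'Game Date | Home Team | Away Team | Lines' rows, sorted by date."""
    reader = csv.reader(f, delimiter=delimiter)
    header = [h.strip() for h in next(reader)]
    games = []
    for raw in reader:
        if not raw:  # skip blank lines
            continue
        row = dict(zip(header, (v.strip() for v in raw)))
        games.append(GameLine(
            game_date=date.fromisoformat(row["Game Date"]),
            home=row["Home Team"],
            away=row["Away Team"],
            home_spread=float(row["Lines"]),
        ))
    games.sort(key=lambda g: g.game_date)  # chronological order for train/test splits
    return games


if __name__ == "__main__":
    # Hypothetical two-row file; team names and numbers are placeholders.
    sample = ("Game Date | Home Team | Away Team | Lines\n"
              "2010-10-27 | Home2 | Away2 | -7\n"
              "2010-10-26 | Home1 | Away1 | 2.5\n")
    for g in load_lines(io.StringIO(sample)):
        print(g)
```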

                  I'll send you my email address in a private message.
                  • Rhuidean
                    SBR Rookie
                    • 06-02-11
                    • 6

                    #10
                    I did some backtesting on the 2010-2011 season. See this long post here:



                    Basically, there are two main cases I considered:
                    1) Train the algorithm on the first 820 games; use it as a gambling rule for the last 410 games of the NBA season
                    2) Train the algorithm on the first 205 games; use it as a gambling rule for the last 1025 games
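                    Both cases are chronological splits of the same 1230-game season (820 + 410 and 205 + 1025): everything before the cutoff is training data and everything after it is the evaluation set, so the model never trains on a game it later bets on. A tiny sketch, where `chronological_split` is a hypothetical helper and the games are assumed to be sorted by date already:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(games: Sequence[T], n_train: int) -> Tuple[List[T], List[T]]:
    """First n_train games (already in date order) for training, the rest for betting."""
    if not 0 < n_train < len(games):
        raise ValueError("n_train must leave at least one game on each side")
    return list(games[:n_train]), list(games[n_train:])


if __name__ == "__main__":
    season = list(range(1230))  # stand-ins for the 1230 games, in date order
    for n_train in (820, 205):  # Case 1 and Case 2
        train, test = chronological_split(season, n_train)
        print(n_train, len(train), len(test))
```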

                    It looks like I do pretty well in Case #1, but poorly in Case #2. At first I wasn't sure if we could speak confidently about Case #1 since we only evaluate on 410 games, but I did a one-sided binomial test that I think assures us that we are doing better than coin flipping.
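                    For reference, a one-sided binomial test of this kind can be computed exactly with the standard library: under the null hypothesis each bet wins independently with probability p, and the p-value is the probability of at least that many wins in n bets. This is a sketch, not the exact computation from the linked post; it assumes pushes have already been dropped from n, and `binomial_p_value` is a hypothetical name.

```python
from math import comb


def binomial_p_value(wins: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= wins) for X ~ Binomial(n, p)."""
    if not 0 <= wins <= n:
        raise ValueError("need 0 <= wins <= n")
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(wins, n + 1))
```

                    With p = 0.5 this asks whether the picks beat coin flipping. At standard -110 pricing the break-even win rate is 110/210 ≈ 0.524, so passing that value as p asks the stricter question of whether the win rate beats break-even.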

                    Regarding Case #2, I suspect that if I incorporate games from the 2009-2010 season, performance will improve a lot.

                    Comments/feedback welcome.
                    Last edited by Rhuidean; 06-07-11, 04:44 PM. Reason: 810->820, 420->410
                    • demens
                      SBR MVP
                      • 10-22-10
                      • 2785

                      #11
                      I think it's a bit questionable that you did not get good performance over the bigger sample. I don't think the smaller sample of 400 games is large enough, and sometimes you have to take into account different parts of the season, when games are meaningless for certain teams. Games at the end of the season would account for a decent-sized chunk of those 400.

                      I haven't taken the time to read the detailed post yet; I'll do so later, but I just wanted to share some thoughts. Good luck.
                      • Rhuidean
                        SBR Rookie
                        • 06-02-11
                        • 6

                        #12
                        It kind of makes sense. Remember, when I evaluate on the larger set of games (1025 datapoints), I'm only training the algorithm on 205 games. 205 games might simply be too few to get anything useful out of my technique. But if I figure out a good way of incorporating data from, say, the 2009-2010 season, then I can boost the size of the training set quite a bit.

                        Regarding your second comment, I'm not sure what you mean. Even if the NBA teams and players think that the games are meaningless, they still have to play them, right? It isn't as if those 410 games are easier to handicap than any other chunk of games. So I'm not sure it makes sense to quibble with the choice of that block of games. You can quibble with the size of this evaluation set, though.
                        • uva3021
                          SBR Wise Guy
                          • 03-01-07
                          • 537

                          #13
                          The sportsbooks take into consideration the situation behind each scheduled game, so I agree no quibbling is necessary.