Statistically Modelling the NBA

**Justin7** · 06-02-11, 01:26 PM

Have you done any backtesting? Such as tracking imaginary bets in the last year or two using your approach?

**Blax0r** · 06-02-11, 04:00 PM

> Originally posted by Justin7
>
> Have you done any backtesting? Such as tracking imaginary bets in the last year or two using your approach?

~~His paper seems to be focused on "player rankings"; possibly geared towards NBA front offices rather than bettors. I guess the only way to "bet" this would be through unique props.~~

I just finished the paper (although I only skimmed the details of the exact optimization methodology), and the model definitely makes an attempt to predict game outcomes from player rankings; my fault for posting w/o reading.

My initial comment is that you may need to retrain your model after the in-season trade deadline passes, but I imagine the impact may be minimal.

Along with Justin7, I imagine others would definitely be interested in the results of a backtest. Great read!

**IBT_Sports** · 06-02-11, 04:26 PM

From the pilot testing you've performed, are you getting any results you can share with us?

**Rhuidean** · 06-02-11, 10:34 PM

Yeah, the player ratings at least in theory should be useful for making bets too. Just as you know that home court advantage in the NBA is worth roughly 3-4 points, it would probably be helpful to know that LeBron is worth roughly 9 points for every 48 minutes he plays.
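As a back-of-the-envelope illustration of reading a rating like that, here is a tiny sketch. It assumes a player's value scales linearly with minutes played, which is my simplification for the example and not necessarily how the paper treats it; the function name is made up.

```python
def expected_points(rating_per_48: float, minutes: float) -> float:
    """Scale a per-48-minute rating to the minutes actually played.

    Assumes value scales linearly with minutes (an illustrative assumption).
    """
    return rating_per_48 * minutes / 48.0


# A player rated at 9 points per 48 minutes who plays 36 minutes:
print(expected_points(9.0, 36.0))  # 6.75
```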

I'd love to do some sort of testing, even if it's just for the 2010-2011 season, which is the one I consider.

What would be a good way of going about this? I guess if I had a sportsbook database of the 1230 NBA regular season games, with the final line (before the actual game takes place) recorded, I could come up with a simple heuristic that makes a bet whenever it finds what it thinks is a mispriced line.

Is this the sort of thing you guys have in mind as far as backtesting goes?
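To make the idea concrete, a heuristic of that kind could be sketched roughly as below. Everything here is an assumption for illustration: the function names, the 3-point threshold, and the convention that every margin is home score minus away score (so a closing spread of -5 on the home team corresponds to a market margin of +5).

```python
from typing import Optional


def pick_side(model_margin: float, market_margin: float,
              threshold: float = 3.0) -> Optional[str]:
    """Return 'home', 'away', or None (no bet).

    Margins are predicted home score minus away score. The threshold
    is a hypothetical tuning knob, not a recommended value.
    """
    edge = model_margin - market_margin
    if edge >= threshold:
        return "home"
    if edge <= -threshold:
        return "away"
    return None


def grade_bet(side: str, actual_margin: float, market_margin: float) -> str:
    """Grade a spread bet as 'win', 'loss', or 'push'."""
    if actual_margin == market_margin:
        return "push"
    home_covers = actual_margin > market_margin
    return "win" if (side == "home") == home_covers else "loss"
```

A backtest would then loop over the held-out games, call `pick_side` with the model's prediction and the closing line, and tally the results of `grade_bet` for the games where a side was picked.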

I guess I could also just track the average error the sports book makes (if we take the line for the game as its prediction) and the error my technique makes. If my technique has a smaller average absolute error, then I guess it does a better job of modelling the NBA than the sportsbooks do.
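That comparison is simple to compute once both sets of predictions sit next to the actual results. A minimal sketch, assuming each prediction is expressed as a home-minus-away margin; the numbers below are made up purely to show the call, not real lines or results:

```python
def mean_abs_error(predicted, actual):
    """Average absolute difference between predicted and actual margins."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up margins for four games (home score minus away score).
actual = [5, -3, 12, 1]
book_line = [4.5, -1.0, 8.0, 2.5]    # margin implied by the closing line
model_pred = [6.0, -4.0, 10.0, 0.0]  # margin predicted by the model

print(mean_abs_error(book_line, actual))   # 2.0
print(mean_abs_error(model_pred, actual))  # 1.25
```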

If so, does anyone actually know of a site that would have information like this I can download? Basically a list of the 1230 games in the NBA season along with what the line was before the game?

**TomG** · 06-02-11, 11:43 PM

You will need to either write your own data scraper, or enter an agreement with someone to provide it for you. You may have a lot of success working with groups where they feed you data, and you share plays if you generate a successful model (and they bear the risk of giving you data that you are unable to develop into a useful model).

**Rhuidean** · 06-03-11, 12:25 AM

Googling around a bit, I found some historical lines stuff here:

covers.com/pageLoader/pageLoader.aspx?page=/data/nba/teams/pastresults/2010-2011/team404169.html

It will take a bit of code to extract the final lines, but I haven't really seen anything cleaner than this (for example, a nice, processed CSV file somewhere with the final lines for the 1230 games of this past season).

Once I extract these lines, I'll see how my technique compares.

**EasyHustlin** · 06-03-11, 12:27 AM

Would be more than happy to share the 1230 lines you're looking for.

**Rhuidean** · 06-03-11, 03:53 AM

If you have the 1230 lines in a CSV of some sort, I'd love to see it; it'd save me a bit of time. I guess your CSV looks something like:

Game Date | Home Team | Away Team | Lines

If so, then it won't take me more than a few minutes to process it a bit further and see how well this technique is doing.

I'll send you my email address in a private message.

**Rhuidean** · 06-07-11, 03:56 PM

I did some backtesting on the 2010-2011 season. See this long post here:

http://sonicscentral.com/apbrmetrics/viewtopic.php?p=1341&sid=ca954a2b968f09a8a5b097fad8bc5e66#p1341

Basically, there are two main cases I considered:
1) Train algorithm on 820 games, use as a gambling rule for last 410 games of NBA season
2) Train algorithm on first 205 games, use as a gambling rule for last 1025
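The two splits can be sketched as a plain chronological cut, so the model never sees a game that was played after the ones it is evaluated on. The record layout (a dict with a `date` key) and the function name are assumptions for the example:

```python
from datetime import date


def chronological_split(games, n_train):
    """Sort games by date, then split into (train, test) at n_train."""
    ordered = sorted(games, key=lambda g: g["date"])
    return ordered[:n_train], ordered[n_train:]


# Tiny made-up schedule, deliberately out of order.
games = [
    {"date": date(2011, 1, 5), "home": "A", "away": "B"},
    {"date": date(2010, 10, 26), "home": "C", "away": "D"},
    {"date": date(2010, 12, 1), "home": "E", "away": "F"},
]
train, test = chronological_split(games, n_train=2)
# For the full season: n_train=820 gives Case 1, n_train=205 gives Case 2.
```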

It looks like I do pretty well in Case #1, but poorly in Case #2. At first I wasn't sure if we could speak confidently about Case #1 since we only evaluate on 410 games, but I did a one-sided binomial test that I think assures us that we are doing better than coin flipping.
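For anyone who wants to run the same kind of check, a one-sided binomial test boils down to an upper-tail probability. This is a generic sketch using only the standard library, not the exact script behind the linked post; the win count in the usage line is a placeholder, not my actual result.

```python
from math import comb


def binomial_tail(wins: int, n: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(n, p), i.e. the one-sided p-value.

    p is the win rate under the null hypothesis (0.5 = coin flipping).
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(wins, n + 1))


# Placeholder count: 230 winning bets out of 410 (hypothetical).
p_value = binomial_tail(230, 410)
```

Note that beating a coin flip is a lower bar than turning a profit: assuming standard -110 pricing, the break-even win rate is 11/21, about 52.4%, so it may be worth running the test with `p=11/21` as well.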

Regarding Case #2, I suspect that if I incorporate games from the 2009-2010 season, performance will improve a lot.

Comments/feedback welcome.

**demens** · 06-07-11, 09:19 PM

I think it's a bit questionable that you did not get good performance over the bigger sample size. I don't think the smaller sample of roughly 400 games is large enough. Sometimes you have to take into account the different parts of the season, when games are meaningless for certain teams. Those games at the end of the season would account for a decent-sized chunk of that 400.

I haven't taken the time to read the detailed post yet; I will do so later. I just wanted to share some thoughts. Good luck.

**Rhuidean** · 06-08-11, 12:37 AM

It kind of makes sense. Remember, when I evaluate on the larger set of games (1025 datapoints), this means I'm only training the algorithms on 205 points. 205 points simply might be too few to get anything useful out of my technique. But if I figure out a good way of incorporating data from, say, the 2009-2010 season, then I can boost the size of the training set quite a bit.

Regarding your second comment, I'm not sure what you mean. Even if the NBA teams and players think that the games are meaningless, they still have to play them, right? It isn't like those 410 games are easier to handicap than any other chunk of games. So I'm not sure that it makes sense to quibble with the particular block of games chosen. You can quibble with the size of this evaluation set, though.

**uva3021** · 06-09-11, 12:30 AM

The sportsbooks take into consideration the situation behind each scheduled game, so I agree no quibbling is necessary.