1. #1
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    Open Source model building

    Typically, when a bettor builds a model for a sport, the results are used only to his or her advantage, or the results might be released while keeping the mechanics hidden.

    I spent the fall and a large chunk of the winter trying to build a successful NBA model that uses efficiencies and possessions to run random simulations. I did limited backtesting, which basically showed the effort was fruitless, so I shut the project down.
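For readers unfamiliar with this kind of model, here is a minimal sketch of an efficiency-and-possessions simulation. It is not the spreadsheet described in this post: the function name, the normal distributions, and the `pace_sd` and `score_sd` values are all assumptions chosen for illustration, and a real model would also adjust each team's efficiency for the opponent's defense.

```python
import random

def simulate_game(off_eff_a, off_eff_b, pace_mean,
                  pace_sd=4.0, score_sd=10.0, n_sims=10000, seed=None):
    """Monte Carlo sketch of one matchup.

    off_eff_a, off_eff_b: points per 100 possessions for each team.
    pace_mean: expected possessions per team in the game.
    pace_sd, score_sd: assumed spreads (illustrative values only).
    Returns (share of simulations won by team A, average combined score).
    """
    rng = random.Random(seed)
    a_wins = 0
    total_points = 0.0
    for _ in range(n_sims):
        # Both teams get the same number of possessions in a given game.
        possessions = rng.gauss(pace_mean, pace_sd)
        score_a = possessions * off_eff_a / 100 + rng.gauss(0, score_sd)
        score_b = possessions * off_eff_b / 100 + rng.gauss(0, score_sd)
        if score_a > score_b:
            a_wins += 1
        total_points += score_a + score_b
    return a_wins / n_sims, total_points / n_sims

# Hypothetical inputs: 110 vs. 105 points per 100 possessions, 95 possessions.
win_share_a, avg_total = simulate_game(110, 105, 95, seed=1)
```

The win share maps to a side or moneyline estimate and the average combined score maps to a total, which is why possessions matter for totals in a way that a pure margin rating does not.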

    Now that I've hit a wall, I'd like to take things a step further - using the collective intellect on this forum, I think we can try to build a model as a group and possibly achieve more together than any of us would working separately.

    I am a recreational gambler in the lightest of senses, and I enjoy working in Excel as well as exploring new ways to look at problems - this was more of a challenge for me than a way to try to make money, and I'm looking to keep that challenge alive.

    I've attached a sample of the data collected so far that I can provide for this project - all of it comes from basketball-reference.com, so any year they have on file is usable in terms of the data set.

    I'd like to hear some initial feedback on this - if people would prefer to keep their ideas to themselves, I certainly respect that, since some are trying to make a living off their applications. If there is interest, we can proceed further.

    Good luck!

  2. #2
    Justin7
    Join Date: 07-31-06
    Posts: 8,577
    Betpoints: 1506

    I took a quick look at your file. I don't see any individual player statistics. In the NBA, failing to adjust team efficiencies for roster changes (due to trades and injuries) will introduce more noise than you can overcome.

  3. #3
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    There are two routes we can take at this point - I can easily take my file and pull individual stats as opposed to team totals; or I can just run with adjusted +/- instead.

    I've thought about using adjusted +/- in the past, but I couldn't think of a way to incorporate it to predict totals - presumably, if you took

    (player minutes)*(adj +/-) / (team minutes)

    and summed for all players, you would get a team +/- [assuming your estimate of the individual minutes is correct]. This would be good for spreads, but I'm not sure I could think of a way to bake in possessions to determine totals.
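The minutes-weighted sum above can be sketched in a few lines. The function name and the roster numbers are hypothetical, and what counts as "team minutes" is left as a parameter: dividing by 240 total player-minutes gives the average rating of a player on the floor, while dividing by 48 game minutes gives a figure five times larger, so the choice has to match how the adjusted +/- values were scaled.

```python
from typing import Iterable, Tuple

def team_plus_minus(players: Iterable[Tuple[float, float]],
                    team_minutes: float) -> float:
    """Sum (player minutes * adj +/-) over the roster, divided by team minutes.

    players: (projected_minutes, adjusted_plus_minus) pairs.
    team_minutes: the denominator from the formula in the post.
    """
    if team_minutes <= 0:
        raise ValueError("team_minutes must be positive")
    return sum(minutes * apm for minutes, apm in players) / team_minutes

# Hypothetical roster; the last entry lumps the bench into one line.
roster = [(36, 4.0), (34, 2.5), (30, 0.0), (28, -1.0), (24, 1.5), (88, -2.0)]
rating = team_plus_minus(roster, team_minutes=240)
```

The result is only as good as the minutes projection, which is the caveat raised in brackets above.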

    The other thing is that the published +/- stats cover the whole year, whereas I think a proper model should weight recent performance more heavily.
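One common way to weight recent performance is an exponentially decaying average over per-game values. This is only an illustration of the idea, not a method proposed in this thread; the function name and the `decay` value are assumptions, and it requires game-by-game data rather than season totals.

```python
def recency_weighted_average(values, decay=0.9):
    """Weighted mean of per-game values ordered oldest to newest.

    The newest game gets weight 1 and each step back in time multiplies
    the weight by `decay`, so decay=1.0 reduces to a plain average.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical offensive efficiencies over a team's last five games.
recent_form = recency_weighted_average([104.0, 98.5, 110.2, 107.3, 112.8])
```

A smaller `decay` reacts faster to hot and cold streaks but is noisier, so the value would need to be tuned against past results.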

    I'll set up the file to pull individual stats and we can run from there.

  4. #4
    Justin7
    Join Date: 07-31-06
    Posts: 8,577
    Betpoints: 1506

    You need to read "Basketball on Paper" by Dean Oliver. He explains exactly how to do this.

  5. #5
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    I am familiar with "Basketball on Paper," but I didn't realize he got into adjusted +/-; I'll have to dig deeper. I have read Mathletics and originally hoped to use Winston's methods for calculating adjusted +/-, but I didn't feel like shelling out the money for the advanced solver package that can support more variables (all NBA players).

    I've attached a sample of individual player stats I've gathered. I can do a full run if anyone's interested.

  6. #6
    roasthawg
    Join Date: 11-09-07
    Posts: 2,990

    Quote: Originally posted by Justin7
    I took a quick look at your file. I don't see any individual player statistics. In NBA, failing to adjust team efficiencies for roster changes (due to trades and injuries) will introduce more noise than you can overcome.
    I agree with this... the NBA's a hard one, one of the toughest sports to beat for me.

  7. #7
    Wrecktangle
    Join Date: 03-01-09
    Posts: 1,524
    Betpoints: 3209

    I opened the players file expecting to see player stats for each game (which is how I tend to keep them). Having the stats over the same number of games can also work, but a varying number of games is less useful for me; a "standard candle" for each player is what I'm looking for. I can normalize, of course, but that tends to introduce more variance.

    In any case, Dish98, do you scrape every day? I'm having trouble keeping my scraper working because my source changes frequently (7 times in the last 3-4 weeks, 12 times since the start of the year). If so, any interest in just sharing data each day?

  8. #8
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    roasthawg - I agree that the NBA is a very hard sport to beat. When I first started, I thought it would be much simpler, based on the idea that the general betting public isn't aware of the advanced statistics that are becoming available for basketball (adj +/-, efficiencies, etc.) and still relies on PPG to measure how valuable a player is. However, I found that even the advanced statistics weren't doing a great job of predicting (for me at least). I was hoping to draw on the group's collective thinking to possibly make this work.

    Wrecktangle - This was just a sample of the scrape to show what data I plan on collecting and the layout. 2009 YTD is currently running and I'd be happy to post it. I can also modify the scrape to pull prior years - how many years would be appropriate? 5? 10?

    Basketball-reference has only changed their format once (slightly) this year, which is nice. I can also post the scraping utility when I'm done if anyone can recommend an outside site for uploading files (I don't think SBR supports .xlsm files from Excel 2007). Right now, it's set up to pull the full year every run, but I can apply some old code to have it simply fill in data for dates that haven't been pulled.
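The "fill in only the dates that haven't been pulled" step can be sketched as follows. This is a Python illustration rather than the Excel macro itself, and the function name and the dates are hypothetical; it simply compares the dates already stored against the calendar and returns the gaps.

```python
from datetime import date, timedelta

def dates_to_pull(season_start, today, already_pulled):
    """Return every date from season_start up to (not including) today
    that is missing from already_pulled, in ascending order."""
    pulled = set(already_pulled)
    missing = []
    day = season_start
    while day < today:
        if day not in pulled:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Hypothetical state: two of the last four days are already stored.
stored = [date(2010, 2, 22), date(2010, 2, 24)]
todo = dates_to_pull(date(2010, 2, 22), date(2010, 2, 26), stored)
```

Excluding the current day avoids storing partial results for games still in progress; days with no games scheduled would just return nothing when scraped.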

  9. #9
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    Data size for 2009 (up to 2.25.10) is > 2 megs - anyone have a good upload site?

  10. #10
    djiddish98
    Join Date: 11-13-09
    Posts: 345
    Betpoints: 237

    It seems like there isn't much interest in a common model. However, I'd like to donate my Excel spreadsheet to the forum as a sign of gratitude for all I've learned here.

    Simply run the updatetotal macro (feel free to inspect it before running - there's nothing locked on this sheet), and the data will be collected. The code is sloppy because I only took a few CS classes and never cared to organize it. It does get the job done, though.

    If anyone has any questions, feel free to ask. If you do use this sheet, it would be nice if you could fill me in on your intentions with the data - either publicly or via PM. I'd love to hear some original thoughts.

  11. #11
    Flying Dutchman
    Floggings continue until morale improves
    Join Date: 05-17-09
    Posts: 2,467
    Betpoints: 759

    djiddish, I think you might find interest, but not open source. Why would I share my code which, if it works, can erode the line, especially if a pack of bone-heads, not unlike those who frequent these boards, use it? If we could somehow just keep it to the enlightened Tank members, it perhaps might be worthy, but when those degenerates in Players Talk invade our turf and abscond with MY code, I'd get cranky.

    ...I'm not here to feed those unwashed masses.
