1. #1
    frongi
    Join Date: 02-05-13
    Posts: 114
    Betpoints: 174

    Backtesting/Modeling Question

    First off, I'm pretty sure this makes sense, but I just want to be sure: is using data from odd years to generate a model and data from even years to backtest it a legitimate way to test my model's accuracy (i.e., not data mining, I think it's called)?

    Second, when modeling and backtesting, should I be using a team's full-season statistics or its statistics at the time of the game? I would imagine that using each team's stats as of each game would be very difficult to pull off.

    Apologies if this is a rookie question; just a youngin over here trying to learn something.

  2. #2
    EXhoosier10
    Join Date: 07-06-09
    Posts: 3,122
    Betpoints: 4390

    Quote, originally posted by frongi:
    First off, I'm pretty sure this makes sense, but I just want to be sure: is using data from odd years to generate a model and data from even years to backtest it a legitimate way to test my model's accuracy (i.e., not data mining, I think it's called)?

    I suppose there's a chance that the sport you're looking at has cycles that swing back and forth from year to year, but I can't imagine what those would be. I'd just assign a random number to each season and split the seasons in half based on those numbers, letting a random number generator do the picking for you.
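    A minimal sketch of that random split, assuming seasons are identified by their starting year (the 2003-2012 range is hypothetical). Shuffling the list is equivalent to assigning each season a random number and sorting by it; passing a seed makes the split reproducible.

```python
import random

def split_seasons(seasons, seed=None):
    """Randomly send half of the seasons to model-building and the rest to testing."""
    rng = random.Random(seed)      # seeded generator so the split can be reproduced
    shuffled = list(seasons)
    rng.shuffle(shuffled)          # same effect as random keys + sort
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

seasons = list(range(2003, 2013))  # hypothetical seasons
model_seasons, test_seasons = split_seasons(seasons, seed=42)
print("model:", model_seasons)
print("test: ", test_seasons)
```

    Every season lands in exactly one half, so no game used to fit the model is also used to test it.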

    Quote, originally posted by frongi:
    Second, when modeling and backtesting, should I be using a team's full-season statistics or its statistics at the time of the game? I would imagine that using each team's stats as of each game would be very difficult to pull off.

    If you're using final season statistics, you're more likely to be testing games against true talent levels. Values closer to true talent level will likely be more profitable than team data from only one or two months. In the NBA, for example, one month of 2012-2013 data is far less reliable than a full season of 2011-2012 data (ignoring the lockout factor). So betting in December 2012 on one month's worth of data isn't necessarily going to be very stable. That's where the bettor needs to understand the sport, know which data is most likely to reflect true talent level, and know how to regress that data so it becomes more reliable.
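    One common way to "regress" a small sample is to shrink it toward the league mean, weighting the observed value by games played and the league mean by a fixed number of pseudo-games k. This is a sketch under assumptions: the formula is a standard shrinkage estimate, not necessarily what the poster uses, and k = 20 and the scoring numbers are hypothetical.

```python
def regress_to_mean(observed, n_games, league_mean, k):
    """Shrink an observed per-game stat toward the league mean.

    k behaves like k extra games of league-average performance added to the
    sample: the larger n_games is relative to k, the more the observed value
    is trusted.
    """
    return (n_games * observed + k * league_mean) / (n_games + k)

# Hypothetical team scoring 108.0 per game against a 100.0 league mean, k = 20.
print(round(regress_to_mean(108.0, 15, 100.0, 20), 2))  # one month (~15 games) -> 103.43
print(round(regress_to_mean(108.0, 82, 100.0, 20), 2))  # full 82-game season -> 106.43
```

    The one-month estimate is pulled much harder toward average than the full-season one, which is the stability point made above.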

    Overall message: using true talent to test (and bet) is much better than using in-season data. But when the time comes to start making bets and all you have is in-season data, you'd better be able to identify which of the available data points are most likely to hold steady long term, or you're going to be stuck making uninformed bets.
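    If you do want each team's stats as of game time, a running average that only looks at earlier games is one way to get them, and it is less work than it sounds. A minimal sketch, assuming a hypothetical list of (date, team, points) rows, one per team per game, with ISO-format date strings so they sort chronologically:

```python
from collections import defaultdict

def pregame_averages(games):
    """For each team-game (processed in date order), return the team's average
    points in its *earlier* games only, so nothing from that game or later
    games leaks into the inputs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    rows = []
    for date, team, points in sorted(games):
        n = counts[team]
        avg = totals[team] / n if n else None  # None until the team has a prior game
        rows.append((date, team, avg))
        # Update running totals only after recording the pre-game value.
        totals[team] += points
        counts[team] += 1
    return rows

games = [  # hypothetical data
    ("2012-11-01", "BOS", 95),
    ("2012-11-01", "MIA", 120),
    ("2012-11-03", "BOS", 105),
    ("2012-11-04", "MIA", 100),
    ("2012-11-05", "BOS", 110),
]
for row in pregame_averages(games):
    print(row)
```

    Here BOS enters its 2012-11-05 game with an average of 100.0 (from 95 and 105), even though it scores 110 in that game.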

  3. #3
    frongi

    Just as an update, I got myself a pretty sweet database (in one day!) without specific outside help (just googled stuff and searched this forum). I think I'm gonna alternate games (sorted by date) between my modeling and testing halves.
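    A minimal sketch of that alternating split, assuming each game is a dict with an ISO-format "date" string (the field names and games are hypothetical). After sorting by date, even-positioned games go to the modeling half and odd-positioned games to the testing half.

```python
def alternate_split(games):
    """Sort games by date, then alternate them between the two halves."""
    ordered = sorted(games, key=lambda g: g["date"])  # stable sort keeps same-day order
    model_half = ordered[0::2]   # 1st, 3rd, 5th, ... game
    test_half = ordered[1::2]    # 2nd, 4th, 6th, ... game
    return model_half, test_half

games = [  # hypothetical data
    {"date": "2012-11-03", "home": "BOS", "away": "MIA"},
    {"date": "2012-11-01", "home": "LAL", "away": "DAL"},
    {"date": "2012-11-02", "home": "NYK", "away": "CHI"},
    {"date": "2012-11-04", "home": "DEN", "away": "UTA"},
]
model_half, test_half = alternate_split(games)
print([g["date"] for g in model_half])  # ['2012-11-01', '2012-11-03']
print([g["date"] for g in test_half])   # ['2012-11-02', '2012-11-04']
```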

  4. #4
    EXhoosier10

    Quote, originally posted by frongi:
    Just as an update, I got myself a pretty sweet database (in one day!) without specific outside help (just googled stuff and searched this forum). I think I'm gonna alternate games (sorted by date) between my modeling and testing halves.

    Congratulations on being self-reliant. You are in the top 5% of posters on this site.

  5. #5
    frongi

    Quote, originally posted by EXhoosier10:
    Congratulations on being self-reliant. You are in the top 5% of posters on this site.

    Ha, I may have spoken too soon... I think the lines from sportsdatabase are off. Might need to learn to scrape.

  6. #6
    Miz
    Join Date: 08-30-09
    Posts: 695
    Betpoints: 3162

    Cross-validation using odd/even splits doesn't account for market changes. Thank me later.
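    One common alternative, offered here as an assumption rather than as Miz's stated method, is walk-forward validation: fit only on seasons that come before the season being tested, so each test mimics betting into a market the model has never seen. A minimal sketch, with hypothetical seasons and a hypothetical `min_train` parameter for the smallest training window:

```python
def walk_forward_splits(seasons, min_train=3):
    """Yield (train_seasons, test_season) pairs in which the test season comes
    strictly after every season used to build the model."""
    ordered = sorted(seasons)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]

for train, test in walk_forward_splits(range(2005, 2011), min_train=3):
    print(f"fit on {train[0]}-{train[-1]}, test on {test}")
```

    This prints three folds (2005-2007 tested on 2008, then 2009, then 2010). Dropping the oldest seasons from each training window is a simple variant if older markets are thought to be unrepresentative.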
