Hi all. I have developed a model for college football that works fairly well when back-tested against 2014 data and results. I pick games where my capped line/total differs from the book's line by at least x points. When back-testing on the 2014-2015 season, the model flags around 300 of the 869 games, and those picks hit at about a 63% rate. I'm going to collect a couple more years of data for further back-testing, but I want to make sure I'm doing this properly.
My concern is that I'm using end-of-season stats to cap all games. Is that, essentially, "cheating" because I'm using data from games to cap those very same games? I'm assuming this is one of the inherent hurdles with back-testing. Back-testing on a weekly basis is going to be much more involved, as I'll need to scrape box scores for individual game stats instead of using season stats.
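For what it's worth, the weekly approach you describe can be sketched pretty simply: for each game, only feed the model stats accumulated from games played *before* that week. Here's a minimal Python sketch of that idea; the game records, team name, and the single "points scored" stat are all made up for illustration:

```python
# Sketch of point-in-time stats to avoid using a game's own result to cap it.
# Data below is hypothetical (week, team, points_scored) box-score rows.
games = [
    (1, "Alabama", 33),
    (2, "Alabama", 41),
    (3, "Alabama", 21),
]

def stats_through_week(games, team, week):
    """Average points using only games played BEFORE `week`.

    Returns None if the team has no prior games (e.g. week 1),
    which a real model would need to handle with preseason priors.
    """
    prior = [pts for (w, t, pts) in games if t == team and w < week]
    return sum(prior) / len(prior) if prior else None

# Cap the week-3 game using only weeks 1-2:
print(stats_through_week(games, "Alabama", 3))  # 37.0
```

The same filter-by-week idea extends to whatever stats your model actually caps on; the point is just that each game's inputs are frozen as of the week before kickoff.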
So, ultimately, my question is: is there a standard percentage I should knock off my results, given that I'm using the entire season's stats to cap a game whose stats are included in those calculations, or is the advantage minuscule enough to simply disregard?