Is it appropriate to backtest a statistical model using end-of-year stats if the games being tested are from the back end of the season? What last percentage of games would it be OK for? The last 20%, 10%, 5%?
I think that at a certain point the end of year stats are going to be very close to the stats at the time, so using end of year stats won't make a significant difference. I just don't know where that cutoff is.
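One rough way to look for that cutoff in your own data is to measure how far the stat known before each late-season game sits from the end-of-year figure. This is only a sketch: the function name `max_gap_in_tail` and the sample numbers are made up for illustration, and it assumes a simple per-game average of a single stat.

```python
def max_gap_in_tail(values, tail_fraction):
    """Largest absolute gap between the pre-game running average and the
    end-of-season average, over the last `tail_fraction` of games.

    `values` is a hypothetical list of one team's per-game stat, in
    chronological order.
    """
    n = len(values)
    final_avg = sum(values) / n
    # Index of the first game in the tail; never 0, because game 0 has no history.
    first_tail_game = max(1, n - int(round(n * tail_fraction)))
    gaps = []
    for i in range(first_tail_game, n):
        pregame_avg = sum(values[:i]) / i  # only games played before game i
        gaps.append(abs(pregame_avg - final_avg))
    return max(gaps) if gaps else 0.0


# Made-up 10-game season, for illustration only.
season = [90, 110, 100, 120, 80, 100, 105, 95, 110, 90]
for fraction in (0.5, 0.2):
    print(fraction, round(max_gap_in_tail(season, fraction), 2))
```

Even when the gap comes out small, the end-of-year number still contains the result of the very game being tested, so a small gap alone doesn't show that the backtest is clean.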
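It isn't a good idea to use end-of-season data to forecast games within the season. Those stats already include the results of the games you are trying to predict, so your predictive power, or whatever you are testing, will come out too high.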
There really isn't a hard cutoff where this becomes a bad idea. I'd try to avoid it. You might be able to find stats up to a certain point; with baseball, they have month-by-month information.
Short answer: no. You should use stats up through the day before the game. Could you still learn something by using year-end stats and testing the last 10%? Maybe so, but that kind of overlap is dangerous. Good luck.
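To make "stats up through the day before the game" concrete, here is a minimal sketch, assuming you have one team's points per game in chronological order. The function name `pregame_averages` and the numbers are hypothetical.

```python
def pregame_averages(points):
    """For each game, return the average of all earlier games only.

    `points` is a hypothetical list of one team's points per game, in
    chronological order. The first game has no history, so it gets None.
    """
    averages = []
    running_total = 0.0
    for games_played, scored in enumerate(points):
        # Record the average BEFORE adding the current game's result,
        # so the game being predicted never leaks into its own input.
        averages.append(running_total / games_played if games_played else None)
        running_total += scored
    return averages


season = [100, 110, 90, 120]  # made-up scores
print(pregame_averages(season))  # [None, 100.0, 105.0, 100.0]
```

For this made-up season the year-end average is 105.0, which none of the first three games could have known in advance.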
It also depends on the sport. I've done some modeling in various sports, and the one that worked the worst was the NBA. I was using in-season stats, and I found that I would have made money betting the opposite of what the stats were telling me to do. When I narrowed the range of most recent games used I found a bit more success but still not enough to be usable.
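For anyone who wants to try the same narrowing, here is a rough sketch of a "last N games" stat that still only looks at games played before the one being predicted. It is not the model described above; `last_n_pregame_average` and the `window` parameter are hypothetical names.

```python
def last_n_pregame_average(points, window):
    """Average of up to `window` most recent games played before each game.

    `points` is a hypothetical chronological list of per-game values.
    Games with no prior history get None.
    """
    averages = []
    for i in range(len(points)):
        # Slice ends at i, so the current game is excluded.
        recent = points[max(0, i - window):i]
        averages.append(sum(recent) / len(recent) if recent else None)
    return averages


season = [100, 110, 90, 120, 80]  # made-up scores
print(last_n_pregame_average(season, window=2))
# [None, 100.0, 105.0, 100.0, 105.0]
```

A shorter window reacts faster to recent form but is noisier, so the right size is something to test rather than assume.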
I don't like using end of season stats because they will never be available when you start making your wagers.