Omitting older data from training set

Miz · 06-30-13 07:35 AM

I wanted to gauge other people's experience with this. I found that although my MSE or absolute error goes down marginally when I include more data (older seasons) in the training set, the performance actually improves when I include only the 2 most recent seasons. I hypothesize that it helps adjust for league changes etc., but as always, the small dataset makes it hard to establish statistical confidence. I am looking at the NFL right now, but have seen this in other sports too. Does anyone have any thoughts/experience they'd care to share on this topic? Thanks.

brettd · 06-30-13 10:13 AM

You should weight the data somehow.

Maybe use exponential smoothing or some other time-series smoothing technique. Optimize the weighting/smoothing parameters on a training set against MSE/MAE (I have used evolutionary algos quite a lot for this very thing), and then test on an out-of-sample set.
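A rough sketch of that idea, not anyone's actual model: each training game gets a weight of decay ** age, where age is the number of seasons before the most recent training season, and the decay is chosen by a plain grid search on validation MSE (an evolutionary search could be swapped in for the grid). The single-feature linear fit and the names exp_weights, fit_weighted_line and pick_decay are all made up for illustration.

```python
def exp_weights(ages, decay):
    """Weight each sample by decay ** age (age 0 = most recent season)."""
    return [decay ** a for a in ages]

def fit_weighted_line(x, y, w):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mse(model, x, y):
    """Unweighted mean squared error of a fitted (intercept, slope) pair."""
    intercept, slope = model
    return sum((intercept + slope * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def pick_decay(train, valid, grid):
    """train = (x, y, ages), valid = (x, y). Returns (best_decay, scores)."""
    x, y, ages = train
    scores = {}
    for decay in grid:
        model = fit_weighted_line(x, y, exp_weights(ages, decay))
        scores[decay] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores
```

A decay of 1.0 weights every season equally, and values near 0 come close to dropping older seasons altogether, so the hard cutoff from the first post is just one end of the grid. Whatever decay wins should then be checked once on a season that played no part in the search.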

Miz · 06-30-13 11:36 AM

Good thoughts Brett. Thanks for sharing them.

TravisVOX · 07-03-13 08:57 AM

I found that exponential smoothing worked well for me. I spent a few hours on my test data determining the optimal rate to "smooth" and then committed to it.

Juret · 07-03-13 01:30 PM

Is smoothing always a good idea? What happens with your predictions when teams regress to their means after being extra lucky/unlucky? Another issue could be player injuries and/or suspensions. With a key player out temporarily, that team will look worse than it actually is, because the recent games without the key player are weighted more heavily.

Miz · 07-05-13 12:46 PM

Thank you all for your feedback.

marcoforte · 08-31-13 09:35 PM

This past off-season I was running binary logistic regression. The test data was 2012 NFL over/unders. The learning set varied from four years (2008-2011) to two years (2010-2011). I found that the predictive power was stronger using the model derived from the 4-year data.
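For anyone wanting to run the same kind of comparison, here is a minimal sketch of scoring different training windows against one held-out season. It is not the regression described above: fit_base_rate is a toy stand-in for a real model, and the game records (dicts with hypothetical "season" and "over" keys) are an assumed layout. A real fit and score function can be passed in instead.

```python
import math

def fit_base_rate(rows):
    """Toy model: smoothed share of games that went over the total."""
    overs = sum(r["over"] for r in rows)
    return (overs + 1) / (len(rows) + 2)  # add-one smoothing keeps p off 0 and 1

def log_loss(p, rows):
    """Average negative log-likelihood of a constant 'over' probability p."""
    return -sum(math.log(p if r["over"] else 1 - p) for r in rows) / len(rows)

def compare_windows(games, test_season, window_lengths,
                    fit=fit_base_rate, score=log_loss):
    """Train on the n seasons before test_season, score on test_season."""
    test_rows = [g for g in games if g["season"] == test_season]
    results = {}
    for n in window_lengths:
        train_rows = [g for g in games
                      if test_season - n <= g["season"] < test_season]
        results[n] = score(fit(train_rows), test_rows)
    return results
```

Calling compare_windows(games, 2012, [2, 4]) gives one out-of-sample score per window length. A single test season is a small sample, so repeating the comparison with several different test seasons would say more about which window really holds up.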
