Omitting older data from training set

**brettd** · 06-30-13, 10:13 AM

You should weight the data somehow.

Maybe use exponential smoothing or some other time-series smoothing technique. Optimize the weighting/smoothing parameters on a training set against MSE/MAE (I have used evolutionary algos quite a lot for this very thing), and then test on an out-of-sample set.

**Miz** · 06-30-13, 11:36 AM

Good thoughts Brett. Thanks for sharing them.

**TravisVOX** · 07-03-13, 08:57 AM

I found that exponential smoothing worked well for me. I spent a few hours on my test data determining the optimal rate to "smooth" and then committed to it.

**Juret** · 07-03-13, 01:30 PM

Is smoothing always a good idea? What happens to your predictions when teams regress to their means after being extra lucky/unlucky? Another issue could be player injuries and/or suspensions. With a key player out temporarily, that team will look worse than it actually is, because the games without the key player are weighted more heavily.

**Miz** · 07-05-13, 12:46 PM

Thank you all for your feedback.

**marcoforte** · 08-31-13, 09:35 PM

This past off-season I was running binary logistic regression. The test data was 2012 NFL over/unders. The learning set varied from four years (2008-2011) to two years (2010-2011). I found that the predictive power was stronger using the model derived from the four-year data.
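For anyone who wants to repeat that kind of comparison, here is a minimal sketch of a harness that trains on the last N seasons before a test season and scores each window by out-of-sample log loss. Everything in it is an assumption for illustration: the `rows_by_season` layout, the function names, and the choice of log loss as the metric. The base-rate "model" is only a placeholder so the block runs on its own; a real logistic regression fit would be passed in as `fit` and `predict` instead.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood for binary labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def compare_windows(rows_by_season, test_season, window_lengths, fit, predict):
    """Score each training-window length on the same held-out season.

    rows_by_season: {season: [(features, label), ...]} (hypothetical layout)
    Returns {window_length: out-of-sample log loss}.
    """
    test_rows = rows_by_season[test_season]
    y_test = [label for _, label in test_rows]
    results = {}
    for n in window_lengths:
        seasons = range(test_season - n, test_season)  # the n seasons before the test season
        train_rows = [row for s in seasons for row in rows_by_season[s]]
        model = fit(train_rows)
        probs = [predict(model, feats) for feats, _ in test_rows]
        results[n] = log_loss(y_test, probs)
    return results


# Placeholder model: always predicts the training-set rate of label 1.
def fit_base_rate(rows):
    return sum(label for _, label in rows) / len(rows)


def predict_base_rate(model, features):
    return model
```

With data for 2008 through 2012 loaded, a call like `compare_windows(rows_by_season, 2012, [2, 4], fit_base_rate, predict_base_rate)` would return one score per window, lower being better.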