I wanted to gauge other people's experience with this. I found that although my MSE or absolute error goes down marginally when I include more data (older seasons) in the training set, the performance actually improves when I include only the two most recent seasons. I hypothesize that the shorter window helps adjust for league changes, etc., but as always the small dataset makes it hard to establish statistical confidence. I am looking at the NFL right now, but have seen this in other sports too. Does anyone have any thoughts or experience they'd care to share on this topic? Thanks.
Omitting older data from training set
Miz SBR Wise Guy
- 08-30-09
- 695
#1 Omitting older data from training set
brettd SBR High Roller
- 01-25-10
- 229
#2 You should weight the data somehow.
Maybe use exponential smoothing or some other time-series smoothing technique. Optimize the weighting/smoothing parameters on a training set against MSE/MAE (I have used evolutionary algorithms quite a lot for this very same thing), and then test on an out-of-sample set.
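A minimal Python sketch of the workflow described above: simple exponential smoothing of one team's point margins, with the smoothing rate chosen on a training span by one-step-ahead MSE and then scored on held-out games. The margins are made-up numbers, the function names are hypothetical, and a plain grid search stands in for the evolutionary algorithm mentioned in the post.

```python
def one_step_forecasts(values, alpha):
    """Forecasts for values[1:], each made before seeing the actual value."""
    level = values[0]  # initialise the level with the first observation
    forecasts = []
    for actual in values[1:]:
        forecasts.append(level)
        # Higher alpha puts more weight on the most recent game.
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def mse(values, alpha, start=1):
    """Mean squared one-step-ahead error over values[start:]."""
    forecasts = one_step_forecasts(values, alpha)
    pairs = zip(forecasts[start - 1:], values[start:])
    errors = [(f - a) ** 2 for f, a in pairs]
    return sum(errors) / len(errors)


# Hypothetical point margins for one team, oldest game first.
margins = [3, -7, 10, 14, -3, 7, 21, -10, 4, 6, -14, 3,
           17, 7, -3, 10, -6, 13, 3, -7, 14, 10, -4, 7]

n_train = 16                            # first 16 games are used for tuning
grid = [i / 20 for i in range(1, 20)]   # candidate rates 0.05 .. 0.95

# Pick the rate with the lowest MSE on the training span only.
best_alpha = min(grid, key=lambda a: mse(margins[:n_train], a))

# Score the committed rate on the games that were not used for tuning.
test_mse = mse(margins, best_alpha, start=n_train)
print("chosen alpha:", best_alpha, "out-of-sample MSE:", round(test_mse, 2))
```

The same loop works with MAE by swapping the squared error for an absolute error. With this little data the chosen rate is noisy, so the out-of-sample score is the number to trust.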
Miz SBR Wise Guy
- 08-30-09
- 695
#3 Good thoughts, Brett. Thanks for sharing them.
TravisVOX SBR Rookie
- 12-25-12
- 30
#4 I found that exponential smoothing worked well for me. I spent a few hours on my test data determining the optimal rate to "smooth" and then committed to it.
Juret SBR High Roller
- 07-18-10
- 113
#5 Is smoothing always a good idea? What happens with your predictions when teams regress to their means after being extra lucky or unlucky? Another issue could be player injuries and/or suspensions. With a key player out temporarily, that team will look worse than it actually is, because the games without the key player are weighted more heavily.
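A small Python sketch of the injury concern, under made-up numbers: a team that wins by 7 when healthy plays its last three games without a key player and loses each by 3. The function name and all values are hypothetical. It shows how far the smoothed rating drifts from the healthy level at different smoothing rates.

```python
def smoothed_level(values, alpha):
    """Final exponentially smoothed level after seeing every value."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


healthy, injured = 7.0, -3.0                # hypothetical point margins
history = [healthy] * 10 + [injured] * 3    # last three games without the key player

for alpha in (0.1, 0.3, 0.6):
    rating = smoothed_level(history, alpha)
    # The gap is how much the rating understates the healthy team.
    print(f"alpha={alpha}: rating={rating:.2f}, gap={healthy - rating:.2f}")
```

The higher the rate, the more the three short-handed games dominate the rating once the player returns, which is the trade-off raised in the post.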
Miz SBR Wise Guy
- 08-30-09
- 695
#6 Thank you all for your feedback.
marcoforte SBR High Roller
- 08-10-08
- 140
#7 This past off-season I was running binary logistic regression. The test data was 2012 NFL over/unders. The learning set varied from four years (2008-2011) to two years (2010-2011). I found that the predictive power was stronger using the model derived from the four-year data.
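The comparison described above can be sketched in Python as follows: fit the same logistic regression on a four-season and a two-season window, then score both on the held-out 2012 season. Everything here is an assumption for illustration: the data is simulated with one made-up feature, the helper names are hypothetical, and the model is a bare-bones gradient-descent fit rather than whatever software was actually used. Because the simulated seasons contain no league drift, the output says nothing about which window is better on real NFL data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=300):
    """Fit p(over | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def log_loss(model, xs, ys):
    """Average negative log-likelihood; lower is better."""
    w, b = model
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def make_season(rng, n_games=256):
    """Simulated season: one feature x, outcome 1 = over, 0 = under."""
    xs = [rng.gauss(0, 1) for _ in range(n_games)]
    ys = [1 if rng.random() < sigmoid(0.8 * x) else 0 for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
seasons = {year: make_season(rng) for year in range(2008, 2013)}


def evaluate_window(first_year, last_year, test_year):
    """Train on first_year..last_year inclusive, score on test_year."""
    xs, ys = [], []
    for year in range(first_year, last_year + 1):
        season_xs, season_ys = seasons[year]
        xs += season_xs
        ys += season_ys
    model = fit_logistic(xs, ys)
    return log_loss(model, *seasons[test_year])


results = {first: evaluate_window(first, 2011, 2012) for first in (2008, 2010)}
for first, loss in results.items():
    print(f"train {first}-2011, test 2012: log loss = {loss:.4f}")
```

Keeping the test season fixed and changing only the first training year makes the two scores directly comparable. A single test season is still a small sample, so a small gap between the windows should not be over-read.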