1. #1
    Miz
    Join Date: 08-30-09
    Posts: 695
    Betpoints: 3162

    Omitting older data from training set

    I wanted to gauge other people's experience with this. Although my MSE or absolute error goes down marginally when I include more data (older seasons) in the training set, performance actually improves when I include only the two most recent seasons. I hypothesize that this helps adjust for league changes, etc., but as always, a small dataset makes it hard to establish statistical confidence. I am looking at the NFL right now, but I have seen this in other sports too. Does anyone have any thoughts or experience they'd care to share on this topic? Thanks.

  2. #2
    brettd
    Join Date: 01-25-10
    Posts: 229
    Betpoints: 3869

    You should weight the data somehow.

    Maybe use exponential smoothing or some other time-series smoothing technique. Optimize the weighting/smoothing parameters on a training set against MSE/MAE (I have used evolutionary algorithms quite a lot for this very thing), and then test on an out-of-sample set.
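
A minimal sketch of that workflow, under stated assumptions: it uses simple exponential smoothing with a single rate `alpha`, tunes `alpha` by a plain grid search on the training portion (swapped in here for the evolutionary search mentioned above, purely to keep the example short), and then scores one-step-ahead forecasts on the held-out tail. The function names and the synthetic series are hypothetical and stand in for whatever per-team or per-game quantity you are smoothing.

```python
import random


def ses_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction for series[t] built only from series[:t].
    The first forecast is seeded with the first observation.
    """
    level = series[0]
    forecasts = [level]
    for value in series[:-1]:
        # Larger alpha puts more weight on the most recent observation.
        level = alpha * value + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def mse(series, forecasts, start=1):
    """Mean squared forecast error over series[start:]."""
    errors = [(y - f) ** 2 for y, f in zip(series[start:], forecasts[start:])]
    return sum(errors) / len(errors)


def fit_alpha(train, grid):
    """Pick the smoothing rate in `grid` with the lowest training MSE."""
    return min(grid, key=lambda a: mse(train, ses_forecasts(train, a)))


# Synthetic data for illustration only: a slowly drifting level plus noise.
rng = random.Random(0)
level, series = 0.0, []
for _ in range(120):
    level += rng.gauss(0, 0.3)
    series.append(level + rng.gauss(0, 1.0))

split = 80  # first 80 points for tuning, the rest held out
grid = [i / 20 for i in range(1, 21)]  # candidate rates 0.05 .. 1.00
alpha = fit_alpha(series[:split], grid)

# Score only the held-out tail; the smoother still runs over the full history.
oos_mse = mse(series, ses_forecasts(series, alpha), start=split)
print(f"alpha={alpha:.2f}  out-of-sample MSE={oos_mse:.3f}")
```

Swapping `mse` for a mean-absolute-error function would give the MAE variant; the tuning loop stays the same.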

  3. #3
    Miz
    Join Date: 08-30-09
    Posts: 695
    Betpoints: 3162

    Good thoughts, Brett. Thanks for sharing them.

  4. #4
    TravisVOX
    Join Date: 12-25-12
    Posts: 30
    Betpoints: 861

    I found that exponential smoothing worked well for me. I spent a few hours on my test data determining the optimal rate to "smooth" and then committed to it.

  5. #5
    Juret
    Join Date: 07-18-10
    Posts: 113
    Betpoints: 1239

    Is smoothing always a good idea? What happens to your predictions when teams regress to their means after being extra lucky/unlucky? Another issue could be player injuries and/or suspensions. With a key player out temporarily, the team will look worse than it actually is, because the games played without that player are weighted more heavily.

  6. #6
    Miz
    Join Date: 08-30-09
    Posts: 695
    Betpoints: 3162

    Thank you all for your feedback.

  7. #7
    marcoforte
    Join Date: 08-10-08
    Posts: 140
    Betpoints: 396

    This past offseason I was running binary logistic regression. The test data was 2012 NFL over/unders. The learning set varied from four years (2008-2011) to two years (2010-2011). I found that the predictive power was stronger using the model derived from the four-year data.
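
For anyone wanting to run the same kind of comparison, here is a rough sketch, not the setup used above: it fits a one-feature binary logistic regression by gradient descent on the last `k` seasons before a test season and reports test-set log loss for each `k`. The helper names, the single feature, and the synthetic seasons are all assumptions made for illustration; real over/under data and features would replace them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def log_loss(model, xs, ys):
    """Average negative log-likelihood of the labels under the model."""
    w, b = model
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def compare_windows(seasons, test_season, window_lengths):
    """Train on the last k seasons before test_season, for each k.

    seasons maps season -> (features, labels). Returns {k: test log loss}.
    """
    earlier = sorted(s for s in seasons if s < test_season)
    test_xs, test_ys = seasons[test_season]
    results = {}
    for k in window_lengths:
        xs, ys = [], []
        for s in earlier[-k:]:
            xs += seasons[s][0]
            ys += seasons[s][1]
        results[k] = log_loss(fit_logistic(xs, ys), test_xs, test_ys)
    return results


# Synthetic seasons for illustration only (hypothetical feature and outcomes).
rng = random.Random(1)
seasons = {}
for year in range(2008, 2013):
    xs = [rng.uniform(-1, 1) for _ in range(40)]
    ys = [1 if rng.random() < sigmoid(2 * x) else 0 for x in xs]
    seasons[year] = (xs, ys)

results = compare_windows(seasons, test_season=2012, window_lengths=[2, 4])
for k, loss in results.items():
    print(f"{k}-season window: test log loss = {loss:.4f}")
```

A single test season is a small sample, so repeating the comparison with several different test seasons gives a better sense of whether the longer or shorter window is really ahead.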
