Omitting older data from training set

  • Miz
    SBR Wise Guy
    • 08-30-09
    • 695

    #1
    I wanted to gauge other people's experience with this. I found that although my MSE or absolute error goes down marginally when I include more data (older seasons) in the training set, the performance actually improves when I include only the 2 most recent seasons. I hypothesize that this helps adjust for league changes and the like, but, as always, the limited dataset size makes it hard to establish statistical confidence. I am looking at the NFL right now, but have seen this in other sports too. Does anyone have any thoughts or experience they'd care to share on this topic? Thanks.
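One way to make this comparison concrete is to fit the same model on different training windows and score each fit both in-sample and on a held-out season. This is a hypothetical sketch, not the poster's setup: it assumes a single-feature linear model for point margin and a dict mapping each season to a list of (feature, margin) pairs; `fit_line` and `window_errors` are made-up names.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def window_errors(data, test_season, n_train):
    """Fit on the n_train seasons just before test_season (hypothetical
    layout: {season: [(x, margin), ...]}) and return
    (in-sample MAE, held-out MAE on test_season)."""
    train = [row for s in range(test_season - n_train, test_season)
             for row in data[s]]
    a, b = fit_line([x for x, _ in train], [y for _, y in train])

    def mae(rows):
        return mean(abs(y - (a + b * x)) for x, y in rows)

    return mae(train), mae(data[test_season])
```

For example, `window_errors(data, 2012, 2)` and `window_errors(data, 2012, 4)` compare a two-season window with a four-season one on the same held-out season. With a single held-out season per comparison, a gap between the two can easily be noise, which is the confidence problem mentioned above.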
  • brettd
    SBR High Roller
    • 01-25-10
    • 229

    #2
    You should weight the data somehow.

    Maybe use exponential smoothing or some other time-series smoothing technique. Optimize the weighting/smoothing parameters on a training set against MSE/MAE (I have used evolutionary algorithms quite a lot for this very thing), and then test on an out-of-sample set.
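A rough sketch of the weighting idea, assuming a hypothetical one-feature linear model and a `{season: [(x, margin), ...]}` layout: each game gets weight `decay ** age`, where `age` counts seasons back from the most recent training season, and the decay is chosen by a simple grid search on a validation season. Grid search stands in here for the evolutionary algorithms mentioned above only to keep the example short; `fit_weighted_line`, `decayed_mae`, and `tune_decay` are hypothetical names.

```python
def fit_weighted_line(rows, weights):
    """Weighted least squares fit of y = a + b*x; rows are (x, y) pairs."""
    sw = sum(weights)
    mx = sum(w * x for w, (x, _) in zip(weights, rows)) / sw
    my = sum(w * y for w, (_, y) in zip(weights, rows)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, (x, _) in zip(weights, rows))
    sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(weights, rows))
    b = sxy / sxx
    return my - b * mx, b


def decayed_mae(data, target_season, decay):
    """Fit on every season before target_season, weighting each game by
    decay ** age (age 0 = the season just before target_season), then
    return the mean absolute error on target_season."""
    rows, weights = [], []
    for season, games in data.items():
        if season < target_season:
            age = target_season - 1 - season
            rows += games
            weights += [decay ** age] * len(games)
    a, b = fit_weighted_line(rows, weights)
    test = data[target_season]
    return sum(abs(y - (a + b * x)) for x, y in test) / len(test)


def tune_decay(data, validation_season, grid):
    """Grid search: return the decay with the lowest validation MAE."""
    return min(grid, key=lambda d: decayed_mae(data, validation_season, d))
```

In practice you would pick `best = tune_decay(data, 2011, [0.0, 0.25, 0.5, 0.75, 1.0])` and only then report `decayed_mae(data, 2012, best)` as the out-of-sample figure. `decay=1.0` is an unweighted fit on every prior season and `decay=0.0` keeps only the most recent one (Python evaluates `0.0 ** 0` as `1.0`), so "use everything" and "use only last season" sit at the two ends of one tunable parameter.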
    • Miz
      SBR Wise Guy
      • 08-30-09
      • 695

      #3
      Good thoughts Brett. Thanks for sharing them.
      • TravisVOX
        SBR Rookie
        • 12-25-12
        • 30

        #4
        I found that exponential smoothing worked well for me. I spent a few hours on my test data determining the optimal rate to "smooth" and then committed to it.
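For reference, the textbook form of simple exponential smoothing, with the smoothing rate written as α; this is the standard recurrence, not necessarily the exact variant used in this post. Here x_t is the t-th observation (for instance a team's margin in its t-th game), s_t is the smoothed value after that game, and s_0 is the starting value. Unrolling the recurrence shows that a game k steps in the past carries weight α(1-α)^k, so a larger α forgets older games faster. The last line is one common way to choose the rate: minimize the one-step-ahead squared error.

```latex
\begin{aligned}
s_t &= \alpha\, x_t + (1 - \alpha)\, s_{t-1}, \qquad 0 < \alpha \le 1 \\
    &= \alpha \sum_{k=0}^{t-1} (1 - \alpha)^k\, x_{t-k} + (1 - \alpha)^t\, s_0 \\
\hat{\alpha} &= \arg\min_{\alpha} \sum_{t} \left( x_{t+1} - s_t \right)^2
\end{aligned}
```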
        • Juret
          SBR High Roller
          • 07-18-10
          • 113

          #5
          Is smoothing always a good idea? What happens to your predictions when teams regress to their means after being extra lucky or unlucky? Another issue could be player injuries and/or suspensions. With a key player out temporarily, that team will look worse than it actually is, because the games without the key player are weighted more heavily.
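A toy illustration of that concern, using an invented series of point margins in which a team plays four games at full strength and then two without a key player. With simple exponential smoothing at α = 0.5 (an arbitrary choice for the example), the smoothed level ends at -1.25 while the plain average is 5, because the two most recent games dominate the weights. `smoothed` is a hypothetical helper, and the numbers are made up.

```python
from statistics import mean


def smoothed(series, alpha):
    """Simple exponential smoothing; returns the final smoothed level.
    The level starts at the first observation."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical margins: four healthy games, then two with a key player out.
margins = [10, 10, 10, 10, -5, -5]
print(smoothed(margins, 0.5))  # -1.25
print(mean(margins))           # 5
```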
          • Miz
            SBR Wise Guy
            • 08-30-09
            • 695

            #6
            Thank you all for your feedback.
            • marcoforte
              SBR High Roller
              • 08-10-08
              • 140

              #7
              This past offseason I was running binary logistic regression. The test data was the 2012 NFL over/unders. The learning set varied from four years (2008-2011) to two years (2010-2011). I found that the predictive power was stronger using the model derived from the four-year data.
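A bare-bones version of that kind of experiment might look like the sketch below. It is not the poster's model: it assumes one hypothetical feature per game (for instance, a projected total minus the posted total), a label of 1 for an over and 0 for an under, plain batch gradient descent instead of a statistics package, and average log loss on the held-out season as the measure of predictive power. `fit_logistic`, `log_loss`, and `window_log_loss` are made-up names.

```python
import math


def fit_logistic(rows, lr=0.1, epochs=2000):
    """Fit P(over) = sigmoid(a + b*x) by batch gradient descent on log loss.
    rows are (x, y) pairs with y = 1 for an over and 0 for an under."""
    a = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += p - y
            gb += (p - y) * x
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b


def log_loss(rows, a, b):
    """Average negative log-likelihood of the rows under the fitted model."""
    total = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def window_log_loss(data, test_season, n_train):
    """Train on the n_train seasons before test_season and score the
    held-out season (hypothetical layout: {season: [(x, y), ...]})."""
    train = [row for s in range(test_season - n_train, test_season)
             for row in data[s]]
    a, b = fit_logistic(train)
    return log_loss(data[test_season], a, b)
```

Comparing `window_log_loss(data, 2012, 4)` with `window_log_loss(data, 2012, 2)` mirrors the 2008-2011 versus 2010-2011 comparison described above; lower is better, and one season of over/unders is still a small sample for telling the two apart.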