1. #1
    davopnz
    Join Date: 02-12-12
    Posts: 1,727
    Betpoints: 60

    What to do with 'explainable' outliers in a model

    Hi guys, I run a fairly simple model on a couple of sports. I'm seeking advice on what to do with explainable outliers, i.e. a key player is injured, very bad weather, etc. Should I scrap the result altogether?

  2. #2
    HeeeHAWWWW
    Join Date: 06-13-08
    Posts: 5,339
    Betpoints: 207

    Rather depends on how robust your algorithm is to outliers, and whether you're talking about regression or classification.

    Also on your internal model construction - you say you can explain, but is that explanation external to the model? Or are the explanatory factors features already?
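
    To illustrate the robustness point, here is a minimal sketch that compares a non-robust estimate (the mean) with a robust one (the median) on a made-up list of point margins. The numbers are hypothetical and only there to show how far a single outlier can pull each estimate.

```python
from statistics import mean, median

# Hypothetical point margins for one team; the last game is the "explainable" outlier
margins = [3, -2, 5, 1, 4, -1, 2, -28]

def summarize(values):
    """Return (mean, median) so the two estimators can be compared."""
    return mean(values), median(values)

with_outlier = summarize(margins)
without_outlier = summarize(margins[:-1])

# How far each estimate moves when the outlier game is included
mean_shift = with_outlier[0] - without_outlier[0]
median_shift = with_outlier[1] - without_outlier[1]
print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
```

    If your model behaves more like the median here, keeping the outlier probably costs little; if it behaves like the mean, one game can move it a lot.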

  3. #3
    gui_m_p
    Join Date: 09-18-13
    Posts: 123
    Betpoints: 1531

    Usually, removing outliers serves to take some noise out of the model, so it can make better predictions in the future if you train your model without them.

    However, as HeeeHAWWWW said, you should consider whether the variables that explain the outliers are really exogenous to the model.

    E.g. if you are predicting the total points of a game and weather is a variable you use, you cannot remove an outlier due to bad weather.

  4. #4
    QuantumLeap
    Glastonbury Tor at moonlight
    Join Date: 08-22-08
    Posts: 5,591
    Betpoints: 1222

    If you can quantify the explainable outlier, you can adjust your model by that amount. Not all outliers are equal. Some will downright wreck your model.

    For example, if a "key" player is out for an NBA team that may be worth 'x' amount of points. If an even more important NBA player is out then that might be worth 10 points or even more.

    I've found that the books rarely move the line enough for those players being out, which allows you to fade that team.
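
    A minimal sketch of that kind of quantified adjustment, assuming you already have a point value for each absent player. The `PLAYER_VALUE` table, the role names, and every number below are hypothetical placeholders, not estimates.

```python
# Hypothetical point values per absent player role; placeholders, not estimates
PLAYER_VALUE = {"star": 6.0, "starter": 2.5}

def adjusted_margin(base_margin, absences):
    """Subtract the assumed point value of each absent player from the model's margin."""
    return base_margin - sum(PLAYER_VALUE[role] for role in absences)

def leftover_edge(player_value, line_move):
    """Points of adjustment the market has not priced in, under the assumed player value."""
    return player_value - line_move

# The model has the team at +4.5 at full strength; the star is out tonight
tonight = adjusted_margin(4.5, ["star"])

# The book moved the line 3.5 points; the assumed star value is 6.0
edge = leftover_edge(PLAYER_VALUE["star"], 3.5)
print(tonight, edge)
```

    A positive `edge` is the situation described above: the line moved less than the assumed player value, so the model leans toward fading the short-handed team.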

  5. #5
    gojetsgomoxies
    Join Date: 09-04-12
    Posts: 3,354
    Betpoints: 6582

    there is no easy answer to it...... you need to adjust your models, but how much and in what cases, i don't think there'd be that much agreement and it's pretty tiresome and labour-intensive

    for all the ubiquitous power ratings out there, someone should track this, i.e. a central power rating source that would estimate how much you should adjust your basic model ...... sort of like a "completion portfolio" in equity portfolio management, i.e. you focus on the top 100 stocks of the S&P 500 and buy/sell one ETF or customized basket for the other 400 stocks (i.e. stocks 101 through 500)

  6. #6
    gojetsgomoxies
    Join Date: 09-04-12
    Posts: 3,354
    Betpoints: 6582

    it's like when someone comes up with a model that picks off crazy lines each season but only plays a few games. it's like that crazy line is just as likely to be based on some major event (injury) or even a bad line in a database.

  7. #7
    gojetsgomoxies
    Join Date: 09-04-12
    Posts: 3,354
    Betpoints: 6582

    Originally posted by QuantumLeap:
    "I've found that the books rarely move the line enough for those players being out, which allows you to fade that team."

    that would be my thought too....... but there was a good poster on here that had the theory that teams play well in 1 game without their star player (everyone is energized by more minutes). it could be one-off or first game of star being out an extended period. maybe more the latter.......

    same with the opposite. "oh great, durant's back from injury". but this poster thought it takes some time to re-gel with the star coming back.

    i like all the ideas, incl. yours, and i don't necessarily see them as mutually exclusive and this poster was more focussed on first or second game of star absence/return.

  8. #8
    peacebyinches
    pull the trigger
    Join Date: 02-13-10
    Posts: 850
    Betpoints: 2298

    Originally posted by gui_m_p:
    "Usually, removing outliers serves to take some noise out of the model, so it can make better predictions in the future if you train your model without them.

    However, as HeeeHAWWWW said, you should consider whether the variables that explain the outliers are really exogenous to the model.

    E.g. if you are predicting the total points of a game and weather is a variable you use, you cannot remove an outlier due to bad weather."

    Yes, this!

    Oftentimes there is some very useful information in the residuals (aka noise) of your data. If you can extract the residuals from what your model is outputting, you can have all sorts of fun, such as using them as a metric for the difference between your fitted values (what your model predicts) and the actual outcomes. With enough noise estimates you can generalize some nifty parameters to include in the model a priori (basically quantifying how influential a certain outlier circumstance is when it comes to upping your residuals). Other cool stuff like principal components analysis (PCA) comes to mind, but it really matters how your model is set up to start with.
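
    As a rough sketch of the residual idea, assuming each game is logged with the model's prediction, the actual result, and a circumstance tag: the records, tag names, and numbers below are made up for illustration, and the mean residual per tag is just one simple way to size a circumstance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (model prediction, actual result, circumstance tag)
games = [
    (210.0, 212.0, "normal"),
    (205.0, 201.0, "normal"),
    (215.0, 199.0, "bad_weather"),
    (208.0, 196.0, "bad_weather"),
    (220.0, 223.0, "normal"),
]

def mean_residual_by_tag(records):
    """Average (actual - predicted) for each circumstance tag."""
    residuals = defaultdict(list)
    for predicted, actual, tag in records:
        residuals[tag].append(actual - predicted)
    return {tag: mean(values) for tag, values in residuals.items()}

effects = mean_residual_by_tag(games)
for tag, effect in effects.items():
    print(f"{tag}: {effect:+.2f}")
```

    A mean residual well away from zero for a tag suggests the model is missing that circumstance, and the size of it is a first guess at the adjustment. Two or three games per tag, as in this toy data, is far too few to trust.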

  9. #9
    eaglesfan371
    The great game of POT...LIMIT...OMAHA
    Join Date: 01-07-19
    Posts: 3,686
    Betpoints: 154

    I find this forum section quite intriguing. It's like a whole new forum with completely different members. No one in the Think Tank comments in Players Talk or other sections.

    All the quants, PhD stat and math guys must live here. I must continue observing and listening.

  10. #10
    gojetsgomoxies
    Join Date: 09-04-12
    Posts: 3,354
    Betpoints: 6582

    if you start making subjective adjustments (obvious reasonable ones) to your objective power ratings systems then is it still an objective system? how can you backtest something with subjective adjustments? maybe you have no interest in that, or just back-test the objective part of it.

  11. #11
    tsty
    Join Date: 04-27-16
    Posts: 499
    Betpoints: 4291

    Originally posted by gojetsgomoxies:
    "it's like when someone comes up with a model that picks off crazy lines each season but only plays a few games. it's like that crazy line is just as likely to be based on some major event (injury) or even a bad line in a database."

    What would be the point?

    putting in so much effort for a few bets a year lol

  12. #12
    semibluff
    refuses to update status
    Join Date: 04-12-16
    Posts: 716
    Betpoints: 1180

    Throwing out results that don't fit is one possibility. My preferred solution is to place differing emphasis on differing situations. It's not easy to formulate how much emphasis should be put on any given scenario.

    For example, I run a moneyline Pick'em competition on NFL games. If I used the same stake unit for every game, the competition would likely be won by whoever did best at picking successful longshots. Thus the stakes are weighted by how close together the moneylines are. Close lines are full stakes and +/-825 lines are at 38%, with many differing lines and percentages in between. NFL games with handicaps of over 10 points were 22-4 for favourites in non-Thursday games.

    The point is to avoid the danger of ignoring unlikely events whilst avoiding being overwhelmed by them. Kim Si-woo won the 2017 Players Championship at +25000 with Louis Oosthuizen 2nd at +8000. That probably wrecks a lot of golf betting models. There was also a +50000 player who finished in the top 4 of one of the majors last year.

    Key players being injured and very bad weather will happen. If the model excludes those results, it can't be used whenever there's a possibility of either occurring. That's ok if you're only betting close to the scheduled start. If you bet several days in advance, then it might not be very helpful.
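
    A minimal sketch of down-weighting rather than discarding, assuming each past result carries a flag for an explainable circumstance. The `OUTLIER_WEIGHT` value, the flags, and the margins are hypothetical; this is not the stake-weighting scheme from the Pick'em competition, just the same idea applied to model inputs.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical point margins; the last game was played without a key player
margins = [3.0, -2.0, 5.0, 1.0, -28.0]
flagged = [False, False, False, False, True]

OUTLIER_WEIGHT = 0.25  # assumed emphasis for flagged games; something to tune by backtesting
weights = [OUTLIER_WEIGHT if is_flagged else 1.0 for is_flagged in flagged]

equal_weighted = weighted_mean(margins, [1.0] * len(margins))
downweighted = weighted_mean(margins, weights)
dropped = weighted_mean(margins[:-1], [1.0] * (len(margins) - 1))
print(equal_weighted, downweighted, dropped)
```

    The down-weighted figure sits between keeping the game at full weight and dropping it, so the unlikely event is neither ignored nor allowed to dominate.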
