1. #1
    gamblingisfun
    I'm a 'handicapper'...
    Join Date: 08-14-10
    Posts: 401
    Betpoints: 8632

    To those with MLB models....

    Models have different variables in them, and I'm wondering whether the variables that work one year work well the next. I personally use different percentage weightings for all my variables. For those who do the same, how do your weightings differ from year to year if you tweak them each season? For example, say I use variables X, Y, and Z at 33.33% each. If that worked well last season, might a new season need something like 40-20-40% to work best? How dramatic a change should I anticipate year over year?
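To make the question concrete, here is a minimal sketch of a percentage-weighted rating. The function name `weighted_score` and the sample values for X, Y, and Z are hypothetical; the thread never says what the actual variables are or how they are scaled.

```python
def weighted_score(values, weights):
    """Combine variable values into one rating using percentage weights."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical, already-normalized values for variables X, Y, Z (0 to 1 scale).
team = [0.60, 0.30, 0.50]

equal = weighted_score(team, [1 / 3, 1 / 3, 1 / 3])   # last season's 33.33% each
shifted = weighted_score(team, [0.40, 0.20, 0.40])    # a possible 40-20-40 re-weighting
print(round(equal, 4), round(shifted, 4))
```

The same inputs produce a different rating under each weighting, which is why a year-over-year shift in weights can flip which side the model likes.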

  2. #2
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,044
    Betpoints: 7292

    If you have an actual accurate model, it should be the same year to year. Unless the fundamental rules of baseball change, your variables shouldn't change. If you find that you have to shift weightings year to year, I don't think you are accurately representing the underlying win probabilities of teams.

  3. #3
    gamblingisfun
    I'm a 'handicapper'...
    Join Date: 08-14-10
    Posts: 401
    Betpoints: 8632

    I've only used my model for one year, last season. This is year two. I'm just hoping the weightings don't change.

  4. #4
    Inspirited
    Join Date: 06-26-10
    Posts: 1,783
    Betpoints: 17864

    How did you come up with your weightings?

  5. #5
    mebaran
    Con los terroristas
    Join Date: 09-16-09
    Posts: 1,540
    Betpoints: 330

    Things change every year. Organizations move in their outfield fences, add seats to subtract foul territory, etc. Macro events happen, like crackdowns on steroid use, that COMPLETELY change the game. It's no coincidence that league hitting stats have gone down consistently the last 4 or 5 years.

    Your model has to change from season to season, and mid-season as well.

  6. #6
    matthew919
    Join Date: 11-21-12
    Posts: 421
    Betpoints: 5869

    I'm going to side with Waters on this one. A game-changing effect on park factor is rare. My feeling is that if you suspect something fishy is going on at a park, where the current year deviates drastically from your 3- or 5-year park factor, you should investigate it manually and update your model accordingly (that is, if you can convince yourself it's real). But I claim these cases will be exceedingly rare.
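A rough sketch of that check: compare the current-year park factor with the multi-year average and flag large gaps for a manual look. The function name `flag_park`, the 10% threshold, and the sample park factors are assumptions for illustration, not values from this thread.

```python
from statistics import mean

def flag_park(history, current, threshold=0.10):
    """Return True if the current-year park factor deviates from the
    multi-year average by more than `threshold` (relative difference)."""
    baseline = mean(history)
    return abs(current - baseline) / baseline > threshold

# Hypothetical park factors (1.00 = league-neutral run environment).
three_year = [1.02, 0.98, 1.00]
print(flag_park(three_year, 1.03))  # small drift from the baseline
print(flag_park(three_year, 1.18))  # large jump, worth investigating by hand
```

A flag here only says "look closer". Whether the change is real (moved fences, new seating) is still a manual judgment.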

    As far as the steroids argument? That's meaningless. A guy who hits 50 HR a year on steroids should perform just as well as a guy who hits 50 HR a year NOT on steroids. A good model will not need to infer what chemical cocktails might be present in a hitter's bloodstream, any more than it should care what the hitter ate for breakfast. The one caveat is when a guy who was ON roids goes OFF (or vice versa, I suppose), so that his historical stats are inflated (or deflated) relative to the current expectation. Of course, these cases are impossible to identify computationally, so I would just use stats no more than a year old when modeling. That is advisable anyway, based on career trajectory and whatnot.

    Bottom line: more data should have the effect of refining your model, not completely changing it.

  7. #7
    mebaran
    Con los terroristas
    Join Date: 09-16-09
    Posts: 1,540
    Betpoints: 330

    Quote Originally Posted by matthew919
    Bottom line: more data should have the effect of refining your model, not completely changing it.

    By adding 2013 data into our models, aren't we carrying a moving weighting? Runs per game have gone down measurably in the past 3-5 years, so the game has changed since then, albeit not fundamentally, no?

  8. #8
    matthew919
    Join Date: 11-21-12
    Posts: 421
    Betpoints: 5869

    I think there's a big distinction between player performance and the dynamics of the game. The first will fluctuate, the second will not. So in that sense, no, I don't believe the game has changed at all in the past 10 years.

    But that's just my approach, which is not to say you can't build a successful model which operates under a completely different belief.

  9. #9
    EXhoosier10
    Join Date: 07-06-09
    Posts: 3,122
    Betpoints: 4390

    If one were to model games, different run-scoring environments should absolutely change your model. If McGwire, Sosa, and the like can hit a home run off of anybody while on PEDs, hitters are going to play a much larger role in your model than if they stop and all of a sudden suck against good pitchers but can still go deep on bad ones.

    In reply to this:
    "As far as the steroids argument? That's meaningless. A guy who hits 50 HR a year on steroids should perform just as well as a guy who hits 50 HR a year NOT on steroids."
    Guys who hit 50 HR on steroids don't really exist as 50 HR guys when not on steroids. If you don't buy 50 HR being the cutoff, 60 and 70 HR guys surely would suffice.

    Outside of steroids, park factors change and become more reliable every year, so having more data should change the % you weight PF in newer parks. Pitchers being able to dominate hitters more and more (for whatever reason -- specialized bullpens, stricter pitch counts, etc.) should at least change your model by small amounts every year.
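One common way to let the weight on a park factor grow with the amount of data is to shrink the observed value toward a neutral prior. The sketch below assumes a simple `seasons / (seasons + k)` weighting. The function name, the constant `k=3.0`, and the sample park factor are illustrative assumptions, not anything stated in the thread.

```python
def regressed_park_factor(observed, seasons, k=3.0, prior=1.0):
    """Shrink an observed park factor toward a neutral prior.

    The weight on the observed value grows as seasons of data accumulate,
    so a new park starts close to neutral and earns trust over time.
    """
    weight = seasons / (seasons + k)
    return weight * observed + (1 - weight) * prior

# Hypothetical hitter-friendly park with an observed factor of 1.12.
for seasons in (1, 3, 9):
    print(seasons, round(regressed_park_factor(1.12, seasons), 3))
```

With this form the formula itself stays fixed from year to year. Only the effective weight on a given park changes as its sample grows.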

  10. #10
    mycon
    Join Date: 04-13-11
    Posts: 29
    Betpoints: 348

    If you have to change your weighting all the time, you are probably backfitting to a degree.
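A quick way to test for backfitting is to tune the weights on one season and score them on another. The sketch below uses purely synthetic games (no real MLB data) and assumes three hypothetical variables whose true effect is equal. All names, sample sizes, and the noise level are made up for illustration.

```python
import random
from itertools import product

def make_games(n, seed):
    """Synthetic games: three hypothetical variable edges and a win/loss result."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        # Assumed "true" process: equal weights plus random noise.
        won = (x + y + z) / 3 + rng.gauss(0, 0.5) > 0
        games.append(((x, y, z), won))
    return games

def accuracy(games, weights):
    """Share of games where the sign of the weighted score matches the result."""
    hits = sum((sum(v * w for v, w in zip(vals, weights)) > 0) == won
               for vals, won in games)
    return hits / len(games)

def best_weights(games, step=0.1):
    """Grid-search weights (summing to 1) that maximize in-sample accuracy."""
    steps = [i * step for i in range(int(round(1 / step)) + 1)]
    grid = [(a, b, 1 - a - b) for a, b in product(steps, steps) if a + b <= 1 + 1e-9]
    return max(grid, key=lambda w: accuracy(games, w))

season_1 = make_games(150, seed=1)
season_2 = make_games(150, seed=2)
fitted = best_weights(season_1)
print("fitted weights:", [round(w, 2) for w in fitted])
print("in-sample accuracy:", round(accuracy(season_1, fitted), 3))
print("out-of-sample accuracy:", round(accuracy(season_2, fitted), 3))
```

If the fitted weights drift far from the true ones and the out-of-sample number drops well below the in-sample number, the tuning was mostly chasing noise.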

  11. #11
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,044
    Betpoints: 7292

    Quote Originally Posted by EXhoosier10
    If one were to model games, different run-scoring environments should absolutely change your model.

    I certainly understand the comments that people have made about changing environments, but perhaps we look at modeling in different ways.

    When I think modeling, I think, "What is the best formula that gives me the closest implied probabilities to actual game results?" One formula that will allow me to forecast prices, which I can then compare to the marketplace as a whole. Certainly hitting is a variable in there, and I would use a combination of statistics to get to it. So if this hitting component increases, it will change my pricing and therefore have an effect on the output.

    So it is not so much "Hitting is more important now" as "Hitting has the effect of changing price by x times .30". If the hitting stat increases, it will affect price to a larger degree, but it will not change the formula, if you will.
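The distinction can be sketched like this: the coefficient on hitting stays at 0.30 while the hitting input moves, and the price moves with it. Everything else here is an assumption made for illustration: the linear form, the 0.70 pitching coefficient, the edge values, and the function names are not from the post.

```python
def win_probability(hitting_edge, pitching_edge, w_hit=0.30, w_pitch=0.70):
    """Fixed-coefficient formula: the inputs move, the coefficients stay put.

    Edges are hypothetical differentials already expressed in probability points.
    """
    p = 0.5 + w_hit * hitting_edge + w_pitch * pitching_edge
    return min(max(p, 0.01), 0.99)  # keep the result a usable probability

def american_odds(p):
    """Convert a win probability into a fair (no-vig) American moneyline."""
    if p >= 0.5:
        return -round(100 * p / (1 - p))
    return round(100 * (1 - p) / p)

# Same formula, bigger hitting input: the price changes, the 0.30 does not.
low = win_probability(hitting_edge=0.10, pitching_edge=0.05)
high = win_probability(hitting_edge=0.20, pitching_edge=0.05)
print(round(low, 3), american_odds(low))
print(round(high, 3), american_odds(high))
```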

    This is just my approach, but I certainly enjoy the discussion and learning different perspectives. I'm somewhat handicapped by the fact that I never played baseball, and I sometimes don't really understand the flow of the game. I have a tendency to just think of things as strings of numbers.

  12. #12
    mebaran
    Con los terroristas
    Join Date: 09-16-09
    Posts: 1,540
    Betpoints: 330

    ^There we go. Yeah, run modelling should fundamentally remain the same. Take Pythagorean, for example, and tweak exponents as necessary.
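For reference, the Pythagorean expectation estimates winning percentage as RS^k / (RS^k + RA^k), where RS is runs scored and RA is runs allowed. The classic exponent is 2, and 1.83 is a commonly used refinement. A minimal sketch with the exponent exposed as the tunable part (the run totals below are hypothetical):

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: expected win% from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# Hypothetical team: 800 runs scored, 700 allowed.
print(round(pythagorean_win_pct(800, 700), 3))        # classic exponent of 2
print(round(pythagorean_win_pct(800, 700, 1.83), 3))  # commonly used 1.83
```

The structure of the formula is the same in any run environment. Re-fitting the exponent is the "tweak" that absorbs a change in scoring levels.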

  13. #13
    gamblingisfun
    I'm a 'handicapper'...
    Join Date: 08-14-10
    Posts: 401
    Betpoints: 8632

    I personally have like 16 variables in my model. I just created it last year, so there was almost daily tweaking in April and the first part of May until I found the variable weightings that made it all work as well as possible. I have everything linked together on my model spreadsheet, so if I change one variable it could change who I bet on and how much I bet, and basically could change a win to a loss. Since I had it all linked together, I just ran it daily, collected my data, and changed my weightings to get the best units won and winning percentage. I stopped messing with my variables in early May because I thought I had come up with the right ones, and they worked the rest of the season. So basically I'm hoping that I don't have to do much tweaking at the beginning of this year. I could theoretically base my model 100% on how a reliever did yesterday or how a hitter did last month lol, and I'd go with it if that's what gave me the best results. But what I came up with utilizes all my variables to some degree, so it makes sense in an actual game context.

  14. #14
    Brebos
    Join Date: 02-24-13
    Posts: 1,209
    Betpoints: 271

    Quote Originally Posted by matthew919
    As far as the steroids argument? That's meaningless. A guy who hits 50 HR a year on steroids should perform just as well as a guy who hits 50 HR a year NOT on steroids. A good model will not need to infer what chemical cocktails might be present in a hitter's bloodstream, any more than it should care what the hitter ate for breakfast.

    Lol, all MLB players are on steroids, so you definitely don't have to factor that in. Anyone who says otherwise is living in a fantasy world.
