1. #1
    Tackleberry
    Join Date: 12-01-10
    Posts: 441

    Best distribution model for pricing MLB team totals?

    I use the market prices for the moneyline and game total to derive the expected number of runs scored by each team. I am now stuck on what distribution model would work best for pricing team totals.

  2. #2
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    I'm not sure any distribution model would work well, aside from a simulation, because the home team won't always bat in the bottom of the 9th (or in extra innings). If you have a game with a fair line of away +102, home -102, the away team total should actually be a bit higher, because the away team will bat in the 9th inning 100% of the time while the home team will only bat in the bottom of the 9th a bit more than 50% of the time.
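
    To make the "fair line" in that example concrete, here is a minimal sketch that converts American odds to implied win probabilities. The function name american_to_prob is my own, and the conversion is the standard one with no vig removed; for +102 / -102 the two probabilities sum to 1, which is what makes the line fair.

```python
def american_to_prob(odds: int) -> float:
    """Convert American odds to the implied win probability (no vig removed)."""
    if odds > 0:
        return 100 / (odds + 100)   # underdog: risk 100 to win `odds`
    return -odds / (-odds + 100)    # favourite: risk `-odds` to win 100

away = american_to_prob(+102)       # 100 / 202
home = american_to_prob(-102)       # 102 / 202
print(f"away {away:.4f}, home {home:.4f}, sum {away + home:.4f}")
```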

  3. #3
    Tackleberry
    Tackleberry's Avatar Become A Pro!
    Join Date: 12-01-10
    Posts: 441

    Originally posted by skrtelfan:
    I'm not sure if any distribution model would work well aside from a simulation because of the issue that the home team won't always bat in the bottom of the 9th (or extra innings). If you have a game with a fair line of away +102 home -102, the away team total should actually be a bit higher because the away team will bat in the 9th inning 100% of the time while the home team will only bat in the bottom of the 9th a bit more than 50% of the time.

    This is something I account for. So because of this, the situation does not neatly fit the criteria for quickly plugging the numbers into a distribution model and getting an accurate result?

  4. #4
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    runs/inning, then work from there

  5. #5
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    Originally posted by Tackleberry:
    This is something I account for. So because of this, the situation does not neatly fit the criteria for quickly plugging the numbers into a distribution model and getting an accurate result?

    I can't be certain, but intuitively I think it's more difficult than something you can quickly plug in. I handwaved the other issue by simply saying "or extra innings", and you probably understand it, but in case it isn't clear to anyone else: the away team's scoring from the 9th inning on will be reasonably similar to its scoring in earlier innings, except that it may play specifically for one run, for example by sac bunting (a lot of teams sac bunt in earlier innings too, so that probably isn't a huge factor). The home team, however, will usually score only 1 run in any inning from the bottom of the 9th onward that it enters tied. I'm not really sure what kind of distribution, aside from a simulation, can model a scenario where the away team can score freely while the home team stops scoring once it leads by 1 run from the 9th inning onward, home runs with runners on base aside.

    I suppose you could ignore the 9th inning onward, use a distribution method to determine expected runs in the first 8 innings, and then account for the 9th inning and extra innings separately. I could also be wrong and it might be easier than I realize, but I was dealing with a similar issue recently and was inclined to believe that a simulation is the easiest way. I don't yet have the ability to perform such a simulation; I was thinking more theoretically.
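
    For anyone who wants to try the simulation route, here is a rough Monte Carlo sketch of the asymmetry described above. Everything in it is an assumption for illustration: the per-inning probabilities in INNING_PROBS are placeholder numbers, both teams use the same distribution, and the walk-off is capped at a one-run win (multi-run homers ignored).

```python
import random

# Hypothetical probabilities of scoring 0, 1, 2, 3, 4 runs in an inning.
INNING_PROBS = [0.73, 0.15, 0.07, 0.03, 0.02]

def draw_runs(rng):
    """Draw the runs scored in one half-inning."""
    return rng.choices(range(len(INNING_PROBS)), weights=INNING_PROBS)[0]

def simulate_game(rng):
    """Return (away_runs, home_runs) for one simulated game."""
    away = home = 0
    inning = 1
    while True:
        away += draw_runs(rng)
        # From the 9th on, the home team does not bat if it already leads.
        if inning >= 9 and home > away:
            return away, home
        runs = draw_runs(rng)
        if inning >= 9:
            # Walk-off rule: home stops scoring once it leads by one run.
            runs = min(runs, away - home + 1)
        home += runs
        if inning >= 9 and home != away:
            return away, home
        inning += 1

rng = random.Random(0)
games = [simulate_game(rng) for _ in range(20000)]
away_mean = sum(a for a, _ in games) / len(games)
home_mean = sum(h for _, h in games) / len(games)
print(f"away mean {away_mean:.2f}, home mean {home_mean:.2f}")
```

    With identical scoring distributions the away mean should come out higher than the home mean, which is the effect being discussed.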

  6. #6
    Tackleberry
    Join Date: 12-01-10
    Posts: 441

    I'm still stumped at the moment.
    To run a basic simulation I would need the probabilities of 0, 1, 2, 3, etc. runs being scored in an inning, and I'm not sure how to go about getting them. For simplicity's sake, let's just talk about innings 1 to 8. In the past I've been able to use either the Poisson or binomial distribution to come up with probabilities once I had the expected mean, but for obvious reasons neither of those applies here.

  7. #7
    uva3021
    Join Date: 03-01-07
    Posts: 537
    Betpoints: 381

    Sounds like a job for Retrosheet: you can go inning by inning, compare against the final score, and calculate the ratio of runs scored per inning to final game score.
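
    A minimal sketch of that tabulation, assuming you have already extracted line scores into plain lists. The line_scores values below are toy placeholders, not real Retrosheet data, and no Retrosheet file parsing is shown. It counts how often 0, 1, 2, ... runs are scored in innings 1 to 8, and each inning's share of all runs scored (one possible reading of the ratio mentioned above).

```python
from collections import Counter

# Toy line scores: runs per inning for one team, one list per game (placeholders).
line_scores = [
    [0, 0, 1, 0, 0, 2, 0, 0, 0],
    [0, 3, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
]

# Empirical distribution of runs per inning, innings 1 to 8 only.
counts = Counter(runs for game in line_scores for runs in game[:8])
innings = sum(counts.values())
freq = {k: counts[k] / innings for k in sorted(counts)}

# Share of all runs scored that came in each inning number.
by_inning = Counter()
for game in line_scores:
    for inning, runs in enumerate(game, start=1):
        by_inning[inning] += runs
all_runs = sum(by_inning.values())
share = {i: by_inning[i] / all_runs for i in sorted(by_inning)}

print(freq)
print(share)
```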

  8. #8
    buby74
    Join Date: 06-08-10
    Posts: 92
    Betpoints: 21207

    You need the Tango distribution. It was discovered by Tangotiger; see the very bottom of his website if you haven’t heard of him.
    I use a version of it in my power rating system. See the documents section of www.pointshare.webs.com for more info.

    Basically, innings where runs are scored follow a geometric series: for every inning where one run is scored there are r innings where 2 runs are scored, r squared innings where 3 runs are scored, and so on, with r around 0.45.
    The relationship breaks down for scoreless innings, so I use the formula s = 0.43 * rpi^0.662 (based on MLB team season data), where s is the proportion of innings in which a run is scored and rpi is a team’s runs per inning value.
    Then call the average runs per scoring inning T = rpi/s (because s*T = rpi by definition).
    The ratio between scoring innings is r = 1 - 1/T (a property of a geometric series).
    The proportion of innings where
    0 runs are scored = 1 - s
    1 run innings = s/T
    2 run innings = 1 run innings * r
    3 run innings = 2 run innings * r
    etc.

    So for 0.5 rpi, the distribution (runs in an inning, proportion of innings) is:
    0 73%
    1 15%
    2 7%
    3 3%
    4 1.4%
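
    The formulas above translate directly into a few lines of code. This is only a sketch: the function name tango_distribution and the max_runs cut-off are my own choices, and the tail beyond max_runs is simply dropped, so the probabilities sum to very slightly less than 1.

```python
def tango_distribution(rpi: float, max_runs: int = 15) -> list:
    """Per-inning run probabilities [P(0), P(1), ..., P(max_runs)] from the post's formulas."""
    s = 0.43 * rpi ** 0.662      # proportion of innings with at least one run
    T = rpi / s                  # average runs per scoring inning
    r = 1 - 1 / T                # ratio between successive scoring-inning terms
    probs = [1 - s, s / T]       # scoreless innings, then one-run innings
    for _ in range(2, max_runs + 1):
        probs.append(probs[-1] * r)
    return probs

probs = tango_distribution(0.5)
for runs, p in enumerate(probs[:5]):
    print(f"{runs} runs: {p:.1%}")
```

    For rpi = 0.5 this should land close to the rounded figures listed above, and the mean of the distribution comes back to rpi by construction.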

    Then you need to set up an Excel model working out the distribution of total scores, initially for 8.5 innings, then add the bottom of the 9th for games the home team hasn’t already won. I apply the walk-off rule for the bottom of the 9th so that the home team only wins by 1 run (I ignore the multi-run homer exception).

    I also model extra innings as a 1 inning game.

    Put these all together and that gives you the probability that each score will occur given two teams of known runs per inning value.
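
    Outside of Excel, the same build-up can be sketched in Python. This is my reading of the steps above, not the actual spreadsheet: the function names are made up, each inning is capped at 12 runs and renormalised, and "extra innings as a 1 inning game" is interpreted as a single deciding inning (tied extra innings add no runs).

```python
from collections import defaultdict

def tango(rpi, max_runs=12):
    """Per-inning run distribution from the formulas above, truncated and renormalised."""
    s = 0.43 * rpi ** 0.662
    T = rpi / s
    p = [1 - s, s / T]
    while len(p) <= max_runs:
        p.append(p[-1] * (1 - 1 / T))
    total = sum(p)
    return [x / total for x in p]

def convolve(p, q):
    """Distribution of the sum of two independent run counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def n_innings(p, n):
    out = [1.0]
    for _ in range(n):
        out = convolve(out, p)
    return out

def deciding_inning(a, h):
    """(away, home) runs added in extras, conditional on the inning not ending tied."""
    out = defaultdict(float)
    for j, pa in enumerate(a):
        out[(j, j + 1)] += pa * sum(h[j + 1:])   # home walks off by one run
        for k in range(j):
            out[(j, k)] += pa * h[k]             # away wins
    total = sum(out.values())                    # equals 1 - P(inning tied)
    return {score: v / total for score, v in out.items()}

def score_distribution(rpi_away, rpi_home):
    """Map (away_runs, home_runs) -> probability of that final score."""
    a, h = tango(rpi_away), tango(rpi_home)
    away9, home8 = n_innings(a, 9), n_innings(h, 8)
    extras = deciding_inning(a, h)
    dist = defaultdict(float)
    for x, px in enumerate(away9):
        for y, py in enumerate(home8):
            p = px * py
            if y > x:                            # home leads after 8.5 innings
                dist[(x, y)] += p
                continue
            d = x - y                            # home deficit entering the bottom of the 9th
            for k, pk in enumerate(h):
                if k < d:                        # home falls short
                    dist[(x, y + k)] += p * pk
                elif k == d:                     # tied after 9, go to extras
                    for (j, m), pe in extras.items():
                        dist[(x + j, x + m)] += p * pk * pe
                else:                            # walk-off, home wins by one
                    dist[(x, x + 1)] += p * pk
    return dist

dist = score_distribution(0.5, 0.5)
home_win = sum(p for (x, y), p in dist.items() if y > x)
away_mean = sum(x * p for (x, _), p in dist.items())
home_mean = sum(y * p for (_, y), p in dist.items())
away_over = sum(p for (x, _), p in dist.items() if x > 4.5)   # away team total over 4.5
print(f"home win {home_win:.3f}, away mean {away_mean:.2f}, home mean {home_mean:.2f}, away over 4.5 {away_over:.3f}")
```

    Summing the dictionary over one team's score gives that team's run distribution, which is what a team total price needs.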

    Then you use Solver to set the rpi values such that the winning percentage matches the moneyline for a given game and the median total runs scored matches the over-under.
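
    If you are not using Excel, the Solver step can be imitated with a nested bisection. The sketch below is deliberately simplified and is not the spreadsheet described above: both teams bat a full 9 innings (no walk-off truncation), ties are split using a single deciding inning, each inning is capped at 8 runs, and "median matches the over-under" is taken to mean that a half-run line has a 50% chance of going over. The function names, the search bounds, and the 0.55 / 8.5 inputs are all made up for illustration.

```python
def tango(rpi, max_runs=8):
    """Per-inning run distribution, truncated at max_runs and renormalised."""
    s = 0.43 * rpi ** 0.662
    T = rpi / s
    p = [1 - s, s / T]
    while len(p) <= max_runs:
        p.append(p[-1] * (1 - 1 / T))
    total = sum(p)
    return [x / total for x in p]

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def nine_innings(p):
    out = [1.0]
    for _ in range(9):
        out = convolve(out, p)
    return out

def model(rpi_away, rpi_home, line):
    """Return (home win probability, probability the game total goes over `line`)."""
    a, h = tango(rpi_away), tango(rpi_home)
    A, H = nine_innings(a), nine_innings(h)
    home_reg = sum(ph * sum(A[:y]) for y, ph in enumerate(H))    # home ahead after 9
    tie = sum(pa * ph for pa, ph in zip(A, H))
    inning_home = sum(ph * sum(a[:k]) for k, ph in enumerate(h))  # home wins a deciding inning
    inning_tie = sum(pa * ph for pa, ph in zip(a, h))
    home_win = home_reg + tie * inning_home / (1 - inning_tie)
    over = sum(p for runs, p in enumerate(convolve(A, H)) if runs > line)
    return home_win, over

def bisect(f, lo, hi, iters=20):
    """Approximate the root of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def calibrate(home_win_target, line):
    """Find (rpi_away, rpi_home) matching a fair home win probability and a fair total."""
    def home_share(total_rpi):
        def gap(w):
            return model(total_rpi * (1 - w), total_rpi * w, line)[0] - home_win_target
        return bisect(gap, 0.25, 0.75)

    def over_gap(total_rpi):
        w = home_share(total_rpi)
        return model(total_rpi * (1 - w), total_rpi * w, line)[1] - 0.5

    total_rpi = bisect(over_gap, 0.6, 2.0)
    w = home_share(total_rpi)
    return total_rpi * (1 - w), total_rpi * w

rpi_away, rpi_home = calibrate(home_win_target=0.55, line=8.5)
home_win, over = model(rpi_away, rpi_home, 8.5)
print(f"away rpi {rpi_away:.3f}, home rpi {rpi_home:.3f}, home win {home_win:.3f}, over {over:.3f}")
```

    The inner search splits a fixed combined rpi between the two teams to hit the win probability; the outer search moves the combined rpi until the total is fair. Swapping in a fuller game model only requires replacing model.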

    Once I have the game calibrated, I look to see if the runline is out of whack, but you could use it to get the distribution of runs scored by the home or road team.

    One word of warning: a method like this should produce results that match the run line or, in your case, the team scoring over/under. If it often doesn’t, or always favours the home team or the underdog, then the possibilities are:
    a) There is a systematic error in the prices
    b) You have made a typo in the spreadsheet
    c) The assumptions in the spreadsheet are incorrect
    Unfortunately, it is unlikely to be option a.

    Hope this helps

  9. #9
    Tackleberry
    Join Date: 12-01-10
    Posts: 441

    Very interesting and helpful. Look forward to giving this a go.

    Thanks buby

  10. #10
    A's Fan
    Join Date: 07-26-10
    Posts: 513
    Betpoints: 72

    Just a question regarding these: say, for example, you bet the White Sox team total over 4 at Pinnacle on last night's game, which was rained out in the 7th. Would that bet count as a loss, since it was an official win for the Yankees?

  11. #11
    skrtelfan
    Join Date: 10-09-08
    Posts: 1,913
    Betpoints: 3337

    Every book I've ever used requires a game go 9 innings, or 8.5 if the home team is ahead, for team totals to have action, just like regular totals have to go that long for action.
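
    As a tiny sketch, that rule can be written as a check. The function name and arguments are hypothetical, and individual books may word their rules differently; this only encodes the rule as described in this post.

```python
def team_total_has_action(innings_completed: float, home_ahead: bool) -> bool:
    """True if a team total bet stands under the 9 / 8.5 innings rule described above."""
    if innings_completed >= 9:
        return True
    # 8.5 innings is enough only when the home team is already ahead.
    return innings_completed == 8.5 and home_ahead

# A game stopped in the 7th would not have action under this rule.
print(team_total_has_action(7, home_ahead=False))
```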
