Help with sabermetrics and predictive stats...

  • metaldome
    SBR Rookie
    • 02-18-08
    • 22

    #1
    Help with sabermetrics and predictive stats...
    Every year around this time (football is over and I am anticipating the beginning of the baseball season), I try to evaluate how I handicap baseball and make improvements for the year to come.


    This year, I have been trying to learn some basic sabermetric theories, in an attempt to figure out which statistics are most predictive (do the best job of correlating with wins). I asked this question last year but either no one knew the answer or they were not willing to share. I hope this year people are more willing to help others.


    What I have found so far is that DIPS (Defense Independent Pitching Statistics) seems to be the best stat for evaluating pitching because it is not affected by outside factors over which a pitcher has no control. DIPS looks mainly at a pitcher's strikeouts, walks, and home runs allowed per inning. It is very counterintuitive to see that singles and doubles allowed (stats sometimes influenced by variables other than pitching) don't matter as much when attempting to predict future pitching results. ERA, on the other hand, can be greatly altered by stadium characteristics, the opposing team, luck, and defense.


    As far as offense, OPS (on-base percentage plus slugging percentage) seems to be a more indicative hitting statistic than the current default of BA (Batting Average). This is because BA counts hits but ignores power and walks, which are also important parts of an offense. About 87% of the difference in winning percentage across teams is explained by the OPS differential. How does this compare to other stats? For batting average the figure is 74%, for slugging percentage it is 79%, and for on-base percentage it is 85%. So OPS tops them all in its ability to explain team winning percentage.
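A minimal sketch of how OPS can be computed from raw counting stats, using the standard definitions of on-base percentage and slugging percentage. The function names and the sample season line are invented for illustration and are not from the post.

```python
def on_base_pct(hits, walks, hbp, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slugging_pct(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(singles, doubles, triples, home_runs, walks, hbp, at_bats, sac_flies):
    """OPS is simply OBP plus SLG."""
    hits = singles + doubles + triples + home_runs
    return (on_base_pct(hits, walks, hbp, at_bats, sac_flies)
            + slugging_pct(singles, doubles, triples, home_runs, at_bats))


# Hypothetical season line, invented for illustration only.
example = ops(singles=100, doubles=30, triples=5, home_runs=20,
              walks=60, hbp=5, at_bats=550, sac_flies=5)
print(f"OPS = {example:.3f}")
```

Because OPS adds two rates with different denominators, it is a convenience index rather than a true rate, which is one reason the component percentages are worth checking separately.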


    I have some other interesting facts to share if people are interested and will try to explain these theories more if anyone needs help understanding them (you can find the formulas and info on them by using a search engine), but I would also like some help with the following questions (or any informative sources/articles about these topics) from people that have been successful using these techniques...


    1) Are there better stats to follow?


    2) If OPS is the best offensive stat, is OPS Against (OPSA) a more predictive stat than DIPS for pitching?


    3) Can you use DIPS or OPSA successfully for a bullpen (group of pitchers rather than an individual)?


    4) What would you consider the most important component of baseball (how would you rank pitching, hitting, defense, and bullpens in terms of importance and can you back this up with stats)?


    5) How can I combine these offensive and pitching stats to come up with a projected score or winning percentage for a team that will help me cap games?


    6) How should I pick games (should I be betting any game where I feel the line presents value, even if it is slight or regardless of the line)?


    7) What is the highest amount of juice you are willing to lay on a wager?
    Last edited by metaldome; 02-24-09, 01:57 AM.
  • metaldome
    SBR Rookie
    • 02-18-08
    • 22

    #2
    Some more info you may find helpful and interesting...

    Home field advantage means much less in baseball than it does in other sports, with home teams winning only 54% of the time. Compare this to the NHL, where the home team comes out on top nearly 63% of the time, 60% of the time in the NBA, and about 58% in the NFL. (Note: This number is higher during interleague play, with around 57% of those games being won by the home team.)


    Additionally, while a simple analysis of the data that focuses only on the number of wins and losses by home teams and visiting teams supports the contention of a home-field advantage in baseball, a more sophisticated analysis indicates that a home-field advantage actually exists only in very close games. Home teams have a substantially higher probability (about 60%) of winning a game when the run differential between the winning team and losing team is one run. When the game is won by more than one run, there is virtually no difference between the probability that the home team or visiting team will win the game.


    All told, home teams win one run games about 17% of the time, while the visitors manage to win by one run just 11% of the time. The lower the total, the more likely a game will end as a one run game for the favorite. Part of this has to do with the fact that the home team bats last in the ninth inning and is further explained in the section about betting run lines.
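The figures quoted above can be cross-checked with a little arithmetic. This is only a sketch, and it assumes the 17% and 11% are shares of all games (home one-run wins and road one-run wins) and that the 54% overall home win rate applies to the same sample; the variable names are made up here.

```python
# Shares of ALL games, as read from the post (an interpretation, not a given).
home_one_run = 0.17   # home team wins by exactly one run
away_one_run = 0.11   # road team wins by exactly one run
home_overall = 0.54   # home team wins, any margin

# Home share of one-run games: should land near the "about 60%" quoted above.
home_share_of_one_run = home_one_run / (home_one_run + away_one_run)

# Everything else is decided by two or more runs.
multi_run_games = 1 - home_one_run - away_one_run
home_share_of_multi_run = (home_overall - home_one_run) / multi_run_games

print(f"home share of one-run games:   {home_share_of_one_run:.1%}")
print(f"home share of multi-run games: {home_share_of_multi_run:.1%}")
```

Under those assumptions the three numbers are at least internally consistent: roughly 61% in one-run games and a little over 51% in games decided by more than one run.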


    Another fact that many people find surprising is that the effects of travel (regardless of distance) seem to be statistically insignificant. There is a difference of about .13 runs between a team that has traveled and a team that has not, a relatively small number that does not have a significant effect on the outcome of a game.


    Any thoughts in these areas would be appreciated. The more we share ideas, the better off we are. We are not playing against each other, so we should be helping each other as best we can. I have spent a considerable amount of time on research and done my best to present this info (and back it up with actual statistics) in order to help others, so please share your thoughts and do the same. Thanks.
    Last edited by metaldome; 02-24-09, 01:53 AM.
    • MonkeyF0cker
      SBR Posting Legend
      • 06-12-07
      • 12144

      #3
      You are on the right track but you have a long way to go. In order to come to a probability or run total, you need to create some sort of statistical model. Multiple linear regression models or Markov Chain Monte Carlo simulation models are generally used to model MLB. These are fairly advanced mathematically, so chances are you would need to either study or review collegiate level algebra, calculus, and statistics in order to implement and test such a model. As far as your questions about betting go, essentially you bet a percentage of your bankroll (staking determined by Kelly) that depends on your edge, meaning the gap between the offered price and your estimated probability. There is no limit to what you would lay in terms of a favorite. If you have edge, you play it. Keep researching. If you are dedicated to it, you'll find the answers.
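A minimal sketch of Kelly staking for a moneyline bet, assuming American odds and a win probability that comes from your own model. The function names and the example price and probability are hypothetical; the post only says that staking is determined by Kelly.

```python
def american_to_decimal(american_odds):
    """Convert American odds (e.g. -150 or +120) to decimal odds."""
    if american_odds < 0:
        return 1 + 100 / -american_odds
    return 1 + american_odds / 100


def kelly_fraction(win_prob, american_odds):
    """Full-Kelly share of bankroll to stake; 0.0 when there is no edge."""
    b = american_to_decimal(american_odds) - 1   # net profit per unit staked
    f = (b * win_prob - (1 - win_prob)) / b      # classic Kelly: (bp - q) / b
    return max(f, 0.0)


# Hypothetical example: the model says 63% on a -150 favorite.
stake = kelly_fraction(0.63, -150)
print(f"stake {stake:.1%} of bankroll")
```

Many bettors stake only a fraction of the full Kelly amount, since the formula assumes the estimated probability is exactly right. Note that the price never caps the bet by itself: a heavy favorite is still a play whenever the fraction comes out positive.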
      • Dark Horse
        SBR Posting Legend
        • 12-14-05
        • 13764

        #4
        Sabermetrics is doing some interesting stuff, but are those stats (like DIPS) readily available? Because if they're not within easy reach, it's just academics. (yahoo has pitcher and batter stats, but not sure if they keep those up to date: http://sports.yahoo.com/fantasy/mlb/...erglossary-mlb)

        I may be interested in software that can download the game info from the web, do its DIPS etc calculations, and allow me to play around with the stats to find useful combinations. Sabermetrics software. Does that exist?
        Last edited by Dark Horse; 02-28-09, 03:01 PM.
        • spongerat
          SBR MVP
          • 10-01-08
          • 2023

          #5
          how about pitchers pitching on short rest? i seem to remember that being a factor
          • Thremp
            SBR MVP
            • 07-23-07
            • 2067

            #6
            You can calculate a DIPS stat from a box score, though it's completely useless. Same for isolated-game linear weights.
            • billyluke
              SBR Rookie
              • 02-02-09
              • 1

              #7
              i think this is a little better than dips.
              • waiverwire
                SBR High Roller
                • 03-08-09
                • 125

                #8
                You can definitely get FIP (which is a dumbed down approximation of DIPS) throughout the season, and probably xFIP (which is like FIP but normalizes home run rates using the assumption that all pitchers should allow the same rate of home runs per fly ball).
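For reference, a minimal sketch of the commonly published FIP formula. The additive constant is an assumption here: it is recalculated each season so that league FIP matches league ERA, and 3.10 is only a placeholder. Some early versions leave hit batters out. The helper names are invented for illustration.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings such as '200.1' (200 and 1/3) to a true number."""
    whole, _, thirds = str(ip_notation).partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def fip(home_runs, walks, hbp, strikeouts, innings, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    `constant` is a placeholder; look up the real value for the season in question.
    """
    return (13 * home_runs + 3 * (walks + hbp) - 2 * strikeouts) / innings + constant


# Hypothetical pitcher line, invented for illustration only.
example_fip = fip(home_runs=20, walks=50, hbp=5, strikeouts=180,
                  innings=innings_to_float("200"))
print(f"FIP = {example_fip:.2f}")
```

The innings conversion matters because box scores write one third of an inning as ".1" and two thirds as ".2", so the raw figure cannot be used as a decimal.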

                Kind of puzzled by the finding here that home field advantage in baseball is entirely in close games. Studies that have broken home field down by component statistic showed that the areas where home teams have an edge are walks, strikeouts, and especially triples. That doesn't seem entirely consistent.
                • Data
                  SBR MVP
                  • 11-27-07
                  • 2236

                  #9
                  Originally posted by metaldome
                  a more sophisticated analysis indicates that a home-field advantage actually exists only in very close games. Home teams have a substantially higher probability (about 60%) of winning a game when the run differential between the winning team and losing team is one run. When the game is won by more than one run, there is virtually no difference between the probability that the home team or visiting team will win the game.
                  Even more sophisticated analysis indicates that a home-field advantage actually exists only in games with many runs scored. Home teams have a substantially higher probability (about 95%) of winning a game when a home team scores 10 runs or more. When a home team scores 9 runs or fewer, there is virtually no difference between the probability that the home team or visiting team will win the game.






                  • Wrecktangle
                    SBR MVP
                    • 03-01-09
                    • 1524

                    #10
                    I guess I'm still puzzling over metaldome's point that HFA goes away unless it's a 1 run game...you have no prior knowledge that it will be 1 run except for a little higher likelihood when two defensive teams face each other...so it's still an average HFA for all games, right? ...no, excuse me, you do have park effects that you can quantify in MLB, so that must count for something.
                    • curious
                      Restricted User
                      • 07-20-07
                      • 9093

                      #11
                      The most predictive offensive stat is.....

                      Bases per out and its correlate, bases per run. Some people call this run production.

                      You have to use these for the starting lineup and not for the team as a whole. If you think that defensive player substitutions are likely in the latter part of the game and those players might get at bats you have to factor that in also.

                      This stat is pretty straightforward. The most important factor in scoring a run is getting a runner on base (duh). But some teams are more efficient at scoring runners than other teams, so what happens to those base runners after they reach base determines who wins and who loses. Of course, if one team is very efficient at scoring runners once they get on base, but is not very good at getting runners on base in the first place, then you have to account for that.

                      By determining how many base runners there will be, and knowing how many bases = 1 run, you can then predict the number of runs to be scored in a given game. The MLB average is 3.89 bases per run, but teams vary all over the place in how efficient they are at run production. I think you will find a correlation between last place teams and high bases per run averages.

                      There is a more detailed version of this approach which I have not found a name for, I just call it run contribution. The idea is to determine how much a player contributes to scoring a run by either reaching base or advancing other runners or both. The formula is a bit complicated but is intuitive if you start building it from scratch yourself and thinking through what is important in producing runs.

                      The most important factor in run scoring is the situation. A situation is: "runners on which bases and how many outs", for example "runners on first and second with two outs". The likelihood in scoring a run in that situation is much lower than in the situation "runners on second and third and no outs". So, we want hitters that contribute to improving the situation by moving runners over, reaching base themselves, and not contributing to an out. I know this sounds like "duh, of course", but coming up with the correct variables for a prediction model is easier if we keep the goal in mind of "improve the situation with each at bat".

                      For any given slot in the lineup, different characteristics are optimal in terms of run production. Lead-off hitters need to have a high on base percentage. Number two hitters need to hit lots of doubles, triples, and singles where the runner can advance two bases, while not grounding into double plays. A number 4 hitter needs to hit lots of home runs, triples and doubles.

                      Run production is determined by a combination of a batter reaching base himself, and how many bases did he move other runners over. Batters are penalized greatly for hitting into double plays and being caught stealing. Sacrifices are downplayed because while you did advance a runner, you made an out, and outs are a very scarce resource that must be protected at all costs. The value of an out cannot be stressed enough.

                      I penalize players for hitting into double plays and for being caught stealing. How many games have you watched where a team seemed to be starting a rally? First hitter up: single, "runner on first with no outs". Second hitter up: ground into double play, "no runners on with two outs". Grounding into double plays is especially devastating.

                      There is no "standard" way of determining run production, nor is there an agreed upon metric. Everyone that has taken this line of research has developed their own. Here is something like how I do it:
                      Bases:
                      single = 1
                      double = 2
                      triple = 3
                      home run = 4
                      walk = 1
                      HBP = 1
                      stolen base = 1
                      caught stealing = -1
                      sacrifices and sacrifice flies = 1
                      ground into double play = -1
                      Outs:
                      Outs = At-Bats - Hits + 2*CS + Sac Hits + Sac Flies + 2*GIDP

                      run production = bases / outs

                      Bases per out is another name for run production and if you have bases per out you can then arrive at runs per base and you can predict runs for a given upcoming game.
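The weights and the outs formula listed above can be sketched in a few lines. This follows the list as posted (including the doubled weight on caught stealing and double plays in the outs total); the `BattingLine` container, the function names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hbp: int
    stolen_bases: int
    caught_stealing: int
    sac_hits: int
    sac_flies: int
    gidp: int


def bases(line):
    """Bases gained, using the weights from the list above."""
    return (line.singles + 2 * line.doubles + 3 * line.triples + 4 * line.home_runs
            + line.walks + line.hbp + line.stolen_bases - line.caught_stealing
            + line.sac_hits + line.sac_flies - line.gidp)


def outs(line):
    """Outs = At-Bats - Hits + 2*CS + Sac Hits + Sac Flies + 2*GIDP."""
    hits = line.singles + line.doubles + line.triples + line.home_runs
    return (line.at_bats - hits + 2 * line.caught_stealing
            + line.sac_hits + line.sac_flies + 2 * line.gidp)


def run_production(line):
    """Bases per out."""
    return bases(line) / outs(line)


# Hypothetical season line, invented for illustration only.
hitter = BattingLine(at_bats=550, singles=100, doubles=30, triples=5, home_runs=20,
                     walks=60, hbp=5, stolen_bases=10, caught_stealing=4,
                     sac_hits=2, sac_flies=5, gidp=12)
print(f"bases per out = {run_production(hitter):.3f}")
```

To project a lineup, one approach is to sum bases and outs across the nine starters before dividing, rather than averaging nine ratios, so that hitters with more plate appearances carry more weight.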

                      Of course you have to do this for the starting pitcher as well. This metric is much better than WHIP because WHIP counts only walks and hits, not total bases. But in run production it is total bases that matter, not number of hits.

                      I use a more complex formula that weights things depending on the slot in the lineup that a given hitter will be in. For example, I weight extra base hits more for a #4 hitter. I don't really want my #4 hitter getting a walk, I want him driving in runs.

                      Here is a web site where this idea is discussed.


                      • Justin7
                        SBR Hall of Famer
                        • 07-31-06
                        • 8577

                        #12
                        Originally posted by Data
                        Even more sophisticated analysis indicates that a home-field advantage actually exists only in games with many runs scored. Home teams have a substantially higher probability (about 95%) of winning a game when a home team scores 10 runs or more. When a home team scores 9 runs or fewer, there is virtually no difference between the probability that the home team or visiting team will win the game.
                        Any stats you care to share?
                        • Data
                          SBR MVP
                          • 11-27-07
                          • 2236

                          #13
                          Originally posted by Justin7
                          Any stats you care to share?
                          What do you need? Those are the basic stats from the box scores.
                          • Wheell
                            SBR MVP
                            • 01-11-07
                            • 1380

                            #14
                            Data... I think I understand EXACTLY what you are saying and how you came to this conclusion and I am beginning to think you have a MASSIVE blind spot when it comes to statistical analysis.

                            In fact, I think I need to find a couple of stats books I've read and get them to you ASAP.
                            • Data
                              SBR MVP
                              • 11-27-07
                              • 2236

                              #15
                              Wheell, you did not get the sarcasm, I guess. I am very hurt hearing this from you.
                              • Wheell
                                SBR MVP
                                • 01-11-07
                                • 1380

                                #16
                                My bad, I saw your comment in Justin's post but not in your original post where someone who clearly is unfamiliar with the rules of baseball was commenting on home teams and 1 run games. My bad. I apologize.
                                • Wheell
                                  SBR MVP
                                  • 01-11-07
                                  • 1380

                                  #17
                                  Let me state this: Home field advantage manifests itself through umpire manipulation (reflected in strike outs and walks), specific knowledge of field effects (triples being one example, but watch a Twins game to see other effects), and specifically engineered field effects (think of excessively watering down the field before a team with a couple great base stealing threats came to town). There certainly are other areas where being at home provides an advantage but unique to baseball is an effect that the game simply ends after a certain period if the home team is ahead, and continues if the home team is not ahead. That produces some statistical oddities that can lead to some unusual, and frankly flat out ludicrous, conclusions.

                                  I should note that home teams also get another benefit of specific knowledge. In the bottom of the 9th they know exactly how many runs they need to win. The road team has this knowledge as well. It is presumed that the home team gets the better of the deal, although I am not certain that this is in fact the case.
                                  • Data
                                    SBR MVP
                                    • 11-27-07
                                    • 2236

                                    #18
                                    Well, Wheell, there must have been something coming from me in the past that made you jump on this like you did. I would appreciate if you had pointed that out, so I could reflect on that.
                                    • reno cool
                                      SBR MVP
                                      • 07-02-08
                                      • 3567

                                      #19
                                      Originally posted by metaldome
                                      Some more info you may find helpful and interesting...

                                      Home field advantage means much less in baseball than it does in other sports, with home teams winning only 54% of the time. Compare this to the NHL, where the home team comes out on top nearly 63% of the time, 60% of the time in the NBA, and about 58% in the NFL. (Note: This number is higher during interleague play, with around 57% of those games being won by the home team.)


                                      Additionally, while a simple analysis of the data that focuses only on the number of wins and losses by home teams and visiting teams supports the contention of a home-field advantage in baseball, a more sophisticated analysis indicates that a home-field advantage actually exists only in very close games. Home teams have a substantially higher probability (about 60%) of winning a game when the run differential between the winning team and losing team is one run. When the game is won by more than one run, there is virtually no difference between the probability that the home team or visiting team will win the game.


                                      All told, home teams win one run games about 17% of the time, while the visitors manage to win by one run just 11% of the time. The lower the total, the more likely a game will end as a one run game for the favorite. Part of this has to do with the fact that the home team bats last in the ninth inning and is further explained in the section about betting run lines.


                                      Another fact that many people find surprising is that the effects of travel (regardless of distance) seem to be statistically insignificant. There is a difference of about .13 runs between a team that has traveled and a team that has not, a relatively small number that does not have a significant effect on the outcome of a game.


                                      Any thoughts in these areas would be appreciated. The more we share ideas, the better off we are. We are not playing against each other, so we should be helping each other as best we can. I have spent a considerable amount of time on research and done my best to present this info (and back it up with actual statistics) in order to help others, so please share your thoughts and do the same. Thanks.
                                      if this is true, regardless of the reasoning, wouldn't this imply a consideration for road teams at -1.5? assuming the price would be no different for H or R teams but just based on ml.
                                      bird bird da bird's da word
                                      • chrisharvard01
                                        Restricted User
                                        • 10-24-08
                                        • 2943

                                        #20
                                        Using statistics and applying them to working formulas is a powerful weapon. I hope to learn more.

                                        The books have their own formulas; the goal is to be sharper than the books.

                                        TY guys, this is a very informative thread.
                                        • Justin7
                                          SBR Hall of Famer
                                          • 07-31-06
                                          • 8577

                                          #21
                                          Originally posted by Data
                                          What do you need? Those are the basic stats from the box scores.
                                          Regarding your assertion on home field advantages...

                                          3 groups of games to look at. For each, how often does the home team win?
                                          #1: All games.
                                          #2: All games where total runs scored is less than 9.5
                                          #3: All games where total runs scored is more than 9.5?

                                          I find it counterintuitive that #3 should be the highest as you seem to claim.
                                          • Data
                                            SBR MVP
                                            • 11-27-07
                                            • 2236

                                            #22
                                            Originally posted by Justin7
                                            Regarding your assertion on home field advantages...

                                            3 groups of games to look at. For each, how often does the home team win?
                                            #1: All games.
                                            #2: All games where total runs scored is less than 9.5
                                            #3: All games where total runs scored is more than 9.5?

                                            I find it counterintuitive that #3 should be the highest as you seem to claim.
                                            No, sir, I never did claim that.
                                            • Justin7
                                              SBR Hall of Famer
                                              • 07-31-06
                                              • 8577

                                              #23
                                              Originally posted by Data
                                              No, sir, I never did claim that.
                                              Originally posted by Data
                                              Even more sophisticated analysis indicates that a home-field advantage actually exists only in games with many runs scored. Home teams have a substantially higher probability (about 95%) of winning a game when a home team scores 10 runs or more. When a home team scores 9 runs or fewer, there is virtually no difference between the probability that the home team or visiting team will win the game.
                                              I guess I misunderstood you.

                                              If BOTH home and visitor score under 9.5 runs, what is the breakdown? By eliminating the high scoring home games, all you are doing is removing a portion of the game. I could probably show a "visitor advantage" in all games where the home team scores 5.5 runs or less, as long as I ignore the visitor's offense.

                                              Similarly, a "visitor's advantage" probably exists in all games where they score more than the mean - i.e. 5 runs or more. Or even 2 runs or more, since you're removing poor games from the database.
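The selection effect described above is easy to reproduce. The sketch below simulates games between two identical teams with no home-field advantage at all, then splits the results by how many runs the home team scored. Everything here is an assumption made for illustration: Poisson-distributed runs with a mean of 4.5, ties settled by a coin flip in place of extra innings, and the function names themselves.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n_games=100_000, mean_runs=4.5, cutoff=10, seed=1):
    """Home win rates: overall, home scored >= cutoff, home scored < cutoff."""
    rng = random.Random(seed)
    wins = {"all": 0, "high": 0, "low": 0}
    games = {"all": 0, "high": 0, "low": 0}
    for _ in range(n_games):
        home = poisson(mean_runs, rng)
        away = poisson(mean_runs, rng)
        # Ties are settled by a coin flip as a crude stand-in for extra innings.
        home_won = home > away or (home == away and rng.random() < 0.5)
        bucket = "high" if home >= cutoff else "low"
        for key in ("all", bucket):
            games[key] += 1
            wins[key] += home_won
    return {key: wins[key] / games[key] for key in games}


if __name__ == "__main__":
    for label, rate in simulate().items():
        print(f"{label:>4}: home wins {rate:.1%}")
```

Even with evenly matched teams, the "home scored 10 or more" bucket shows the home side winning nearly every game, while the remaining bucket dips just below the overall rate. The split says nothing about home-field advantage; it only reflects conditioning on one team's score.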
                                              • Data
                                                SBR MVP
                                                • 11-27-07
                                                • 2236

                                                #24
                                                Justin7, hello, SBR poster Data's here. Today, I am going to try explaining the mystery behind the post #9. I was not actually trying to make an absurd point. I created an artificial and methodologically wrong breakdown as a mockery for another absurd point that I had quoted. That is why I called it sarcasm in message #15. That's it for today. You can reach me via email...
                                                • reno cool
                                                  SBR MVP
                                                  • 07-02-08
                                                  • 3567

                                                  #25
                                                  Originally posted by Wheell
                                                  Let me state this: Home field advantage manifests itself through umpire manipulation (reflected in strike outs and walks), specific knowledge of field effects (triples being one example, but watch a Twins game to see other effects), and specifically engineered field effects (think of excessively watering down the field before a team with a couple great base stealing threats came to town). There certainly are other areas where being at home provides an advantage but unique to baseball is an effect that the game simply ends after a certain period if the home team is ahead, and continues if the home team is not ahead. That produces some statistical oddities that can lead to some unusual, and frankly flat out ludicrous, conclusions.

                                                  I should note that home teams also get another benefit of specific knowledge. In the bottom of the 9th they know exactly how many runs they need to win. The road team has this knowledge as well. It is presumed that the home team gets the better of the deal, although I am not certain that this is in fact the case.
                                                  I did look into the idea of parlaying home teams with the under and road teams with the over at one point. I found that the overs are generally overbet,(or not a good bet). If I remember correctly the under parlayed with a home team showed promise, especially when the line was high.
                                                  bird bird da bird's da word
                                                  • Dark Horse
                                                    SBR Posting Legend
                                                    • 12-14-05
                                                    • 13764

                                                    #26
                                                    Originally posted by Data
                                                    I was not actually trying to make an absurd point. I created an artificial and methodologically wrong breakdown as a mockery for another absurd point that I had quoted. That is why I called it sarcasm in message #15.
                                                    • Peep
                                                      SBR MVP
                                                      • 06-23-08
                                                      • 2295

                                                      #27
                                                      When the game is won by more than one run, there is virtually no difference between the probability that the home team or visiting team will win the game.
                                                      Out of some 28,000 games, I have 10,628 home teams winning by more than one run and 10,264 road teams winning by more than one run. So just counting games won or lost by more than one run, the home team wins about 50.9% of these.
                                                      • Wrecktangle
                                                        SBR MVP
                                                        • 03-01-09
                                                        • 1524

                                                        #28
                                                        Folks, a few points. Firstly, I do football and baskets only so I'm more than a little thick when it comes to bases. But I do find the discussion of where MLB HFA comes from a little like the explanations of quantum mechanics: to wit, perhaps the observers, by simply observing, are bending the results...BUT that said, I find this discussion fascinating and I applaud the points made in this thread.

                                                        Second, I really liked curious' avatar (not to mention his nicely laid out post), and I hope he reinstates it.