Baseball Formula Question

  • FormulaMan
    SBR Rookie
    • 03-03-09
    • 1

    #1
    Baseball Formula Question
    Hey guys,
    First, I apologize for the length of the post, but I am stumped and need your help.
    I have developed a very complicated formula for bases this year that basically gives me the number of runs a team should give up per game and the number it should score.

    However here is where i am stuck.

    Let's say that my formula is telling me that the starting pitcher for the Yankees on an average night will give up a run per 2.1 innings pitched (box-score notation for 2 1/3 innings).
    Let's also say that on average their bullpen will give up a run per 2.0 innings pitched.
    Also, on an average night the starting pitcher will pitch 7.0 innings.
    Since the Yankees are the home team, they will pitch 9 innings.
    Starting pitcher = 7 innings / (2 1/3 innings per run) = 3 runs
    Bullpen = 2 innings / (2.0 innings per run) = 1 run
    So therefore the Yankees should give up on average 4 runs per game.
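A minimal sketch of the arithmetic above, in Python. It assumes that "2.1 innings" is box-score notation for 2 1/3 innings, since that is what makes the starter's share come out to exactly 3 runs; the function names are illustrative and not part of the original formula.

```python
def innings_to_decimal(ip):
    """Convert box-score innings (7.0, 2.1, 2.2) to true innings.

    Assumption: the digit after the point counts outs, so .1 = 1/3 and .2 = 2/3.
    """
    whole = int(ip)
    outs = round((ip - whole) * 10)
    if outs not in (0, 1, 2):
        raise ValueError("innings must end in .0, .1 or .2")
    return whole + outs / 3


def expected_runs_allowed(starter_ip, starter_ip_per_run,
                          bullpen_ip_per_run, game_innings=9):
    """Starter's expected runs plus the bullpen's over the remaining innings."""
    starter_innings = innings_to_decimal(starter_ip)
    bullpen_innings = game_innings - starter_innings
    starter_runs = starter_innings / innings_to_decimal(starter_ip_per_run)
    bullpen_runs = bullpen_innings / innings_to_decimal(bullpen_ip_per_run)
    return starter_runs + bullpen_runs


# Yankees example from the post: 7.0 IP starter, one run per 2.1 IP,
# bullpen one run per 2.0 IP, nine innings pitched at home.
print(round(expected_runs_allowed(7.0, 2.1, 2.0), 2))  # 4.0
```

If 2.1 were read as a plain decimal instead, the starter's share would be about 3.33 runs and the total about 4.33.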

    That's the first half.
    The second is the batting of the Red Sox.
    Let's say that my formula gives me that the Sox should score an average of 6 runs per game...
    Now any person who can count would tell you that on average the Sox should score 5 runs in this game ((4 runs from Yankees pitching + 6 runs from Sox offense) / 2).
    BUT... Do you think that more of the percentage should be given to the pitching or the offense?
    In my opinion, it should be about 67% pitching to 33% offense, or basically 2 to 1 on the pitching.
    In this case the Sox should score on average 4.67 runs.
    ((4 runs x 2 + 6 runs x 1) / 3 = 14 runs / 3 = 4.67)
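The two blends above are the same weighted average with different weights. A short Python sketch (the function and parameter names are illustrative):

```python
def blend_runs(pitching_runs, offense_runs,
               pitching_weight=2.0, offense_weight=1.0):
    """Weighted average of runs allowed by the pitching and runs scored by the offense."""
    total_weight = pitching_weight + offense_weight
    return (pitching_weight * pitching_runs
            + offense_weight * offense_runs) / total_weight


print(round(blend_runs(4, 6, 1, 1), 2))  # equal weights: 5.0
print(round(blend_runs(4, 6, 2, 1), 2))  # 2-to-1 on pitching: 4.67
```

Changing the two weights is all it takes to try any other pitching-to-offense split.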

    My question to you, what percentage should the pitching be given compared to the offense?

    Also, how much factor would you put into homefield?

    Thanks guys
  • Willie Bee
    SBR Posting Legend
    • 02-14-06
    • 15726

    #2
    Up to each person how much they want to weight either pitching or hitting in any formula. As for home field, you can crunch those numbers and see team averages home and away in various parks.

    Should also not look at the team average so much as the lineup average on a given night. Boston might average 5.34 runs per game as a team, but is Youkilis or Pedroia or Papi out of the lineup on a specific night? Same thing with pitching, is a particular reliever not going to pitch tonight because he's pitched the last three nights?
    Comment
    • EBone
      SBR MVP
      • 08-10-05
      • 1787

      #3
      A good question, FormulaMan. I too have gone back and forth on questions like this.

      I think the home field question is easier. To factor for home field, I have started to use an inverse type of adding for home field. Team A is at home with a home winning percentage of .600; Team B is on the road with a road winning percentage of .400...take .600/.400 =1.5 runs; that is how I'm starting to account for home field. Sometimes, it could be the other way around with a better road team playing an awful home team then the advantage would be with the road team. This is, at least, a good starting point. I usually don't use the visiting-home field data until after the 20th game of the season.

      I don't think there is an easy answer to your 1st question. I respectfully disagree with Willie Bee on his assessment of taking the lineup average for projecting runs scored on a given night. Many managers are playing matchups and career numbers versus a given team on a given night so I don't necessarily buy that theory. Even on get away days for road teams, the guys that are inserted are generally in there to prove whether they are competent backups or not. They are usually highly motivated to perform at a level comparable (maybe not the same) as the guy they are replacing. The other part to this is if you have a fragile mentality of a pitcher out there with a "get away" day lineup. Sometimes, these starters are not into it because they are being placed with the scrubs. With not much data to back me at this time, I find that "OVER's" are pretty good plays in these situations. But, to answer your question, I think the answer may be in the projected number of innings the starter goes. If he's a consistent 6, then maybe it should be 2:1. If he's Cole Hamels, maybe it should be higher.

      Now the bullpen argument I think is a good one. Normally, knowing who is and who isn't available on a given night to pitch out of the bullpen is a pretty valuable piece of information especially if you have a good idea of how many runs and innings the starter is going to go.


      E
      Comment
      • Willie Bee
        SBR Posting Legend
        • 02-14-06
        • 15726

        #4
        Goes to show there are so many ways to look at the issue, EBone. But I see no reason using a team scoring average if you know the big stick on that team is going to have a game off. If the Dodgers average 5.0 runs with Manny in the lineup, and 4.7 runs when he's out, why factor in the 5.0 to your capping for that game? Obviously when a huge bat in the order is out, that has to come into play for the game, right?
        Comment
        • EBone
          SBR MVP
          • 08-10-05
          • 1787

          #5
          Obviously, that would be the case as you have put it above and I don't argue that point at all. Numbers are numbers. We have to start somewhere and black-and-white concrete numbers are fantastic starting points.


          I guess, for me, it depends upon the situation. So many of us are trying to think of a way to apply a systematic cookie-cutter approach to capping these games and, in a case like Papi being out for the Sox or Manny being out for the Dodgers, I just don't think it is that simple. Believe me, I want it to be that simple but I don't think it is. In your example, if the Dodgers are averaging 5 rpg with Manny and 4.7 rpg without Manny, I would default to the 5 but look at the situation. If Manny has recently gone on the DL, I'd probably use the 5 for a few games until the newness of the replacement starter wore thin. Then, I'd move to the 4.7. If this is a spot start for someone, I think I would use the 5 unless there is a reason to think that the opposing bullpen is fresh or the starting pitcher for that day is improving to the mean after a couple of bad starts or some other angle. If situationally, the opposition looks likely to have a good day then I use the 4.7 for the Dodgers rpg.

          My thought on the OVERs on get-away days is that the total overcompensates for the big sticks not being in the lineup. It always seems to me that there are good moneyline and total opportunities on these afternoon get-away games, especially during the week.

          My point in my original reply was that the manager's moves throughout the season really dictate the team rpg and the runs given up per game over a long run of games. It is all part of it but, yes, the lineup average would have to be considered a factor when the big boys are out of the lineup depending upon the type of situation from my standpoint.

          It may be that I'm entirely overthinking it. I have a tendency to do that but I guess that's gambling.



          E
          Comment
          • MonkeyF0cker
            SBR Posting Legend
            • 06-12-07
            • 12144

            #6
            I agree with Willie. You will get better predictive results by handicapping with the players in the lineup and the pitchers. One thing the OP could look into is the splits as well (how well the players in the lineup hit LHP, RHP, etc.) and historically, how much of an effect that pitcher has had on the average expected run totals of the teams that he has faced throughout the season. This would make your formula a bit more complex, however.
            Comment
            • evo34
              SBR MVP
              • 11-09-08
              • 1032

              #7
              Pitching and hitting should be weighted equally, but if you are using historical avg. runs/game, you are in trouble. You need to find out what predicts future run scoring for both hitting and pitching, rather than what has occurred thus far. And the lineup plays a big role in that, as Willie points out.

              FWIW, I totally disagree with EBone's methodology. Assuming that some teams are great at home or terrible at home based on season-to-date averages is just buying into noise. (About 6-10 weeks into the season especially is when you can make money off of people who are overweighting STD data (raw runs scored/allowed and wins/losses)). You need to use a standardized HFA tweaked mildly with whatever team-specific adjustment you think is best (usually based on several seasons of past data).

              Similarly, adjusting for a lineup change by looking at avg. rpg with player in vs. out is far too simplistic. The samples will always be too small to be accurate. You need to use predictive stats to figure out how many runs that player is worth per game, vs. his sub.
              Comment
              • Justin7
                SBR Hall of Famer
                • 07-31-06
                • 8577

                #8
                I'd look closely at your assumptions.

                If this Boston lineup averages 6 runs a game, and this NYY Pitcher with this bullpen allows 4, taking the average might not be the best approach.

                What does a typical AL team score/allow? If it is 4 runs per game, NYY is average, and I'd expect Boston to score 6. If it is 6, NYY is the "dominant factor", and I'd expect Boston to score 4 runs.

                Regardless of what weighting you use, I'd suggest you normalize. Turn each team into a "multiplier" - i.e. 6 runs vs 4 run average would be 1.5. To estimate runs, multiply these two multipliers (in the average 4 example - this would be 1.5 and 1), and multiply by the conference average.

                If you want to weight pitching and hitting differently, you can still use this formula. Let's say you decide that hitting is twice as important as pitching. When you normalize, you use exponents that add up to 2 and have the ratio you want. In this example, you would use

                (Pitching multiplier) ^ 0.667 * (Hitting multiplier) ^ 1.333 * (Conference average expected runs/game).
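As a rough sketch of the multiplier approach in Python: each side is divided by the league ("conference") average to get a multiplier, the multipliers are raised to their exponents and multiplied, and the result is scaled back by the average. The function and parameter names are assumptions for illustration only.

```python
def projected_runs(offense_rpg, pitching_rpg_allowed, league_rpg,
                   hitting_exp=1.0, pitching_exp=1.0):
    """Normalize both sides to the league average, then multiply."""
    hitting_mult = offense_rpg / league_rpg
    pitching_mult = pitching_rpg_allowed / league_rpg
    return (pitching_mult ** pitching_exp) * (hitting_mult ** hitting_exp) * league_rpg


# Boston scores 6, NYY allows 4.
print(round(projected_runs(6, 4, league_rpg=4), 2))  # league average 4: 6.0
print(round(projected_runs(6, 4, league_rpg=6), 2))  # league average 6: 4.0

# Hitting weighted twice as heavily as pitching (exponents sum to 2).
weighted = projected_runs(6, 4, league_rpg=4,
                          hitting_exp=1.333, pitching_exp=0.667)
print(round(weighted, 2))
```

With equal exponents of 1 this reproduces the two cases described above: 6 runs when the league average is 4, and 4 runs when it is 6.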

                Good questions, and good luck.
                Comment
                • waiverwire
                  SBR High Roller
                  • 03-08-09
                  • 125

                  #9
                  I'm not the first one to say this, but definitely don't assume that extreme home field advantage is something that will persist for a team. I would guess that a portion of it is consistent (some teams are built for their home park), but that in general you'd do better assuming that all teams benefit the same amount from playing at home.

                  Definitely better to use offensive projections based on the lineup than the team. Ideally those offensive projections would also factor in lefty/righty matchups. But like the home/away factors, you're generally going to do better assuming that all hitters have the same platoon advantage/disadvantage than using their historical splits. The same can't be said for pitchers, where the degree of platoon advantage does seem to vary widely (and consistently). There's lots of good material on this on the internet, and a good chapter on it in 'The Book' (by Lichtman, Tango, and some other guy).
                  Comment
                  • Justin7
                    SBR Hall of Famer
                    • 07-31-06
                    • 8577

                    #10
                    Originally posted by waiverwire
                    There's lots of good material on this on the internet, and a good chapter on it in 'The Book' (by Lichtman, Tango, and some other guy).
                    "The Book" is amazing. It's worth the price just on platoon splits. That, and info on live betting.
                    Comment
                    • waiverwire
                      SBR High Roller
                      • 03-08-09
                      • 125

                      #11
                      Originally posted by Justin7
                      "The Book" is amazing. It's worth the price just on platoon splits. That, and info on live betting.
                      The one part of it that I want to see MUCH more research on and a more thorough explanation of their methodology on the work they did is the 'hitter type' vs. 'pitcher type' research. It's counterintuitive to me that a 'fly ball hitter' (which I would have assumed most home run hitters are) does better against a ground ball pitcher. The model I use for daily fantasy baseball ratings actually assumes the opposite, but my historical data aren't clean enough for me to check whether it would have done even better without that assumption.
                      Comment
                      • Data
                        SBR MVP
                        • 11-27-07
                        • 2236

                        #12
                        Originally posted by waiverwire
                        you're generally going to do better assuming that all hitters have the same platoon advantage/disadvantage than using their historical splits.
                        This sounds counter-intuitive. Do you have any reference for this?
                        Comment
                        • Dark Horse
                          SBR Posting Legend
                          • 12-14-05
                          • 13764

                          #13
                          Does anybody know the league average for innings pitched per game by starting pitchers?
                          Comment
                          • waiverwire
                            SBR High Roller
                            • 03-08-09
                            • 125

                            #14
                            Originally posted by Data
                            This sounds counter-intuitive. Do you have any reference for this?
                            'The Book' by Lichtman, Tango, and other dude (Dolphin?) covers it well. There are also numerous articles online discussing platoon differentials and whether there is an innate 'skill' to it. The studies have all found that for righty hitters there doesn't seem to be. For lefties there does...but not that much. For pitchers, there definitely IS a repeatable skill. I think anybody who tries to handicap baseball without reading 'The Book' (or online material covering the same topics) is going to get eaten alive.
                            Comment
                            • Data
                              SBR MVP
                              • 11-27-07
                              • 2236

                              #15
                              I have read a lot from their website but must have missed this somehow. Thanks.
                              Comment
                              • curious
                                Restricted User
                                • 07-20-07
                                • 9093

                                #16
                                Originally posted by FormulaMan
                                Hey guys,
                                Thats the first half.
                                The second is the batting of the Red Sox.
                                Lets say that my formula gives me that the Sox should score an average of 6 runs per game...
                                First, team runs per game is a useless stat.

                                Second, predicting runs scored requires more variables than you are trying to use.

                                Third, you cannot just look at pitching and hitting; you also have to look at the defense and, fourth, the characteristics of the park.

                                Each park has idiosyncrasies that allow players who know the idiosyncrasies to take advantage and do things like hit triples which would be singles in other parks, or get an infield hit which would be an easy out in another park, or get a ball through the infield which would be an easy out somewhere else.

                                When I say look at the defense I mean you have to know the range of each defensive player and know how many balls hit to their part of the park will be outs vs hits. I'm not even concerned about errors here. Let's say you have Brooks Robinson playing 3rd base and you have a pitcher that gives up lots of line drives down the third base line, or you have 3 hitters in the opposing lineup who like to hit line drives down the 3rd base line. These will all be outs if Brooks is playing 3rd. Replace Brooks with Miguel Cabrera at 3rd against these same line drive hitters and it is a totally different story. Those 5-6 sure outs per game are now 5-6 extra base hits.

                                For offense you can't just look at the raw averages of the hitters in the lineup. You have to use lineup analysis and look at the characteristics of each hitter in his slot in the lineup. Each slot in the lineup is optimal given a different set of characteristics.

                                What you want to know about the offense is bases per out and bases per run. The leadoff hitter needs to get on base and then steal a base without being caught stealing. Better if he gets on with a double or a triple than a single or a walk but we'll take the single or walk if he steals second.

                                Predicting runs now becomes a weighted formula where you include:
                                Pitcher's bases per out and bases per run
                                Defense bases allowed (over or under expectation)
                                Offense bases per out
                                Offense bases per run

                                This isn't as difficult as my terrible explanation has made it sound.

                                I'll give an example of why you have to look at these factors more closely. Let's say you have a lead off hitter who has a high OBP but gets lots of walks, meaning his batting average is mediocre, but that is okay because we just want him to get on base. In a typical lineup vs a typical pitcher he will probably contribute to runs being scored. Now put that hitter up against Bill Fischer. That hitter isn't getting any walks. Actually NO ONE on the team is getting any walks. So, you can throw those high OBP % out the window. The manager better have changed his lineup to include the best batting averages on the team and forget OBP.

                                Anyway, I am done rambling.

                                The truth is out there.
                                Comment
                                • Willie Bee
                                  SBR Posting Legend
                                  • 02-14-06
                                  • 15726

                                  #17
                                  Originally posted by curious
                                  Third, you cannot just look at pitching and hitting you also have to look at the defense and, fourth, the characteristics of the park. Each park has idiosyncrasies that allow players who know the idiosyncrasies to take advantage and do things like hit triples which would be singles in other parks...
                                  Not sure I understand the "triples that would be singles in other parks," curious. Got the bit about the different layout of some parks where balls can take funny bounces off a wall or come out of a corner crazy. But I'm guessing that 90% of the time, you'd have to be one slow-ass sum'bitch not to get two when a ball goes to the wall. Certainly with the left field walls in Boston and Houston, to name two, a carom off the wall smartly played can result in an outfielder holding a runner to one base instead of two (or three).

                                  But what parks can you really turn a triple into a single, and vice-versa, assuming the outfielder doesn't just muck the works up?
                                  Comment
                                  • Willie Bee
                                    SBR Posting Legend
                                    • 02-14-06
                                    • 15726

                                    #18
                                    formulawiz, I'm going to need a little time to read through your e-mail attachments before I get back to you.
                                    Comment
                                    • smitch124
                                      SBR Posting Legend
                                      • 05-19-08
                                      • 12566

                                      #19
                                      well early on at Pac-Bell/AT & T triples were turned into singles by playing way off the right field line, took some teams a long time to catch on....
                                      Comment
                                      • Data
                                        SBR MVP
                                        • 11-27-07
                                        • 2236

                                        #20
                                        Originally posted by Willie Bee
                                        formulawiz, I'm going to need a little time to read through your e-mail attachments before I get back to you.
                                        Willie Bee, this thread is started by FormulaMan. formulawiz is another SBR poster. You seem to be mixing them up.
                                        Comment
                                        • pavyracer
                                          SBR Aristocracy
                                          • 04-12-07
                                          • 82673

                                          #21
                                          Does your formula take into account the mental and physical toughness of pitchers?
                                          Does your formula take into account pitcher injuries that are not revealed to the public?
                                          Does your formula take into account bad bullpen sessions by pitchers the day before that are not revealed to the public?
                                          Does your formula take into account the opposing team stealing the signs of the catcher/pitcher or managers with the use of instant replay or other cheating devices?
                                          Does your formula take into account weather conditions or field conditions (Wrigley Field wind direction or Turner Field grass not cut on Lowe's starts, for example)?

                                          If not then my answer is it will not work in the long run based purely on stats.
                                          Comment
                                          • Willie Bee
                                            SBR Posting Legend
                                            • 02-14-06
                                            • 15726

                                            #22
                                            Originally posted by Data
                                            Willie Bee, this thread is started by FormulaMan. formulawiz is another SBR poster. You seem to be mixing them up.
                                            Doh! Sorry about that, even though I will get back to formulawiz at some point.
                                            Comment
                                            • Willie Bee
                                              SBR Posting Legend
                                              • 02-14-06
                                              • 15726

                                              #23
                                              Originally posted by pavyracer
                                              Does your formula take into account the mental and physical toughness of pitchers?
                                              Does your formula take into account pitcher injuries that are not revealed to the public?
                                              Does your formula take into account bad bullpen sessions by pitchers the day before that are not revealed to the public?
                                              Does your formula take into account the opposing team stealing the signs of the catcher/pitcher or managers with the use of instant replay or other cheating devices?
                                              Does your formula take into account weather conditions or field conditions (Wrigley Field wind direction or Turner Field grass not cut on Lowe's starts, for example)?

                                              If not then my answer is it will not work in the long run based purely on stats.
                                              Do you have a formula that takes all of that into account pavy? Details please.
                                              Comment
                                              • tweek
                                                SBR Hustler
                                                • 02-17-09
                                                • 60

                                                #24
                                                Originally posted by Justin7
                                                I'd look closely at your assumptions.

                                                If this Boston lineup averages 6 runs a game, and this NYY Pitcher with this bullpen allows 4, taking the average might not be the best approach.

                                                What does a typical AL team score/allow? If it is 4 runs per game, NYY is average, and I'd expect Boston to score 6. If it is 6, NYY is the "dominant factor", and I'd expect Boston to score 4 runs.

                                                Regardless of what weighting you use, I'd suggest you normalize. Turn each team into a "multiplier" - i.e. 6 runs vs 4 run average would be 1.5. To estimate runs, multiply these two multipliers (in the average 4 example - this would be 1.5 and 1), and multiply by the conference average.

                                                If you want to weight pitching and hitting differently, you can still use this formula. Let's say you decide that hitting is twice as important as pitching. When you normalize, you use exponents that add up to 2 and have the ratio you want. In this example, you would use

                                                (Pitching multiplier) ^ 0.667 * (Hitting multiplier) ^ 1.333 * (Conference average expected runs/game).

                                                Good questions, and good luck.
                                                Justin,

                                                Can you talk a little bit about the rationale for the formula? The weights seem to make sense... I'm just struggling a bit with the multiplication in general. What's the "theory" behind why you would want to multiply team A's predicted batting runs scored by team B's predicted pitching runs allowed?
                                                Comment
                                                • Justin7
                                                  SBR Hall of Famer
                                                  • 07-31-06
                                                  • 8577

                                                  #25
                                                  It's "normalization", and you can do it in all sports. Taking the geometric average... If a team is 10% above average on offense, and the opposing defense is 10% superior, any average will work.

                                                  But what if team A's offense is 10% below average, and B's defense allows 10% fewer points? A straight average would expect A to score 10% less, but they will score much less than that. This is where normalizing - taking a geometric mean - works better.
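To put numbers on those two cases, here is a small Python sketch. It assumes each side is expressed as a multiplier of the league scoring average, so an offense 10% above average is 1.1 and a defense that allows 10% fewer points is 0.9; the function names are made up for the example.

```python
def straight_average(offense_mult, defense_mult):
    """Arithmetic mean of the two multipliers."""
    return (offense_mult + defense_mult) / 2


def multiplied(offense_mult, defense_mult):
    """Product of the two multipliers, as in the normalized formula."""
    return offense_mult * defense_mult


# Good offense (1.1) against a good defense (0.9): both land near average.
print(round(straight_average(1.1, 0.9), 2))  # 1.0
print(round(multiplied(1.1, 0.9), 2))        # 0.99

# Weak offense (0.9) against a good defense (0.9): the two methods diverge.
print(round(straight_average(0.9, 0.9), 2))  # 0.9
print(round(multiplied(0.9, 0.9), 2))        # 0.81
```

In the second case the straight average projects 90% of league scoring while the product projects 81%, which is the gap being described.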
                                                  Comment
                                                  • tweek
                                                    SBR Hustler
                                                    • 02-17-09
                                                    • 60

                                                    #26
                                                    Originally posted by Justin7
                                                    It's "normalization", and you can do it in all sports. Taking the geometric average... If a team is 10% above average on offense, and the opposing defense is 10% superior, any average will work.

                                                    But what if team A's offense is 10% below average, and B's defense allows 10% fewer points? A straight average would expect A to score 10% less, but they will score much less than that. This is where normalizing - taking a geometric mean - works better.
                                                    Ah... I didn't realize the heart of it was a geometric average.

                                                    It seems, however if this is the case, the equation would be:

                                                    Expected runs scored = (Conference average expected runs/game) * sqrt( (Pitching multiplier) ^ 0.667 * (Hitting multiplier) ^ 1.333)

                                                    So I guess my question is what happened to the square root of the geometric mean?
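One way to see what the square root changes, sketched in Python with made-up function names: taking the square root halves both exponents, so they sum to 1 (a weighted geometric mean of the two multipliers) instead of 2 (a weighted product). The thread does not settle which form is intended; this only shows that they give different numbers.

```python
import math


def product_form(pitching_mult, hitting_mult, league_rpg):
    """Exponents 0.667 and 1.333 (sum to 2), as in the quoted formula."""
    return pitching_mult ** 0.667 * hitting_mult ** 1.333 * league_rpg


def sqrt_form(pitching_mult, hitting_mult, league_rpg):
    """Same expression under a square root, as proposed in the question."""
    return league_rpg * math.sqrt(pitching_mult ** 0.667 * hitting_mult ** 1.333)


# Example inputs: average pitching (1.0), hitting 1.5, league average 4 runs.
p, h, league = 1.0, 1.5, 4.0
print(round(product_form(p, h, league), 2))
print(round(sqrt_form(p, h, league), 2))

# The square-root version is the same as halving each exponent.
halved = league * p ** (0.667 / 2) * h ** (1.333 / 2)
print(math.isclose(sqrt_form(p, h, league), halved))  # True
```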
                                                    Comment
                                                    • Formulawiz
                                                      Restricted User
                                                      • 01-12-09
                                                      • 1589

                                                      #27
                                                      Originally posted by tweek
                                                      Ah... I didn't realize the heart of it was a geometric average.

                                                      It seems, however if this is the case, the equation would be:

                                                      Expected runs scored = (Conference average expected runs/game) * sqrt( (Pitching multiplier) ^ 0.667 * (Hitting multiplier) ^ 1.333)

                                                      So I guess my question is what happened to the square root of the geometric mean?
                                                      I think you guys are making this more complicated than it should be. Baseball is very simple to handicap, and there are very few things that need to be followed. I have been handicapping baseball for over 30 years now, and it boils down to this: starting pitchers, bullpen, streaks, money line movement and good handicapping software.
                                                      Comment