Use math to predict baseball games - Markov Chain Method.

  • JoshW
    SBR MVP
    • 08-10-05
    • 3431

    #1
    Use math to predict baseball games - Markov Chain Method.
    By William Atkins
    Friday, 06 April 2007
    The use of mathematics may not seem very interesting to the average person, but U.S. math professor, and N.Y. Mets fan, Bruce Bukiet can consistently beat sports experts when using his copyrighted “Markov Chain” method.

    For the 2007 major league baseball season, Bukiet is predicting that the New York Yankees will be the winningest team with 110 victories out of 162 games.

    Bukiet, who teaches at New Jersey Institute of Technology (Newark), uses a mathematical model that determines the likelihood of victory or defeat on a particular day based on the two teams’ batting orders of starters (along with five reserves) and starting pitcher (and six relievers). His model predicts the outcome of individual games based on how well each player is likely to perform against each pitcher. Bukiet also predicts outcomes for the whole baseball season.

    The model is called the “Markov Chain”. It is a series of states within a system (in this case a Major League Baseball game) that relies on a finite number of possible situations in any baseball game. Each move from one state to another is called a transition. Past states carry no additional information about future states; only the current state is used to predict the future. For his 2007 predictions, Bukiet will input statistics from the past three years (2004, 2005, and 2006) into his Markov mathematical model.
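
    To make the idea of states and transitions concrete, here is a minimal Python sketch of a single half-inning treated as a Markov chain. It is not Prof. Bukiet's model: the event probabilities in EVENT_PROBS, the runner-advancement rule in advance(), and all function names are assumptions chosen for illustration. The state is the number of outs plus which bases are occupied (24 combinations), and three outs ends the half-inning.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities for one generic batter
# (illustrative values only, not taken from the article or real data).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.15,
               "double": 0.05, "triple": 0.01, "homer": 0.03}
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, runs_scored) for a non-out event.

    bases is a (first, second, third) tuple of booleans. Simplifying
    assumption: on a hit, every runner advances as many bases as the batter.
    """
    first, second, third = bases
    if event == "walk":
        # Only forced runners move up.
        if not first:
            return (True, second, third), 0
        if not second:
            return (True, True, third), 0
        if not third:
            return (True, True, True), 0
        return (True, True, True), 1
    gained = BASES_GAINED[event]
    # Batter starts at position 0; occupied bases are positions 1 to 3.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    moved = [p + gained for p in positions]
    runs = sum(1 for p in moved if p >= 4)
    new_bases = tuple(base in moved for base in (1, 2, 3))
    return new_bases, runs


def expected_runs(probs, tol=1e-12):
    """Expected runs from each (outs, bases) state to the end of the half-inning."""
    states = [(outs, bases) for outs in range(3)
              for bases in product([False, True], repeat=3)]
    value = {state: 0.0 for state in states}
    while True:
        biggest_change = 0.0
        for outs, bases in states:
            total = 0.0
            for event, p in probs.items():
                if event == "out":
                    nxt, runs = (outs + 1, bases), 0
                else:
                    new_bases, runs = advance(bases, event)
                    nxt = (outs, new_bases)
                # Three outs is the absorbing end state, worth 0 further runs.
                total += p * (runs + value.get(nxt, 0.0))
            biggest_change = max(biggest_change, abs(total - value[(outs, bases)]))
            value[(outs, bases)] = total
        if biggest_change < tol:
            return value


if __name__ == "__main__":
    values = expected_runs(EVENT_PROBS)
    empty = (False, False, False)
    print(f"Expected runs, none out, bases empty: {values[(0, empty)]:.3f}")
```

    The next state depends only on the current (outs, bases) pair and the batter's event probabilities, which is the Markov property the article describes. A fuller model would presumably use different probabilities for each batter-pitcher matchup, which this sketch does not attempt.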

    His model works only for games like baseball where one-on-one events occur, such as one pitcher pitching to one batter. The model doesn’t work for more team-intensive sports such as basketball where two teams of five players each, for instance, must go up and down the court in unison in order to defend and shoot baskets—and ultimately either win or lose the game.

    Bukiet says that in five of the last six years he has had more right than wrong predictions. His findings have been published in the paper "A Markov Chain Approach to Baseball," in the February 1997 issue of the journal Operations Research.

    He first started using his model as a way to show students that mathematics CAN BE FUN!

    Bruce Bukiet’s Web page is at: http://cams.njit.edu/~bukiet/.

    More information on Markov Chain appears at: http://mathworld.wolfram.com/MarkovChain.html.


  • Ganchrow
    SBR Hall of Famer
    • 08-28-05
    • 5011

    #2
    I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.
    Comment
    • RickySteve
      Restricted User
      • 01-31-06
      • 3415

      #3
      Originally posted by Ganchrow
      I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.
      I'll lay you +900 that they don't.
      Comment
      • durito
        SBR Posting Legend
        • 07-03-06
        • 13173

        #4
        Originally posted by Ganchrow
        I'd be interested to learn whether or not Prof. Bukiet's results outperform closing lines.
        according to this page:



        It's been profitable in 5/6 seasons.
        Comment
        • Ganchrow
          SBR Hall of Famer
          • 08-28-05
          • 5011

          #5
          Right, but are those results determined ex-post or ex-ante?

          In other words was he actually making these picks year-to-year using his model, or did he formulate his model so as to simply maximize past year results? The former represents predictive statistics and the latter descriptive statistics (aka "data mining").

          I don't pretend to know what he's done, but I would be interested in finding out.
          Comment
          • durito
            SBR Posting Legend
            • 07-03-06
            • 13173

            #6
            As far as I can tell from looking at his web page, he has made the picks year to year using the model. At least, that is what is implied.
            Comment
            • Ganchrow
              SBR Hall of Famer
              • 08-28-05
              • 5011

              #7
              Originally posted by durito
              As far as I can tell from looking at his web page, he has made the picks year to year using the model. At least, that is what is implied.
              I've seen it in quantitative finance and I've seen it in quantitative sports betting ... and that's what's always implied.

              You'd be surprised how many otherwise competent quantitative individuals simply don't understand the practical importance of segmenting a data set into in-sample and out-of-sample partitions. The fact that Prof. Bukiet is not proactive about making explicit his sampling methodology is what concerns me.

              The truth is it's rather easy to come up with a model that describes the past ... predicting the future is another matter entirely.
              Comment
              • raiders72002
                SBR MVP
                • 03-06-07
                • 3368

                #8
                Most of these formulas are backfitted when they say that it's won 5 out of 6 years. If that's the case then it doesn't mean much.
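
                As a rough side calculation on the "5 out of 6 years" figure: even if the record were not backfitted, six seasons is a small sample. The Python sketch below assumes, purely for illustration, that a system with no edge has a 50% chance of finishing any season ahead and that seasons are independent. The function name is made up for this example.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Chance that a no-edge system is ahead in 5 or more of 6 seasons.
    print(prob_at_least(5, 6))
```

                Under those assumptions the probability is 7/64, or roughly 11%, so a 5-of-6 record on its own is weak evidence of an edge.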
                Comment
                • Ganchrow
                  SBR Hall of Famer
                  • 08-28-05
                  • 5011

                  #9
                  Originally posted by raiders72002
                  Most of these formulas are backfitted when they say that it's won 5 out of 6 years. If that's the case then it doesn't mean much.
                  Yeah, that's exactly it.

                  I just want to be clear that I have no specific reason to suspect that Prof. Bukiet's model in particular is backfitted, it's just that from my experience this is what typically seems to be the sticking point.
                  Comment
                  • jjgold
                    SBR Aristocracy
                    • 07-20-05
                    • 388179

                    #10
                    Many can predict games without odds as this guy might be able to do. But after I read his prediction of the Yanks winning 110 games he is just a fraud.

                    Lol the yanks are lucky to win 90
                    Comment
                    • bookie
                      SBR MVP
                      • 08-10-05
                      • 2112

                      #11
                      Agreed. If he's smart enough to have found the grail then you'd think he'd know he was smart enough to have found the grail and his New Jersey Institute of Technology days would be behind him.

                      There's a long history of academics being attracted to the possibility of finding in sports betting an image of their own intelligence.
                      Comment
                      • ferndog
                        SBR MVP
                        • 02-22-07
                        • 1386

                        #12
                        So what are today's plays?
                        Comment
                        • raiders72002
                          SBR MVP
                          • 03-06-07
                          • 3368

                          #13
                          So what are today's plays?
                          He'll tell you tomorrow.
                          Comment
                          • tribet
                            SBR High Roller
                            • 08-12-06
                            • 171

                            #14
                            Originally posted by raiders72002
                            He'll tell you tomorrow.
                            Comment
                            • Ganchrow
                              SBR Hall of Famer
                              • 08-28-05
                              • 5011

                              #15
                              Originally posted by bookie
                              Agreed. If he's smart enough to have found the grail then you'd think he'd know he was smart enough to have found the grail and his New Jersey Institute of Technology days would be behind him.

                              There's a long history of academics being attracted to the possibility of finding in sports betting an image of their own intelligence.
                              I think you're being a little bit harsh. Many people enjoy the life of academia and have made the conscious decision to seek knowledge and academic fame in preference to monetary success. There's nothing inherently inconsistent between finding the Holy Grail and living a life of academic austerity.

                              That said, I think where many academics fail (especially in the fields of economics and finance) is in trying to relate interesting theoretical constructs to real world practicalities. A Markov chain as it relates to baseball can make for really, really interesting cocktail party conversation (at least in certain circles) and might be exceptionally hard to pass over intellectually. However, the extent to which a given theory is jointly true out-of-sample and in excess of market efficiency is the real unknown and that which is all too frequently overlooked by academic economists and applied mathematicians (especially those either overly accustomed to dealing with descriptive statistics or too comfortable working with qualitative predictions that don't need to out-perform any market index).
                              Comment
                              • Wheell
                                SBR MVP
                                • 01-11-07
                                • 1380

                                #16
                                Actually, his plays have a bigger problem. When Kazmir pitched against Pavano week 1 he had the Yankees at +100. He is betting vig-free into newspapers' opening lines... and when there is no line he assumes pick'em. He's a fraud.
                                Comment
                                • Ganchrow
                                  SBR Hall of Famer
                                  • 08-28-05
                                  • 5011

                                  #17
                                  Originally posted by Wheell
                                  Actually, his plays have a bigger problem. When Kazmir pitched against Pavano week 1 he had the Yankees at +100. He is betting vig-free into newspapers' opening lines... and when there is no line he assumes pick'em. He's a fraud.
                                  Yeah ... that would be a rather ginormous difficulty. Good catch.

                                  Nevertheless, I'd be exceedingly hesitant to flat-out label him a "fraud". It's methodologies just like this, fully undertaken in good faith, that plague academic economic literature.
                                  Comment
                                  • Wheell
                                    SBR MVP
                                    • 01-11-07
                                    • 1380

                                    #18
                                    Upon further reflection, you are right, if only because he actually puts out his numbers and allows you to keep your own records. He is not a fraud, he's an academic.
                                    Comment
                                    • Scorpion
                                      SBR Hall of Famer
                                      • 09-04-05
                                      • 7797

                                      #19
                                      Originally posted by RickySteve
                                      I'll lay you +900 that they don't.
                                      BINGO!!!
                                      His system predicts a -300 pitcher will win 65% of his starts, so what?
                                      Comment
                                      • bookie
                                        SBR MVP
                                        • 08-10-05
                                        • 2112

                                        #20
                                        Originally posted by Ganchrow
                                        I think you're being a little bit harsh. Many people enjoy the life of academia and have made the conscious decision to seek knowledge and academic fame in preference to monetary success.
                                        Many? I guess when I meet my first black swan academic who has the evaluative tools, capital, and savvy to crush sports betting but chooses to publish and teach I'll have to revise my conclusions.

                                        There are a number of interesting books on this topic. Two that I have enjoyed are Fortune's Formula (Poundstone) and If You're So Smart Why Aren't You Rich (McCloskey).
                                        Comment
                                        • RickySteve
                                          Restricted User
                                          • 01-31-06
                                          • 3415

                                          #21
                                          Originally posted by Scorpion
                                          BINGO!!!
                                          His system predicts a -300 pitcher will win 65% of his starts, so what?
                                          That would be a tremendous system. I'll take the +250 starter who wins 35%.
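
                                          The arithmetic behind this exchange can be sketched in Python. The helper names are hypothetical, and the sketch assumes standard American moneyline odds with a one-unit stake.

```python
def implied_probability(american_odds):
    """Break-even win probability for a bet at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def expected_profit(american_odds, win_prob):
    """Expected profit per unit staked, given the bettor's true win probability."""
    if american_odds < 0:
        payout = 100 / -american_odds  # e.g. -300 pays 1/3 unit per unit staked
    else:
        payout = american_odds / 100   # e.g. +250 pays 2.5 units per unit staked
    return win_prob * payout - (1 - win_prob)


if __name__ == "__main__":
    print(implied_probability(-300), expected_profit(-300, 0.65))
    print(implied_probability(250), expected_profit(250, 0.35))
```

                                          A -300 favorite has to win 75% of the time just to break even, so one that wins only 65% loses about 0.13 units per unit staked. The +250 underdog on the other side breaks even at about 28.6%, so winning 35% is worth about +0.225 units per unit staked.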
                                          Comment
                                          • RickySteve
                                            Restricted User
                                            • 01-31-06
                                            • 3415

                                            #22
                                            Originally posted by bookie
                                            Many? I guess when I meet my first black swan academic who has the evaluative tools, capital, and savvy to crush sports betting but chooses to publish and teach I'll have to revise my conclusions.

                                            There are a number of interesting books on this topic. Two that I have enjoyed are Fortune's Formula (Poundstone) and If You're So Smart Why Aren't You Rich (McCloskey).
                                            You're either joking or have a tragically narrow view of the world. Committed academics in many fields forego tremendous riches in the private sector. Those that are lured away by financial gain are often met with opportunities which dwarf any potential profit from exploiting inefficiencies in sports markets.

                                            Maybe you should read Fortune's Formula again, since it is the story of one such individual.

                                            You also should look up the definition of 'black swan'.
                                            Comment
                                            • bookie
                                              SBR MVP
                                              • 08-10-05
                                              • 2112

                                              #23
                                              Originally posted by RickySteve
                                              You're either joking or have a tragically narrow view of the world. Committed academics in many fields forego tremendous riches in the private sector. Those that are lured away by financial gain are often met with opportunities which dwarf any potential profit from exploiting inefficiencies in sports markets.

                                              Maybe you should read Fortune's Formula again, since it is the story of one such individual.

                                              You also should look up the definition of 'black swan'.
                                              I think Poundstone tries to present the success of Claude Shannon as linked to his interest in the Kelly formula, but it turns out his stock market success was due to his being a buy and hold investor in technology stocks.

                                              Are you an academic? Sorry if my comments hit a nerve.
                                              Comment
                                              • RickySteve
                                                Restricted User
                                                • 01-31-06
                                                • 3415

                                                #24
                                                Originally posted by bookie
                                                I think Poundstone tries to present the success of Claude Shannon as linked to his interest in the Kelly formula, but it turns out his stock market success was due to his being a buy and hold investor in technology stocks.
                                                As a general rule you should have actually read something you reference, to avoid embarrassing situations where the source completely contradicts your argument.

                                                Originally posted by bookie
                                                Are you an academic? Sorry if my comments hit a nerve.
                                                Nope. Just continuing my mission to enlighten idiots and expose phonies, one at a time.
                                                Comment
                                                • SquareShooter
                                                  SBR High Roller
                                                  • 04-16-06
                                                  • 223

                                                  #25
                                                  Either I can't read or their 2005 season has been a complete disaster: -3.34 per game for the year

                                                  link:
                                                  Comment
                                                  • bookie
                                                    SBR MVP
                                                    • 08-10-05
                                                    • 2112

                                                    #26
                                                    Originally posted by RickySteve
                                                    As a general rule you should have actually read something you reference, to avoid embarrassing situations where the source completely contradicts your argument.
                                                    What do you imagine my argument to have been?
                                                    Comment
                                                    • ugard
                                                      SBR Rookie
                                                      • 03-21-07
                                                      • 14

                                                      #27
                                                      Originally posted by Ganchrow
                                                      However, the extent to which a given theory is jointly true out-of-sample and in excess of market efficiency is the real unknown and that which is all too frequently overlooked by academic economists and applied mathematicians (especially those either overly accustomed to dealing with descriptive statistics or too comfortable working with qualitative predictions that don't need to out-perform any market index).
                                                      If testing for weak form efficiency (betting on price data alone), or some other sort of systematic betting rule such as Hausch, Ziemba and Rubinstein (1981)* where there are no parameters to maximise in the model, is out-of-sample testing necessary (I'm assuming closing odds are used)?

                                                      Am I correct in thinking out-of-sample testing involves testing your model across a range of time periods, which may all be in the past, meaning one doesn't need to wait for 'new' results (so long as you don't actually use the whole dataset for calibration)? In the case of 'static' systematic models as above, wouldn't splitting the data into fewer time periods simply reduce the statistical significance of results from each group? Or is the whole point that any model, parameter-less or otherwise, should produce returns even when splitting the data into a range of time periods (and if you find that there is not enough data in each subset for statistical significance, you need a larger dataset)?


                                                      *This is an example of academics publishing profitable material for bookie: They used the so-called 'Harville formulas' to find inconsistencies between the place and show betting odds in horse racing. If the identified bets had been placed, they claimed a return of 1.15 at the various racetracks tested. This 'system' was later published in a book for laymen, the "Dr. Z System". Studies have since indicated that this inefficiency has greatly diminished in the intervening years.
                                                      Comment
                                                      • Ganchrow
                                                        SBR Hall of Famer
                                                        • 08-28-05
                                                        • 5011

                                                        #28
                                                        Originally posted by ugard
                                                        If testing for weak form efficiency (betting on price data alone), or some other sort of systematic betting rule such as Hausch, Ziemba and Rubinstein (1981)* where there are no parameters to maximise in the model, is out-of-sample testing necessary (I'm assuming closing odds are used)?
                                                        I don't quite understand your question, nor am I familiar with that particular paper. But if you want to make predictions of future events and formulate this model based on an historical sample, then an out-of-sample dataset upon which to test your predictions is essential.

                                                        Originally posted by ugard
                                                        Am I correct in thinking out-of-sample testing involves testing your model across a range of time periods, which may all be in the past, meaning one doesn't need to wait for 'new' results (so long as you don't actually use the whole dataset for calibration)?
                                                        Yes.

                                                        Originally posted by ugard
                                                        In the case of 'static' systematic models as above, wouldn't splitting the data into fewer time periods simply reduce the statistical significance of results from each group?
                                                        Yes.

                                                        Originally posted by ugard
                                                        Or is the whole point that any model, parameter-less or otherwise, should produce returns even when splitting the data into a range of time periods (and if you find that there is not enough data in each subset for statistical significance, you need a larger dataset)?
                                                        The point is you don't want to be testing your model on the same data set you used to formulate it.
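
                                                        A toy Python simulation of that point, under loudly artificial assumptions: game outcomes are coin flips, each "system" is just a list of random picks, and the function name, system count, and game count are all made up for illustration. The best system is selected on the first half of the data and then scored on the second half.

```python
import random


def backfit_demo(n_systems=200, n_games=1000, seed=0):
    """Pick the best of many random "systems" in-sample, then score it out-of-sample."""
    rng = random.Random(seed)
    # Coin-flip outcomes: by construction no system has real predictive power.
    outcomes = [rng.random() < 0.5 for _ in range(n_games)]
    # Each hypothetical system is just a list of random picks.
    systems = [[rng.random() < 0.5 for _ in range(n_games)]
               for _ in range(n_systems)]
    split = n_games // 2  # first half in-sample, second half out-of-sample

    def hit_rate(picks, start, stop):
        hits = sum(p == o for p, o in zip(picks[start:stop], outcomes[start:stop]))
        return hits / (stop - start)

    in_sample = [hit_rate(picks, 0, split) for picks in systems]
    best = max(range(n_systems), key=lambda i: in_sample[i])
    return in_sample[best], hit_rate(systems[best], split, n_games)


if __name__ == "__main__":
    fitted, held_out = backfit_demo()
    print(f"best in-sample hit rate: {fitted:.3f}")
    print(f"same system out-of-sample: {held_out:.3f}")
```

                                                        Because the in-sample figure is the maximum over 200 coin-flippers, it will typically sit well above 50%, while the held-out figure should hover near 50%. That gap is what testing on the formulation data hides.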
                                                        Comment
                                                        • ugard
                                                          SBR Rookie
                                                          • 03-21-07
                                                          • 14

                                                          #29
                                                          Originally posted by Ganchrow
                                                          I don't quite understand your question, nor am I familiar with that particular paper.
                                                          It is available as a .doc here, or the .pdf (if you have JSTOR access) is here.

                                                          When I say 'weak form' efficiency I'm using the definition popularised by Fama in the early '70s as part of EMH. To the best of my understanding, this means making a 'model' (although I feel this definition is where I am not explaining myself) from price data alone.

                                                          For example, the simplest test (which has been performed repeatedly) in sport betting is to group outcomes in the dataset by price level and test whether betting at a particular price level would have produced a profit. I can't see how this sort of efficiency test (or any other based on some sort of parameter-less 'model', such as the famous, in horse racing circles at least, HZR system) would require out of sample testing.
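
                                                          A minimal Python sketch of the price-level grouping test described here, assuming decimal odds and flat one-unit stakes. The function name, the bucket boundaries, and the sample bets are hypothetical and only show the mechanics.

```python
from collections import defaultdict


def roi_by_price_bucket(bets, edges):
    """Return profit per unit staked for each odds bucket.

    bets:  iterable of (decimal_odds, won) pairs, one unit staked on each.
    edges: ascending upper bounds; a bet falls in the first bucket whose
           bound is >= its odds, or in a catch-all bucket labelled inf.
    """
    staked = defaultdict(int)
    returned = defaultdict(float)
    for odds, won in bets:
        bucket = next((edge for edge in edges if odds <= edge), float("inf"))
        staked[bucket] += 1
        if won:
            returned[bucket] += odds  # stake plus winnings
    return {b: returned[b] / staked[b] - 1.0 for b in sorted(staked)}


if __name__ == "__main__":
    # Made-up results, purely to show the input format.
    sample = [(1.5, True), (1.5, False), (3.0, True), (3.0, False), (3.0, True)]
    print(roi_by_price_bucket(sample, edges=[2.0, 4.0]))
```

                                                          Note that the bucket boundaries are themselves a choice made by whoever runs the test, even though the betting rule has no fitted parameters.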

                                                          Originally posted by Ganchrow
                                                          But if you want to make predictions of future events and formulate this model based on an historical sample...
                                                          I think this points at the discrepancy between the systematic rule based 'model' I was getting at, and the probability prediction model you mean.

                                                          I'm not trying to claim anything you have written is wrong. Regardless, I'm sure you would point out that your definition of 'model' did not cover this (and I think I would agree that it is a tenuous use of the word).

                                                          I'm just trying to add that there are (conceivably) profitable 'models' (or maybe a better term would be 'systems') that don't require out of sample testing.
                                                          Comment
                                                          • Ganchrow
                                                            SBR Hall of Famer
                                                            • 08-28-05
                                                            • 5011

                                                            #30
                                                            Originally posted by ugard
                                                            It is available as a .doc here, or the .pdf (if you have JSTOR access) is here.

                                                            When I say 'weak form' efficiency I'm using the definition popularised by Fama in the early '70s as part of EMH. To the best of my understanding, this means making a 'model' (although I feel this definition is where I am not explaining myself) from price data alone.

                                                            For example, the simplest test (which has been performed repeatedly) in sport betting is to group outcomes in the dataset by price level and test whether betting at a particular price level would have produced a profit. I can't see how this sort of efficiency test (or any other based on some sort of parameter-less 'model', such as the famous, in horse racing circles at least, HZR system) would require out of sample testing.



                                                            I think this points at the discrepancy between the systematic rule based 'model' I was getting at, and the probability prediction model you mean.

                                                            I'm not trying to claim anything you have written is wrong. Regardless, I'm sure you would point out that your definition of 'model' did not cover this (and I think I would agree that it is a tenuous use of the word).

                                                            I'm just trying to add that there are (conceivably) profitable 'models' (or maybe a better term would be 'systems') that don't require out of sample testing.
                                                            Nothing you've described would in any way obviate the need for proper out-of-sample hypothesis testing.

                                                            This is just the point I've been making throughout this post. Too frequently, otherwise intelligent and quantitative people overlook proper testing methodology and then after losing their proverbial shirts, wonder why their models (which passed every statistical test imaginable in-sample) are so poor at predicting the future.
                                                            Comment
                                                            • ugard
                                                              SBR Rookie
                                                              • 03-21-07
                                                              • 14

                                                              #31
                                                              Originally posted by Ganchrow
                                                              Nothing you've described would in any way obviate the need for proper out-of-sample hypothesis testing.
                                                              I've thought about this a little more. I thought that because the simple model I was describing required no training (or other jiggery-pokery with variables, trying to modify it to fit it to the data at hand), testing it on two sets of data (neither having been used in the model's formulation) would be no better than lumping all the data together and getting (hopefully) one highly significant result. I see now that the central point is not whether a subset has been used to tune parameters, but simply that one tests on multiple subsets.

                                                              The question now is, how many subsets, and what criterion does one use to decide between the favourability of, for example, 10 subsets all with positive returns at the 1% level and 50 subsets all with returns at the 5% level. Time for a little more reading.

                                                              Originally posted by Ganchrow
                                                              This is just the point I've been making throughout this post.
                                                              Thank you for hammering it home, now it's finally got there I feel rather enlightened.

                                                              The more I think about it, the more I realise how valid your criticism that this "plagues" the literature is. I have read, or skimmed, a substantial number of the papers testing horse racing for weak form efficiency and never have I seen a study where part of the data was reserved for out of sample testing (I assume that, as there have been so many studies testing the same thing, all the studies taken together count as a sort of informal unfinished out-of-sample test).

                                                              Come to think of it, in many of the papers I read in other areas of economics, the abstract ends with "...and we find our model predicts the observed data.", but I don't see any evidence of out of sample testing. Also, the academic who I have had close dealings with on this subject (whilst very good with growth models!) knows very little about even basic significance testing.
                                                              Comment