Monte Carlo Simulations for College Sports

  • Combato
    SBR Hustler
    • 09-12-17
    • 76

    #1
    Monte Carlo Simulations for College Sports
    Curious if anyone here knows much about the use of Monte Carlo game simulations in Excel.

    Playing a game 10,000 times sounds good but what data would be useful to simulate from?

    Simulate a game 10,000 times, take the median scores of the game, and look at the difference?

    If this difference varies from the line by a few points, then that would be the play?

    Does it make sense to run some regressions, find out which data points are best correlated to covering the spread and then incorporate these into a simulation?

    What is wrong with my logic here about using this approach?
  • Bsims
    SBR Wise Guy
    • 02-03-09
    • 827

    #2
    There's nothing wrong with your approach, but it's more difficult than it seems. For example, let's assume you run regressions against different variables in CFB. You will find that the most significant variable is turnovers. Now, how would you predict turnovers for your simulations? The same issue applies to all the other variables.
    Comment
    • Waterstpub87
      SBR MVP
      • 09-09-09
      • 4102

      #3
      Originally posted by Combato
      Curious if anyone here knows much about the use of Monte Carlo game simulations in Excel.

      Playing a game 10,000 times sounds good but what data would be useful to simulate from?

      Simulate a game 10,000 times, take the median scores of the game, and look at the difference?

      If this difference varies from the line by a few points, then that would be the play?

      Does it make sense to run some regressions, find out which data points are best correlated to covering the spread and then incorporate these into a simulation?

      What is wrong with my logic here about using this approach?
      Great post. I run a Monte Carlo system for NCAAF.

      I simulate play-by-play data. Literally reconstruct a game from kickoff.

      I used to do this in Excel: 1,000 games per matchup, times roughly 55 games a week. It took roughly 2 days to run. My code was probably not optimal, and if you wrote perfect VBA, you could probably cut this down. 10,000 games with this method would not be doable.

      Now I run it in Python. It takes 3 hours to run the entire slate of games. Learn Python; it will make running 10,000 games doable.

      You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread, and the % that go over the total. You then have the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
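
      As a rough illustration of the cover-percentage approach above, here is a minimal Python sketch. The score generator `simulate_game` is a stand-in (normally distributed scores with made-up means) and not the play-by-play model described in this post; the +3.5 spread, 51.5 total, and -110 price (implied probability of about 52.4%) are likewise assumed for the example.

```python
import random


def simulate_game(rng):
    """Placeholder score generator (an assumption, not the poster's model)."""
    away = max(0, round(rng.gauss(24, 10)))
    home = max(0, round(rng.gauss(27, 10)))
    return away, home


def cover_and_over_rates(scores, away_spread, total):
    """Return (share of sims the away team covers, share that go over).

    Pushes (exact ties with the line) count as neither a cover nor an over.
    """
    n = len(scores)
    away_covers = sum(1 for away, home in scores if away + away_spread > home)
    overs = sum(1 for away, home in scores if away + home > total)
    return away_covers / n, overs / n


def is_bet(model_prob, implied_prob, cutoff=0.06):
    """Bet only when the simulated probability beats the price by the cutoff."""
    return model_prob - implied_prob >= cutoff


rng = random.Random(42)  # fixed seed so the run is repeatable
scores = [simulate_game(rng) for _ in range(10_000)]
cover_rate, over_rate = cover_and_over_rates(scores, away_spread=3.5, total=51.5)
print(f"away covers +3.5: {cover_rate:.1%}, over 51.5: {over_rate:.1%}")
print("bet away:", is_bet(cover_rate, implied_prob=0.5238))
```

      Counting covers directly from the simulated finals yields a probability, so the cutoff can be applied without ever computing a median score.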
      Comment
      • KVB
        SBR Aristocracy
        • 05-29-14
        • 74817

        #4
        Originally posted by Waterstpub87
        ...You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread, and the % that go over the total. You then have the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
        Agreed but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

        Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data being developed with these simulations. One thing that won't change, games keep coming and after some time, there is a lot of data to work with.

        It's not just scores or margins of victory, the data can be compiled for every individual stat that goes into making the play by play simulation.

        I hope that makes sense without getting into some specifics of the data usage.
        Comment
        • Combato
          SBR Hustler
          • 09-12-17
          • 76

          #5
          Great responses and definitely additional things to consider
          Thx
          Comment
          • Combato
            SBR Hustler
            • 09-12-17
            • 76

            #6
            Originally posted by Bsims
            There's nothing wrong with your approach, but it's more difficult than it seems. For example, let's assume you run regressions against different variables in CFB. You will find that the most significant variable is turnovers. Now, how would you predict turnovers for your simulations? The same issue applies to all the other variables.
            Not sure if this makes sense, but wouldn't it be possible to project turnovers using a random number generator function? Maybe incorporate this into the simulation to account for turnovers?

            Also, does anyone know how many yards an interception is worth? A fumble? I know down and distance come into play here, but what would be the average yards to account for either a fumble or an interception? Someone is bound to have done this work somewhere online.
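
            One way to read the random-number idea is as one random draw per drive: compare a uniform random number with a per-drive turnover probability. The sketch below assumes 12 drives per game and a 12% turnover chance per drive; both numbers are invented for illustration and would need to come from real data.

```python
import random


def simulate_turnovers(n_drives, turnover_prob, rng):
    """Count turnovers in one simulated game: one Bernoulli draw per drive."""
    return sum(1 for _ in range(n_drives) if rng.random() < turnover_prob)


rng = random.Random(7)  # fixed seed so the run is repeatable
# Hypothetical inputs: 12 drives per game, 12% chance a drive ends in a turnover.
games = [simulate_turnovers(12, 0.12, rng) for _ in range(10_000)]

mean_turnovers = sum(games) / len(games)
share_with_three_plus = sum(1 for t in games if t >= 3) / len(games)
print(f"mean turnovers per game: {mean_turnovers:.2f}")
print(f"games with 3+ turnovers: {share_with_three_plus:.1%}")
```

            Each simulated game gets its own turnover count, so the simulation sees the whole spread of outcomes (zero-turnover games as well as three-turnover games) rather than a single average.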
            Comment
            • Combato
              SBR Hustler
              • 09-12-17
              • 76

              #7
              Originally posted by KVB
              Agreed but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

              Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data being developed with these simulations. One thing that won't change, games keep coming and after some time, there is a lot of data to work with.

              It's not just scores or margins of victory, the data can be compiled for every individual stat that goes into making the play by play simulation.

              I hope that makes sense without getting into some specifics of the data usage.
              To run simulations for that many statistical variables, I would think that overfitting or data mining could come into play here. If so, how do you address this, assuming it's even a problem to begin with?

              I am new to this but just trying to assess and plan before I waste time working on the wrong issues.
              Comment
              • KVB
                SBR Aristocracy
                • 05-29-14
                • 74817

                #8
                Originally posted by Combato
                To run simulations for that many statistical variables, I would think that overfitting or data mining could come into play here. If so, how do you address this, assuming it's even a problem to begin with?

                I am new to this but just trying to assess and plan before I waste time working on the wrong issues.
                Start tracking the individual stats being used, and where they are landing in the simulations, and how that lines up with the lines.

                At some point, it's about figuring out which stats are relevant, so that when you simulate, the rest acts as noise.

                It's hard to do that without laying it all out there first, then trying to see what matters. In the end, you may have to regress and test each stat individually to see which ones have the most relevant influence.

                The reason I say to track it, even graph it, is that variables can and will move in and out of favor and some variables are predictable.

                You have to account for recent performance and build that into the model.
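
                A simple first pass at testing each stat individually is to rank the stats by their correlation with the margin against the spread. The sketch below uses plain Pearson correlation as a stand-in for a per-stat regression; the stat names and all numbers are made up to show the mechanics only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_stats(stats, target):
    """Sort stat names by absolute correlation with the target, strongest first."""
    scored = {name: pearson(values, target) for name, values in stats.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up per-game numbers, purely illustrative.
stats = {
    "turnover_margin": [2, -1, 0, 1, -2, 3],
    "yards_per_play_diff": [1.2, -0.4, 0.3, -0.8, -1.0, 0.5],
}
cover_margin = [10, -7, 1, 6, -14, 17]  # points by which the team beat the spread

for name, r in rank_stats(stats, cover_margin):
    print(f"{name}: r = {r:+.2f}")
```

                Re-running a ranking like this on rolling windows and graphing the result is one way to watch variables move in and out of favor.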
                Comment
                • Combato
                  SBR Hustler
                  • 09-12-17
                  • 76

                  #9
                  Thank You
                  Comment
                  • Waterstpub87
                    SBR MVP
                    • 09-09-09
                    • 4102

                    #10
                    Originally posted by KVB
                    Agreed but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

                    Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data being developed with these simulations. One thing that won't change, games keep coming and after some time, there is a lot of data to work with.

                    It's not just scores or margins of victory, the data can be compiled for every individual stat that goes into making the play by play simulation.

                    I hope that makes sense without getting into some specifics of the data usage.
                    On the first point, of medians matching the line, that is factored in. If you run the sim and then find that the away team covers a 3-point spread 50% of the time, you can conclude that it is a good line. I normally have a buffer: at this point in baseball, anything with an edge of 4% or greater is a bet.

                    The individual stats can be pulled from somewhere else and incorporated. For example, the average strikeout rate is 22%; how does plate discipline affect this? I am not sure what you mean by using the medians of simulations for this.
                    Comment
                    • Waterstpub87
                      SBR MVP
                      • 09-09-09
                      • 4102

                      #11
                      Originally posted by Combato
                      To run simulations for that many statistical variables, I would think that overfitting or data mining could come into play here. If so, how do you address this, assuming it's even a problem to begin with?

                      I am new to this but just trying to assess and plan before I waste time working on the wrong issues.
                      You should avoid overfitting. It is easy: test out your variables on 2012 to 2016, 5 years, then simulate the last 2 years as an out-of-sample check.

                      What you are describing is more like machine learning than a Monte Carlo sim. Machine learning is going to try to use those variables to project games, figuring out the formula to do so. A Monte Carlo sim is more that you supply it variables and an equation with a random element, and then let it do its thing. Machine learning tends to overfit.

                      Do not waste your time figuring out yard values for things. If you can program a Monte Carlo sim, you can do better than some yards-per-point bullshit. It is a very noisy way to assess games. The problem is that many teams get garbage yards, and yards in the middle of the field really don't count for much.
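
                      The season split described above might be sketched as follows. The record layout (a dict with a `season` key) is hypothetical, and the code assumes that "the last 2 years" means the two seasons after 2016; adjust the ranges to whatever seasons you actually hold out.

```python
def split_by_season(games, fit_seasons, test_seasons):
    """Separate game records into a fitting set and a held-out set by season."""
    fit = [g for g in games if g["season"] in fit_seasons]
    test = [g for g in games if g["season"] in test_seasons]
    return fit, test


# Hypothetical records; real ones would carry the stats the sim consumes.
games = [
    {"season": year, "game_id": f"{year}-{i}"}
    for year in range(2012, 2019)
    for i in range(3)
]

fit, test = split_by_season(
    games,
    fit_seasons=range(2012, 2017),   # 2012 through 2016: choose variables here
    test_seasons=range(2017, 2019),  # assumed held-out seasons: simulate these
)
print(len(fit), "games to fit on,", len(test), "games held out")
```

                      Variables are chosen using only the first set; the held-out seasons are then simulated once, so a good result there is not a product of tuning to those same games.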
                      Comment
                      • KVB
                        SBR Aristocracy
                        • 05-29-14
                        • 74817

                        #12
                        Originally posted by Waterstpub87
                        You should avoid overfitting. It is easy: test out your variables on 2012 to 2016, 5 years, then simulate the last 2 years as an out-of-sample check.

                        What you are describing is more like machine learning than a Monte Carlo sim. Machine learning is going to try to use those variables to project games, figuring out the formula to do so. A Monte Carlo sim is more that you supply it variables and an equation with a random element, and then let it do its thing. Machine learning tends to overfit.

                        Do not waste your time figuring out yard values for things. If you can program a Monte Carlo sim, you can do better than some yards-per-point bullshit. It is a very noisy way to assess games. The problem is that many teams get garbage yards, and yards in the middle of the field really don't count for much.
                        I can't disagree here. The use of simulations can get rid of some noise, but can also make it tougher to produce reliable adjustments.

                        The variable time frame is solid advice as well.

                        I suppose it depends on just what you program into the simulation.

                        Personally, I would make different simulations with different variables based on different strategies.

                        If only there was more time in the day.

                        This is why many of us prepare for the upcoming season well in advance of the start of the season (like in the offseason).
                        Comment
                        • KVB
                          SBR Aristocracy
                          • 05-29-14
                          • 74817

                          #13
                          Originally posted by Waterstpub87
                          On the first point, of medians matching the line, that is factored in. If you run the sim and then find that the away team covers a 3-point spread 50% of the time, you can conclude that it is a good line. I normally have a buffer: at this point in baseball, anything with an edge of 4% or greater is a bet.

                          The individual stats can be pulled from somewhere else and incorporated. For example, the average strikeout rate is 22%; how does plate discipline affect this? I am not sure what you mean by using the medians of simulations for this.
                          Yeah, without going into an example I was pretty vague. I was thinking that before your post.

                          I am really mixing methods here. The medians and where they stand would more likely come into play in deciding what variables to use, so it's before the simulation.

                          I sort of jumped to assessing the variable, instead of focusing on the simulation results.

                          I might have stretched the topic a bit. Still trying to work on a good example though, one that I feel comfortable posting.
                          Comment
                          • Waterstpub87
                            SBR MVP
                            • 09-09-09
                            • 4102

                            #14
                            Originally posted by KVB
                            I can't disagree here. The use of simulations can get rid of some noise, but can also make it tougher to produce reliable adjustments.

                            The variable time frame is solid advice as well.

                            I suppose it depends on just what you program into the simulation.

                            Personally, I would make different simulations with different variables based on different strategies.

                            If only there was more time in the day.

                            This is why many of us prepare for the upcoming season well in advance of the start of the season (like in the offseason).
                            Crazy amount of stuff. August and September are the busiest months of the year in that regard. Still a full slate of baseball games, plus testing and launch of 2 football models, NBA and NCAA basketball testing.
                            Comment
                            • Combato
                              SBR Hustler
                              • 09-12-17
                              • 76

                              #15
                              Originally posted by Waterstpub87
                              Great post. I run a Monte Carlo system for NCAAF.

                              I simulate play-by-play data. Literally reconstruct a game from kickoff.

                              I used to do this in Excel: 1,000 games per matchup, times roughly 55 games a week. It took roughly 2 days to run. My code was probably not optimal, and if you wrote perfect VBA, you could probably cut this down. 10,000 games with this method would not be doable.

                              Now I run it in Python. It takes 3 hours to run the entire slate of games. Learn Python; it will make running 10,000 games doable.

                              You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread, and the % that go over the total. You then have the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
                              What kind of results do you get from running such an in-depth simulation? Is the additional complexity worth the trouble, and does it increase your edge over time?
                              Comment
                              • Waterstpub87
                                SBR MVP
                                • 09-09-09
                                • 4102

                                #16
                                My results have been good. I consistently beat the close in NCAAF. This is the only model I have ever run, so I can't really answer the other questions. I did something similar in baseball, so I was like, "maybe I can write something for football." At the time I was working somewhere where I was able to finish a week's worth of work in about 5 hours on Monday, so I had tons of time on my hands, because I had to sit at a desk looking busy for 50 hours a week.

                                It isn't particularly complex. It is a certain power rating that I use to adjust play data. The mechanism, the Monte Carlo if you will, isn't complex either. It just takes a while to bang all the bugs out of your code. I was able to rewrite it in Python in maybe 10 hrs or so.
                                Comment
                                • Combato
                                  SBR Hustler
                                  • 09-12-17
                                  • 76

                                  #17
                                  Very good. Thanks
                                  Comment
                                  • Bsims
                                    SBR Wise Guy
                                    • 02-03-09
                                    • 827

                                    #18
                                    Originally posted by Combato
                                    Not sure if this makes sense, but wouldn't it be possible to project turnovers using a random number generator function? Maybe incorporate this into the simulation to account for turnovers?

                                    Also, does anyone know how many yards an interception is worth? A fumble? I know down and distance come into play here, but what would be the average yards to account for either a fumble or an interception? Someone is bound to have done this work somewhere online.
                                    You are correct in feeling that you would be better off using some sort of probability function to generate a turnover distribution rather than using an average.

                                    There isn't a single value that you could assign to the impact of an interception. Consider two examples. First: first and goal, then an interception. Big impact. Second: a Hail Mary at the end of a half that is picked off. No impact. They look alike in a box score.
                                    Comment
                                    • Toledo Ed
                                      SBR Wise Guy
                                      • 09-04-10
                                      • 728

                                      #19
                                      Originally posted by Waterstpub87
                                      My results have been good. I consistently beat the close in NCAAF. This is the only model I have ever run, so I can't really answer the other questions. I did something similar in baseball, so I was like, "maybe I can write something for football." At the time I was working somewhere where I was able to finish a week's worth of work in about 5 hours on Monday, so I had tons of time on my hands, because I had to sit at a desk looking busy for 50 hours a week.

                                      It isn't particularly complex. It is a certain power rating that I use to adjust play data. The mechanism, the Monte Carlo if you will, isn't complex either. It just takes a while to bang all the bugs out of your code. I was able to rewrite it in Python in maybe 10 hrs or so.

                                      Impressive. I have the time while at work but don’t have a clue about excel. I want your code!!!!!!!
                                      Comment
                                      • Waterstpub87
                                        SBR MVP
                                        • 09-09-09
                                        • 4102

                                        #20
                                        Originally posted by Toledo Ed
                                        Impressive. I have the time while at work but don’t have a clue about excel. I want your code!!!!!!!
                                        Python, dog. It's the future. Easier to program in than VBA.
                                        Comment
                                        • Combato
                                          SBR Hustler
                                          • 09-12-17
                                          • 76

                                          #21
                                          Originally posted by Bsims
                                          You are correct in feeling that you would be better off using some sort of probability function to generate a turnover distribution rather than using an average.

                                          There isn't a single value that you could assign to the impact of an interception. Consider two examples. First: first and goal, then an interception. Big impact. Second: a Hail Mary at the end of a half that is picked off. No impact. They look alike in a box score.
                                          Exactly. Context is everything, isn't it?
                                          Comment
                                          • Combato
                                            SBR Hustler
                                            • 09-12-17
                                            • 76

                                            #22
                                            Originally posted by Waterstpub87
                                            On the first point, of medians matching the line, that is factored in. If you run the sim and then find that the away team covers a 3-point spread 50% of the time, you can conclude that it is a good line. I normally have a buffer: at this point in baseball, anything with an edge of 4% or greater is a bet.

                                            Example - Run 1000 game simulations that simulate a score for each team.

                                            Take the median score for all the simulated favorite sides (all 1000 simulated games)
                                            Take the median score for the simulated underdog sides (all 1000 simulated games)
                                            Compare the 2 medians. First median score is 27 for fav. Second median score is 20 for dog
                                            The median difference is 7 points.

                                            Would this median difference be a reasonable starting point for making a line? Yes or no?

                                            I'm just speculating here. I have no idea if this is even feasible.
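
                                            A small sketch of this median calculation, using three hand-made simulated games (hypothetical numbers, not real output), also computes the median of the per-game margins for comparison. The two are not guaranteed to match, because each simulated game pairs the two scores.

```python
from statistics import median

# Paired (favorite, underdog) final scores from three hypothetical sims.
sims = [(10, 30), (20, 10), (30, 20)]

fav_median = median(fav for fav, _ in sims)
dog_median = median(dog for _, dog in sims)
diff_of_medians = fav_median - dog_median

# Median of the favorite's margin, taken game by game.
median_margin = median(fav - dog for fav, dog in sims)

print("difference of median scores:", diff_of_medians)
print("median of simulated margins:", median_margin)
```

                                            In this toy case the medians are 20 and 20, giving a difference of 0, while the median margin is 10, so the choice between the two definitions matters before either is compared with a line.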
                                            Comment
                                            • Waterstpub87
                                              SBR MVP
                                              • 09-09-09
                                              • 4102

                                              #23
                                              Originally posted by Combato
                                              Originally posted by Waterstpub87
                                              On the first point, of medians matching the line, that is factored in. If you run the sim and then find that the away team covers a 3-point spread 50% of the time, you can conclude that it is a good line. I normally have a buffer: at this point in baseball, anything with an edge of 4% or greater is a bet.

                                              Example - Run 1000 game simulations that simulate a score for each team.

                                              Take the median score for all the simulated favorite sides (all 1000 simulated games)
                                              Take the median score for the simulated underdog sides (all 1000 simulated games)
                                              Compare the 2 medians. First median score is 27 for fav. Second median score is 20 for dog
                                              The median difference is 7 points.

                                              Would this median difference be a reasonable starting point for making a line? Yes or no?

                                              I'm just speculating here. I have no idea if this is even feasible.
                                              Why would you do it that way instead of the way I suggested? What advantages do you think that has?
                                              Comment
                                              • Combato
                                                SBR Hustler
                                                • 09-12-17
                                                • 76

                                                #24
                                                None really. Just speculating.
                                                Comment
                                                • Waterstpub87
                                                  SBR MVP
                                                  • 09-09-09
                                                  • 4102

                                                  #25
                                                  Originally posted by Combato
                                                  None really. Just speculating.
                                                  OK, my way is easier operationally. Consider that you will want to set a margin for where you bet vs. the line. My way, you can set this as a percentage margin very easily, and it is easy to calculate.

                                                  Your way is more difficult. You would then need to either set a margin as a difference in points, like 3 points off, or convert the number of points to a price, so a line of 3 vs. a projection of zero would be getting -150 at -110. This becomes more difficult with football, as the points do not have symmetrical value. Consider a 3-pt difference between 1 and 4 vs. the value of 3 pts from 36 to 39.

                                                  Unless you are careful in this regard, pricing this correctly is going to be time consuming. With my way, it is right there when you want it, correctly priced.
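
                                                  The price comparison mentioned above can be sketched with the standard break-even formulas for American odds. The function names are illustrative; the -150 and -110 figures are the ones from this post.

```python
def implied_prob(american_odds):
    """Break-even win probability for American odds (vig not removed)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def fair_american_odds(prob):
    """American odds whose break-even probability equals prob (0 < prob < 1)."""
    if prob >= 0.5:
        return -100 * prob / (1 - prob)
    return 100 * (1 - prob) / prob


fair = implied_prob(-150)   # what the projection says the bet is worth
price = implied_prob(-110)  # what the book is charging
edge = fair - price
print(f"fair: {fair:.1%}, price: {price:.1%}, edge: {edge:.1%}")
```

                                                  A price of -150 breaks even at 60% and -110 at about 52.4%, so getting a fair -150 at -110 is an edge of roughly 7.6 percentage points. The sketch does not remove the vig, and it says nothing about how many points that edge corresponds to, which is the hard part in football.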
                                                  Comment
                                                  • Combato
                                                    SBR Hustler
                                                    • 09-12-17
                                                    • 76

                                                    #26
                                                    I get that part and will move forward. Thank you for the great feedback
                                                    Comment
                                                    • Waterstpub87
                                                      SBR MVP
                                                      • 09-09-09
                                                      • 4102

                                                      #27
                                                      Originally posted by Combato
                                                      I get that part and will move forward. Thank you for the great feedback
                                                      Good luck. The thing with coding is that you just need to put in the work. That is all. It isn't some dark magic that only a few people are able to do. Google is your friend. Start off with simple stuff like googling "Use VBA to create a new worksheet", and someone will have answered this somewhere. Do that for every step you need, and you'll have a process in no time.
                                                      Comment