1. #1
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Monte Carlo Simulations for College Sports

    Curious if anyone here knows much about using Monte Carlo game simulations in Excel.

    Playing a game 10,000 times sounds good but what data would be useful to simulate from?

    Simulate a game 10,000 times, take the median scores of the game, and look at the difference?

    If this difference varies from the line by a few points, then that would be the play?

    Does it make sense to run some regressions, find out which data points are best correlated to covering the spread and then incorporate these into a simulation?

    What is wrong with my logic here about using this approach?

  2. #2
    Bsims
    Join Date: 02-03-09
    Posts: 827
    Betpoints: 13

    There's nothing wrong with your approach, but it's more difficult than it seems. For example, let's assume you run a regression against different variables in CFB. You will find that the most significant variable is turnovers. Now, how would you predict turnovers for your simulations? The same issue applies to all the other variables.
    Nominated by: Combato

  3. #3
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Combato View Post
    Curious if anyone here knows much about using Monte Carlo game simulations in Excel.

    Playing a game 10,000 times sounds good but what data would be useful to simulate from?

    Simulate a game 10,000 times, take the median scores of the game, and look at the difference?

    If this difference varies from the line by a few points, then that would be the play?

    Does it make sense to run some regressions, find out which data points are best correlated to covering the spread and then incorporate these into a simulation?

    What is wrong with my logic here about using this approach?
    Great post. I run a Monte Carlo system for NCAAF.

    I simulate play-by-play data. I literally reconstruct a game from kickoff.

    I used to do this in Excel: 1,000 games per matchup, times roughly 55 games a week. It took roughly 2 days to run. My code was probably not optimal, and if you wrote perfect VBA, you could probably cut this down. Running 10,000 games with this method would not be doable.

    Now I run it in Python. It takes 3 hours to run the entire slate of games. Learn Python; it will make 10,000-game runs doable for you.
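    A rough Python sketch of the "sim every matchup thousands of times" loop is below. It is not Waterstpub87's play-by-play model, which he doesn't share. It is a much cruder drive-level stand-in built on several assumptions: each team gets 12 drives, and each drive ends in a touchdown (always 7 points), a field goal (3), or nothing. The per-drive probabilities are made up, and simulate_game / simulate_matchup are hypothetical names.

```python
import random


def simulate_game(away, home, drives=12, rng=random):
    """Simulate one game drive by drive (a crude stand-in for play-by-play).

    `away` and `home` map "td" and "fg" to per-drive probabilities;
    whatever probability is left over is a drive that scores nothing.
    Touchdowns are assumed to be worth exactly 7 points.
    """
    scores = {"away": 0, "home": 0}
    for _ in range(drives):
        for side, probs in (("away", away), ("home", home)):
            r = rng.random()
            if r < probs["td"]:
                scores[side] += 7
            elif r < probs["td"] + probs["fg"]:
                scores[side] += 3
    return scores["away"], scores["home"]


def simulate_matchup(away, home, n_sims=10_000, seed=None):
    """Return a list of (away_score, home_score), one per simulated game."""
    rng = random.Random(seed)  # seeded so a run can be reproduced exactly
    return [simulate_game(away, home, rng=rng) for _ in range(n_sims)]


# Hypothetical per-drive rates, for illustration only.
away_team = {"td": 0.22, "fg": 0.15}
home_team = {"td": 0.28, "fg": 0.14}
sims = simulate_matchup(away_team, home_team, n_sims=10_000, seed=42)
```

    The output is simply a list of final scores. That list is the raw material for the cover and over percentages.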

    You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread and the % that go over the total. That gives you the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
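    Here is a minimal sketch of that workflow. It assumes "a 6% difference from the odds" means the simulated probability minus the break-even probability of the price, which is one reasonable reading. It also treats pushes as no action. find_plays and american_to_breakeven are hypothetical helper names, and the ten "sims" are made up.

```python
def american_to_breakeven(odds):
    """Break-even win probability implied by American odds (vig included)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def find_plays(sims, away_spread, total, odds=-110, cutoff=0.06):
    """Turn simulated final scores into cover/over probabilities and flag bets.

    sims        -- list of (away_score, home_score) tuples
    away_spread -- the away team's line, e.g. +3.5 or -7
    cutoff      -- minimum (simulated probability - break-even) to bet
    Pushes count as a win for neither side.
    """
    n = len(sims)
    probs = {
        "away": sum(a + away_spread > h for a, h in sims) / n,
        "home": sum(a + away_spread < h for a, h in sims) / n,
        "over": sum(a + h > total for a, h in sims) / n,
        "under": sum(a + h < total for a, h in sims) / n,
    }
    breakeven = american_to_breakeven(odds)
    plays = {side: p for side, p in probs.items() if p - breakeven >= cutoff}
    return probs, plays


# Ten made-up simulated games: away +3.5, total 44.5, both sides at -110.
example = [(20, 24)] * 6 + [(27, 17)] * 4
probs, plays = find_plays(example, away_spread=3.5, total=44.5)
# probs -> {'away': 0.4, 'home': 0.6, 'over': 0.0, 'under': 1.0}
# plays -> home (0.6 vs. ~0.524 break-even) and under
```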

  4. #4
    KVB
    It's not what they bring...
    SBR PRO
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Quote Originally Posted by Waterstpub87 View Post
    ...You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread and the % that go over the total. That gives you the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
    Agreed, but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

    Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data set being developed with these simulations. One thing won't change: games keep coming, and after some time there is a lot of data to work with.

    It's not just scores or margins of victory; the data can be compiled for every individual stat that goes into the play-by-play simulation.

    I hope that makes sense without getting into some specifics of the data usage.

  5. #5
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Great responses and definitely additional things to consider
    Thx

  6. #6
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Quote Originally Posted by Bsims View Post
    There's nothing wrong with your approach, but it's more difficult than it seems. For example, let's assume you run a regression against different variables in CFB. You will find that the most significant variable is turnovers. Now, how would you predict turnovers for your simulations? The same issue applies to all the other variables.
    Not sure if this makes sense, but wouldn't it be possible to project turnovers using a random number generator function? Maybe incorporate this into the simulation to account for turnovers?

    Also, does anyone know how many yards an interception is worth? A fumble? I know down and distance come into play here, but what would be the average yards to account for either a fumble or an interception? Someone is bound to have done this work somewhere online.

  7. #7
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Quote Originally Posted by KVB View Post
    Agreed, but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

    Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data set being developed with these simulations. One thing won't change: games keep coming, and after some time there is a lot of data to work with.

    It's not just scores or margins of victory; the data can be compiled for every individual stat that goes into the play-by-play simulation.

    I hope that makes sense without getting into some specifics of the data usage.
    With simulations that use that many statistical variables, I would think overfitting or data mining could come into play here. If so, how do you address it, assuming it's even a problem to begin with?

    I am new to this but just trying to assess and plan before I waste time working on the wrong issues.

  8. #8
    KVB
    It's not what they bring...
    SBR PRO
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Quote Originally Posted by Combato View Post
    With simulations that use that many statistical variables, I would think overfitting or data mining could come into play here. If so, how do you address it, assuming it's even a problem to begin with?

    I am new to this but just trying to assess and plan before I waste time working on the wrong issues.
    Start tracking the individual stats being used, where they land in the simulations, and how that compares with the lines.

    At some point, it's about figuring out which stats are relevant, so that when you simulate, the rest acts as noise.

    It's hard to do that without laying it all out there first and then trying to see what matters. In the end, you may have to regress and test each stat individually to see which ones have the most relevant influence.

    The reason I say to track it, even graph it, is that variables can and will move in and out of favor, and some variables are predictable.

    You have to account for recent performance and build that into the model.
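    One common way to build recent performance into a stat is a recency-weighted average, where each older game counts a bit less. The sketch below uses a decay of 0.8 per game, which is an arbitrary illustrative value; it would need to be tuned on past seasons. The weekly numbers are made up.

```python
def recency_weighted_mean(values, decay=0.8):
    """Weighted mean of a stat ordered oldest -> newest.

    The newest game gets weight 1, the one before it `decay`, the one
    before that decay**2, and so on. decay=1.0 gives a plain mean.
    """
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical weekly turnover margins for one team, oldest first.
weekly = [-2, -1, 0, 1, 3]
season_avg = sum(weekly) / len(weekly)       # 0.2
recent_form = recency_weighted_mean(weekly)  # pulled toward the last few weeks
```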

  9. #9
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Thank You

  10. #10
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by KVB View Post
    Agreed, but you can also find value in the medians when it comes to a more general sense of whether or not the market is in line with the league and bettors, or how much it varies from a baseline at any given moment.

    Regardless of the matchups or simulations, which change day to day, there can be a use for the bigger data set being developed with these simulations. One thing won't change: games keep coming, and after some time there is a lot of data to work with.

    It's not just scores or margins of victory; the data can be compiled for every individual stat that goes into the play-by-play simulation.

    I hope that makes sense without getting into some specifics of the data usage.
    On the first point, of medians matching the line, that is factored in. If you run the sim and find that the away team covers the 3-point spread 50% of the time, you can conclude that is a good line. I normally use a buffer; at this point in baseball, anything with an edge of 4% or greater is bet.

    The individual stats can be pulled from somewhere else and incorporated. For example, with an average strikeout rate of 22%, how does plate discipline affect this? I am not sure what you mean by using the medians of simulations for this.
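    For the strikeout example, one standard way to combine a batter's rate, a pitcher's rate, and a 22% league average is the odds-ratio (log5) method. Waterstpub87 doesn't say which method he uses, so treat this as a swapped-in technique. Using the batter's own K% as the plate-discipline input is also an assumption, and the rates below are hypothetical.

```python
def odds(p):
    return p / (1 - p)


def matchup_k_rate(batter_k, pitcher_k, league_k=0.22):
    """Odds-ratio (log5-style) estimate of strikeout probability per PA.

    A batter who strikes out less than league average (more plate
    discipline, by this crude proxy) pulls the pitcher's rate down,
    and vice versa.
    """
    o = odds(batter_k) * odds(pitcher_k) / odds(league_k)
    return o / (1 + o)


# Hypothetical: a 15% K batter facing a 30% K pitcher in a 22% league.
p = matchup_k_rate(0.15, 0.30)  # about 0.21
```

    Inside a sim, each plate appearance could then be a strikeout when a uniform random draw falls below p.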

  11. #11
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Combato View Post
    With simulations that use that many statistical variables, I would think overfitting or data mining could come into play here. If so, how do you address it, assuming it's even a problem to begin with?

    I am new to this but just trying to assess and plan before I waste time working on the wrong issues.
    You should avoid overfitting. It is easy: test your variables on 2012 to 2016 (5 years), then simulate the last 2 years.
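    A minimal sketch of that season split follows. The season ranges mirror the post, with "the last 2 years" taken here as 2017 and 2018. The single "home edge" parameter is a hypothetical stand-in for whatever the model actually fits, and the four games are toy data. The key point is that the held-out years are never touched while fitting.

```python
from statistics import mean


def season_split(games, train_seasons, test_seasons):
    """Split games by season so the test years are never used for fitting."""
    train = [g for g in games if g["season"] in train_seasons]
    test = [g for g in games if g["season"] in test_seasons]
    return train, test


def mean_abs_error(games, predict):
    return mean(abs(g["home_margin"] - predict(g)) for g in games)


# Toy data; a real set would have every game from each season.
games = [
    {"season": 2012, "home_margin": 3},
    {"season": 2016, "home_margin": 5},
    {"season": 2017, "home_margin": 10},
    {"season": 2018, "home_margin": -2},
]
train, test = season_split(games, range(2012, 2017), range(2017, 2019))

# "Fit" one hypothetical parameter on the training years only...
home_edge = mean(g["home_margin"] for g in train)
# ...then judge it only on the held-out years.
test_error = mean_abs_error(test, lambda g: home_edge)
```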

    What you are describing is more like machine learning than a Monte Carlo sim. Machine learning tries to use those variables to project games, figuring out the formula to do so. A Monte Carlo sim is more like this: you supply it variables and an equation with a random element, and then let it do its thing. Machine learning overfits things.
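    The "variables plus an equation with a random element" description can be shown in its barest form. In the sketch below, margin = rating difference + home field + normal noise. This is a simplification swapped in for illustration, not the play-by-play approach described earlier in the thread. The ratings, the 2.5-point home field, the 14-point standard deviation, and the home -6.5 line are all made-up inputs.

```python
import random


def sim_margins(home_rating, away_rating, home_field=2.5, sigma=14.0,
                n_sims=10_000, seed=None):
    """Monte Carlo in its simplest form: an equation plus a random term.

    margin = home_rating - away_rating + home_field + noise,
    with noise ~ Normal(0, sigma). Nothing is fitted here; the inputs
    are supplied and the sim just runs.
    """
    rng = random.Random(seed)
    expected = home_rating - away_rating + home_field
    return [expected + rng.gauss(0, sigma) for _ in range(n_sims)]


margins = sim_margins(home_rating=10, away_rating=3, seed=7)
# Share of sims in which the home team covers a hypothetical -6.5.
p_home_covers = sum(m > 6.5 for m in margins) / len(margins)
```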

    Do not waste your time figuring out yard values for things. If you can program a Monte Carlo sim, you can do better than some yards-per-point bullshit. It is a very noisy way to assess games. The problem is that many teams get garbage yards, and really, yards in the middle of the field don't count for much.

  12. #12
    KVB
    It's not what they bring...
    SBR PRO
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Quote Originally Posted by Waterstpub87 View Post
    You should avoid overfitting. It is easy: test your variables on 2012 to 2016 (5 years), then simulate the last 2 years.

    What you are describing is more like machine learning than a Monte Carlo sim. Machine learning tries to use those variables to project games, figuring out the formula to do so. A Monte Carlo sim is more like this: you supply it variables and an equation with a random element, and then let it do its thing. Machine learning overfits things.

    Do not waste your time figuring out yard values for things. If you can program a Monte Carlo sim, you can do better than some yards-per-point bullshit. It is a very noisy way to assess games. The problem is that many teams get garbage yards, and really, yards in the middle of the field don't count for much.
    I can't disagree here, the use of simulations can get rid of some noise, but can also make it tougher to produce reliable adjustments.

    The variable time frame is solid advice as well.

    I suppose it depends on just what you program into the simulation.

    Personally, I would make different simulations with different variables based on different strategies.

    If only there was more time in the day.

    This is why many of us prepare for the upcoming season well in advance of the start of the season (like in the offseason).

  13. #13
    KVB
    It's not what they bring...
    SBR PRO
    Join Date: 05-29-14
    Posts: 74,849
    Betpoints: 7576

    Quote Originally Posted by Waterstpub87 View Post
    On the first point, of medians matching the line, that is factored in. If you run the sim and find that the away team covers the 3-point spread 50% of the time, you can conclude that is a good line. I normally use a buffer; at this point in baseball, anything with an edge of 4% or greater is bet.

    The individual stats can be pulled from somewhere else and incorporated. For example, with an average strikeout rate of 22%, how does plate discipline affect this? I am not sure what you mean by using the medians of simulations for this.
    Yeah, without going into an example I was pretty vague. I was thinking that before your post.

    I am really mixing methods here. The medians and where they stand would more likely come into play in deciding what variables to use, so it's before the simulation.

    I sort of jumped to assessing the variables instead of focusing on the simulation results.

    I might have stretched the topic a bit. Still trying to work on a good example though, one that I feel comfortable posting.

  14. #14
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by KVB View Post
    I can't disagree here, the use of simulations can get rid of some noise, but can also make it tougher to produce reliable adjustments.

    The variable time frame is solid advice as well.

    I suppose it depends on just what you program into the simulation.

    Personally, I would make different simulations with different variables based on different strategies.

    If only there was more time in the day.

    This is why many of us prepare for the upcoming season well in advance of the start of the season (like in the offseason).
    Crazy amount of stuff. August and September are the busiest months of the year in that regard: still a full slate of baseball games, plus testing and launching 2 football models, and NBA and NCAA basketball testing.
    Nominated by: KVB

  15. #15
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Quote Originally Posted by Waterstpub87 View Post
    Great post. I run a Monte Carlo system for NCAAF.

    I simulate play-by-play data. I literally reconstruct a game from kickoff.

    I used to do this in Excel: 1,000 games per matchup, times roughly 55 games a week. It took roughly 2 days to run. My code was probably not optimal, and if you wrote perfect VBA, you could probably cut this down. Running 10,000 games with this method would not be doable.

    Now I run it in Python. It takes 3 hours to run the entire slate of games. Learn Python; it will make 10,000-game runs doable for you.

    You do not need to find a median at all. Sim your games. You now have 10,000 final scores. Figure out the % of sims in which the away team covers the spread and the % that go over the total. That gives you the information you need. Assign a cutoff point, like a 6% difference from the odds, then bet those games.
    What kind of results have you had from running such an in-depth simulation? Is the additional complexity worth all the trouble, and has it increased your edge over time?

  16. #16
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    My results have been good. I consistently beat the close in NCAAF. This is the only model I have ever run, so I can't really answer the other questions. I did something similar in baseball, so I was like, "maybe I can write something for football." At the time I was working somewhere where I was able to finish a week's worth of work in about 5 hours on Monday, so I had tons of time on my hands, because I had to sit at a desk looking busy for 50 hours a week.

    It isn't particularly complex. It is a certain power rating that I use to adjust play data. The mechanism, the Monte Carlo if you will, isn't complex either. It just takes a while to bang all the bugs out of your code. I was able to rewrite it in Python in maybe 10 hrs or so.

  17. #17
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Very good. Thanks

  18. #18
    Bsims
    Join Date: 02-03-09
    Posts: 827
    Betpoints: 13

    Quote Originally Posted by Combato View Post
    Not sure if this makes sense, but wouldn't it be possible to project turnovers using a random number generator function? Maybe incorporate this into the simulation to account for turnovers?

    Also, does anyone know how many yards an interception is worth? A fumble? I know down and distance come into play here, but what would be the average yards to account for either a fumble or an interception? Someone is bound to have done this work somewhere online.
    You are correct in feeling that you would be better off using some sort of probability function to generate a turnover distribution rather than using an average.

    There isn't a value that you could assign to the impact of an interception. Consider two examples. First and goal, then an interception: big impact. Second, a Hail Mary at the end of a half that is picked off: no impact. They look alike in a box score.
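    A sketch of the "probability function instead of an average" idea is shown below. If each possession independently ends in a turnover with some probability, the per-game count comes out binomial, so some sims have zero turnovers and some have three or four. The 12 possessions and 10% rate are hypothetical. This only draws how many turnovers happen; in a drive- or play-level sim, their impact would come from where they occur, which is the first-and-goal vs. Hail Mary point.

```python
import random
from collections import Counter


def simulate_turnovers(possessions, to_rate, n_sims=10_000, seed=None):
    """Draw a turnover count per simulated game.

    Each possession independently ends in a turnover with probability
    `to_rate`, so the count is binomial rather than a fixed average.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < to_rate for _ in range(possessions))
            for _ in range(n_sims)]


# Hypothetical team: 12 possessions a game, 10% of them end in a turnover.
counts = simulate_turnovers(12, 0.10, seed=11)
distribution = Counter(counts)  # how often 0, 1, 2, ... turnovers occurred
```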

  19. #19
    Toledo Ed
    Join Date: 09-04-10
    Posts: 728
    Betpoints: 1804

    Quote Originally Posted by Waterstpub87 View Post
    My results have been good. I consistently beat the close in NCAAF. This is the only model I have ever run, so I can't really answer the other questions. I did something similar in baseball, so I was like, "maybe I can write something for football." At the time I was working somewhere where I was able to finish a week's worth of work in about 5 hours on Monday, so I had tons of time on my hands, because I had to sit at a desk looking busy for 50 hours a week.

    It isn't particularly complex. It is a certain power rating that I use to adjust play data. The mechanism, the Monte Carlo if you will, isn't complex either. It just takes a while to bang all the bugs out of your code. I was able to rewrite it in Python in maybe 10 hrs or so.

    Impressive. I have the time while at work but don’t have a clue about Excel. I want your code!!!!!!!

  20. #20
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Toledo Ed View Post
    Impressive. I have the time while at work but don’t have a clue about Excel. I want your code!!!!!!!
    Python, dog. It's the future. Easier to program in than VBA.

  21. #21
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Quote Originally Posted by Bsims View Post
    You are correct in feeling that you would be better off using some sort of probability function to generate a turnover distribution rather than using an average.

    There isn't a value that you could assign to the impact of an interception. Consider two examples. First and goal, then an interception: big impact. Second, a Hail Mary at the end of a half that is picked off: no impact. They look alike in a box score.
    Exactly. Context is everything, isn't it?

  22. #22
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    Quote Originally Posted by Waterstpub87 View Post
    On the first point, of medians matching the line, that is factored in. If you run the sim and find that the away team covers the 3-point spread 50% of the time, you can conclude that is a good line. I normally use a buffer; at this point in baseball, anything with an edge of 4% or greater is bet.

    Example: run 1,000 game simulations that simulate a score for each team.

    Take the median score for all the simulated favorite sides (all 1,000 simulated games).
    Take the median score for the simulated underdog sides (all 1,000 simulated games).
    Compare the 2 medians. The first median score is 27 for the fav; the second is 20 for the dog.
    The median difference is 7 points.

    Would this median difference be a good way to make a line? Yes or no?

    I'm just speculating here. I have no idea if this is even feasible.
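    A quick check on the median idea: the difference between the two teams' median scores is not the same thing as the median of the game margins, and neither gives a cover probability directly. The three "sims" below are contrived so the medians come out at 27 and 20, as in the example. median_summary is a hypothetical helper.

```python
from statistics import median


def median_summary(sims):
    """sims: list of (fav_score, dog_score).

    Returns (difference of the two medians, median of the game margins).
    These are generally not the same number.
    """
    fav = [f for f, _ in sims]
    dog = [d for _, d in sims]
    margins = [f - d for f, d in sims]
    return median(fav) - median(dog), median(margins)


# Three contrived sims where the medians are 27 and 20, as in the example.
diff_of_medians, median_margin = median_summary([(27, 13), (35, 20), (20, 34)])
# diff_of_medians -> 7, median_margin -> 14
```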

  23. #23
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Combato View Post
    Quote Originally Posted by Waterstpub87 View Post
    On the first point, of medians matching the line, that is factored in. If you run the sim and find that the away team covers the 3-point spread 50% of the time, you can conclude that is a good line. I normally use a buffer; at this point in baseball, anything with an edge of 4% or greater is bet.

    Example: run 1,000 game simulations that simulate a score for each team.

    Take the median score for all the simulated favorite sides (all 1,000 simulated games).
    Take the median score for the simulated underdog sides (all 1,000 simulated games).
    Compare the 2 medians. The first median score is 27 for the fav; the second is 20 for the dog.
    The median difference is 7 points.

    Would this median difference be a good way to make a line? Yes or no?

    I'm just speculating here. I have no idea if this is even feasible.
    Why would you do it that way instead of the way I suggested? What advantages do you think that has?

  24. #24
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    None really. Just speculating.

  25. #25
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Combato View Post
    None really. Just speculating.
    Ok, my way is easier operationally. Consider that you will want to set a margin for where you bet vs. the line. My way, you can set this as a percentage margin very easily, and it is easy to calculate.

    Your way is more difficult. You would need to either set a margin in points, like 3 points off, or convert the number of points to a price, so a line of 3 vs. a projection of zero would be like getting a -150 proposition at -110. This becomes more difficult with football, as points do not have symmetrical value. Consider the value of 3 points between 1 and 4, vs. the value of 3 points from 36 to 39.

    Unless you are careful in this regard, pricing this correctly is going to be time consuming. Whereas my way, it is right there, when you want it, correctly priced.
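    To see why a fixed points margin is awkward, one approach is to measure how much simulated probability sits inside each 3-point window, and then convert probabilities to fair odds. The margins below are made up, and prob_to_american ignores vig. Both helpers are hypothetical names. As a reference point, a 60% chance works out to fair odds of -150.

```python
def share_between(values, lo, hi):
    """Fraction of simulated values with lo <= value <= hi."""
    return sum(lo <= v <= hi for v in values) / len(values)


def prob_to_american(p):
    """Fair (no-vig) American odds for a win probability 0 < p < 1."""
    if p >= 0.5:
        return -100 * p / (1 - p)
    return 100 * (1 - p) / p


# Ten made-up simulated margins for illustration.
margins = [1, 3, 3, 4, 7, 10, 14, 17, 21, 38]
near_key_numbers = share_between(margins, 1, 4)   # 0.4
blowout_range = share_between(margins, 36, 39)    # 0.1
fair_price = prob_to_american(0.6)                # about -150
```

    The same 3 points carry four times the probability near the low end in this toy set, which is why pricing straight from simulated probabilities sidesteps the problem.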

  26. #26
    Combato
    Join Date: 09-12-17
    Posts: 76
    Betpoints: 1076

    I get that part and will move forward. Thank you for the great feedback

  27. #27
    Waterstpub87
    Slan go foill
    Join Date: 09-09-09
    Posts: 4,043
    Betpoints: 7236

    Quote Originally Posted by Combato View Post
    I get that part and will move forward. Thank you for the great feedback
    Good luck. The thing with coding is that you just need to put in the work. That is all. It isn't some dark magic that only a few people are able to do. Google is your friend. Start off with simple stuff like googling "Use VBA to create a new worksheet", and someone will have answered this somewhere. Do that for every step you need, and you'll have a process in no time.
