sports modeling approaches

  • Wrecktangle
    SBR MVP
    • 03-01-09
    • 1524

    #1
    sports modeling approaches
    I posted this over at BC as a new poster. I doubt I'll get much response, as most of those guys are Vegas sharps who do less modeling and more action, but who knows...

    Anyway in the past, I've always simply divided up the sports modeling work into two basic techniques:

    Simulations - where every play, or set of plays are coded

    Expected Value - where the game play is synthesized into a metadata set and a value or set of values are compared

    In the simulation world (sims), Monte Carlo (MC) seems to be the norm. In the modeling world, and especially the statistical world, MC is held in pretty high regard. Typically, MC has two difficulties: 1) it requires a very large database to derive its characteristics, and 2) running the sim can take thousands of iterations, which sometimes requires a lot of time, since a lot of code these days is run by interpreters rather than built with compilers.

    In the Expected Value (EV) world, all sorts of sub-variations seem to exist. One of the earliest that I’ve seen was the Dudley Model described in Art Glantz’s “The Winner’s Guide to Pro Football Betting.” While Dudley was simple enough to be constructed with a hand calculator, it was not as efficient as a simple Yards per Point formula. As I’ve mentioned in other posts, we’ve also had Elo (from the chess world) and the recursive Predictor (Jeff Sagarin calls this Pure Points), both of which are fairly well known.

    The nice thing about EV methods is that they're simple: they don't require a lot of data to build, they run fast, and they can be built in a spreadsheet, so even the unwashed masses can use them.

    Lately, I’ve been playing with hybrids: models that combine both sims and EV. In the past I’ve used some market efficiency models, but lately I’ve been playing around with a signal-to-noise approach that has its derivation in the SONAR/RADAR world. Yeah, OK, maybe it's because I’m working with the Navy and I’m always fascinated by weird new algorithms, but there just might be something there.

    So the point of this whole diatribe is, I’m just wondering if there are any other approaches that folks are working on that you’d (gasp!) care to mention in an open forum.
    Last edited by Wrecktangle; 04-19-09, 04:37 PM. Reason: spellos
  • Justin7
    SBR Hall of Famer
    • 07-31-06
    • 8577

    #2
    I think you've summed it up.
    Comment
    • 20Four7
      SBR Hall of Famer
      • 04-08-07
      • 6703

      #3
      Originally posted by Justin7
      I think you've summed it up.
      That's pretty much it. Since I work, I usually take the easy way out and don't bother with the sims. I thought I was the only one who knew about Dudley, as that was the first sports betting book I read.
      Comment
      • Wrecktangle
        SBR MVP
        • 03-01-09
        • 1524

        #4
        24-7: Dudley got me into the biz...and frankly I contend that it was/is the best book ever written. I know it worked in the 70s, but it eroded in the 80s maybe because the computer group was bending lines all over the place. Everything came to a screaming halt in 1985 with that massive HFA, and it worked partially in '86, if I remember right, and then with the strike in '87...Dudley just quit working.

        BTW, Glantz played in the Hilton contest a few times under the name of Dudley when it got started right after the Castaways dropped their contest.
        Comment
        • Bsims
          SBR Wise Guy
          • 02-03-09
          • 827

          #5
          Baseball is where I've spent most of my analytical time. I've done both types of analysis that you describe, all of my own creation. I've tried the EV approach, using linear regression, but didn't find this particularly useful. The variance in actual results from what the formula would say was somewhere around 1.5 runs per game. I believe you might call this noise and it was too big to ignore.

          What I'm really interested in is achieving a probability of each team winning. So most of my work has been on a baseball simulator that simulates each game based on player statistics. The code is actually compiled, so it runs quite quickly. I've found the simulated winning percentages settle fairly quickly, and after 5000 games it is usually in a 1% range which is close enough for me. I can simulate these 5000 games in less than 2 minutes which allows plenty of time to do this after getting the starting lineups (these are usually available about an hour before the game starts).

          I've done some work this winter on the other major sports and have developed a technique that you might consider a hybrid. It involves a generalized power rating for any particular statistic. I've fed these ratings into an EV model based on regression, but am not comfortable with the results. I've also used these ratings to do simulations, which show a little more promise. Most of my recent work has been with the NHL, and these hybrid simulations are returning a decent profit in the 3-4% range. But I haven't done any back testing to see if this is a sustainable return.

          This work has given me an idea for a hybrid approach to baseball. I'm hoping to develop this approach, complete with back testing over the last few years. Since I'm not yet working on it, I suspect it may take a few months to develop. But since it would probably need a couple of months of current season data, I'm not in that big of a hurry.

          You wanted to know what others are doing, and this is my story.
          Comment
          • Wrecktangle
            SBR MVP
            • 03-01-09
            • 1524

            #6
            Bsims: Every time I think about MLB, I always come up with simulation. I've done a little EV on it, and no joy...so I rapidly flit back to football. As you mention, the batting order comes out so late, and I have a day job that doesn't allow any outside "non-approved" connections (I can't even check my e-mail accounts), so it wouldn't work right now for me...sigh.

            Good luck on your approach, if you want to talk over anything (I do have a few ideas) give me a PM.
            Comment
            • Data
              SBR MVP
              • 11-27-07
              • 2236

              #7
              Originally posted by Bsims
              after getting the starting lineups (these are usually available about an hour before the game starts).
              Do you get the lineups at usatoday or someplace else?
              Comment
              • MadTiger
                SBR MVP
                • 04-19-09
                • 2724

                #8
                Originally posted by Wrecktangle

                Good luck on your approach, if you want to talk over anything (I do have a few ideas) give me a PM.
                Same here.
                Comment
                • Bsims
                  SBR Wise Guy
                  • 02-03-09
                  • 827

                  #9
                  Originally posted by Data
                  Do you get the lineups at usatoday or someplace else?
                  I get them from Yahoo. I have a program that loads the schedules and scores page, then looks for boxscores for games not yet started. They generally appear in the hour before the game, and the program downloads the boxscore and sets up the simulation program.

                  I suspect there are lineup changes after the boxscores are available and before the game starts. But it's probably the best information available at the time I get them.
                  Comment
                  • Data
                    SBR MVP
                    • 11-27-07
                    • 2236

                    #10
                    My experience thus far has been that the lineups appear a few minutes earlier on usatoday than on Yahoo. However, even usatoday posts the lineups much less than an hour before game time, and for some games it is only about 5 minutes before.

                    I am looking for a service that can provide this info earlier, even more so for the NBA. For the NBA I found out (and that cost me some) that the Yahoo boxscores that appear 5 minutes before game time are totally unreliable and, basically, random.
                    Comment
                    • homerbush
                      SBR MVP
                      • 11-17-08
                      • 2317

                      #11
                      I have not swung by there lately, but stats.com used to have good lineups early for MLB. But this was a couple of years ago and I am not sure if that is still the case.
                      Comment
                      • Bsims
                        SBR Wise Guy
                        • 02-03-09
                        • 827

                        #12
                        Originally posted by Data
                        My experience thus far has been that the lineups appear a few minutes earlier on usatoday than on Yahoo. However, even usatoday posts the lineups much less than an hour before game time, and for some games it is only about 5 minutes before.

                        I am looking for a service that can provide this info earlier, even more so for the NBA. For the NBA I found out (and that cost me some) that the Yahoo boxscores that appear 5 minutes before game time are totally unreliable and, basically, random.
                        I was able to download the lineups for the FLA-PIT game from Yahoo at 6:04 and the ATL-WAS game at 6:23. I did check out USA Today and the lineups were the same. I notice the COL-ARZ lineups are available on both sites now at 7:03.
                        Comment
                        • Data
                          SBR MVP
                          • 11-27-07
                          • 2236

                          #13
                          Originally posted by homerbush
                          I have not swung by there lately, but stats.com used to have good lineups early for MLB. But this was a couple of years ago and I am not sure if that is still the case.
                          Thanks, I'll check them out.
                          Comment
                          • Data
                            SBR MVP
                            • 11-27-07
                            • 2236

                            #14
                            Originally posted by Bsims
                            I was able to download the lineups for the FLA-PIT game from Yahoo at 6:04 and the ATL-WAS game at 6:23. I did check out USA Today and the lineups were the same. I notice the COL-ARZ lineups are available on both sites now at 7:03.
                            Very good, let's hope they do this for most of the games this season.
                            Comment
                            • waiverwire
                              SBR High Roller
                              • 03-08-09
                              • 125

                              #15
                              Getting back to the original topic in this thread, I haven't tried simulations yet. I'm mostly interested in daily fantasy baseball contests, which are kind of a hybrid of fantasy sports and sports betting. The thing is that the 'market' for them is so new, and so inefficient that simulations are completely unnecessary. My EV model is extremely crude but still VERY +EV, and I'm simply working on improving all the errors and omissions in it to stay one step ahead of my opponents. I'd imagine it will be years before it's necessary to use simulations...although I am interested in using them for some specific hard to forecast things like looking at expected innings pitched for a pitcher in a specific game.
                              Comment
                              • Bsims
                                SBR Wise Guy
                                • 02-03-09
                                • 827

                                #16
                                Originally posted by waiverwire
                                I'm mostly interested in daily fantasy baseball contests, which are kind of a hybrid of fantasy sports and sports betting.
                                I was not familiar with this type of contest. After a few searches, it looks interesting. Can anyone recommend the biggest and best site for this?
                                Comment
                                • losturmarbles
                                  SBR MVP
                                  • 07-01-08
                                  • 4604

                                  #17
                                  i did a search earlier and found


                                  just glanced over the site, i think that's what he was talking about.
                                  Comment
                                  • whatisit
                                    SBR Sharp
                                    • 01-25-09
                                    • 319

                                    #18
                                    Do you know of any good books/sites about developing models for betting? Completely new to this, wanted to read up on it in depth.
                                    Comment
                                    • Kyleben
                                      SBR High Roller
                                      • 03-30-09
                                      • 153

                                      #19
                                      Originally posted by whatisit
                                      Do you know of any good books/sites about developing models for betting? Completely new to this, wanted to read up on it in depth.
                                      It's very hard to find books on sports modeling, but it is not very hard to find books in other areas of modeling. Look for books on economic modeling and other areas outside of betting and try to apply them to betting. This is where Bill James got his start, and he turned baseball upside down.
                                      Comment
                                      • Wrecktangle
                                        SBR MVP
                                        • 03-01-09
                                        • 1524

                                        #20
                                        No books are in print anymore, and the few that were are pretty much unobtainable.

                                        Justin7 is writing one, but he'll probably charge $10,000 per copy because the market is about 20 people world wide.

                                        ...and Dan Gordon's book is NOT about modeling, it's about bullsh*t
                                        Comment
                                        • Neil Nollidge
                                          SBR Rookie
                                          • 02-27-09
                                          • 41

                                          #21
                                          Wrecktangle, sorry to be asking questions rather than revealing a new approach, but my past is crammed with racing stuff. Sims: I imagine that the coding is a big job too. Have you experimented with varying the inputs to monitor the sensitivity of the result/probability? Market efficiency: I would have thought that you would be making some sort of judgement on it with every approach, or are you? This is not clear to me. My understanding of the term "efficiency" has been formed only by observing its usage. I gather that the term has meaning only in relative contexts. So the idea is to create a raw probability set with an efficiency comparable to that of the market? I suppose that this relates to the Bayesian priors that you refer to in your Kelly thread. (I know next to nothing about Bayesian inference.) No idea about signal to noise, but if the approach is unique and produces numbers close to closing lines, it is cause for excitement. Given the situation with books, you guys on this thread seem to be key focal points.
                                          Comment
                                          • TomG
                                            SBR Wise Guy
                                            • 10-29-07
                                            • 500

                                            #22
                                            Here are two examples to get you started.. GL

                                            Baseball modeling from Markov Chains


                                            User Solver to Rate Sports Teams (An Excel Tutorial mostly)
                                            Comment
                                            • Neil Nollidge
                                              SBR Rookie
                                              • 02-27-09
                                              • 41

                                              #23
                                              Thanks, Tom. Neil.
                                              Comment
                                              • Wrecktangle
                                                SBR MVP
                                                • 03-01-09
                                                • 1524

                                                #24
                                                TomG, a note about the current implementation of Solver in Excel: it breaks a lot and crashes Excel. Apparently Microsoft Bill won't pay to have a newer version of Solver integrated into the code. So you call Frontline (the makers of Solver), and you find the fantastic prices Frontline wants for an add-on.

                                                Can you do it in Excel? Yep, but expect trouble.
                                                Last edited by Wrecktangle; 05-01-09, 06:45 PM. Reason: spellos
                                                Comment
                                                • Wrecktangle
                                                  SBR MVP
                                                  • 03-01-09
                                                  • 1524

                                                  #25
                                                  Neil, not exactly sure what you're asking, but typically I don't go into much on my code. If you want to build a sim, EV, or combination, I say have at it. If you want any more, PM me.
                                                  Comment
                                                  • 20Four7
                                                    SBR Hall of Famer
                                                    • 04-08-07
                                                    • 6703

                                                    #26
                                                    For the dude who was asking about sports modelling books: a good start is to read Basketball on Paper. It might help you figure out where to start.
                                                    Comment
                                                    • whatisit
                                                      SBR Sharp
                                                      • 01-25-09
                                                      • 319

                                                      #27
                                                      That was me, thanks a lot bro, I'll look into it.
                                                      Comment
                                                      • lukeouk
                                                        SBR Rookie
                                                        • 02-19-09
                                                        • 13

                                                        #28
                                                        Hey All,

                                                        This post has really grabbed my attention. Is there anyone who would be able (and willing) to give a slightly more detailed description of how a simulation based model would work. I am slightly familiar with Monte Carlo from a signal processing course at University but was wondering how it is put together in a system one could use to calculate win probabilities etc..

                                                        Even a very brief outline of a Monte Carlo simulation that can predict the outcome of sporting events is seemingly hard to come by (maybe for obvious financial reasons). If anyone could sketch out some sort of loose algorithm/guidelines for an MC approach I would be seriously happy.

                                                        Thanks guys,
                                                        LOJ.

                                                        PS. When you say "BC", Wrecktangle, what are you referring to?
                                                        Comment
                                                        • wintermute
                                                          SBR Rookie
                                                          • 05-05-09
                                                          • 20

                                                          #29
                                                          lukeouk

                                                          Building a simulation model is pretty straightforward if you happen to be a programmer and know something about the sport you're modeling.

                                                          Take baseball for example. For each player in the lineup for each of two teams you have to estimate his chances of striking out, being out on a fielding play, walking, hitting a single, double, triple or home run, getting hit by a pitch etc. These probabilities must sum to 1.

                                                          Using a random number generator to generate a number in the range 0 to 1 you determine what each batter does in each plate appearance. To take a very simple example, if a batter either strikes out ( 60 percent of the time ) or hits a single ( 40 percent of the time ) and your random number generator generates the number 0.4523, you say that the batter strikes out ( 0.4523 is less than 0.60 ).
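The random-number step above can be sketched in a few lines of Python. This is only an illustration of the idea, not anyone's actual code: the function names `outcome_for_draw` and `sample_outcome` and the `batter` dictionary are made up for the example. The draw is compared against a running total of the outcome probabilities, so a draw of 0.4523 lands in the first 0.60 slice and is scored as a strikeout.

```python
import random

def outcome_for_draw(probs, draw):
    """Map a uniform draw in [0, 1) to an outcome via cumulative probabilities.

    probs: dict of outcome name -> probability; the values must sum to 1.
    """
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    cumulative = 0.0
    for outcome, p in probs.items():
        cumulative += p
        if draw < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off at the top end

def sample_outcome(probs, rng=random):
    """Draw one plate-appearance outcome using the given random generator."""
    return outcome_for_draw(probs, rng.random())

# Hypothetical two-outcome batter from the example: 60% strikeout, 40% single.
batter = {"strikeout": 0.60, "single": 0.40}
print(outcome_for_draw(batter, 0.4523))
```

A real lineup would simply use a longer dictionary per batter (walk, double, home run, and so on), as long as the probabilities still sum to 1.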

                                                          You start off each inning with the bases empty and none out. If the first batter hits a single, you have a runner at first with none out. If the second batter strikes out you have a runner at first with 1 out. You keep processing batters until there are 3 outs. Along the way you keep track of the runs scored. You do this inning by inning.

                                                          If after 8 and a half innings the home team is leading you stop the game. Otherwise you play the bottom of the ninth and extra innings if required. You repeat your simulation thousands of times - say 10000 times. If the home team wins the game 5666 times then you have calculated the probability of the home team winning as 0.5666.
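Here is a minimal sketch of that whole loop in Python, under heavy simplifying assumptions that are mine and not part of the description above: every batter on a team is identical, each plate appearance is either a single or an out, and every single moves each runner up exactly one base. The names `half_inning`, `home_team_wins` and `home_win_probability` are made up for the example.

```python
import random

def half_inning(p_single, rng):
    """Runs scored in one half-inning by a lineup of identical batters."""
    outs = runners = runs = 0
    while outs < 3:
        if rng.random() < p_single:
            if runners == 3:   # bases loaded: the runner on third scores
                runs += 1
            else:              # otherwise one more base is occupied
                runners += 1
        else:
            outs += 1
    return runs

def home_team_wins(p_home, p_away, rng):
    """Play one game; True if the home team wins. Needs p_home, p_away > 0."""
    home = away = 0
    inning = 1
    while True:
        away += half_inning(p_away, rng)
        # From the ninth on, skip the bottom half if the home team already leads.
        home_already_ahead = inning >= 9 and home > away
        if not home_already_ahead:
            # The half-inning runs to 3 outs even after a go-ahead run;
            # that changes the final score but not who wins.
            home += half_inning(p_home, rng)
        if inning >= 9 and home != away:
            return home > away
        inning += 1

def home_win_probability(p_home, p_away, n_games=10_000, seed=0):
    """Fraction of simulated games won by the home team."""
    rng = random.Random(seed)
    wins = sum(home_team_wins(p_home, p_away, rng) for _ in range(n_games))
    return wins / n_games

print(home_win_probability(0.32, 0.30))
```

Everything that makes a real simulator useful (per-batter outcome tables, pitcher adjustments, base-running, park effects) would replace the single `p_single` number per team.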

                                                          If Pinnacle has posted the home team as +101 and you trust your simulation you bet on the home team!!!
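The comparison between a simulated probability and a posted price can be made explicit. The sketch below converts American odds to a break-even probability and to expected profit per unit staked; the function names are made up for the example, and it ignores anything beyond the posted price itself.

```python
def decimal_odds(american):
    """Convert American odds (+101, -110, ...) to decimal odds."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)

def break_even_probability(american):
    """Win probability at which a bet at these odds has zero expected profit."""
    return 1 / decimal_odds(american)

def expected_value(p_win, american):
    """Expected profit per 1 unit staked, given your own win probability."""
    return p_win * (decimal_odds(american) - 1) - (1 - p_win)

print(f"{break_even_probability(101):.4f}")
print(f"{expected_value(0.5666, 101):+.4f}")
```

At +101 the break-even probability is about 0.4975, so a simulated 0.5666 would imply an expected profit of roughly 0.139 units per unit staked, if the simulation can be trusted.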

                                                          Of course the devil is in the details. You have to estimate the on-base probabilities for all the players, and this is typically done by extracting the relevant information from trustworthy data sources like Retrosheet. You have to look at the pitcher opposing the batter and alter the batter's stats appropriately. You have to keep track of base runners and assign probabilities to how far they advance from 1st if the batter singles, doubles, gets out etc. You have to adjust for park effects and home field advantage. You have to somehow handle relief pitching and so on and so on...

                                                          You end up spending 100 times as much time gathering and analyzing the data that goes into your simulation as you actually spend coding the simulation.

                                                          Anyway if you have lots of time it's great fun. Even more fun if you can outsmart the sports books.

                                                          Good luck to you.
                                                          Comment
                                                          • Bsims
                                                            SBR Wise Guy
                                                            • 02-03-09
                                                            • 827

                                                            #30
                                                            Originally posted by losturmarbles
                                                            i did a search earlier and found


                                                            just glanced over the site, i think that's what he was talking about.
                                                            Thanks to waiverwire for bringing up daily fantasy games. And thanks to losturmarbles for pointing me to draftbug. I signed up and have really enjoyed the games. Just wish more people were playing to provide more opportunities and competition.
                                                            Comment
                                                            • rdgarza
                                                              SBR Rookie
                                                              • 09-30-09
                                                              • 9

                                                              #31
                                                              Originally posted by wintermute
                                                              lukeouk Building a simulation model is pretty straightforward if you happen to be a programmer and know something about the sport you're modeling. Take baseball for example. For each player in the lineup for each of two teams you have to estimate his chances of striking out, being out on a fielding play, walking, hitting a single, double, triple or home run, getting hit by a pitch etc. These probabilities must sum to 1. Using a random number generator to generate a number in the range 0 to 1 you determine what each batter does in each plate appearance. To take a very simple example, if a batter either strikes out ( 60 percent of the time ) or hits a single ( 40 percent of the time ) and your random number generator generates the number 0.4523, you say that the batter strikes out ( 0.4523 is less than 0.60 ). You start off each inning with the bases empty and none out. If the first batter hits a single, you have a runner at first with none out. If the second batter strikes out you have a runner at first with 1 out. You keep processing batters until there are 3 outs. Along the way you keep track of the runs scored. You do this inning by inning If after 8 and a half innings the home team is leading you stop the game. Otherwise you play the bottom of the ninth and extra innings if required. You repeat your simulation thousands of times - say 10000 times. If the home team wins the game 5666 times then you have calculated the probability of the home team winning as 0.5666 If Pinnacle has posted the home team as +101 and you trust your simulation you bet on the home team!!! Of course the devil is in the details. You have to estimate the getting on base probabilities for all the players and this is typically done by extracting the relevant information from trustworthy data sources like retrosheet. You have to look at the pitcher opposing the batter and alter the batter's stats appropriately. 
You have to keep track of base runners and assign probabilities to how far they advance from 1st if the batter singles, doubles, gets out etc. You have to adjust for park effects and home field advantage. You have to somehow handle relief pitching and so on and so on... You end up spending 100 times as much time gathering and analyzing the data that goes into your simulation as you actually spend coding the simulation. Anyway if you have lots of time it's great fun. Even more fun if you can outsmart the sports books. Good luck to you.
                                                              This is a very good post. I'm just a beginner at this, and this post clarified a lot of things for me.

                                                              I have a few questions:

                                                              I think the better random numbers I get and the more times I run the simulation, the better results I get. Am I right?

                                                              What do you use to code your simulations?

                                                              Is this type of simulation too intensive for my hardware?

                                                              How can I know if my simulation is too much for my hardware?
                                                              Comment
                                                              • wintermute
                                                                SBR Rookie
                                                                • 05-05-09
                                                                • 20

                                                                #32
                                                                I think the better random numbers I get and the more times I run the simulation, the better results I get. Am I right?

                                                                I don't understand what you mean by better random numbers. The reason for running more simulations is to get better estimates of calculated probabilities. Take the example of flipping a coin. The more times you flip the coin, the closer you are likely to come to the true probability of 50 percent heads.

                                                                I find that in my simulations, probabilities don't start to settle down until I run at least 10000 simulations but 100000 is better.
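One way to put numbers on "settling down", assuming each simulated game is an independent win/loss trial, is the standard error of a proportion, sqrt(p(1-p)/n). The helper name `standard_error` is made up for this sketch.

```python
import math

def standard_error(p, n):
    """Standard error of a win probability estimated from n independent games."""
    return math.sqrt(p * (1 - p) / n)

# Approximate 95% half-width (1.96 standard errors) for a near coin-flip game.
for n in (5_000, 10_000, 100_000):
    half_width = 1.96 * standard_error(0.5, n)
    print(f"{n:>7} games: +/- {half_width:.4f}")
```

For a game near 50/50 this gives roughly plus or minus 1 percentage point at 10,000 games and about 0.3 points at 100,000. The error shrinks with the square root of the number of runs, so ten times as many simulations buys only about a threefold improvement.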

                                                                What do you use to code your simulations?

                                                                I use two languages - Python and Objective-C.

                                                                Is this type of simulation too intensive for my hardware?

                                                                How can I know if my simulation is too much for my hardware?


                                                                As I mentioned in another post, modern-day PCs are incredibly powerful computing platforms if used intelligently. Just a couple of days ago I was looking at an old version of the Guinness Book of Records - 1969 I think. The most powerful computer on earth at that time, costing about $15,000,000, is not as powerful as my iMac.

                                                                I use Python for data collection and preliminary analysis because it is such an easy language to work with. But Python is an interpreted language and fairly slow. I use Objective-C, a compiled language, when I want speed. The code it generates runs about 100 times faster than Python's.

                                                                To get an idea of how fast your computer is, find a language and see how long it takes to generate 10,000,000 random numbers. Python on my iMac takes about 7.5 seconds.
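A small timing script along those lines might look like this. The function name `time_random_draws` is made up for the sketch, and the result will vary with the machine and Python version.

```python
import random
import time

def time_random_draws(n):
    """Return the seconds taken to generate n pseudo-random numbers."""
    rng = random.Random(0)
    start = time.perf_counter()
    for _ in range(n):
        rng.random()
    return time.perf_counter() - start

# Start small; for the benchmark described above, use n = 10_000_000.
print(f"{time_random_draws(1_000_000):.2f} seconds for 1,000,000 draws")
```

Since a simulated game needs on the order of a hundred draws, dividing the draws per second by that figure gives a rough upper bound on games per second in pure Python.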
                                                                Comment
                                                                • billdo75
                                                                  SBR Sharp
                                                                  • 05-11-09
                                                                  • 418

                                                                  #33
                                                                  Just to throw something out there...

                                                                  Have you ever tried out the Strategic Baseball Simulator? It's completely free and the theory behind it is as mentioned above. The team files are flat text .DAT files. I played around with it earlier this year using weighted stats (50% current season, 33% career, 17% last 10 games) prorated out to 162 games or 200 innings. You might want to look into it.
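One possible reading of that weighting, sketched in Python: convert each sample to a per-game rate, blend the rates 50/33/17, then scale to a 162-game season. The post does not say whether the blend was done on rates or on prorated totals, so this is an assumption, and the names and the example numbers are invented for illustration.

```python
# Weights from the post: 50% current season, 33% career, 17% last 10 games.
WEIGHTS = {"season": 0.50, "career": 0.33, "last10": 0.17}

def blended_rate(samples):
    """Blend per-game rates; samples maps name -> (stat_total, games_played)."""
    return sum(
        WEIGHTS[name] * total / games
        for name, (total, games) in samples.items()
    )

def prorate(rate_per_game, games=162):
    """Scale a per-game rate to a full-season total."""
    return rate_per_game * games

# Made-up hit totals for one batter: (hits, games) in each sample.
example_hits = {"season": (60, 50), "career": (1000, 1000), "last10": (15, 10)}
print(f"{prorate(blended_rate(example_hits)):.2f} hits per 162 games")
```

For pitchers the same blend would be done per inning and scaled to 200 innings.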
                                                                  Comment
                                                                  • rdgarza
                                                                    SBR Rookie
                                                                    • 09-30-09
                                                                    • 9

                                                                    #34
                                                                    Originally posted by wintermute
                                                                    I don't understand what you mean by better random numbers. The reason for running more simulations is to get better estimates of calculated probabilities. Take the example of flipping a coin. The more times you flip the coin, the closer you are likely to come to the true probability of 50 percent heads.
                                                                    By better random numbers I mean truly random numbers, not pseudo-random numbers. I read something about that at the following link, but I don't know if this factor is important for the results of the simulation.

                                                                    Comment
                                                                    • wintermute
                                                                      SBR Rookie
                                                                      • 05-05-09
                                                                      • 20

                                                                      #35
                                                                      I use pseudo-random numbers. I doubt that truly random numbers would make a difference to the simulation results.
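A quick way to check this for yourself is to estimate a known probability with two different sources and compare. The sketch below uses Python's default Mersenne Twister generator (`random.Random`) and `random.SystemRandom`, which draws from the operating system's entropy source; the latter is not guaranteed to be "truly" random, so treat it as a stand-in. The function name `estimate` is made up.

```python
import random

def estimate(rng, p=0.60, n=100_000):
    """Estimate a known probability p from n draws of the given generator."""
    return sum(rng.random() < p for _ in range(n)) / n

pseudo = estimate(random.Random(42))      # seeded, reproducible
system = estimate(random.SystemRandom())  # OS entropy, not reproducible
print(f"pseudo-random: {pseudo:.4f}   OS entropy: {system:.4f}")
```

Both estimates should land within ordinary sampling error of 0.60. A seeded pseudo-random generator has the added benefit that a run can be repeated exactly, which helps when debugging a simulator.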
                                                                      Comment