MLB Baseball Model

  • OMGRandyJackson
    SBR MVP
    • 02-07-10
    • 1680

    #1
    MLB Baseball Model
    I am interested in possibly working on a baseball model. It will most likely be ineffective; however, I want to do it for the experience and just to see what I can come up with.

    Obviously I have to collect all the data I can and go from there, but my main problem is that I do not know how to start the actual math part.

    I am wondering, are there any models out there that I can view to see how they started? Or does anyone here have some tips? Thanks!
  • MrX
    SBR MVP
    • 01-10-06
    • 1540

    #2
    Do you want to make a simulation model, something regression based, or other?
    • runt23
      SBR High Roller
      • 02-09-10
      • 134

      #3
      Yeah, I'm looking to make an MLB model too for this season. I keep pushing it back because I don't want to start crunching numbers, and I don't know where I'm going to start. Plus, if I do actually get something created soon, I would want to try it right away and not wait for the season to start!
      • OMGRandyJackson
        SBR MVP
        • 02-07-10
        • 1680

        #4
        Originally posted by MrX
        Do you want to make a simulation model, something regression based, or other?
        Not sure. I do not know what a regression-based system would be.
        • durito
          SBR Posting Legend
          • 07-03-06
          • 13173

          #5
          Then my suggestion would be to take a stats class first.

          And then get some data.
          • MrX
            SBR MVP
            • 01-10-06
            • 1540

            #6
            Originally posted by OMGRandyJackson
            Not sure. I do not know what a regression-based system would be.
            A regression based model would use regression analysis to find the relationship between any number of variables (usually past game stats, but anything that could affect game outcomes might be used) and the win%, total score distributions, or whatever else you might want to predict.

            Here's the wikipedia article: http://en.wikipedia.org/wiki/Regression_analysis

            The good news is that regression analysis is easy to perform (you can even use Excel) and, used properly, it can be a powerful tool.

            The bad news is that it's really easy to use improperly and it's easy to draw false conclusions if you don't know what you're doing. Also, it's pretty widely used. So, while performing regression analysis using typical stats on a major market (MLB) will be a good learning experience, it's unlikely to produce fruitful results for a beginner.
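As a rough illustration of the regression idea described above, here is a minimal sketch in Python that fits a straight line relating one stat to win%. The data points are made up for illustration, and `fit_line` and `predict_win_pct` are hypothetical helper names, not part of any library. A real model would use many more games and more than one variable.

```python
# Hypothetical team-season data: (run differential per game, win%).
# These numbers are invented for illustration only.
data = [(-1.0, 0.40), (-0.5, 0.45), (0.0, 0.50), (0.5, 0.55), (1.0, 0.60)]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(data)


def predict_win_pct(run_diff_per_game):
    """Predicted win% for a given run differential per game."""
    return intercept + slope * run_diff_per_game


print(round(intercept, 3), round(slope, 3))
print(round(predict_win_pct(0.3), 3))
```

Excel's built-in regression tools perform the same kind of fit. The easy part is running it; the hard part, as noted above, is not reading too much into the result.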
            • DuncHen22
              SBR MVP
              • 11-20-09
              • 1079

              #7
              Originally posted by runt23
              Yeah, I'm looking to make an MLB model too for this season. I keep pushing it back because I don't want to start crunching numbers, and I don't know where I'm going to start. Plus, if I do actually get something created soon, I would want to try it right away and not wait for the season to start!
              How would you try it if the season hasn't started yet?
              • whatsgood5
                Restricted User
                • 10-13-09
                • 15359

                #8
                Good luck with this, I'd love to see your results once you finish!
                • Joe Dogs
                  SBR MVP
                  • 07-20-09
                  • 1931

                  #9
                  Originally posted by DuncHen22
                  How would you try it if the season hasn't started yet?
                  He will probably backtest the model on past seasons' data to look for any significant results.
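To give a feel for what a backtest might look like, here is a minimal sketch in Python. It assumes you already have, for each past game, a model win probability, the American moneyline you could have bet, and the result. The `games` list is invented, and `american_to_decimal`, `backtest`, and the `min_edge` threshold are hypothetical names and values chosen for the example.

```python
def american_to_decimal(odds):
    """Convert American moneyline odds to decimal odds (stake included)."""
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / -odds


def backtest(games, min_edge=0.02):
    """Flat-bet 1 unit when the model beats the implied probability by min_edge."""
    bets = 0
    profit = 0.0
    for model_prob, odds, won in games:
        decimal = american_to_decimal(odds)
        implied = 1 / decimal  # includes the bookmaker's margin
        if model_prob - implied >= min_edge:
            bets += 1
            profit += (decimal - 1) if won else -1.0
    roi = profit / bets if bets else 0.0
    return bets, profit, roi


# (model win probability, moneyline, did the bet win?) -- made-up rows
games = [
    (0.60, -120, True),
    (0.50, -110, False),
    (0.45, 150, False),
    (0.55, 100, True),
]

bets, profit, roi = backtest(games)
print(bets, round(profit, 4), round(roi, 4))
```

With only a handful of games, any profit figure is noise. A backtest needs a large sample, ideally on seasons that were not used to build the model, before the results mean anything.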
                  • OMGRandyJackson
                    SBR MVP
                    • 02-07-10
                    • 1680

                    #10
                    Originally posted by MrX
                    So, while performing regression analysis using typical stats on a major market (MLB) will be a good learning experience, it's unlikely to produce fruitful results for a beginner.
                    Which is perfectly fine. I am doing this for some experience.

                    So are there any sample models out there? Even if they are garbage, lol?
                    • Flying Dutchman
                      SBR MVP
                      • 05-17-09
                      • 2467

                      #11
                      Originally posted by OMGRandyJackson
                      Which is perfectly fine. I am doing this for some experience.

                      So are there any sample models out there? Even if they are garbage, lol?
                      Check with Justin7, he's writing a "Sports Modeling for Dummies" book it seems.

                      • DuncHen22
                        SBR MVP
                        • 11-20-09
                        • 1079

                        #12
                        Originally posted by Joe Dogs
                        He will probably backtest the model on past seasons' data to look for any significant results.
                        Okay, that's what I thought... but I just figured that was a given with any new system.
                        • runt23
                          SBR High Roller
                          • 02-09-10
                          • 134

                          #13
                          Originally posted by DuncHen22
                          Okay, that's what I thought... but I just figured that was a given with any new system.
                          Yeah, that's what I'd do. It's just that backtesting is so tedious.
                          • MonkeyF0cker
                            SBR Posting Legend
                            • 06-12-07
                            • 12144

                            #14
                            I honestly can't imagine trying to model games without knowing a programming language. It would take eons.
                            • Flying Dutchman
                              SBR MVP
                              • 05-17-09
                              • 2467

                              #15
                              Monkey, I know folks who use Excel. Matter of fact, the regression boys can do a lot...

                              One guy's "code" I've seen was very good.

                              • OMGRandyJackson
                                SBR MVP
                                • 02-07-10
                                • 1680

                                #16
                                Originally posted by Flying Dutchman
                                Check with Justin7, he's writing a "Sports Modeling for Dummies" book it seems.
                                Yeah, I'm pretty sure he is; it's due out around July. I was just hoping to get some sort of start before then, but I may just have to wait, lol.
                                • MonkeyF0cker
                                  SBR Posting Legend
                                  • 06-12-07
                                  • 12144

                                  #17
                                  Originally posted by Flying Dutchman
                                  Monkey, I know folks who use Excel. Matter of fact, the regression boys can do a lot...

                                  One guy's "code" I've seen was very good.
                                  How are they scraping game data and line histories? Manual copy/paste? It would just consume so much time...
                                  • MonkeyF0cker
                                    SBR Posting Legend
                                    • 06-12-07
                                    • 12144

                                    #18
                                    Those things could be done 1000x faster programmatically.
                                    • Desert Tortoise
                                      SBR Rookie
                                      • 11-15-09
                                      • 14

                                      #19
                                      Doesn't anybody on this site use SAS? It's by far the best statistical software out there.

                                      BTW, if you are doing a logistic regression, remember to do a weighted logistic regression. Theoretically, all logistic regression should be weighted to account for unequal variance on the edges of the binomial distribution. That matters even more in sports, where you're going to have way more observations near a 50-50 matchup than near the edges, so you need to adjust for that.
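For anyone wanting to see where weights enter a logistic regression, here is a minimal sketch in plain Python. One thing worth keeping in mind: the standard maximum-likelihood fit (Newton's method, also known as iteratively reweighted least squares) already multiplies each observation by p(1 - p) internally, which is the binomial variance term. In this sketch the extra `weights` argument is used only as counts for grouped observations. The function name `fit_logistic` and the data are made up for illustration.

```python
import math


def fit_logistic(xs, ys, weights=None, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by Newton's method (IRLS)."""
    if weights is None:
        weights = [1.0] * len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            v = w * p * (1 - p)  # IRLS weight: binomial variance times the count
            g0 += w * (y - p)        # gradient of the log-likelihood
            g1 += w * (y - p) * x
            h00 += v                 # entries of X'WX (negative Hessian)
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Grouped, invented data: x is a rating difference, y the outcome,
# and counts says how many games fell in each (x, y) cell.
xs = [-1, -1, 0, 0, 1, 1]
ys = [1, 0, 1, 0, 1, 0]
counts = [3, 7, 5, 5, 7, 3]

b0, b1 = fit_logistic(xs, ys, weights=counts)
print(round(b0, 4), round(b1, 4))
```

Statistical packages such as R and SAS accept weights in a similar way, so it is worth checking what a given package does by default before adding weights on top.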
                                      • MonkeyF0cker
                                        SBR Posting Legend
                                        • 06-12-07
                                        • 12144

                                        #20
                                        I use R.
                                        • roasthawg
                                          SBR MVP
                                          • 11-09-07
                                          • 2990

                                          #21
                                          Originally posted by Desert Tortoise
                                          Doesn't anybody on this site use SAS? It's by far the best statistical software out there.

                                          BTW, if you are doing a logistic regression, remember to do a weighted logistic regression. Theoretically, all logistic regression should be weighted to account for unequal variance on the edges of the binomial distribution. That matters even more in sports, where you're going to have way more observations near a 50-50 matchup than near the edges, so you need to adjust for that.
                                          What's the price on SAS?
                                          • Flying Dutchman
                                            SBR MVP
                                            • 05-17-09
                                            • 2467

                                            #22
                                            Originally posted by MonkeyF0cker
                                            How are they scraping game data and line histories? Manual copy/paste? It would just consume so much time...
                                            If the only thing you can use is a hammer, everything looks like a nail...

                                            This was a pretty slick hammer, tho.

                                            • Flying Dutchman
                                              SBR MVP
                                              • 05-17-09
                                              • 2467

                                              #23
                                              Originally posted by roasthawg
                                              What's the price on SAS?
                                              R is free, so why spend crazy bucks on SAS?

                                              There's lots of other free or cheap stat software around.

                                              Auto-scraping can be a problem; I'm told Covers has changed 6 times in just the last 3 weeks.

                                              • roasthawg
                                                SBR MVP
                                                • 11-09-07
                                                • 2990

                                                #24
                                                Originally posted by Flying Dutchman
                                                R is free, so why spend crazy bucks on SAS?

                                                There's lots of other free or cheap stat software around.

                                                Auto-scraping can be a problem; I'm told Covers has changed 6 times in just the last 3 weeks.
                                                Because R isn't really set up for what I need when it comes to automatic stepwise regressions and all that.
                                                • MonkeyF0cker
                                                  SBR Posting Legend
                                                  • 06-12-07
                                                  • 12144

                                                  #25
                                                  Originally posted by roasthawg
                                                  Because R isn't really set up for what I need when it comes to automatic stepwise regressions and all that.
                                                  How so?
                                                  • IrishTim
                                                    SBR Wise Guy
                                                    • 07-23-09
                                                    • 983

                                                    #26
                                                    Originally posted by MonkeyF0cker
                                                    How so?
                                                    It's too hard to learn if you don't have some kind of background in computer science/engineering. Not saying it's impossible, but most people don't have the time and/or commitment needed to learn R if they aren't already familiar with it. I understand how it can be the best for this type of thing, but most of us would spend more time troubleshooting than actually constructing the model.
                                                    • luigi
                                                      SBR Rookie
                                                      • 08-29-09
                                                      • 32

                                                      #27
                                                      Originally posted by OMGRandyJackson
                                                      I am interested in possibly working on a baseball model. It will most likely be ineffective; however, I want to do it for the experience and just to see what I can come up with.

                                                      Obviously I have to collect all the data I can and go from there, but my main problem is that I do not know how to start the actual math part.

                                                      I am wondering, are there any models out there that I can view to see how they started? Or does anyone here have some tips? Thanks!

                                                      If I can manage to find the time, I'd also like to pursue this during the coming season. I would suggest you learn about the various baseball forecasting systems that are out there, like PECOTA. If you're interested in the stats, don't buy the book; I'd suggest you get a fantasy player membership on their site, as you can get all the projections in Excel format.

                                                      If you're not at all familiar with regression to the mean, then you are going to have problems if you attempt to use a relatively small sample of at-bats or IP when evaluating players, which is why I like to look at projections.
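As a rough illustration of regression to the mean, here is a minimal sketch in Python that pulls a small-sample batting average toward the league average. The league average of .260 and the `prior_at_bats` value of 500 are assumptions picked purely for the example, and `regress_to_mean` is a hypothetical helper name. Real projection systems use far more careful methods.

```python
def regress_to_mean(hits, at_bats, league_avg=0.260, prior_at_bats=500):
    """Blend observed results with league-average at-bats.

    prior_at_bats acts as the number of league-average at-bats added to the
    player's record; the larger it is, the harder small samples are pulled
    toward the league average. Both defaults are illustrative assumptions.
    """
    return (hits + league_avg * prior_at_bats) / (at_bats + prior_at_bats)


# A hot start: 20 hits in 50 at-bats is a raw .400 average.
raw_avg = 20 / 50
estimate = regress_to_mean(20, 50)
print(round(raw_avg, 3), round(estimate, 3))

# The same .400 pace over 500 at-bats is regressed much less.
larger_sample = regress_to_mean(200, 500)
print(round(larger_sample, 3))
```

The point is that 50 at-bats say very little on their own, so the estimate stays close to the league average until the sample grows.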

                                                      Michael Murray's book "Betting Baseball" should get you started on the right track, but you're going to have to dig deeper than that. You'll have to be able to make quantitative adjustments for games based on platoon splits, pitcher type (fly ball/ground ball, power/finesse), etc.

                                                      Good luck with this, and keep us informed.