data/programming/updating model question

  • Rufus
    SBR High Roller
    • 03-28-08
    • 107

    #1
    Hi all,

    I have been working on modeling MLB for most of a year and have a (primarily econometric) model that I can confidently say should be a long-term winner. Problem is, I'm not much of a programmer. I did take an Intro Programming course (in Java) in my last semester of school (last spring) and I can program pretty well in Stata (the statistical program I use), but the real problem is updating the model every day with new data. I've written code to generate a prediction (with the two teams and pitchers as inputs), but I'll need to add new game data every day.

    My question is this: Do any of you/have any of you faced this issue? What is your solution? Do I pretty much need to learn to write a webcrawler using Perl to scrape data off the web?

    Any help/advice would be greatly appreciated.
  • Sinister Cat
    SBR MVP
    • 06-03-08
    • 1090

    #2
    Yes, scrape the data. I use Tcl to do it-- a lot easier than Perl. Python or Ruby would be other choices. Perl is popular for this kind of thing too but probably more difficult for a novice programmer.
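For anyone wondering what "scrape the data" looks like in practice, here is a minimal sketch in Python (one of the languages suggested above) that pulls the cells out of an HTML table using only the standard library. The sample page, its column names, and the `parse_table` helper are made up for illustration; a real stats site will have its own layout and would need to be fetched first.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table cells in `html` as a list of rows of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page. A real scraper would download the HTML first,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Team</th><th>Runs</th></tr>
  <tr><td>2008-06-01</td><td>SDP</td><td>4</td></tr>
  <tr><td>2008-06-02</td><td>SDP</td><td>7</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The rows come back as plain lists of strings, so they can be written to a CSV file and read into Stata or loaded into a database from there.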
    • Ganchrow
      SBR Hall of Famer
      • 08-28-05
      • 5011

      #3
      I'm personally partial to Perl, in which I probably do close to 90% of my programming. If you have experience with Java you should have absolutely no problem with Perl.

      You might also want to look into hiring programming help off of rentacoder.com or a similar site.
      • Rufus
        SBR High Roller
        • 03-28-08
        • 107

        #4
        Any good book you would recommend to learn Perl?
        • Ganchrow
          SBR Hall of Famer
          • 08-28-05
          • 5011

          #5
          Originally posted by modelman
          Any good book you would recommend to learn Perl?
          The O'Reilly Learning Perl and Programming Perl books are very user-friendly.
          • durito
            SBR Posting Legend
            • 07-03-06
            • 13173

            #6
            I have a programmer I hired through rentacoder.com

            He's pretty cheap, but the work isn't quite what I want. If I can ever get my brain to work again, I'm going to try to learn again myself.
            • durito
              SBR Posting Legend
              • 07-03-06
              • 13173

              #7
              Originally posted by Ganchrow
              The O'Reilly Learning Perl and Programming Perl books are very user-friendly.
              Ordered. Finding out that amazon can deliver to Colombia has not been good for my spending habits.
              • Data
                SBR MVP
                • 11-27-07
                • 2236

                #8
                Originally posted by durito
                Ordered. Finding out that amazon can deliver to Colombia has not been good for my spending habits.
                This is a cheaper way:


                You read the books online (or save them on your computer as PDFs). A great time saving benefit, you get all the sample code in downloadable files.
                • MrX
                  SBR MVP
                  • 01-10-06
                  • 1540

                  #9
                  If you have the time, I'd definitely recommend learning enough to write your own scrapers.

                  Occasionally the site you're scraping from will make a slight change to their format, or some aspect of a report will be different enough from the norm to throw off your scraper and it's sure nice to be able to make changes on the fly instead of waiting for your programmer.

                  As a side note, I scrape most of my data from MLB.com and they have remained blessedly consistent for a couple of years.
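One way to notice a format change early is to check the scraped table against the layout you expect before loading anything. The sketch below is only an illustration in Python: the header names and the `validate_rows` and `FormatChanged` names are assumptions, not anything MLB.com or another site actually publishes.

```python
EXPECTED_HEADER = ["Date", "Team", "Runs"]  # hypothetical column layout


class FormatChanged(Exception):
    """Raised when a scraped table no longer matches the expected layout."""


def validate_rows(rows, expected_header=EXPECTED_HEADER):
    """Return the data rows, or raise FormatChanged if the layout differs."""
    if not rows or rows[0] != expected_header:
        found = rows[0] if rows else None
        raise FormatChanged(f"expected header {expected_header}, found {found}")
    data = rows[1:]
    for i, row in enumerate(data, start=1):
        # A row with extra or missing cells usually means the page changed.
        if len(row) != len(expected_header):
            raise FormatChanged(f"row {i} has {len(row)} cells: {row}")
    return data


good = validate_rows([["Date", "Team", "Runs"], ["2008-06-01", "SDP", "4"]])

try:
    validate_rows([["Date", "Club", "Runs"], ["2008-06-01", "SDP", "4"]])
    changed = False
except FormatChanged as err:
    changed = True
    print("scraper needs attention:", err)
```

Failing loudly like this keeps a silently shifted column from feeding bad numbers into the model.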
                  • Rufus
                    SBR High Roller
                    • 03-28-08
                    • 107

                    #10
                    Thanks everyone! I really appreciate the help.
                    • Justin7
                      SBR Hall of Famer
                      • 07-31-06
                      • 8577

                      #11
                      I paid a programmer to write a scraper in Perl. It would automatically download stats from USAToday every day.
                      • Rufus
                        SBR High Roller
                        • 03-28-08
                        • 107

                        #12
                        Originally posted by Justin7
                        I paid a programmer to write a scraper in Perl. It would automatically download stats from USAToday every day.
                        How much would that sort of thing cost?
                        • rsigley
                          SBR Sharp
                          • 02-23-08
                          • 304

                          #13
                          i use php, works pretty well - never had a problem

                          and i use windows scheduler to run it once a day at 7am and input it into mysql db

                          also for mlb you can just use dougstats, he updates once a day though i notice he's missing a couple players (like e. gonzalez from the padres)
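The daily "run it and load it into a db" step described above can be sketched roughly as follows. This is not the PHP/MySQL script from the post: it is a Python stand-in that uses the standard-library `sqlite3` module, and the `games` table, its columns, and the `(game_date, team)` key are assumptions chosen for illustration (a doubleheader would need a richer key). The point is that re-running the job on the same day should not create duplicate rows.

```python
import sqlite3


def store_games(conn, games):
    """Insert (game_date, team, runs) rows, skipping ones already stored.

    Returns the number of rows actually added.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS games (
               game_date TEXT NOT NULL,
               team      TEXT NOT NULL,
               runs      INTEGER NOT NULL,
               PRIMARY KEY (game_date, team)
           )"""
    )
    before = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
    # INSERT OR IGNORE drops rows that would violate the primary key.
    conn.executemany(
        "INSERT OR IGNORE INTO games (game_date, team, runs) VALUES (?, ?, ?)",
        games,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")  # use a file path to keep the data
added_first = store_games(conn, [("2008-06-01", "SDP", 4), ("2008-06-02", "SDP", 7)])
added_again = store_games(conn, [("2008-06-02", "SDP", 7), ("2008-06-03", "SDP", 2)])
print(added_first, added_again)
```

A script like this can then be pointed at by Windows Task Scheduler (or cron) to run once a day, as the post describes.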
                          • Rufus
                            SBR High Roller
                            • 03-28-08
                            • 107

                            #14
                            I just looked at dougstats. It seems pretty good, except I need the game-by-game stats since I don't use uniform weights. I normally get it from baseball-reference (I have a subscription so I can use the Play Index) but it's a pain in the ass to copy and paste it all.
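To show why season totals are not enough when games are weighted unequally, here is a small Python sketch of one possible non-uniform scheme: exponentially decaying weights, where recent games count more. The post does not say which weights are actually used, so the decay scheme, the `decay` parameter, and the function name are all assumptions.

```python
def weighted_average(values, decay=0.9):
    """Weighted mean of per-game values ordered oldest to newest.

    The newest game gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    if not values:
        raise ValueError("need at least one game")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


runs_by_game = [2, 4]  # made-up per-game values, oldest first
recent_heavy = weighted_average(runs_by_game, decay=0.5)
uniform = weighted_average(runs_by_game, decay=1.0)
print(recent_heavy, uniform)
```

With `decay=1.0` this collapses to the plain average, which is all a season-total source can give you; any other weighting needs the individual game lines.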
                            • Rufus
                              SBR High Roller
                              • 03-28-08
                              • 107

                              #15
                              What database/statistical software do other people use? Being an econ major in college, I learned Stata, which works well for me once I get data into it. I can do all the regressions, statistical analysis, and data management. Anybody else have other database preferences?
                              • MrX
                                SBR MVP
                                • 01-10-06
                                • 1540

                                #16
                                Originally posted by modelman
                                What database/statistical software do other people use? Being an econ major in college, I learned Stata, which works well for me once I get data into it. I can do all the regressions, statistical analysis, and data management. Anybody else have other database preferences?
                                I find that the statistical functions in Excel 2007 meet most of my needs. I've dabbled in a couple other programs for regression analysis, but not lately.

                                Mysql for database needs.
                                • Ganchrow
                                  SBR Hall of Famer
                                  • 08-28-05
                                  • 5011

                                  #17
                                  Originally posted by modelman
                                  What database/statistical software do other people use? Being an econ major in college, I learned Stata, which works well for me once I get data into it. I can do all the regressions, statistical analysis, and data management. Anybody else have other database preferences?
                                  I use custom written software and AMPL for quantitative modeling along with a mySQL database.
                                  • rsigley
                                    SBR Sharp
                                    • 02-23-08
                                    • 304

                                    #18
                                    r because that was what i learned how to use in school

                                    no other reason really
                                    • Justin7
                                      SBR Hall of Famer
                                      • 07-31-06
                                      • 8577

                                      #19
                                      Originally posted by Justin7
                                      I paid a programmer to write a scraper in Perl. It would automatically download stats from USAToday every day.
                                      It was years ago. I think about $3k.
                                      • VBOMBER
                                        SBR High Roller
                                        • 01-02-08
                                        • 228

                                        #20
                                        What sites have you guys found consistent/reliable for scraping data for NBA and College Hoops?
                                        • rsigley
                                          SBR Sharp
                                          • 02-23-08
                                          • 304

                                          #21
                                          if anyone wants wnba boxscore data i wrote this little script that will download all the team stats info and put into a db every morning if there was a game the night before

                                          it uses the pinnacle team abbreviations for the name too, so if you want to link to your lines db it works



                                          you can also just manipulate it to get data from other sports too, i have one for each sport but this is the only one i use ESPN for
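The script itself is not included in the thread, but the idea of storing box scores under the same team abbreviations as a lines database can be sketched like this in Python. Everything here is hypothetical: the team names, the abbreviations (which are made up and are not Pinnacle's), the record fields, and the `join_with_lines` helper.

```python
# Hypothetical mapping from a stats site's team names to the
# abbreviations used in a lines database (made up, not Pinnacle's).
TEAM_ABBREVIATIONS = {
    "Team Alpha": "ALP",
    "Team Beta": "BET",
}


def join_with_lines(box_scores, lines):
    """Attach a betting line to each box score, matched on (date, abbreviation)."""
    line_by_key = {(rec["date"], rec["team"]): rec["line"] for rec in lines}
    joined = []
    for score in box_scores:
        # A KeyError here flags a team name that is missing from the mapping.
        abbr = TEAM_ABBREVIATIONS[score["team"]]
        key = (score["date"], abbr)
        joined.append({**score, "team": abbr, "line": line_by_key.get(key)})
    return joined


box_scores = [
    {"date": "2008-06-01", "team": "Team Alpha", "points": 78},
    {"date": "2008-06-01", "team": "Team Beta", "points": 71},
]
lines = [{"date": "2008-06-01", "team": "ALP", "line": -3.5}]

joined = join_with_lines(box_scores, lines)
for rec in joined:
    print(rec)
```

Converting names once at load time means the stats and lines tables share a key, so later joins need no per-query cleanup.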