An introduction to research

  • mikmak
    SBR Rookie
    • 05-03-13
    • 29

    #246
    I would be interested in any literature that will help me get better and specifically with data scraping and model building. I'm guessing there are many others that would as well.
    • wizcodlifa
      SBR Wise Guy
      • 01-10-12
      • 921

      #247
      Hello all

      I want to learn to program and eventually master it. I'll admit up front that I have no idea, not even the littlest bit, about how to program. Is there any info, or are there classes, that introduce this?

      All help is greatly appreciated.
      • matthew919
        SBR Sharp
        • 11-21-12
        • 421

        #248
        Not trying to be a jerk, but did you even read this thread? Start at page 1 and make an attempt. The links that were posted (by Inspirited, I believe) in a previous thread you started will also be very helpful. Ask specific questions when you run into problems.

        Or, if you're looking for classroom instruction, take (or audit) a course at a community college. Good luck.
        • easyliving
          SBR Hall of Famer
          • 06-25-12
          • 8876

          #249
          Thanks for the info. I will read this in my spare time.
          • ExodusNZ
            SBR Wise Guy
            • 09-02-11
            • 605

            #250
            Hi

            I'm after Rugby Union or Rugby League data. Does anyone have a source?
            • Cyyyyk
              SBR Rookie
              • 08-30-13
              • 11

              #251
              Interesting thread. I have been very successful building systems in Excel, but getting the raw data has always been the primary sticking point. I am going to have to look into some of these scraping methods. Thanks to everyone who posted!
              • Cyyyyk
                SBR Rookie
                • 08-30-13
                • 11

                #252
                Originally posted by bobbydrake
                Simple Perl Web Parser Script
                I don't mean any disrespect toward the original poster's preferred programming language. This is just another method. We all need options in life.
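                ---------------------------------------------------------------
                #!/usr/bin/perl
                use strict;
                use warnings;

                use LWP::Simple;          # get(): download a page over HTTP
                use HTML::TreeBuilder;    # parse the HTML into a tree
                use HTML::FormatText;     # render the tree as plain text

                # CGI header: only needed when the script runs from a web server's
                # cgi-bin. Delete this line when running from the command line.
                print "Content-type: text/html\n\n";

                # Download the raw HTML; get() returns undef if the request fails.
                my $html = get("http://www.websiteyouwanttoparse.com");
                die "Could not fetch the page\n" unless defined $html;

                my $TreeBuilder = HTML::TreeBuilder->new;
                $TreeBuilder->parse($html);
                $TreeBuilder->eof;        # tell the parser the document is complete

                my $Format = HTML::FormatText->new;
                my $Parsed = $Format->format($TreeBuilder);
                $TreeBuilder->delete;     # free the parse tree

                print $Parsed;

                # Save the text. ">" overwrites file.txt if it already exists.
                open(my $fh, ">", "file.txt") or die "Cannot open file.txt: $!\n";
                print $fh $Parsed;
                close($fh);

                exit;
                --------------------------------------------------------------
                Things you need to edit:
                my $html = get("http://www.websiteyouwanttoparse.com");
                Change the address to the website you want to parse. Keep the quotation marks around it.

                and also edit
                open(my $fh, ">", "file.txt")
                Change file.txt to a file name of your choice. Change it every time you parse a new website: ">" opens the file for overwriting, so reusing a name erases whatever you saved there before.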
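For readers following along in Python (the thread's main language) rather than Perl, here is a rough standard-library sketch of the same idea: download a page, drop the tags, and save the visible text. It is not bobbydrake's script and not a drop-in replacement for HTML::FormatText, which also lays the text out; the names TextExtractor, html_to_text and save_page_text are made up for this illustration, and it assumes Python 3.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects the text between tags, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()   # convert_charrefs=True: &amp; etc. arrive decoded
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_page_text(url, filename):
    """Download url and write its text to filename, overwriting it (like '>' in Perl)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    with open(filename, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))
```

Usage, with the same placeholder address and file name as the Perl version: save_page_text("http://www.websiteyouwanttoparse.com", "file.txt"). As with ">" in Perl, mode "w" overwrites an existing file.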

                This is my opinion. I take no responsibility for your actions. This is only for educational purposes.
                Forgive my ignorance, but how would I actually use this script? Do I upload it to a server's cgi-bin and then go to the page? In other words, how do I get this script to run?
                • dolyms
                  SBR Rookie
                  • 02-07-12
                  • 23

                  #253
                  Thank you for doing this.
                  • bozeman
                    SBR MVP
                    • 11-11-09
                    • 2162

                    #254
                    Don't waste your time. Research something else, but not stats.
                    • V4Value
                      SBR Sharp
                      • 09-01-13
                      • 368

                      #255
                      Just came across this. Can anyone point out other valuable threads like this one?
                      • HUY
                        SBR Sharp
                        • 04-29-09
                        • 253

                        #256
                        Originally posted by mikmak
                        I would be interested in any literature that will help me get better and specifically with data scraping and model building. I'm guessing there are many others that would as well.
                        Since you're asking, here's a plug for my site, www.statsfair.com. You might pick up some ideas there.
                        • elgreco
                          SBR Wise Guy
                          • 12-16-09
                          • 988

                          #257
                          Wrote my first Python script thanks to this thread.
                          • bobbydrake
                            SBR Rookie
                            • 02-16-09
                            • 38

                            #258
                            Originally posted by Cyyyyk
                            Forgive my ignorance, but how would I actually use this script? Do I upload it to a server's cgi-bin and then go to the page? In other words, how do I get this script to run?
                            First, find out whether you have Perl 5.0 or later installed (perl -v prints the version).
                            Install the Perl modules LWP::Simple, HTML::TreeBuilder and HTML::FormatText from CPAN (www.cpan.org).
                            Copy and paste the script into a text editor, save it, and make the file executable (chmod +x file.pl).
                            On the command line, run: ~$ perl file.pl

                            Forgive me for the late response. I don't come here that often.

                            My original post, it seems, was a digression from the OP's intention for this thread. I apologize.
                            I'll defer to the OP's programming language (Python) in this thread, since he can teach it better than I can. Python is a great programming language. It's MIT's language of choice.
                            • drawster
                              SBR Rookie
                              • 11-03-13
                              • 23

                              #259
                              Great Post!
                              • kaalhode
                                SBR Rookie
                                • 01-11-14
                                • 15

                                #260
                                Really informative and great post. I've got a tiny bit of programming experience but have never really used it for anything useful.
                                • jtluongo
                                  SBR High Roller
                                  • 01-06-14
                                  • 104

                                  #261
                                  Before learning programming, I would like to learn more about predictive analytics and modeling. Are there any good books out there I can read, or maybe a YouTube series?
                                  • Inspirited
                                    SBR MVP
                                    • 06-26-10
                                    • 1788

                                    #262
                                    edx.com has a sabermetrics course coming up.
                                    Mathletics is a good book to start with.
                                    • metaldeth
                                      SBR Rookie
                                      • 02-28-14
                                      • 43

                                      #263
                                      I have basic programming knowledge in Java and Excel. I've read the first 4 chapters of the Python book: nothing new, just a new language. But I find the transition from those 4 chapters to scraping webpages extremely difficult. I've never worked with websites before.

                                      Are there any examples, books, exercises out there that can help me with this transition?


                                      Thanks!
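One way to bridge that gap: most of the stats people scrape sit in HTML tables, and Python's built-in html.parser module is enough to pull a table apart row by row. The sketch below is a hypothetical example (TableRowParser and table_rows are invented names), it assumes Python 3, and it ignores nested tables, colspan and rowspan.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects every <tr> as a list of the text in its <td>/<th> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text pieces of the cell being read, or None

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()         # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the rows of all tables in html as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feed it the page source you downloaded (for example urllib.request.urlopen(url).read().decode("utf-8")), and each inner list is one row of cell text, ready to write out as CSV.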
                                      • Maniac
                                        SBR Wise Guy
                                        • 04-12-11
                                        • 667

                                        #264
                                        Originally posted by Inspirited
                                        edx.com has a sabermetrics course coming up.
                                        Mathletics is a good book to start with.
                                        That Sabermetrics course starts today for anyone still interested.
                                        • horja1
                                          SBR Hall of Famer
                                          • 01-13-11
                                          • 5646

                                          #265
                                          • dodo_molnar
                                            SBR Rookie
                                            • 06-02-14
                                            • 3

                                            #266
                                            Hi,
                                            Maybe online software (betviz) can help you.
                                            • lamichaeljames
                                              SBR Rookie
                                              • 06-02-14
                                              • 40

                                              #267
                                              Great stuff here. Hopefully, this continues.
                                              • donkeyshark
                                                SBR Rookie
                                                • 08-09-14
                                                • 6

                                                #268
                                                Originally posted by jtluongo
                                                Before learning programming, I would like to learn more about predictive analytics and modeling. Are there any good books out there I can read, or maybe a YouTube series?
                                                Coursera has a machine learning course offered by Stanford. It's a good place to start.
                                                • poetbil
                                                  SBR Rookie
                                                  • 07-11-13
                                                  • 4

                                                  #269
                                                  People like this, who share knowledge, are blessed.
                                                  • chipper
                                                    SBR MVP
                                                    • 01-07-10
                                                    • 1994

                                                    #270
                                                    Great info... I need to read & reread it a few times for sure.
                                                    • markorozo
                                                      SBR Rookie
                                                      • 01-24-15
                                                      • 13

                                                      #271
                                                      Originally posted by ljump12
                                                      Section D) How to scrape the internet for data

                                                      One of the most important aspects of research is the data you have. Without data, there can't be any model. Fortunately, most data is free; unfortunately, most of it isn't immediately available in computer-parsable formats [like .csv or .xml]. To get the data into formats we can use, we will need to "scrape" websites for it.
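To make the "computer-parsable formats" point concrete: once scraped values are in Python, the standard csv module can write them to a .csv file that Excel or any stats package can open. A minimal sketch assuming Python 3; the column names and the sample rows are invented placeholders, not data from statfox.

```python
import csv
import io

# Hypothetical column names for scraped injury reports.
FIELDS = ["date", "team", "player", "status"]


def write_rows(rows, out):
    """Write a list of dicts to any text stream as CSV, header first."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_rows(inp):
    """Read the CSV back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(inp))


# Placeholder data standing in for whatever your scraper collected.
scraped = [
    {"date": "06/14", "team": "Team A", "player": "Player A", "status": "15-day DL"},
    {"date": "06/14", "team": "Team B", "player": "Player B", "status": "day-to-day, knee"},
]

# Writing to an in-memory buffer here; for a real file use
# open("injuries.csv", "w", newline="") so csv controls the line endings.
buffer = io.StringIO()
write_rows(scraped, buffer)
```

The csv module quotes any value containing a comma (like "day-to-day, knee"), which is exactly the case that breaks hand-rolled ",".join() output.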

                                                      A couple of "packages" have been created that will greatly improve our ability to scrape webpages. It can certainly be done in Python without them, but they will make your life a whole lot easier:

                                                      Mechanize - This will allow us to open webpages easily (http://wwwsearch.sourceforge.net/mechanize/)
                                                      Beautiful Soup - This will allow us to parse apart the webpages (http://www.crummy.com/software/BeautifulSoup/)

                                                      Installing Beautiful Soup is pretty easy: you can just put the http://www.crummy.com/software/Beaut...lSoup-3.0.0.py Beautiful Soup Python file in the same directory you are running your code from.

                                                      Installing Mechanize is a little tougher. On a *nix machine, cd to the directory where you downloaded it and extract it (tar -xzvf [filename]). Then cd into the extracted directory and install it by typing "sudo python setup.py install". It should install; you can post here if you have any problems. As far as Windows goes, you may be on your own. I can't imagine it's very tough, and there's probably a tutorial somewhere online.

                                                      Now that the installation is out of the way, it's time to get down to business. I'll give you the basics here, and you should be able to refer to the documentation for more complicated examples. I'm going to assume you have a basic familiarity with HTML; if you don't, you may want to search for a quick tutorial. Let's make our first example: getting a list of today's MLB injuries from statfox:

                                                      Python code:

                                                      from BeautifulSoup import BeautifulSoup, SoupStrainer ## This tells Python to use Beautiful Soup
                                                      from mechanize import Browser   ## This tells Python we want to use a browser (which is defined in mechanize)
                                                      import re   ## This tells Python that we will be using some regular expressions.
                                                                  ## .. Regular expressions allow us to search for a sequence of characters
                                                                  ## .. within a larger string
                                                      import datetime

                                                      ## The first step is to create our browser..
                                                      br = Browser()

                                                      ## Now let's open the injuries page on statfox. This one line will open the page and retrieve the html.
                                                      response = br.open("http://www.sbrodds.com/StoryArchivesForm.aspx?ShortNameLeague=mlb&ArticleType=injury&l=3").read()

                                                      ## Now we need to tell Beautiful Soup that we would like to search through the response.
                                                      ## .. This next line will tell Beautiful Soup to only return links to the individual injuries.
                                                      ## .. We know that all the links to the injuries have "ShortNameLeague=mlb&ArticleType=injury"
                                                      ## .. in their url, so we search for these links. Each of these links has a title that describes
                                                      ## .. the injury, which we will use in the next line.
                                                      linksToInjuries = SoupStrainer('a', href=re.compile('ShortNameLeague=mlb&ArticleType=injury'))

                                                      ## This will put the title of every link in "linksToInjuries" into a list.
                                                      ## We then call set() on the list to turn it into a "set", which by definition has no duplicates.
                                                      ## .. link.get('title') returns None for a link without a title, so those links are skipped
                                                      ## .. instead of raising a KeyError.
                                                      injuryTitles = set([link.get('title') for link in BeautifulSoup(response, parseOnlyThese=linksToInjuries)
                                                                          if link.get('title')])


                                                      ## Finally let's print all the injuries out that are for today's date.
                                                      today = datetime.date.today()
                                                      # the function strftime() (string-format time) produces nice formatting
                                                      # All codes are detailed at http://www.python.org/doc/current/lib/module-time.html
                                                      date = today.strftime("%m/%d")

                                                      ## Now let's print out the injuries that we have.
                                                      for title in injuryTitles:
                                                          ## See if the date is in the title; if it is, print it.
                                                          if date in title:
                                                              print(title)
                                                      
                                                      It might seem like a lot at first, but it's not much code. Take it slow and use Google when you don't know what a function does. Googling "python [some piece of code you don't understand]" will work magic. Ask here and I can further break down any slice of code.

                                                      Sorry I haven't had much time. If anyone can post an example of the kind of data they would like scraped, I will create one more example using both BeautifulSoup and Mechanize.
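The guide's code targets Python 2 with BeautifulSoup 3 and mechanize. For anyone on Python 3 without those packages, here is a rough standard-library sketch of the same filtering logic: keep links whose href matches the injury pattern, collect their title attributes in a set to drop duplicates, then keep the titles that contain the date in "%m/%d" form. LinkTitleCollector and injuries_for_day are hypothetical names, and the page layout (injury links carrying a title that contains the date) is assumed from the guide's description, not checked against the live site.

```python
import re
from datetime import date
from html.parser import HTMLParser


class LinkTitleCollector(HTMLParser):
    """Collects the title attribute of <a> tags whose href matches a regex."""

    def __init__(self, href_pattern):
        super().__init__()
        self.href_re = re.compile(href_pattern)
        self.titles = set()   # a set drops duplicate titles, as in the guide

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""   # <a href> with no value gives None
        title = attrs.get("title")
        if title and self.href_re.search(href):
            self.titles.add(title)


def injuries_for_day(html, day):
    """Return the sorted injury-link titles that mention day as mm/dd."""
    collector = LinkTitleCollector(r"ShortNameLeague=mlb&ArticleType=injury")
    collector.feed(html)
    collector.close()
    stamp = day.strftime("%m/%d")   # same format string as the guide
    return sorted(t for t in collector.titles if stamp in t)
```

Here html would be the page source, e.g. urlopen(url).read().decode("utf-8", "replace"), and injuries_for_day(html, date.today()) mirrors the guide's final loop. HTMLParser decodes &amp; inside attribute values, so the plain & in the pattern matches links written with &amp; in the markup.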
                                                      Thanks for sharing all this.

                                                      How would you go about scraping a JavaScript website? Let's say whoscored.com?
                                                      • stevenash
                                                        Moderator
                                                        • 01-17-11
                                                        • 65433

                                                        #272
                                                        Jump12?
                                                        What do you do for a living?

                                                        And so far all your posts are spot on.
                                                        Well done.

                                                        I'm an I/T Operations Analyst myself.
                                                        The business end of sports is pretty much all analytics.

                                                        I'm big on SQL.

                                                        Good work here, I'll drop by later.
                                                        • stevenash
                                                          Moderator
                                                          • 01-17-11
                                                          • 65433

                                                          #273
                                                          Originally posted by jtluongo
                                                          Before learning programming, I would like to learn more about predictive analytics and modeling. Are there any good books out there I can read, or maybe a YouTube series?
                                                          Read this twice.
                                                          Buy it. It's not cheap, but it's the best.
                                                          You can get it in the Google store.
                                                          • Big Bear
                                                            SBR Aristocracy
                                                            • 11-01-11
                                                            • 43253

                                                            #274
                                                            Good stuff here. Here's some food for thought:

                                                            Decision = power
                                                            Progress = happiness

                                                            Learning is the relationship of a "known" to an "unknown." If you are having trouble learning something, it's because you are unable to relate it to something you already know.

                                                            Example: fishing is like sales. If you know how to fish, then you know how to sell.

                                                            A strategy is a specific way of organizing resources in order to consistently generate the results you want.

                                                            The formula for happiness is to be able to meet or exceed your expectations.

                                                            Think about this when designing your model for this baseball season.
                                                            • oilcountry99
                                                              SBR Wise Guy
                                                              • 08-29-10
                                                              • 707

                                                              #275
                                                              This was a great thread until post #26, when the OP stopped after section 1d. What happened?
                                                              • Elko
                                                                SBR Rookie
                                                                • 10-09-15
                                                                • 4

                                                                #276
                                                                This thread has been around so long, I think half of the original posters have died.
                                                                • ljump12
                                                                  SBR High Roller
                                                                  • 12-08-09
                                                                  • 112

                                                                  #277
                                                                  Originally posted by Elko
                                                                  This thread has been around so long, I think half of the original posters have died.
                                                                  Count me in the half that's alive.

                                                                  For what it's worth (because I doubt I'm ever going to finish this guide):


                                                                  &



                                                                  Should get you far.
                                                                  Last edited by ljump12; 10-12-15, 01:40 PM.
                                                                  • elgary
                                                                    SBR Rookie
                                                                    • 10-20-15
                                                                    • 4

                                                                    #278
                                                                    Help

                                                                    If someone can help me, I will appreciate it. I read the chapters and all that stuff and watched videos, but I don't know how to download the CSV file from the link. I hope someone can help me with that. Thanks.
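The post doesn't name the link, so this is only a generic sketch for Python 3: fetch a CSV by URL, optionally keep a copy on disk, and parse it into rows. download_csv and parse_csv_text are hypothetical helper names, the URL in the usage line below is a placeholder, and it assumes the file is UTF-8 encoded.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_text(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_csv(url, save_as=None):
    """Fetch a CSV file from url, optionally save a copy, and return its rows."""
    with urlopen(url) as response:
        # "utf-8-sig" also strips the byte-order mark some exporters add.
        text = response.read().decode("utf-8-sig")
    if save_as:
        # newline="" writes the file's own line endings back unchanged.
        with open(save_as, "w", encoding="utf-8", newline="") as f:
            f.write(text)
    return parse_csv_text(text)
```

Usage: rows = download_csv("http://example.com/data.csv", save_as="data.csv"); each element of rows is one data line as a dict, with the header row supplying the keys.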
                                                                    • Muller Rose
                                                                      SBR High Roller
                                                                      • 06-01-16
                                                                      • 219

                                                                      #279
                                                                      Great info you have here. Thanks for taking time to post this.
                                                                      • mtalock
                                                                        SBR Rookie
                                                                        • 08-09-16
                                                                        • 45

                                                                        #280
                                                                        Wow, this is easy?

                                                                        Hey, thank you for the tech info and the links.
                                                                        This seems like a plethora of info, but I am diving into the deep end.