Why is scrape not working?

  • illfuuptn
    SBR MVP
    • 03-17-10
    • 1860

    #1
    Why is scrape not working?
    Idk why it is returning this: [' '] but it is and it makes no sense.


    webpage=urlopen('http://www.sbrforum.com/betting-odds/ncaa-basketball/?date=20120102').read()
    findTeamName= re.compile('"team-name".*>(.*)</span></div')
    find_it = re.findall(findTeamName,webpage)
    print find_it
    [' ']


    I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?
    Last edited by illfuuptn; 12-09-12, 02:08 AM. Reason: find_it should be on new line btw
  • illfuuptn
    SBR MVP
    • 03-17-10
    • 1860

    #2
    Sorry for the lack of code tags, but when I add them the forum deletes part of the post for some reason.
    • illfuuptn
      SBR MVP
      • 03-17-10
      • 1860

      #3
      I can't post the page's source code because SBR interprets it instead of displaying it. You can view the source yourself if you want to double-check.
      Last edited by illfuuptn; 12-09-12, 02:05 AM.
      • illfuuptn
        SBR MVP
        • 03-17-10
        • 1860

        #4
        Any help appreciated.
        • Maverick22
          SBR Wise Guy
          • 04-10-10
          • 807

          #5
          Whatever language that is... I don't know it. Sorry I'm not able to help you.
          • illfuuptn
            SBR MVP
            • 03-17-10
            • 1860

            #6
            Originally posted by Maverick22
            Whatever language that is... I don't know it. Sorry I'm not able to help you.
            That's quite alright, Maverick. You've helped me many times in the past. It's Python, for the record.
            • Blax0r
              SBR Wise Guy
              • 10-13-10
              • 688

              #7
              Hey illfuuptn, I'm not very familiar with Python, but I think you should try adding a question mark after each .* in your regular expression.

              So it would look like this: findTeamName = re.compile('"team-name".*?>(.*?)</span></div'). Keep the closing tags after the (.*?): a lazy group with nothing after it matches an empty string.

              The question mark makes the quantifier "not greedy" (lazy), so it matches as little text as possible:
              http://www.regular-expressions.info/optional.html.
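
To make the greedy versus lazy difference concrete, here is a minimal sketch in Python 3. The markup in `html` is made up for illustration (the real SBR page source is not shown in this thread), so treat it as one plausible way to end up with [' '], not a confirmed diagnosis.

```python
import re

# Hypothetical markup on a single line; the real page source is an assumption.
html = (
    '<div><span class="team-name">Duke</span></div>'
    '<div><span class="team-name">Kansas</span></div>'
    '<div><span class="score"> </span></div>'
)

# Original pattern: both .* are greedy, so the first one runs as far right
# as it can and the group only captures the last span on the line.
greedy = re.compile(r'"team-name".*>(.*)</span></div')

# Suggested pattern: .*? stops at the first place the rest can match.
lazy = re.compile(r'"team-name".*?>(.*?)</span></div')

greedy_matches = greedy.findall(html)
lazy_matches = lazy.findall(html)

print(greedy_matches)  # [' ']
print(lazy_matches)    # ['Duke', 'Kansas']
```

With this sample input the greedy pattern swallows both team names and returns only the blank content of the final span, which is the same shape of result as in the first post.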

              Hope this helps.
              • EXhoosier10
                SBR MVP
                • 07-06-09
                • 3122

                #8
                Originally posted by illfuuptn
                Idk why it is returning this: [' '] but it is and it makes no sense.


                webpage=urlopen('http://www.sbrforum.com/betting-odds/ncaa-basketball/?date=20120102').read()
                findTeamName= re.compile('"team-name".*>(.*)</span></div')
                find_it = re.findall(findTeamName,webpage)
                print find_it
                [' ']


                I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?
                I've never used urllib before. I downloaded Beautiful Soup 4 from the get-go. If you rewrite your code with that (since I have no idea how urlopen, compile, or findall work), I'll be glad to take a look.

                Also, try attaching your code in a .txt file.
                • HUY
                  SBR Sharp
                  • 04-29-09
                  • 253

                  #9
                  Originally posted by illfuuptn
                  Idk why it is returning this: [' '] but it is and it makes no sense.


                  webpage=urlopen('http://www.sbrforum.com/betting-odds/ncaa-basketball/?date=20120102').read()
                  findTeamName= re.compile('"team-name".*>(.*)</span></div')
                  find_it = re.findall(findTeamName,webpage)
                  print find_it
                  [' ']


                  I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?
                  Never parse HTML using hand-crafted regular expressions. You will go insane and the result will always be unreliable. Just use BeautifulSoup and save yourself the hassle.
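
As a rough illustration of the parser-based approach, here is a sketch in Python 3 that uses the standard library's `html.parser` as a stand-in for BeautifulSoup, so it runs without installing anything. The markup and the `TeamNameParser` class are hypothetical; with Beautiful Soup 4 the equivalent lookup would be roughly `soup.find_all("span", class_="team-name")`.

```python
from html.parser import HTMLParser


class TeamNameParser(HTMLParser):
    """Collect the text inside every <span class="team-name"> element."""

    def __init__(self):
        super().__init__()
        self.in_team_span = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "team-name" in classes:
            self.in_team_span = True
            self.names.append("")

    def handle_endtag(self, tag):
        # Assumes team-name spans contain no nested spans.
        if tag == "span":
            self.in_team_span = False

    def handle_data(self, data):
        if self.in_team_span:
            self.names[-1] += data


# Hypothetical markup; the real page structure is an assumption.
html = """
<div class="event">
  <div><span class="team-name">Duke</span></div>
  <div><span class="team-name">Kansas</span></div>
  <div><span class="score"> </span></div>
</div>
"""

parser = TeamNameParser()
parser.feed(html)
parser.close()
team_names = [name.strip() for name in parser.names]
print(team_names)  # ['Duke', 'Kansas']
```

Because the parser tracks tags instead of raw characters, line breaks and extra attributes in the markup do not change the result the way they can with a regular expression.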
                  • KennyRogers
                    SBR Rookie
                    • 12-20-12
                    • 1

                    #10
                    Is there an update to this? I'm not familiar with BeautifulSoup; I'm more familiar with Scrapy.
                    • Spektre
                      SBR High Roller
                      • 02-28-10
                      • 184

                      #11
                      Is scraping SBR allowed?