Why is scrape not working?

illfuuptn · 12-09-12, 01:57 AM

Sorry for the lack of code tags, but when I add them the forum deletes parts of the post for some reason.

illfuuptn · 12-09-12, 02:02 AM

I can't post the page's source code because SBR reads it. So I guess you can look at it yourself if you want to double-check.

illfuuptn · 12-17-12, 02:53 PM

Any help appreciated.

Maverick22 · 12-17-12, 03:54 PM

Whatever language that is... I don't know it.

Sorry that I am not able to help you.

illfuuptn · 12-17-12, 05:36 PM

Originally posted by Maverick22

Whatever language that is... I don't know it.

Sorry that I am not able to help you.

That's quite all right, Maverick. You've helped me many times in the past. It's Python, for the record.

Blax0r · 12-18-12, 02:20 AM

Hey illfuuptn, I'm not very familiar with Python, but I think you should try adding a question mark after each .* in your regular expression.

So it would look like this: findTeamName = re.compile('"team-name".*?>(.*?)</span></div'). Also, you need to keep something after the (.*?), such as the </span></div from your original pattern; a non-greedy group at the very end of a pattern matches as little as possible and would capture an empty string.

The question mark after a quantifier makes it "not greedy", so it stops at the first possible match instead of the last one: http://www.regular-expressions.info/optional.html.

Hope this helps.
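
The greedy versus non-greedy difference is easiest to see on a small sample. This is only a sketch: it is written for Python 3 (the code quoted in this thread is Python 2), and the `html` string is made-up markup, since the real SBR page source is not shown here. The variable names are also just for illustration.

```python
import re

# Hypothetical markup, invented for this example; the real page may differ.
html = (
    '<div><span class="team-name" id="t1">Duke</span></div>'
    '<div><span class="team-name" id="t2">Kansas</span></div>'
    '<div><span class="note"> </span></div>'
)

# Greedy: each .* grabs as much as it can, so the match runs from the first
# "team-name" all the way to the LAST </span></div on the line.
greedy = re.compile(r'"team-name".*>(.*)</span></div')

# Non-greedy: each .*? grabs as little as it can, so every team gets its own match.
lazy = re.compile(r'"team-name".*?>(.*?)</span></div')

greedy_result = greedy.findall(html)
lazy_result = lazy.findall(html)

print(greedy_result)  # [' ']
print(lazy_result)    # ['Duke', 'Kansas']
```

With this sample the greedy pattern reproduces the puzzling [' '] result: the only thing left for the capture group is the single space inside the last span on the line.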

EXhoosier10 · 12-18-12, 02:04 PM

Originally posted by illfuuptn

Idk why it is returning this: [' '] but it is and it makes no sense.

webpage=urlopen('http://www.sbrforum.com/betting-odds/ncaa-basketball/?date=20120102').read()
findTeamName= re.compile('"team-name".*>(.*)</span></div')
find_it = re.findall(findTeamName,webpage)
print find_it
[' ']

I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?

I've never used urllib before. I downloaded Beautiful Soup 4 from the get-go. If you change your code to use that (since I have no idea how urlopen, compile, or findall work), I'll be glad to take a look.

Also, try attaching your code in a .txt file.

HUY · 12-18-12, 03:57 PM

Originally posted by illfuuptn

Idk why it is returning this: [' '] but it is and it makes no sense.

webpage=urlopen('http://www.sbrforum.com/betting-odds/ncaa-basketball/?date=20120102').read()
findTeamName= re.compile('"team-name".*>(.*)</span></div')
find_it = re.findall(findTeamName,webpage)
print find_it
[' ']

I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?

Never parse HTML using hand-crafted regular expressions. You will go insane and the result will always be unreliable. Just use BeautifulSoup and save yourself the hassle.
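
As a rough illustration of the parser-based approach, here is a minimal sketch. BeautifulSoup is a third-party package, so to stay self-contained this swaps in Python 3's standard-library html.parser instead; it shows the same idea but is not BeautifulSoup's API. The sample markup and the `TeamNameParser` class are hypothetical, and the real SBR page may be structured differently.

```python
from html.parser import HTMLParser


class TeamNameParser(HTMLParser):
    """Collect the text inside every tag whose class list contains "team-name".

    Hypothetical helper for illustration. It assumes every tag nested inside
    a team-name element is properly closed (void tags such as <br> are not
    handled).
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a team-name element
        self.names = []  # one string per team-name element found

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "team-name" in classes:
            self.depth += 1
            if self.depth == 1:
                self.names.append("")  # start collecting a new name

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.names[-1] += data


# Made-up markup; nested tags and unrelated spans do not confuse the parser.
html = (
    '<div><span class="team-name" id="t1"><a href="/duke">Duke</a></span></div>'
    '<div><span class="note"> </span></div>'
    '<div><span class="team-name" id="t2">Kansas</span></div>'
)

parser = TeamNameParser()
parser.feed(html)
print(parser.names)  # ['Duke', 'Kansas']
```

Because the parser tracks tags rather than characters, extra attributes, nested links, or line breaks in the markup do not change the result the way they would with a regular expression.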

KennyRogers · 12-20-12, 11:22 PM

Is there an update to this? I'm not familiar with BeautifulSoup; I'm more familiar with Scrapy.

Spektre · 01-12-13, 04:13 PM

Is scraping SBR allowed?