Why is scrape not working?

illfuuptn · 12-09-12 12:55 AM

Idk why it is returning this: [' '] but it is and it makes no sense.

webpage=urlopen('http://www.sportsbookreview.com/betting-odds/ncaa-basketball/?date=20120102').read()
findTeamName= re.compile('"team-name".*>(.*)
find_it = re.findall(findTeamName,webpage)
print find_it
[' ']

I've checked tutorials and the pages they scrape work fine when I do them. So what's up with this?

illfuuptn · 12-09-12 12:57 AM

Sorry for the lack of code tags, but when I add them the forum deletes parts of the post for some reason.

illfuuptn · 12-09-12 01:02 AM

I can't post the page's source code because the SBR forum reads it as markup. So I guess you can look at the page yourself if you want to double-check.

illfuuptn · 12-17-12 01:53 PM

any help appreciated

Maverick22 · 12-17-12 02:54 PM

Whatever language that is... I don't know it.

Sorry that I'm not able to help you.

illfuuptn · 12-17-12 04:36 PM

That's quite all right, Maverick. You've helped me many times in the past. It's Python, for the record.

Blax0r · 12-18-12 01:20 AM

Hey illfuuptn, I'm not very familiar with Python, but I think you should try adding a question mark after each .* in your regular expression.

So it would look like this: findTeamName = re.compile('"team-name".*?>(.*?)'). Also, you will need to put something after the (.*?), such as the closing tag. A lazy group at the very end of a pattern matches as little as possible, so on its own it captures nothing.

The question mark makes the preceding quantifier "not greedy" (lazy): http://www.regular-expressions.info/optional.html.
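
To make the greedy versus lazy difference concrete, here is a minimal sketch in Python 3 (the original post uses Python 2's print statement). The HTML snippet, the tags, and the team names are made up for illustration, since the real SBR markup was stripped from this thread, so treat the closing-tag anchor as an assumption about the page.

```python
import re

# Hypothetical markup: the real page source is not shown in this thread.
html = (
    '<span class="team-name"><a href="/a">Duke</a></span>'
    '<span class="team-name"><a href="/b">Kansas</a></span>'
)

# Greedy: the first .* runs to the end of the line and backtracks only as far
# as the LAST '>', so the trailing (.*) has nothing left to capture.
greedy = re.findall(r'"team-name".*>(.*)', html)

# Lazy, anchored on a closing tag: each capture stops at the first '</a>'.
lazy = re.findall(r'"team-name".*?<a[^>]*>(.*?)</a>', html)

print(greedy)
print(lazy)
```

On this sample the greedy pattern yields a single empty capture, which is the same kind of near-empty result as in the first post, while the lazy, anchored pattern yields one capture per team.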

Hope this helps.

EXhoosier10 · 12-18-12 01:04 PM

I've never used urllib before. I downloaded Beautiful Soup 4 from the get-go. If you change your code to use that (since I have no idea how urlopen, compile, or findall work), I'll be glad to take a look.

Also, try attaching your code in a .txt file.

HUY · 12-18-12 02:57 PM

Never parse HTML using hand-crafted regular expressions. You will go insane and the result will always be unreliable. Just use BeautifulSoup and save yourself the hassle.
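
As a rough illustration of letting a parser find the elements instead of a regex, here is a dependency-free sketch using the standard library's html.parser rather than BeautifulSoup itself. The class name TeamNameParser and the sample markup are hypothetical, and the assumption that team names sit inside elements with class "team-name" comes only from the pattern in the first post. With BeautifulSoup installed, the rough equivalent would be soup.find_all(class_="team-name").

```python
from html.parser import HTMLParser


class TeamNameParser(HTMLParser):
    """Collects the text inside elements whose class list includes 'team-name'.

    Hypothetical helper for illustration. Limitation: an unclosed void tag
    such as <br> inside a team-name element would throw off the depth count.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a team-name element
        self.names = []   # collected team names
        self._buf = []    # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a team-name element
        elif "team-name" in classes:
            self.depth = 1           # entering a team-name element
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:      # left the team-name element
                self.names.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


# Hypothetical markup: the real page source is not shown in this thread.
html = (
    '<span class="team-name"><a href="/a">Duke</a></span>'
    '<span class="team-name"><a href="/b">Kansas</a></span>'
)

parser = TeamNameParser()
parser.feed(html)
print(parser.names)
```

The parser tracks element nesting itself, so the result does not depend on where line breaks or extra attributes happen to fall in the markup.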

KennyRogers · 12-20-12 10:22 PM

Is there an update to this? I'm not familiar with BeautifulSoup, I am more familiar with Scrapy.

Spektre · 01-12-13 03:13 PM

Is scraping SBR allowed?
