Thanks for all the great info guys; really helpful.
I use a MySQL db with a series of bash scripts on a Linux box for my college basketball stuff. It was a f*ck ton of work at the beginning to get it going, but now that it's running it doesn't require much from me. It only does college basketball, though.
Thank you very much for this.
This thread is really great, thanks a lot. I have some questions about using
br.set_handle_robots(False) in mechanize
when a site has a robots.txt file. I know there are legal or ethical issues around respecting it.
I want to try scraping, but from what I've read you need to set timeouts in your scripts so your IP doesn't get banned, and take other measures.
Are there any sites that are "ok" with being scraped for stats (SBR?), or should you be really careful with your scraping, since I would guess most don't like it? What other things should we consider?
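On the robots.txt and timeout question, here is a minimal sketch of the polite version: check each URL against the site's robots.txt rules and pause between requests. It uses Python's standard urllib.robotparser rather than mechanize, and the bot name, the 5-second delay, and the fetch callable you pass in are all assumptions for illustration, not anything a particular site requires.

```python
import time
import urllib.robotparser


def build_robot_rules(robots_txt):
    """Parse the text of an already-downloaded robots.txt file into rules."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def crawl_politely(urls, rules, fetch, user_agent="my-stats-bot", delay_seconds=5.0):
    """Fetch each URL that robots.txt allows, pausing after every request.

    `fetch` is any function that takes a URL and returns the page (hypothetical,
    supply your own). `user_agent` and `delay_seconds` are made-up defaults.
    """
    pages = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # robots.txt disallows this path, so skip it
        pages[url] = fetch(url)
        time.sleep(delay_seconds)  # spread requests out instead of hammering the site
    return pages
```

You would download the site's robots.txt once, pass its text to build_robot_rules, and then hand crawl_politely whatever function you already use to download a page. This only covers the robots.txt side; it says nothing about a site's terms of use.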
dmolition, most sites are NOT OK with scraping due to copyright, and quite a few will actively block you. And it seems that even those that tolerate it change formats so often that you are always tweaking code to get around the changes.
Which sites are you referring to that will block you?
very interesting stuff
Yeah, I figured as much. So to scrape 10 seasons of any sport I imagine I need multiple IPs, timeouts in scripts, constant checking for changes in the DOM structure of the HTML, etc. Now I know the cost of data.
I'm gonna research, and maybe if I gather enough data I'll be willing to trade it (after I validate it, of course).
It would be nice to have a list of sites that enforce anti-scraping policies more strictly, or where NOT to try it, so we can have a little peace of mind.
Also, I'm taking the hard road and learning R and Python (checking out SciPy too) for data analysis. I'm savvy with software development. Once I can actually start doing some serious data analysis, if anyone wants to exchange technical tips, maybe we can open a "hacking/data analysis stuff" thread to ask general questions, share tips, and contribute in general.
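For the "checking for changes in DOM structure" part, one cheap approach is to compare the column headers of the stats table against what your parser expects before trusting a page. This is only a sketch using Python's built-in html.parser; the header names and the sample HTML are made up, and a real page may need a more specific check than "all th cells on the page".

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collects the text of every <th> cell in the order it appears."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self.headers[-1] += data.strip()


def layout_changed(html, expected_headers):
    """Return True when the page's table headers differ from what we expect."""
    collector = HeaderCollector()
    collector.feed(html)
    collector.close()
    return collector.headers != list(expected_headers)


# Hypothetical page fragment and expected columns, for illustration only.
sample = "<table><tr><th>Team</th><th>PTS</th></tr><tr><td>Duke</td><td>80</td></tr></table>"
print(layout_changed(sample, ["Team", "PTS"]))
```

If layout_changed returns True, the script can stop and flag the page instead of silently storing misaligned columns.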
i abuse statfox and have yet to be banned
OK, I'm starting to collect the data. So far so good, but the next step is to check its integrity.
I'm comparing my data against covers.com and espn.com mostly.
Are these sites accurate with their stat records?
Which sites are more reliable, in your opinion, for comparing stats?
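For the integrity check itself, once both data sets are loaded the comparison can be a simple diff. A rough sketch, assuming each source has been reduced to a dict keyed by (date, home, away) with a (home_score, away_score) value; that layout and the sample games below are invented for illustration.

```python
def compare_scores(mine, reference):
    """Compare two {(date, home, away): (home_score, away_score)} dicts.

    Returns the games found in only one source and the games whose
    scores disagree between the two sources.
    """
    mine_keys = set(mine)
    reference_keys = set(reference)
    return {
        "only_mine": sorted(mine_keys - reference_keys),
        "only_reference": sorted(reference_keys - mine_keys),
        "different": sorted(
            key for key in mine_keys & reference_keys if mine[key] != reference[key]
        ),
    }


# Made-up example data, not real results.
mine = {
    ("2010-01-02", "Duke", "UNC"): (80, 75),
    ("2010-01-03", "Kansas", "Texas"): (70, 68),
}
reference = {
    ("2010-01-02", "Duke", "UNC"): (80, 74),
    ("2010-01-04", "Ohio St", "Purdue"): (66, 60),
}
print(compare_scores(mine, reference))
```

A disagreement only tells you the two sources differ, not which one is right, so checking a third source on the mismatches is probably worth it.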
i will look this up to see the sense in it. thanks.
what would be nice is 'data for dummies', but that comes with many challenges for the sharpies...
my 2 cents
Hello to everyone.
Does anybody remember what time the ball kicked off in the 2nd between Tenn and Jacksonville? Please! Thank you.
Looking forward to learning more. Thank you