Does anyone in the tank know anything about using Windmill for scraping?
Normally I use the mechanize library (for Python) to get the HTML for most of my scraping, which works great as long as the content is coded directly in the page. But some sites (like mlb.com) use a lot of JavaScript that generates the content only after the page has loaded in a browser, and mechanize won't work in such cases because it never runs the scripts.
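To illustrate the problem, here is a minimal sketch using only the Python 3 standard library (not mechanize itself). The page source, the "scores" id, and the helper names are all made up for the example. The point is only that a fetch-and-parse approach sees the raw source, where the div is still empty:

```python
from html.parser import HTMLParser

# Hypothetical page source: the "scores" div stays empty until a browser
# actually runs the script. A plain HTTP fetch only ever sees this text.
PAGE_SOURCE = """
<html><body>
<h1>Scoreboard</h1>
<div id="scores"></div>
<script>
document.getElementById("scores").innerHTML = "<p>Home 3, Away 2</p>";
</script>
</body></html>
"""


class ElementTextCollector(HTMLParser):
    """Collects the text found inside the element with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # > 0 while we are inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def scrape_static(html, target_id):
    """Return the text inside the target element as found in the raw source."""
    parser = ElementTextCollector(target_id)
    parser.feed(html)
    return "".join(parser.text).strip()


if __name__ == "__main__":
    # The script never runs, so nothing is found inside the div.
    print(repr(scrape_static(PAGE_SOURCE, "scores")))
```

If the content were written directly into the div, the same function would pick it up, which is why this approach is fine for ordinary static pages.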
I've read several external discussions about Windmill, which is a testing library that allows you to interact with a real web browser. I'm using Windows, and have installed it successfully as a Python site-package following the directions here.
I've also come across a couple of good write-ups with code showing how to use it for scraping, but I can't figure out how to actually fire it up, either from the Python command line or the Python GUI (IDLE).
[For those in the know, I want to be able to get to the point where I can issue a command such as: client = WindmillTestClient()]
Any help appreciated.