Web Scraping for dummies?
July 22, 2005 11:37 AM   Subscribe

A project at work has come up, and I would save a lot of time and hassle if I could somehow get my hands on a free (cheap is acceptable, as long as I can try it first), easy to use web-scraping program. The URL from which I will be scraping is static, unencrypted, and otherwise extremely vanilla. Suggestions?
posted by Kwantsar to Computers & Internet (12 answers total) 1 user marked this as a favorite
 
curl or wget.

Or, what platform?
posted by trevyn at 11:41 AM on July 22, 2005
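[A fetch of the kind trevyn means is one call in Python's standard library. This is an illustrative sketch, not from the thread: the data: URL below is a stand-in for the real page so the snippet runs without a network connection; swap in the actual URL.]

```python
# Stdlib equivalent of a plain wget/curl fetch.
# The data: URL is a stand-in (assumption) so this runs offline;
# replace it with the real page's URL.
from urllib.request import urlopen

url = "data:text/html,<title>hello</title>"  # placeholder for the real URL
html = urlopen(url).read().decode("utf-8")
print(html)
```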


Pretty much every scripting language has the capacity to do scraping. I'd say pick something you're comfortable with and try out some examples.
posted by mathowie at 11:45 AM on July 22, 2005


Install perl, then download and install the simple web libraries. There's a sample script that comes along with them for straightforward scraping.
posted by thanotopsis at 11:45 AM on July 22, 2005


Response by poster: Platform is XP, code-writing skills are minimal.
posted by Kwantsar at 11:52 AM on July 22, 2005


If you can't find a script or something that you can modify easily, it'd be relatively trivial to write a perl script for it. I could whip something up fairly fast if you don't find something out of the box you like.
posted by devilsbrigade at 12:05 PM on July 22, 2005


Wget will get webpages for you.

"Scraping" is usually defined as a combination of both getting the webpage and parsing it for whatever data you want. There's no magic bullet for the parsing part, since, uh, webpages are different. You're going to have to write some code of some sort, somehow.
posted by jellicle at 12:05 PM on July 22, 2005
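[A minimal sketch of jellicle's two steps in Python's standard library. The inline HTML and the choice of `<li>` elements are assumptions standing in for whatever the real page contains; for a live page, the fetch step is `urlopen(url).read()`.]

```python
# Both halves of "scraping": get the page, then parse out the data you want.
# The sample HTML below is a stand-in; for a real page you'd fetch it with
# urllib.request.urlopen(url).read() first.
from html.parser import HTMLParser

html = "<ul><li>alpha</li><li>beta</li></ul>"  # pretend this came from the fetch step

class ItemScraper(HTMLParser):
    """Collects the text inside each <li> element."""
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items.append(data)

scraper = ItemScraper()
scraper.feed(html)
print(scraper.items)  # → ['alpha', 'beta']
```

The parsing half is the part with no magic bullet: the `ItemScraper` class is specific to this made-up page structure, and a different page layout means different handler logic.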


HTTrack for Windows et al
SiteSucker for OS X (for future AxeMe reference)
posted by pedantic at 12:07 PM on July 22, 2005


I use and enjoy Python and mechanize.

You could provide the page and say what you want to scrape from it.
posted by grouse at 12:37 PM on July 22, 2005


Urltoys will do the j-o-b and is not too too difficult to use.
posted by Capn at 1:14 PM on July 22, 2005


Perl's WWW::Mechanize is a good toolkit for the job. You will need some programming skills to do anything with it though.
posted by sad_otter at 2:33 PM on July 22, 2005


WWW::Mechanize::Shell is a really neat front end to WWW::Mechanize as well. Create scraping scripts interactively!
posted by singingfish at 3:46 PM on July 22, 2005


A lot of people seem to be interpreting "web scraping" as copying a file down to your computer from a URL. To me scraping is pulling data OUT of a page to do something with.

If that's what you're trying to do, you should take a look at Sprog, which is a graphical way to pull data out of text.
posted by revgeorge at 5:16 PM on July 22, 2005

