What are the best resources to learn about web crawling and scraping?

What's the best resource to learn about web scraping from scratch?

  • Any books or resources? If one needed to learn the basics of a programming language for this, which should it be?

  • Answer:

    If you want to write the scraping code yourself in PHP, http://simplehtmldom.sourceforge.net/ is a good place to start. There are also a good number of online scraping tools, such as http://open.dapper.net/. If you need some work done quickly without getting into the complications, simply create a new dapp in Dapper and get started.

Harsh Beria at Quora
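Simple HTML DOM itself is a PHP library. As a rough illustration of the same idea (walk a page's tags and pull out the pieces you want), here is a minimal sketch using only Python's standard-library `html.parser`. The names `LinkAndTitleParser` and `scrape` are hypothetical, and the sample HTML is made up; this is not Simple HTML DOM's API.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Hypothetical parser that collects the <title> text and every <a href>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list
        # of (name, value) pairs, where value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def scrape(html):
    """Return (title, links) extracted from an HTML string."""
    parser = LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    sample = """
    <html><head><title>Example page</title></head>
    <body>
      <a href="/about">About</a>
      <a href="https://example.com/blog">Blog</a>
      <a name="no-href">skipped</a>
    </body></html>
    """
    title, links = scrape(sample)
    print(title)  # Example page
    print(links)  # ['/about', 'https://example.com/blog']
```

To try it on a live page, you could pass in the decoded response body, for example `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the page is UTF-8 encoded.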

Other answers

I have used Python and JavaScript (independently) for scraping; both require minimal code and very little reading to master. Scrapy and scrapely combined make a full-fledged web scraping platform. Other useful libraries are PyQuery, webscraping and XPCOM. You might also like PhantomJS/CasperJS if you want to write even less code. All of the resources I mentioned take only a few hours to get started with; all you need to do is go through the "hello world" programs on their respective websites.

Muktabh Mayank
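The "crawling" half of the question is the loop that a framework such as Scrapy runs for you: keep a queue of URLs, fetch each page, extract its links, and enqueue the ones not seen yet. As a rough sketch of that loop (not Scrapy's actual implementation), assuming a hypothetical `get_links(url)` callable that fetches a page and returns the `href` values found on it, a breadth-first crawl restricted to one host might look like this:

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_pages=50):
    """Breadth-first crawl that stays on start_url's host.

    get_links is a hypothetical callable: given a URL, it returns the raw
    href strings on that page (in practice an HTTP fetch plus an HTML parser).
    Returns the URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links against the current page and drop any
            # #fragment, so "/a" and "/a#top" count as the same page.
            absolute = urldefrag(urljoin(url, href))[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    # A made-up three-page site standing in for real HTTP requests.
    fake_site = {
        "https://example.com/": ["/a", "/b", "https://other.org/x"],
        "https://example.com/a": ["/b", "/#top"],
        "https://example.com/b": ["a"],
    }
    print(crawl("https://example.com/", lambda u: fake_site.get(u, [])))
```

Passing `get_links` in as a parameter keeps the crawl logic testable against a fake site, as in the `__main__` block. A real crawler would also need request delays, error handling and robots.txt checks, which frameworks like Scrapy provide out of the box.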

If you know Python, https://realpython.com/blog/python/web-scraping-with-scrapy-and-mongodb/ will be a straightforward starting point for you. If you want to use a Python framework, Scrapy is the best out there. As target sites grow more complex, though, scraping can become a very difficult task.

Tony Paul
