How to make a web crawler?

What are some libraries in Python that can help me make a web crawler?

  • The libraries can come from the standard library or be third-party. The crawler only needs to scrape some content from the pages.

  • Answer:

    Modules: urllib, and Beautiful Soup for HTML parsing
    Framework: Scrapy

Sachit Adhikari at Quora
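
For anyone who wants to stay inside the standard library, here is a minimal sketch of a breadth-first crawler built on urllib and html.parser. It is not taken from the answer above: the names LinkParser, fetch, crawl, fetch_page and max_pages are made up for illustration, and a real crawler would also need politeness delays and robots.txt handling, which are left out here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page with urllib and return its text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns {url: html} for visited pages."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing the download function in as fetch_page means the crawl logic can be tried against a dictionary of canned pages before pointing it at a live site.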

Other answers

lxml (http://lxml.de/) is an excellent library. It's fast and powerful, but it takes a few hours to learn properly. Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/bs4/doc/) is another great library. It's a joy to work with, and you can get up and running quickly. If speed isn't important, use Beautiful Soup, no question. If speed is a factor, go with lxml: on some problems the difference in running time can be a matter of hours, even days. I'd also suggest using the requests library (http://docs.python-requests.org/en/latest/) to fetch each page rather than urllib or urllib2. It makes working with HTTP requests a pleasure.

Michael Kolodny
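
To give a feel for the libraries mentioned in that answer, here is a small sketch that pulls the title and links out of a page with Beautiful Soup, with requests doing the download. It assumes the bs4 and requests packages are installed; extract_title_and_links and fetch_html are hypothetical helper names, not part of either library.

```python
from bs4 import BeautifulSoup


def extract_title_and_links(html, parser="html.parser"):
    """Return the page title and a list of (text, href) pairs, one per link."""
    soup = BeautifulSoup(html, parser)
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]
    return title, links


def fetch_html(url):
    """Download a page with requests (imported here so parsing works offline)."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.text


sample = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/docs'>Docs</a></body></html>"
)
print(extract_title_and_links(sample))  # ('Demo', [('Docs', '/docs')])
```

Beautiful Soup can also use lxml as its underlying parser: if lxml is installed, passing parser="lxml" swaps the backend without changing the rest of the code.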
