
How do you define the speed of a web crawler?

  • How do you determine and compare the speed of two web crawlers? Is it the number of pages each can crawl in a given amount of time that decides speed?

  • Answer:

    This somewhat repeats my answer to this question: a web crawler is made up of two components: a downloader, which downloads pages and adds them to a queue, and an information extractor, which adds more links to the downloader's queue. Ideally, benchmark both separately.

    A good downloader handles the parallelism of multiple downloads efficiently. The only thing you are interested in here is the speed at which pages are downloaded.

    For a fair test, the information extractors need to be running the same algorithm (e.g. PageRank). Here your concern is the per-page processing time plus any periodic processing (e.g. Nutch extracts the links from each page and periodically recalculates PageRank), so it is processing speed and memory usage that determine speed.

    Typically, while your data set is small, the rate at which you can download pages will be your limiting factor. As the body of data grows, batch processing becomes the bottleneck. Note that this is not always the case: for example, crawlers that mimic full browser environments (DOM, JavaScript, Flash, etc.) can easily find per-page processing becoming the bottleneck (primarily CPU bound).

Simon Overell at Quora
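As a rough illustration of the two-component split the answer describes (parallel downloaders feeding a queue, an extractor pulling pages off it, each stage timed separately), here is a minimal sketch using only the Python standard library. The seed URL, thread count, and queue handling are illustrative assumptions, not part of the original answer.

```python
import queue
import threading
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Information-extractor half: pulls <a href> links out of downloaded HTML."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def downloader(url_queue, page_queue, stats, lock):
    """Downloader half: fetch pages in parallel and hand raw HTML to the extractor."""
    while True:
        url = url_queue.get()
        if url is None:                      # sentinel: shut down
            url_queue.task_done()
            return
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            page_queue.put((url, html))
            with lock:
                stats["pages"] += 1
                stats["download_s"] += time.perf_counter() - start
        except OSError:
            pass                             # a real crawler would log and retry
        url_queue.task_done()


def extractor(page_queue, stats, lock):
    """Per-page processing: parse each page and time it separately from downloads."""
    while True:
        item = page_queue.get()
        if item is None:                     # sentinel: shut down
            page_queue.task_done()
            return
        url, html = item
        start = time.perf_counter()
        parser = LinkExtractor(url)
        parser.feed(html)
        # New links would normally be de-duplicated and pushed back onto
        # url_queue here; omitted so the sketch terminates.
        with lock:
            stats["extract_s"] += time.perf_counter() - start
        page_queue.task_done()


if __name__ == "__main__":
    seeds = ["https://example.com/"]          # illustrative seed list
    n_downloaders = 4
    url_queue, page_queue = queue.Queue(), queue.Queue()
    stats = {"pages": 0, "download_s": 0.0, "extract_s": 0.0}
    lock = threading.Lock()

    workers = [threading.Thread(target=downloader,
                                args=(url_queue, page_queue, stats, lock))
               for _ in range(n_downloaders)]
    workers.append(threading.Thread(target=extractor,
                                    args=(page_queue, stats, lock)))
    for w in workers:
        w.start()
    for url in seeds:
        url_queue.put(url)

    url_queue.join()                          # all downloads finished
    for _ in range(n_downloaders):
        url_queue.put(None)
    page_queue.join()                         # all pages processed
    page_queue.put(None)
    for w in workers:
        w.join()

    if stats["pages"]:
        print(f"pages: {stats['pages']}  "
              f"avg download: {stats['download_s'] / stats['pages']:.3f}s  "
              f"avg extract: {stats['extract_s'] / stats['pages']:.3f}s")
```

In line with the answer above, a fair comparison would swap only one half at a time: hold the extraction algorithm fixed when timing downloads, and hold the page set fixed when timing per-page processing.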
