Is it legal to crawl search engines and display the results on another site?
Hi,

I have a specialist search engine and have recently had the problem of being crawled and having the results displayed on another site. Is there any protection against this kind of thing?

I have looked around on the net and found thousands of sites running crawler software [e.g. Smartsearch offered by http://smarterscripts.com/, which crawls ODP, MSN and/or AltaVista, is apparently used by thousands of sites], which seems to suggest there is no protection. I have also found freelance jobs posted specifically asking for crawlers to be made for many engines: Google, AltaVista images and videos, etc. Here's an example: http://search-them-all.com/

I had thought that metasearches must be licensed, but it seems I was being naive. I also found this article, http://www.ivanhoffman.com/database.html, which further suggests that there is no protection in this matter. If even the mighty MSN is being crawled regularly by specialist software, it seems like there is no chance. :(

All this has led me to think: why don't I just start crawling other engines myself to expand my site?!

What I don't get is that I would have thought there was a way to block being crawled.

--

A few questions in the above ramble:

1. Is it legal to crawl a search engine and display the results on an unrelated site in search engine style [different colours etc., of course] without permission?
2. If yes, are there any exceptions?
3. Is there any way to block crawlers?

Thanks
Answer:
Hello searching777,

Thanks for the questions. I have some experience within this market, so I'll do my best to fully answer your questions.

1. Is it legal to crawl a search engine and ...?
-------------------------------------------------------------------------------

If the website being crawled offers a 'terms of use' (or anything similar), then usage of that website is governed by those terms. Here is an example from Google's agreement, stating that 'metasearching' or crawling of their content is not permitted:

Google - Personal Use Only
"You may not take the results from a Google search and reformat and display them, or mirror the Google home page or results pages on your Web site. You may not "meta-search" Google..."
://www.google.com/terms_of_service.html

Here is an excerpt from the MSN website as well, which seems very clear:

MSN Terms of Use
"The MSN Web Sites are only for your personal use. You will not use the MSN Web Sites for commercial purposes....you may not use the MSN Web Sites in any manner that could damage, disable, overburden, or impair any MSN Web Site..."
http://privacy.msn.com/tou/

Let's look past the database issue and skip right to bandwidth and server drain. When a remote computer crawls another website, it uses bandwidth and resources paid for by the company that is being crawled. In some cases the information may be free, but the process of retrieving that information comes at a cost that is covered by the company hosting the information.

In short, the best protection is to offer your users a terms of use that is clear and warns against illegal usage. This will give you solid ground to stand on should any litigation arise. Many important websites carry these agreements. Here is an example of the Superior Courts of California's agreement:

County Website within the Superior Courts of CA
http://www.siskiyou.courts.ca.gov/disclaimer.asp

2. If yes any exceptions
-------------------------------------------------------------------------------

There are exceptions. It depends on the information being requested and the guidelines of the offering entity. The DMOZ may be one exception, although their agreement says nothing about live retrieval of their data; rather, it refers to the usage of their RDF dumps for local use. I couldn't find a single site that directly allows you to crawl its content, although I am certain such sites exist.

When in doubt, the best approach is to ask. I did this with a few companies in Ireland and the United Kingdom, and a majority of them allowed me to crawl their content, simply because I was the only one to ask.

3. Any way to block crawlers
-------------------------------------------------------------------------------

There are a couple of methods, with the robots exclusion being the preferred method:

Robots Exclusion
http://www.robotstxt.org/wc/exclusion.html

If the crawler does not adhere to the robots exclusion, you can use a firewall to block access, assuming the IP address(es) are known. Here is an example firewall for a *nix web server:

KISS Firewall
http://www.geocities.com/steve93138/

This simple firewall allows you to add individual IP addresses as well as ranges simply by dropping them into a configuration file.

If these methods don't work, then you can always contact the crawler's internet provider, stating clearly which terms of your agreement are being broken. Most upstream/hosting providers understand these issues, as they too carry usage guidelines that they do not want to see abused.

To assist with this answer, I referred directly to the terms of use on a few search engines. Most of the information provided is from first-hand experience.

Should you need further clarification, please do not hesitate to ask. I will do my best to assist!

SgtCory
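For readers who want to see what the two blocking methods under question 3 amount to, here is a rough Python sketch. It is only an illustration and not taken from the answer or from the KISS Firewall: the robots.txt contents, the crawler name "BadBot", the function names and the blocked addresses are all invented for the example. The first function shows how a well-behaved crawler would read a robots exclusion file; the second shows the address-and-range matching that a firewall blocklist performs.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a search site: one named crawler ("BadBot",
# an invented name) is excluded entirely, and every other robot is asked
# to stay out of the /search results pages.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /search
"""

# Hypothetical blocklist holding one single address and one range. Both come
# from address blocks reserved for documentation, so they match no real host.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.15/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def robots_allows(user_agent: str, url: str) -> bool:
    """Return True if the sample robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def is_blocked(ip: str) -> bool:
    """Return True if ip is inside any blocked address or range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(robots_allows("FriendlyBot", "http://example.com/search?q=test"))
    print(robots_allows("FriendlyBot", "http://example.com/about.html"))
    print(is_blocked("198.51.100.77"))
```

The robots file only works against crawlers that choose to honour it, which is why the IP check exists as the fallback. In a real setup that check would normally be done by the firewall itself before a request ever reaches the web server.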
searching777-ga at Google Answers