How to scrape data from a website?

Is it illegal to scrape data resulting from someone else's unpermitted scrape?

  • SinglePlatform and Locu have scraped hundreds of thousands of restaurant websites without permission and have structured the menu data into their respective offerings. Since their APIs essentially prohibit commercial use, why couldn't a startup seeking menu data just scrape it from publisher partners Yelp, OpenTable, and TripAdvisor, which have made the SinglePlatform/Locu structured menus publicly available? Neither Locu nor SinglePlatform has added its own information or changed the visual representation of the data set; they have only transformed its structure.

  • Answer:

    You have two things to worry about: copyright, and the various rules that deal with scraping.

    Copyright applies to content, not the source. So if the content is copyrighted, you cannot make use of it without permission (exceptions apply).

    Scraping rules apply to the source, not the content. These laws and claims include the Computer Fraud and Abuse Act (CFAA), unjust enrichment, trespass to chattels, various state laws, breach of contract, etc. Each of these rules basically states that when a scraper accesses another person's server, the scraper must follow the owner's rules. So, if the owner of the server says no scraping, then scraping is not allowed. If the owner says scraping is only permitted under certain conditions, then those conditions must be followed. Failure to follow the owner's rules could result in a civil claim (see the list above), or even criminal charges (under the CFAA).

    So, if you want to scrape, you have two questions to ask. 1) Is the content copyrighted? 2) Does the owner of the server prohibit scraping? If the answer to both questions is "no," scrape all you want. If the answer to either is "yes," scraping could lead to legal consequences.

    One easy way around all of these scraping rules is to scrape from a search engine that caches pages and does not prohibit scraping. The last I checked, Yahoo was one such site. So, if you access Yahoo's cached version of the page you want to scrape, you never access the original site's server, and you do not have to worry about the original site's scraping rules. You still have to worry about copyright.
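    One partial, mechanical check for question 2 is the site's robots.txt file. This is an assumption on my part: robots.txt is not mentioned above, it is only one signal of the owner's wishes, and a site's terms of service may impose further conditions that no script can read. The sketch below uses Python's standard urllib.robotparser; the robots.txt content, the "example-bot" user agent, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /menus/
"""


def is_path_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    This only reflects robots.txt; it says nothing about copyright or
    about conditions stated in a site's terms of service.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/menus/lunch", "https://example.com/about"):
        print(url, is_path_allowed(EXAMPLE_ROBOTS_TXT, "example-bot", url))
```

    In practice you would probably load the live file with the parser's set_url() and read() methods instead of a hard-coded string, and treat a disallowed path as a "yes" to question 2.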

Neil Aggarwal at Quora
