What data can I scrape out of online resources about plants without worrying about copyright infringement?

  • Examples of data I would be interested in: name of the plant variety, species, days to germination, days to maturity, color of fruit, shape of fruit, frost hardiness, and so on. I am least interested in the prose description, but if that description includes data, I would like to extract that data as well (example: the presence of the word 'indeterminate' in the description). The kinds of data I want are generally not already available, at least in bulk, under open licenses from sources such as Wikipedia and Freebase. It is my understanding that bare facts are not copyrightable because they lack the element of creative expression (so, for example, you can't copyright the numbers in a phone book). If the only data I take from a website is this sort of 'bare fact', such as 'days to maturity', have I avoided infringement? What if I extract such a 'bare fact' from a prose description, which does have an element of creative expression?

  • Answer:

    If Wikipedia is of interest to you, all of its content is available for download as an XML dump under fairly permissive licenses (Creative Commons with attribution, and the GNU Free Documentation License): http://en.m.wikipedia.org/wiki/Wikipedia:Database_download
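
    For what it's worth, here is a minimal sketch of how such an XML dump could be read page by page and searched for a keyword like the one mentioned in the question. It assumes the standard MediaWiki export layout (page, title, and text elements). The helper names and the tiny inline sample are made up for illustration; a real dump would be opened as a file instead.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the finished page; full dumps are very large


# Hand-written sample standing in for a real dump file (hypothetical content;
# the namespace version in an actual dump may differ).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Tomato</title>
    <revision><text>Indeterminate varieties keep growing all season.</text></revision>
  </page>
  <page>
    <title>Basil</title>
    <revision><text>Basil is a tender annual herb.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
# Keep only the titles whose text mentions the keyword of interest.
matches = [title for title, text in pages if "indeterminate" in text.lower()]
print(matches)
```

    Reading the file as a stream rather than loading it whole matters here because the full dump is far too large to hold in memory. The attribution requirement of the license still applies to anything you reuse.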

Barak Cohen at Quora

Other answers

In addition to Wikipedia with its permissive license, you might find some good resources on US Government websites, such as that of the Department of Agriculture. Information published by the US Government is not protected by copyright in the US, which means you have a lot of material at your disposal that can be used freely under some circumstances. Things in this area are not always clear-cut. You might want to look into http://public.resource.org, which is dedicated to making government works accessible. - Mark

Mark Nelson

It really isn't possible to scrape at all without worrying about copyright infringement, since "scraping" in nearly all cases is copyright infringement. Scraping is copying without permission; copying without permission is infringement. The best you can hope for is to target a resource that has already "given permission" via some sort of Creative Commons or open source license. Now, what you really seem to want is a database of plants, not a website describing them. You can usually acquire public databases on plants from universities or government bodies (since works by the government are in the public domain). There is really no need to scrape existing online resources when you can probably just e-mail the horticultural department at your state university and get either a referral or something sent to you.

Todd Gardiner
