How to add bulk virtual data in database?

If I want to bulk add a bunch of data to a free online public database such as Freebase, which database should I pick?

  • Considerations include: Practical feasibility of bulk import? Licensing? Will others profit from my work? Is the database likely to be around in the future? Is the database likely to be closed off in future and could I lose access to the data I myself added? Is the database capable of coping with the volume and type of queries I'd like to make in future, perhaps with an app built on top of it? Data dump availability? This is a follow-up question to .

  • Answer:

    This question is interesting because it is an example of where tech and politics collide. Sadly, these considerations (scale, license, format, commerce, legacy, access) do not get nearly enough attention even though I believe that they are the issues most people will care about. Geeks tend to come up with grand solutions to the technical battles and assume that someone else will take care of the larger challenge of creating something that a broad cross-section of society will use. A good baseline for product design in this category might be "if you can use Facebook, you can use my data explorer".

    I have some follow-up questions (which themselves constitute an answer because every situation is different): who are you? are you important to the people who want your data? will you be a curator in addition to a publisher? what data are you trying to upload? is it timely, accurate, complete? did you make it, or are you posting on behalf of an organization? are you planning on changing the data after the initial upload? why are you uploading the data? is it yours to upload? are you looking to make a profit? is the data currently available elsewhere? what are your definitions for "practical", "likely", "future", "access", "volume", "type", "capable of coping" and "app"?

    The simplest answer that will work in most cases today is "it probably doesn't matter". The space changes daily, with new players, infrastructure improvements and most importantly more people paying attention. In the early days, I always suggest that folks act first and apologize later.

    Now, it just so happens that my own company is currently working on a solution to a lot of these problems. Please consider signing up to the mailing list at http://buzzdata.com so that you can be at the front of the line for beta invitations this Spring. If you have any suggestions or wishlist items for a soon-to-be released data collaboration hub, please let us know.

Pete Forde at Quora

Other answers

Pretty interesting questions asked here! I recently wrote an overview of the capabilities of several Google tools: Freebase, Fusion Tables, Docs, Public Data Explorer, Base. http://www.sendung.de/2011-02-26/google-open-data-toolkit-mess-wealth/ As a suggestion, you might want to look into Google Fusion Tables. It has an easy-to-use write API based on a subset of SQL. And I trust Google enough to expect that they will at least let us dump our data before they take Fusion Tables away from us. Edit 2011-11-08: I just discovered http://www.datacouch.com/ , which is the outcome of a Code for America project. It is specifically designed for publishing Open Data. It is built like GitHub for data, which means that databases can be forked (copied) to be edited by others and merged back if wanted. The backend is Apache CouchDB, and CouchDB API access to the data on DataCouch is open. DataCouch is still at an early stage, but worth considering. The source code is open, too.
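To give a rough idea of what a bulk import into a CouchDB-backed store involves, here is a minimal sketch. It is not DataCouch's own tooling. It only builds the JSON body that CouchDB's `_bulk_docs` endpoint accepts (an object with a `docs` array) from CSV text. The function names, the sample rows and the `id_field` parameter are made up for illustration, and actually sending the request is left out.

```python
import csv
import io
import json


def rows_to_bulk_docs(csv_text, id_field=None):
    """Turn CSV text into a dict shaped like a CouchDB _bulk_docs request body.

    If id_field is given, that column becomes the document's "_id".
    All values stay strings, because that is what the csv module yields.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = dict(row)
        if id_field is not None:
            doc["_id"] = doc.pop(id_field)
        docs.append(doc)
    return {"docs": docs}


def chunked(items, size):
    """Yield successive slices of at most `size` items, to keep each request small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical sample data, purely illustrative.
sample = "name,value\nalpha,1\nbeta,2\ngamma,3\n"

payload = rows_to_bulk_docs(sample, id_field="name")

# One JSON body per batch of two documents.
bodies = [json.dumps({"docs": batch}) for batch in chunked(payload["docs"], 2)]
```

Each string in `bodies` could then be sent as an `application/json` POST to `<server>/<database>/_bulk_docs`. Batching is a design choice here: smaller requests are easier to retry if one fails partway through a large upload.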

Marian Steinbach

Time for an update. BuzzData and DataCouch are both gone. Freebase is being shut down by Google. Suhas' presentation is gone. Wikidata is new on the scene. The data mart entrants such as Azure Data Market haven't found much success. It doesn't seem like a hard problem, but as Pete Forde describes in his answer, it lies at the intersection of technology, licensing, intellectual property law, etc., which is always a tricky place to operate. If it's something of commercial value to you, you're probably stuck with self-hosting. If it's something you're willing to give away, you could look at Wikidata (or even GitHub for collaboratively updated bulk downloads which aren't too big).

Tom Morris

Actually, if the license conditions for uploading data let you retain all the rights to your own data, it might be OK to just pick any of them and try it - because you can still copy your data to another service later if you decide you don't like the one you're using. A problem would be if other people started editing the data you uploaded, though, because then you wouldn't necessarily be able to transfer their modifications to a new service. Again, it depends on the specific licenses of each database.

Robin Green

Interesting question. As others have suggested, I think the purpose and vision you have for your data are important. Azure Data Market may be an option if you are trying to monetize your data. Irrespective of whether you want to monetize, here's a presentation I made recently that gives an overview of the different aspects you may want to consider and your options for each: http://www.slideshare.net/MindTreeLtd/content-as-a-service-for-website-content-experience-and-opportunities

Suhas Mallya
