How do you decide whether to use a document-store, a key-value store, a graph data store, or a relational data store?
-
A lot of the blog articles comparing database technologies focus on system properties (ACID, BASE, eventual consistency, scalability, etc.), but I'm curious whether this is really what application developers care about when making a platform decision. Don't the nature of the app, the kind of data being processed, the need for analytics, and so on matter? I'd appreciate any guidance on how you've made database platform decisions.
-
Answer:
I would say that two things determine your choice: the data, and the application (that is, the type of questions you want to answer). I am not going to give you a taxonomy of cases or a final solution for your problem. I'll just state the important issues you may want to consider, and I may still miss some that matter in your specific case.

Regarding the data, there are a few aspects:

Data quality. Depending on the quality of the data, you will need extra processes to improve it. For instance, you might need to enrich the relationships among the entities. Imagine a bibliographic database with authors and papers. If you have different authors with similar names, you may use the authors' relationships to infer whether some of them are the same person. This is clearly a case of understanding relationships, and graph databases will help you find the patterns that identify them.

Data variety. This is the number of sources that you have. Some applications require multiple sources, and in that case you may need to integrate them on the fly. Take the integration of multiple social network data sources, where you have friends in several sources and need to identify which friend is which in each source. Here, again, a graph database will help you set up pattern matching processes that identify similar names and relationships that might be of interest.

Data quantity. Big data restricts you to the types of databases or data processing frameworks that can handle the volume you are going to deal with. However, persistency (see below) may play an important role. Nowadays, for instance, graph databases, in the pure and strict sense, are not in the big data range.

Data dynamicity. When you have streams of data that need to be integrated on the fly and queried in real time, you will have to think about how big the data are and how well the databases can ingest them.
Data persistency. When you need to make sure your data will be safely stored, some types of big data frameworks will not be your choice, as they recreate the database in memory, and this is not what you want. You need a database with certain persistency properties.

Regarding the application and the type of operations you have:

High availability. If you need 24/7 application availability, you will also be restricted to some solutions.

Complex relationship questions. When you need to delve into relationships and discover patterns or communities in your data set, you are going to need graph databases or graph frameworks, where those questions can be solved (sometimes not as fast as you might want).

Aggregational questions. These types of questions are handled very well by relational databases, where you may use SQL as a standard language. Other types of data stores, like graph databases, may be very slow at this unless they are implemented on top of relational databases.

Transactionality. This issue is of paramount importance, and it may stop you from using what is apparently the best data processing framework or database. In any case, it is important to make sure that the transactional properties (i.e. the ACID properties) are well supported.

Reasoning queries. This is a case only for RDF and graph databases, where you may need inference, which is implemented easily in those types of systems.

As a conclusion, let me say that the question is not simple and requires a lot of thinking and analysis. Companies spend a lot of consulting time on this question, and even so, I am sure the engineers who make the decision revise it as the design and deployment evolve. Also, I may have left important issues out, like the amount of money you want to spend on both software and hardware, as some of the solutions require large computing facilities that you may not be ready to deploy.
There is also the query language issue. You may consider it irrelevant, but in the end it may save the life of both the programmer and the user: it makes the former's life easier and the latter's answers faster, depending on how good the optimizer is.

Finally, one important issue is how the system is implemented. Technology is very important here, and how the implementation is done determines the performance of the system. For instance, some RDF stores are implemented on top of relational data stores, which undermines their performance, and some graph databases have better storage implementations than others.

And one final issue. There are no global benchmarks that might help you make this decision. However, there are two remarkable efforts in this regard, the Transaction Processing Performance Council (http://www.tpc.org) and the Linked Data Benchmark Council (http://www.ldbc.eu), which intend to help with relational and linked data workloads respectively.
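
To make the bibliographic example from the data quality point concrete, here is a minimal sketch in plain Python of using co-authorship relationships to flag author records that may be the same person. The sample papers, the function names, the Jaccard overlap measure and the 0.5 threshold are all illustrative assumptions, not something taken from the answer or from any particular graph database.

```python
from itertools import combinations

# Hypothetical toy data: paper id -> author names as they appear in the source.
papers = {
    "p1": ["J. Smith", "A. Kumar", "L. Chen"],
    "p2": ["John Smith", "A. Kumar", "L. Chen"],
    "p3": ["J. Smith", "M. Rossi"],
    "p4": ["Jane Smith", "P. Novak"],
}


def coauthor_graph(papers):
    """Build an undirected co-authorship graph as an adjacency map."""
    graph = {}
    for authors in papers.values():
        for a, b in combinations(authors, 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def jaccard(a, b):
    """Overlap of two neighbor sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_same_person(graph, candidates, threshold=0.5):
    """Return candidate name pairs whose coauthor sets overlap strongly."""
    matches = []
    for a, b in combinations(sorted(candidates), 2):
        score = jaccard(graph.get(a, set()), graph.get(b, set()))
        if score >= threshold:
            matches.append((a, b, round(score, 2)))
    return matches


graph = coauthor_graph(papers)
# Assumed blocking step: only compare records that share a surname.
smiths = [name for name in graph if name.endswith("Smith")]
matches = likely_same_person(graph, smiths)
print(matches)
```

On this toy data, "J. Smith" and "John Smith" share two of three coauthors and are flagged, while "Jane Smith" is not. A graph database would let you express the same neighborhood comparison as a pattern query rather than hand-written loops.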
Josep Lluis Larriba Pey at Quora
Other answers
The first key part of the answer is experience. If you don't have experience, ask lots of people who do and see if they can provide useful guidance.

"Doesn't the nature of the app, the kind of data being processed, need for analytics, etc matter?"

This assumes that there is even a singular "nature" to the app. Many modern apps actually break out into several different data stores. You typically need some form of "primary" data store for your core data, but most systems have lots of "transactional" data that can live in very different stores.

The best starting point I have found is really this: how do I need to access this data?
Gaëtan Voyer-Perrault
Such decisions generally depend on the scale of the data and the criticality of the operations:

For very important and critical transactions, such as payments on e-commerce sites, relational databases such as Oracle or MySQL are used. They ensure that transactions are valid and that there are no errors due to concurrency, network faults, etc. (the ACID properties).

For operations such as fetching data to display on the website (product listings on e-commerce sites, for example), the operations are less critical and the scale of the data is huge, so relational databases alone may not provide satisfactory performance. Thus, it is common practice to put a cache in front of the RDBMS, using a document store such as MongoDB or an in-memory key-value cache such as Memcached, to improve performance. Updates to the underlying relational database reach the cache with some delay, but this is acceptable because the performance gains from the cache far outweigh the delay.

For other operations, such as serving static content (banners, images, logos), it may be possible to skip the RDBMS altogether and build static content servers entirely on key-value stores, as there is no requirement for transactions or relations.
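
The cache-over-RDBMS arrangement described above can be sketched as a read-through cache with a time-to-live. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the relational store, a Python dict stands in for the external cache, and the `ProductCache` class, the `products` table and the 60-second TTL are hypothetical names and values chosen for the example.

```python
import sqlite3
import time


class ProductCache:
    """Read-through cache with a time-to-live, in front of a relational table."""

    def __init__(self, conn, ttl_seconds=60.0, clock=time.monotonic):
        self.conn = conn
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the demo can control time
        self._entries = {}          # product id -> (expires_at, name)

    def get_name(self, product_id):
        now = self.clock()
        entry = self._entries.get(product_id)
        if entry is not None and entry[0] > now:
            return entry[1]         # cache hit, possibly stale
        # Cache miss or expired entry: fall back to the relational store.
        row = self.conn.execute(
            "SELECT name FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        name = row[0] if row else None
        self._entries[product_id] = (now + self.ttl, name)
        return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'kettle')")
conn.commit()

fake_now = [0.0]                    # fake clock for a deterministic demo
cache = ProductCache(conn, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = cache.get_name(1)           # miss: loaded from the database
conn.execute("UPDATE products SET name = 'electric kettle' WHERE id = 1")
conn.commit()
stale = cache.get_name(1)           # hit: the cache still holds the old value
fake_now[0] = 61.0                  # advance the fake clock past the TTL
fresh = cache.get_name(1)           # expired: reloaded from the database
print(first, stale, fresh)
```

The second read returns the old name even though the database has changed, which is the propagation delay mentioned above. The TTL bounds how long that staleness can last, so it is acceptable for product listings but not for something like a payment balance.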
Dhruv Chaudhary