Should a cloud database be centralized or distributed?

What is the difference between a distributed database system and a centralized data warehouse?

  • answers for non-software engineers

  • Answer:

    A distributed database focuses on high throughput for small queries that change data across the database, while providing simplified horizontal scaling and maintaining data integrity. A centralized data warehouse focuses on the speed of complex analytical queries.

Vinay Sahni at Quora

Other answers

To begin with, let's break this up into the key terms: distributed vs. centralized, and database vs. data warehouse (because we can have centralized database systems and distributed data warehouse systems as well).
Let's start with the difference between a database and a data warehouse. A database system involves tables of data (somewhat like an Excel spreadsheet: each table consists of columns and rows). Each table contains different but related data, and the database system maps information from one table to the other. For example, one table in your database may have first and last names. The second table may have zip codes. On your first spreadsheet, instead of having a zip code next to each name, each row references a particular zip code on the second spreadsheet. When you ask the database for the data in a certain way, it will return the first and last name of the person you asked for, and pull the zip code for that person from the other spreadsheet.
A data warehouse is a complex business system that includes a database for storing data, and also a means of pulling in the data from an outside source and doing something with it before storing it. For example, maybe you have a business system that forecasts manufacturing based on sales data. You have a custom manufacturing application running internally, but all your customer data is kept in a third-party application like Salesforce.com, and your actual sales info is kept in a billing/records system. Your data warehouse will import customer information from Salesforce, then import the billing information, and then organize and process that data before storing it in its database. Then, when the manufacturing application needs it, it can connect to the data warehouse rather than pulling separate info from the billing system or Salesforce.com. The data warehouse offers a central repository of data for a number of applications, which simplifies development and integration when a new application is needed.
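The name and zip code lookup described above can be sketched in a few lines of Python using the built-in sqlite3 module. This is only an illustration: the table names (people, zip_codes), the sample people, and the zip codes are all made up for the example, not taken from any real system.

```python
import sqlite3


def lookup_person(first_name, last_name):
    """Return (first name, last name, zip code) for one person, or None."""
    # An in-memory database, so the example leaves nothing on disk.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Second "spreadsheet": one row per zip code.
        CREATE TABLE zip_codes (id INTEGER PRIMARY KEY, zip TEXT);
        -- First "spreadsheet": names, plus a reference to a zip code row.
        CREATE TABLE people (
            first_name TEXT,
            last_name  TEXT,
            zip_id     INTEGER REFERENCES zip_codes(id)
        );
        -- Hypothetical sample data.
        INSERT INTO zip_codes VALUES (1, '10001'), (2, '94105');
        INSERT INTO people VALUES ('Ada', 'Lovelace', 1), ('Alan', 'Turing', 2);
        """
    )
    # The JOIN follows the reference from the people table to the zip code table.
    row = conn.execute(
        """
        SELECT p.first_name, p.last_name, z.zip
        FROM people AS p
        JOIN zip_codes AS z ON p.zip_id = z.id
        WHERE p.first_name = ? AND p.last_name = ?
        """,
        (first_name, last_name),
    ).fetchone()
    conn.close()
    return row
```

Calling lookup_person("Ada", "Lovelace") asks the database for one person and gets the zip code pulled in from the second table, which is the "mapping from one table to the other" the answer describes.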
Finally, the remaining two terms: distributed vs. centralized. Centralized simply means everything is in one location. There is one system of record. It may be replicated to other systems so that they can continue functioning if the primary system goes down, but they all keep the same copies of the data. Distributed systems tend to fragment the data and store pieces of it in different systems, spreading the overall requests across multiple systems. This provides the advantage of scaling to handle more and more data.
Examples? Certainly! I have a set of data, a simple sequence of numbers: "1234567890". If I were storing this sequence of numbers in a database, I'd simply store it as a record and ask for it when I need it. To put this into a data warehouse, some system will need to import it (or my database may push it into the data warehouse), apply some sort of business logic to it, maybe extract some other data, associate it with a particular salesperson, add the date it was entered, and THEN store it for later use.
In a centralized system, I'll store it on one server. Then, I'll replicate that exact information to Server #2 for safekeeping. If I update the sequence to "2345678901", then I'll have to update the second server as well. But if the first server goes down, I know I can get the same info from the second server. It may look like this:
Server #1 = "1234567890"
Server #2 = "1234567890"
In a distributed system, maybe I'll break that sequence into pieces and store it on five different servers:
Server #1 = "12"
Server #2 = "34"
Server #3 = "56"
Server #4 = "78"
Server #5 = "90"
In order to assemble the entire set, I have to ask all five servers for the info and reassemble it. If I only need the first four numbers for some reason, I'll only ask the first two servers. If I need the last four numbers, I'll only ask the last two servers, and so on.
This has some advantages with certain types of data, and allows some applications to grow much larger than with a single, centralized system.
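The "1234567890" example above can be sketched in Python. This is a toy model under loud assumptions: each "server" is just a string in a list, the function names (replicate, fragment, read_range) are invented for illustration, and real distributed databases choose where to put data in far more sophisticated ways.

```python
def replicate(value, server_count=2):
    """Centralized with replication: every server holds the same full copy."""
    return [value for _ in range(server_count)]


def fragment(value, server_count=5):
    """Distributed: split the value into equal-sized pieces, one per server."""
    size = -(-len(value) // server_count)  # ceiling division
    return [value[i:i + size] for i in range(0, len(value), size)]


def read_range(servers, start, end):
    """Read characters [start, end), asking only the servers that hold them.

    Returns the text and the list of server indexes that were contacted
    (index 0 corresponds to "Server #1" in the answer).
    """
    size = len(servers[0])
    first = start // size
    last = (end - 1) // size
    contacted = list(range(first, last + 1))
    joined = "".join(servers[i] for i in contacted)
    offset = start - first * size
    return joined[offset:offset + (end - start)], contacted


copies = replicate("1234567890")  # two identical full copies
pieces = fragment("1234567890")   # five two-character pieces
first_four, asked_first = read_range(pieces, 0, 4)
last_four, asked_last = read_range(pieces, 6, 10)
```

Here first_four is read from only the first two servers and last_four from only the last two, while reading the whole sequence would require contacting all five.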

Andrew Boring

They are two very different things. A distributed database is a system where different parts of the database are stored at different locations. You need some clever technology to keep the parts of the database synchronised across locations. A data warehouse is a database which is designed for analysing data and making decisions; centralised means it is located in one place. I don't know of any data warehouses that are distributed.

Simon Griffiths
