How can I run real-time (T+0) queries over all data, including a large amount of historical data stored in a transaction database, instead of settling for T+n queries?
-
Running real-time queries over all of the huge historical data stored in a transaction database puts a heavy workload on the database and badly hurts the low latency that the transaction system needs. Moving the historical data out of the database protects transaction performance, but it makes real-time queries over all data hard to implement, so usually only T+n queries are achievable. Is there a better way to combine massive historical data with the small amount of current transaction data and get T+0 real-time queries over all data?
-
Answer:
Hello. I'll address your question as two separate but related ones: (#1) How can I organize my transactional and historical data to make small transactional queries fast and up-to-date historical queries easy? (#2) How can I query huge historical data efficiently?

The traditional answer to #1 is to have a data warehouse (DW) separate from the transactional (OLTP) database. Data is routinely archived from the OLTP system to the DW. One popular method for doing this is to partition your fact tables (the ones that grow big) by time, so that the OLTP database has only recent data, but the DW has everything. Data is replicated to the DW so that it is pretty much always current. There are many open source tools for this replication; SymmetricDS and Tungsten Replicator are two.

Note that you don't actually have to separate your OLTP and DW systems. They can be the same database. However, splitting them lets you more easily optimize for each case (different indexes, for example), and it also lets you serve each use case with newer technologies, which tend to be optimized for either transactional or historical use. Here are some popular choices:

- Cassandra: NoSQL store, good for transactions. It recently gained an SQL-like query language (CQL), but querying huge amounts of data is slow, so it may not be suitable for some historical data sets.
- DynamoDB: NoSQL store hosted by Amazon. Good for transactions.
- HBase: NoSQL store, mostly used as a data warehouse.
- HDFS: Hadoop's distributed filesystem. Saving your historical data as files is common and can't be beat in performance for full data scans.
- MongoDB: A big data compromise between transactional and historical which many find attractive. It uses its own query language rather than SQL.
- Redshift: Data warehouse hosted by Amazon. Supports SQL and performs well for historical queries but not transactional loads.

For querying large amounts of data efficiently (#2), you generally don't want to query all of your data. That will just keep getting slower as your data grows.
You should try to aggregate or otherwise process your data incrementally. There are two common patterns for that: batch processing and stream processing.

In batch processing, your job runs periodically on the data that has accumulated since it last ran. Hadoop is the most popular choice, but Spark is gaining ground. You can also schedule a series of queries if you're using a SQL data warehouse like Redshift.

Stream processing is less mature and more difficult to understand, but it's a good choice if you want to instantly know the answer to historical queries. Essentially, your queries are always running and receiving a "stream" of data. You put your new data on a queue to be processed in real time. Apache Storm is a good place to start with this.

The new hot thing is micro-batches. This is batch processing that runs very frequently, sometimes several times a second. Spark Streaming and Apache Storm's Trident are two popular solutions.

If by real-time you just meant that you want your historical data store to include the latest data, look at #1. If you also want instant answers based on your historical data, look at the stream or micro-batch processing described in #2.
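
As a rough illustration of the incremental pattern described above, the following Python sketch keeps a pre-aggregated summary of historical rows and, at query time, adds only the rows the batch job has not processed yet. That gives a current answer without rescanning history. The table names (`orders`, `daily_totals`, `watermark`) and the functions are hypothetical, and a single in-memory SQLite database stands in for what would normally be separate OLTP and DW systems.

```python
import sqlite3

def make_db():
    """Create the hypothetical schema: raw rows, a summary, and a watermark."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, cents INTEGER);
        CREATE TABLE daily_totals (day TEXT PRIMARY KEY, cents INTEGER);
        CREATE TABLE watermark (last_id INTEGER);
        INSERT INTO watermark VALUES (0);
    """)
    return conn

def run_batch(conn):
    """Fold rows added since the last run into daily_totals."""
    with conn:  # commit on success, roll back on error
        (last_id,) = conn.execute("SELECT last_id FROM watermark").fetchone()
        (max_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()
        rows = conn.execute(
            "SELECT day, SUM(cents) FROM orders "
            "WHERE id > ? AND id <= ? GROUP BY day",
            (last_id, max_id)).fetchall()
        for day, cents in rows:
            cur = conn.execute(
                "UPDATE daily_totals SET cents = cents + ? WHERE day = ?",
                (cents, day))
            if cur.rowcount == 0:  # first time this day is seen
                conn.execute(
                    "INSERT INTO daily_totals (day, cents) VALUES (?, ?)",
                    (day, cents))
        conn.execute("UPDATE watermark SET last_id = ?", (max_id,))

def total_now(conn):
    """Up-to-date total: summarized history plus rows not yet batched."""
    (last_id,) = conn.execute("SELECT last_id FROM watermark").fetchone()
    (hist,) = conn.execute(
        "SELECT COALESCE(SUM(cents), 0) FROM daily_totals").fetchone()
    (fresh,) = conn.execute(
        "SELECT COALESCE(SUM(cents), 0) FROM orders WHERE id > ?",
        (last_id,)).fetchone()
    return hist + fresh

conn = make_db()
conn.executemany(
    "INSERT INTO orders (day, cents) VALUES (?, ?)",
    [("2024-01-01", 500), ("2024-01-01", 250), ("2024-01-02", 100)])
run_batch(conn)  # history is now summarized
conn.execute("INSERT INTO orders (day, cents) VALUES (?, ?)",
             ("2024-01-02", 75))  # arrives after the batch ran
print(total_now(conn))  # 925
```

The watermark is what keeps the batch job incremental: each run touches only rows with an id above it, and the query scans only that same small tail. In a real deployment the tail would live in the OLTP database and the summary in the DW, so the final addition would happen in the application or a federated query layer.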
Yuval Oren at Quora