What are the use cases, pros and cons of the different SQL implementations for Hadoop?
This question is about querying Hadoop with SQL or a SQL-like language, not about accessing SQL databases.
Answer:
I think there are four general approaches to SQL for Hadoop.
1. Connector to Hadoop: This is the approach that most of the existing players have taken. [...], [...], [...], etc. all rely on staging your data somewhere on HDFS and then extracting it to their platform for further processing or serving. Most companies that implement the Hadoop stack use this approach in some way.
2. Hacked Open Source: This is the approach of tools like [...], [...], HAWQ, Splice, and others. These tools generally rely on local processing and storage on the nodes, outside of HDFS and YARN, and then serve data through a Hadoop-based interface. They do this because they can't use HDFS directly for storage and processing, as it isn't a POSIX-compliant file system.
3. Start Fresh: Build a whole new product based entirely on technologies in the Hadoop stack. The first of these was Hive, and now we have others like Impala, Presto, and Big Insights. Hive relies heavily on MapReduce and Tez, which are batch-oriented technologies (with future plans to leverage Spark for sub-second in-memory processing). [...] and [...] use proprietary processing architectures that rely almost entirely on in-memory processing (which means queries fail when they can't fit in memory). Presto doesn't run as a YARN service, so it requires a separate cluster of machines outside of Hadoop.
4. Overhaul Mature Platform: Take an existing mature database platform and retool it to leverage [...]. To my knowledge, only two companies are doing this today, [...] and [...]. In these examples, the databases use their existing architecture but leverage the resources made available through YARN. Data is stored and processed directly on HDFS.
In my ever so humble opinion, option 4 gives the best user experience. At [...] Customer Conference 2014, I gave a live demo using [...]'s platform to refresh dashboards with sub-second response times. I was using a 5-node Hadoop cluster with over 5 TB of data and 2.2 billion rows.
Chris Schrader at Quora
Other answers
Two common SQL implementations are Hive and Presto. Hive excels at analytics tasks that don't require real-time performance and is a complementary tool for ETL workflows. Its benefits are that it is a proven solution running on the proven MapReduce framework and that it works well with other systems, including HBase. Its major disadvantage is that it is much slower than other solutions, which makes it unsuitable for real-time analysis. Presto is a fast SQL engine that has been battle-tested at Facebook, though it is still a fairly young technology. Its ANSI SQL support makes Presto compatible with many analytics tools used for reporting and visualisation. It is also designed to support a wide range of data sources, including data stored in Hive, HDFS, HBase, Amazon S3, or an RDBMS. (More info here: http://hubs.ly/y071Dw0) Many companies opt to use a combination of the two, depending on the type of query they are running.
Gil Allouche
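To make the "combination of the two" idea concrete, here is a minimal sketch of how a team might decide which engine receives a query. Everything in it is an assumption for illustration: the QueryJob fields, the choose_engine rule, and the table names are hypothetical and do not come from Hive, Presto, or the answers above. The rule simply encodes the trade-off described in the answer: batch and ETL work goes to Hive, latency-sensitive reads go to Presto.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    HIVE = "hive"
    PRESTO = "presto"


@dataclass(frozen=True)
class QueryJob:
    """Hypothetical description of a query; the fields are illustrative only."""
    sql: str
    interactive: bool    # a person or dashboard is waiting on the result
    writes_tables: bool  # ETL-style job that produces new tables


def choose_engine(job: QueryJob) -> Engine:
    """Send batch/ETL work to Hive and latency-sensitive reads to Presto."""
    if job.writes_tables or not job.interactive:
        return Engine.HIVE
    return Engine.PRESTO


if __name__ == "__main__":
    # Table names below are made up for the example.
    nightly = QueryJob(
        sql="INSERT OVERWRITE TABLE daily_sales SELECT * FROM raw_sales",
        interactive=False,
        writes_tables=True,
    )
    dashboard = QueryJob(
        sql="SELECT region, SUM(amount) FROM daily_sales GROUP BY region",
        interactive=True,
        writes_tables=False,
    )
    print(choose_engine(nightly).value)
    print(choose_engine(dashboard).value)
```

In practice the decision would likely also depend on data volume, memory limits, and which connectors are available, so treat the two boolean flags as a starting point rather than a complete policy.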