Why is the "join" query in SQL not scalable?
"The problem is joins are relatively slow, especially over very large data sets, and if they are slow your website is slow. It takes a long time to get all those separate bits of information off disk and put them all together again. Flickr decided to denormalize because it took 13 Selects to each Insert, Delete or Update." Source: http://highscalability.com/scaling-secret-2-denormalizing-your-way-speed-and-profit
Answer:
Usually the problem is bad query design, with inadequate WHERE criteria and a lack of indexes. If the query is table scanning, it will be a dog. If you have a well-designed query and indexes, pulling a few records each from multiple tables is not a problem. As usual, the key is understanding your critical requirements, selecting the right tools, and using them properly. Learn how to use a query profiler, and test against a db populated with realistic amounts of data. You can do sharding and distribution of data yourself, but databases typically support partitioning of data for improved performance, and modern disk farms can also distribute data at the hardware level. HP Vertica, for instance, is fully distributed and redundant, and compares favorably to other big data solutions for pure data access. Who wants to maintain all this stuff themselves, especially when the architect moves on and it becomes somebody's job to maintain somebody else's one-off?
Pete Ashly at Quora
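The point about table scans versus indexes can be checked with a query profiler. Below is a minimal sketch in Python using the standard-library sqlite3 module. The users and photos tables, the index name idx_photos_user_id, and the row counts are hypothetical, chosen only for illustration. It prints SQLite's query plan for the same join before and after adding an index on the join column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Turn off SQLite's automatic transient indexes so the un-indexed plan is shown plainly.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")
# Hypothetical data: 1000 users with 20 photos each.
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1000)])
conn.executemany("INSERT INTO photos VALUES (?, ?, ?)",
                 [(i, i % 1000, f"photo{i}") for i in range(20000)])

QUERY = """
    SELECT u.name, p.title
    FROM users u JOIN photos p ON p.user_id = u.id
    WHERE u.id = 42
"""

def plan(connection, sql):
    """Return the human-readable steps of SQLite's plan for a query."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(conn, QUERY)   # photos has no index on user_id yet
conn.execute("CREATE INDEX idx_photos_user_id ON photos(user_id)")
plan_after = plan(conn, QUERY)    # same query, now with an index on the join column
rows = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("rows returned:", len(rows))
```

The exact wording of the plan differs between SQLite versions, and other databases have their own profiler commands, so treat the output as indicative only.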
Other answers
Join is scalable vertically, up to practical limits (plug in more memory, cache the entire database). Join may be scalable horizontally, but rarely is, because all the data needed for the result has to be in one place. Note that 'all data' includes (subsets of) both indexes and table rows. A strategy to improve both horizontal and vertical scalability is database partitioning (http://en.wikipedia.org/wiki/Partition_%28database%29). The idea is simple divide and rule: slice the data into smaller pieces, so that each needed piece fits into memory. It can especially improve horizontal scalability, since multiple nodes can process the smaller pieces in parallel. Partitioning has big drawbacks, though. To be efficient, each slice should contain about the same amount of data, and to achieve this the slicing criteria may become too complex. These criteria are evaluated not only during a join but for every database operation, because the DBMS must know which partition a record belongs to. In turn, the entire database cluster slows down. Furthermore, explain/analyze output can become ridiculously complex and totally unusable. General database maintenance also gets more complex, because there are more tables and indexes. And so on. So sometimes it rocks, but sometimes it sucks. The details depend on the underlying technology, but also on the actual data.
Josip Almasi
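The divide-and-rule idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular DBMS implements partitioning: the partition count, the CRC32-based slicing rule, and the users and photos rows are all hypothetical. Both tables are sliced by the join key, so each pair of slices can be joined on its own (and, in a real cluster, on its own node).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of slices / nodes

def partition_of(key):
    """Slicing criterion: must be evaluated for every row that is stored or looked up."""
    return zlib.crc32(str(key).encode()) % NUM_PARTITIONS

def partition(rows, key_index):
    """Split rows into slices according to the value in column key_index."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_of(row[key_index])].append(row)
    return parts

def hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join: index the left side, probe with the right."""
    index = defaultdict(list)
    for l_row in left:
        index[l_row[left_key]].append(l_row)
    return [l_row + r_row
            for r_row in right
            for l_row in index.get(r_row[right_key], [])]

# Hypothetical data: 100 users with 5 photos each.
users = [(i, f"user{i}") for i in range(100)]
photos = [(i, i % 100, f"photo{i}") for i in range(500)]

# Partition both tables on the join key, so matching rows land in the same slice.
user_parts = partition(users, 0)
photo_parts = partition(photos, 1)

# Each slice pair is independent of the others; here they run one after another.
joined = []
for p in range(NUM_PARTITIONS):
    joined.extend(hash_join(user_parts[p], photo_parts[p], 0, 1))

# Slice sizes show how evenly (or unevenly) the criterion spreads the data.
sizes = [len(photo_parts[p]) for p in range(NUM_PARTITIONS)]
print("rows joined:", len(joined))
print("photo rows per slice:", sizes)
```

Printing the slice sizes makes the balancing drawback visible: a simple criterion is cheap to evaluate but gives no guarantee that the slices come out equal.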
If their inserts consisted of inserting an image or images into a relational database, and they had thousands or tens of thousands of those inserts going on at peak times, I can see how they could easily tax both their hardware and software architecture. Inserting many images could lead to IO bottlenecks very easily. Proper use of SAN technology, spreading the data across enough drives and having large enough RAM caches on the SAN, should alleviate much of the IO bottleneck. The other critical factor is RAM on the database server. A properly tuned 64-bit database capable of accessing large amounts of RAM would reduce the need for disk reads by keeping much of the active users' data cached in RAM. And, as other answers have already stated, proper indexing and a good clustered index (always ascending key values) will keep the data from being reorganized with every insert.
Gary Miller