INNER JOIN vs LEFT JOIN performance in SQL Server
-
I wrote a SQL command that uses INNER JOIN across 9 tables, and it takes a very long time to run (more than five minutes). A colleague suggested changing INNER JOIN to LEFT JOIN because LEFT JOIN performs better, which contradicted what I knew. After I made the change, the query became significantly faster. Why is LEFT JOIN faster than INNER JOIN?

My SQL command looks like this:

    SELECT * FROM A
    INNER JOIN B ON ...
    INNER JOIN C ON ...
    INNER JOIN D

and so on.

Update: Here is a brief outline of my schema.

    FROM sidisaleshdrmly a -- no PK or FK
    INNER JOIN sidisalesdetmly b -- this table also has no PK or FK
        ON a.CompanyCd = b.CompanyCd
        AND a.SPRNo = b.SPRNo
        AND a.SuffixNo = b.SuffixNo
        AND a.dnno = b.dnno
    INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
        ON a.CompanyCd = h.CompanyCd
        AND a.sprno = h.AcctSPRNo
    INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
        ON c.CompanyCd = h.CompanyCd
        AND c.FSlipNo = h.FSlipNo
        AND c.FSlipSuffix = h.FSlipSuffix
    INNER JOIN coMappingExpParty d -- no PK or FK
        ON c.CompanyCd = d.CompanyCd
        AND c.CountryCd = d.CountryCd
    INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
        ON b.CompanyCd = e.CompanyCd
        AND b.ProductSalesCd = e.ProductSalesCd
    LEFT JOIN coUOM i -- PK = UOMId
        ON h.UOMId = i.UOMId
    INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
        ON a.CompanyCd = j.CompanyCd
        AND b.BFStatus = j.BFStatus
        AND b.ProductSalesCd = j.ProductSalesCd
    INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
        ON e.ProductGroup1Cd = g1.ProductGroup1Cd
    INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
        ON e.ProductGroup1Cd = g2.ProductGroup1Cd
-
Answer:
A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return at least as many rows, and usually more, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining, so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.

Edit: Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:
- Some of the tables are very small (say, under 10 rows);
- The tables do not have sufficient indexes to cover the query.
Consider this example:

    CREATE TABLE #Test1
    (
        ID int NOT NULL PRIMARY KEY,
        Name varchar(50) NOT NULL
    )
    INSERT #Test1 (ID, Name) VALUES (1, 'One')
    INSERT #Test1 (ID, Name) VALUES (2, 'Two')
    INSERT #Test1 (ID, Name) VALUES (3, 'Three')
    INSERT #Test1 (ID, Name) VALUES (4, 'Four')
    INSERT #Test1 (ID, Name) VALUES (5, 'Five')

    CREATE TABLE #Test2
    (
        ID int NOT NULL PRIMARY KEY,
        Name varchar(50) NOT NULL
    )
    INSERT #Test2 (ID, Name) VALUES (1, 'One')
    INSERT #Test2 (ID, Name) VALUES (2, 'Two')
    INSERT #Test2 (ID, Name) VALUES (3, 'Three')
    INSERT #Test2 (ID, Name) VALUES (4, 'Four')
    INSERT #Test2 (ID, Name) VALUES (5, 'Five')

    SELECT *
    FROM #Test1 t1
    INNER JOIN #Test2 t2
        ON t2.Name = t1.Name

    SELECT *
    FROM #Test1 t1
    LEFT JOIN #Test2 t2
        ON t2.Name = t1.Name

    DROP TABLE #Test1
    DROP TABLE #Test2

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hash table.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek, meaning that this will literally be an order of magnitude faster with a large number of rows.
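The list-versus-hash-table comparison described above can be tried with a small Python sketch. The helper names `build` and `time_lookups` are made up for this illustration, and the timings depend heavily on the language and runtime, so treat the printed numbers as something to observe rather than a result this answer guarantees. In particular, the small-size case may not favour the list in Python the way it can inside a database engine.

```python
import timeit


def build(n):
    """Return the same n keys as a list and as a set (a hash table)."""
    keys = [f"key{i}" for i in range(n)]
    return keys, set(keys)


def time_lookups(container, probes, repeats=200):
    """Time `repeats` passes of membership tests against `container`."""
    return timeit.timeit(lambda: [p in container for p in probes], number=repeats)


if __name__ == "__main__":
    for n in (5, 50, 5000):
        as_list, as_set = build(n)
        # Probe for keys near the end of the list, which forces the list
        # lookup to scan almost every element (the O(N) behaviour).
        probes = as_list[-5:]
        print(n, time_lookups(as_list, probes), time_lookups(as_set, probes))
```

The list lookup time should grow roughly in proportion to `n`, while the set lookup time should stay roughly flat.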
So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.
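The earlier warning that a LEFT JOIN is not functionally equivalent to an INNER JOIN is easy to check on a tiny data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption made only so the example is self-contained), with hypothetical tables `test1` and `test2` loosely modelled on the temp tables above, except that `test2` is missing one row.

```python
import sqlite3


def join_rows(join_kind):
    """Run the same join as INNER or LEFT and return the result rows."""
    if join_kind not in ("INNER", "LEFT"):
        raise ValueError("join_kind must be 'INNER' or 'LEFT'")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test1 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE test2 (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO test1 VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
        INSERT INTO test2 VALUES (1, 'One'), (2, 'Two');
    """)
    sql = (
        "SELECT t1.name, t2.name FROM test1 t1 "
        f"{join_kind} JOIN test2 t2 ON t2.name = t1.name "
        "ORDER BY t1.id"
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(join_rows("INNER"))  # only the rows that have a match in test2
    print(join_rows("LEFT"))   # every test1 row, NULL-extended where unmatched
```

The LEFT JOIN returns an extra row for 'Three' with a NULL on the right-hand side, so swapping join types changes the result whenever unmatched rows exist.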
Anonymous at Stack Overflow
Other answers
There is one important scenario that can lead to an outer join being faster than an inner join that has not been discussed yet.

When using an outer join, the optimizer is always free to drop the outer joined table from the execution plan if none of the columns are selected from the outer table and the join key on that table is unique, so that the join cannot multiply rows. For example:

    SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY = B.KEY

Both Oracle (I believe I was using release 10) and SQL Server (I used 2008 R2) prune table B from the execution plan.

The same is not necessarily true for an inner join:

    SELECT A.* FROM A INNER JOIN B ON A.KEY = B.KEY

This may or may not require B in the execution plan, depending on what constraints exist. If A.KEY is a nullable foreign key referencing B.KEY, then the optimizer cannot drop B from the plan because it must confirm that a B row exists for every A row. If A.KEY is a mandatory foreign key referencing B.KEY, then the optimizer is free to drop B from the plan because the constraints guarantee the existence of the row. But just because the optimizer can drop the table from the plan doesn't mean it will. SQL Server 2008 R2 does NOT drop B from the plan. Oracle 10 DOES drop B from the plan. It is easy to see how the outer join will outperform the inner join on SQL Server in this case.

This is a trivial example, and not practical for a stand-alone query. Why join to a table if you don't need to? But this could be a very important design consideration when designing views. Frequently a "do-everything" view is built that joins everything a user might need related to a central table (especially if there are naive users doing ad-hoc queries who do not understand the relational model). The view may include all the relevant columns from many tables, but the end users might only access columns from a subset of the tables within the view. If the tables are joined with outer joins, then the optimizer can (and does) drop the unneeded tables from the plan.
It is critical to make sure that the view using outer joins gives the correct results. As Aaronaught has said, you cannot blindly substitute OUTER JOIN for INNER JOIN and expect the same results. But there are times when it can be useful for performance reasons when using views.

One last note: I haven't tested the impact on performance in light of the above, but in theory it seems you should be able to safely replace an INNER JOIN with an OUTER JOIN if you also add the condition <FOREIGN_KEY> IS NOT NULL to the WHERE clause.
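A minimal sketch of that last idea, checking only that the results match and saying nothing about performance. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server or Oracle, and the tables `a` and `b` with a nullable foreign key `a.b_id` are hypothetical. The equivalence relies on the foreign key being enforced, so that every non-NULL `a.b_id` has a matching row in `b`.

```python
import sqlite3


def compare_joins():
    """Return (inner_rows, filtered_left_rows, plain_left_rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK in SQLite
    conn.executescript("""
        CREATE TABLE b (b_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
        CREATE TABLE a (id INTEGER PRIMARY KEY,
                        b_id INTEGER REFERENCES b (b_id));  -- nullable FK
        INSERT INTO b VALUES (10, 'ten'), (20, 'twenty');
        INSERT INTO a VALUES (1, 10), (2, 20), (3, NULL);
    """)
    inner = conn.execute(
        "SELECT a.id, b.label FROM a INNER JOIN b ON b.b_id = a.b_id "
        "ORDER BY a.id"
    ).fetchall()
    filtered_left = conn.execute(
        "SELECT a.id, b.label FROM a LEFT JOIN b ON b.b_id = a.b_id "
        "WHERE a.b_id IS NOT NULL ORDER BY a.id"
    ).fetchall()
    plain_left = conn.execute(
        "SELECT a.id, b.label FROM a LEFT JOIN b ON b.b_id = a.b_id "
        "ORDER BY a.id"
    ).fetchall()
    conn.close()
    return inner, filtered_left, plain_left


if __name__ == "__main__":
    inner, filtered_left, plain_left = compare_joins()
    print(inner == filtered_left)  # the IS NOT NULL filter restores equivalence
    print(plain_left)              # without the filter, the NULL row leaks through
```

Without the `IS NOT NULL` filter, the row with a NULL foreign key appears in the LEFT JOIN result but not in the INNER JOIN result.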
dbenham
I know of several cases where a left join has been faster than an inner join. The underlying reason I can think of is this: suppose you have two tables and you join on a column that is indexed in both tables. The inner join produces the same result whether you loop over the entries in the index on table one and match against the index on table two, or do the reverse and loop over the entries in the index on table two and match against the index on table one.

The problem arises when the statistics are out of date. I assume that the query optimizer uses the statistics of the index to find the table with the fewest matching entries (based on your other criteria). Say you have two tables with 1 million rows each; in table one you have 10 rows matching and in table two you have 100000 rows matching. The best way would be to do an index scan on table one and match 10 times in table two. The reverse would be an index scan that loops over 100000 rows and tries to match 100000 times, of which only 10 succeed. So if the statistics aren't correct, the optimizer might choose the wrong table and index to loop over.

A left join, on the other hand, is not optimized in that way. It will always loop over the table and index that you choose.
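The cost difference between the two loop orders can be illustrated with a toy model in Python. This is a simplified sketch of an index nested-loop join, not how SQL Server actually executes queries. The function `nested_loop_join` and the key ranges are hypothetical, chosen to mirror the 10 versus 100000 figures above.

```python
def nested_loop_join(outer_keys, inner_index):
    """Probe inner_index once per outer row; return (matches, probe_count)."""
    probes = 0
    matches = []
    for key in outer_keys:
        probes += 1  # one index lookup per outer row
        if key in inner_index:
            matches.append(key)
    return matches, probes


# Rows surviving the other filter criteria on each table (hypothetical data).
table_one_keys = list(range(10))       # 10 candidate rows
table_two_keys = list(range(100_000))  # 100000 candidate rows

# Good order: drive from the small side, probe the large side.
good_matches, good_probes = nested_loop_join(table_one_keys, set(table_two_keys))

# Bad order: drive from the large side, probe the small side.
bad_matches, bad_probes = nested_loop_join(table_two_keys, set(table_one_keys))

if __name__ == "__main__":
    print(good_probes, bad_probes)
    print(sorted(good_matches) == sorted(bad_matches))
```

Both orders produce the same 10 matching keys, but the second performs 100000 probes instead of 10. That is the penalty for picking the wrong driving table.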
Kvasi
Your performance problems are more likely to be caused by the number of joins you are doing and by whether the columns you are joining on are indexed. In the worst case, you could easily be doing whole-table scans for each of the 9 joins.
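The effect of an index on a join column can be seen in a query plan. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for SQL Server's execution plans. The tables `a` and `b`, the index `idx_b_k`, and the helper `join_plan` are hypothetical. SQLite's automatic transient indexes are switched off so that the unindexed case is not masked, and the exact plan text varies between SQLite versions.

```python
import sqlite3


def join_plan(with_index):
    """Return the plan detail strings for a two-table join on column k."""
    conn = sqlite3.connect(":memory:")
    # Stop SQLite from building a temporary index on its own.
    conn.execute("PRAGMA automatic_index = OFF")
    conn.executescript("""
        CREATE TABLE a (k INTEGER, v TEXT);
        CREATE TABLE b (k INTEGER, v TEXT);
    """)
    conn.executemany("INSERT INTO a VALUES (?, ?)", [(i, "a") for i in range(100)])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(i, "b") for i in range(100)])
    if with_index:
        conn.execute("CREATE INDEX idx_b_k ON b (k)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM a INNER JOIN b ON b.k = a.k"
    ).fetchall()
    conn.close()
    return [row[-1] for row in plan]  # the last column is the plan description


if __name__ == "__main__":
    print(join_plan(with_index=False))  # expect full scans of both tables
    print(join_plan(with_index=True))   # expect an index search on b
```

Without the index, every row of one table has to be compared against a full scan of the other. With it, each outer row becomes a single index lookup.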
eddiegroves