SQLcommitted

Committed with SQL

SQL Server – Join Algorithm

Today we’ll find how SQL Server works when we do join operation. SQL Server optimizer chooses one of the below physical operator to perform join operation.

 

image

  • Hash Match
  • Merge Join
  • Nested Loop Join

As far as performance is concern let me tell you, we can’t say which one is suitable for all kind of scenarios. Each operator has its own advantages and disadvantages. Most of the time query optimizer picks the correct physical join operator to perform join operation. Here we’ll find out how these operators works and in which scenario which operator is suitable.

 

Hash MatchHash Match

Two processes together makes hash join i.e. Build and Probe. Build process create the build input. Optimizer chooses the smaller input to create the build input. During build process , it creates a build hash table by computing a hash value for each row from the build input. Then from the probe input it creates the hash value for applicable rows using same hash function and looks in build hash table for match.   In case of multiple join on the same join column, these operation are grouped into a hash team. There are 3 types of hash join i.e. in-memory hash join, grace hash join, and recursive hash join. To know more on Hash Join refer MSDN.

Apart from join operation, this operator is used for other operations which includes  intersection, union, difference, grouping , distinct.
Hash join algorithm is suitable for large unsorted or non indexed  input.

Lets try the below example:

Create two tables,  Table1 and Table1. Insert some records to both the tables.

Create Table Table1(Col1 int identity, Col2 varchar(20))
Go
Create Table Table2(Col1 int identity, Col2 varchar(20))
Go

Declare @i int =0

While(@i<10000)
Begin
Insert Table1 values( 'A'+cast(@i as varchar))
if(@i%2=0)
Insert Table2 values( 'A'+cast(@i as varchar))
Set @i+=1
End


 
Press Ctrl+M to show the Actual Execution Plan and execute the below query
 
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1

 
You can see below the Actual Execution Plan for the above query where Optimizer uses Hash Match Operator to join Table1 and Table2.
Here Optimizer chooses Hash Match operator because of high volume of unsorted input rows.

image

 
If you move your mouse pointer to Hash Match operator , you can see the tool tip as below.
See in the below image, Table2 is used as Build input because Table2 has less number of records compared to Table1 and  Table1 is used as probe input.
 
Hash Match Tool Tips
 
If we significantly reduce the number of input rows in the above query by applying filter ,  Even though there is no Index defined, Optimizer will use Nested Loop Join operator instead of Hash Match Operator.
 
Lets try the below example,
 
Select  T1.Col2 
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 Where T1.Col1 = 1
 
Below is the Actual Execution Plan for the above query, where optimizer chooses Nested Loop Join.
 
image
 
 
 

Merge Join Merge Join Operator

Most of the time optimizer chooses Merge Join operator during Join operation when both the inputs are large and sorted on the join column. If Optimizer uses Merge Join, it scans an index if exist on the join column else it applies a sort operator before Merge Join Operator. During Merge Join operation it reads row from each input and compare them. To Know more on Merge Join refer MSDN .

Merge Join is possible when the inputs are sorted and not small . The Merge Join operator performs the inner join, left outer join, left semi join, left anti semi join, right outer join, right semi join, right anti semi join, and union logical operations.

 

Lets try the below example:

Case 1: Now we know that Merge Join required both inputs should be sorted. Here we’ll apply Merge Join Hint to join Table1 and Table2 to see how optimizer applies sort operator if inputs are not sorted.

Select  T1.Col2 
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1

image 

In case you are using Inner Join Operation with out any join hint and if you see your plan is look like above then in that case it is advisable to create index on join columns which will enhance the performance of join operation.

Case 2: Lets create an Unique Clustered Index on join columns and then will try to join the tables.

Create   Unique Clustered Index IX_Table1_Col1 ON Table1(Col1)
Go
Create Unique Clustered Index IX_Table2_Col1 ON Table2(Col1)
Go

Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1

Even though we are not using any hint in the above query, Optimizer will use Merge Join operator to perform join operation because both inputs are already sorted when we defined  index on the join columns. See below the Actual Execution Plan for the above select query.

 
image
 
 
 

 Nested Loop Join  Nested Loop Join

This Join is most suitable when outer input is small  and inner input is large and  the join column is indexed. In this kind of join operation it process each row from outer input and loop through all rows of inner input to search for matching row based on join column. To know more refer MSDN.

Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index. The Nested Loops operator performs the inner join, left outer join, left semi join, and left anti semi join logical operations

Lets try the below example:

Here we reduce the input rows by applying where clause.


Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36

 
Below is the Actual Execution Plan for the above query.
 
image

 
Now lets compare how it will perform if we apply merge join for the above query. Here I use a range where cost of both the query is same. But if you  compare the Nested Loop operator cost and Merge Join operator cost, Nested Loop operator cost is very much smaller then Merge Join operator cost.
 
image
 

For the above query we can’t say which operator is efficient. Now lets reduce the number of input rows and will see how it perform.

Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15

Select T1.Col2
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15

Below is the Actual Execution Plan for the above query.

image

In the above screenshot we can see when the number of input is small Nested Loop Join perform better.

 

All of the above join operators perform well based on the criteria. We can’t say which one is better.

Can you answer to the below question?

To demonstrate merge join Why I created Unique Clustered Index? What will happen If I create Clustered Index without Unique key word?

Reference : MSDN

If you liked this post, do like on Facebook at https://www.facebook.com/s4sql

About these ads

August 20, 2012 - Posted by | Performance Tips and Tricks, Transact-SQL | , , , , , , ,

No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 60 other followers

%d bloggers like this: