SQL Server – Index Selection Fundamental
Index is one of most widely discussed topic among developers and DBAs. During query optimization Index is one of the important aspects. I came across multiple instances where Developer/ DBA creates Index but they are not sure whether that index is used by the optimizer or not. Also I received emails from developer asking How to design an Index? So Here I’ll try to explain the fundamental of Index Selection.
While designing an Index the basic idea is that the optimizer should use the index so that it will fetch the data using an optimal path with fewer disks I/O operation and less system resource usage.
Below are the important parameters for Optimizer to consider before it selects an Index to use
- Predicate from the WHERE clause
- Join Condition
- Other limiting option used in a query (Ex: Group by, Distinct etc.)
Using Index, optimizer can perform either Seek Operation or Scan Operation.
Seek Operation is used when fetching a single value or a range of value and in Scan operation it traverse through the Index pages either forward or backward.
All predicate used in where clause are not always used for Index selection. Predicate which can be converted into an Index operation are often called SARGgable or Search-ARGument-able and predicate that do not match the selected index are called non-sargable predicates which are used as filter predicate. SQL SERVER usually evaluated non-sargable predicates with in the seek/scan operator in the query tree.
Once you create an Index always checks with your query to determine whether it improves your query performance. If it is not improving your query performance then remove that index.
To understand SARGgable and Non SARGgable predicate, let me demonstrate it using a simple demo.
For this Demo I’m using AdventureWorks database which can be downloadable from here.
Step 1: create table Product_1 using Product table.
SELECT * INTO Product_1 FROM [Production].[Product]
Step 2: Create an UNIQUE Clustered Index on Product_1 for Column ProductID
CREATE UNIQUE CLUSTERED INDEX CIX_Product ON Product_1 (ProductID)
Step 3: Run the below query and check its actual execution plan. You can enable actual execution plan either by shortcut Ctrl+M or from the Query menu in SSMS.
SELECT * FROM Product_1 WHERE ProductID <100 and Makeflag=1
We can see below Optimizer uses the Index Seek operation to fetch the data. If you mouse over on Clustered Index Seek operator you will get the operator details. Here we can see that ProductID is used for Seek predicate as we have Index on that and column MakeFlag is used for filter predicate.
Actual Execution Plan:
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql
SQL Server – New Analytical functions in SQL Server 2012 – Lead and Lag
LAG:
Using this function we can fetch the nth row before (Lag) the Current row without using self-join.
It is useful when we need to compare the current row values with values in a previous row.
Syntax:
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
EXAMPLE:
Create an Employee table and insert some records to it.
CREATE TABLE Employee (EmpID int, HikeDate date, Salary money)
GO
INSERT INTO employee values (1, '2009-04-01 00:00:00.000',10000),
(1, '2010-04-01 00:00:00.000',17000),
(1, '2011-04-01 00:00:00.000',23000)
GO
SELECT EmpID, HikeDate, Salary
FROM Employee
ORDER BY HikeDate
Output:
Now my requirement is to show what the previous Salary before current hike was?
Solution Using Lag function:
LEAD:
Using this function we can fetch the nth row after (LEAD) the current row without using self-join.
It is useful when we need to compare the current row values with values in a following row.
Syntax:
LEAD (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
Example:
My requirement is to show what my following year salary is?
Solution using Lead function:
Note: Default offset value is 1. We can specify offset value by a column or a sub query, or other expression that evaluates to a positive integer. Negative value or analytical function can’t be used as offset value.
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql
SQL Server – Order of Column and Impact of equality operator on Index operation
Does order of Column in where clause matters?
Before you answer to this question I recommend all to read the blog post SQLAuthority.
Whenever I have any doubt, I always play around with it to clarify my doubt. While doing experiment with index operation, I learnt something and here I’m demonstrating what I learnt?
Let’s do some experiment,
In below experiment I’m using the AdevntureWorks database, which can be downloadable from Codeplex.
Demo – 1 (Order doesn’t matter)
Create Customer1 table from Sales.Customer table.
SELECT * INTO Customer1 FROM Sales.Customer
Now Create an Index on Column CustomerID,TerritoryID
CREATE INDEX IXNC_Customer1_1 ON Customer1 (CustomerID,TerritoryID)
Run below two queries, the only difference in the queries are the order of column in where clause.
–Query – 1
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE CustomerID =1 AND TerritoryID =1
–Query – 2
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE TerritoryID =1 and CustomerID = 1
Compare Execution plan for both the queries:
We can see in the above Demo (Fig-1) both queries uses Index Seek and query cost is same.
Now compare index seek operation for both of the queries, Seek predicates prefix CustomerID as the first column in both the cases and both CustomerID and TerritoryID used by SQL server as seek predicates.
Before proceed further, from Demo- 1 we can conclude that the order doesn’t matter in where clause.
Demo – 2 (It matters if other than first column of the index used in where clause)
Now let’s see what will happen if we use only one column in where clause?
Run below two queries. Here I’ve used CustomerID in first query and TerritoryID in second query respectively in where clause.
–Query – 3
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE CustomerID = 1
–Query – 4
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE TerritoryID = 1
In Demo – 2, We can see that when we use Column CustomerID in Where clause it uses Seek Predicate for CustomerID Column but in Query – 4, there is no seek predicate because it is using Index scan.
In Fig – 4, The actual query plan shows the actual number of row are more in Query – 4, so we may assume that in second query, the number of resultant rows are more so it uses Index Scan and It doesn’t matter whether we use first column or second column of the index in where clause. It does matter if we use only one column of the index.
To prove that, now let’s see if we have only one row for TerritoryID is equal to 1, which operation will be used by SQL Server.
Below query will delete all records from Customer1 table except one row where TerritoryID = 1.
DELETE FROM Customer1 WHERE CustomerID >1 AND TerritoryID = 1
Confirm it by running below query that we have only one record in Customer1 table where TerritoryID = 1
SELECT * FROM Customer1 WHERE TerritoryID = 1
Now let’s run the same query that is Query – 4
–Query – 5
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE TerritoryID = 1
In above example we can see that even though there is only one row for TerritoryID = 1, It is not using TerritoryID column as seek predicate.
It is using Index scan because TerritoryID is not the first column of the index.
Demo – 3: The second column will participate in seek predicate Only If the first column of the index compares with equality operator irrespective of the order in where clause.
Note: As I’ve deleted some rows from the table. I’m dropping the table and recreating the Table and Index.
DROP TABLE Customer1
SELECT * INTO Customer1 FROM Sales.Customer
CREATE INDEX IXNC_Customer1_1 ON Customer1 (CustomerID,TerritoryID)
GO
Let’s run below two queries, where in first query, CustomerID is compared with equality operator and second query, CustomerID is compared with non-equality operator.
–Query – 6
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE TerritoryID >1 AND CustomerID = 1
–Query – 7
SELECT CustomerID, TerritoryID
FROM Customer1
WHERE TerritoryID > 1 AND CustomerID > 1
In the above example, we can see that even though both the queries uses Index seek operation but in Query – 5, Both CustomerID and TerritoryID columns are participated in Seek Predicate and in Query – 6 only CustomerID is participated in Seek predicate and TerritoryID is used as filter predicate.
Conclusion:
- Order of column in where clause doesn’t matter.
- It matters if we use Columns except first column of the Index.
- If first column of the index compared with equality operator then only second column will participate in Seek predicate.
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql.
SQL SERVER – Trace Flags
To know about a topic we usually ask questions. Here I used the same method to explain about trace flag of SQL Server.
Below are some questions related to trace flags.
What is the use of Trace Flag?
Trace Flags are used to enable or disable certain functionality of SQL server temporarily. It is used to diagnose performance issues or to debug stored procedures.
Where will I get all the trace flags available for SQL Server?
For SQL Server 2012 : – http://msdn.microsoft.com/en-us/library/ms188396.aspx
For SQL Server 2008 R2 : – http://msdn.microsoft.com/en-us/library/ms188396(v=sql.105).aspx
For SQL Server 2008 : – http://msdn.microsoft.com/en-us/library/ms188396(v=sql.100).aspx
For SQL Server 2005 : – http://msdn.microsoft.com/en-us/library/ms188396(v=sql.90).aspx
What are the types of Trace Flags?
SQL Server provides below two types of trace flag.
Session Level: Active and visible only to a specific connection where it is set. A session level trace flag never effect any other session.
Global Level: Active and visible at the server level (for all connections). Some trace flag can only be set at global level. If a global trace flag enable at session level there will be no effect of the flag on the server.
Note: Some of the trace flags can be enabled either at session level or at global level.
How to enable or disable a Trace Flag?
Using method DBCC TRACEON, we can enable a trace flag.
Ex: DBCC TRACEON (1224)
To set a trace flag at Global level use argument -1.
Ex: DBCC TRACEON (1224, -1)
Global level trace flag can also be set using –T start-up option. Using –T start up option we can’t enable any session level trace flag.
How to check a trace flag is enabled or disabled?
Using method DBCC TRACESTATUS, we can check the status of a trace flag.
Ex: DBCC TRACESTATUS (-1) will display list of trace flags enabled at global level
DBCC TRACESTATUS () will display list of trace flags enabled for current session
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql.
SQL Server– Unique Constraint VS Unique Index
Both Unique constraint and Unique Index enforces uniqueness of the column. When we define an Unique constraint, SQL server creates an Unique non clustered index on the column where Unique constraint is defined.
Even though both of them can be used for same purpose but consider below points before you decide which one to use.
1 - From SQL server 2008 onwards we have Filtered index , Include columns which can’t be define on index which is created while defining Unique constraint but we can define it while defining Unique Index. We can have better control over the Unique index.
2 - Derived from point 1, sometime we need our column should be unique but it should allow multiple NULL values in that case by implementing Unique filtered Index we can achieve it. Also we can avoid Key lookup by adding included columns in an Unique Index but not in Unique constraint.
3 - Unique constraint always creates non clustered index. Unique Index can be defined either Clustered or non Clustered.
4 - You can’t drop only the index associated with the Unique constraint.
Leave your comment, If you have any points which we should consider before we decide which one to use between Unique Constraint and Unique Index.
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql
SQL Server – Join Algorithm
Today we’ll find how SQL Server works when we do join operation. SQL Server optimizer chooses one of the below physical operator to perform join operation.
- Hash Match
- Merge Join
- Nested Loop Join
As far as performance is concern let me tell you, we can’t say which one is suitable for all kind of scenarios. Each operator has its own advantages and disadvantages. Most of the time query optimizer picks the correct physical join operator to perform join operation. Here we’ll find out how these operators works and in which scenario which operator is suitable.
Two processes together makes hash join i.e. Build and Probe. Build process create the build input. Optimizer chooses the smaller input to create the build input. During build process , it creates a build hash table by computing a hash value for each row from the build input. Then from the probe input it creates the hash value for applicable rows using same hash function and looks in build hash table for match. In case of multiple join on the same join column, these operation are grouped into a hash team. There are 3 types of hash join i.e. in-memory hash join, grace hash join, and recursive hash join. To know more on Hash Join refer MSDN.
| Apart from join operation, this operator is used for other operations which includes intersection, union, difference, grouping , distinct. Hash join algorithm is suitable for large unsorted or non indexed input. |
Lets try the below example:
Create two tables, Table1 and Table1. Insert some records to both the tables.
Create Table Table1(Col1 int identity, Col2 varchar(20))
Go
Create Table Table2(Col1 int identity, Col2 varchar(20))
Go
Declare @i int =0
While(@i<10000)
Begin
Insert Table1 values( 'A'+cast(@i as varchar))
if(@i%2=0)
Insert Table2 values( 'A'+cast(@i as varchar))
Set @i+=1
End
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 Where T1.Col1 = 1
Most of the time optimizer chooses Merge Join operator during Join operation when both the inputs are large and sorted on the join column. If Optimizer uses Merge Join, it scans an index if exist on the join column else it applies a sort operator before Merge Join Operator. During Merge Join operation it reads row from each input and compare them. To Know more on Merge Join refer MSDN .
| Merge Join is possible when the inputs are sorted and not small . The Merge Join operator performs the inner join, left outer join, left semi join, left anti semi join, right outer join, right semi join, right anti semi join, and union logical operations. |
Lets try the below example:
Case 1: Now we know that Merge Join required both inputs should be sorted. Here we’ll apply Merge Join Hint to join Table1 and Table2 to see how optimizer applies sort operator if inputs are not sorted.
Select T1.Col2
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1
In case you are using Inner Join Operation with out any join hint and if you see your plan is look like above then in that case it is advisable to create index on join columns which will enhance the performance of join operation.
Case 2: Lets create an Unique Clustered Index on join columns and then will try to join the tables.
Create Unique Clustered Index IX_Table1_Col1 ON Table1(Col1)
Go
Create Unique Clustered Index IX_Table2_Col1 ON Table2(Col1)
Go
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1
Even though we are not using any hint in the above query, Optimizer will use Merge Join operator to perform join operation because both inputs are already sorted when we defined index on the join columns. See below the Actual Execution Plan for the above select query.
This Join is most suitable when outer input is small and inner input is large and the join column is indexed. In this kind of join operation it process each row from outer input and loop through all rows of inner input to search for matching row based on join column. To know more refer MSDN.
| Nested loops joins perform a search on the inner table for each row of the outer table, typically using an index. The Nested Loops operator performs the inner join, left outer join, left semi join, and left anti semi join logical operations |
Lets try the below example:
Here we reduce the input rows by applying where clause.
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 36
For the above query we can’t say which operator is efficient. Now lets reduce the number of input rows and will see how it perform.
Select T1.Col2
From Table1 T1
Inner Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15
Select T1.Col2
From Table1 T1
Inner Merge Join Table2 T2 ON T1.Col1 = T2.Col1 AND T1.Col1 between 1 AND 15
Below is the Actual Execution Plan for the above query.
In the above screenshot we can see when the number of input is small Nested Loop Join perform better.
All of the above join operators perform well based on the criteria. We can’t say which one is better.
Can you answer to the below question?
To demonstrate merge join Why I created Unique Clustered Index? What will happen If I create Clustered Index without Unique key word?
Reference : MSDN
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql
SQL Server – Error handling using Throw
MS-SQL Server 2012 introduced a new way of error handling using THROW statement. Prior to SQL Server 2012 we had to use RAISEERROR for error handling.
Throw statement raises an exception and transfers the execution to a CATCH block of a TRY…CATCH construct. Session will be ended if Throw is used without TRY…CATCH construct.
Syntax:
THROW [ { error_number | @local_variable },
{ message | @local_variable },
{ state | @local_variable }
] [ ; ]
To use error number in throw statement, it is not required to add the message in sys.messages table but the error number should be greater than equal to 50000. It can only raise severity of 16. Throw can be used without parameter in side CATCH construct.
You can find the difference between THROW and RAISERROR here.
The interesting point about throw is that you can re-throw the original error information from the catch block without passing any parameter.
Example:
Begin Try
print 'Start'
Begin Try
Declare @Name varchar(20) = '$XYZ'
If (LEFT(@Name,1)='$')
Begin
Throw 51234,'Name should not starts with $',1
End
End Try
Begin Catch
Throw;
End Catch
End Try
Begin Catch
Select ERROR_MESSAGE() As Error
End Catch
Output:
Reference: http://msdn.microsoft.com/en-us/library/ee677615
If you liked this post, do like on Facebook at https://www.facebook.com/s4sql
SQL Server – ROWCOUNT_BIG
ROWCOUNT_BIG returns the number of rows affected by the last statement executed.
Syntax:
ROWCOUNT_BIG ( )
This function works like @@ROWCOUNT. The only difference is in its return type. The return type of ROWCOUNT_BIG ( ) is BIGINT.
When you have large number of records where @@ROWCOUNT doesn’t work, you can use this function to find number of affected rows.
If you liked this post, do like on FaceBook at https://www.facebook.com/s4sql
SQL SERVER – OBJECTPROPERTY
I see many developers open tables in design mode to find whether a tables has an identity column or a delete trigger or a clustered index etc. defined or not.
Using function OBJECTPROPERTY , we can fetch various information about a table without looking at table in design mode.
It returns information about all schema- scoped object.
Syntax:
OBJECTPROPERTY ( id , property )
Example 1: Check whether a table has an Identity Column or not.
USE SQLcommitted
GO
IF OBJECTPROPERTY(OBJECT_ID('MyTable'), 'TableHasIdentity') = 1
Print 'Identity Column Exist'
Example 2: Check table is referenced by any foreign key or not.
USE SQLcommitted
Go
IF OBJECTPROPERTY(OBJECT_ID('MyTable'), 'TableHasForeignRef') = 1
Print 'Table is reference by a foreign key'
Example 3: Find all tables of a database where primary key is not defined .
EXEC sp_MSforeachtable
'IF OBJECTPROPERTY (OBJECT_ID(N''?''),''TableHasPrimaryKey'') = 0 print ''?'''
Refer OBJECTPROPERTY for more.
Guidelines to write International T-SQL?
Follow below guidelines to make your database or T-SQL code portable from one language to another.
- Use Unicode characters ( use nchar, nvarchar,nvarchar(max) in place of char, varchar, text data type)
- To compare date part (Month or day of week ) use numeric date part instead of the name string. Month and week days name string is different for different language setting. Use date part name only for displayed result sets because it is more meaningful as compared to numeric representation.
- For Date comparisons or for Input/update statement use constants which are interpreted same for all language settings.
Refer Guidelines to know more.
-
Archives
- March 2013 (1)
- January 2013 (3)
- December 2012 (2)
- November 2012 (1)
- August 2012 (4)
- July 2012 (2)
- June 2012 (2)
- May 2012 (1)
- April 2012 (5)
- March 2012 (1)
- February 2012 (1)
- January 2012 (5)
-
Categories
-
RSS
Entries RSS
Comments RSS

