Wednesday, February 29, 2012

How to Optimize a Stored Procedure using the Execution Plan

  1. Find the most costly statements
  2. Determine why the statement is costly
  3. Get an accurate baseline for the procedure
  4. Optimize

Find the Most Costly Statements

There are two main ways to determine costly queries. One is to execute the procedure with an Actual Execution Plan included. The other is to take a before-and-after time difference for each statement. There are also other ways (including IO usage and CPU usage), but we won't cover those here.
I like to start with the execution plan method. Though it is not the most precise method, it is the easiest, so let's start there. Here is how to find the slowest query in a procedure using the execution plan.
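As a minimal sketch (the procedure name and parameter below are placeholders, not from this article): include the actual plan, run the procedure, and SSMS will render one plan per statement, each headed by a "Query cost (relative to the batch)" percentage. The statement with the highest percentage is the one to focus on.
-- In SSMS: Query -> Include Actual Execution Plan (Ctrl+M), then run the procedure
-- dbo.usp_GetCustomerOrders and @CustomerID are hypothetical placeholders
EXEC dbo.usp_GetCustomerOrders @CustomerID = 42
-- On the Execution Plan tab, each statement shows
-- "Query cost (relative to the batch): x%"; start with the highest one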

Determine why the statement is costly

This can be the most difficult task. With the help of the execution plan, though, we can deduce the most common issues.
Now that we’ve found the statement, we need to drill down even further to find out what is causing the statement to be slow. This is done by looking at the cost for each operation in the execution plan. Under each operation, you will see a “cost: x%”. You want to find the operation with the highest percentage and focus on that.
When you do find it, many times it will be one of the following operations:
Interpreting the Execution Plan
  • Symptom: table scan, index scan, clustered index scan. Likely cause: missing or improper indexes, or cardinality below roughly 5%.
  • Symptom: thick lines (arrows) from one operation to another. Likely cause: bad joins or a missing filter operation.
  • Symptom: RID lookups, key lookups. Likely cause: data is not at the leaf level; the INCLUDE clause for indexes can help.
  • Symptom: parallelism. Likely cause: bad join, improper order of operations, or MAXDOP / cost threshold for parallelism set too low on the server.
Scans
Scan operations are not bad when the cardinality of the column being searched is less than approximately 6%, or when the number of records being scanned is below around 500. Otherwise, if you have a scan with a high cost associated with it, you will want to find the appropriate indexes to fix the issue. For example:
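As a rough sketch (the table and column names are hypothetical), a costly scan on a selective column can usually be turned into a seek by indexing the column used in the WHERE clause:
-- Hypothetical example: dbo.Orders is being scanned when filtering on OrderDate
-- An index on the filtered column lets the optimizer seek instead of scan
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON dbo.Orders (OrderDate)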
Large Work Tables / Results
This is the issue depicted above by thick arrows. When this happens, a lot of the time it is a bad order of operations. That means the optimizer did not choose to limit the overall result set by joining the smallest tables first. Instead, it tries to join and filter from the largest tables, and only then joins the smaller tables. While this may not be the only scenario in which this happens, it does happen often.
To fix this, consider breaking the query up and dumping the intermediate results into a temp table prior to joining on the rest of the tables, as sketched below. Make sure to index your temp table!
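A minimal sketch of that approach, using hypothetical tables dbo.BigSales (large) and dbo.Region (small):
-- Stage the filtered rows from the large table first
SELECT s.SaleID, s.RegionID, s.Amount
INTO #FilteredSales
FROM dbo.BigSales s
WHERE s.SaleDate >= '2012-01-01'

-- Index the temp table on the join column
CREATE CLUSTERED INDEX IX_FilteredSales_RegionID ON #FilteredSales (RegionID)

-- Then join the much smaller intermediate result to the remaining tables
SELECT r.RegionName, SUM(f.Amount) AS TotalAmount
FROM #FilteredSales f
JOIN dbo.Region r
ON r.RegionID = f.RegionID
GROUP BY r.RegionName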
RID Lookups / Key Lookups
These are also not always bad. In fact, you cannot get around this issue all the time. Why? Because you can only have one clustered index, and you don't want to include every column in the table in all your nonclustered indexes. However, if this issue shows up with a high cost, you will want to consider changing your clustered index, or adding the columns being fetched by the Key Lookup to the nonclustered index using the INCLUDE clause.
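As a quick sketch with hypothetical names (the Presidents tutorial later in this article walks through the same idea step by step):
-- Hypothetical: queries seek on dbo.Orders(CustomerID) but then key-lookup the Total column
-- Including Total puts it at the leaf level of the index and removes the lookup
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (Total)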
Parallelism
Parallelism can have a significant impact on queries and your server. What it means is that the amount of data being processed by your query is very large, so in order to speed things up, SQL Server decides to spin up multiple worker threads to handle the operation in parallel. When this happens, look for your page life expectancy to dip low and your disk utilization to go high. Consider employing the same technique as above of breaking up your query. Also look at the order of operations and rearrange the query. If that doesn't help, add the query hint OPTION (MAXDOP 1), like this:
SELECT *
FROM dbo.mytable mt
JOIN dbo.yada y
ON mt.ID = y.ID
OPTION (MAXDOP 1)

Get an Accurate Baseline

Next, before you make any changes, get an accurate baseline so that when you do make changes, you can be sure they actually sped things up.
When you execute a procedure, the time it takes to execute is in the lower right-hand corner as shown below.
[Figure: run duration shown in the lower right-hand corner of the query window]
From the above, we can see the problem with only using the client-side timer. It does not show milliseconds, and milliseconds do matter. The other problem, which we cannot see above, is that the client-side timer also includes the round-trip time to the server and back to your computer. While this may seem like a more accurate depiction, it makes optimizing harder because you get less consistent results. In my opinion, it's better to get the server-side timings.
To get the server-side timings, we turn on time statistics using the SET STATISTICS TIME command. We enable it like this:
SET STATISTICS TIME ON
Once executed, it stays on for the entire session (or window) until you close the window or turn it off (by replacing ON with OFF). So you only need to execute this one time, then you can remove the statement.
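With time statistics on, the server-side timings appear on the Messages tab after each statement executes. The output looks roughly like this (the numbers below are purely illustrative):
-- SQL Server parse and compile time:
--    CPU time = 4 ms, elapsed time = 4 ms.
-- SQL Server Execution Times:
--    CPU time = 16 ms,  elapsed time = 134 ms.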
The next step needed to get consistent results is to clear the caches. One cache holds the compiled version of the SQL Procedure (the execution plan), the other cache holds the data pages that are retrieved from disk.
To clear the compiled execution plan, we use:
DBCC FREEPROCCACHE
To clear the data pages, we use:
DBCC DROPCLEANBUFFERS
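Putting it together, a baseline run might look like the sketch below (the procedure name and parameter are placeholders). A word of caution: DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS flush caches for the whole instance, so only do this on a development server, and run CHECKPOINT first so dirty pages are written to disk and can be dropped.
-- Development server only: these DBCC commands flush instance-wide caches
SET STATISTICS TIME ON
CHECKPOINT                -- write dirty pages to disk so DROPCLEANBUFFERS can drop them
DBCC DROPCLEANBUFFERS     -- clear cached data pages
DBCC FREEPROCCACHE        -- clear cached execution plans
EXEC dbo.usp_GetCustomerOrders @CustomerID = 42   -- hypothetical procedure to baseline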

Optimize

Now optimize and apply the changes discovered above! I know, easier said than done sometimes. Even though we did not cover all situations for query slowness, hopefully this will provide a good launching pad. We will delve further into fixes in a later article.

Find Best Clustered Index

The following query will compare the nonclustered indexes vs the clustered index and determine which index would qualify as the best clustered index based upon the DMV statistics.
If a clustered index does not exist on the table, it will also suggest one of the existing nonclustered indexes as the clustered index.
DECLARE @NonClusteredSeekPct float
DECLARE @ClusteredLookupFromNCPct float
-- Define percentage of usage the non clustered should
-- receive over the clustered index
SET @NonClusteredSeekPct = 1.50 -- 150%
-- Define the percentage of all lookups on the clustered index
-- should be executed by this non clustered index
SET @ClusteredLookupFromNCPct = .75 -- 75%
SELECT
TableName = object_name(idx.object_id)
,NonUsefulClusteredIndex = idx.NAME
,ShouldBeClustered = nc.NonClusteredName
,Clustered_User_Seeks = c.user_seeks
,NonClustered_User_Seeks = nc.user_seeks
,Clustered_User_Lookups = c.user_lookups
,DatabaseName = db_name(c.database_id)
FROM sys.indexes idx
LEFT JOIN sys.dm_db_index_usage_stats c
ON idx.object_id = c.object_id
AND idx.index_id = c.index_id
--AND c.database_id = @DBID
JOIN (
SELECT
idx.object_id
,nonclusteredname = idx.NAME
,ius.user_seeks
FROM sys.indexes idx
JOIN sys.dm_db_index_usage_stats ius
ON idx.object_id = ius.object_id
AND idx.index_id = ius.index_id
WHERE idx.type_desc = 'nonclustered'
AND ius.user_seeks =
(
-- sys.dm_db_index_usage_stats has no type_desc column;
-- index_id > 1 restricts the MAX to nonclustered indexes
SELECT MAX(user_seeks)
FROM sys.dm_db_index_usage_stats
WHERE object_id = ius.object_id
AND index_id > 1
)
GROUP BY
idx.object_id
,idx.NAME
,ius.user_seeks
) nc
ON nc.object_id = idx.object_id
WHERE
idx.type_desc IN ('clustered','heap')
-- non clustered user seeks outweigh clustered by 150%
AND nc.user_seeks > (ISNULL(c.user_seeks, 0) * @NonClusteredSeekPct)
-- nc index usage is the primary cause of clustered lookups (75%)
AND nc.user_seeks >= (ISNULL(c.user_lookups, 0) * @ClusteredLookupFromNCPct)
ORDER BY nc.user_seeks DESC

SQL Server Indexes Tutorial

One of the important parts of SQL Server development and optimization is the creation of indexes. In order to create proper indexing strategies it is necessary to understand how indexes work. This tutorial will guide you step by step to fully understand indexes.
There are two main types of indexes: Clustered and NonClustered. There can be only one clustered index on a table, and the reason is simple:
  • A Clustered index is the data of table sorted according to the columns you choose.
  • A NonClustered index is just like the index of a book. It contains data sorted so that it’s easy to find, then once found, it points back to the actual page that contains the data. (In other words, it points back to the clustered index)
Suppose we are reading a book about biographical information of all the U.S. Presidents, and the book itself orders the biographies starting from the first president to the latest president. This ordering of the pages would represent the clustered index.
Now suppose you asked two different people to find Franklin D. Roosevelt's biography. Person-A is a historian and Person-B is unschooled. Person-A would quickly be able to find the president's biography, while Person-B would have to scan through each page in order to find it. Even if Person-B used the book's index (akin to the non-clustered index), he would still have to turn to the page after he found the page number.
So it is always faster to find information using the clustered index, because the data is already at the "leaf level" of the index.
With this information, how do we determine what the clustered index should be? Well, it depends on the population of people searching for the biographies. If it is mostly unschooled people, then it would be more efficient to sort the book alphabetically rather than in the historical order of the presidents.
Now let's say that 75% of the population are historians and the other 25% are unschooled. Let's assume the historians need a lot of different information from the presidents' biographies, while all the unschooled need is each president's age at the time he took office. In this scenario, it makes more sense to keep the ordering of the book (the clustered index) based on the order of the presidents, and simply add the age of each president to the back index of the book (the nonclustered index). That way the unschooled people do not have to look into the front of the book (the clustered index) for the president's age. They can simply find it in the back of the book with one single lookup. The president's age, stored in the rear index, would be considered to be at the "leaf" level. This satisfies both requirements and is efficient for both the historians and the unschooled group.
First, let's create our Presidents table. Download and run the script: PresidentsTable
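If you do not have the download handy, a minimal stand-in with the columns used in the queries below would look something like this (the schema and data in the actual download may differ). Note that we deliberately create it without any index, so the table starts out as a heap:
-- Minimal stand-in for the downloadable Presidents table
-- (columns inferred from the queries in this tutorial; the real script may differ)
CREATE TABLE dbo.Presidents
(
PresidentNumber int NOT NULL
,President varchar(100) NOT NULL
,YearsInOffice int NULL
,YearFirstInaugurated int NULL
)
GO
-- One sample row so the example queries return data
INSERT INTO dbo.Presidents (PresidentNumber, President, YearsInOffice, YearFirstInaugurated)
VALUES (32, 'Franklin Roosevelt', 12, 1933)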
After creating and populating the table, let's turn on the execution plan (in SQL Server Management Studio, click in the query window and select Query -> Include Actual Execution Plan).
Now execute the following query:
SELECT
PresidentNumber
,President
,YearsInOffice
,YearFirstInaugurated
FROM Presidents
WHERE PresidentNumber = 32
Now let’s view the execution plan:
[Figure: execution plan showing a Table Scan]
Without a clustered index, our book is in no particular order. To find president 32, we need to scan every page.
Now let’s add a clustered index so we can organize our book according to PresidentNumber:
CREATE UNIQUE CLUSTERED INDEX IDX_C_Presidents_PresidentNumber ON Presidents(PresidentNumber)
And let’s run our query again:
SELECT
PresidentNumber
,President
,YearsInOffice
,YearFirstInaugurated
FROM Presidents
WHERE PresidentNumber = 32
[Figure: execution plan showing a Clustered Index Seek]
Our execution plan now shows a "clustered index seek", meaning we did not have to look through every page of our book. We jumped straight to president 32's entry and found his information there.
In summary, when we look up information based on the clustered index (the way the table is physically sorted), we naturally find all the information we are looking for already there (President, YearsInOffice, YearFirstInaugurated).
Now, let’s create a non clustered index and look up the YearFirstInaugurated by president’s name:
CREATE NONCLUSTERED INDEX IDX_NC_Presidents_President ON Presidents(President)
Now let’s run our query to find the YearFirstInaugurated:
-- Force our query to use the index
-- (table is so small SQL Server bypasses it)
SELECT
YearFirstInaugurated
FROM Presidents WITH(INDEX(IDX_NC_Presidents_President))
WHERE President = 'Franklin Roosevelt'
If we look at our execution plan now, we will see that we first looked the president's name up in our index, and then, after finding the page where the president's biography was located, we went to that page to grab the YearFirstInaugurated. This is denoted by the "Key Lookup" (also known as a "Bookmark Lookup").
[Figure: execution plan showing a Key Lookup]
This is a more expensive operation because our data is not at the “leaf-level” (or inline with the index we just searched), rather it is in the clustered index instead.
So how do we fix this? In SQL Server 2005, a new feature was introduced called "included columns". This allows us to include data at the leaf level of an index. So rather than looking up YearFirstInaugurated in the clustered index, we can find it in the nonclustered index. Let's drop our index and include YearFirstInaugurated in our nonclustered index:
DROP INDEX Presidents.IDX_NC_Presidents_President
GO
CREATE NONCLUSTERED INDEX IDX_NC_Presidents_President ON Presidents(President) INCLUDE(YearFirstInaugurated)
And run our query one more time:
-- Force our query to use the index
-- (table is so small SQL Server bypasses it)
SELECT
YearFirstInaugurated
FROM Presidents WITH(INDEX(IDX_NC_Presidents_President))
WHERE President = 'Franklin Roosevelt'
Now we only have an index seek, because as soon as we looked the president up in the index, we immediately found the YearFirstInaugurated as well:
[Figure: execution plan showing an Index Seek]