Performance Tuning Series: Part 1

The more you know about tuning slow running queries, the more confident you become in your job as a DBA. No matter how many times we tune queries, most of us still feel we are missing something, and we keep looking for any tips and tricks we can implement to make our queries run faster! Am I right? I know this because I always get excited to read articles or attend webinars about performance tuning. There is always something new to learn every single time you read about this topic.

In this series of articles, come along with me and learn some performance tuning tips and tricks to make your queries run faster. This will be an open series, meaning I will add new parts as I learn something new in this area. If you wish to learn along with me as we go, please feel free to subscribe by entering your email address in the “WANT TO BE NOTIFIED?” section, and you will get updates by email as soon as I add new posts to this blog.

In this first part, let’s start with the basics.

When we submit a query to SQL Server, the optimizer creates a query plan for it behind the scenes, and the query processor does what that plan tells it to do. The SQL Server optimizer is a cost-based optimizer: the plan that costs the least wins, and that “good enough” plan is saved in the plan cache. A good plan is essential for good query performance. A query plan is based on estimates, so sometimes, due to bad estimates, the optimizer might choose a non-optimal plan and save it in the plan cache. In that case, we need to be able to identify and fix the issue.
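As a quick way to peek at what has landed in the plan cache, we can query the standard plan cache DMVs. This is just an illustrative snippet; run it on a dev server, since the cache can be very large in production.

```sql
-- List the most reused cached plans along with their query text.
SELECT TOP (10)
    cp.usecounts,          -- how many times this cached plan was reused
    cp.cacheobjtype,
    cp.objtype,
    st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```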

How do query plans work?

Let’s look at an example of plan estimates. We will be using the AdventureWorks2016 database. First, view the estimated execution plan for the query below. The estimates show the row, execution, and cost estimates.

select * from [Person].[Person]

When you hover your mouse over the clustered index scan, you see the estimates.

The estimated number of executions is 1, meaning the clustered index scan operator is expected to run once. The estimated number of rows per execution is 19972, which matches the 19972 rows actually coming out of the table, so the estimate is 100% accurate. The estimated row size is 8277 bytes. At the top, we see the costs (estimated I/O cost, operator cost, subtree cost, and CPU cost), which in very early versions of SQL Server represented the estimated number of seconds the query was expected to run. Internally, the optimizer uses cost as a metric to decide on the best plan it can come up with. This metric is used only at optimization time, not at run time, so outside the optimizer the cost numbers don’t carry much meaning for us.

Now, let’s run the same query with the actual execution plan turned on. The query plan is identical to the estimated execution plan we saw previously. The estimated plan is almost always the same as the actual execution plan, except when a recompilation occurs. When you observe the actual plan, the actual number of rows read shows as 19972 and the number of executions as 1. There is no actual cost, because cost is not an actual measurement; it is just a metric used by SQL Server at optimization time, not at run time.
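Besides the SSMS toolbar buttons, both plans can be requested from T-SQL. A small sketch:

```sql
-- Estimated plan only: the query is compiled but NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT * FROM [Person].[Person];
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the query executes, and the plan comes back
-- with runtime counters (actual rows, actual executions).
SET STATISTICS XML ON;
GO
SELECT * FROM [Person].[Person];
GO
SET STATISTICS XML OFF;
GO
```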

Let’s take another example, selecting the top 100 rows from the table. First, check the estimated execution plan:

select TOP(100) * from [Person].[Person]

When a clustered index scan is shown, we usually believe SQL Server will scan and read the entire table, right? But that might not always be the case. Here, we only wanted the top 100 rows. Look at the clustered index scan information by hovering the mouse over the clustered index scan operator. The actual number of rows shows as 100. Now, why is that?

Let’s see how a query plan is read in the first place. There are two directions to read an execution plan: data flows from right to left, and logic flows from left to right.

Each operator in the query plan is also known as an iterator. In the above query plan, the select iterator requests a row from the Top iterator, which in turn requests a row from the clustered index scan iterator. A row is passed from the clustered index scan to the Top iterator, which then passes it over to the select operator. For the next row, the select iterator again requests a row from the Top iterator, and this process repeats back and forth for each row until the Top iterator reaches 100. When you hover your mouse over the Top iterator, you see the Top expression; internally this is called a row goal. Once the row goal on the Top iterator is hit and the 100 rows have been given to the select iterator, the clustered index scan iterator shuts down, then the Top iterator shuts down, and the query completes.
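One way to see the row goal’s effect for yourself is the DISABLE_OPTIMIZER_ROWGOAL hint (available from SQL Server 2016 SP1). With the hint, the optimizer costs the plan as if every row were needed, which can change the plan shape; comparing the two plans makes the row goal visible.

```sql
-- TOP normally sets a row goal of 100 during optimization.
-- This hint tells the optimizer to ignore the row goal and cost
-- the plan as if all rows were required.
SELECT TOP (100) *
FROM [Person].[Person]
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```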

Let’s look at another example (credits: Adam Machanic):

select TOP(1000) *
from [Production].[Product] as p
INNER LOOP JOIN [Production].[TransactionHistory] as pch
    on p.ProductID = pch.ProductID
    and p.StandardCost < 10

When you hover your mouse over the rows passing out of the Top operator, we get 357 rows, and from the clustered index scan we see that the number of rows read was 223, not the 504 total rows in the clustered index. This is because SQL Server doesn’t need to keep scanning once the Top operator’s row goal is met, and rows are also filtered out along the way by the predicate. For each row coming out of the scan, the nested loops iterator performs a seek on the TransactionHistory table.

When you hover your mouse over the index seek on the TransactionHistory table, you will see that the number of executions for this index seek is 223 and the number of rows read is 40058.

Reading 40058 rows for the index seek on the TransactionHistory table is a lot of index operations. These are some of the key things we have to consider when we tune queries: how many times the indexes were accessed and how many rows were read. When your execution plans are slow, it is always good to compare the estimated versus the actual number of rows. If they differ by some magnitude, that might be a statistics problem, or the optimizer may be unable to estimate well in certain situations, such as with multi-statement table valued functions. The optimizer has no visibility into these functions and cannot predict the number of rows coming out of them. SQL Server 2012 estimates only 1 row for a multi-statement TVF, and SQL Server 2014 estimates 100 rows, but starting with SQL Server 2017, Intelligent Query Processing can use interleaved execution to get estimates close to the actuals (and SQL Server 2019 adds table variable deferred compilation for table variables).
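To see the estimation problem for yourself, you can create a small multi-statement TVF against AdventureWorks (the function name here is hypothetical) and compare estimated versus actual rows on its operator in the actual plan:

```sql
-- A multi-statement TVF: the optimizer cannot see inside the BEGIN/END
-- body, so older versions guess a fixed row count (1 row in 2012,
-- 100 rows in 2014/2016) regardless of the real data.
CREATE FUNCTION dbo.GetPersonsByType (@Type nchar(2))
RETURNS @Result TABLE (BusinessEntityID int, LastName nvarchar(50))
AS
BEGIN
    INSERT INTO @Result
    SELECT BusinessEntityID, LastName
    FROM Person.Person
    WHERE PersonType = @Type;
    RETURN;
END;
GO

-- Run with the actual plan on and compare estimated vs actual rows
-- on the table valued function operator:
SELECT * FROM dbo.GetPersonsByType(N'EM');
```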

If it is a statistics problem, make sure the statistics are up to date. Thick lines in the execution plan represent a large number of rows passing to the next iterator; thin lines mean fewer rows passing on. The faster we can filter the data, and the sooner we can push predicates down into the plan, the faster the query’s execution plan will be. One example of this is a view where the predicate cannot be pushed down; by replacing the view with an inline table valued function, the predicate can easily be pushed down by the optimizer without us having to do any extra work.
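As a sketch of the inline approach (the function name is hypothetical): an inline TVF is a single RETURN SELECT with no BEGIN/END, so the optimizer expands it into the calling query like a parameterized view and can push the outer predicates down into the underlying table.

```sql
-- Inline TVF: expanded into the calling query at optimization time.
CREATE FUNCTION dbo.PersonsByType (@Type nchar(2))
RETURNS TABLE
AS
RETURN
    SELECT BusinessEntityID, FirstName, LastName
    FROM Person.Person
    WHERE PersonType = @Type;
GO

-- The outer filter can be pushed down into the scan/seek on Person.Person:
SELECT *
FROM dbo.PersonsByType(N'EM')
WHERE LastName LIKE N'S%';
```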

We need to check for the thicker lines in the execution plans and watch for the resource-consuming operators: lookups, sorts, spools, hashes, and serial nested loops.

Lookups: There are two types of lookups: RID lookups (on heaps) and key lookups (on clustered indexes). Let’s say you run a simple query searching on a column that has a nonclustered index (NCI). The query uses the NCI, but what if you are also selecting other columns in the select statement? In that case, SQL Server will use the NCI, but to get the other columns it has to access either the heap (a RID lookup) or the clustered index (a key lookup). If the query returns a lot of rows, this can lead to a lot of lookups and cause performance problems.
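For example, Person.Person has a nonclustered index on the name columns, so selecting an extra column forces a key lookup per matching row. A covering index (the index name below is hypothetical) removes the lookup by carrying the extra column in the index itself:

```sql
-- The seek uses the nonclustered index on LastName, but ModifiedDate
-- is not in that index, so each row triggers a key lookup into the
-- clustered index.
SELECT LastName, FirstName, ModifiedDate
FROM Person.Person
WHERE LastName = N'Smith';

-- A covering index includes the extra column, eliminating the lookups:
CREATE NONCLUSTERED INDEX IX_Person_LastName_Covering
ON Person.Person (LastName, FirstName)
INCLUDE (ModifiedDate);
```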

Spools: These are used as a cache in the query processor. The query processor has to cache the data because either we have inadequate indexes or we did not tell the optimizer about the uniqueness of our data, such as the primary keys and unique keys on our tables. Spools are created as hidden tables in tempdb, and spool iterators store the data at run time. Seeing spools is often a sign that you do not have the appropriate indexes on your tables.
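As a sketch of the uniqueness point (the table is hypothetical): declaring a key we already know our data satisfies gives the optimizer the information it needs, which in some plans lets it drop a spool it would otherwise add to cache or deduplicate rows.

```sql
-- A staging table loaded without any declared keys: the optimizer
-- must assume duplicates are possible.
CREATE TABLE dbo.ProductStage
(
    ProductID int NOT NULL,
    Name      nvarchar(50) NOT NULL
);

-- Declare the uniqueness we already know about the data, so the
-- optimizer no longer needs to protect against duplicate rows:
ALTER TABLE dbo.ProductStage
    ADD CONSTRAINT PK_ProductStage PRIMARY KEY (ProductID);
```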

Sorts: Many queries we run need sorts. Sorts can be really painful if we are sorting large amounts of input data. ORDER BY, merge joins, and stream aggregation all need sorted data, and sorting in SQL Server is very expensive. An alternative is to sort the data in the application instead.
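Another way to avoid the Sort operator is to order by the leading key of an existing index, so the rows are read already sorted. A quick comparison on Person.Person (which ships with a nonclustered index on the name columns):

```sql
-- No index supports this ordering, so the plan gets a Sort operator:
SELECT BusinessEntityID, LastName
FROM Person.Person
ORDER BY ModifiedDate;

-- This ordering follows the keys of the existing index
-- IX_Person_LastName_FirstName_MiddleName, so no Sort is needed:
SELECT BusinessEntityID, LastName
FROM Person.Person
ORDER BY LastName, FirstName, MiddleName;
```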

Hashes: There are two hash operators: hash match and hash match aggregate. Hash match is used for joins, and hash match aggregate is used for aggregation (stream aggregate for ordered input, hash aggregate using a hash table). Hash match builds a hash table in memory, and if there is not enough memory for the hash table, it has to spill to disk. Hash joins are less desirable in OLTP systems than in data warehouses, where they are fine for star schemas. For OLTP systems, we should check whether missing indexes are the reason we are seeing hash joins.
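To compare the two join shapes side by side, you can force them with join hints (for investigation on a dev server only; hints override the optimizer’s choice):

```sql
-- Force a Hash Match join to inspect its memory grant and any spills:
SELECT p.Name, th.Quantity
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th
    ON p.ProductID = th.ProductID
OPTION (HASH JOIN);

-- Force nested loops: with the existing index
-- IX_TransactionHistory_ProductID, the inner side becomes a seek.
SELECT p.Name, th.Quantity
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th
    ON p.ProductID = th.ProductID
OPTION (LOOP JOIN);
```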

Nested loops: These can be serial. Nested loops are the basic join methodology and work best for smaller inputs of data. Inappropriate nested loops can cause serious slowness in queries; this can happen when your statistics are out of date.
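When you suspect stale statistics are behind a bad loop-join choice, you can check when they were last updated and refresh them:

```sql
-- See when each statistics object on the table was last updated:
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'Production.TransactionHistory');

-- Refresh them; FULLSCAN is thorough but heavier than sampled updates:
UPDATE STATISTICS Production.TransactionHistory WITH FULLSCAN;
```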


In this post, we learned some basics of viewing execution plans, both estimated and actual. A clustered index scan doesn’t always mean something is bad, and it doesn’t always mean the entire table is being scanned. We learned about the resource-consuming iterators and when we see them in our execution plans, along with a few tips to improve the performance of our slow running queries.

In the coming post, we will see more examples on tuning the queries.

Thanks for reading!
