Performance Tuning Series: Part 3

In Part 2 of the series, we learned about Key Lookups and Spools. We saw, with examples, when they can be useful and when they need to be avoided. In this post, let's learn about Sorts, Hash Matches and Serial Nested Loops.

This is a continuation of the Performance Tuning series Part 1 and Part 2. As I mentioned in the first part of this series, there is no end to this series. As I learn more performance tuning tips and tricks, I will keep adding to it. If you would love to learn along with me and level up your query tuning skills, subscribe to dbanuggets here and give me that boost to work more 🙂

Sorts:

Sorts are important and necessary for many queries, since most queries need to sort their results, and sorted input is often exactly what we need. Let's look at an example query against the BigTransactionHistory table that returns the top 500 products by their highest sale cost, considering only sales of 1500 or more. We take the rows from bigtransactionhistory where ActualCost is greater than or equal to 1500, assign a row number partitioned by ProductID (so the numbering restarts for each ProductID) and ordered by ActualCost descending, keep only the top row per product, and finally return 500 rows from that result ordered by ActualCost descending. Run the query below, which does exactly that, and check the execution plan.

SELECT TOP (500) WITH TIES
    ProductID,
    ActualCost
FROM
(
    SELECT
        ProductID,
        ActualCost,
        ROW_NUMBER() OVER
        (
            PARTITION BY ProductID
            ORDER BY ActualCost DESC
        ) AS r
    FROM bigtransactionhistory
    WHERE ActualCost >= 1500
) AS x
WHERE x.r = 1
ORDER BY x.ActualCost DESC;
GO

The query runs in about 8 seconds. When you read the plan from a data-centric view, SQL Server first does a nonclustered index scan, sorts the data so the row numbers can be computed, calculates the row numbers with the window aggregate and Filter operators, sorts the rows again, gathers the parallel streams, and finally takes the top 500 rows. The first Sort, right after the nonclustered index scan, is sorting a lot of rows; hover over the arrow coming out of the index scan and you can see exactly how many rows are being sorted.
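If you would rather capture the timings and I/O counts as text instead of hovering over the plan, one simple option (a generic measurement sketch, not specific to this demo) is to wrap the query with SET STATISTICS IO, TIME ON and read the numbers from the Messages tab:

-- Turn on per-query I/O and timing output (appears in the Messages tab)
SET STATISTICS IO, TIME ON;
GO

-- ... run the TOP (500) WITH TIES query from above here ...

-- Turn the output back off when you are done measuring
SET STATISTICS IO, TIME OFF;
GO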

What if we rewrite the same query so that we get nested loops instead, since they are cheaper here? To get a nested loop in the execution plan, we can use CROSS APPLY, remove the PARTITION BY, and let the query work on one product at a time.

SELECT TOP (500) WITH TIES
    p.ProductID,
    x.ActualCost
FROM bigproduct AS p
CROSS APPLY
(
    SELECT
        bt.ActualCost,
        ROW_NUMBER() OVER
        (
            ORDER BY bt.ActualCost DESC
        ) AS r
    FROM bigtransactionhistory AS bt
    WHERE bt.ProductID = p.ProductID
      AND bt.ActualCost >= 1500
) AS x
WHERE x.r = 1
ORDER BY x.ActualCost DESC;
GO

Check the execution plan. The query now runs much faster, in about 3 seconds. It does a clustered index scan on the bigproduct table and enters the loop once per product: for each product it grabs just that product's rows, sorts just those rows, and computes the row numbers for just that product (the Segment and Sequence Project operators). The Sort in this plan is still an expensive operator, but many small sorts are far less expensive than one sort over a huge number of rows.
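If you are free to add an index (and assuming a similar one does not already exist on your copy of the table), a nonclustered index keyed on ProductID with ActualCost descending would hand each loop iteration its rows already in the right order, so the per-product Sort could disappear entirely. A minimal sketch of such an index:

-- Hypothetical supporting index: rows for each product come back
-- already ordered by ActualCost DESC, so the per-product Sort is no longer needed
CREATE NONCLUSTERED INDEX IX_bigtransactionhistory_ProductID_ActualCost
ON bigtransactionhistory (ProductID, ActualCost DESC);
GO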

Hash Match:

The Hash Match operator is appropriate for data warehouse workloads but not for OLTP workloads. It works well when joining a medium-sized set of rows against a large set of rows, but not a large set against another large set. Let's take an example. In my company, we regularly receive data from different vendors and import it into multiple tables depending on the type of data. Before we import any data, we need to check whether it already exists in the database. That's the scenario we will use here. Run the query below and check the execution plan.

SELECT TOP (5000000)
    [SalesOrderID],
    [OrderDate],
    [Status],
    [TotalDue]
INTO #sohe
FROM [Sales].[SalesOrderHeaderEnlarged]
WHERE [SalesOrderID] BETWEEN 43659 AND 1181640;

SELECT COUNT(*)
FROM #sohe AS b
WHERE EXISTS
(
    SELECT *
    FROM [Sales].[SalesOrderHeaderEnlarged] AS sohe
    WHERE sohe.[OrderDate] = b.[OrderDate]
      AND sohe.[SalesOrderID] = b.[SalesOrderID]
      AND sohe.[SalesOrderID] BETWEEN 43659 AND 1081640
)
OPTION (MAXDOP 1);
GO

When you look at the execution plan, you see a clustered index seek on the SalesOrderHeaderEnlarged table; the seek covers the smaller range of SalesOrderIDs. There is no seek on the temp table #sohe because the temp table has no index, so SQL Server is forced into a Hash Match. This is exactly the kind of situation where SQL Server falls back to a Hash Match because there is no suitable index to seek on.

As you can see, the execution plan also includes an index recommendation claiming the query will improve by about 13%. Let's go ahead and create the recommended index on the temp table, then run the same query again and see how the execution plan changes.

CREATE CLUSTERED INDEX ix_SalesOrderID_OrderDate
ON #sohe ([SalesOrderID], [OrderDate]);
GO

SELECT COUNT(*)
FROM #sohe AS b
WHERE EXISTS
(
    SELECT *
    FROM [Sales].[SalesOrderHeaderEnlarged] AS sohe
    WHERE sohe.[OrderDate] = b.[OrderDate]
      AND sohe.[SalesOrderID] = b.[SalesOrderID]
      AND sohe.[SalesOrderID] BETWEEN 43659 AND 1081640
)
OPTION (MAXDOP 1);
GO

This time we get a Merge Join, because the two inputs are nearly identical in size and both can be read in sorted order, which is exactly what a Merge Join likes. Here the Merge Join is better than the Hash Match: creating the index turned the Hash Match into a Merge Join, which is what we needed to improve performance.
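If you want to see the difference for yourself, one quick and purely illustrative check is to run the same query again with a join hint that forces the Hash Match back in, and compare the plans and timings; this is only for comparison, not something to leave in real code:

-- Force a hash join for comparison against the merge join the optimizer now picks
SELECT COUNT(*)
FROM #sohe AS b
WHERE EXISTS
(
    SELECT *
    FROM [Sales].[SalesOrderHeaderEnlarged] AS sohe
    WHERE sohe.[OrderDate] = b.[OrderDate]
      AND sohe.[SalesOrderID] = b.[SalesOrderID]
      AND sohe.[SalesOrderID] BETWEEN 43659 AND 1081640
)
OPTION (HASH JOIN, MAXDOP 1);
GO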

Serial Nested Loops:

Sometimes we end up with never-ending queries because of serial nested loops. Let's see this with an example. We will use the AdventureWorks2016 database; since my database is in 2019 compatibility mode and the server is SQL Server 2019, let me first turn Auto Update Statistics OFF at the database level. Then let's see how many rows have to be modified on the Sales.SalesOrderHeaderEnlarged table to hit the threshold that invalidates the statistics on the CustomerID column.

ALTER DATABASE [adventureworks2016Big]
SET AUTO_UPDATE_STATISTICS OFF;
GO

UPDATE [Sales].[SalesOrderHeaderEnlarged]
SET CustomerID = 11091
WHERE CustomerID < 16575;
GO

SELECT
    [sch].[name] + '.' + [so].[name] AS [TableName],
    [ss].[name] AS [Statistic],
    [sp].[modification_counter] AS [RowModifications],
    SQRT([sp].[rows] * 1000) AS [NewThreshold]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so]
    ON [ss].[object_id] = [so].[object_id]
JOIN [sys].[schemas] [sch]
    ON [so].[schema_id] = [sch].[schema_id]
LEFT OUTER JOIN [sys].[indexes] AS [si]
    ON [so].[object_id] = [si].[object_id]
    AND [ss].[name] = [si].[name]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id], [ss].[stats_id]) sp
WHERE [so].[object_id] = OBJECT_ID(N'Sales.SalesOrderHeaderEnlarged')
ORDER BY [ss].[stats_id];
GO

In the results above we are interested in the IX_SalesOrderHeaderEnlarged_CustomerID statistic, which is built on the CustomerID column. As the NewThreshold column shows, about 49,221 rows need to change to invalidate this particular statistic, and the RowModifications column shows we modified around 890k rows. The statistic is now outdated, but because we turned Auto Update Statistics OFF, SQL Server will not refresh it the next time a query uses it. With no update to the statistic, the optimizer keeps building plans from the same outdated information for any query that filters on CustomerID. Let's go ahead and run a query that uses the outdated CustomerID statistic and check the execution plan.
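Before we run that, here is a quick sanity check of the NewThreshold value. Assuming the table holds roughly 2.42 million rows (which is what a threshold of 49,221 implies), the dynamic threshold formula works out as follows:

-- Dynamic statistics invalidation threshold: SQRT(1000 * row count)
-- With ~2,422,707 rows, SQRT(1000 * 2422707) comes out to roughly 49,221 modified rows
SELECT SQRT(1000.0 * 2422707) AS [ApproxThreshold];
GO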

SELECT
	oh.[CustomerID],
	oh.[OrderDate],
	oh.[ShipMethodID],
	od.[OrderQty],
	od.[ProductID]
FROM [Sales].[SalesOrderHeaderEnlarged] oh
JOIN 
[Sales].[SalesOrderDetailEnlarged] od
ON oh.SalesOrderID = od.salesOrderID
where [CustomerID]=11091

In the execution plan above, the highlighted red rectangle shows the estimated number of rows coming out of the SalesOrderHeaderEnlarged table, which is only 2,156 rows; but look at the actual rows right above the estimate: it is about 890k rows, far above the estimate. Why is that? Because Auto Update Statistics is set to OFF, so the statistics were not refreshed when we ran the query against the outdated statistic. Now let's turn Auto Update Statistics back ON, run the same query one more time, and see the magic. Check the execution plan this time.

USE [master]
GO
ALTER DATABASE [adventureworks2016Big] SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT
GO

USE [adventureworks2016Big];
GO
SELECT
	oh.[CustomerID],
	oh.[OrderDate],
	oh.[ShipMethodID],
	od.[OrderQty],
	od.[ProductID]
FROM [Sales].[SalesOrderHeaderEnlarged] oh
JOIN 
[Sales].[SalesOrderDetailEnlarged] od
ON oh.SalesOrderID = od.salesOrderID
where [CustomerID]=11091

When you look at the execution plan now, the plan itself has changed, and the estimates line up very closely with the actuals for the clustered index scan on the SalesOrderHeaderEnlarged table.

Point to remember: when Auto Update Statistics is turned ON at the database level and a statistic is outdated, the next time any query uses that outdated statistic SQL Server first updates the statistic behind the scenes, invalidates the cached execution plan, and compiles a new plan using the freshly updated statistics. Only after that does the query actually execute and return results to the user. In our case SQL Server decided a clustered index scan was the better choice, because around 890k rows come out of the table.
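If you want to confirm that the statistic really was refreshed behind the scenes, a small check (a sketch reusing the same DMF we queried earlier) is to look at last_updated and modification_counter for that statistic:

-- Check when the CustomerID statistic was last updated and how many
-- modifications have accumulated since (should be near zero right after the refresh)
SELECT
    [ss].[name] AS [Statistic],
    [sp].[last_updated] AS [LastUpdated],
    [sp].[rows] AS [Rows],
    [sp].[modification_counter] AS [RowModifications]
FROM [sys].[stats] AS [ss]
OUTER APPLY [sys].[dm_db_stats_properties]([ss].[object_id], [ss].[stats_id]) AS [sp]
WHERE [ss].[object_id] = OBJECT_ID(N'Sales.SalesOrderHeaderEnlarged')
  AND [ss].[name] = N'IX_SalesOrderHeaderEnlarged_CustomerID';
GO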

It is always better to keep statistics up to date at the database level and to maintain them with maintenance plans as well. In older versions of SQL Server, before SQL Server 2016, the optimizer used a different threshold to invalidate statistics: 20% of the table's rows plus 500 rows had to change before Auto Update Statistics would kick in the next time a query used the outdated statistic. If your table is huge, it takes a very long time for 20% of the rows to change and hit that threshold. For that reason, on those older versions you can enable trace flag 2371 so the optimizer uses the dynamic threshold of SQRT(1000 * number of rows), which is far smaller than the old threshold for large tables.
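To put numbers on that difference, here is a rough comparison for a hypothetical 10-million-row table, together with the trace flag you would enable on an older instance (the figures are illustrative, not from the demo database):

-- Old threshold: 20% of rows + 500 -> 0.20 * 10,000,000 + 500 = 2,000,500 changed rows
-- Dynamic threshold: SQRT(1000 * rows) -> SQRT(1000 * 10,000,000) = 100,000 changed rows
SELECT
    0.20 * 10000000 + 500 AS [OldThreshold],
    SQRT(1000.0 * 10000000) AS [DynamicThreshold];

-- On versions before SQL Server 2016, enable the dynamic threshold instance-wide
DBCC TRACEON (2371, -1);
GO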

Summary:

In Part 3 of the Performance Tuning series, we learned about Sorts, Hash Matches and Serial Nested Loops. We saw when they are appropriate and when they are not, and how to avoid them and improve query performance when they are inappropriate. As we add more posts to this series, we will keep getting better at tuning our queries.

Thanks for reading!
