Performance Tuning Series: Part 1

The better you are at tuning slow running queries, the more confident you feel in your job as a DBA. No matter how many queries we tune, most of us still feel we are missing something, and we keep looking for tips and tricks we can apply to make our queries run faster! Am I right? I know this because I always get excited to read articles or attend webinars about “Performance tuning”. There is something new to learn every single time you revisit this topic.

In this series of articles, come along with me and learn some performance tuning tips and tricks to make your queries run faster. This will be an open-ended series, meaning I will keep adding new parts as I learn something new in this area. If you wish to learn along with me as we go, please feel free to subscribe by entering your email address in the “WANT TO BE NOTIFIED?” section, and you will get updates in your inbox as soon as I add new posts to this blog.

In this first part, let's start with the basics.

When we submit a query to SQL Server, the optimizer creates a query plan for it behind the scenes, and the query processor then does whatever that plan tells it to do. The SQL Server optimizer is a cost-based optimizer: the candidate plan with the lowest cost wins, and that “good enough” plan is saved in the plan cache. A good plan is essential for a query to perform well. A query plan is built from estimates, and sometimes, because of bad estimates, the optimizer chooses a non-optimal plan and caches it. In that case, we need to be able to identify and fix the issue.

How do query plans work?

Let's start with the estimated execution plan, using the AdventureWorks2016 database. Run the query below and look at the estimated plan: the estimates show the expected number of rows, the number of executions and the cost.

select * from [Person].[Person]

When you hover your mouse on the clustered index scan, you see the estimates.

The estimated number of executions is 1, meaning the clustered index scan operator is expected to execute once. The estimated number of rows per execution shows 19972, and we actually see 19972 rows coming out of the table, so the estimate is 100% accurate. The estimated row size is 8277 bytes. At the top we see the costs (estimated I/O cost, operator cost, subtree cost, CPU cost), which in very early versions of SQL Server represented the estimated number of seconds the query was expected to run. Internally, the cost is a metric the optimizer uses to decide which plan is best. It is only used at optimization time, not at run time, so beyond that the cost does not have much meaning for us externally.

Now, let's run the same query with the actual execution plan turned on. The query plan is identical to the estimated execution plan we saw previously. Almost all the time the estimated plan is the same as the actual execution plan, except when a recompilation occurs. In the actual plan, the actual number of rows read shows as 19972 and the number of executions as 1. There is no actual cost, because cost is not an actual measurement; it is just a metric used by SQL Server at optimization time, not at run time.

Let's take another example, returning the top 100 rows from the table. First, check the estimated execution plan.

select TOP(100) * from [Person].[Person]

When a clustered index scan shows up, we usually assume SQL Server will scan and read the entire table, right? But that is not always the case. Here, we only wanted the top 100 rows. Look at the clustered index scan information by hovering the mouse over the clustered index scan operator: the actual number of rows shows as 100. Now, why is that?

Let's first see how a query plan is read. There are two directions in which to read an execution plan: data flows from right to left, and logic flows from left to right.

Each operator in the query plan is also known as an iterator. In the plan above, the select iterator requests a row from the Top iterator, which in turn requests a row from the clustered index scan iterator. A row is passed from the clustered index scan to the Top iterator, which then passes it to the select operator. For the next row, the select iterator again asks the Top iterator for a row, and this back-and-forth repeats for each row until the Top iterator reaches 100 rows. When you hover your mouse over the Top iterator, you see the Top expression; internally this is called a row goal. Once the row goal on the Top iterator is hit and the 100 rows have been handed to the select iterator, the clustered index scan iterator shuts down, then the Top iterator shuts down, and the query completes.

Let’s see with an example (Credits: Adam Machanic)


select TOP(1000) 
p.ProductID, 
pch.TransactionID
from [Production].[Product] as p
INNER LOOP JOIN [Production].TransactionHistory as pch
with (FORCESEEK) ON
p.[ProductID]=pch.[ProductID]
WHERE
pch.ActualCost>50
and p.StandardCost<10

When you hover your mouse over the rows flowing out of the Top operator, we get 357 rows, and on the clustered index scan the number of rows read is 223, not the full 504 rows in the table. This is because SQL Server does not need to keep scanning once the Top operator's row goal is satisfied for rows matching the predicates. For each qualifying outer row, the nested loops join performs a seek on the TransactionHistory table.

When you hover your mouse over the index seek on the TransactionHistory table, you will see the number of executions for this index seek is 223 and the number of rows read is 40058.

That is 40058 rows read across 223 executions of the index seek, which is a lot of index operations. These are some of the key things to consider when we tune queries: how many times the indexes were accessed and how many rows were read. When your execution plans are slow, it is always good to compare the estimated versus the actual number of rows. If they differ by some order of magnitude, that might be a statistics problem, or the optimizer may simply be unable to estimate certain constructs, such as multi-statement table valued functions. The optimizer has no insight into these objects and cannot predict the number of rows coming out of them. In SQL Server 2012 a multi-statement TVF is estimated at only 1 row, in SQL Server 2014 it is estimated at 100 rows, but in the latest version, SQL Server 2019, the estimate can get close to the actuals thanks to the Intelligent Query Processing features such as table variable deferred compilation, covered later in this series.

If it is a statistics problem, make sure the statistics are up to date. Thick lines in the execution plan represent a large number of rows being passed to the next iterator; a thin line means fewer rows are being passed along. The sooner we can filter the data, that is, the further down into the plan any predicates can be pushed, the faster the query's execution plan will be. One example of this is a view where the predicate cannot be pushed down: by replacing the view with an inline table valued function, the predicate can easily be pushed down by the optimizer without any extra work. A rough sketch of that idea follows below.
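As a rough illustration of the view-versus-inline-TVF idea (the object and column names below are my own, not from the original demo): in more complex views the optimizer may not be able to push a filter below the aggregation, whereas a parameterized inline TVF guarantees the filter is applied inside the function's query.

-- Aggregating view: the caller can only filter on the view's output
CREATE OR ALTER VIEW dbo.vSalesByProduct
AS
SELECT sod.ProductID, SUM(sod.LineTotal) AS TotalSales
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.ProductID;
GO
-- Inline table valued function: the @ProductID predicate sits inside the query,
-- so it is applied before the aggregation
CREATE OR ALTER FUNCTION dbo.fnSalesByProduct (@ProductID INT)
RETURNS TABLE
AS
RETURN
    SELECT sod.ProductID, SUM(sod.LineTotal) AS TotalSales
    FROM Sales.SalesOrderDetail AS sod
    WHERE sod.ProductID = @ProductID
    GROUP BY sod.ProductID;
GO
SELECT * FROM dbo.vSalesByProduct WHERE ProductID = 776;
SELECT * FROM dbo.fnSalesByProduct(776);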

We need to watch for the thicker lines in execution plans and for the resource-consuming operators: lookups, sorts, spools, hashes and serial nested loops.

Lookups: There are two types of lookups: RID lookups (on heaps) and key lookups (on clustered indexes). Say you run a simple query that filters on a column which has a nonclustered index (NCI) on it. The query can use the NCI, but what if the select list also returns other columns? SQL Server will still use the NCI, but to fetch the other columns it has to go back to either the heap (an RID lookup) or the clustered index (a key lookup). If we are retrieving a lot of rows, this can mean a lot of lookups and cause performance problems. A quick sketch of the usual fix follows below.
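As a minimal sketch of that fix (the dbo.Orders table, its columns and the index name are hypothetical, purely for illustration): covering the extra columns with INCLUDE removes the lookup.

-- Hypothetical table: clustered PK on OrderID, nonclustered index on CustomerID only.
-- This query would do a seek on the NCI plus a key lookup for OrderDate and TotalDue:
SELECT CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;

-- Covering the extra columns removes the key lookup:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);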

Spools: Spools act as a cache inside the query processor. The query processor has to cache the data either because we have inadequate indexes or because we did not tell the optimizer about the uniqueness of our data, for example with primary keys and unique constraints on the tables. Spools are created as hidden worktables in tempdb, and spool iterators store data there at run time. Seeing them is often a sign that the tables are missing appropriate indexes.

Sorts: Many queries need sorting, and sorts can be really painful when the input is large. ORDER BY, merge joins and stream aggregation all need sorted data, and sorting inside SQL Server is very expensive. An alternative is to sort the data in the application instead.

Hashes: Hashes come in two flavors: hash match (used for joins) and hash match aggregate (used for aggregation; stream aggregation is used for ordered input, hash aggregation uses a hash table). A hash match builds a hash table in memory, and if there is not enough memory to hold it, it has to spill to disk. Hash joins are fine for data warehouses and star schemas, but they are usually not what you want in OLTP systems; on OLTP systems we should check whether a missing index is the reason we are getting hash joins.

Nested Loops: These can run serially. Nested loops is the most basic join method and works best for smaller inputs. An inappropriate nested loops join can make a query seriously slow, and this typically happens when statistics are out of date.

Summary:

In this post, we covered some basics of reading execution plans, both estimated and actual. A clustered index scan is not always bad, and it does not always mean the entire table is scanned. We also looked at the resource-consuming iterators, when they show up in our execution plans, and a few tips to improve the performance of slow running queries.

In the coming post, we will see more examples on tuning the queries.

Thanks for reading!

SQL Server 2019 Intelligent Query Processing: Approximate QP with Approx_Count_Distinct

In the previous post, we learned about the Scalar UDF Inlining feature. In this post, let's focus on Approximate QP with APPROX_COUNT_DISTINCT, introduced as part of the SQL Server 2019 Intelligent Query Processing features. This feature returns an approximate count of distinct values, just like COUNT(DISTINCT) returns the exact number of distinct records, but it takes less CPU and memory to process the query.

First, let's see how the regular COUNT(DISTINCT column) works and check the resulting count. For the example below, we are using the AdventureWorksDW2016 database.

SET STATISTICS TIME ON
select count(distinct LastName) from [dbo].[DimEmployee]

Now, run the same query using Approx_Count_Distinct and see the distinct count

SET STATISTICS TIME ON
select APPROX_COUNT_DISTINCT(lastname) from [dbo].[DimEmployee]

We see the distinct count is lower by one compared with the query using COUNT(DISTINCT column). Compare the execution plans for both queries by checking the properties of the select operator. Looking at the compile CPU, memory and time for both plans, the query using APPROX_COUNT_DISTINCT has lower values than the other plan, so the query using the new feature performs better than the one using COUNT(DISTINCT column).

Let's check another example, this time using the AdventureWorks2016 database.

SET STATISTICS TIME ON
select count(distinct SalesOrderID) from [Sales].[SalesOrderDetailEnlarged]

This time, run the query using APPROX_COUNT_DISTINCT and check the resulting count; a sketch of that query is below.
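The post does not show this second query, but following the same pattern as the earlier examples it would presumably be:

SET STATISTICS TIME ON
select APPROX_COUNT_DISTINCT(SalesOrderID) from [Sales].[SalesOrderDetailEnlarged]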

When you compare the results, there is a little variation: the query using APPROX_COUNT_DISTINCT produced a count 27997 higher than the query using COUNT(DISTINCT column). The feature produces an approximate distinct count while using fewer resources. Let's compare the resource utilization by looking at both actual execution plans side by side. The bottom execution plan, generated by COUNT(DISTINCT column), took 1.084s for the index scan, while the top plan, generated with APPROX_COUNT_DISTINCT, took only 0.374s, which is much less. Also check the select operator properties to see the difference in compile CPU, memory and compile time for both actual execution plans; in this case the APPROX_COUNT_DISTINCT query showed higher compile values than the query using COUNT(DISTINCT column).

The approximate value is not always higher. There are cases where the query using APPROX_COUNT_DISTINCT returns a lower distinct count than the query using COUNT(DISTINCT column). Let's see this with a different example, running COUNT(DISTINCT column) first.

SET STATISTICS TIME ON
select count(distinct CreditCardID) from [Sales].[SalesOrderHeaderEnlarged]

Run the same query using the Approx_Count_Distinct

SET STATISTICS TIME ON
select APPROX_COUNT_DISTINCT(CreditCardID) from [Sales].[SalesOrderHeaderEnlarged]

This time we got a lower distinct count with APPROX_COUNT_DISTINCT than with the previously run COUNT(DISTINCT column) query. The CreditCardID column is nullable and contains NULL values. When you compare the select operator properties of the actual execution plans, the compile CPU, memory and time are lower this time for the query using APPROX_COUNT_DISTINCT than for the plan using COUNT(DISTINCT column).

Summary:

By using the Approximate QP feature with APPROX_COUNT_DISTINCT, we can get the approximate number of distinct non-null values in SQL Server with better performance than COUNT(DISTINCT column). In this blog post, we have seen scenarios where the new feature returned both lower and higher distinct counts. Of the three examples in this post, two of the sample queries got a performance boost when run with APPROX_COUNT_DISTINCT, with lower compile CPU, memory and time.

Thanks for Reading!

SQL Server 2019 Intelligent Query Processing: Scalar UDF Inlining

In the previous post, we learned about the Batch Mode on Row Store feature. In this post, let's focus on Scalar UDF Inlining, introduced as part of the SQL Server 2019 Intelligent Query Processing features. This feature is really useful for improving the performance of queries that invoke scalar user-defined functions (UDFs), without having to modify any code.


A scalar user-defined function takes one or more parameters and returns a single value. For example, if you have a calculation that is used in many of your queries, instead of repeating the code in each query you can place it inside a function and call that function from every query that needs the same calculation. These functions basically simplify the code. Scalar functions are fine when they are applied to a small set of rows, but they can hurt performance with large amounts of data.

First, let's run a sample query on the AdventureWorks2016 database without and then with a UDF in it, and look at the I/O information.

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT
Product.Name AS PName,
Product.ProductNumber AS PNumber,
Product.StandardCost AS StandardPrice,
History.EndDate as Enddate
FROM  Production.Product Product
INNER JOIN Production.ProductListPriceHistory History
ON Product.ProductID = History.ProductID

Look at the number of logical reads for both tables, ProductListPriceHistory and Product, and at the total elapsed time.

Let's run the same code again, this time adding a scalar UDF to it, and look at the information again.

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT
Product.Name AS PName,
Product.ProductNumber AS PNumber,
Product.StandardCost AS StandardPrice,
History.EndDate as Enddate,
dbo.ufnGetProductListPrice(Product.ProductID,History.StartDate) as Listprice
FROM  Production.Product Product
INNER JOIN Production.ProductListPriceHistory History
ON Product.ProductID = History.ProductID

When you look at the logical reads for the ProductListPriceHistory and Product tables and at the total elapsed time, the elapsed time is higher this time (76 ms).

The logical reads are the same; the difference is in the elapsed time. It is a small difference here for my 395 rows, but it can become significant when you are pulling a large number of rows. We cannot get accurate I/O statistics for scalar UDFs, and a scalar UDF's execution cost cannot be estimated accurately either. When you check the execution plan of the query we just ran with the UDF, the compute scalar operator shows a cost of 0%, but when you check the select operator properties, the time spent in the UDF (UdfElapsedTime) is almost 70%. Because the optimizer does not estimate this cost properly, queries using scalar UDFs can run into performance issues.

Queries with scalar UDFs always run serially, never in parallel. Even if the scalar UDF does not touch any data at all (for example, a UDF that just returns GETDATE()), it still hurts performance because the whole query runs serially. In the execution plan above, the compute scalar operator invokes the UDF 395 times, because the UDF runs once per row. The UDF's own query plan is not shown inside the actual plan, since it could generate a different plan for each execution, which in this case could be 395 times.

Scalar UDF Inlining

The Scalar UDF Inlining feature converts the UDF into a simple subquery that is placed inline into our query, much like an APPLY. All of this happens automatically at query time, so we do not have to change any code. To enable the new feature on SQL Server 2019, we set the database compatibility level to 150, run the same query with the UDF, and check the elapsed time again. It is now 60 ms.

ALTER DATABASE AdventureWorks2016 SET COMPATIBILITY_LEVEL = 150;

SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT
Product.Name AS PName,
Product.ProductNumber AS PNumber,
Product.StandardCost AS StandardPrice,
History.EndDate as Enddate,
dbo.ufnGetProductListPrice(Product.ProductID,History.StartDate) as Listprice
FROM  Production.Product Product
INNER JOIN Production.ProductListPriceHistory History
ON Product.ProductID = History.ProductID

Now when you check the actual execution plan, you can no longer see the UDF being invoked, because of the inlining feature. You will also not find UdfCpuTime or UdfElapsedTime in the select operator properties, as the UDF is not invoked as a separate call anymore.

We can enable or disable scalar UDF inlining at the database level, the function level and also at the query level.

Database level:

ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON/OFF

Query Level using a HINT:

OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));

Function Level:

CREATE OR ALTER FUNCTION UDF(@Parameter1 [int])
RETURNS [int] 
WITH INLINE = OFF -- or INLINE = ON
AS 
BEGIN
--Code here; the body must end by returning an [int] value
RETURN @Parameter1
END

For this new Scalar UDF Inlining feature, there are a few requirements that a scalar UDF needs to meet. You can check whether a function is inlineable by querying sys.sql_modules and looking at the is_inlineable column, as sketched below.
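A quick way to list the scalar UDFs in a database and whether the optimizer considers them inlineable (this query is my own illustration, not from the original post):

SELECT OBJECT_NAME(m.object_id) AS udf_name,
       m.is_inlineable
FROM sys.sql_modules AS m
JOIN sys.objects AS o ON o.object_id = m.object_id
WHERE o.type = 'FN'; -- scalar user-defined functions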

Conclusion:

The Scalar UDF Inlining feature was introduced to improve the performance of queries that invoke scalar UDFs where the UDF execution is the performance bottleneck. The feature transforms the UDF into scalar expressions or scalar subqueries that are placed inline into the calling query in place of the UDF operator, and those expressions and subqueries are then optimized by the optimizer. As a result, we no longer see a UDF operator in the execution plan, and we get a performance boost.

Thank you for Reading!

SQL Server 2019 Intelligent Query Processing: Batch Mode on Row Store

In the previous post, we learned about Table Variable Deferred Compilation. In this blog, let's focus on the Batch Mode on Rowstore feature introduced in SQL Server 2019. This feature improves the performance of analytical queries by using batch mode query processing. It is a CPU optimization that helps analytical queries run faster, and we do not have to specify anything to use it as long as the database compatibility level is 150.

This feature targets CPU-bound analytic workloads without requiring columnstore indexes. We can also use query hints to explicitly request or prevent batch mode.

Columnstore indexes are used for data warehouse and analytical workloads, improving I/O and optimizing CPU through batch mode execution. Columnstore indexes are not a good fit for OLTP systems, because they add overhead for delete and update operations. The Batch Mode on Rowstore feature helps solve this problem for analytical queries running on OLTP tables.

Row mode execution processes rows one at a time, whereas batch mode processes rows in batches of up to roughly 900 rows. This is a huge benefit because it uses the CPU much more efficiently.

First, let's see how the query behaves with the Batch Mode on Rowstore option disabled, by turning the feature off:

USE AdventureWorks2016Big
GO
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

Then run the below query and check the actual execution plan

SELECT ModifiedDate,CarrierTrackingNumber ,
SUM(OrderQty*UnitPrice) FROM Sales.SalesOrderDetailEnlarged
GROUP BY ModifiedDate,CarrierTrackingNumber

When you observe the query plan and hover over the clustered index scan, the actual execution mode is ‘Row’ mode and it has processed 9341409 rows.

Now, enable batch mode by adding a hint to the same query and run it again. You can also turn the database-level option back ON (since we turned it OFF for the previous example); that statement is shown right after the hinted query below.

SELECT ModifiedDate,CarrierTrackingNumber ,
SUM(OrderQty*UnitPrice) FROM Sales.SalesOrderDetailEnlarged
GROUP BY ModifiedDate,CarrierTrackingNumber
OPTION(USE HINT('ALLOW_BATCH_MODE'))
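To re-enable the feature for the whole database instead of relying on the hint, simply switch the setting we turned off earlier back on:

ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;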

There are some limitations that prevent batch mode processing, such as in-memory OLTP tables, indexes that are not B-trees, heaps, LOB columns, and XML and sparse columns. The Batch Mode on Rowstore feature can significantly improve analytical workloads whose queries have GROUP BY, aggregations or sort operations. Queries that return only a few rows, or that do no aggregation at all, will not see much performance gain from this feature.

Summary:

In this post, we learned about the Batch Mode on Rowstore feature of SQL Server 2019. This feature helps analytical workloads and uses the CPU effectively. If your databases are at compatibility level 150, you are already taking advantage of it.

In the next blog post, we will learn about the next Intelligent Query Processing feature: T-SQL Scalar UDF Inlining.

Thanks for Reading!

SQL Server 2019 Intelligent Query Processing: Table variable deferred compilation

In our last post, we looked at one of the Intelligent Query Processing features (Memory Grant Feedback). In this post, let's focus on Table Variable Deferred Compilation.

In SQL Server 2017 and earlier versions, SQL Server always estimates one row for a table variable, because the table variable is populated at run time and the optimizer does not know how many rows to expect from it. Because of this bad estimate, query performance is affected.

Let’s see this behavior with an example:

Place the database in compatibility mode 140 (SQL Server 2017), as sketched below, and run the following query that uses a table variable, with the actual execution plan enabled.
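The compatibility change itself is a one-liner; I am assuming here that the demo objects live in AdventureWorks2016, so adjust the database name to wherever your test table lives:

ALTER DATABASE AdventureWorks2016 SET COMPATIBILITY_LEVEL = 140;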

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @IDinfo TABLE
([ID] INT, 
 [Person_id] [int]
);
INSERT INTO @IDinfo
       SELECT [ID], 
            [Person_id]  
       FROM test;
SELECT top 100 *
FROM @IDinfo F1
     JOIN test F2 ON F1.[ID] = F2.[ID];

Check the execution plan: the estimated number of rows shows as 1, but the actual number of rows is 100.

Starting with SQL Server 2012 SP2, there is a trace flag, 2453, which we can turn on so that statements referencing a table variable get recompiled when the number of rows changes. Let's run the same query with the trace flag turned on. This can be done at the query level, or you can turn the trace flag on at the server level with DBCC TRACEON(2453,-1).


DBCC TRACEON(2453);
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DECLARE @IDinfo TABLE
([ID] INT, 
 [Person_id] [int]
);
INSERT INTO @IDinfo
       SELECT [ID], 
            [Person_id]  
       FROM test;
SELECT top 100 *
FROM @IDinfo F1
     JOIN test F2 ON F1.[ID] = F2.[ID];

Check the execution plan: the estimated number of rows now shows 100 and the actual number of rows is 100, a 100% accurate estimate.

We can also add OPTION(RECOMPILE) to the statement, which will recompile it at every execution; applied to the demo query, it would look like the snippet below.
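For reference, this is just the final SELECT from the batch above with the hint appended (run it as part of the same batch that declares and populates @IDinfo):

SELECT top 100 *
FROM @IDinfo F1
     JOIN test F2 ON F1.[ID] = F2.[ID]
OPTION (RECOMPILE);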

SQL Server 2019 Feature: Table variable deferred compilation

As we saw in the previous examples, SQL Server estimates one row for table variables in SQL Server 2017 and earlier. In SQL Server 2019, a new feature called table variable deferred compilation defers compilation of a statement that references a table variable until the first actual execution of the statement. This helps the optimizer come up with accurate row estimates, which in turn improves query performance. The feature is available in SQL Server 2019 for databases at compatibility level 150.

Change the database compatibility level to 150 (see below), run the same query one more time and check the execution plan.
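Again assuming the demo database is AdventureWorks2016 (use whichever database holds the test table):

ALTER DATABASE AdventureWorks2016 SET COMPATIBILITY_LEVEL = 150;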

SQL Server will be able to estimate the number of rows perfectly.

Summary: With the new table variable deferred compilation feature in SQL Server 2019, we no longer need trace flag 2453 or OPTION(RECOMPILE) just to get decent estimates for statements that use table variables. If your queries populate the table variable with roughly the same number of rows each time, this feature gives you better estimates without any code changes. However, if the number of rows in the table variable varies greatly between executions and the queries do not suffer from high compilation times, combining this feature with OPTION(RECOMPILE) can still be helpful.

Thanks for reading!

SQL Server 2019 Intelligent Query Processing: Memory Grant Feedback

In the previous Memory Grant Internals posts, we learned all about memory grant internals: where the grants come from and how SQL Server manages the memory behind them.

In this series, we will learn all about SQL Server 2019 Intelligent Query processing.

Intelligent Query Processing is a set of features that helps improve query performance with very little implementation effort. The flow chart below shows all the Intelligent Query Processing features released in SQL Server 2017 and SQL Server 2019. In this series of posts, let's learn about each Intelligent Query Processing feature released in SQL Server 2019.

Source: Microsoft

In this part, let us focus on the Memory Grant Feedback feature released in SQL Server 2017 for Batch mode execution and in SQL Server 2019 for Row mode execution.

Memory grants are used by SQL Server for hash and sort operations. The optimizer uses statistics to allocate the memory a query needs before the query executes, and when the query runs, SQL Server uses that allocated memory to process the hashes and sorts. If the grant is not enough to process the query, data spills to tempdb on disk. If too much memory is allocated based on the estimates, concurrency suffers, because every other query also needs a memory grant to run. Bad estimates therefore affect the memory grants allocated to queries, and both too much and too little memory is bad. A quick way to watch grants while queries run is sketched below.
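A hedged illustration of how to watch this at run time (the column list is my own selection from the DMV, not from the original post):

SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       ideal_memory_kb,
       used_memory_kb
FROM sys.dm_exec_query_memory_grants;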

Now, how do we solve this issue?

By making sure statistics are kept up to date. Even so, we may still see queries spilling to disk because the memory grant is lower than what is actually needed, and on SQL Server 2016 and earlier there is not much we can do to stop those spills. If you are on SQL Server 2019, Memory Grant Feedback is enabled on databases by default.

The Memory Grant Feedback feature collects information about how much memory actually spilled to disk and adds that memory to the next executions of the query, plus a possible buffer. In the other direction, if more memory is granted than the query needs, Memory Grant Feedback records in the query plan how much memory went unused and reduces the grant for the next executions, again with a possible buffer. In short, the feature adjusts the grant to what the query really needs.

Memory Grant Feedback is really helpful because it homes in on the amount of memory the query actually needs to run without performance degradation.

Let's see how queries behave without Memory Grant Feedback. Since my database is in SQL Server 2019 compatibility mode, let's turn the feature off first, create a stored procedure, and execute it with a date range of about a year. Check the actual execution plan.

ALTER DATABASE SCOPED CONFIGURATION 
	SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;
GO
DROP PROCEDURE IF EXISTS [Sales].[customerdata_OrderDate];
GO

CREATE OR ALTER PROCEDURE [Sales].[customerdata_OrderDate]
	@StartDate DATETIME,
	@EndDate DATETIME
AS
SELECT
	oh.[CustomerID],
	oh.[OrderDate],
	oh.[ShipMethodID],
	od.[OrderQty],
	od.[ProductID]
FROM [Sales].[SalesOrderHeaderEnlarged] oh
JOIN 
[Sales].[SalesOrderDetailEnlarged] od
ON oh.SalesOrderID = od.salesOrderID
WHERE [OrderDate] >= @StartDate 
	AND [OrderDate] <= @EndDate
ORDER BY [OrderDate];
GO
DECLARE @StartDate DATETIME = '2012-01-01'
DECLARE @EndDate DATETIME = '2013-03-31'

EXEC [Sales].[customerdata_OrderDate] @StartDate, @EndDate;
GO

When you observe the actual execution plan, you see a warning that the sort operator spilled data to tempdb and wrote 2234 pages to disk.

The memory grant SQL Server allocated for this query was not sufficient, so it had to spill to disk. Because memory is granted to a query before execution, SQL Server cannot grant more memory on the fly while the query is running. The result is sorting data in tempdb, which is I/O intensive and makes the query run very slowly.

Now let's turn Memory Grant Feedback back ON and run the query again. Check the actual execution plan.

ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON;
GO

The sort did not spill to disk this time. Look at the Memory Grant Feedback info in the properties: IsMemoryGrantFeedbackAdjusted shows 'NoFirstExecution'. This is because we executed the stored procedure for the first time after enabling the Memory Grant Feedback feature on the database, which is also why LastRequestedMemory shows 0. The requested memory shows as 117704KB.

Let's run the same query one more time and check the IsMemoryGrantFeedbackAdjusted status. Now it shows 'YesAdjusting'. LastRequestedMemory is 117704KB, exactly the RequestedMemory we saw for the previous execution, and this time RequestedMemory is 75016KB.

Let's run the query once again. Now the status shows 'YesStable'. LastRequestedMemory is 75016KB, which matches the previous execution's RequestedMemory, and this time RequestedMemory stays at the same 75016KB since the grant has stabilized.

Isn't it wonderful? OH YES!! But we need to remember a few things here. What if there is a lot of skew in our data and the results differ drastically with each parameter value? For example, suppose your stored procedure returns 1,000 rows for one parameter and 1 million rows for another, and you execute it frequently. Memory Grant Feedback tries to adjust the grant to what the query needs, but because the row counts differ so much it can never settle; it keeps changing the grant back and forth and eventually turns itself off.

To demonstrate this scenario, let's take the same stored procedure but run two different executions: the first with a one-year date range returning about 600,000 rows, and the second returning about 6,000 rows. Let's run each of them 20 times, for a total of 40 executions.

EXEC [Sales].[customerdata_OrderDate] '2012-01-01', '2013-03-31'
go 20
EXEC [Sales].[customerdata_OrderDate] '2012-01-01', '2012-01-08'
go 20

On the 22nd execution, the Memory Grant Feedback info status shows 'YesAdjusting', but on the 23rd execution it shows 'NoFeedbackDisabled'. Once it shows as disabled, Memory Grant Feedback has been turned off for this execution plan.

Points to remember: feedback only changes the cached plan. On the first execution, the Memory Grant Feedback information is stored inside the cached plan, and subsequent executions of the same query benefit from the adjustments. The feedback information is not captured in Query Store. If there is memory pressure, or the cached plan is evicted from the cache for any reason, the feedback for that plan is lost, and the same happens after a failover.

Note: Memory Grant Feedback is not the replacement for the out of date statistics.

Summary: The Memory Grant Feedback feature is really useful in scenarios like reports that run every couple of months with different parameters. If you have a lot of skew in your data, the results differ a lot with each parameter, and you run the query very frequently, Memory Grant Feedback may not help, because it will eventually turn itself off while trying to adjust. We still need to make sure our statistics are up to date and maintained regularly.

Thank you for Reading!

T-SQL Tuesday #135: Tools of the Trade

With Mikey Bronowski's invitation for this month's T-SQL Tuesday, Tools of the Trade, I would like to share some useful tools that make my job easier:

  1. OBS Studio: This is free and open source software for video recording and live streaming. I mostly prerecord my sessions using OBS. I personally love this tool, and there is plenty of good content on YouTube that teaches how to use it.
  2. SentryOne Plan Explorer: Plan Explorer is an amazing tool to analyze your execution plans and tune your queries very quickly. It's completely free.
  3. SQL Search: It is a free search for databases. If you need to find any specific object/column or any string on any database on the server, this tool is very useful. I use SQL search regularly.
  4. SQL Compare: SQL compare tool is very useful to compare the database schema and data between environments. I use this tool regularly.
  5. SQLQueryStress: It is a lightweight performance testing tool, designed to load test individual queries. I personally use this tool for my demos and testing any queries for performance.
  6. Zoomit: ZoomIt is a free screen zoom and annotation tool for presentations. I use it regularly and I am sure most speakers already know about it.
  7. Diffchecker: A very useful tool to compare code/documents.
  8. Adobe Lightroom: This is a photo editing tool which I personally liked. It is professional and used by many photographers across the world.
  9. Free Download Manager: I use this tool for my downloads; it downloads files much faster. I use it especially for YouTube downloads, and if you need to download an entire playlist in a single shot, it is really useful. It is completely free.
  10. Nimbus Screenshot: A Chrome extension that is really useful when you want to take a screenshot of an entire page. You can select the area to capture while scrolling the page. Very useful.

These are some of the useful tools I use regularly. I would also like to see the list of the tools others feel useful. I would love to start using those tools as well.

Memory Grant Internals Part 5

This part is the continuation of the series to the post Memory Grant Internals part 4

In this final part of the series, lets talk about how we balance these memory grants. If lot of your queries are requesting for more memory than they required, then that causes concurrency issues. Memory grants are allocated to every query even before they get executed based on the cardinality estimates and memory consuming iterators. Once the query memory is allocated to a query, that memory is only released once the query execution completes even if the query actually uses a part of the allocated query. If more memory is requested by the queries than they need, then we are wasting the memory. What happens if the queries only receive less memory grants than they need, then we there is a performance impact on the running queries.

These issues have a few root causes: inaccurate estimates, and non-optimal plans generated in scenarios where the data is skewed and the plan was compiled for the initial parameter value. For example, if the first execution of a stored procedure uses a parameter that returns 100 rows, a plan is generated for that; if the next parameter returns 2 million rows, we still reuse the same plan, which is non-optimal for the larger row count. This is known as parameter sniffing. Sometimes we are also in an “out of model” situation, meaning something in the query that the query optimizer simply cannot model properly. We need to stay out of those “out of model” scenarios to get good estimates, and sometimes rewriting the query helps as well.

There are few ways where we can fix these problems.

  1. Update the statistics. Even if auto update statistics is ON, keep maintaining the statistics regularly with a maintenance plan.
  2. Recompile the query. Recompiling helps keep parameter sniffing issues away.
  3. Try to avoid multi-statement TVFs. From SQL Server 2017, Adaptive Query Processing lines the estimates for these TVFs up with the actual number of rows, but that is not the case in older versions of SQL Server.
  4. Use OPTION(OPTIMIZE FOR) to influence the optimizer.
  5. Check for variable length data types. For example, if a column is defined as varchar(4000) but all its values are a single character, the estimated row size is based on half the defined size; varchar and other variable length columns are estimated at only 50% of their defined size.
  6. Use Resource Governor. If you have many high-grant, low-cost queries waiting in the queue, you can put them in their own workload group by creating a pool just for these types of queries. If you need concurrency, limiting the grant percentage and setting the timeout options on the resource pool helps.

Let's see some examples.

  1. Low memory grant even though the stats are updated. Run the below query, look at the actual execution plan, and watch the extended event live data to see whether it spills to disk.
select top(1000) * from
(
select top (600000)
* from DBO.bigTransactionHistory
)as s
order by ActualCost desc
OPTION (MAXDOP 1)
GO

The memory grant was 55MB, which is far below the maximum grant, and yet there is a spill warning in the event information. Why did the spill happen when we had enough memory? The statistics were up to date, but the estimated row size likely differed from the actual row size, which is why it had to spill to disk. In this case the optimizer simply was not able to get a good plan for this query.

2. Multi-statement table valued functions. In the second example, let's see how many rows multi-statement TVFs are estimated to return. First, let's look at SQL Server 2012 database compatibility.

SELECT tst.CustomerID, COUNT(*) 
FROM  Sales.Customer c 
INNER JOIN dbo.TEST_MTVF('1/1/2014') tst on c.CustomerID = tst.CustomerID
INNER JOIN Production.Product prod on tst.ProductID = prod.ProductID
GROUP BY tst.CustomerID

Look at the actual execution plan and hover the mouse over the rows flowing from the function into the nested loops iterator: the estimated number of rows is 1 but the actual number of rows is 37339, which is a big difference. For SQL Server 2012 and below, the estimate for a multi-statement TVF is always 1, no matter how many rows it actually returns.

Now let's put the database in SQL Server 2016 compatibility mode and check the estimates again by running the same query and looking at the actual execution plan. This time the estimate shows 100 rows. For database compatibility levels 120 (SQL Server 2014) and 130 (SQL Server 2016), the estimate for a multi-statement TVF is always 100 rows, regardless of the actual row count.

Let's check the latest version, SQL Server 2019. Run the same query and look at the estimates for the multi-statement TVF: now the estimates line up exactly with the actual number of rows. The same is true for SQL Server 2017 compatibility mode (140), because Adaptive Query Processing was introduced in 2017 and it optimizes the plan with correct estimates for multi-statement TVFs.

3. Variable length data types. Let's run the below query, which has a varchar column, and see how the estimates work here:

select *, CONVERT(VARCHAR(8000), 'a') as new
into #B
from bigProduct
GO

select * from #B 
CROSS JOIN
( SELECT 1
UNION ALL 
SELECT 2
UNION ALL
SELECT 3
) AS n (l)
ORDER BY name, l
GO

While the above query is running, if you query select * from sys.dm_exec_query_memory_grants, you will see the granted memory is 183MB but the used memory is only 19MB; the rest of the granted memory is wasted. The grant was sized for varchar(8000), but only a fraction of that memory is actually needed to process the query. Because of this, all the other queries running concurrently have to wait for this query's memory to be released. Once memory has been granted, it is not released until the query finishes, whether the query uses it or not.

4. Using TOP with the OPTIMIZE FOR option. In this example, let's run the below query and watch the event information. We can use a variable for the TOP value and apply OPTIMIZE FOR as shown below; this time there is no sort spill warning.

declare @C int=100000
select top(1000) * from
(select top (@C)
*
from sales.SalesOrderHeaderEnlarged
) as a
order by
TotalDue desc
option (MAXDOP 1, OPTIMIZE FOR (@C=150000))
GO

OPTIMIZE FOR can be used both ways: specify a smaller row count than the variable actually holds to get a lower memory grant, or, as in the example above, specify a larger value than the variable to get a bigger grant when the query needs more memory.

5. We can also change the workload group option request_max_memory_grant_percent to the percentage we need; the default is 25%. This limits how big a share of the query memory a single request in that workload group can take. In the example below, we set the limit to 15% by altering the workload group:

Alter Workload Group [default] with
(
request_max_memory_grant_percent=15
)
GO
Alter Resource Governor RECONFIGURE
GO

Summary:

In this final part of the series, we looked at how to balance memory grants, with examples. Too much or too little granted memory is bad; balancing the memory is important. Over the course of the series we covered the memory consuming iterators, how they work and share memory, what Resource Governor is and its parts, how these iterators spill to disk, how to identify those spills, and finally how to avoid them.

Hope you liked the series and learned from it. If you have any ideas about the topics we can cover in coming blog posts, please do share with me. I would love to learn and share the knowledge while I learn along with you.

Thank you so much for all your support!

Thank you for reading!

Memory Grant Internals Part 4

This part is the continuation of the series to the post Memory Grant Internals part 3

In the previous part of the series, we learned what query memory (workspace memory) is and how it works, what happens when the memory allocated to queries is not sufficient, and how queries may have to wait before they get their memory grants. In this part 4 of the series, let's learn how queries spill to disk when the grants are not enough, and how to identify and address these issues.

If your tables are huge and your queries sort them, the sort needs an enormous amount of memory, and it is hard to allocate that much unless the server has a huge amount to spare. In such cases the data has to spill to disk. Spilling to disk means using “SQL Server's scratch pad”, tempdb, to do the sort, and because the spill goes straight to tempdb on disk, the query becomes very slow.

As we learned in the first part of the series, memory is needed by the memory consuming iterators: hash match, sort, and exchange iterators. The hash match, in particular, has two phases: a build phase and a probe phase. Let's look at how these three iterators work in detail:

Hash Match: During the build phase, it takes the outer input and builds a hash table, which is a set of hash buckets. What are these buckets? Each hash bucket is essentially a memory location holding a pointer to a linked list of rows; any row that hashes to that bucket is placed in its linked list. So the input side is scanned, buckets are created, rows are hashed with a hash function and placed in the linked list of the corresponding bucket, and at the end of this phase we have a hash table. Next, the match operation starts: the inner set is scanned, its rows are hashed to the corresponding buckets as well, and the linked lists are walked to find matching rows.

How does a hash match spill to disk?

At build time, memory is allocated based on the cardinality estimates for the input and the probe. The buckets are created in memory and rows are placed in their buckets, and if the grant is exceeded, some of the buckets are sent to disk. Since some buckets are now in memory and some on disk, the initial probe of the inner set works like this: each probe row is scanned, and if it hashes to an in-memory bucket, the match is done right away; if it hashes to an on-disk bucket, the probe row is written to disk alongside the outer-side bucket. Because of this, we get additional tempdb writes at probe time. This is a build-side spill.

Probe-side spills can also occur in the second phase of the hash match iterator. The buckets that spilled to disk in the first phase have to be read back into memory to rebuild the hash table, and the spilled inner probe rows are then read back in to be matched. If memory is exceeded again, the buckets have to be written back to disk; this is called recursion. The cycle can reach a point where the plan cannot complete this way and the hash match bails out. Bailout means the operator falls back to an unoptimized nested loops strategy that reads the inner table multiple times; it uses little memory, but it can run practically forever.

Sort: A sort begins sorting in the memory it was granted using the quicksort algorithm (the memory requirement is at least 200% of the input size). If the memory grant is exceeded even slightly, the entire intermediate set spills to disk and the sort continues entirely on disk using the merge sort algorithm. This is different from the hash match, where exceeding the grant only spills some of the buckets to tempdb.

Exchange: Exchange iterators deal with parallel streams of rows across multiple threads. There are three types of exchanges: Distribute (one stream in, multiple streams out), Repartition (DOP streams in, DOP streams out, redistributed by a different algorithm) and Gather (DOP streams in, one stream out). An exchange iterator has two sides, a producer side and a consumer side, which communicate with each other through buffers (packets) of rows. For distribute and gather there is a DOP set of buffers, while repartition has buffers on each side (DOP×2 sets of buffers).

Gather and repartition exchanges can be merging (order preserving), meaning that if all incoming streams are ordered, the iterator outputs the data in the same order. If the data is skewed, or a huge amount of data arrives very quickly, a merging exchange cannot keep up while preserving the order; the data has to be spilled to disk and re-sorted. This situation is called an intra-query parallel deadlock, and it slows queries down dramatically.

How to check for these spills?

In sys.dm_os_waiting_tasks, spilling tasks show up waiting on IO_COMPLETION, which points at slow disk I/O. We can also use a trace or Extended Events to monitor spills by capturing the Hash Warning, Sort Warning and Exchange Spill events. Another option is sys.dm_db_task_space_usage and sys.dm_db_session_space_usage, which have columns for user object allocations (temp tables and table variables) and for internal object allocations, the latter covering anything that spills out of workspace memory into tempdb. A couple of example queries follow below.
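As a quick illustration (these are my own queries against the DMVs mentioned above, not from the original post):

-- Tasks currently waiting on IO_COMPLETION, a common symptom of spills
SELECT session_id, wait_type, wait_duration_ms
FROM sys.dm_os_waiting_tasks
WHERE wait_type = 'IO_COMPLETION';

-- Internal object (spill) vs user object allocations in tempdb per session
SELECT session_id,
       internal_objects_alloc_page_count,
       user_objects_alloc_page_count
FROM sys.dm_db_session_space_usage
ORDER BY internal_objects_alloc_page_count DESC;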

Lets see how these spills can be monitored with the examples (Credits: Adam Machanic)

Create an Extended Events session filtered to the database, with the hash and sort warning events, by running the query below:

-- Create Event Session SortandHashWarning
CREATE EVENT SESSION SortandHashWarning ON SERVER
ADD EVENT sqlserver.sort_warning(
ACTION(sqlserver.sql_text,sqlserver.tsql_frame)
WHERE (sqlserver.database_id=(8))), -- replace database_id
ADD EVENT sqlserver.hash_spill_details(
ACTION(sqlserver.sql_text)),
ADD EVENT sqlserver.hash_warning(
ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\Spills.xel',max_file_size=(50),max_rollover_files=(2))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

Start the session and watch live data by right-clicking the Extended Events session we created and choosing Watch Live Data.

Sort warnings:

Run the below query to see if the data spills to disk:

select top(1000) * from 
(
select top (668935)
*
from bigTransactionHistory
) as x
order by 
x.ActualCost desc
go

Here, no data has actually spilled to disk. What if we bump up the row count by one and see whether it spills:

select top(1000) * from 
(
select top (668936)
*
from bigTransactionHistory
) as x
order by 
x.ActualCost desc
go

This time we get 8 sort warnings, because the query went parallel (a DOP 8 plan), so a sort warning fired on each thread. If you check the duration, it is 0.3 sec versus 0.2 sec for the previous query. The difference is small because we only brought in one extra row; by the time the sort spilled to disk, most of the data had already been sorted in memory. If much more data needs to be spilled, the duration grows, because more of the data has to be sorted on disk.

Hash spill warnings:

Let's look at a hash spill with an example. Run the below query and watch the live data in Extended Events.

select count(*) from 
(
select top (3800000) *
from bigTransactionHistory
) as bth
INNER HASH JOIN
(
Select top(10000)
*
from bigTransactionHistory
) as bth1 on
bth.productid=bth1.ProductID
go

This query did not spill to disk. Now let's increase the number of rows we pull and look at the event information.

Now you can see hash warnings, and the duration is 0.84 sec, essentially the same as the duration without the spill. There is little difference because of the Grace Hash algorithm used by the hash match iterator: it only spills some of the buckets to disk, so a hash match spills much less, unless it reaches the bailout stage. Also notice that only 3 threads spilled to disk for the hash warning, not all 8. That is because when one thread carries more data than another, the threads with more data may spill while the others do not; in this case three threads spilled. This happens when you have skew in your data.

Let's confirm that in the execution plan by running the same query again.

Look at the rows flowing into the hash match iterator in the properties (F4): the number of rows per thread is not even, and the last three threads carry more rows than the others. Those three threads are the ones that spilled to disk, which is why we saw only three hash match warnings in the event information.

Exchange spill:

Let's add two more events to capture the exchange spill: the exchange_spill event and the lock_deadlock_chain event.

Now lets run a query that will have an exchange spill (Credits: Erik Darling)

Creating the table first
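The snippet below only truncates and loads the table; the table definition itself is not shown in the original post. A minimal definition that matches the INSERT and the index name used below might look like this (the column names are my assumption):

CREATE TABLE dbo.DEADLOCK
(
    ID BIGINT NOT NULL,
    FILLER VARCHAR(100) NOT NULL
);
CREATE CLUSTERED INDEX CI__DEADLOCK ON dbo.DEADLOCK (ID);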

TRUNCATE TABLE DEADLOCK;
INSERT INTO DEADLOCK WITH (TABLOCK)
SELECT
  (RN - 1) / 12500
, REPLICATE('Z', 100)
FROM (
       SELECT TOP (50000) ROW_NUMBER()
        OVER (ORDER BY (SELECT NULL)) RN
       FROM master..spt_values t1
       CROSS JOIN master..spt_values t2
       CROSS JOIN master..spt_values t3
) t
OPTION (MAXDOP 1);
UPDATE STATISTICS DEADLOCK CI__DEADLOCK WITH FULLSCAN;

Now, run the below query, which can produce exchange spills:

SELECT t1.ID
FROM DEADLOCK t1
WHERE EXISTS (
       SELECT 1
       FROM DEADLOCK t2
       WHERE t1.ID = t2.ID
)
ORDER BY t1.ID
OPTION (QUERYTRACEON 8649, MERGE JOIN, MAXDOP 2);

The final parallelism (Gather Streams) iterator here is a merging, order-preserving exchange; you can see the Order By property on that last parallelism iterator. If we run into these intra-query parallel deadlocks, it is worth checking whether the execution plan contains any of these merging exchange iterators.

Summary:

In this part 4 of the series, we learned about the types of spills that can happen in the memory consuming iterators when the allocated grants are not enough, and how to spot them. In the next part, we will learn how to solve these issues.

Thanks for reading!

Memory Grant Internals Part 3

This part is the continuation of the series to the post Memory Grant Internals part 2

We have 1 GB of max server memory. 1 GB is 128000 pages of 8 KB each (8 KB × 128000 = 1 GB). There are two default resource semaphore pools, visible in DBCC MEMORYSTATUS (the command is shown below) as 'Query Memory Objects (default)' available and 'Small Query Memory Objects (default)' available. Adding those two values, 87823 + 4619 = 92442 pages, and dividing by 128000 gives 0.72, so roughly 72% of the total buffer pool is available as query memory.
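For reference, those counters come from DBCC MEMORYSTATUS (exact section names can vary slightly between builds):

DBCC MEMORYSTATUS;
-- Look for the "Query Memory Objects (default)" and "Small Query Memory Objects (default)"
-- sections and their "Available" counters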

Let's run a query and check the execution plan for the memory grant.

select top (1000) * from
(
select top (10000000)
* from 
[Sales].[SalesOrderHeaderEnlarged]) as a
order by
TotalDue desc
OPTION (MAXDOP 1)
GO

The memory grant is 183MB. What if we pull more rows this time?

select top (1000) * from
(
select top (15000000)
* from 
[Sales].[SalesOrderHeaderEnlarged]) as a
order by
TotalDue desc
OPTION (MAXDOP 1)
GO

If you run select * from sys.dm_exec_query_memory_grants, the ideal memory SQL Server wanted to allocate for this query is 1022120KB (about 1GB). With 1GB set as max server memory, the requested_memory_kb is 183496KB (183MB) and used_memory_kb is 183496KB (183MB).

The memory grant for the above query is 183496/8 = 22937 pages. The total grant available to queries is 92442 pages, so running the query about four times concurrently hits the total (22937 × 4 = 91748). We can use the SQLQueryStress tool to run the query and watch the memory grants; let's run it with 100 iterations and 10 threads.

While the load is running, check the memory grants with select * from sys.dm_exec_query_memory_grants and see which queries are executing.

Three queries are running, and the others are waiting for memory grants, shown by a NULL grant_time; they are waiting for grants to be released so they can run. Only three queries run at a time because each query is granted up to 150% of its requested memory. Once those queries complete, the waiting queries get their memory grants.

Workload groups have three importance levels, which map to the resource semaphore queues: low (queue ids 0-4), medium (5-9) and high (10-14). By default, the default workload group runs at medium importance.

Adam Machanic has some awesome example queries showing how a high cost query with a low memory grant and a high cost query that needs a high memory grant behave. Let's look at these examples (Credits: Adam Machanic).

Let's run the below query, which has a high query cost but needs little memory. Run SQLQueryStress at the same time.

use Adventureworks2016
go
select top (5000)
* from
(select top(1000)
p.*, b.Actualcost
from dbo.bigProduct as p
CROSS APPLY
( Select count(*) as actualcost
from dbo.bigTransactionHistory as b
where
b.ActualCost between p.productid-1000 and p.productid+1000
)as b
)as x
ORDER BY name
OPTION (MAXDOP 1)
GO

When you check the actual execution plan, the estimated cost is very high: 11479.3.

But when you check the input to the sort, the estimated data size is only 263KB, which should need very little memory.

While the above query runs, run SQLQueryStress again to see the running queries and the queues.

Query select * from sys.dm_exec_query_memory_grants and you will see that spid 56 is waiting, with grant_time showing NULL. This query is waiting for just 1.1MB of memory; the requested memory is tiny, but the query cost is high. Notice the queue id shows 9, so the lower cost queries in queue id 8 all get to run before spid 56 can. The timeout_sec is 86400, which is a very long timeout. SQL Server prioritizes low cost queries, so even though this query needs little memory, it still has to wait until all the low cost queries have run.

Now, if we have lots of queries with a high cost that only need a little memory to run, how do we resolve this? We can alter the default workload group and set request_memory_grant_timeout_sec. Let's alter the default workload group to set the timeout to 10 seconds.

Alter workload group [default] with
(
--default=0
request_memory_grant_timeout_sec=10
)
go

Alter resource governor reconfigure
go

Now run the same query again, with the other queries still running through SQLQueryStress, and look at the results of select * from sys.dm_exec_query_memory_grants. The timeout_sec on all queries has changed to 10 seconds, so the query on spid 56 only has to wait 10 seconds before it is granted memory to run.

After 10 seconds, if you run select * from sys.dm_exec_query_memory_grants again, the query should be running and grant_time is no longer NULL.

We can also see these memory grant waits in sys.dm_os_waiting_tasks: all the spids waiting on RESOURCE_SEMAPHORE are waiting for their grants so they can run.

select * from sys.dm_os_waiting_tasks
where wait_type='resource_semaphore'

Let's run a query that has both a high cost and a high memory grant, and check the execution plan: the query cost is 129778 and the memory grant is 566MB. Remember, total available memory is only 1GB, so this query wants most of the available query memory to run.

select top(5000)
* from
(select top(1000)
p.*, b.thecount
from
(select top(1000000)
productid from dbo.bigTransactionHistory
order by actualcost) as p
CROSS APPLY
(select count(*) as thecount
from dbo.bigTransactionHistory as b
where b.ActualCost between p.productid-1000 and p.productid +1000
)as b
order by b.thecount desc
)as x
order by ProductID
OPTION (MAXDOP 1)
GO

Let's execute this query along with the other queries running through SQLQueryStress. When you check sys.dm_exec_query_memory_grants, the requested memory grant is 185MB, but to be granted and start running, the query needs up to 150% of the requested memory to be available.

When I check sys.dm_exec_query_memory_grants again, the query now has granted memory, because we hit the 10 second timeout; once the timeout passed, it received a granted memory of 640KB, just enough to start running the query.

So, the “query timeout” here does not mean the query itself times out; the query waits for the request_memory_grant_timeout_sec period (10 seconds in this case), and when that expires, one of three things happens. If there is enough memory for the full grant, the query runs. If there is not enough memory even for the minimum required grant, we get an error. If there is memory for the minimum grant but not for the full grant, the grant is scaled all the way back and only the minimum memory is granted. That is what happened here: the query waited for the 10 second timeout and then ran with the minimum required grant (640KB).

Summary:

In this part 3 of the Memory Grant Internals series, we looked at how query memory (workspace memory) works and what happens when there is not enough of it. That is when things get complicated and the sorts and hashes have to spill to disk in tempdb.

In the next part of the series, we will learn how queries spill to disk when the memory granted is not enough to sort in memory, and why spilling makes queries run much slower.

Thank you for reading!