In the previous post, we learned about Scalar UDF Inlining feature. In this post, let’s focus on the Approximate QP with Approx_Count_Distinct feature introduced as a part of SQL Server 2019 Intelligent Query Processing features. This feature is useful to get the approximate count of distinct values just like the Count distinct function to get the distinct number of records but this new feature will take less amount of CPU and memory to process the query.
First let’s see with an example with the regular COUNT(DISTINCT Column) works and check for the number of rows as a result. For this below example, we are using AdventureworksDW2016 database.
SET STATISTICS TIME ON select count(distinct LastName) from [dbo].[DimEmployee
Now, run the same query using Approx_Count_Distinct and see the distinct count
SET STATISTICS TIME ON select APPROX_COUNT_DISTINCT(lastname) from [dbo].[DimEmployee]
We see the distinct count value is less by one when compared with the query using the COUNT(DISTINCT Column). Compare the execution plans for both the queries by choosing the select properties. When you check the compile CPU, memory and time comparing both the plans, query run with the Approx_Count_Distinct have lower values when compared with the other plan. The performance of query run with the new feature using Approx_Count_Distinct is better when compared with the query using COUNT(DISTINCT Column).
Let’s check with an another example using Adventureworks2016 database.
SET STATISTICS TIME ON select count(distinct SalesOrderID) from [Sales].[SalesOrderDetailEnlarged]
This time, try to run the query using the Approx_Count_Distinct and check for the results count
When you compare the results, there is a little bit variation in the results. The query using Approx_Count_Distinct produced 27997 more than the number produced by the query using COUNT(DISTINCT Column). This feature will produce the approximate distinct column count by taking less resources. Let’s check the resource utilization by comparing both the actual execution plans produced by both the queries side by side. Bottom execution plan is generated by the COUNT(DISTINCT Column) which took 1.084s to perform the index scan and the top execution plan generated by using the Approx_Count_Distinct took 0.374s which is very less. Also check the properties of the select to see the difference in the compile CPU, memory and compile time for both the actual execution plans. Approx_Count_Distinct query took more values when compares to the query using COUNT(DISTINCT Column).
The approximate values do not have to always be more. There may be cases where query using Approx_Count_Distinct results in less distinct values when compared with the query using COUNT(DISTINCT Column). Let’s see this with a different example using COUNT(DISTINCT Column) first.
SET STATISTICS TIME ON select count(distinct CreditCardID) from [Sales].[SalesOrderHeaderEnlarged]
Run the same query using the Approx_Count_Distinct
SET STATISTICS TIME ON select APPROX_COUNT_DISTINCT(CreditCardID) from [Sales].[SalesOrderHeaderEnlarged]
This time we had less number of distinct count using Approx_Count_Distinct when compared with previously run query using COUNT(DISTINCT Column). This column CreditcardID is a nullable column and have null values. When you compare the actual execution plan select properties, we can see this time the CPU, memory time is less for the query using Approx_Count_Distinct when compared with the query plan using COUNT(DISTINCT Column).
By using the Approximate QP feature with Approx_Count_Distinct, we can get distinct non nullable values in SQL Server with better performance than using COUNT(DISTINCT Column). In this blog post, we have seen both the scenarios showing the less values and more distinct values using this new feature. In three examples we have seen in this post, two sample queries were having performance boost while running the query using the new feature Approx_Count_Distinct with less compile CPU, Memory and time.
Thanks for Reading!