T-SQL Tuesday #131: SQL Server Statistics Analogy

T-SQL Tuesday Logo - A blue database with a calendar showing days of the week wrapped artfully around the database.

With Rob Volk's invitation for October's T-SQL Tuesday on data analogies, I tried to think of an analogy for SQL Server statistics.

Here is my analogy:

For example, say one of our friends asked us to volunteer to organize their kid's first birthday party. You and I are both more than happy to help. We are so excited to celebrate it with all our friends, inviting as many people as possible.

Together we gathered an invitee list of a thousand people in total, including adults and children. We planned to create an RSVP list with a beautiful invitation link, asking every invitee to submit an RSVP with the head count of girls and boys, along with the number of families without kids. We sent the invite two months before the event date, expecting everyone to submit an RSVP. Based on the head counts (girls, boys, and families without kids), we wanted to purchase the return gifts.

Girls get Barbie dolls, boys get toy cars, and families without kids get a nice home decor set.

We waited for the RSVP submissions and gathered the results five days before the event date so we could shop for the return gifts. Out of a thousand invitees, we received only three hundred RSVPs. We believed that was the only head count we would see on the event date, so we bought return gifts and food for three hundred people.

To our surprise, a thousand people showed up on the event date, since the party was on a weekend. We then understood that many people had actually been willing to come to the party but, for some reason, never had the chance to RSVP. Now we were all tensed up, not knowing what to do.

This was not at all what we expected to happen. We expected one thing, and what actually happened at the event was something else. Everything got messed up. There was more stress than fun at the party. We immediately asked some of our friends to order more dishes, and asked others to grab random gifts from a nearby store. Many invitees were waiting for their turn to get food on their plates, and their frustration was clearly visible on their faces. One problem carried into the next, making the whole event a disaster. We got through the event, but with no satisfaction. We should have had good estimates of the head count to build a good party plan.

Instead, if we had received a good percentage of RSVPs, say around 80% of the thousand invitees, we would have been in much better shape with the plan. 80% of a thousand is 800 people, which would have given us an estimate close to the actual head count. What if 100% of the invitees had submitted an RSVP? That would be wonderful, right? The estimate would have matched the actual head count exactly. That is almost too good to be true; I have never seen a single party where every invitee submitted an RSVP. There are always people who miss it. Still, the closer to a hundred percent, the better the estimate. From the invitees' point of view, though, it takes time and effort to keep their RSVPs updated so we can get an exact estimate.

To make the event successful, we should have been a little more careful about getting these RSVPs updated by the invitees. We should have sent a reminder every few days, or once a week, depending on the size of the invitee list. The more invitees we add to the event, and the more changes we make to the list, the more frequent our RSVP reminder emails should be.

Now we know what we missed. As we always learn from past mistakes, we now know we should have sent the reminders regularly, so the RSVPs would stay updated, giving us a close estimate and, ultimately, a good party plan.

What if the same concept applied to SQL Server statistics?

The SQL Server optimizer always gets its estimate of how many rows to expect from statistics (the invitee column statistics in our analogy). How does it do that? It gathers the estimates from statistics and trusts them to be correct. The invitees are the column statistics here, and the RSVPs are our statistics updates. Our reminder notifications to update the RSVPs are the same as updating the invitee column statistics, whether through a SQL Agent update-statistics job, the AUTO_UPDATE_STATISTICS option, or Ola Hallengren's maintenance scripts.

From the estimates of our RSVPs, we ordered food and gifts for only 300 people out of 1,000. In the same way, the SQL optimizer allocates less memory based on the estimates. Because we bought too few gifts, we asked other friends to purchase more immediately from the nearby gift store. This stressed our friends, and they couldn't find all the gifts we needed anyway, which made the whole party plan go bad. In the same way, with bad statistics estimates, the optimizer allocates less memory than the query actually needs, and the query eventually has to spill to disk. One bad column statistics estimate carries the stress to the other operators in the query plan, eventually affecting the complete execution plan.

To solve this issue, the invitee column statistics in SQL Server should be updated regularly.
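As a sketch of what "updating the invitee column statistics" looks like in T-SQL (the table dbo.Invitees and the statistics name are hypothetical, for illustration only):

```sql
-- Update every statistics object on the hypothetical table, reading all rows
UPDATE STATISTICS dbo.Invitees WITH FULLSCAN;

-- Or refresh a single statistics object by name
UPDATE STATISTICS dbo.Invitees (ST_InviteeName) WITH FULLSCAN;

-- Inspect the histogram SQL Server currently holds for that statistics object
DBCC SHOW_STATISTICS ('dbo.Invitees', ST_InviteeName);
```

FULLSCAN is the "100% RSVP" of the analogy; a sampled update (the default) is the "decent percentage" that is usually good enough.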

Depending on the number of inserts and updates that happen to this invitee column in the table, we have to plan how often to update the statistics as well. Just as it takes energy and time for invitees to go update their RSVPs, it takes resources and time to update the invitee column statistics.

To get good estimates, RSVPs should be updated as much as possible. If the invitee list doesn't change often, a decent percentage of RSVPs is good enough for a solid party plan; in the same way, a decent sampling percentage for the statistics update is usually good enough to generate a good plan. If the invitee column changes regularly, frequent statistics updates are recommended.

It is always best to enable the AUTO_UPDATE_STATISTICS option. With this option enabled, SQL Server will update statistics automatically in the background.
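Enabling it is a one-line change at the database level (YourDatabase is a placeholder name):

```sql
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;
```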

For SQL Server 2014 and earlier, auto update stats kicks in when the threshold of 20% of the rows plus 500 changed rows is reached. From SQL Server 2016 onward (database compatibility level 130), it kicks in when the number of changed rows reaches sqrt(1000 * total number of rows), which is a much smaller threshold for large tables.
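To see how close each statistics object is to that threshold, a query along these lines can help (a sketch; sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later):

```sql
-- Compare accumulated row modifications against the 2016+ threshold sqrt(1000 * rows)
SELECT  OBJECT_NAME(s.object_id)   AS table_name,
        s.name                     AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter,
        SQRT(1000.0 * sp.rows)     AS approx_auto_update_threshold
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   sp.modification_counter > 0
ORDER BY sp.modification_counter DESC;
```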

The bottom line of this post:

The better the estimates, the better the party plan will be 🙂

T-SQL Tuesday #130 – Automate Your Stress Away


With Elizabeth Noble's invitation, this will be my first T-SQL Tuesday post. I am very excited writing it. Thank you so much, Elizabeth, for sending the invite.

With the current COVID situation, I believe there is no one on this planet without stress. I wish we had a fast-forward stress-relief button that could automatically remove the stress from our lives.

Every new challenge we face comes with new opportunities. As we travel the journey of turning these challenges into opportunities, we get chances to make smart decisions and take actions that make our lives easier. Automation is one of them.

As new technologies emerge, automation is becoming the key. Tasks that used to take a long time can now be completed in just a few minutes using automation.

In my everyday job, there are many tasks I have to do repeatedly across many servers. These tasks can be very simple but time-consuming when you are dealing with hundreds of servers. Automating them is very helpful. Below are some of the simple tasks I automated, saving a lot of time:

  1. We have several databases being replicated between different SQL Server versions, from SQL Server 2008 all the way up to SQL Server 2016. Some databases are still at the SQL 2000 compatibility level on SQL 2008 servers, and replicating them to other versions can be challenging. Whenever we had to take a fresh snapshot and replicate, the distribution job would fail with errors related to default SET options (for example, ANSI_PADDING set to OFF when it should be ON) in the .sch files the snapshot generates (a .sch file contains the script to create the table and the replication-specific stored procedures on the subscriber). It is very tedious to find every .sch file with the setting OFF and change it to ON manually. So I added a PowerShell script as a second step in the snapshot job, right after the snapshot creation step, that changes ANSI_PADDING from OFF to ON in all the .sch files. With many databases having the same issue, automating this task saved me a lot of stress.
  2. Once, I had to change the owner of the SQL Agent jobs to a sysadmin account on around 300 servers at a time. That is a stressful task if done manually, server by server; it would have taken me at least a few days. Instead, I used a PowerShell script that reads the server list from a text file and runs the same SQL query against each server on the list. It took just a few minutes to complete for all 300 servers. Imagine how much time that saved.
  3. There are other cases where I use PowerShell automation. For example, if I have to grant permissions to logins on multiple servers at a time, automating the task with PowerShell really helps.
  4. I had to convert around 200 servers between SQL Server editions without losing data or databases. I used a PowerShell script for this task; doing it manually, by uninstalling and reinstalling a different edition on each server, would have taken far more time and effort.
  5. In my work environment, there are many non-production servers, and we make sure all their databases are in the simple recovery model. Developers create and drop databases in non-prod environments for testing, sometimes in the full recovery model. I created a SQL Agent job that checks the database recovery models and sends me a report of the databases that are not in simple recovery. The job runs every day at a scheduled time.
  6. I use automation for documentation purposes too, for example gathering all SQL Agent jobs and login permissions from multiple servers using SQL Agent jobs.
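For the recovery-model check in task 5, the core query inside the agent job can be as simple as this (a sketch; the report/email step around it is left out):

```sql
-- List user databases on this instance that are not in SIMPLE recovery
SELECT name, recovery_model_desc
FROM   sys.databases
WHERE  recovery_model_desc <> 'SIMPLE'
  AND  database_id > 4;  -- skip the system databases
```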

These are some examples of how I use automation in my everyday work. I would love to know how automation is helping others in their day-to-day work.

A single valuable advice can change a Life!

There are few things in Life which money cannot buy!

You may have heard this sentence many times in your life. A real, true piece of advice can change a person's perspective and make them see life differently. Sometimes a sincere piece of advice you do not want to hear can take you to places you never knew existed. When you have an open mind to see life as it is, you are already accepting the opportunities life is gifting you. In today's world, sincere, true advice is very hard to find. If you find it, you are really lucky, because now you can take that advice and think mindfully about where it can take you.

I did not have this mindset a few years ago. As life threw challenges at me, I struggled at first, then got angry, thinking, why me? I couldn't bear my mind chattering with negative thoughts that took me nowhere. Instead, those thoughts pulled me down, caused self-doubt, and I lost confidence in myself. While I was still at this stage of life, something magical happened. Something triggered me to think very deeply about life, and that completely changed my perspective.

A single valuable advice changed my life forever.

Here is the advice I received:

“The only limits in our life are those we impose on ourselves. In our minds we believe we are not worthy enough; we limit ourselves by not dreaming big and think we are not worthy of achieving big things in life. If you can remove that limit you put on yourself, no one can stop you except you! If you have the willpower and intensity to put your dreams into action, no one can ever stop you. Take each challenge life throws at you and, instead of dwelling on why it happened to you, turn it into something positive, prove yourself by facing it, and see where it takes you in your life.”

This is the most valuable advice I received from Tom Bilyeu and Bob Proctor, summarized in my own words.

I was desperate to find a solution to my problem and tried to figure it out myself by watching motivational videos on YouTube to quiet the negative mind chatter. I came across “Impact Theory” by Tom Bilyeu and the “Proctor Gallagher Institute” by Bob Proctor, where I received this valuable advice. In my view, these two motivational speakers are my mentors. The communication may not be in person, but the impact they made on my life is incredible. Nothing can buy valuable advice from a true mentor who wants to change your life for the better.

From the day I received this advice, I started viewing life differently. It was never someone else who stopped me from achieving something; it was me, limiting myself. Life throws the same challenges at me as before; what changed is that I now take each challenge as an opportunity for growth. Instead of worrying about my troubles, I started thinking about how I can grow from my difficulties and see them positively. And as I started seeing life positively, positive things started happening to me.

I thought about getting better at my skills as a SQL Server DBA. I had a thought: what if I learn and share my knowledge with others, so others can benefit along with me? That thought triggered me to start this blog. I started writing posts about challenges I face in my daily work and how I fix them.

I started reading Brent Ozar's blog posts. I opened a Twitter account just to see what Brent Ozar posts about, so I could stay updated on his content. I was so impressed by how much knowledge he has on the subject and the way he shares it with other people. On Twitter I then saw many professionals talking about SQL Server, and discovered the hashtag #sqlfamily, which felt like a completely different world. I saw so much positivity there, and I liked how much people share and care for each other. I saw people posting about presentations, blogs, and what not! I already love the positive vibes #sqlfamily spreads around.

One day, I saw a post introducing the “New Stars of Data” conference, especially for new speakers, hosted by Ben Weissman and William Durkin. I had the opportunity, but I also had a fear of public speaking. I thought, I can only grow if I face my fears, so I immediately submitted an abstract. Luckily, I was selected and given a mentor, Deborah Melkin. In the beginning, my self-confidence was very low, until I met Deborah. I had always thought people would judge me for who I am, but no! That was just my thinking, not the truth. Deborah is my second mentor in changing the way I look at life and myself. She built confidence in me without me even realizing it until I gave my final presentation. It blew my mind to realize I am actually capable of speaking, which I had not thought possible. Isn't that the true meaning of mentoring? Deborah changed the way I look at myself and at life.

There are many people out there who change other people's lives in a positive way without those people even realizing it.

I always have many questions in my mind. I am interested in learning more and sharing with others as I learn. After I gave my first presentation, I started submitting abstracts to other conferences, but I know I am missing something, though I do not know what it is. As I am fairly new to speaking at conferences, I have a lot to learn. I faced my fear of speaking, and I do not want to stop here; I truly feel this is just the first of many steps. I want to learn and get more involved in the community. The more I get involved in community events, the less I worry about things that do not matter. A month ago, I was talking to a family member and said how wonderful it would be if someone from the SQL community could give me career advice and mentor me on how to grow as a DBA.

There is a saying: “You will always find what you are looking for!” As I was searching, keeping my eyes wide open for any opportunity, I saw a post from Paul Randal, “Mentoring 2020 – want some help from me?” I thought, now this is what I am looking for! I read the post and decided that, selected or not, I was going to apply. If I am selected, I will have a great opportunity to get valuable advice that can change my career for the better. If I am lucky enough to have such a well-known and experienced person in the industry as my mentor, it will really help me answer some of the questions I have always had.

I have many questions running through my mind that I need clarity on. Some of them I cannot even describe here; I would need to walk through them one by one to get the flow.

If I am lucky enough to be chosen, I would like help with some of these non-technical subjects:

  • Presentation skills and improving public speaking
  • Community exposure: I am interested in getting more involved in the community
  • Work-life balance
  • Getting better at blogging
  • Goal setting
  • Improving my value as a DBA
  • Developing myself so I can mentor others
  • Improving self-confidence
  • Communication skills
  • Dealing with stress

A single piece of valuable advice from this mentor can have a real, positive impact on my life. I am ready to make any number of changes to become a better person. I am really looking forward to this.

Closing this post by saying it again, as I truly believe it:

There are few things in Life which money cannot buy!

Replication Error: ‘Cannot execute as the database principal because the principal “dbo” does not exist, this type of principal cannot be impersonated, or you do not have permission.’

This error is caused intermittently by the Log Reader Agent in transactional replication. When you see the error, the database principal ‘dbo’, that is, the login mapped to the ‘dbo’ user, cannot be impersonated. This happens when the database owner (the login associated with the ‘dbo’ user) is not ‘sa’ but some other Windows or SQL authentication login.
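Before applying the fix, you can confirm which login currently owns each database with a quick query like this:

```sql
-- Show the owner login for every database on the instance
SELECT name AS database_name,
       SUSER_SNAME(owner_sid) AS owner_login
FROM   sys.databases
ORDER BY name;
```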

Solution:

  1. Stop the Log Reader Agent.
  2. Change the ownership of the database using the query below:
ALTER AUTHORIZATION ON DATABASE::Databasename TO sa;
GO

  3. Restart the Log Reader Agent.

Conclusion:

It is always advisable to keep the database owner (the login mapped to the ‘dbo’ user) as ‘sa’ to avoid these intermittent issues with the Log Reader Agent in transactional replication.

Cannot Use Special Principal ‘dbo’ Error: 15404

Everything seems normal until someone yells at a DBA asking why they couldn’t grant database role permissions to a login because of this error.

This error occurs when the login in question is mapped to the ‘dbo’ user in the database. In other words, this login is the owner of the database, which is what causes the issue.

The error appears whenever we try to grant any database role permissions to the login associated with the ‘dbo’ user.

I recreated the error by creating a new database named ‘Test’, creating a new login named ‘TestUser’, and giving authorization on the database to the newly created ‘TestUser’:

CREATE DATABASE [Test]
GO
USE [master]
GO
CREATE LOGIN [TestUser] WITH PASSWORD=N'@eE$9@', DEFAULT_DATABASE=[master]                
GO
ALTER AUTHORIZATION ON DATABASE::[Test] TO [TestUser]
GO

In the login ‘TestUser’ properties, under the User Mapping tab, you should see the login mapped to the user ‘dbo’.

I then tried to add the ‘db_datareader’ database role for the login in the database Test, which throws the error.

Solution:

Change the database owner to ‘sa’, then create a user in the database for the login, and finally add the newly created user to the database roles.

USE [Test]
GO
-- Change the database owner to 'sa' account
ALTER AUTHORIZATION ON DATABASE::[Test] TO [sa]
GO
-- Create a database user for the login, then grant the role
CREATE USER [TestUser] FOR LOGIN [TestUser]
GO
ALTER ROLE [db_datareader] ADD MEMBER [TestUser]
GO

Conclusion:

These kinds of errors seem tricky at first, but once we understand the reason behind them, they are pretty easy to solve. They have a couple of causes, including database migration from older versions of SQL Server: when a database is migrated to a newer version, the ‘dbo’ user may not be associated with the ‘sa’ account. The other cause is changing the authorization of the database to a specific login, which maps the ‘dbo’ user to that login. It is always important to check which login is associated with the ‘dbo’ user in the database. You can check that using the query below:


--check which login is associated with 'dbo' user
select dp.name as DatabaseUser, sp.name as ServerLogin, dp.type_desc, dp.owning_principal_id, dp.authentication_type_desc
	From sys.database_principals dp
		inner join sys.server_principals sp
			on dp.sid = sp.sid
where dp.name = 'dbo';

The result of the query looks like this:

Hope this helps!

Downgrading SSIS packages by changing the package version: why it might not be the right choice!

When we upgrade SQL Server, we also upgrade the DTS/SSIS packages to the latest version compatible with the upgraded SQL Server. SSIS packages for SQL Server 2008/2008 R2 are built with Business Intelligence Development Studio (BIDS), and packages for SQL Server 2012 or higher are built with SQL Server Data Tools (SSDT).

In some situations, developers build SSIS packages using the latest version of Data Tools thinking they will be compatible with lower versions of SQL Server. I recently faced this situation when a developer built a package in the 2012 version of Data Tools and tried to run it on SQL 2008 through a SQL Agent job.

Changing the package version seems like an easy fix, but usually it’s not!

Here’s why.

Once you change the package version (right-click the .dtsx package, open it in Notepad, and change the version from 6, the SQL 2012 value, to 3, the SQL 2008 value), the package will run on SQL 2008 via a SQL Agent job. But what about the execution results? Is the package executing properly? Not always.

To test this scenario, I created an SSIS package named package.dtsx using SQL 2008 BIDS. The package has an Execute SQL Task that runs a simple error-raising script, to see how each version handles errors.

Below is the piece of code I use in the Execute SQL Task for both versions of the package:

RAISERROR('Something went wrong', 16, 1)

Note: For both SSIS packages, in the Execute SQL Task properties, set both FailPackageOnFailure and FailParentOnFailure to True. This makes sure the package fails if the Execute SQL Task fails.

Once I created the package, I executed package.dtsx (created with SQL 2008 BIDS) on SQL 2008/2008 R2 using a SQL Agent job. Ideally, it should throw an error.

I executed the job and, as expected, it failed. This is the expected behavior.

I then created the same package in Data Tools 2012 with the name package_2012.dtsx, opened it in Notepad, changed the package version from 6 to 3, and saved the file.


Now run package_2012.dtsx using a SQL Server Agent job.

Execute the job and check the results: it completes successfully. In this scenario, that is the wrong behavior. Since the Execute SQL Task contains RAISERROR code, the package should throw an error, but it didn’t.

Conclusion: Changing the package version from a higher to a lower number may let lower versions of SQL Server execute the package through a SQL Agent job, but it doesn’t mean the package executes correctly. In this scenario, the package should fail with an error, yet it completes as a success. It is always better to create SSIS packages using the BIDS or Data Tools version that matches the SQL Server version on which the package will run.