With the Invitation of Brent Ozar for this month T-SQL Tuesday, I would like to share some thoughts on least favorite data type in SQL Server. Thank you Brent for hosting March month of T-SQL Tuesday.
Though VARCHAR(MAX) is suitable in situations with large strings of data, it has its own complications that we need to consider. In my career as a DBA, I at least saw couple of times SQL developers using VARCHAR(MAX) when they should not. Fixing the datatypes once in production is painful and causes risks.
When the data is stored in VARCHAR(n) datatype column, these values get stored in standard data page. VARCHAR(MAX) uses IN_ROW_Data up to 8000 bytes of data but if the data in more than 8000 bytes, it uses LOB_Data page and a pointer (reference) is stored in the IN_ROW_Data_page. We cannot create an Index on the VARCHAR(MAX) data type as a key column in the index. You can add the column of VARCHAR(MAX) as an included column but that wouldn’t be seekable and will duplicate the column completely which is lot of storage. We cannot compress the LOB data and the data retrieval from the LOB data is relatively slow. If the LOB data is stored as IN-ROW, it can be compressed but only when we compress the table with the PAGE level compression. LOB data cannot be compressed if you use ROW level compression and doesn’t depend if the LOB data is stored as in row or out of row.
Any variable you declare in VARCHAR(MAX) as the datatype will automatically spills to Tempdb which can cause performance problems. If you are using the VARCHAR(MAX) datatype column in your queries and sorting them, that requires so much memory to sort this data which may cause the memory issues. For selecting data having this datatype more than 8000 bytes can cause several off page I/O’s.
If you are running queries and searching on column, that column should always be used as a fixed length column instead of VARCHAR(MAX) data type. If your row size can be managed to be less than 8000 bytes, that will store the data IN_ROW_Data_page which can avoid any page splits and avoids locking or latching during the updates. Limiting the length of the strings is really important for designing any database. Using VARCHAR(n) is recommended over VARCHAR(MAX) when ever possible.
I am really looking forward to what other SQL family members have to say about their least and the most favorite data types!