Data Page Internals: Structure of the Data Row

When I was preparing for my presentation “Method Behind the Magic of Internals in SQL Server”, I had a plan to write a blog post about the Data page internals and I am finally able to write it. Here are some little nuggets about Data page internals in SQL Server.

Black and Red Typewriter on White Table

SQL Server stores the data in the form of 8kb pages. Each page is interlinked between to the previous page and the next page number. There are two types of pages in SQL Server where data gets stored. Pages that contain the actual data and pages that contain the data about all other pages, meaning pages about the metadata information. We will have a separate blog post of types of pages we have in SQL Server. No matter how many blog posts we read on this topic, it never gets old 🙂

Lets quickly see the structure of a page.

This image has an empty alt attribute; its file name is image-27.png

Page Header is always 96bytes in size. This area contains the metadata about the page itself like the page number, previous page, next page and much more metadata information about the page (A separate blog post about this in near future).

Pay load area is where the actual data gets stored in the slots. When you check the bottom of the page, that’s the Row offset array which have the 2-byte pointers pointing to the exact space where the corresponding row begins on the page.

Data inside the payload area might not always be in the physical order of the data rows but always be logically ordered in the row offset array in the index sort order. SQL Server always read the row offset array from the right to left. The first slot array is 0 which have a pointer pointing to the lowest key value on the page and the second slot array has the pointer pointing to the next lowest key value and so on.

There are main datatypes in SQL Server, Fixed length and the variable length data types. Fixed length data types always store the same space regardless of having null or not null values. With the variable data types, SQL Server will use the required space and additionally two extra bytes.

Each byte you see in the picture has a purpose. The first two blocks containing a byte, Status Bit A and Status Bit B contains the bitmap information about the row, like if the row is logically been deleted/ghosted, row type information, versioning tag, if the row contains any NULL values, Variable length columns. The next 2 bytes is used for storing the length of the fixed length data. The next n bytes are for storing the fixed length data itself. There is a null bitmap after that which will have both the 2-byte column count in the row and null bitmap array. Regardless of if the column in null or not, each and every column will have one bit per every column.

Next is for the variable length data portion where the first 2 bytes here is for the number of variable length column in the row. The next n bytes is for the offset array of the variable column. Even when the value of the column is Null, SQL Server will still stores the offset value of 2-bytes per each variable length column. Next is the actual data of the variable length data. Finally, the 14 byte size versioning tag especially for the row versioning. This is used for the optimistic isolation levels and for the index rebuilds.

Let’s go ahead and create a brand new table and insert two rows to see how they actually look like. By using the DBCC command, let us first view all the pages for the table.

use dbanuggets
go
create table dbo.dbanuggets
(
    ID int not null,
    nugget1 varchar(255) null,
    nugget2 varchar(255) null,
    nugget3 varchar(255) null
);

insert into dbo.dbanuggets(ID, nugget1, nugget3)  values (1,replicate('d',10),replicate('b',10));
insert into dbo.dbanuggets(ID, nugget2) values (2,replicate('a',10));

/* dbcc ind ('databasename','dbo.tablename',-1 /* Displaying info about all pages from the table */);*/

dbcc IND ('dbanuggets', 'dbo.dbanuggets', -1)

To view the page header and the column information, Turn the Traceon(3604):

dbcc traceon(3604);
dbcc page
(
    'databasename' --databasename
    ,1 -- File ID
    ,11712 -- Page ID
    ,3 -- Output mode: 3 - display page header and row details
);
dbcc traceon(3604);
dbcc page ('dbanuggets',1,360,3)

When you check the first record here, it has a column Pagetype as 10 which is a metadata page IAM (Index allocation map). This page will track what all pages belongs to a particular table. The next record which have the PageType as 1 is the actual data page which contains the data rows.

There are two pages that belong to the table. The first one with PageType=10 is the special type of the page called IAM allocation map. This page tracks the pages that belong to particular object. Let’s not focus on it now – we will cover allocation map pages in one of the following blog posts.

Check the row structure here:

When you see, the 3rd one in the row is a two byte value of 0800. This is a byte swapped to the value 0008 which is the offset tells the SQL Server where the fixed-length part of the row ends and offset for the columns in the row. Next 4 bytes is used to store the ID column here which is a fixed length data. The next 4 bytes are used to store the fixed length data, the ID column. Next, the 2 byte value shows how many columns we have, here we have 4 columns. Next, we have one byte Null bitmap. As we have 4 columns, we have one byte bitmap. The value shown here as 04 which is 00000100 in the binary format. It represents that the third column in the row contains NULL value. The next 2 bytes shows the number of the variable length columns which is 0300 in the swapped order which means 3 columns. Next contains the offset value where the column data ends. Next comes the actual data for the variable length columns.

second row look likes this:

Row structure is a bit different here:

When you observe here, the nuggets1 and nuggets3 columns are having null values. The NULL bitmap in the second row shows the binary value 00001010 which tells that the nuggets1 and nuggets3 columns are NULL. In the variable length columns indicates there are only two columns. SQL Server will not have the information about the NULL variable length column in the data row. We can manage the size of the data row by making sure we add the null value columns at the last when declaring the table definition and this is when we use the variable length columns that will have many null values.

There is 8,060 bytes available on the single data page. Null bitmap is always there for the heap tables and clustered index leaf level rows even when the table does not have the nullable columns. If there are no nullable columns in the index, then the non clustered indexes do not have bitmap values.

Summary:

In this post, we have learned how the data page look like and the different parts of the data page. We have also learned about how the data row structure look like with the examples. How having null values doesn’t take space in the row data but then managed by the SQL Server as the Null bitmap value. I will be posting more about the Page internals in the future blog posts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s