SQL Server database engine architecture

by , under SQL Server

The SQLOS was introduced in SQL 2000 and CLR intregration occurred in SQL Server 2005. The database engine has been stable for many releases with few fundamental changes. SQL Server 2016 includes the integration of an R language processor as an external process callable from the SQL Server database engine in much the same way that the CLR integration works. Microsoft indicated during the PASS Summit 2015 that the integration of Revolution R was done to facilitate the addition of other language processors in the future.
Figure 1 shows a detailed block diagram of the SQL Server database engine architecture. We do not discuss every component in the diagram but rather focus on the areas that will help you understand the relationship between SQL Server and the storage components that it accesses during operations.

The four major components of the SQL Server architecture are: protocol layer, SQLOS, query processor (relational engine), and storage engine.

Protocol layer

The protocol layer manages the communication between clients and the database engine. .NET libraries on both the client and server computers use a Microsoft defined messaging format called Tabular Data Stream (TDS) to exchange requests and results. The protocol layer encapsulates TDS in a standard communication protocal such as TCP/IP or Named Pipes. When the TDS messages that orginate from the client are unpacked from the communication protocol, they are handed off to a command processor that is part of the relational engine. Results sets sent back to the client as TDS are unpacked and processed by the client application.


SQLOS is an abstraction layer that lies between the SQL Server database engine, Windows and any external components such as the CLR and the R language processor. It is responsible for functions including thread scheduling and memory management typically performed by Windows for other applications. SQLOS does not have any abstration for I/O requests. Threads used by SQL Server issue mostly asynchronous read and write requests directly to Windows and then wait for the results.
The SQLOS queries Windows at startup to detemine the number and type of CPUs (NUMA, non-NUMA). It uses this information to create theads and schedulers to handle multiple simultaneous requests while it hides the details from Windows to prevent unnecessary context switching for long running processes.
The database engine constantly communicates with Windows through the SQLOS to request and respond to memory allocation adjustment requests. Windows is responsible for negotiating memory allocations for all processes running on the server and makes adjustments based on requrests from applications as well as its own needs and the total amount of memory available to Windows. Windows grants memory requests from SQL Server as long as there are sufficient available memory resources on the server. If Windows receives more requests for memory than it can grant, the OS may try to negotiate with SQL Server to reduce the amount of memory allocated.
The SQLOS manages its memory resources dynamically in negotiation with Windows using an internal object called the buffer pool. All memory in the buffer pool that is not used by an internal process such as the procedure cache or client connections, and so on, is allocated to a data cache used by the storage engine for buffering data and index pages. The data cache is typically the largest consumer of memory from the buffer pool. The SQLOS uses a memory broker to efficiently adjust memory allocations from the buffer pool to the many internal services that need dynamic memory allocations.

Query processor

The Query Processor, shown in Figure 1, is also referred to as the relational engine.
The main responsibilites of the relational engine are:

  • Validating T-SQL statements.
  • Parsing SQL statements by breaking them down into keywords, parameters, operators, and identifiers, and creating a series of smaller logical operations.
  • Optimizing the execution plan, which consists of finding an acceptable plan from the list of candidate plans that it determines can perform the tasks required. The relational engine estimates the cost of the processing steps based on internal metrics including estimated memory usage, CPU utilization, and the number of required I/Os based on statistics for a set of competing plans until further optimziation is determined to be more expensive than execution. The optimizer does not guarantee that the selected plan is the best but is good enough to indicate that further optimization is not warranted. Plans that are used from cache and plans that are considered trival require optimization.
  • Processing Data Definition Language (DDL) and other statements, such as SET statements, to set connection options and the CREATE statements to create objects in a database.
  • Formatting results returned to the client. The results are formatted as either a traditional, tabular result set or as an XML document. The results are then encapsulated in one or more TDS packets and returned to the application.

Storage engine

The SQL Server storage engine interacts with the relational engine to provide services to end users. From the perspective of the user and the DBA, the functioning of the storage and relational engines are indistinguishable. However, for IT professionals who design and manage applications, a basic understanding of these internals can be instrumental in understanding SQL Server behavior and problem troubleshooting.

  • The main functions of the storage engine are:
  • Managing the data cache buffers and I/O to the physical files
  • Controlling concurrency, managing transactions, locking, and logging
  • Managing the files and physical pages used to store data
  • Recovering from system faults

The relational engine decides which data satisfies a request and the storage engine makes the data available. The storage engine is also responsible for maintaing data integrity to prevent simulaneous requests from interfering with each other.

This high level time line shows how the relational engine and the storage engine work together to satisfiy a request:

  1. Data access activity begins with a query, whether it originates from a user interface or from an automated task. The data request is passed from the protocal stack into the relational engine.
  2. The relational engine compiles and optimizes the request into an execution plan. The plan consists of a series of steps that is required to retrieve and process the data into a result that the client can consume.
  3. The relational engine runs the execution plan. The execution steps that involve accessing tables and indexes are sent to the storage engine.
  4. The storage engine returns data to the relational engine where it is combined and processed into the final result set and returned to the protcol stack.
  5. The result set is sent back to the user via the protocol stack.

SQL Server logical components

The SQL Server database engine was originally designed to effectively support normalized database design. A number of enhancements, especially in the last few releases, have greatly improved performance for data warehouse workloads which are typically de-normalized.

Normalization is the process of removing redundancies from the data. Transact-SQL queries then recombine the table data using relational join operations. By avoiding the need to update the same data in multiple places, normalization improves the efficiency of an application and reduces the opportunities to introduce errors due to inconsistent data. The SQL Server logical architecture defines how the data is logically grouped and presented to the users. The core components in this architecture are:

  • Tables: Tables consist of one or more data pages. The table columns define the number and type of data that may be stored in the table. Each instance of the columns is stored as a row. The rows of a table may be ordered on the data pages and stored on disk according to the value of a clustered index key. Tables that do not have a clustered index key are stored on pages and disk in an unordered structure, also known as a heap. The storage of tables with a clustered index uses a binary tree or b-tree structure.
  • Indexes: A non-clustered index defines a key value made up of one or more columns of a table and stored as a b-tree. Additional data from the table that is not part of the index key can also be included in the b-tree. An index can speed up access to data when the data can be searched for by the value of index key. Additional performance gains can be attained if all the data required by the query is contained in the leaf of an index as either part of the key or non-key included columns. This prevents the relational engine from performing an additional lookup from the parent table.
    Indexes require additional storage and maintenance that can be non-trivial and can adversely impact Insert, Update, and Delete performance. Indexes that are rarely or never used by the optimizer have costs with no corresponding benefit.
  • Views: A view is a virtual table or a stored query. Views primarily assist application developers by reducing the number and complexity of queries to retrieve commonly accessed data that requires multiple tables to be joined, sorted and or filtered. A view may also be indexed and potentially used by the optimizer to speed up data access.
  • Stored procedures: A stored procedure is a group of Transact-SQL statements compiled and stored in a single execution plan. Coding business logic into stored procedures creates a single point of control to ensure that business rules are correctly enforced. The use of stored procedures and proper parameterization is considered a best practice because of the execution plan efficiency, and prevention of SQL Injection
    Stored procedures can also improve performance through the reuse of cached execution plans. SQL Server has an efficient algorithm to find any existing execution plans for any specific SQL statement. Cached execution plans can become stale or not be optimized for all values of user supplied input parameters. There have been enhancements to how parameters are treated for cached plans such as optimize for unknown optimizer hint. SQL Server provides
    SQL Server logical components numerous performance monitor counters and dynamic management views that you can access to determine if your stored procedures affect performance positively or negatively.
  • Constraints: Constraints are commonly used to enforce data integrity in a database including referential, data, and unique keys or indexes.
  • User-defined functions: Like functions used in other programming languages, SQL Server supports user-defined functions as named routines that accept parameters, perform an action, such as a complex calculation, and return the result of that action as a value. The return value can either be a single scalar value or a result set.
  • Triggers: A trigger is a stored procedure that automatically executes when an event occurs in the database server. DML triggers execute when a user tries to modify data through a data manipulation language (DML) event. DDL triggers execute in response to a variety of data definition language (DDL) events. These events primarily correspond to Transact-SQL CREATE, ALTER, and DROP statements, and certain system stored procedures that perform DDL-like operations.

Not all database designs follow strict normalization rules. A database that is used primarily for decision support (as opposed to update-intensive transaction processing) may not process many redundant updates. The database structure may be more understandable and efficient for decision support queries if the design is not fully normalized. De-normalization can also help the query optimizer be more efficient for typical data warehouse queries.

The impact of database design on performance cannot be overstated. Databases that are not normalized are a more common design problem for OLTP workloads than having data structures that are over-normalized. Starting with a normalized design and then selectively de-normalizing tables for specific reasons may be the best strategy.

SQL Server physical components

The SQL Server physical components determine how the data is stored in the file system of the operating system. The selection of the number and types of table columns and index design has a major impact on the requirements for physical storage.

Database file types

SQL Server uses three types of files:

  • Primary data files: Every database has one primary data file that stores data as well as information about other files used by the database.
  • Secondary data files: A database can have zero or more secondary data files. Secondary data files are not required, and a database can have many secondary files or none. By convention, a secondary data file has an .NDF extension.
  • Log files: Each database has at least one or more log files independent of the number of data files. Log files store the write ahead transaction log information that is needed to recover transactions for the database. By convention, a transaction log file has an .LDF extension.

Data files

Data files store the 8K pages used by SQL Server. A SQL Server page is the basic unit of logical data storage. A page begins with a 96-byte header that contains system information about the page. The disk space allocated to the primary or secondary data files (MDF or NDF) is logically divided into pages.

The most common types of pages that can be allocated to a data file are:

  • Data pages
  • LOB pages
  • Index pages
  • Page free space (PFS) pages
  • Global allocation map and shared global allocation map (GAM and SGAM) pages
  • Index allocation map (IAM) pages
  • Bulk change map (BCM) pages
  • Differential change map (DCM) pages

Figure 2 shows the relationship of these major page types in a data file:



Extents are the basic units for allocation of space. Each extent has eight physically adjacent pages (64 KB). A new table or index is usually allocated pages from mixed extents. Uniform extents are used for subsequent allocations after the object grows beyond eight pages.

Some SQL Server features use extents. For example, database snapshots allocate new space in their associated NTFS sparse files using extents, even if only one page in the extent has changed. This allows SQL Server to put pages for subsequent changes from that extent in an adjacent location on disk to improve performance when reading from the sparse file. Also, when using differential backups, SQL Server uses the differential change map pages in the data file to identify any extents that have been modified since the last full back. The backup engine then copies those changed extents to the differential backup file. On restore, the full backup is used to create a new copy of the database and then changed extents that are stored in the differential backup are used to overwrite the extents from the full backup that have changed.

File groups

Every database has a PRIMARY file group. Most databases have only the PRIMARY file group and one data file. User-defined file groups can be created to group data files together for administrative, data allocation, and placement purposes. A file group can contain one or more data files.

At any time, one file group is designated as the default file group. When objects are created in the database without being assigned to a file group, they are assigned to the default file group. The files in the default file group must be large enough to hold any new objects not allocated to other file groups. The PRIMARY file group is the default file group unless it is changed by using the ALTER DATABASE statement.

Historically, when SQL Server was deployed on servers that had a limited number of direct-attached disks, DBAs used multiple file groups and files to spread I/O across physical disks. It was common to create large tables and their corresponding indexes on different file groups so that the files could be allocated to different physical disks.

With the widespread adoption of RAID controllers and intelligent storage arrays, the added complexity of such detail object placement does not lead to better performance. There are, however, several reasons to use multiple files and file groups for a database including:

  • Use of In-Memory tables requires the creation of a Memory Optimized file group.
  • Filestream requires a filestream file group.
  • Create multiple equal sized files in one file group because of a high page allocation rate. SQL Server will spread new allocations for objects created this way using a proportional fill algorithm. This technique is widely used for TEMPDB data files for instances that have high object creation and deletion rates. This can alleviate wait times associated with page latch waits on the data file allocation pages. It can also be necessary for user databases that have similar characteristics.
  • Separate table and index partitions into read/write and read-only file groups for data warehouse applications. Read-only file groups only need to be backed up once. Subsequent backups can ignore the read-only file groups because that data does not change.
  • SQL Server Enterprise Edition includes a feature that allows for piecemeal recovery of the database using multiple file groups. On restore, SQL Server Enterprise Edition can make the database available after the primary file group has been brought online. The other file groups can be brought online in any order.
  • Use file group for partitioned table management. Partitions can be assigned to different file groups using different classes of storage. Partitions can be switched in and out of the table for better data loading and archiving.

Transaction logs

The transaction log records changes made to a database and stores enough information to allow SQL Server to recover the database. Recovery reconciles the data in the data files with changes recorded in the transaction log. The recovery process happens every time the server instance is restarted and optionally when a database or log restore occurs.

Physically, the transaction log consists of one or more files configured as a “circular log”. If multiple log files exist you can think of them as being a single concatenated space. There are no parallel log operations when multiple files are used and typically no performance advantages. The use of multiple log files usually occurs when the existing log file needs to be extended but there is no allocated space on the Windows device. By adding another log file, the additional space is available to the log writer.

The buffer manager in the storage engine guarantees that the log file will be written to before the change is made to the database (called write ahead logging). Writes to the log are asynchronous, however, the storage engine must receive a successful response from the log write operation at the end of a transaction before acknowledgement is made to the client that the request was successful. The log write buffer can hold up to 60K of log data. The buffer is flushed at the completion of each transaction or when the buffer is full, whichever occurs first.

Information in the log is stored in variable length records that are uniquely identified by a Log Sequence Number. The log file contains information regarding:

  • The start and end of each transaction
  • Data modifications
  • Extent and page allocations and de-allocations
  • Creation and elimination of a table or index

Each log file consists of a number of virtual log files based on the initial size of all transaction log files and the growth increment set for auto expansion. Virtual log files are the smallest unit of truncation for the transaction log. When a virtual log file no longer contains log records for active transactions, it can be truncated and the space becomes available to log new transactions. Figure 3 shows how circular logging uses multiple virtual logs in a single physical file.



Source : Microsoft SQL Server Best Practices and Design Guidelines for EMC Storage Solution Guide – Part number H14621 (December 2015, EMC Corporation)

Leave a Reply