Minimally logged operations

Today, we’re going to look at which database operations can be performed with what’s known as “minimal logging”, which can result in a considerable performance boost. Included is a quick overview of transaction logging and what it’s good for.

Practically all operations in SQL Server are logged. This is good for a number of reasons, the most important one being that a failed or cancelled operation can be rolled back to its original state, which is critical to maintaining a database’s integrity. In databases with a FULL recovery model, this also enables you to restore the database to any given point in time, if you’d like to, because you’ve got a complete chronological record of all the changes made. All this logging happens in the transaction log of your database. If you have a FULL or BULK_LOGGED recovery model set and perform regular transaction log backups, the transaction log is cleared (“truncated”) after each transaction log backup. If your recovery model is set to SIMPLE, changes are still logged, but the log the log is truncated at each checkpoint, so you don’t have any “trail” of your operations, and you can’t go back and undo them. Instead, this will potentially save you a lot of disk space and you won’t have to make backups of the transaction log.

Minimally logged operations

Not all operations need to be logged with full detail, though. Suppose you want to bulk insert a lot of data into an empty table, on which you’ve placed an exclusive table lock. If your insert crashes at any point (datatype error, duplicate key, whatever), you don’t need to know the specifics of all the data you inserted, just to roll back your operation (i.e. to delete everything from the table). All you need to know is that the table was empty when you started, and that nobody else has meddled with it while your insert was running. So you can pretty much empty the table and you’re back where you started. This is a simplified example of a minimally logged operation – the data itself isn’t logged row-by-row, only the fact that you started an insert and that the table was empty when you did. This can spell the difference between many gigabytes of logging and just a few 8 kB pages, which makes a potentially huge difference in query performance.A few things to keep in mind:

For an operation to be minimally logged, the recovery model of your database must always be BULK_LOGGED or SIMPLE.
Transactional replication cannot be used, because it uses the transaction log to identify changes.
Backups use LSN (log sequence numbers) to identify which transactions to include in the backup. While a backup is running, you won’t be able to use minimal logging.

Types of minimally logged operations

The following operations can be minimally logged. Mind you, this is not a complete list – there are other operations, but I wouldn’t want to confuse you with all the details.

Bulk insert, bulk copy and INSERT INTO … SELECT

These three methods are minimally logged if a number of strict criteria are met. The official MSDN article on the different conditions that need to be true is a bit complicated, so I’ve made an attempt at translating it to T-SQL. The following query will show you which tables in a database can use a minimally logged bulk insert:

WITH sysTables (objectName, isPublished,
                isClustered, hasNonClustered, hasData)
AS (
    SELECT sch.[name]+'.'+tbl.[name] AS objectName,
    --- If the table is published for replication:
    tbl.is_published AS isPublished,
    --- If the table has a clustered index:
    MAX((CASE WHEN ix.type_desc='CLUSTERED' THEN 1 ELSE 0 END)) AS isClustered,
    --- If the table has non-clustered indexes:
    MAX((CASE WHEN ix.type_desc='NONCLUSTERED' THEN 1 ELSE 0 END)) AS hasNonClustered,
    --- If the table contains any rows:
    (CASE WHEN parts.[object_id] IS NOT NULL THEN 1 ELSE 0 END) AS hasData
    FROM sys.schemas AS sch
    INNER JOIN sys.tables AS tbl ON sch.[schema_id]=tbl.[schema_id]
    INNER JOIN sys.indexes AS ix ON tbl.[object_id]=ix.[object_id]
    LEFT JOIN (
        SELECT DISTINCT [object_id]
        FROM sys.partitions
        WHERE [rows]!=0) AS parts ON tbl.[object_id]=parts.[object_id]
        GROUP BY sch.[name], tbl.[name], tbl.is_published, parts.[object_id])

SELECT objectName,
    --- If data pages are minimally logged:
    (CASE WHEN isPublished=1 THEN 'NO - is replicated'
          WHEN isClustered=0 THEN 'YES'
          WHEN isClustered=1 AND hasData=1 THEN 'NO - has clustered ix & data.'
          WHEN isClustered=1 AND hasData=0 THEN 'YES'
     END) AS dataPages,
    --- If index pages (non-clustered) are minimally logged:
    (CASE WHEN hasNonClustered=0 THEN '-'
          WHEN isPublished=1 THEN 'NO - is replicated'
          WHEN hasData=0 THEN 'YES'
          WHEN hasData=1 THEN 'NO - has data.'
     END) AS indexPages
FROM sysTables
ORDER BY objectName;

The two columns dataPages and indexPages represent whether actual data page updates or index page updates respectively are minimally logged. Apart from the above conditions, you’ll also need to establish a table lock, using the TABLOCK hint.

Note that INSERT INTO … SELECT is only minimally logged as of SQL Server 2008 and newer.

Trace flag 610

Actually, minimally logged inserts can be performed on tables that already have data and a clustered index, using a trace flag.

DBCC TRACEON(610, -1);

Trace flag 610 allows SQL Server to use minimal logging when creating new pages. It also eliminates the need for an explicit TABLOCK hint. However, not all INSERTs are going to allocate new pages, and those that use existing pages are still logged. In this sense, you could consider trace flag 610 a hybrid solution that allows for partially minimally logged inserts.

As with all trace flags, use it to solve a specific problem, not as a general rule-of-thumb!

SELECT … INTO

A regular

SELECT {column list} INTO {target table} ...

is also minimally logged. However, there really isn’t any point in using this syntax, since SQL Server versions starting with 2008 allow minimally logged inserts using INSERT INTO … SELECT (see above).

The main drawback of SELECT … INTO is that it creates the table in the same transaction that inserts the data. Creating the table within a transaction locks the database schema, which means you cannot access the system DMVs sys.objects, sys.indexes, etc. In some cases, other users won’t be able to create or alter tables or indexes while your operation is running.

Some DDL statements, including CREATE and ALTER INDEX

Creating or rebuilding an index is similar to building and populating a table from scratch, so these operations can also be minimally logged.

What about foreign key constraints?

Logging occurs on changes in the database and shouldn’t be confused with locking. When the target table of an INSERT has foreign key constraints, SQL Server needs to assert (validate) that the inserted values do not conflict with the foreign keys, but this doesn’t cause logging. It only locks the foreign key data, so that it cannot change while the INSERT is running.

Minimally logged operations