I’m an outspoken advocate of always using a clustered index on each and every table you create as a matter of best practice. But even I will agree that there’s a case for using the odd heap now and then.
The great thing with (clustered) indexes
There are numerous advantages to clustering your table over just using a heap,
- If you put a little thought into the clustered index, your most common and/or heavy queries won’t have to scan the table. Instead, they’ll use the clustered index (or a suitable non-clustered index) to seek directly to the correct row or range.
- Similarly, the data is already ordered in the table.
- Indexes often help with joins as well.
- Indexes come with statistics, which helps SQL Server build better query plans.
- Inserting records into a heap (row by row – think SSIS transformation) may leave you with half-full pages because SQL Server stores just a rough figure of how much of the page is actually used, not the exact number of bytes.
- Updating rows may cause page splits in a clustered index, but at least you won’t have forwarded records like you get in heaps.
- Updates on heaps might also change the RID on the row (if it has to move to another page), which requires updating every non-clustered index on the table.
- Deleting rows from a clustered table will clean up the “ghost” records, whereas it will leave big gaping holes in a heap.
What about heaps?
So, you might ask, when should I use heaps? The slightly simplified answer: when most of the conditions below are met:
- if you want to insert most of the data in a single statement or using a table lock,
- if you don’t want to have to order the data when it goes into the table,
- if you don’t need the data ordered or aggregated (except scalar aggregates) in any particular way when it goes out,
- if you’re going to read most of the data at once,
- if you’re not going to join the table to another table of significant size (i.e. more than a few rows).
- if you don’t have to share the table or even database with a significant number of other user sessions (to avoid PFS or IAM page contention).
You think the above sounds a bit like an edge case? It is. And with all this in mind, you might still get worse performance with heaps than with a clustered index. Your mileage will vary.
For every other situation, do yourself and the world a favor and add a clustered index to every table. It doesn’t even have to be unique if that’s a problem.
How do I look for heaps?
Here’s a quick way to identify all the heaps in a given database:
SELECT s.[name]+N'.'+t.[name] AS [Table], i.[type_desc] AS [Index/heap], (CASE WHEN MAX(p.partition_number) OVER ( PARTITION BY p.[object_id])>1 THEN p.partition_number END) AS [Partition], p.data_compression_desc AS [Compression], p.[rows] AS [Row count] FROM sys.schemas AS s INNER JOIN sys.tables AS t ON s.[schema_id]=t.[schema_id] INNER JOIN sys.indexes AS i ON t.[object_id]=i.[object_id] AND i.index_id IN (0, 1) INNER JOIN sys.partitions AS p ON i.[object_id]=p.[object_id] AND i.index_id=p.index_id ORDER BY s.[name], t.[name];
Now, go forth and cluster.