Analyzing partition usage and skewing

I sometimes want to know how my data is spread across different partitions in a table or index – after all, this can affect performance and storage a great deal, and if the data is really badly skewed, most or all of it could be stuck in a single partition, rendering the partitioning scheme pretty much useless in the first place.

You can use dynamic management views to find out how your data is spread across different partitions, and how those partitions are delimited, in “plain english”. Here’s how!

Here’s the basic information we’re looking for:

SELECT * FROM sys.partition_range_values;

This DMV contains one row for each partition, for each partitioning function. The “value” column of sys.partition_range_values is a sql_variant value, so if you want to use or display it, you’ll have to explicitly convert it to another datatype first.Here, we’ve cleaned up the output data a bit for readability:

SELECT val.function_id, val.boundary_id, pf.boundary_value_on_right,
    (CASE SQL_VARIANT_PROPERTY(val.[value], 'BaseType')
        --- Convert to "native" date format:
        WHEN 'date' THEN
            '{d '''+LEFT(CONVERT(varchar(max), val.[value], 120), 10)+'''}'
        --- Convert to datetime (timestamp) format:
        WHEN 'datetime' THEN
            '{ts '''+CONVERT(varchar(max), val.[value], 120)+'''}'
        --- Everything else is just plain CAST() to varchar:
        ELSE CAST(val.[value] AS varchar(max))
    END) AS boundary_value
FROM sys.partition_range_values AS val
INNER JOIN sys.partition_functions AS pf ON val.function_id = pf.function_id
ORDER BY val.function_id, val.boundary_id;

Partitioning functions come in two distinctly different flavours, defined by whether the “boundary value” of each partition is on the right or left (lower or upper) end of the partition. This is indicated by a bit column in sys.partition_functions, boundary_value_on_right. If the boundary value for a given partition is 100 and the boundary is on the right (upper) side, all values in this partition will be less than 100. However, if the boundary is on the left side, the partition will contain values up to and including 100. Conversely, the next partition will start at 100 if the boundary is on the right, and at 101 if the boundary is on the left.

Now that we have all the partitioning boundaries along with the knowledge if the partition function is bounded “right” or “left”, we can put this query in a common table expression and join the lower and upper boundary for each partition together.

We’ll use a FULL JOIN here, because the first partition will not have a lower boundary, and the last one doesn’t have an upper boundary.

Here’s the query:

WITH spc (data_space_id, function_id, boundary_id,
          boundary_value_on_right, boundary_value)
AS (--- Common table expression containing all partition boundaries
    SELECT ps.data_space_id,
        val.function_id,
        val.boundary_id,
        pf.boundary_value_on_right,
        (CASE SQL_VARIANT_PROPERTY(val.[value], 'BaseType')
            WHEN 'date' THEN
                '{d '''+LEFT(CONVERT(varchar(max), val.[value], 120), 10)+'''}'
            WHEN 'datetime' THEN
                '{ts '''+CONVERT(varchar(max), val.[value], 120)+'''}'
            ELSE CAST(val.[value] AS varchar(max))
        END) AS boundary_value
    FROM sys.partition_range_values AS val
    INNER JOIN sys.partition_functions AS pf ON
        val.function_id = pf.function_id
    INNER JOIN sys.partition_schemes AS ps ON
        pf.function_id=ps.function_id)

--- ... joined together in a FULL JOIN:
SELECT ISNULL(a.function_id, b.function_id) AS function_id,
    ISNULL(a.data_space_id, b.data_space_id) AS data_space_id,
    ISNULL(a.boundary_id+1, b.boundary_id) AS boundary_id,
    a.boundary_value+(CASE
        WHEN a.boundary_value_on_right=1 THEN '<='
        WHEN a.boundary_value_on_right=0 THEN '<'
     END) AS lower_rng,
    (CASE
        WHEN b.boundary_value_on_right=1 THEN '<'+b.boundary_value
        WHEN b.boundary_value_on_right=0 THEN '<='+b.boundary_value
     END) AS upper_rng
FROM spc AS a
FULL JOIN spc AS b ON
    a.function_id=b.function_id AND
    a.boundary_id+1=b.boundary_id
ORDER BY 1, 3;

Now, all that remains to do is to connect this query to actual tables and indexes. Note that partitioning functions can be applied to indexes and tables independently, so you’ll have to make sure not to mix those up!

The way to do this is by using the data_space_id column in sys.partition_schemes, which maps to the data_space_id column in sys.indexes. From the index, in turn, you’ll find the table.

Finally, in this example, we’ll also LEFT JOIN sys.partitions, which contains one row for each partition in the database, to get information about the (estimated) number of rows in each partition as well as if we’re using any data compression on it.

--- Partition boundaries...
WITH spc (data_space_id, function_id, boundary_id,
          boundary_value_on_right, boundary_value)
AS (    SELECT ps.data_space_id,
            val.function_id,
            val.boundary_id,
            pf.boundary_value_on_right,
            (CASE SQL_VARIANT_PROPERTY(val.[value], 'BaseType')
                WHEN 'date' THEN
                    '{d '''+LEFT(CONVERT(varchar(max), val.[value], 120), 10)+'''}'
                WHEN 'datetime' THEN
                    '{ts '''+CONVERT(varchar(max), val.[value], 120)+'''}'
                ELSE CAST(val.[value] AS varchar(max))
             END) AS boundary_value
    FROM sys.partition_range_values AS val
    INNER JOIN sys.partition_functions AS pf ON
        val.function_id = pf.function_id
    INNER JOIN sys.partition_schemes AS ps ON
        pf.function_id=ps.function_id),

--- Ranges, i.e. lower and upper bounds of a partition:
     rng (function_id, data_space_id, boundary_id, lower_rng, upper_rng)
AS (    SELECT ISNULL(a.function_id, b.function_id),
            ISNULL(a.data_space_id, b.data_space_id),
            ISNULL(a.boundary_id+1, b.boundary_id),
            a.boundary_value+(CASE
                WHEN a.boundary_value_on_right=1 THEN '<='
                WHEN a.boundary_value_on_right=0 THEN '<'
             END) AS lower_rng,
            (CASE
                WHEN b.boundary_value_on_right=1 THEN '<'+b.boundary_value
                WHEN b.boundary_value_on_right=0 THEN '<='+b.boundary_value
             END) AS upper_rng
    FROM spc AS a
    FULL JOIN spc AS b ON
        a.function_id=b.function_id AND
        a.boundary_id+1=b.boundary_id)

--- ... joined with sys.indexes and sys.tables:
SELECT p.[object_id],
    SCHEMA_NAME(t.[schema_id])+'.'+t.[name] AS [object_name],
    ix.index_id,
    ix.[name] AS index_name,
    pf.[name] AS pf_name,
    rng.boundary_id,
    ISNULL(rng.lower_rng, '')+'key'+ISNULL(rng.upper_rng, '') AS boundary,
    p.[rows],
    p.data_compression_desc
FROM rng
INNER JOIN sys.partition_functions AS pf ON
    rng.function_id = pf.function_id
INNER JOIN sys.indexes AS ix ON
    ix.data_space_id=rng.data_space_id
LEFT JOIN sys.tables AS t ON
    ix.[object_id]=t.[object_id]
LEFT JOIN sys.partitions AS p ON
    ix.[object_id]=p.[object_id] AND
    ix.index_id=p.index_id AND
    rng.boundary_id=p.partition_number
ORDER BY t.[schema_id], t.[name], p.index_id, rng.boundary_id;

The one thing I couldn’t figure out from the DMVs are what file group each respective partition uses, but I suppose that this isn’t as important anyway.

Let me know in the comments if you find this article useful, and if there’s anything I’ve left out.

sqlsunday.com

T-SQL tips and tricks, best practices and query plans from the field.

Analyzing partition usage and skewing

Let me hear your thoughts! Cancel reply

Rate this:

Share:

Let me hear your thoughts! Cancel reply