Introduction to partitioning

Partitioning is not only a great way to improve performance in large tables, but you can also use it to manage the physical storage of different chunks of data without keeping it in different tables.

First off, to use partitioning on SQL Server, you need the Enterprise Edition, which comes with a hefty price tag and as such is really only suited for more demanding setups.

How partitioning works

Partitioning data means that you split the physical storage of it into different filegroups (which in turn typically means different physical storage locations). This creates a number of performance benefits, because although the table (for instance) is still visible as one table with data, in actual fact, it is stored as a number of tables physically. And smaller tables means better performance.

You can partition a table, a heap or an index. When you partition a table (with a clustered index) it’s actually the clustered index, which makes up the physical storage of the table, that is being partitioned.

Some of the benefits of partitioning your data are obvious, other not:

Using partition elimination, SQL Server can query a single partition to increase performance.
You can apply different data compression to different partitions, depending on your needs.
Data that is frequently updated can reside on storage with good write performance, while older data that typically doesn’t change can be stored on disks with better read performance.
Self-joins (actually, “equi-joins”) can be performed more efficiently because two partitions can be treated as separate tables.
Partitioned data allows for partition switching – an advanced and very, very efficient way to move entire partitions of data between two identical tables/partitions.

Aligned indexes

When you have a partitioned table, you can partition the table’s non-clustered indexes as well. This does not happen automatically, and you can choose different partition schemes for the table and its indexes independently of each other.

However, if you choose a partition scheme for a non-clustered indexes that is essentially equivalent to that of the table, that index is referred to as being “aligned” to its table, which brings a few performance advantages which I won’t go into here, mostly related to switching or moving data in and out of partitions.

Creating a partition function

The first thing you need is a partition function. This is basically the definition of which rows end up in which partition.

--- Create the partition function:
CREATE PARTITION FUNCTION fn_part_left(int)
AS RANGE LEFT FOR VALUES (100, 110, 120, 130);

The “boundary values” for this partition function are 100, 110, 120 and 130. This means that these values are the lower and upper boundaries of each partition.

A partition function can either be RANGE LEFT or RANGE RIGHT. Whether a partition function is “left” or “right” defines on which partition to store values that are exactly equal to a boundary value:

value        RANGE LEFT partition   RANGE RIGHT partition
98             1                      1
99             1                      1
100            1                      2
101            2                      2
102            2                      2
...
128            4                      4
129            4                      4
130            4                      5
131            5                      5
132            5                      5

The boundary values of the example partitions have been highlighted in red above.

For a RANGE LEFT partition function, the first partition will include values up to and including the first boundary value and the last partition will include values above the last boundary value.

For a RANGE RIGHT partition function, on the other hand, the first partition will include values below the first boundary value and the last partition will include values starting at the last boundary value.

To test which partition a given value would end up in, you can use the special $PARTITION function, like this:

SELECT $PARTITION.fn_part_left(123);

.. where fn_part_left, in this case, is the partition function, and 123 is the value we want to test. You can also prefix $PARTITION with a specific database name.

SELECT databaseName.$PARTITION.partFunction(123);

Creating a partition scheme

Second, you need a partition scheme which defines which filegroups are used for each partition in the partition function.

--- Create the partition scheme:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left TO
    ([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]);

Remember, you need one more partition than the number of partition boundaries. So if you have four boundaries, that means you need five partitions. If you want to, you can define all the partitions on the same filegroup using the ALL keyword.

--- Create the partition scheme,
--- all partitions on the same filegroup:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left ALL TO ([PRIMARY]);

Creating a partitioned table or index

When creating tables and indexes, you declare the partition scheme and partitioning column where you would otherwise declare a filegroup, using the ON keyword:

--- Create a table on the partition scheme:
CREATE TABLE dbo.tbl_part_left (
     i      int NOT NULL,
     PRIMARY KEY CLUSTERED (i)
) ON ps_part_left(i);

Partitioning an existing table or index

An existing index or an existing table with a clustered index can be moved to a partition scheme using CREATE INDEX with DROP_EXISTING.

--- Create a standard, unpartitioned table:
CREATE TABLE dbo.tbl_unpart (
    i          int NOT NULL,
    CONSTRAINT PK_tbl_unpart PRIMARY KEY CLUSTERED (i)
) ON [PRIMARY];

--- Populate the table with some example data:
INSERT INTO dbo.tbl_unpart (i)
VALUES (98), (99), (100), (101), (102),
       (103), (104), (105), (106), (107),
       (108), (109), (110), (111), (112),
       (130), (131), (132), (133);

--- Move the existing table to a partition scheme using
--- CREATE INDEX with DROP_EXISTING:
CREATE UNIQUE CLUSTERED INDEX PK_tbl_unpart ON dbo.tbl_unpart (i)
        WITH (DROP_EXISTING=ON)
    ON ps_part_left(i);

This was a very short introduction to partitioning, but as you understand, it’s a fairly deep topic, and there’s lots of room for discussions like partition switching, merging, compression, etc. Stay tuned for more further on!

4 comments

Pingback: Reloading fact tables with zero downtime « Sunday morning T-SQL
Pingback: Partitioned views over table partitioning « Sunday morning T-SQL
Pingback: Viewing the size of your database objects « Sunday morning T-SQL
Pingback: Manual transaction management « Sunday morning T-SQL