Efficient data, part 5: Datatypes

This week’s installment in the series on efficient data is all about data types. Be smart about choosing them; The right type for the right job will not only store your data in less physical space and enforce data integrity better, but smaller indexes and tables will also improve overall query performance.

The choice of data type affects the storage size of your tables, which has a fairly large impact on really large volumes of data – if you have 100 million rows of data, with four measures as a numeric data type, changing the numeric to a smaller size can save you up to 12 bytes per value, which translates to 48 bytes per row; 48 MB of storage or some 12 000+ data pages.

As a bonus, a well-chosen data type will prevent data inconsistencies and invalid values in your tables. It may even generate better statistics, making your queries run even faster.

Datatype it

Here are some ideas to look for:

  • Use date instead of datetime (on SQL Server 2008 and newer) where possible – a datetime value takes 8 bytes to store, while a date value only needs 3.
  • Instead of int or bigint columns, consider using smallint, tinyint as key columns where applicable. A smallint column uses 2 bytes, a tinyint just 1, compared to the int, which uses 4 and bigint, which uses 8.
  • Consider using datatypes that have a constant allocation size. If the allocated storage size of the column remains the same, no matter what value it holds, SQL Server may be able to perform an update-in-place when you UPDATE the value. An update-in-place happens when SQL Server actually overwrites the existing value instead of removing and re-inserting the row, but for this to happen, the new value must be equal in allocation size. There are, however a bunch of other factors at work here, but having defined a good datatype is a really good start.
  • The size of a float will vary depending on its value, whereas real is always 4 bytes. numeric also has a constant size, but it varies from 5 to a whopping 17 bytes, depending on the declared precision of the column.
  • The size of varchar columns is 2+(the number of characters). For the unicode nvarchar datatype, this number is 2+2x(number of characters). String types such as char, varchar, etc, are not really a good idea in very large tables. Do consider putting them in other tables, using a normalized schema, or keep them small.

Check back next week, for the next installment in the series!

Let me hear your thoughts!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s