A while ago, I needed to create a URL encoding function in T-SQL. URL encoding is what happens when you translate special characters (basically anything that isn’t an alphanumerical) so they’ll fit in a URL.
How does URL encoding work?
To URL encode a string, you translate special characters to their ascii value, turn that into a hexadecimal value, then prefix that value with a percent sign, so the following string
Here's a (test) string
should be translated as
For example, the apostrophe is equivalent to CHAR(39). 39 decimal is 27 hex (2×16 + 7), so then the URL encoded apostrophe is %27.
So in order to make this work, we’ll negotiate a number or challenges in T-SQL, including
- Wildcard matching certain characters
- Replacing values in a string
- Calculating hex values from decimal values
It breaks down into two functions, the URL encoding function, and a decimal-to-hex function.
A function to convert a character to its hex value
This is the most compact char-to-hex function I could construct.
CREATE FUNCTION dbo.fn_char2hex(@char char(1)) RETURNS char(2) AS BEGIN DECLARE @hex char(2), @dec int; SET @dec=ASCII(@char); SET @hex= --- First hex digit: SUBSTRING('0123456789ABCDEF', 1+(@[email protected]%16)/16, 1)+ --- Second hex digit: SUBSTRING('0123456789ABCDEF', 1+( @dec%16) , 1); RETURN(@hex); END GO
The function returns a char(2) scalar value, which means that we’re only translating characters from CHAR(0) to CHAR(255) to a two-digit hex value (00-FF). The first hex digit is the decimal value minus its modulo 16, all divided by 16, the second digit is the remainder, the modulo 16 of the original decimal value:
So if, for instance, the decimal value is 33, the first hex digit would be (33-33%16)/16 = (33-1)/16 = 32/16 = 2. The second hex digit is the remainder, in this case 33-32 = 1.
The SUBSTRING() construct is used to translate digits from 11 to 15 to their proper hex digits. For instance, if the decimal value is 12
SUSBTRING('0123456789ABCDEF', 1+12, 1) = 'C'
Ther reason for the “1+” is because SUBSTRING(xyz, 0, 1) will return a blank value, but we need it to be 0 in this case.
Edit: In a newer post, I demonstrate a more efficient method using the binary datatype.
Now, here’s the main URL encoding function, starting off with the usual declarations.
CREATE FUNCTION dbo.fn_urlencode(@string varchar(max)) RETURNS varchar(max) AS BEGIN; DECLARE @offset int, @char char(1);
First of all, we’ll have to replace all the percent characters in the string. If we don’t, we’ll find ourselves in an infinite loop that will replace % with %25, with %2525, with %252525, etc.
--- Replace % with %25, that way, we can skip replacing the --- % character once we're looping through the string. SET @string=REPLACE(@string, '%', '%'+dbo.fn_char2hex('%'));
And here’s the main loop of the function: We’re using PATINDEX() to find any character that isn’t a regular A-Z, 0-9, a period, a dash or a percent character. For each occurrence, we’re using the char2hex() function to calculate the character’s hex value, and then we replace that character in @string with the calculated hex equivalent. And rinse, repeat.
--- Loop through the @string variable, using PATINDEX() to look --- for non-standard characters using a wildcard. When no more --- occurrences are found, PATINDEX() will return 0, and the WHILE --- loop will end. SET @offset=PATINDEX('%[^A-Z0-9.\-\%]%', @string); WHILE (@offset!=0) BEGIN; SET @char=SUBSTRING(@string, @offset, 1); --- Replace the non-standard characters with URL encoded --- equivalents: SET @string=REPLACE(@string, @char, '%'+dbo.fn_char2hex(@char)); --- Find the next occurrence, if any: SET @offset=PATINDEX('%[^A-Z0-9.\-\%]%', @string); END --- Done. RETURN @string; END; GO
That’s it. To test out the function, try:
SELECT dbo.fn_urlencode('Here''s a (test) string');
There’s a lot that’s left out from this tutorial. For instance, the hex conversion function will only work with non-unicode character values (8-bit characters), but you could probably modify it relatively easily.