A while ago, I needed to create a URL encoding function in T-SQL. URL encoding is what happens when you translate special characters (basically anything that isn’t an alphanumerical) so they’ll fit in a URL.
How does URL encoding work?
To URL encode a string, you translate special characters to their ascii value, turn that into a hexadecimal value, then prefix that value with a percent sign, so the following string
Here's a (test) string
should be translated as
Here%27s%20a%20%28test%29%20string
For example, the apostrophe is equivalent to CHAR(39). 39 decimal is 27 hex (2×16 + 7), so then the URL encoded apostrophe is %27.
So in order to make this work, we’ll negotiate a number or challenges in T-SQL, including
- Wildcard matching certain characters
- Replacing values in a string
- Calculating hex values from decimal values
It breaks down into two functions, the URL encoding function, and a decimal-to-hex function.
A function to convert a character to its hex value
This is the most compact char-to-hex function I could construct.
CREATE FUNCTION dbo.fn_char2hex(@char char(1)) RETURNS char(2) AS BEGIN DECLARE @hex char(2), @dec int; SET @dec=ASCII(@char); SET @hex= --- First hex digit: SUBSTRING('0123456789ABCDEF', 1+(@dec-@dec%16)/16, 1)+ --- Second hex digit: SUBSTRING('0123456789ABCDEF', 1+( @dec%16) , 1); RETURN(@hex); END GO
The function returns a char(2) scalar value, which means that we’re only translating characters from CHAR(0) to CHAR(255) to a two-digit hex value (00-FF). The first hex digit is the decimal value minus its modulo 16, all divided by 16, the second digit is the remainder, the modulo 16 of the original decimal value:
So if, for instance, the decimal value is 33, the first hex digit would be (33-33%16)/16 = (33-1)/16 = 32/16 = 2. The second hex digit is the remainder, in this case 33-32 = 1.
The SUBSTRING() construct is used to translate digits from 11 to 15 to their proper hex digits. For instance, if the decimal value is 12
SUSBTRING('0123456789ABCDEF', 1+12, 1) = 'C'
Ther reason for the “1+” is because SUBSTRING(xyz, 0, 1) will return a blank value, but we need it to be 0 in this case.
Edit: In a newer post, I demonstrate a more efficient method using the binary datatype.
fn_urlencode()
Now, here’s the main URL encoding function, starting off with the usual declarations.
CREATE FUNCTION dbo.fn_urlencode(@string varchar(max)) RETURNS varchar(max) AS BEGIN; DECLARE @offset int, @char char(1);
First of all, we’ll have to replace all the percent characters in the string. If we don’t, we’ll find ourselves in an infinite loop that will replace % with %25, with %2525, with %252525, etc.
--- Replace % with %25, that way, we can skip replacing the --- % character once we're looping through the string. SET @string=REPLACE(@string, '%', '%'+dbo.fn_char2hex('%'));
And here’s the main loop of the function: We’re using PATINDEX() to find any character that isn’t a regular A-Z, 0-9, a period, a dash or a percent character. For each occurrence, we’re using the char2hex() function to calculate the character’s hex value, and then we replace that character in @string with the calculated hex equivalent. And rinse, repeat.
--- Loop through the @string variable, using PATINDEX() to look --- for non-standard characters using a wildcard. When no more --- occurrences are found, PATINDEX() will return 0, and the WHILE --- loop will end. SET @offset=PATINDEX('%[^A-Z0-9.\-\%]%', @string); WHILE (@offset!=0) BEGIN; SET @char=SUBSTRING(@string, @offset, 1); --- Replace the non-standard characters with URL encoded --- equivalents: SET @string=REPLACE(@string, @char, '%'+dbo.fn_char2hex(@char)); --- Find the next occurrence, if any: SET @offset=PATINDEX('%[^A-Z0-9.\-\%]%', @string); END --- Done. RETURN @string; END; GO
That’s it. To test out the function, try:
SELECT dbo.fn_urlencode('Here''s a (test) string');
There’s a lot that’s left out from this tutorial. For instance, the hex conversion function will only work with non-unicode character values (8-bit characters), but you could probably modify it relatively easily.
— I modified your code slightly to replace spaces with plus signs, per the W3 docs.
CREATE FUNCTION [dbo].[ufn_Urlencode](@string varchar(max))
— Credit: Daniel Hutmacher his original version of this code.
— See: http://sqlsunday.com/2013/04/07/url-encoding-function/
RETURNS varchar(max)
AS
BEGIN
DECLARE @hex char(2);
DECLARE @dec int;
DECLARE @offset int;
DECLARE @char char(1);
— Replace % with %25, so we can skip the % while looping over the string.
SET @string = REPLACE(@string, ‘%’, ‘%25’);
— Loop through the @string variable, using PATINDEX() to look
— for non-standard characters using a wildcard. When no more
— are found, PATINDEX() will return 0, and the WHILE loop will end.
SET @offset = PATINDEX(‘%[^ A-Z0-9.-\%]%’, @string);
WHILE (@offset != 0)
BEGIN
SET @char = SUBSTRING(@string, @offset, 1);
SET @dec = ASCII(@char);
SET @hex = SUBSTRING(‘0123456789ABCDEF’, 1 + (@dec – @dec % 16)/16, 1)
+ SUBSTRING(‘0123456789ABCDEF’, 1 + (@dec % 16), 1);
— Replace the non-standard char with URL encoded equivalent:
SET @string = REPLACE(@string, @char, ‘%’ + @hex);
— Find the next occurrence, if any:
SET @offset = PATINDEX(‘%[^ A-Z0-9.-\%]%’, @string);
END
— now that it’s nearly done, code spaces, making them plus signs
— This is done per the http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1 standard
SET @string = REPLACE(@string, ‘ ‘, ‘+’);
— Done.
RETURN @string;
END
Neat! Thanks, Paul!
Here’s a faster Hex encoder… you can either call it one character at a time, or else encode the whole string and parse it in groups of two digits:
create function fnStringToHex
(
@Value varchar(max)
)
returns varchar(max)
as
begin
declare @r varchar(max)
select @r=upper(stuff(sys.fn_varbintohexstr(convert(varbinary(max),@Value)),1,2,”))
return @r
end
go
Slight bug: your PATINDEX should be ‘%[^A-Z0-9.\-\%]%’ — notice the \- in there…
Thanks! I’ve updated the post to reflect your correction.
Also, here’s a faster String to Hex – you can pass a single character or an entire string:
create function fnStringToHex(@Value varchar(max))
returns varchar(max)
as
begin
declare @r varchar(max)
set @r=upper(stuff(sys.fn_varbintohexstr(convert(varbinary(max),@Value)),1,2,”))
return @r
end
go
A quick note on your solution. Although elegant, fn_varbintohexstr() is an undocumented function in SQL Server, which means you shouldn’t rely on it for future compatibility. Thanks for your input!
Ah – interesting: didn’t know that! Thanks!
Thanks so so much 🙂 .. .have been pulled my hairs for many hours on the subject.
Regards, –Iris 🙂