Just for laughs, we’re going to take a look at a slightly-too-complex yet elegant solution to a seemingly simple task: creating a list of 100 fictional company names for a demo database.
What I need
I’m working on a demo database where I need some fictional customer records (company names). I could obviously just call them “Customer 1”, “Customer 2”, and so on, and then go to lunch like ordinary people. But where’s the sport? And besides, when you’re showing off a cool demo to future clients, you’d rather the data looked “authentic”.
What I tried
You could build a dynamic, randomized string generator function that jumbles just the right proportions of vowels and consonants together to form nonsensical, name-sounding strings, but this turns out to be actually really hard. Building T-SQL code to create a string that actually looks like something you could pronounce is a lot of work, the details of which I won’t go into right here.
Let’s just say I know this from experience.
What if we had a random mass of text?
So, ideally I would like a large repository of text to work with and generate my names from. I could probably build a script to extract random strings from some database that I have lying around, but there’s always the obvious risk that even a scrap of text could contain/reveal sensitive customer information. But wait! There’s the sys.messages DMV, where SQL Server keeps all its error messages.
How I invented 100 cool tech startup names
First off, let’s collect some random strings of text and dump it into a table variable:
DECLARE @strings TABLE ( message_id int NOT NULL, nvarchar(2048) NOT NULL, PRIMARY KEY CLUSTERED (message_id) ) INSERT INTO @strings (message_id, ) SELECT TOP (500) message_id, FROM sys.messages WHERE language_id=1033 ORDER BY NEWID();
Three things happen here:
- The sys.messages DMV contains over 12 000 rows on my SQL Server 2016 SP1, and those are just the us_english ones, which means that for performance reasons, I’m going to just pick 500 messages at random.
- Also, the language_id column identifies the language of the message (us_english is 1033, german is 1031, 1036 is french, 3082 is spanish, and so on), so you could easily have the script generate names in your preferred language if you want to.
- Finally, to randomize the selection, I’m sorting by NEWID() (which creates a “random” uniqueidentifier column).
Now, let’s parse all those strings into words. And by “words”, I specifically mean contiguous sequences of the characters a-z, to weed out all the whitespaces and just keep the “real words”. We’ll do this using a recursive common table expression:
DECLARE @words TABLE ( word nvarchar(100) NOT NULL, PRIMARY KEY CLUSTERED (word) ); WITH cte AS ( --- The anchor: SELECT CAST(NULL AS nvarchar(100)) AS word, AS remain FROM @strings UNION ALL --- The recursion: SELECT CAST( LEFT(remain, PATINDEX(N'%[^a-z]%', remain+N' ')-1 ) AS nvarchar(100)), CAST(SUBSTRING(remain, PATINDEX(N'%[^a-z]%', remain+N' ')+1, LEN(remain)) AS nvarchar(2048)) FROM cte WHERE remain!=N'' ) INSERT INTO @words SELECT DISTINCT LOWER(word) FROM cte WHERE LEN(word)>=3 OPTION (MAXRECURSION 0);
The anchor part of the CTE sets up one line for each string. For every recursion, we’ll check the offset to the first non-alphabetic character (including spaces and numbers). That’s what the PATINDEX() function is for. Then we’ll put what’s to the left of that offset in the “word” column and what’s to the right of the offset in the “remain” column for the next iteration.
As we iterate through this, we’re gradually truncating the “remain” column until it’s a just a blank string, which is our stop condition for the recursion. The output of the CTE will be one row for each word of each string.
For good measure, we’ll eliminate all words that are shorter than three characters and then save a DISTINCT set of the remaining words in the @words table variable.
Now, the home stretch: putting the words together.
SELECT TOP (100) UPPER(LEFT(a.word, 1))+SUBSTRING(a.word, 2, LEN(a.word))+ ISNULL(N' '+b.word, '')+ ISNULL(N' '+c.word, '') AS CoolStartupName --- First word: FROM @words AS a --- Second word: OUTER APPLY ( SELECT TOP (1) UPPER(LEFT(x.word, 1))+ SUBSTRING(x.word, 2, LEN(x.word)) AS word FROM @words AS x WHERE LEFT(a.word, 3)!=LEFT(x.word, 3) AND RIGHT(a.word, 3)!=RIGHT(x.word, 3) ORDER BY NEWID() ) AS b --- Third word (if there's space): OUTER APPLY ( SELECT TOP (1) UPPER(LEFT(x.word, 1))+ SUBSTRING(x.word, 2, LEN(x.word)) AS word FROM @words AS x WHERE LEFT(a.word, 3)!=LEFT(x.word, 3) AND RIGHT(a.word, 3)!=RIGHT(x.word, 3) AND LEFT(b.word, 3)!=LEFT(x.word, 3) AND RIGHT(b.word, 3)!=RIGHT(x.word, 3) AND LEN(a.word)+LEN(b.word)<=12 ORDER BY NEWID() ) AS c ORDER BY NEWID();
This query is actually a lot simpler than it first appears. Here’s how it breaks down:
- Pick 100 words at random (table “a”)
- For each word in “a”, if possible, pick a single random word (“b”) that doesn’t start or end with the same three letters as the “a” word.
- For each word in “b”, if possible, pick a single random word (“c”) that doesn’t start or end with the same three letters as the “a” nor the “b” word.
- The UPPER(), LEFT() and SUBSTRING() stuff is just to turn the names into title case.
- As before, the ORDER BY NEWID() randomizes the order in which the TOP (1) row is returned.
The APPLY operator runs a subquery once for each “calling” parent row, which is what we want here, as we want to randomize the sort order each time. In the execution plan, APPLY shows up as a Nested Loops operator:
Obviously, a real performance killer here is the randomized sort order, which puts no less than three Sort operations in our plan.
Bottom line
With a reasonably simple text parsing script and the built-in messages in sys.messages, we can create a list of 100 cool tech startup company names, including brilliant names such as “Intended Incoming”, “Impacted Algorithm”, “Candidate Terminator” or “Granted Cleanup”.
Would you name your tech startup “Impacted Algorithm”? Let me know in the comments.
Seconds away from buying the domain for ‘Impacted Algorithm’. A pretty funny solution to a problem I came up against just recently.
Haha! 🙂
“Applications Deleted”, a subdivision of “Floating Slow Startup”.
I like this script. It’ll bring some fun into the demo’s.
And I have a new user name too. 😉
Hahaha! Nice. 🙂
This is my new favourite game for a Friday afternoon!
Favourites to date:
Replicated Illegal
Using Construction
Dispatch Return
The Descriptor
and the slightly creepy: Population Processing
I think they’re all creepy. That’s what’s so cool. 🙂
Haha! OK, my list of faves:
Problems Valid
Error Targeted
Delete Operator
End Transactions
Result Signal You
Nvarchar Violates
Tolerant Permissible
Certificate Forceseek
Total Sensitivity
Awesome! 🙂
Altering Per Browser
Explicitly Unicode
Commit Impersonation
Atomic Identify
and my favorite: Because Memory
Explicitly Unicode sounds like a hipster web api or something. 😉
I think these also make good Band names
Source Tabular
Attempted Versions
Decode Some Specifies
Explicit Otherwise
Connect Performance
Corruption Subset
Object Fashion
Residual Beyond
Redefinition Won
…
Haha! Yeah, I see where you’re going with that. Bit of an industrial synth vibe going on here, right?
I love “Object Fashion”. Sounds glam!
And some possible lines of business (I assume coming in v2?):
Body Alter Form — Personal Training
Semantics Possible — Consultant philosophers
Argument Resumed — Dating agency
Deprecated Tables — Reclaimed furnishings
Loving it! Seems like there are still some untapped ideas out there..
Negative Bytes
Query Coherency
Empty Particle
Showmemo Password
Dumped Indexed Namespace
and my all-time favorite:
Deadlock Pool
I occasionally do Query Coherency (i.e. cleaning up or trying to understand other people’s code).
And some more :
” Temporarily Foreign” – US Immigration Agency
” Snapshots Works” – A trendy Photo studio
” Deleted Siblings” – Population Control Agency for Brothers & Sisters
” Damaged Environment ” – The local Branch of EPA
” Commit Used Reader” – Local Library’s Law Enforcement branch
“Skipped Password” – EQUIFAX’s Security Policy
” Roll Initialize” – Pillsbury’s Instruction’s Manual
” Capture Full Sender ” – Post Office’s Law Enforcement Agency
” Making Succeeded” – In-Vitro Fertilization Lab ?
“Weekly Parent Received” – Amazon’s New Service for Mom & Dad Shipments
“Uses Block Soap” – Public Bath’s Moto
Hahahahaha! 🙂
I “like” the following:
Parameter Protection
Troubleshooting Document
Previously Distributed
Four Settings Compliance
Regenerating Protocol
Bulk Permission
Cascading Security
Literal Interval
And the deeply personal:
Inconsistent Stream
Inconsistent Stream sounds a lot like me after three cappuccinos.
Or why I have to see the urologist.
Ouch. 🙂
Here are my favorites 🙂 I think some of them can even be great book or movie titles.
Replaced Problem
Last Declared Session
Clone Durability
Agent Delivered
Exposing Avoid
Clone Durability sounds like a follow-up to Blade Runner 2049!