One of the most basic concepts to master in T-SQL (or any dialect of SQL, for that matter) is understanding how the JOIN operator works. This post goes through the fundamentals of joining, and I highly recommend it if you’re just starting out with T-SQL.
What is a join?
In relational databases, such as SQL Server, data is stored in multiple relational tables. In a SELECT query, if you want to access data from two or more tables in the same query, you have to “collate” the two tables, in a process called joining.
The relations between two tables that are joined are defined by logical operators. The simplest of those operators is the equal-to operator (=). In theory, however, the join condition can be any logical expression that evaluates to true or false. When you start working with more advanced applications, you will see this quite often.
The INNER JOIN is the bread-and-butter of joins. It fetches rows from both tables where the join condition (that the A column is equal to the B column, in this example) is met. So the following example would only return the rows that can be matched in both tables:
SELECT table_a.a, table_b.b FROM table_a INNER JOIN table_b ON table_a.a=table_b.b;
So, because row 1 in table A cannot be matched with any row in table B (because there is no such row in table B), there is no record with a 1 in the output result. The INNER JOIN operator, in this example, only returns rows that are matched in both tables.
LEFT JOIN and RIGHT JOIN
LEFT JOIN, also known as a LEFT OUTER JOIN, is when you want to include all the rows from the left table, regardless if there is a matching row on the right side or not. You’ll recall that with INNER JOIN, only rows were returned that matched on both sides.
The left table, in terms of joins is the one that is mentioned before the JOIN keyword. The right table is the one mentioned after.
In this example, a LEFT JOIN means that we want to include every row from 1 to 8 from table A, even though rows 1 and 2 do not match anything in table B.
SELECT table_a.a, table_b.b FROM table_a LEFT JOIN table_b ON table_a.a=table_b.b;
In a LEFT JOIN, columns from the right-side table that are unmatched are given NULL values. A NULL value is the database equivalent of an undefined value – not to be confused with blanks or zeros!
A RIGHT JOIN is exactly the same thing as a LEFT JOIN, but as the name implies, it returns all rows from the right table, and only matching ones from the left side. In a RIGHT JOIN, left-side column values for unmatched rows are given NULLs.
Most of the time, a LEFT or RIGHT JOIN will get you where you want. However, sometimes you need both in a single statement. The FULL JOIN (or FULL OUTER JOIN) can be a bit trickier to understand and master, but it provides this functionality.
As with a LEFT or RIGHT JOIN, it returns unmatched rows, but the difference is that FULL JOIN returns unmatched rows from both sides. And as with the other join operators, columns from unmatched tables are given NULL values.
SELECT table_a.a, table_b.b FROM table_a FULL JOIN table_b ON table_a.a=table_b.b;
Finally, worth mentioning is the CROSS JOIN. This join operator does not use any join condition, which means that it returns a cartesian result set. So, for every row in the left-side table, every row in the right-side table is matched. In the example above, that means that a CROSS JOIN between tables A and B would return 64 rows, because A and B both have 8 rows.
Again, because a CROSS JOIN does not use a join condition, the ON keyword is not given:
SELECT table_a.a, table_b.b FROM table_a CROSS JOIN table_b;
Normally, CROSS JOIN is used to split data in a table, using data (like a distribution key) in a second table.
Old-style ANSI 89 JOIN syntax
You may possibly encounter code that uses the old ANSI 89-style of writing joins. I’ve included them here for the purpose of completeness, although this syntax has been deprecated in SQL Server 2005.
Here’s an example of an INNER JOIN:
--- Old-school INNER JOIN SELECT table_a.a, table_b.b FROM table_a, table_b WHERE table_a.a=table_b.b;
And this is how a LEFT JOIN would look:
--- Old-school LEFT JOIN SELECT table_a.a, table_b.b FROM table_a, table_b WHERE table_a.a*=table_b.b;
Note the * in the LEFT JOIN syntax, which denotes on what side the outer join occurs, i.e. left or right side. Note that the old cross join syntax is ANSI standard and not deprecated.
--- Old-school/ANSI CROSS JOIN. Still ok, but please don't. SELECT table_a.a, table_b.b FROM table_a, table_b;
Using WHERE with outer joins
You can use WHERE to filter rows in queries with inner joins. Logically speaking, this filtering operation is applied after the join is processed, as dictated by the SQL Server logical processing order.
For instance, if you want to find out what rows in table A don’t match anything in table B, you would do this in two steps:
- First, by LEFT JOINing table B,
- and second, by limiting the output to where table B’s columns are NULL
Remember that with outer joins, columns of unmatched rows are all NULL. We can look for these NULL values in the WHERE clause, because WHERE is logically evaluated after the JOIN is finished. This is how it could look in practice:
SELECT table_a.a, table_b.b FROM table_a LEFT JOIN table_b ON table_a.a=table_b.b WHERE table_b.b IS NULL;
With this example, the query above would return two rows, 1 and 2, because table_b.b is NULL for those two values in a left join.
In technical jargon, this is what’s called an “anti-semi-join”. The finer points and potential pitfalls of using WHERE and outer joins are the topic of a future post, so check back regularly.
Now, go and join stuff!