Join fundamentals 🧩
Success
The bits that everyone knows 😋
The "fundamental" joins
If you're reading this, you probably know what these are. The fundamental SQL joins are:
CROSS JOIN
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
All SQL databases support the first four, and most support the FULL JOIN
as well.
CROSS JOIN
It's not common to see CROSS JOIN
, but it can be useful in some scenarios.
The CROSS JOIN
has the simplest syntax:
SELECT *
FROM left_table
CROSS JOIN right_table
There are no join conditions for a CROSS JOIN
as it just joins every row from the left table with every row from the right table.
These joins are equivalent to the following non-ANSI SQL syntax (more on this in the SQL-92 rant):
SELECT *
FROM left_table, right_table
INNER JOIN
The INNER JOIN
is one of the most common join types.
Unlike the CROSS JOIN
, the INNER JOIN
requires a join condition to match rows from the left table with rows from the right table:
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id_column = right_table.id_column
The "feature" of the INNER JOIN
is that it only returns rows where there is a match between the left and right tables.
Note that the INNER
keyword is optional in most databases. If you don't specify it, the database will assume you mean an INNER JOIN
:
SELECT *
FROM left_table
JOIN right_table /* Still an inner join */
ON left_table.id_column = right_table.id_column
LEFT JOIN
(and RIGHT JOIN
)
The LEFT JOIN
is the other most common join type.
Like the INNER JOIN
, the LEFT JOIN
also requires a join condition:
SELECT *
FROM left_table
LEFT JOIN right_table
ON left_table.id_column = right_table.id_column
The "feature" of the LEFT JOIN
is that it keeps all rows from the left table, even if there is no match in the right table.
Although it's extremely uncommon, the RIGHT JOIN
is the same as the LEFT JOIN
except it keeps all rows from the right table:
SELECT *
FROM left_table
RIGHT JOIN right_table
ON left_table.id_column = right_table.id_column
These two joins can optionally specify the OUTER
keyword, but it's not necessary:
SELECT *
FROM left_table
[LEFT | RIGHT] OUTER JOIN right_table
ON left_table.id_column = right_table.id_column
Left or right?
Any RIGHT JOIN
can be rewritten as a LEFT JOIN
by swapping the tables around which is why RIGHT JOIN
is very uncommon.
In older versions of some databases, the join order had performance implications which meant that using RIGHT JOIN
could offer performance benefits, but this is rarely the case now.
If in doubt, use LEFT JOIN
!
FULL JOIN
The FULL JOIN
isn't supported by all databases, but it's a combination of the LEFT JOIN
and RIGHT JOIN
.
Like INNER JOIN
and LEFT JOIN
, the FULL JOIN
requires a join condition:
SELECT *
FROM left_table
FULL JOIN right_table
ON left_table.id_column = right_table.id_column
The "feature" of the FULL JOIN
is that it keeps all rows from both tables, even if there is no match in the other table.
Availability
At the time of writing (2024-04-07), the FULL JOIN
has the following availability:
- DuckDB: ✅
- SQLite: ✅ (>=3.39.0)
- PostgreSQL: ✅
- SQL Server: ✅
- Snowflake: ✅
Join syntax
Multiple joins
This is another thing you probably know, but you can join more than two tables in a single query:
SELECT *
FROM table_1
INNER JOIN table_2
ON table_1.id_column = table_2.id_column
LEFT JOIN table_3
ON table_2.id_column = table_3.id_column
You don't have to use the same join type for all the joins, and you can mix and match as you like.
You choose the order of the joins which sometimes has an impact on performance, and you have to put one join after another if the second join depends on the first join.
Joins with subqueries
You can also join tables with subqueries:
SELECT *
FROM table_1
INNER JOIN (
SELECT *
FROM table_2
WHERE column_2 > 100
) AS subquery
ON table_1.id_column = subquery.id_column
Unless the subquery is very basic (or you're a software developer using SQL in a transactional environment), it's typically a good idea to use a CTE instead of a subquery:
WITH subquery AS (
SELECT *
FROM table_2
WHERE column_2 > 100
)
SELECT *
FROM table_1
INNER JOIN subquery
ON table_1.id_column = subquery.id_column
Joins to the same table(s)
There's no limit to the number of times you can join a table in a query. You just need to alias the table each time you join it so that you can refer to each instance explicitly:
SELECT *
FROM base_table AS base1
INNER JOIN base_table AS base2
ON base1.id_column = base2.id_column
LEFT JOIN other_table AS other1
ON base1.id_column = other1.id_column_1
LEFT JOIN other_table AS other2
ON base2.id_column = other2.id_column_2
Join condition syntax
Although the examples above only use a single join condition, you can have multiple join conditions in a join:
SELECT *
FROM left_table
LEFT JOIN right_table
ON left_table.column_1 = right_table.column_1
AND left_table.column_2 = right_table.column_2
AND left_table.column_3 = right_table.column_3
The conditions don't have to be equality conditions either. You can use any condition you like:
SELECT *
FROM left_table
LEFT JOIN right_table
ON 0=1
OR (1=1
AND left_table.column_1 = right_table.column_1
AND left_table.column_2 < right_table.column_2
)
OR left_table.column_3 BETWEEN right_table.column_3
AND right_table.column_4
Note that these are all conditions for the join: they're not filters on the result set.
These conditions just determine which rows from the left table are matched with which rows from the right table. This is why equality is usually how you join tables, but it's not a requirement.
Anything that is valid inside a WHERE
clause is valid inside a join's ON
condition!