Joins 🔗
Success
Joins are how we combine data from multiple tables. They're like the LOOKUP
functions in Excel (but with some important differences)!
Note
The JOIN
clause is optional. If you use it, it must come after the FROM
clause and before the WHERE
clause.
The JOIN
clause combines data from tables
To get data from another table in Excel, you'd likely use one of the LOOKUP
functions:
These functions keep your data in the same shape as the original table, but they pull in data from another table based on a common value.
In SQL, we use the JOIN
clause to do a similar thing. The JOIN
clause is used to combine rows between tables based on a related column between them (usually).
In Microsoft SQL Server, there are a few different types of JOIN
clauses:
We'll only cover the INNER JOIN
and LEFT JOIN
in this course, but you can find more information on the others in the official Microsoft documentation.
The JOIN
syntax
To add a join to a query, you need to add the JOIN
clause after the FROM
clause and before the WHERE
clause. We also need to tell SQL how these tables are related using the ON
clause, for example:
SELECT *
FROM HumanResources.Employee
INNER JOIN HumanResources.vEmployeeDepartment
ON Employee.BusinessEntityID = vEmployeeDepartment.BusinessEntityID
;
This is a fairly intuitive example:
- The "base" table (in the
FROM
clause) isHumanResources.Employee
which has the employee information. - The "joined" table (in the
INNER JOIN
clause) isHumanResources.vEmployeeDepartment
which has the employee's current department. - The relationship between the two tables is that the
BusinessEntityID
in theEmployee
table matches theBusinessEntityID
in thevEmployeeDepartment
table. This would be similar to theVLOOKUP
function in Excel where theEmployee.BusinessEntityID
is the lookup value and thevEmployeeDepartment.BusinessEntityID
is the lookup column.
Notice how we've had to prefix the BusinessEntityID
with the table name in the ON
clause. This is because both tables have this column, so SQL needs us to prefix them to know which one we're referring to.
When we join tables, we don't usually want to SELECT *
like we have above. Instead, we explicitly list the columns we want to see from each table:
SELECT
Employee.BusinessEntityID,
Employee.NationalIDNumber,
vEmployeeDepartment.FirstName,
vEmployeeDepartment.LastName,
vEmployeeDepartment.Department,
vEmployeeDepartment.JobTitle
FROM HumanResources.Employee
LEFT JOIN HumanResources.vEmployeeDepartment
ON Employee.BusinessEntityID = vEmployeeDepartment.BusinessEntityID
;
BusinessEntityID | NationalIDNumber | FirstName | LastName | Department | JobTitle |
---|---|---|---|---|---|
1 | 295847284 | Ken | Sánchez | Executive | Chief Executive Officer |
2 | 245797967 | Terri | Duffy | Engineering | Vice President of Engineering |
3 | 509647174 | Roberto | Tamburello | Engineering | Engineering Manager |
4 | 112457891 | Rob | Walters | Tool Design | Senior Tool Designer |
5 | 695256908 | Gail | Erickson | Engineering | Design Engineer |
Tip
Although you only need to prefix the columns that are ambiguous (exist in both tables), it's a good idea to prefix all columns in a query when there's a join. This makes it easier to read and understand the query, and it can help avoid errors if the tables change in the future.
We can alias/rename tables like we can with columns
When the table names are long, it can be helpful to give them a shorter name for the rest of the query. The way that we do this is identical to how we alias columns by using the AS
keyword:
SELECT
Emp.BusinessEntityID,
Emp.NationalIDNumber,
Dep.FirstName,
Dep.LastName,
Dep.Department,
Dep.JobTitle
FROM HumanResources.Employee AS Emp
LEFT JOIN HumanResources.vEmployeeDepartment AS Dep
ON Emp.BusinessEntityID = Dep.BusinessEntityID
;
Tip
Although you can alias the tables to whatever you want, please try to make the alias meaningful. This makes the query easier to read and understand!
Step-by-step examples
We've just seen the LEFT JOIN
clause in action, but let's break it down a bit more with some examples.
For these examples, we'll use the following fake tables:
Employee
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
1 | Alice | 1 |
2 | Bob | 1 |
3 | Charlie | 2 |
4 | Dave | 2 |
5 | Eve | 3 |
Department
DepartmentID | DepartmentName |
---|---|
1 | Sales |
2 | Marketing |
Address
EmployeeID | Address | FromDate | ToDate |
---|---|---|---|
1 | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
5 | 5 Log Lane | 2009-11-19 | 2020-03-15 |
5 | 6 Claw Close | 2023-03-16 | 2024-12-31 |
Employee LEFT JOIN Department
Suppose we write a query that LEFT JOIN
s the Department
table onto the Employee
table using the DepartmentID
:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName
FROM Employee
LEFT JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
;
Since the Employee
table is the "base" table (it's in the FROM
clause), let's break down what's happening with each row in this table.
Employee 1
The first row is employee 1:
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
1 | Alice | 1 |
We're joining the Department
table to this on the DepartmentID
column, specifically:
ON Employee.DepartmentID = Department.DepartmentID
This means that the information we get from the Department
table will be where the DepartmentID
in the Department
table matches the DepartmentID
in the Employee
table which, in this case, is 1:
DepartmentID | DepartmentName | |
---|---|---|
→ | 1 | Sales |
2 | Marketing |
That means that we take the Sales
value from this table for the first row of the Employee
table:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
1 | Alice | 1 | Sales |
This is just like a lookup in Excel!
Employee 2
The second row is employee 2:
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
2 | Bob | 1 |
Bob has the same DepartmentID
as Alice, so the steps for Bob are the same as they were for Alice -- we get the Sales
value from the Department
table:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
2 | Bob | 1 | Sales |
Employee 3
The third row is employee 3:
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
3 | Charlie | 2 |
This time, the DepartmentID
is 2:
DepartmentID | DepartmentName | |
---|---|---|
1 | Sales | |
→ | 2 | Marketing |
Therefore, we get the Marketing
value from the Department
table for Charlie:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
3 | Charlie | 2 | Marketing |
Employee 4
The fourth row is employee 4:
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
4 | Dave | 2 |
Dave also has a DepartmentID
of 2, so we take the Marketing
value from the Department
table for Dave:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
4 | Dave | 2 | Marketing |
Employee 5
The fifth row is employee 5:
EmployeeID | EmployeeName | DepartmentID |
---|---|---|
5 | Eve | 3 |
This is an interesting case. Eve has a DepartmentID
of 3, but there's no DepartmentID
of 3 in the Department
table:
DepartmentID | DepartmentName | |
---|---|---|
1 | Sales | |
2 | Marketing |
This means that we can't find a DepartmentName
for Eve, so we instead get a NULL
value for the DepartmentName
:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
5 | Eve | 3 | NULL |
Putting it all together
When we put all of these rows together, we get the following result:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
1 | Alice | 1 | Sales |
2 | Bob | 1 | Sales |
3 | Charlie | 2 | Marketing |
4 | Dave | 2 | Marketing |
5 | Eve | 3 | null |
This is how the LEFT JOIN
clause works -- it grabs whatever data it can from the "joined" table and fills in NULL
values where it can't find a match.
Employee INNER JOIN Department
Now let's look at the INNER JOIN
clause. We'll use the exact same query as the last example, but we'll change the LEFT JOIN
to an INNER JOIN
:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName
FROM Employee
INNER JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
;
We could find a match for the first four rows, so they are the same as they were in the LEFT JOIN
example:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
1 | Alice | 1 | Sales |
2 | Bob | 1 | Sales |
3 | Charlie | 2 | Marketing |
4 | Dave | 2 | Marketing |
However, what happens with employee 5 is different: an INNER JOIN
only keeps rows where there's a match!
Since there's no match for employee 5 in the Department
table, we don't get a row for Eve at all! This means that the result of the INNER JOIN
is:
EmployeeID | EmployeeName | DepartmentID | DepartmentName |
---|---|---|---|
1 | Alice | 1 | Sales |
2 | Bob | 1 | Sales |
3 | Charlie | 2 | Marketing |
4 | Dave | 2 | Marketing |
This is the main difference between the LEFT JOIN
and the INNER JOIN
clauses and is what catches a lot of people out when they're new to SQL.
Tip
If you're not sure which join type to use, it's usually best to start with a LEFT JOIN
and then change it to an INNER JOIN
when you're comfortable that you're not missing any data.
Employee LEFT JOIN Address
This time, suppose we write a query that LEFT JOIN
s the Address
table onto the Employee
table using the EmployeeID
:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
LEFT JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
;
As with the last example, we'll break down what's happening with each row in the Employee
table. We'll ignore the DepartmentID
column in the Employee
since we're not using it in the query.
Employee 1
The first row is employee 1:
EmployeeID | EmployeeName |
---|---|
1 | Alice |
This time, we're joining the Address
table to this on the EmployeeID
column, specifically:
ON Employee.EmployeeID = Address.EmployeeID
This means that the information we get from the Address
table will be where the EmployeeID
in the Address
table matches the EmployeeID
in the Employee
table which, in this case, is 1:
EmployeeID | Address | FromDate | ToDate | |
---|---|---|---|---|
→ | 1 | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | 2 Rocky Road | 2012-07-04 | 2018-02-11 | |
5 | 5 Log Lane | 2009-11-19 | 2020-03-15 | |
5 | 6 Claw Close | 2023-03-16 | 2024-12-31 |
That means that we take the 1 Main Street
address (plus the dates) from this table for the first row of the Employee
table:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
1 | Alice | 1 Main Street | 2001-07-21 | 2002-10-30 |
Employee 2
The second row is employee 2:
EmployeeID | EmployeeName |
---|---|
2 | Bob |
Bob's EmployeeID
is 2, so we look for the address in the Address
table where the EmployeeID
is 2:
EmployeeID | Address | FromDate | ToDate | |
---|---|---|---|---|
1 | 1 Main Street | 2001-07-21 | 2002-10-30 | |
→ | 2 | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
5 | 5 Log Lane | 2009-11-19 | 2020-03-15 | |
5 | 6 Claw Close | 2023-03-16 | 2024-12-31 |
This means that we get the 2 Rocky Road
address (plus the dates) from the Address
table for Bob:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
2 | Bob | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
Employee 3
The third row is employee 3:
EmployeeID | EmployeeName |
---|---|
3 | Charlie |
Charlie's EmployeeID
is 3, so we look for the address in the Address
table where the EmployeeID
is 3:
EmployeeID | Address | FromDate | ToDate | |
---|---|---|---|---|
1 | 1 Main Street | 2001-07-21 | 2002-10-30 | |
2 | 2 Rocky Road | 2012-07-04 | 2018-02-11 | |
5 | 5 Log Lane | 2009-11-19 | 2020-03-15 | |
5 | 6 Claw Close | 2023-03-16 | 2024-12-31 |
Since we can't find a match for Charlie and we're using a LEFT JOIN
, we'll get NULL
values for the Address
columns:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
3 | Charlie | null | null | null |
Employee 4
The fourth row is employee 4:
EmployeeID | EmployeeName |
---|---|
4 | Dave |
Dave's EmployeeID
is 4 which also doesn't have a match in the Address
table, so we get NULL
values for the Address
columns:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
4 | Dave | null | null | null |
Employee 5
The fifth row is employee 5:
EmployeeID | EmployeeName |
---|---|
5 | Eve |
Eve's EmployeeID
is 5, and they have two addresses in the Address
table:
EmployeeID | Address | FromDate | ToDate | |
---|---|---|---|---|
1 | 1 Main Street | 2001-07-21 | 2002-10-30 | |
2 | 2 Rocky Road | 2012-07-04 | 2018-02-11 | |
→ | 5 | 5 Log Lane | 2009-11-19 | 2020-03-15 |
→ | 5 | 6 Claw Close | 2023-03-16 | 2024-12-31 |
This is another place where SQL is different to Excel. Since we have two matches for Eve in the Address
table, we get two rows in the result -- we keep both!
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
5 | Eve | 5 Log Lane | 2009-11-19 | 2020-03-15 |
5 | Eve | 6 Claw Close | 2023-03-16 | 2024-12-31 |
Putting it all together
When we put all of these rows together, we get the following result:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
1 | Alice | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | Bob | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
3 | Charlie | null | null | null |
4 | Dave | null | null | null |
5 | Eve | 5 Log Lane | 2009-11-19 | 2020-03-15 |
5 | Eve | 6 Claw Close | 2023-03-16 | 2024-12-31 |
Warning
In Excel, you'd only get one row for Eve (Excel would just keep the first match), but in SQL you get all the matches. This is a common source of confusion for people new to SQL.
However, this behaviour is super useful when used correctly! Just keep an eye out for it.
Employee INNER JOIN Address
To finish off, let's consider the same query but with an INNER JOIN
instead of a LEFT JOIN
:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
INNER JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
;
Can you guess what the result will be?
It's the same as the LEFT JOIN
example, but without rows for Charlie and Dave since they don't have a match in the Address
table using this condition:
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
1 | Alice | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | Bob | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
5 | Eve | 5 Log Lane | 2009-11-19 | 2020-03-15 |
5 | Eve | 6 Claw Close | 2023-03-16 | 2024-12-31 |
LEFT JOIN
is probably the most common join type
Since the INNER JOIN
will drop rows that don't have a match in the "joined" table, it's usually "safer" to use LEFT JOIN
to make sure that you don't accidentally lose any data during the join.
This is common practice (favouring LEFT
over INNER
), so you'll often see people use LEFT JOIN
unless they have a specific reason to use a different type.
INNER JOIN
is the default join type
If you don't specify LEFT
or INNER
(or any of the others), then SQL will default to an INNER JOIN
:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName
FROM Employee
JOIN Department /* This will be an `INNER JOIN` */
ON Employee.DepartmentID = Department.DepartmentID
;
Although this is the default, it's always a good idea to be explicit about the join type you're using. This makes the query easier to read and understand, and it can help avoid errors if the tables change in the future.
There can be several joins in a single query
The examples above have just been joining two tables, but you can join as many tables as you like in a single query.
For example, we could combine the Employee
, Department
, and Address
tables in a single query:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
LEFT JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
LEFT JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
;
EmployeeID | EmployeeName | DepartmentID | DepartmentName | Address | FromDate | ToDate |
---|---|---|---|---|---|---|
1 | Alice | 1 | Sales | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | Bob | 1 | Sales | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
3 | Charlie | 2 | Marketing | null | null | null |
4 | Dave | 2 | Marketing | null | null | null |
5 | Eve | 3 | null | 5 Log Lane | 2009-11-19 | 2020-03-15 |
5 | Eve | 3 | null | 6 Claw Close | 2020-03-16 | 2024-12-31 |
Notice how, although the Employee
table only has five rows, we've ended up with six because of the (LEFT
) join with the Address
table.
Similarly, can you guess what the output would be if we used INNER JOIN
s instead of LEFT JOIN
s?
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
INNER JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
INNER JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
;
EmployeeID | EmployeeName | DepartmentID | DepartmentName | Address | FromDate | ToDate |
---|---|---|---|---|---|---|
1 | Alice | 1 | Sales | 1 Main Street | 2001-07-21 | 2002-10-30 |
2 | Bob | 1 | Sales | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
We'd only get two rows! We'd lose the rows for Charlie and Dave in the join with the Address
table, and we'd lose the row(s) for Eve in the join with the Department
table.
Info
You can use whichever join types you want for each join, there's no need to use all the same (e.g. all LEFT
join).
Other tips and tricks
The examples above are "typical" examples of joins, but you'll find that you can use joins in a lot of different ways!
You can use any condition in a join
We've just been using the =
operator in the ON
clause, but you can use any condition you like (and as many as you want!).
For example, we could also filter on the dates in the Address
table when we join it to the Employee
table:
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
INNER JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
AND Address.FromDate >= '2010-01-01'
;
EmployeeID | EmployeeName | Address | FromDate | ToDate |
---|---|---|---|---|
2 | Bob | 2 Rocky Road | 2012-07-04 | 2018-02-11 |
5 | Eve | 6 Claw Close | 2020-03-16 | 2024-12-31 |
Tables can be joined to themselves
There's nothing stopping you from joining a table to itself! This can be useful when you want to compare rows within the same table, but this is pretty rare, so we won't cover it in this course.
The SQL for running these examples
Danger
The data for these examples isn't in the AdventureWorks database that we're using, so it has been created for this section. If you want to run these examples yourself, you can adjust the SQL below. Note that this is using some features that we haven't covered yet!
For the examples above, the rows are created on the fly. You're not expected to understand this yet, but it's provided so that you can run the SQL yourself if you want to.
WITH
Employee AS (
SELECT *
FROM (
VALUES
(1, 'Alice', 1),
(2, 'Bob', 1),
(3, 'Charlie', 2),
(4, 'Dave', 2),
(5, 'Eve', 3)
) AS V(EmployeeID, EmployeeName, DepartmentID)
),
Department AS (
SELECT *
FROM (
VALUES
(1, 'Sales'),
(2, 'Marketing')
) AS V(DepartmentID, DepartmentName)
),
Address AS (
SELECT *
FROM (
VALUES
(1, '1 Main Street', '2001-07-21', '2002-10-30'),
(2, '2 Rocky Road', '2012-07-04', '2018-02-11'),
(5, '5 Log Lane', '2009-11-19', '2020-03-15'),
(5, '6 Claw Close', '2020-03-16', '2024-12-31')
) AS V(EmployeeID, Address, FromDate, ToDate)
)
/* Edit this part */
SELECT
Employee.EmployeeID,
Employee.EmployeeName,
Employee.DepartmentID,
Department.DepartmentName,
Address.Address,
Address.FromDate,
Address.ToDate
FROM Employee
INNER JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
INNER JOIN Address
ON Employee.EmployeeID = Address.EmployeeID
;
Further reading
Check out the official Microsoft documentation for more information on the JOIN
clause at:
The video version of this content is also available at:
Additional join modifiers in other SQL flavours
Microsoft SQL Server has a fairly limited number of join features, but other SQL flavours add loads of additional modifiers to the JOIN
clause.
If you see something that you don't recognise, make sure that you search for it in the documentation for the specific SQL flavour that you see the thing in!
Visual representation of joins
If you're a visual learner, you might find it helpful to read the following article from Atlassian (the company behind Jira and Confluence) which has some great visual representations of the different join types:
For example, their "cheat sheet" is: