Skip to content

Subqueries ๐Ÿงฉ

Success

Subqueries are undoubtedly one of the most powerful features in SQL -- but with great power comes great responsibility.

Warning

Excel doesn't really have an equivalent to subqueries (without using advanced Excel features). Be warned that this therefore may be a bit tricky to get used to!

Subqueries are "nested" queries

Subqueries are one of the awesome things about SQL. They are queries "nested" within other queries.

This allows you to do some pretty cool things, but is also easy to abuse! Make sure you don't go overboard with subqueries.

The most common place to use a subquery is in the FROM clause to use the output of another query as if it were a table:

SELECT *
FROM (
    SELECT
        BusinessEntityID AS ID,
        FirstName AS Forename,
        LastName AS Surname
    FROM Person.Person
    WHERE BusinessEntityID <= 5
) AS People
WHERE Forename IN ('Ken', 'Rob')
;
ID Forename Surname
1 Ken Sรกnchez
4 Rob Walters

In the example above, the subquery is the query inside the parentheses. This subquery is used as if it were a table in the main/outer query.

To help understand the example, you first need to know what the output of the subquery is:

SELECT
    BusinessEntityID AS ID,
    FirstName AS Forename,
    LastName AS Surname
FROM Person.Person
WHERE BusinessEntityID <= 5
;
ID Forename Surname
1 Ken Sรกnchez
2 Terri Duffy
3 Roberto Tamburello
4 Rob Walters
5 Gail Erickson

When we use this in the FROM clause, we're pretending that it's like a table that already exists -- just this table only has five rows, and the columns are called ID, Forename, and Surname.

Since the subquery is used as if it were a table, we need to use the column names as they are in the subquery. This is why we use Forename in the outer WHERE clause (WHERE Forename IN ('Ken', 'Rob')) instead of FirstName.

Note

When you use a subquery in the FROM clause in Microsoft SQL Server, you need to give it an alias. This is why we have AS People at the end of the subquery.

This is not the case in all SQL flavours, but it's a good habit to get into.

Subqueries are good for using calculated columns

One of the reasons we like to use subqueries is to use calculated columns in places that we can't use them directly.

We've seen that if we want to use a calculated column in places like the WHERE and GROUP BY clauses, we need to use the same calculation in the SELECT clause:

SELECT
    FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
    SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE FORMAT(OrderDate, 'yyyy-MM') IN ('2013-01', '2013-02', '2013-03')
GROUP BY FORMAT(OrderDate, 'yyyy-MM')
ORDER BY OrderMonth
;

If we instead do the calculation in a subquery, we can use the calculated column in the WHERE and GROUP BY clauses without having to repeat the calculation:

SELECT
    OrderMonth,
    SUM(TotalDue) AS TotalSales
FROM (
    SELECT
        FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
        TotalDue
    FROM Sales.SalesOrderHeader
) AS Orders
WHERE OrderMonth IN ('2013-01', '2013-02', '2013-03')
GROUP BY OrderMonth
ORDER BY OrderMonth
;

This isn't super helpful in the example above, but it would be in cases where the calculation is more complex.

Subqueries can be used in other places too

Although the most common place to use subqueries is in the FROM clause, they can also be used in other places like the WHERE and SELECT clauses (plus others).

However, there are different rules for using subqueries in these places: when we use the subquery in the FROM clause, we return an entire table; when we use the subquery in the WHERE or SELECT clauses, we return a single value.

We'll see advanced examples of these in the correlated subqueries section.

Subqueries in the WHERE clause

A good use of a subquery in the WHERE clause is to check if a value meets some condition relative to the rest of the values.

For example, the following query returns the sales orders whose total due is greater than the average total due for the table:

SELECT TOP 5
    SalesOrderID,
    OrderDate,
    TotalDue
FROM Sales.SalesOrderHeader
WHERE TotalDue > (
    /* This produces a single value that we can compare against */
    SELECT AVG(TotalDue)
    FROM Sales.SalesOrderHeader
)
;
SalesOrderID OrderDate TotalDue
43659 2011-05-31 00:00:00.000 23153.2339
43661 2011-05-31 00:00:00.000 36865.8012
43662 2011-05-31 00:00:00.000 32474.9324
43664 2011-05-31 00:00:00.000 27510.4109
43665 2011-05-31 00:00:00.000 16158.6961

Subqueries in the SELECT clause

Similarly, we can use subqueries in the SELECT clause to keep track of some aggregate value for each row.

For example, the following query returns the sales orders and the average total due for the table:

SELECT TOP 5
    SalesOrderID,
    OrderDate,
    TotalDue,
    (
        /* This produces a single value that we can use in a column */
        SELECT AVG(TotalDue)
        FROM Sales.SalesOrderHeader
    ) AS AverageTotalDue
FROM Sales.SalesOrderHeader
;
SalesOrderID OrderDate TotalDue AverageTotalDue
43659 2011-05-31 00:00:00.000 23153.2339 3915.9951
43660 2011-05-31 00:00:00.000 1457.3288 3915.9951
43661 2011-05-31 00:00:00.000 36865.8012 3915.9951
43662 2011-05-31 00:00:00.000 32474.9324 3915.9951
43663 2011-05-31 00:00:00.000 472.3108 3915.9951

Info

We'll see another way to do this example in the window functions section, but it's good to know that subqueries can be used in this way too.

Common tables expressions (CTEs) are another flavour of subquery

Rather than using subqueries directly inside the SELECT statements, SQL allows us to save the subquery as a "common table expression" (CTE) and then use the CTE in the SELECT statements.

To define a CTE, we use the WITH keyword followed by the name of the CTE and the subquery in parentheses. You can specify multiple CTEs after the WITH text by separating them with commas.

For example, we saw the following query earlier:

SELECT
    OrderMonth,
    SUM(TotalDue) AS TotalSales
FROM (
    SELECT
        FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
        TotalDue
    FROM Sales.SalesOrderHeader
) AS Orders
WHERE OrderMonth IN ('2013-01', '2013-02', '2013-03')
GROUP BY OrderMonth
ORDER BY OrderMonth
;

We could instead save the Orders subquery as a CTE and then use the CTE in the FROM clause:

WITH Orders AS (
    SELECT
        FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
        TotalDue
    FROM Sales.SalesOrderHeader
)

SELECT
    OrderMonth,
    SUM(TotalDue) AS TotalSales
FROM Orders
WHERE OrderMonth IN ('2013-01', '2013-02', '2013-03')
GROUP BY OrderMonth
ORDER BY OrderMonth
;

Tip

In general, using CTEs (versus using subqueries directly) is a good habit to get into as it makes your queries easier to read and understand.

Warning

Subqueries defined as CTEs are always treated as if they were tables, so although it was an easy "lift-and-shift" for the subquery in the FROM clause, it wouldn't be as simple for subqueries in the WHERE and SELECT clauses.

Further reading

Check out the official Microsoft documentation for more information on subqueries at:

The video version of this content is also available at: