Skip to content

The "explosive" joins (LATERAL) 💥

Success

There are a few implementations of the LATERAL join in SQL databases (and not all are called "lateral"!), but they all have the same basic idea: generalising correlated subqueries to explode rows into more rows.

Syntax

The syntax for lateral "joins" is different across databases, and some databases don't even call it "lateral".

In all cases, the lateral join is a generalisation of a correlated subquery, so make sure you have a grasp of those before trying to wrap your head around these!

Tip

Check out the following page for information on correlated subqueries:

In particular, the lateral join "runs" the subquery for each row in the left table. The advantage of lateral joins is that we can get multiple columns and multiple rows from the subquery -- hence the "explosive" moniker!

We'll note the availability of the LATERAL keyword (and equivalents), then jump into the database-specific syntax, and finally provide some examples.

Availability

At the time of writing (2024-04-07), the LATERAL join has the following availability:

  • DuckDB: ✅ (>=0.9)
  • SQLite: ❌ (but does support normal correlated subqueries)
  • PostgreSQL: ✅ (>=9.3)
  • SQL Server: ✅ (but as CROSS APPLY)
  • Snowflake: ✅

Info

For more information on lateral joins, particularly in other databases, see the following slides:

DuckDB, PostgreSQL, Snowflake

Each of DuckDB, PostgreSQL, and Snowflake use the same syntax for the lateral join.

Although we call it a "join", the LATERAL keyword is not used instead of LEFT, INNER, CROSS, etc. Instead, we specify it in front of a correlated subquery.

Since it's technically not a join, it's common to see people write lateral joins using non-ANSI join syntax:

SELECT *
FROM left_table, LATERAL (<expr>)

However, the ANSI syntax is also available:

SELECT *
FROM left_table
    [CROSS | LEFT] JOIN LATERAL (<expr>)

Note that, contrary to normal joins, using a lateral join with the non-ANSI syntax or in a CROSS JOIN will drop rows that don't have a result from the subquery! This is why you'll often see LEFT JOIN LATERAL in practice.

In particular, since the "join condition" is written inside the correlated subquery (it wouldn't be correlated otherwise!), the left lateral joins usually have a dummy true condition in the ON clause. Common choices are 1=1 or true:

SELECT *
FROM left_table
    LEFT JOIN LATERAL (<expr>) ON TRUE

Warning

At the time of writing (2024-04-13), Snowflake does not support LEFT JOIN LATERAL and will throw an error if you try to use it.

SQL Server

Instead of LATERAL, SQL Server provides APPLY as an alternative to JOIN to implement the same concept:

The syntax is similar to the (ANSI) syntax for normal joins:

SELECT *
FROM left_table
    [CROSS | OUTER] APPLY (<expr>)

The CROSS APPLY corresponds to the "normal" lateral join which drops rows without a result from the subquery, while the OUTER APPLY corresponds to the LEFT JOIN LATERAL.

Tip

In SQL Server, APPLY is typically used to "apply" a table-valued function to each row of the left table (hence the name of the keyword APPLY).

Examples

It's fairly rare to need lateral joins, but they can be a lifesaver when you do!

For the examples below, we'll use a customers table and an events table to demonstrate the lateral join.

Info

This example is inspired by the following:

Sample data

Suppose we have two tables, customers and events, with the following data:

customers

customer_id name last_review_datetime
1 alice 2023-12-15 03:20:00
2 bob 2024-01-24 03:30:00
3 charlie 2024-02-01 04:10:00
Expand for the object DDL
create table customers (
    customer_id integer primary key,
    name varchar,
    last_review_datetime datetime
);
insert into customers (customer_id, name, last_review_datetime)
    values
        (1, 'alice', '2023-12-15 03:20:00'),
        (2, 'bob', '2024-01-24 03:30:00'),
        (3, 'charlie', '2024-02-01 04:10:00')
;

events

event_id customer_id event_datetime event_type
1 1 2024-01-01 11:00:00 login
2 1 2024-01-01 12:00:00 logout
3 1 2024-01-03 03:00:00 login failed
4 1 2024-01-03 03:05:00 login
5 2 2024-01-03 10:00:00 login
6 2 2024-01-03 15:00:00 logout
7 1 2024-01-03 23:00:00 logout
8 2 2024-01-04 22:00:00 login failed
9 2 2024-01-04 22:05:00 login
10 3 2024-01-05 20:00:00 login
11 3 2024-01-06 04:00:00 logout
12 2 2024-01-09 15:00:00 logout
13 3 2024-01-11 21:00:00 login
14 1 2024-01-12 12:00:00 login
15 1 2024-01-12 13:00:00 logout
16 1 2024-01-13 03:00:00 login
17 2 2024-01-13 10:00:00 login failed
18 2 2024-01-13 10:05:00 login
19 2 2024-01-13 15:00:00 logout
20 1 2024-01-13 23:00:00 logout
21 2 2024-01-14 22:00:00 login
22 3 2024-01-15 20:00:00 login
23 3 2024-01-16 04:00:00 logout
24 2 2024-01-19 15:00:00 logout
25 3 2024-01-21 21:00:00 login
26 1 2024-01-22 12:00:00 login failed
27 1 2024-01-22 12:05:00 password reset
28 1 2024-01-22 12:10:00 login
29 1 2024-01-22 13:00:00 logout
30 1 2024-01-23 03:00:00 login
31 2 2024-01-23 10:00:00 login
32 2 2024-01-23 15:00:00 logout
33 1 2024-01-23 23:00:00 logout
34 2 2024-01-24 22:00:00 login
35 3 2024-01-25 20:00:00 login
36 3 2024-01-26 04:00:00 logout
37 2 2024-01-29 15:00:00 logout
38 3 2024-01-31 21:00:00 login
Expand for the object DDL
create table events (
    event_id integer primary key,
    customer_id integer,
    event_datetime datetime,
    event_type varchar
);
insert into events (event_id, customer_id, event_datetime, event_type)
    values
        (1, 1, '2024-01-01 11:00:00', 'login'),
        (2, 1, '2024-01-01 12:00:00', 'logout'),
        (3, 1, '2024-01-03 03:00:00', 'login failed'),
        (4, 1, '2024-01-03 03:05:00', 'login'),
        (5, 2, '2024-01-03 10:00:00', 'login'),
        (6, 2, '2024-01-03 15:00:00', 'logout'),
        (7, 1, '2024-01-03 23:00:00', 'logout'),
        (8, 2, '2024-01-04 22:00:00', 'login failed'),
        (9, 2, '2024-01-04 22:05:00', 'login'),
        (10, 3, '2024-01-05 20:00:00', 'login'),
        (11, 3, '2024-01-06 04:00:00', 'logout'),
        (12, 2, '2024-01-09 15:00:00', 'logout'),
        (13, 3, '2024-01-11 21:00:00', 'login'),
        (14, 1, '2024-01-12 12:00:00', 'login'),
        (15, 1, '2024-01-12 13:00:00', 'logout'),
        (16, 1, '2024-01-13 03:00:00', 'login'),
        (17, 2, '2024-01-13 10:00:00', 'login failed'),
        (18, 2, '2024-01-13 10:05:00', 'login'),
        (19, 2, '2024-01-13 15:00:00', 'logout'),
        (20, 1, '2024-01-13 23:00:00', 'logout'),
        (21, 2, '2024-01-14 22:00:00', 'login'),
        (22, 3, '2024-01-15 20:00:00', 'login'),
        (23, 3, '2024-01-16 04:00:00', 'logout'),
        (24, 2, '2024-01-19 15:00:00', 'logout'),
        (25, 3, '2024-01-21 21:00:00', 'login'),
        (26, 1, '2024-01-22 12:00:00', 'login failed'),
        (27, 1, '2024-01-22 12:05:00', 'password reset'),
        (28, 1, '2024-01-22 12:10:00', 'login'),
        (29, 1, '2024-01-22 13:00:00', 'logout'),
        (30, 1, '2024-01-23 03:00:00', 'login'),
        (31, 2, '2024-01-23 10:00:00', 'login'),
        (32, 2, '2024-01-23 15:00:00', 'logout'),
        (33, 1, '2024-01-23 23:00:00', 'logout'),
        (34, 2, '2024-01-24 22:00:00', 'login'),
        (35, 3, '2024-01-25 20:00:00', 'login'),
        (36, 3, '2024-01-26 04:00:00', 'logout'),
        (37, 2, '2024-01-29 15:00:00', 'logout'),
        (38, 3, '2024-01-31 21:00:00', 'login')
;

The following three events after each customer's last review

To illustrate the lateral join, we'll find the next three events after each customer's last review. If the customer has no events after their last review, we'll have NULL values for the event columns.

Using lateral joins makes sense for this problem because we need to:

  • Compare information between the customers and events tables
  • Limit the number of events kept per customer
  • "Explode" the customer table for each matching row in the events table

SQLite

Since SQLite doesn't support lateral joins, we'll show how to solve this problem without using a lateral join:

with events_since_review as (
    select
        events.event_id,
        events.event_datetime,
        events.event_type,
        events.customer_id,
        row_number() over (
            partition by customer_id
            order by event_datetime
        ) as event_number
    from events
        inner join customers
            using (customer_id)
    where events.event_datetime > customers.last_review_datetime
)

select
    customers.customer_id,
    customers.name as customer_name,
    events_since_review.event_id,
    events_since_review.event_datetime,
    events_since_review.event_type
from customers
    left join events_since_review
        on  customers.customer_id = events_since_review.customer_id
        and events_since_review.event_number <= 3
order by
    customers.customer_id,
    events_since_review.event_datetime
customer_id customer_name last_review_datetime event_id event_datetime event_type
1 alice 2023-12-15 03:20:00 1 2024-01-01 11:00:00 login
1 alice 2023-12-15 03:20:00 2 2024-01-01 12:00:00 logout
1 alice 2023-12-15 03:20:00 3 2024-01-03 03:00:00 login failed
2 bob 2024-01-24 03:30:00 34 2024-01-24 22:00:00 login
2 bob 2024-01-24 03:30:00 37 2024-01-29 15:00:00 logout
3 charlie 2024-02-01 04:10:00 null null null

To write a query that meets the requirements, we need to:

  1. Join the customers table to the events table so that we can compare the last review date with the event date
  2. Use the ROW_NUMBER() window function to number the events for each customer so that we can limit the number of events to three
  3. Join the results of the previous two steps back onto the customers table to account for customers who have no events after their last review

We'll see that the lateral join makes this query a bit simpler.

DuckDB and PostgreSQL

For DuckDB and PostgreSQL, we can use the lateral join to achieve the same result:

select
    customers.customer_id,
    customers.name as customer_name,
    customers.last_review_datetime,
    events_since_review.event_id,
    events_since_review.event_datetime,
    events_since_review.event_type
from customers
    left join lateral (
        select *
        from events
        where 1=1
            and customers.customer_id = events.customer_id
            and customers.last_review_datetime < events.event_datetime
        order by events.event_datetime
        limit 3
    ) as events_since_review on 1=1
order by
    customers.customer_id,
    events_since_review.event_datetime

We do the relevant filtering in the subquery, which is run for each row in the customers table. The left join ensures that we don't drop any customers, and the limit 3 is applied per customer to keep at most three events per customer.

Question

Given the Snowflake doesn't (currently) support LEFT JOIN LATERAL, how would you solve this for Snowflake?

SQL Server

The SQL Server version would be very similar to the DuckDB and PostgreSQL versions, but with OUTER APPLY instead of LEFT JOIN LATERAL (and other minor tweaks for SQL Server syntax):

select
    customers.customer_id,
    customers.name as customer_name,
    customers.last_review_datetime,
    events_since_review.event_id,
    events_since_review.event_datetime,
    events_since_review.event_type
from customers
    outer apply (
        select top 3 *
        from events
        where 1=1
            and customers.customer_id = events.customer_id
            and customers.last_review_datetime < events.event_datetime
        order by events.event_datetime
    ) as events_since_review
order by
    customers.customer_id,
    events_since_review.event_datetime

Wrap up

Given that subqueries are one of SQL's most powerful features, lateral joins are a natural extension of that power. They're not often needed, but when you do need them, they can make your life much easier!