The "timestamp" join (ASOF
) ⏱
Success
The ASOF
join is fantastic for joining two tables with mismatched timestamps, where you want to join on the closest timestamp in the right table to the timestamp in the left table.
Syntax
At the time of writing (2024-04-07), the syntax for the ASOF
join changes depending on the database engine you're using. However, the general idea is the same across all engines: you're joining two tables based on the closest timestamp in the right table to the timestamp in the left table.
DuckDB
Note
The explanation below is adjusted from the DuckDB documentation.
In DuckDB, the ASOF
keyword is a join modifier just like NATURAL
, which means that you can use it with any join type (left
, inner
, etc.).
The ASOF
join requires at least one inequality condition on the ordering field. The inequality can be any inequality condition (>=
, >
, <=
, <
) on any data type, but the most common form is >=
on a temporal type. Any other conditions must be equalities (or NOT DISTINCT
). This means that the left/right order of the tables is significant.
SELECT *
FROM left_table
ASOF [LEFT | INNER | ...] JOIN right_table
ON left_table.id_column = right_table.id_column
AND left_table.timestamp_column >= right_table.timestamp_column
ASOF
joins can also specify join conditions on matching column names with the USING
syntax, but the last attribute in the list must be the inequality, which will be greater than or equal to (>=
):
SELECT *
FROM left_table
ASOF [LEFT | INNER | ...] JOIN right_table
USING (id_column, timestamp_column)
Snowflake
Snowflake has a more unique syntax for the ASOF
join. Rather than be a join modifier, Snowflake treats it as a new join type and introduces the MATCH_CONDITION
clause:
SELECT *
FROM left_table
ASOF JOIN right_table
MATCH_CONDITION (left_table.timestamp_column >= right_table.timestamp_column)
ON left_table.id_column = right_table.id_column
It has an added restriction that the MATCH_CONDITION
clause must reference the left column first, then the right column. This isn't a typical for join conditions, hence calling it out here.
Availability
At the time of writing (2024-04-07), the ASOF
join has the following availability:
- DuckDB: ✅ (>=0.6)
- SQLite: ❌
- PostgreSQL: ❌
- SQL Server: ❌
- Snowflake: ✅ (February 28, 2024)
Are you aware of any other databases that support the ASOF
join?
Examples
Sample data
Suppose we have two tables, transactions
and exchange_rates
, with the following data:
transactions
date | account | amount | currency |
---|---|---|---|
2023-12-31 | A | 100.00 | GBP |
2024-01-01 | B | 200.00 | GBP |
2024-01-02 | C | 300.00 | GBP |
2024-01-04 | A | 400.00 | GBP |
2024-01-07 | B | 500.00 | GBP |
2024-01-07 | C | 600.00 | GBP |
2024-01-08 | A | 700.00 | GBP |
2024-01-10 | B | 800.00 | GBP |
2024-01-13 | C | 900.00 | GBP |
2024-01-13 | A | 1000.00 | GBP |
2024-01-17 | B | 1100.00 | GBP |
2024-01-17 | C | 1200.00 | GBP |
2024-01-18 | A | 1300.00 | GBP |
2024-01-20 | B | 1400.00 | GBP |
2024-01-23 | C | 1500.00 | GBP |
2024-01-23 | A | 1600.00 | GBP |
2024-01-27 | B | 1700.00 | GBP |
2024-01-27 | C | 1800.00 | GBP |
2024-01-28 | A | 1900.00 | GBP |
2024-01-30 | B | 2000.00 | GBP |
2024-02-02 | C | 2100.00 | GBP |
2024-02-02 | A | 2200.00 | GBP |
2024-02-06 | B | 2300.00 | GBP |
2024-02-06 | C | 2400.00 | GBP |
Expand for the object DDL
create or replace table transactions(
"date" date,
account varchar(8),
amount decimal(10, 2),
currency varchar(3),
constraint pk__transactions primary key ("date", account),
);
insert into transactions
values
('2023-12-31', 'A', 100, 'GBP'),
('2024-01-01', 'B', 200, 'GBP'),
('2024-01-02', 'C', 300, 'GBP'),
('2024-01-04', 'A', 400, 'GBP'),
('2024-01-07', 'B', 500, 'GBP'),
('2024-01-07', 'C', 600, 'GBP'),
('2024-01-08', 'A', 700, 'GBP'),
('2024-01-10', 'B', 800, 'GBP'),
('2024-01-13', 'C', 900, 'GBP'),
('2024-01-13', 'A', 1000, 'GBP'),
('2024-01-17', 'B', 1100, 'GBP'),
('2024-01-17', 'C', 1200, 'GBP'),
('2024-01-18', 'A', 1300, 'GBP'),
('2024-01-20', 'B', 1400, 'GBP'),
('2024-01-23', 'C', 1500, 'GBP'),
('2024-01-23', 'A', 1600, 'GBP'),
('2024-01-27', 'B', 1700, 'GBP'),
('2024-01-27', 'C', 1800, 'GBP'),
('2024-01-28', 'A', 1900, 'GBP'),
('2024-01-30', 'B', 2000, 'GBP'),
('2024-02-02', 'C', 2100, 'GBP'),
('2024-02-02', 'A', 2200, 'GBP'),
('2024-02-06', 'B', 2300, 'GBP'),
('2024-02-06', 'C', 2400, 'GBP'),
;
exchange_rates
date | from_currency | to_currency | rate |
---|---|---|---|
2024-01-01 | GBP | INR | 110.0000 |
2024-01-01 | GBP | JPY | 160.0000 |
2024-01-01 | GBP | USD | 1.3000 |
2024-02-01 | GBP | INR | 120.0000 |
2024-02-01 | GBP | JPY | 170.0000 |
2024-02-01 | GBP | USD | 1.4000 |
2024-03-01 | GBP | INR | 100.0000 |
2024-03-01 | GBP | JPY | 150.0000 |
2024-03-01 | GBP | USD | 1.2000 |
Expand for the object DDL
create or replace table exchange_rates(
"date" date,
from_currency varchar(3),
to_currency varchar(3),
rate decimal(10, 4),
constraint pk__exchange_rates primary key ("date", from_currency, to_currency),
);
insert into exchange_rates
values
('2024-01-01', 'GBP', 'INR', 110.0),
('2024-01-01', 'GBP', 'JPY', 160.0),
('2024-01-01', 'GBP', 'USD', 1.3),
('2024-02-01', 'GBP', 'INR', 120.0),
('2024-02-01', 'GBP', 'JPY', 170.0),
('2024-02-01', 'GBP', 'USD', 1.4),
('2024-03-01', 'GBP', 'INR', 100.0),
('2024-03-01', 'GBP', 'JPY', 150.0),
('2024-03-01', 'GBP', 'USD', 1.2),
;
Finding the USD amount for each transaction (DuckDB)
Solution
To find the USD amount for each transaction, we can use the ASOF
join to find the closest exchange rate to the transaction date:
select
transactions.date,
transactions.account,
transactions.amount,
exchange_rates.rate,
transactions.amount * exchange_rates.rate as amount_usd,
from transactions
asof left join exchange_rates
on transactions.currency = exchange_rates.from_currency
and exchange_rates.to_currency = 'USD'
and transactions.date >= exchange_rates.date
order by
transactions.date,
transactions.amount
date | account | amount | rate | amount_usd |
---|---|---|---|---|
2023-12-31 | A | 100.00 | null | null |
2024-01-01 | B | 200.00 | 1.3000 | 260.000000 |
2024-01-02 | C | 300.00 | 1.3000 | 390.000000 |
2024-01-04 | A | 400.00 | 1.3000 | 520.000000 |
2024-01-07 | B | 500.00 | 1.3000 | 650.000000 |
2024-01-07 | C | 600.00 | 1.3000 | 780.000000 |
2024-01-08 | A | 700.00 | 1.3000 | 910.000000 |
2024-01-10 | B | 800.00 | 1.3000 | 1040.000000 |
2024-01-13 | C | 900.00 | 1.3000 | 1170.000000 |
2024-01-13 | A | 1000.00 | 1.3000 | 1300.000000 |
2024-01-17 | B | 1100.00 | 1.3000 | 1430.000000 |
2024-01-17 | C | 1200.00 | 1.3000 | 1560.000000 |
2024-01-18 | A | 1300.00 | 1.3000 | 1690.000000 |
2024-01-20 | B | 1400.00 | 1.3000 | 1820.000000 |
2024-01-23 | C | 1500.00 | 1.3000 | 1950.000000 |
2024-01-23 | A | 1600.00 | 1.3000 | 2080.000000 |
2024-01-27 | B | 1700.00 | 1.3000 | 2210.000000 |
2024-01-27 | C | 1800.00 | 1.3000 | 2340.000000 |
2024-01-28 | A | 1900.00 | 1.3000 | 2470.000000 |
2024-01-30 | B | 2000.00 | 1.3000 | 2600.000000 |
2024-02-02 | C | 2100.00 | 1.4000 | 2940.000000 |
2024-02-02 | A | 2200.00 | 1.4000 | 3080.000000 |
2024-02-06 | B | 2300.00 | 1.4000 | 3220.000000 |
2024-02-06 | C | 2400.00 | 1.4000 | 3360.000000 |
Note the NULL
values for the first transaction -- this is because there is no exchange rate for the date of that transaction in our exchange_rates
table, and we specified ASOF LEFT JOIN
so that values without matches are still kept in the result set (like a normal LEFT JOIN
!).
Exercise
Can you re-write this query using the Snowflake syntax?
Expand for the Snowflake equivalent
select
transactions.date,
transactions.account,
transactions.amount,
exchange_rates.rate,
transactions.amount * exchange_rates.rate as amount_usd,
from transactions
asof join exchange_rates
match_condition (transactions.date >= exchange_rates.date)
on transactions.currency = exchange_rates.from_currency
/* Currently not allowed in `ASOF` join conditions */
where coalesce(exchange_rates.to_currency, 'USD') = 'USD'
order by
transactions.date,
transactions.amount
"Traditional" solutions
For comparison, here are a few solutions that illustrate how you might solve this problem without the ASOF
join.
Using a lateral join
If your database supports lateral "joins" (like DuckDB), you can use a lateral join to find the closest exchange rate to the transaction date:
select
transactions.date,
transactions.account,
transactions.amount,
rates.rate,
transactions.amount * rates.rate as amount_usd,
from transactions,
lateral (
select rate
from exchange_rates
where 1=1
and transactions.currency = exchange_rates.from_currency
and exchange_rates.to_currency = 'USD'
and exchange_rates.date <= transactions.date
order by exchange_rates.date desc
limit 1
) as rates
order by
transactions.date,
transactions.amount
Note that this approach would drop the first transaction from the result set because there is no exchange rate for the date of that transaction in our exchange_rates
table. We'll "fix" this in the next example.
Using a left lateral join
To keep the first transaction in the result set when using lateral, we can move the lateral subquery to a left join:
select
transactions.date,
transactions.account,
transactions.amount,
rates.rate,
transactions.amount * rates.rate as amount_usd,
from transactions
left join lateral (
select rate
from exchange_rates
where 1=1
and transactions.currency = exchange_rates.from_currency
and exchange_rates.to_currency = 'USD'
and exchange_rates.date <= transactions.date
order by exchange_rates.date desc
limit 1
) as rates on 1=1
order by
transactions.date,
transactions.amount
Using left join and qualify
If your database doesn't have ASOF
or lateral joins, you can use a left join and the QUALIFY
clause to find the closest exchange rate to the transaction date:
select
transactions.date,
transactions.account,
transactions.amount,
exchange_rates.rate,
transactions.amount * exchange_rates.rate as amount_usd,
from transactions
left join exchange_rates
on transactions.currency = exchange_rates.from_currency
and exchange_rates.to_currency = 'USD'
and exchange_rates.date <= transactions.date
qualify 1 = row_number() over (
partition by transactions.date, transactions.account
order by exchange_rates.date desc
)
order by
transactions.date,
transactions.amount
If you're using a database that doesn't support the QUALIFY
clause, you can wrap the query in a subquery with the ROW_NUMBER()
calculation saved to a column and filter on the column in the outer query.
Using a left join and correlated subquery
If your database doesn't have ASOF
joins, lateral joins, or the QUALIFY
clause (and you don't want to use ROW_NUMBER()
in a subquery), you can use a correlated subquery:
select
transactions.date,
transactions.account,
transactions.amount,
exchange_rates.rate,
transactions.amount * exchange_rates.rate as amount_usd
from transactions
left join exchange_rates
on transactions.currency = exchange_rates.from_currency
and exchange_rates.to_currency = 'USD'
and exchange_rates.date <= transactions.date
where 0=1
or exchange_rates.date is null
or exchange_rates.date = (
select max("date")
from exchange_rates as rates_inner
where 1=1
and exchange_rates.from_currency = rates_inner.from_currency
and exchange_rates.to_currency = rates_inner.to_currency
and rates_inner.date <= transactions.date
)
order by
transactions.date,
transactions.amount
Note that, since this specific example, is only using the rate
column from the exchange_rates
table, we could have evaluated the subquery in the SELECT
clause instead of the FROM
clause. However, this would not work if we needed to use more than one column from the exchange_rates
table, hence sticking to approaches that cater for multiple columns.
Wrap up
Like with most SQL problems, there are multiple ways to get the output that we want. The ASOF
join is just a great way to solve this particular problem, and it significantly reduces the complexity of our SQL code!
Note that, also like with most SQL problems, the performance of these queries will depend on the size of the tables and the indexes available. If you're working with large tables, you will need to consider the performance implications of the ASOF
approach compared to more "traditional" approaches.