Temperature anomaly detection 🔍

Scenario

Some scientists are studying temperature data from various sites.

They are interested in identifying temperature readings that are significantly higher than the surrounding readings.

Question

Given the temperature data below, select the temperature readings that are at least 10% higher than the average of the previous 2 and following 2 readings for the same site.

Do not include the current reading in the average calculation, and use the calculated average temperature as the denominator for the 10% calculation.

If there are fewer than 2 readings before or 2 after the current reading, do not include the reading in the output.

The output should only have the temperature readings above the threshold, with the columns:

site_id
reading_datetime
temperature
average_temperature as the average of the 4 readings around the current reading (2 each side), rounded to 4 decimal places
percentage_increase as the percentage increase of the current reading over the average_temperature, rounded to 4 decimal places

Order the output by site_id and reading_datetime.

Expand for the DDL

create table readings (
    site_id          integer,
    reading_datetime timestamp,
    temperature      decimal(5, 2) not null,
    primary key (site_id, reading_datetime)
);
insert into readings
values
    (1, '2021-01-01 03:26:23', 20.02),
    (1, '2021-01-01 19:52:46', 20.17),
    (1, '2021-01-02 02:01:17', 22.43),
    (1, '2021-01-02 21:02:34', 19.91),
    (1, '2021-01-03 04:12:56', 20.11),
    (1, '2021-01-03 20:23:12', 20.22),
    (1, '2021-01-04 05:34:23', 20.04),
    (1, '2021-01-04 21:45:34', 22.69),
    (1, '2021-01-05 06:56:45', 20.50),
    (1, '2021-01-05 22:07:56', 20.32),
    (1, '2021-01-06 07:18:07', 20.17),
    (1, '2021-01-06 23:29:18', 23.58),
    (2, '2021-01-01 04:30:10', 21.52),
    (2, '2021-01-01 17:12:25', 21.48),
    (2, '2021-01-02 01:59:43', 23.10),
    (2, '2021-01-02 20:05:53', 18.19),
    (2, '2021-01-03 03:17:12', 21.23),
    (2, '2021-01-03 19:25:20', 21.27),
    (2, '2021-01-04 04:33:34', 21.51),
    (2, '2021-01-04 20:41:45', 21.49),
    (2, '2021-01-05 05:49:56', 21.52),
    (2, '2021-01-05 21:58:07', 21.48),
    (2, '2021-01-06 07:06:18', 21.50),
    (2, '2021-01-06 23:14:29', 21.52)
;

The solution can be found at:

temperature-anomaly-detection.md

Sample input

site_id	reading_datetime	temperature
1	2021-06-01 02:12:31	26.17
1	2021-06-01 21:17:12	26.32
1	2021-06-02 01:19:56	29.58
1	2021-06-02 19:35:32	27.06
1	2021-06-03 03:14:53	26.26
1	2021-06-03 20:47:42	28.37

with readings(site_id, reading_datetime, temperature) as (
    values
        (1, '2021-06-01 02:12:31'::timestamp, 26.17),
        (1, '2021-06-01 21:17:12'::timestamp, 26.32),
        (1, '2021-06-02 01:19:56'::timestamp, 29.58),
        (1, '2021-06-02 19:35:32'::timestamp, 27.06),
        (1, '2021-06-03 03:14:53'::timestamp, 26.26),
        (1, '2021-06-03 20:47:42'::timestamp, 28.37)
)

Sample output

site_id	reading_datetime	temperature	average_temperature	percentage_increase
1	2021-06-02 01:19:56	29.58	26.4525	11.8231

solution(site_id, reading_datetime, temperature, average_temperature, percentage_increase) as (
    values
        (1, '2021-06-02 01:19:56'::timestamp, 29.58, 26.4525, 11.8231)
)

Hint 1

Use a window function (or two!) to calculate the average temperature of the surrounding readings.

Hint 2

Use another window function to identify rows with at least 4 surrounding readings (2 before and 2 after).