Skip to content

Suspicious login activity 🤔

Scenario

A company is investigating suspicious login activity on their platform.

Consecutive failed login attempts are considered suspicious once they reach a certain threshold, and the company wants to identify users who have reached this threshold.

Their platform logs login, logout, and login failed events for each user.

Question

For the events below, identify the users who have login failed events at least five times in a row.

Keep only the user ID and their greatest number of consecutive failed login attempts.

The output should have a row for each user who meets this criterion, with the columns:

  • user_id
  • consecutive_failures as the greatest number of consecutive failed login attempts for the user

Order the output by user_id.

Expand for the DDL
create table events (
    event_id       integer primary key,
    user_id        integer not null,
    event_datetime timestamp not null,
    event_type     varchar not null
);
insert into events
values
    (1,  1, '2024-01-01 11:00:00', 'login'),
    (2,  1, '2024-01-01 12:00:00', 'logout'),
    (3,  1, '2024-01-03 03:00:00', 'login failed'),
    (4,  1, '2024-01-03 03:01:00', 'login failed'),
    (5,  1, '2024-01-03 03:02:00', 'login failed'),
    (6,  1, '2024-01-03 03:05:00', 'login'),
    (7,  2, '2024-01-03 10:00:00', 'login'),
    (8,  2, '2024-01-03 15:00:00', 'logout'),
    (9,  1, '2024-01-03 23:00:00', 'logout'),
    (10, 2, '2024-01-04 22:00:00', 'login failed'),
    (11, 2, '2024-01-04 22:05:00', 'login'),
    (12, 3, '2024-01-05 20:00:00', 'login'),
    (13, 3, '2024-01-06 04:00:00', 'logout'),
    (14, 2, '2024-01-09 15:00:00', 'logout'),
    (15, 3, '2024-01-11 21:00:00', 'login'),
    (16, 1, '2024-01-12 12:00:00', 'login failed'),
    (17, 1, '2024-01-12 13:00:00', 'login failed'),
    (18, 1, '2024-01-12 23:00:00', 'login failed'),
    (19, 2, '2024-01-13 10:00:00', 'login failed'),
    (20, 2, '2024-01-13 10:05:00', 'login'),
    (21, 2, '2024-01-13 15:00:00', 'logout'),
    (22, 1, '2024-01-13 23:00:00', 'login failed'),
    (23, 1, '2024-01-13 23:01:00', 'login failed'),
    (24, 1, '2024-01-13 23:02:00', 'login failed'),
    (25, 2, '2024-01-14 22:00:00', 'login'),
    (26, 3, '2024-01-15 20:00:00', 'login'),
    (27, 3, '2024-01-16 04:00:00', 'logout'),
    (28, 2, '2024-01-19 15:00:00', 'logout'),
    (29, 3, '2024-01-21 21:00:00', 'login'),
    (30, 1, '2024-01-22 12:00:00', 'login failed'),
    (31, 1, '2024-01-22 12:05:00', 'password reset'),
    (32, 1, '2024-01-22 12:10:00', 'login'),
    (33, 1, '2024-01-22 13:00:00', 'logout'),
    (34, 1, '2024-01-23 03:00:00', 'login'),
    (35, 2, '2024-01-23 10:00:00', 'login'),
    (36, 2, '2024-01-23 15:00:00', 'logout'),
    (37, 1, '2024-01-23 23:00:00', 'logout'),
    (38, 2, '2024-01-24 22:00:00', 'login'),
    (39, 3, '2024-01-25 20:00:00', 'login'),
    (40, 3, '2024-01-26 04:00:00', 'logout'),
    (41, 2, '2024-01-29 15:00:00', 'logout'),
    (42, 3, '2024-01-30 21:00:00', 'login failed'),
    (43, 3, '2024-01-30 21:01:00', 'login failed'),
    (44, 3, '2024-01-30 21:02:00', 'login failed'),
    (45, 3, '2024-01-30 21:03:00', 'login failed'),
    (46, 3, '2024-01-30 21:04:00', 'login failed'),
    (47, 3, '2024-01-30 21:05:00', 'password reset'),
    (48, 3, '2024-01-30 21:06:00', 'password reset'),
    (49, 3, '2024-01-30 21:07:00', 'password reset'),
    (50, 3, '2024-01-30 21:08:00', 'password reset'),
    (51, 3, '2024-01-30 21:09:00', 'password reset'),
    (52, 3, '2024-01-30 21:10:00', 'password reset'),
    (53, 3, '2024-01-31 23:55:00', 'login failed'),
    (54, 3, '2024-01-31 23:56:00', 'login failed'),
    (55, 3, '2024-01-31 23:57:00', 'login failed'),
    (56, 3, '2024-01-31 23:58:00', 'login failed'),
    (57, 3, '2024-01-31 23:59:00', 'login failed'),
    (58, 3, '2024-02-01 00:00:00', 'login failed'),
    (59, 3, '2024-02-01 00:01:00', 'login failed'),
    (60, 3, '2024-02-01 00:02:00', 'login failed')
;

The solution can be found at:


Sample input
event_id user_id event_datetime event_type
1 1 2024-01-01 03:00:00 login failed
2 1 2024-01-01 03:01:00 login failed
3 1 2024-01-01 03:02:00 login failed
4 1 2024-01-01 03:03:00 login failed
5 1 2024-01-01 03:04:00 login failed
6 1 2024-01-01 03:05:00 login
7 2 2024-01-01 10:00:00 login
8 2 2024-01-01 15:00:00 logout
9 2 2024-01-01 23:00:00 login failed
with events(event_id, user_id, event_datetime, event_type) as (
    values
        (1, 1, '2024-01-01 03:00:00'::timestamp, 'login failed'),
        (2, 1, '2024-01-01 03:01:00'::timestamp, 'login failed'),
        (3, 1, '2024-01-01 03:02:00'::timestamp, 'login failed'),
        (4, 1, '2024-01-01 03:03:00'::timestamp, 'login failed'),
        (5, 1, '2024-01-01 03:04:00'::timestamp, 'login failed'),
        (6, 1, '2024-01-01 03:05:00'::timestamp, 'login'),
        (7, 2, '2024-01-01 10:00:00'::timestamp, 'login'),
        (8, 2, '2024-01-01 15:00:00'::timestamp, 'logout'),
        (9, 2, '2024-01-01 23:00:00'::timestamp, 'login failed')
)
Sample output
user_id consecutive_failures
1 5
solution(user_id, consecutive_failures) as (
    values
        (1, 5)
)
Hint 1

This is a typical "gaps and islands" problem.

Hint 2

Use the difference between two ROW_NUMBER() functions to create a group for each user and event type. Partition on user_id for one and partition on both user_id and event_type for the other, ordering both by event_id.