Skip to content

Encoding datelist ints 🔐

Scenario

Inspired by the modelling approach used by the social media platform from the customer churn/decoding datelist ints problems, the company from the suspicious login activity/bannable login activity problems want to implement the "datelist integer" for their users' login history.

This company has also given us more information about how their system works: logins expire after 24 hours, and only manual user logouts are recorded in the events table.

This means that:

  • A consecutive login within 24 hours (inclusive) of a previous login keeps the user logged in.
  • A logout event when the user is already logged out does nothing; it can be ignored.

Question

Encode the login history for each user into a datelist integer, where a day is flagged as 1 if the user was logged in at any point on that day, 0 otherwise.

The encoding should be relative to the day of the latest event in the table.

The output should have a single row per user with the columns:

  • user_id
  • last_update as the date of the latest event (the date that the encoding is relative to)
  • activity_history as the datelist integer

Order the output by user_id.

Expand for the DDL
create table events (
    event_id       integer primary key,
    user_id        integer not null,
    event_datetime timestamp not null,
    event_type     varchar not null
);
insert into events
values
    (1,  1, '2024-01-01 11:00:00', 'login'),
    (2,  1, '2024-01-01 12:00:00', 'logout'),
    (3,  1, '2024-01-03 03:00:00', 'login failed'),
    (4,  1, '2024-01-03 03:01:00', 'login failed'),
    (5,  1, '2024-01-03 03:02:00', 'login failed'),
    (6,  1, '2024-01-03 03:05:00', 'login'),
    (7,  2, '2024-01-03 10:00:00', 'login'),
    (8,  2, '2024-01-03 15:00:00', 'logout'),
    (9,  1, '2024-01-03 23:00:00', 'logout'),
    (10, 2, '2024-01-04 22:00:00', 'login failed'),
    (11, 2, '2024-01-04 22:05:00', 'login'),
    (12, 3, '2024-01-05 20:00:00', 'login'),
    (13, 3, '2024-01-06 04:00:00', 'logout'),
    (14, 2, '2024-01-09 15:00:00', 'logout'),
    (15, 3, '2024-01-11 21:00:00', 'login'),
    (16, 1, '2024-01-12 12:00:00', 'login failed'),
    (17, 1, '2024-01-12 13:00:00', 'login failed'),
    (18, 1, '2024-01-12 23:00:00', 'login failed'),
    (19, 2, '2024-01-13 10:00:00', 'login failed'),
    (20, 2, '2024-01-13 10:05:00', 'login'),
    (21, 2, '2024-01-13 15:00:00', 'logout'),
    (22, 1, '2024-01-13 23:00:00', 'login failed'),
    (23, 1, '2024-01-13 23:01:00', 'login failed'),
    (24, 1, '2024-01-13 23:02:00', 'login failed'),
    (25, 2, '2024-01-14 22:00:00', 'login'),
    (26, 3, '2024-01-15 20:00:00', 'login'),
    (27, 3, '2024-01-16 04:00:00', 'logout'),
    (28, 2, '2024-01-19 15:00:00', 'logout'),
    (29, 3, '2024-01-21 21:00:00', 'login'),
    (30, 1, '2024-01-22 12:00:00', 'login failed'),
    (31, 1, '2024-01-22 12:05:00', 'password reset'),
    (32, 1, '2024-01-22 12:10:00', 'login'),
    (33, 1, '2024-01-22 13:00:00', 'logout'),
    (34, 1, '2024-01-23 03:00:00', 'login'),
    (35, 2, '2024-01-23 10:00:00', 'login'),
    (36, 2, '2024-01-23 15:00:00', 'logout'),
    (37, 1, '2024-01-23 23:00:00', 'logout'),
    (38, 2, '2024-01-24 22:00:00', 'login'),
    (39, 3, '2024-01-25 20:00:00', 'login'),
    (40, 3, '2024-01-26 04:00:00', 'logout'),
    (41, 2, '2024-01-29 15:00:00', 'logout'),
    (42, 3, '2024-01-30 21:00:00', 'login failed'),
    (43, 3, '2024-01-30 21:01:00', 'login failed'),
    (44, 3, '2024-01-30 21:02:00', 'login failed'),
    (45, 3, '2024-01-30 21:03:00', 'login failed'),
    (46, 3, '2024-01-30 21:04:00', 'login failed'),
    (47, 3, '2024-01-30 21:05:00', 'password reset'),
    (48, 3, '2024-01-30 21:06:00', 'password reset'),
    (49, 3, '2024-01-30 21:07:00', 'password reset'),
    (50, 3, '2024-01-30 21:08:00', 'password reset'),
    (51, 3, '2024-01-30 21:09:00', 'password reset'),
    (52, 3, '2024-01-30 21:10:00', 'password reset'),
    (53, 3, '2024-01-31 23:55:00', 'login failed'),
    (54, 3, '2024-01-31 23:56:00', 'login failed'),
    (55, 3, '2024-01-31 23:57:00', 'login failed'),
    (56, 3, '2024-01-31 23:58:00', 'login failed'),
    (57, 3, '2024-01-31 23:59:00', 'login failed'),
    (58, 3, '2024-02-01 00:00:00', 'login failed'),
    (59, 3, '2024-02-01 00:01:00', 'login failed'),
    (60, 3, '2024-02-01 00:02:00', 'login failed')
;

The solution can be found at:

A worked example is provided below to help illustrate the encoding.


Sample input
event_id user_id event_datetime event_type
1 1 2024-01-01 01:03:00 login
2 1 2024-01-04 01:02:00 login
3 1 2024-01-05 01:01:00 login
4 1 2024-01-06 01:00:00 logout
5 1 2024-01-07 01:05:00 logout
6 1 2024-01-07 01:06:00 logout
7 2 2024-01-08 01:07:00 login
8 2 2024-01-09 01:08:00 login
9 2 2024-01-10 01:09:00 login
10 2 2024-01-10 01:10:00 logout
11 2 2024-01-11 01:11:00 logout
12 2 2024-01-12 01:12:00 logout
with events(event_id, user_id, event_datetime, event_type) as (
    values
        (1,  1, '2024-01-01 01:03:00'::timestamp, 'login'),
        (2,  1, '2024-01-04 01:02:00'::timestamp, 'login'),
        (3,  1, '2024-01-05 01:01:00'::timestamp, 'login'),
        (4,  1, '2024-01-06 01:00:00'::timestamp, 'logout'),
        (5,  1, '2024-01-07 01:05:00'::timestamp, 'logout'),
        (6,  1, '2024-01-07 01:06:00'::timestamp, 'logout'),
        (7,  2, '2024-01-08 01:07:00'::timestamp, 'login'),
        (8,  2, '2024-01-09 01:08:00'::timestamp, 'login'),
        (9,  2, '2024-01-10 01:09:00'::timestamp, 'login'),
        (10, 2, '2024-01-10 01:10:00'::timestamp, 'logout'),
        (11, 2, '2024-01-11 01:11:00'::timestamp, 'logout'),
        (12, 2, '2024-01-12 01:12:00'::timestamp, 'logout')
)
Sample output
user_id last_update activity_history
1 2024-01-12 3520
2 2024-01-12 28
solution(user_id, last_update, activity_history) as (
    values
        (1, '2024-01-12'::date, 3520),
        (2, '2024-01-12'::date,   28)
)
Hint 1

First construct a table with one row per day per user, and a column which flags if the user was logged in at any point on that day.

Hint 2

Multiply the flag column by a power of 2 to get the value component for that day, then sum them up to get the datelist integer.

Note that the power of 2 should be the number of days between the current row's day and the latest event's day.


Worked example

To help illustrate the encoding, consider the following events in the Sample input.

We'll walk through each of the events and how they contribute to different sessions.

  1. User 1 logs in at 2024-01-01 01:03:00, starting the first session. Their next event is a login at 2024-01-04 01:02:00 which is over 24 hours; so the first session expires at 2024-01-02 01:03:00.
  2. User 1 logs in at 2024-01-04 01:02:00, starting the second session. Their next event is a login at 2024-01-05 01:01:00 which is under 24 hours; so the second session continues.
  3. User 1 logs in at 2024-01-05 01:01:00, continuing the second session. Their next event is a logout at 2024-01-06 01:00:00 which is under 24 hours; so the second session ends at 2024-01-06 01:00:00.
  4. User 1 logs out at 2024-01-06 01:00:00, which we've already accounted for.
  5. User 1 logs out at 2024-01-07 01:05:00, which does nothing.
  6. User 1 logs out at 2024-01-07 01:06:00, which does nothing.
  7. User 2 logs in at 2024-01-08 01:07:00, starting the third session. Their next event is a login at 2024-01-09 01:08:00 which is under 24 hours; so the third session continues.
  8. User 2 logs in at 2024-01-09 01:08:00, continuing the third session. Their next event is a login at 2024-01-10 01:09:00 which is under 24 hours; so the third session continues.
  9. User 2 logs in at 2024-01-10 01:09:00, continuing the third session. Their next event is a logout at 2024-01-10 01:10:00 which is under 24 hours; so the third session ends at 2024-01-10 01:10:00.
  10. User 2 logs out at 2024-01-10 01:10:00, which we've already accounted for.
  11. User 2 logs out at 2024-01-11 01:11:00, which does nothing.
  12. User 2 logs out at 2024-01-12 01:12:00, which does nothing.

This gives us the following sessions:

user_id login_date logout_date
1 2024-01-01 01:03:00 2024-01-02 01:03:00
1 2024-01-04 01:02:00 2024-01-06 01:00:00
2 2024-01-08 01:07:00 2024-01-09 01:07:00
2 2024-01-09 01:08:00 2024-01-10 01:08:00
2 2024-01-10 01:09:00 2024-01-10 01:10:00

The earliest and latest events in the events table are on 2024-01-01 and 2024-01-12 respectively, so the encoding is relative to the 2024-01-12. We can plot the sessions on a timeline:

active_date user_1_is_active user_2_is_active
2024-01-01 1 0
2024-01-02 1 0
2024-01-03 0 0
2024-01-04 1 0
2024-01-05 1 0
2024-01-06 1 0
2024-01-07 0 0
2024-01-08 0 1
2024-01-09 0 1
2024-01-10 0 1
2024-01-11 0 0
2024-01-12 0 0

Since the latest event is on 2024-01-12, the encoding for 2024-01-12 uses 20, the encoding for 2024-01-11 uses 21, and so on. This gives us the following datelist integers:

  • User 1: 2048 + 1024 + 256 + 128 + 64 = 3520
  • User 2: 16 + 8 + 4 = 28