Encoding datelist ints 🔐
Scenario
Inspired by the modelling approach used by the social media platform from the customer churn/decoding datelist ints problems, the company from the suspicious login activity/bannable login activity problems want to implement the "datelist integer" for their users' login history.
This company has also given us more information about how their system works: logins expire after 24 hours, and only manual user logouts are recorded in the events table.
This means that:
- A consecutive login within 24 hours (inclusive) of a previous login keeps the user logged in.
- A logout event when the user is already logged out does nothing; it can be ignored.
Question
Encode the login history for each user into a datelist integer, where a day is flagged as 1
if the user was logged in at any point on that day, 0
otherwise.
The encoding should be relative to the day of the latest event in the table.
The output should have a single row per user with the columns:
user_id
last_update
as the date of the latest event (the date that the encoding is relative to)activity_history
as the datelist integer
Order the output by user_id
.
Expand for the DDL
create table events (
event_id integer primary key,
user_id integer not null,
event_datetime timestamp not null,
event_type varchar not null
);
insert into events
values
(1, 1, '2024-01-01 11:00:00', 'login'),
(2, 1, '2024-01-01 12:00:00', 'logout'),
(3, 1, '2024-01-03 03:00:00', 'login failed'),
(4, 1, '2024-01-03 03:01:00', 'login failed'),
(5, 1, '2024-01-03 03:02:00', 'login failed'),
(6, 1, '2024-01-03 03:05:00', 'login'),
(7, 2, '2024-01-03 10:00:00', 'login'),
(8, 2, '2024-01-03 15:00:00', 'logout'),
(9, 1, '2024-01-03 23:00:00', 'logout'),
(10, 2, '2024-01-04 22:00:00', 'login failed'),
(11, 2, '2024-01-04 22:05:00', 'login'),
(12, 3, '2024-01-05 20:00:00', 'login'),
(13, 3, '2024-01-06 04:00:00', 'logout'),
(14, 2, '2024-01-09 15:00:00', 'logout'),
(15, 3, '2024-01-11 21:00:00', 'login'),
(16, 1, '2024-01-12 12:00:00', 'login failed'),
(17, 1, '2024-01-12 13:00:00', 'login failed'),
(18, 1, '2024-01-12 23:00:00', 'login failed'),
(19, 2, '2024-01-13 10:00:00', 'login failed'),
(20, 2, '2024-01-13 10:05:00', 'login'),
(21, 2, '2024-01-13 15:00:00', 'logout'),
(22, 1, '2024-01-13 23:00:00', 'login failed'),
(23, 1, '2024-01-13 23:01:00', 'login failed'),
(24, 1, '2024-01-13 23:02:00', 'login failed'),
(25, 2, '2024-01-14 22:00:00', 'login'),
(26, 3, '2024-01-15 20:00:00', 'login'),
(27, 3, '2024-01-16 04:00:00', 'logout'),
(28, 2, '2024-01-19 15:00:00', 'logout'),
(29, 3, '2024-01-21 21:00:00', 'login'),
(30, 1, '2024-01-22 12:00:00', 'login failed'),
(31, 1, '2024-01-22 12:05:00', 'password reset'),
(32, 1, '2024-01-22 12:10:00', 'login'),
(33, 1, '2024-01-22 13:00:00', 'logout'),
(34, 1, '2024-01-23 03:00:00', 'login'),
(35, 2, '2024-01-23 10:00:00', 'login'),
(36, 2, '2024-01-23 15:00:00', 'logout'),
(37, 1, '2024-01-23 23:00:00', 'logout'),
(38, 2, '2024-01-24 22:00:00', 'login'),
(39, 3, '2024-01-25 20:00:00', 'login'),
(40, 3, '2024-01-26 04:00:00', 'logout'),
(41, 2, '2024-01-29 15:00:00', 'logout'),
(42, 3, '2024-01-30 21:00:00', 'login failed'),
(43, 3, '2024-01-30 21:01:00', 'login failed'),
(44, 3, '2024-01-30 21:02:00', 'login failed'),
(45, 3, '2024-01-30 21:03:00', 'login failed'),
(46, 3, '2024-01-30 21:04:00', 'login failed'),
(47, 3, '2024-01-30 21:05:00', 'password reset'),
(48, 3, '2024-01-30 21:06:00', 'password reset'),
(49, 3, '2024-01-30 21:07:00', 'password reset'),
(50, 3, '2024-01-30 21:08:00', 'password reset'),
(51, 3, '2024-01-30 21:09:00', 'password reset'),
(52, 3, '2024-01-30 21:10:00', 'password reset'),
(53, 3, '2024-01-31 23:55:00', 'login failed'),
(54, 3, '2024-01-31 23:56:00', 'login failed'),
(55, 3, '2024-01-31 23:57:00', 'login failed'),
(56, 3, '2024-01-31 23:58:00', 'login failed'),
(57, 3, '2024-01-31 23:59:00', 'login failed'),
(58, 3, '2024-02-01 00:00:00', 'login failed'),
(59, 3, '2024-02-01 00:01:00', 'login failed'),
(60, 3, '2024-02-01 00:02:00', 'login failed')
;
The solution can be found at:
A worked example is provided below to help illustrate the encoding.
Sample input
event_id | user_id | event_datetime | event_type |
---|---|---|---|
1 | 1 | 2024-01-01 01:03:00 | login |
2 | 1 | 2024-01-04 01:02:00 | login |
3 | 1 | 2024-01-05 01:01:00 | login |
4 | 1 | 2024-01-06 01:00:00 | logout |
5 | 1 | 2024-01-07 01:05:00 | logout |
6 | 1 | 2024-01-07 01:06:00 | logout |
7 | 2 | 2024-01-08 01:07:00 | login |
8 | 2 | 2024-01-09 01:08:00 | login |
9 | 2 | 2024-01-10 01:09:00 | login |
10 | 2 | 2024-01-10 01:10:00 | logout |
11 | 2 | 2024-01-11 01:11:00 | logout |
12 | 2 | 2024-01-12 01:12:00 | logout |
with events(event_id, user_id, event_datetime, event_type) as (
values
(1, 1, '2024-01-01 01:03:00'::timestamp, 'login'),
(2, 1, '2024-01-04 01:02:00'::timestamp, 'login'),
(3, 1, '2024-01-05 01:01:00'::timestamp, 'login'),
(4, 1, '2024-01-06 01:00:00'::timestamp, 'logout'),
(5, 1, '2024-01-07 01:05:00'::timestamp, 'logout'),
(6, 1, '2024-01-07 01:06:00'::timestamp, 'logout'),
(7, 2, '2024-01-08 01:07:00'::timestamp, 'login'),
(8, 2, '2024-01-09 01:08:00'::timestamp, 'login'),
(9, 2, '2024-01-10 01:09:00'::timestamp, 'login'),
(10, 2, '2024-01-10 01:10:00'::timestamp, 'logout'),
(11, 2, '2024-01-11 01:11:00'::timestamp, 'logout'),
(12, 2, '2024-01-12 01:12:00'::timestamp, 'logout')
)
Sample output
user_id | last_update | activity_history |
---|---|---|
1 | 2024-01-12 | 3520 |
2 | 2024-01-12 | 28 |
solution(user_id, last_update, activity_history) as (
values
(1, '2024-01-12'::date, 3520),
(2, '2024-01-12'::date, 28)
)
Hint 1
First construct a table with one row per day per user, and a column which flags if the user was logged in at any point on that day.
Hint 2
Multiply the flag column by a power of 2 to get the value component for that day, then sum them up to get the datelist integer.
Note that the power of 2 should be the number of days between the current row's day and the latest event's day.
Worked example
To help illustrate the encoding, consider the following events in the Sample input.
We'll walk through each of the events and how they contribute to different sessions.
- User 1 logs in at
2024-01-01 01:03:00
, starting the first session. Their next event is a login at2024-01-04 01:02:00
which is over 24 hours; so the first session expires at2024-01-02 01:03:00
. - User 1 logs in at
2024-01-04 01:02:00
, starting the second session. Their next event is a login at2024-01-05 01:01:00
which is under 24 hours; so the second session continues. - User 1 logs in at
2024-01-05 01:01:00
, continuing the second session. Their next event is a logout at2024-01-06 01:00:00
which is under 24 hours; so the second session ends at2024-01-06 01:00:00
. - User 1 logs out at
2024-01-06 01:00:00
, which we've already accounted for. - User 1 logs out at
2024-01-07 01:05:00
, which does nothing. - User 1 logs out at
2024-01-07 01:06:00
, which does nothing. - User 2 logs in at
2024-01-08 01:07:00
, starting the third session. Their next event is a login at2024-01-09 01:08:00
which is under 24 hours; so the third session continues. - User 2 logs in at
2024-01-09 01:08:00
, continuing the third session. Their next event is a login at2024-01-10 01:09:00
which is under 24 hours; so the third session continues. - User 2 logs in at
2024-01-10 01:09:00
, continuing the third session. Their next event is a logout at2024-01-10 01:10:00
which is under 24 hours; so the third session ends at2024-01-10 01:10:00
. - User 2 logs out at
2024-01-10 01:10:00
, which we've already accounted for. - User 2 logs out at
2024-01-11 01:11:00
, which does nothing. - User 2 logs out at
2024-01-12 01:12:00
, which does nothing.
This gives us the following sessions:
user_id | login_date | logout_date |
---|---|---|
1 | 2024-01-01 01:03:00 | 2024-01-02 01:03:00 |
1 | 2024-01-04 01:02:00 | 2024-01-06 01:00:00 |
2 | 2024-01-08 01:07:00 | 2024-01-09 01:07:00 |
2 | 2024-01-09 01:08:00 | 2024-01-10 01:08:00 |
2 | 2024-01-10 01:09:00 | 2024-01-10 01:10:00 |
The earliest and latest events in the events
table are on 2024-01-01
and 2024-01-12
respectively, so the encoding is relative to the 2024-01-12
. We can plot the sessions on a timeline:
active_date | user_1_is_active | user_2_is_active |
---|---|---|
2024-01-01 | 1 | 0 |
2024-01-02 | 1 | 0 |
2024-01-03 | 0 | 0 |
2024-01-04 | 1 | 0 |
2024-01-05 | 1 | 0 |
2024-01-06 | 1 | 0 |
2024-01-07 | 0 | 0 |
2024-01-08 | 0 | 1 |
2024-01-09 | 0 | 1 |
2024-01-10 | 0 | 1 |
2024-01-11 | 0 | 0 |
2024-01-12 | 0 | 0 |
Since the latest event is on 2024-01-12
, the encoding for 2024-01-12
uses 20, the encoding for 2024-01-11
uses 21, and so on. This gives us the following datelist integers:
- User 1: 2048 + 1024 + 256 + 128 + 64 = 3520
- User 2: 16 + 8 + 4 = 28