Skip to content

Combining dimensions 🔗

Scenario

A human resources department has several dimension tables with information about their employees.

For reporting purposes, they need to combine these dimensions into a single dimension table which shows all the information about each employee.

Question

Write a query which combines the three dimension tables below into a single dimension table.

The output should have the columns from the three tables combined:

  • employee_id
  • valid_from
  • valid_until
  • date_of_birth
  • gender
  • ethnicity
  • job_title
  • salary
  • email
  • phone

The number of rows in the output depends on the number of changes that need to be accounted for, and the valid_from and valid_until columns should be recalculated as necessary to account for these changes.

Order the output by employee_id and valid_from.

Expand for the DDL
create table dim_employee_demographics (
    employee_id   int,
    valid_from    date,
    valid_until   date,
    date_of_birth date,
    gender        varchar,
    ethnicity     varchar,
    primary key (employee_id, valid_from)
);
insert into dim_employee_demographics
values
    (1, '2021-06-13', '2022-06-08', '2004-02-18', 'Female',      'Malaysian'),
    (2, '2021-10-19', '9999-12-31', '1963-12-12', 'Female',      'Navajo'),
    (3, '2022-01-29', '9999-12-31', '2000-10-17', 'Genderqueer', 'White'),
    (4, '2022-04-28', '2024-03-12', '1987-12-13', 'Male',        'Black'),
    (1, '2022-06-09', '9999-12-31', '2004-02-18', 'Non-binary',  'Malaysian'),
    (5, '2022-08-31', '9999-12-31', '1999-09-10', 'Female',      'Asian'),
    (4, '2024-03-13', '9999-12-31', '1987-12-13', 'Female',      'Black')
;


create table dim_employee_career (
    employee_id int,
    valid_from  date,
    valid_until date,
    job_title   varchar,
    salary      decimal(10, 2),
    primary key (employee_id, valid_from)
);
insert into dim_employee_career
values
    (1, '2021-06-13', '2023-05-27', 'Teacher',             5000.00),
    (1, '2023-05-28', '9999-12-31', 'Teacher',             6000.00),
    (2, '2021-10-19', '2023-11-26', 'Data Analyst',        4000.00),
    (2, '2023-11-27', '2024-03-01', 'Data Analyst',        6500.00),
    (2, '2024-03-02', '9999-12-31', 'Engineering Manager', 7000.00),
    (3, '2022-01-29', '2023-04-02', 'Software Engineer',   6000.00),
    (3, '2023-04-03', '9999-12-31', 'Software Engineer',   8000.00),
    (4, '2022-06-12', '9999-12-31', 'Founder',                null)
;


create table dim_employee_contact (
    employee_id int,
    valid_from  date,
    valid_until date,
    email       varchar,
    phone       varchar,
    primary key (employee_id, valid_from)
);
insert into dim_employee_contact
values
    (1, '2021-06-13', '2024-01-29', 'c.perot0@gmail.com',        null),
    (1, '2024-01-30', '9999-12-31', 'c.perot0@gmail.com',        '1986474151'),
    (2, '2021-10-19', '2024-04-04', null,                        null),
    (2, '2024-04-05', '9999-12-31', 'hpicard1@bing.com',         null),
    (4, '2022-06-12', '2022-12-01', 'tbayford3@hotmail.co.uk',   '01246 209863'),
    (4, '2022-12-02', '2023-11-11', 'tbayford3@hotmail.co.uk',   '01752 492269'),
    (4, '2023-11-12', '9999-12-31', 'tmacalinden@hotmail.co.uk', '01270 530950'),
    (5, '2023-02-17', '9999-12-31', null,                        null)
;

The "dimensions" in this context are Slowly Changing Dimensions of type 2 from the Star Schema modelling framework.

The solution can be found at:


Sample input

Employee Demographics Dimension

employee_id valid_from valid_until date_of_birth gender ethnicity
1 2021-07-12 9999-12-31 1995-02-24 Female Black
2 2023-12-07 9999-12-31 1999-12-12 Male Asian

Employee Career Dimension

employee_id valid_from valid_until job_title salary
1 2021-07-12 2023-02-18 Student 0.00
1 2023-02-19 9999-12-31 Pianist 2000.00
2 2023-12-08 9999-12-31 Paramedic 4000.00

Employee Contact Dimension

employee_id valid_from valid_until email phone
1 2021-07-12 9999-12-31 abcde@gmail.com 123-456-789
2 2023-12-08 2023-12-31 null 01234 567890
2 2024-01-01 9999-12-31 something@mail.net 0300 123 456
with

dim_employee_demographics(employee_id, valid_from, valid_until, date_of_birth, gender, ethnicity) as (
    values
        (1, '2021-07-12'::date, '9999-12-31'::date, '1995-02-24'::date, 'Female', 'White'),
        (2, '2023-12-07'::date, '9999-12-31'::date, '1999-12-12'::date, 'Male',   'Asian')
),

dim_employee_career(employee_id, valid_from, valid_until, job_title, salary) as (
    values
        (1, '2021-07-12'::date, '2023-02-18'::date, 'Student',      0.00),
        (1, '2023-02-19'::date, '9999-12-31'::date, 'Pianist',   2000.00),
        (2, '2023-12-08'::date, '9999-12-31'::date, 'Paramedic', 4000.00)
),

dim_employee_contact(employee_id, valid_from, valid_until, email, phone) as (
    values
        (1, '2021-07-12'::date, '9999-12-31'::date, 'abcde@gmail.com',    '123-456-789'),
        (2, '2023-12-08'::date, '2023-12-31'::date, null,                 '01234 567890'),
        (2, '2024-01-01'::date, '9999-12-31'::date, 'something@mail.net', '0300 123 456')
)
Sample output
employee_id valid_from valid_until date_of_birth gender ethnicity job_title salary email phone
1 2021-07-12 2023-02-18 1995-02-24 Female Black Student 0.00 abcde@gmail.com 123-456-789
1 2023-02-19 9999-12-31 1995-02-24 Female Black Pianist 2000.00 abcde@gmail.com 123-456-789
2 2023-12-07 2023-12-07 1999-12-12 Male Asian null null null null
2 2023-12-08 2023-12-31 1999-12-12 Male Asian Paramedic 4000.00 null 01234 567890
2 2024-01-01 9999-12-31 1999-12-12 Male Asian Paramedic 4000.00 something@mail.net 0300 123 456
solution(employee_id, valid_from, valid_until, date_of_birth, gender, ethnicity, job_title, salary, email, phone) as (
    values
        (1, '2021-07-12'::date, '2023-02-18'::date, '1995-02-24'::date, 'Female', 'Black', 'Student',      0.00, 'abcde@gmail.com',    '123-456-789'),
        (1, '2023-02-19'::date, '9999-12-31'::date, '1995-02-24'::date, 'Female', 'Black', 'Pianist',   2000.00, 'abcde@gmail.com',    '123-456-789'),
        (2, '2023-12-07'::date, '2023-12-07'::date, '1999-12-12'::date, 'Male',   'Asian', null,           null, null,                 null),
        (2, '2023-12-08'::date, '2023-12-31'::date, '1999-12-12'::date, 'Male',   'Asian', 'Paramedic', 4000.00, null,                 '01234 567890'),
        (2, '2024-01-01'::date, '9999-12-31'::date, '1999-12-12'::date, 'Male',   'Asian', 'Paramedic', 4000.00, 'something@mail.net', '0300 123 456')
)
Hint 1

The valid_from values in the output will be each of the distinct valid_from values from the three tables for the given employee, so start by constructing an "axis" which is the distinct combination of all employee_id and valid_from values across the three tables.

Hint 2

For databases that support it, use an ASOF join to add the latest information from each dimension to the "axis" for each employee_id and valid_from combination.

For databases that don't support ASOF joins, calculate the valid_until values for the valid_from values in the "axis" and then join the dimension tables to the axis by where the axis valid_from-valid_until range in contained within the dimension valid_from-valid_until range.