Skip to content

Metric correlation 🔀

Scenario

A company has banded their customers into segments and calculated several metrics for each customer.

The company wants to know which pairs of metrics are most correlated within each segment.

Question

For each customer segment, find the highest correlated pair of metrics.

The correlation should be rounded to four decimal places, and the output should keep ties in the (rounded) correlation. Don't compare a metric to itself 😄

The output should have a row per segment and metric pair, with the columns:

  • segment
  • metric_pair as the metric pair in the format metric_1, metric_2. Put the (lexicographically) lower metric name on the left using your database's default collation
  • correlation as the correlation between the two metrics, rounded to four decimal places

Order the output by segment and metric_pair.

Expand for the DDL
create table metrics (
    customer_id integer primary key,
    segment     integer not null,
    metric_1    integer not null,
    metric_2    integer not null,
    metric_3    integer not null,
    metric_4    integer not null,
    metric_5    integer not null
);
insert into metrics
values
    (  1,  7,  17, 93, 23, 110,  41),
    (  2,  4,  22, 67, 38,  89,  37),
    (  3,  6,  62, 30, 90,  92, 178),
    (  4,  1,  20, 82, 87, 102,  37),
    (  5,  1,   7, 34, 50,  41,  77),
    (  6,  8,  49,  6, 24,  55,  19),
    (  7,  2,  19, 64, 63,  83,   9),
    (  8,  4,  46, 59, 21, 105,  64),
    (  9,  1,  28, 74, 10, 103,  59),
    ( 10,  6,  92, 77, 28, 170,  52),
    ( 11,  7,  27, 78, 33, 106,  77),
    ( 12,  4,  10, 17, 97,  28,  88),
    ( 13,  6,  47, 74, 37, 122,  76),
    ( 14,  1,  93,  7, 69, 100,   8),
    ( 15,  4,  93, 38, 71, 131,  71),
    ( 16,  7,  21, 58, 43,  80,  74),
    ( 17,  1,  37,  7, 54,  44,  15),
    ( 18,  2,  58, 94, 62, 152,  99),
    ( 19,  7,  89, 52, 88, 142,  71),
    ( 20,  5, 100,  7, 48, 107,  99),
    ( 21,  7,  50, 89, 31, 139,  69),
    ( 22,  5,   2, 28, 40,  29,  89),
    ( 23,  2,  78, 74, 52, 153,  39),
    ( 24,  3,  61, 65, 27, 126,  51),
    ( 25, 10,  48, 98, 93, 145,  93),
    ( 26,  9,  60, 21, 54,  81,  85),
    ( 27,  4,  19, 48, 19,  67,  25),
    ( 28,  3,  56, 26, 53,  83,  98),
    ( 29,  4,  32, 43, 86,  75,  67),
    ( 30,  6,  41,  1, 66,  43, 132),
    ( 31,  7,  97, 32, 91, 129,   4),
    ( 32,  0,  91, 85, 52, 176,  65),
    ( 33, 10,  59, 66, 18, 125,  88),
    ( 34,  5,  83, 35, 77, 118, 153),
    ( 35,  2,  38, 52, 32,  89,  24),
    ( 36,  0,  10,  0, 75,  10,  62),
    ( 37,  6,  19, 92, 31, 111,  53),
    ( 38,  7,  99,  3, 24, 102,  71),
    ( 39,  2,  54, 41, 61,  96,  17),
    ( 40,  6,  75, 61, 50, 136, 108),
    ( 41,  4,  88, 46, 68, 133,  38),
    ( 42,  0,  87, 60, 32, 146,  27),
    ( 43,  9,  41, 25, 91,  65,  34),
    ( 44,  7,  76, 28, 39, 103,  56),
    ( 45,  2,  18, 14, 13,  31,  29),
    ( 46,  8,  26, 38,  1,  64,  53),
    ( 47, 10,  90, 53, 30, 143,  56),
    ( 48,  4,   3, 20, 57,  23,  32),
    ( 49,  7,  28,  5, 71,  34,  98),
    ( 50,  0,  32, 37, 30,  69,  82),
    ( 51,  3,  69, 35, 63, 105, 125),
    ( 52,  4,  67, 70, 41, 136,  70),
    ( 53,  8,  53, 68, 67, 122,  58),
    ( 54,  8,  14, 71, 95,  85,  13),
    ( 55,  5,  55, 88, 67, 143, 133),
    ( 56,  7,  42, 80,  8, 121,   2),
    ( 57, 10,  69, 79, 30, 148,  99),
    ( 58,  3,  87, 57, 69, 144, 132),
    ( 59,  7,  59, 68, 80, 128,  58),
    ( 60,  5,  34, 36, 59,  70, 118),
    ( 61,  6,  54, 94, 25, 148,  44),
    ( 62, 10,  62, 45, 68, 107,   7),
    ( 63,  6,  97, 98, 99, 195, 201),
    ( 64,  0,  64, 25, 43,  89,  25),
    ( 65,  7,  15, 38, 51,  53,  21),
    ( 66,  3,  37, 15,  1,  51,   2),
    ( 67,  8,  34, 64, 12,  98,  68),
    ( 68,  3,  85, 58, 30, 143,  58),
    ( 69,  7,  21, 90, 79, 111,  55),
    ( 70,  5,  43, 64, 18, 107,  41),
    ( 71,  5,  53, 85, 22, 138,  45),
    ( 72, 10,  67, 49, 70, 117,  11),
    ( 73, 10,  97,  5,  6, 102,  11),
    ( 74,  7,  26, 59, 42,  85,  55),
    ( 75,  6,   5, 25, 92,  30, 176),
    ( 76,  4,  76, 26,  3, 102,  95),
    ( 77,  1,  18, 53, 95,  71,   8),
    ( 78,  0,  79, 51, 82, 130,  74),
    ( 79,  1,  72, 63,  3, 136,  48),
    ( 80,  9,  45, 24,  5,  70,  47),
    ( 81,  0,  46, 40, 19,  86,  73),
    ( 82,  8,  34, 72, 17, 107,  54),
    ( 83,  7,  36, 54, 47,  90,   3),
    ( 84,  3,  50, 57,  8, 107,  15),
    ( 85,  2,  66, 11,  7,  77,  53),
    ( 86,  0,   0, 82, 95,  83,   5),
    ( 87,  4,  13, 38, 16,  51,  14),
    ( 88,  6,  61, 10, 31,  72,  56),
    ( 89,  7,  21, 17, 34,  38,  34),
    ( 90,  3,  77, 31, 96, 108, 197),
    ( 91,  3,  90, 27, 44, 117,  79),
    ( 92,  0,  87, 24, 64, 110,  38),
    ( 93,  6,  90, 32, 98, 122, 193),
    ( 94, 10,  82, 65, 19, 147,  48),
    ( 95,  1,  58, 93, 68, 152,   9),
    ( 96,  7,  42, 43, 16,  84,  27),
    ( 97,  5,  29, 31,  1,  60,   9),
    ( 98,  2,  49, 70, 58, 119,   6),
    ( 99,  6,  51, 51, 86, 102, 162),
    (100,  4,  18, 35, 85,  53,  30)
;

The solution can be found at:


Sample input
customer_id segment metric_1 metric_2 metric_3 metric_4 metric_5
1 1 21 58 66 79 29
2 0 70 55 79 125 2
3 1 68 55 10 123 70
4 1 20 62 59 82 25
5 0 42 9 80 51 13
6 1 26 89 17 115 66
7 1 45 51 90 96 17
8 0 4 52 47 56 61
9 0 57 48 82 105 40
10 1 17 93 45 109 76
with metrics(customer_id, segment, metric_1, metric_2, metric_3, metric_4, metric_5) as (
    values
        ( 1, 1, 21, 58, 66,  79, 29),
        ( 2, 0, 70, 55, 79, 125,  2),
        ( 3, 1, 68, 55, 10, 123, 70),
        ( 4, 1, 20, 62, 59,  82, 25),
        ( 5, 0, 42,  9, 80,  51, 13),
        ( 6, 1, 26, 89, 17, 115, 66),
        ( 7, 1, 45, 51, 90,  96, 17),
        ( 8, 0,  4, 52, 47,  56, 61),
        ( 9, 0, 57, 48, 82, 105, 40),
        (10, 1, 17, 93, 45, 109, 76)
)
Sample output
segment metric_pair correlation
0 metric_1, metric_3 0.9051
1 metric_4, metric_5 0.8357
solution(segment, metric_pair, correlation) as (
    values
        (0, 'metric_1, metric_3', 0.9051),
        (1, 'metric_4, metric_5', 0.8357)
)
Hint 1

Use a correlation function, usually called CORR, to calculate the correlation between two metrics.

If you're using a database that doesn't have a built-in correlation function, you can try to calculate it manually -- but I'd instead recommend skipping this question.

Hint 2

To get every pair of metrics "side by side" for the CORR function, unpivot the table so that each metric is in its own row and then join the resulting table to itself on the segment and customer_id columns.