Metric correlation 🔀
Scenario
A company has banded their customers into segments and calculated several metrics for each customer.
The company wants to know which pairs of metrics are most correlated within each segment.
Question
For each customer segment, find the highest correlated pair of metrics.
The correlation should be rounded to four decimal places, and the output should keep ties in the (rounded) correlation. Don't compare a metric to itself 😄
The output should have a row per segment and metric pair, with the columns:
segment
metric_pair
as the metric pair in the formatmetric_1, metric_2
. Put the (lexicographically) lower metric name on the left using your database's default collationcorrelation
as the correlation between the two metrics, rounded to four decimal places
Order the output by segment
and metric_pair
.
Expand for the DDL
create table metrics (
customer_id integer primary key,
segment integer not null,
metric_1 integer not null,
metric_2 integer not null,
metric_3 integer not null,
metric_4 integer not null,
metric_5 integer not null
);
insert into metrics
values
( 1, 7, 17, 93, 23, 110, 41),
( 2, 4, 22, 67, 38, 89, 37),
( 3, 6, 62, 30, 90, 92, 178),
( 4, 1, 20, 82, 87, 102, 37),
( 5, 1, 7, 34, 50, 41, 77),
( 6, 8, 49, 6, 24, 55, 19),
( 7, 2, 19, 64, 63, 83, 9),
( 8, 4, 46, 59, 21, 105, 64),
( 9, 1, 28, 74, 10, 103, 59),
( 10, 6, 92, 77, 28, 170, 52),
( 11, 7, 27, 78, 33, 106, 77),
( 12, 4, 10, 17, 97, 28, 88),
( 13, 6, 47, 74, 37, 122, 76),
( 14, 1, 93, 7, 69, 100, 8),
( 15, 4, 93, 38, 71, 131, 71),
( 16, 7, 21, 58, 43, 80, 74),
( 17, 1, 37, 7, 54, 44, 15),
( 18, 2, 58, 94, 62, 152, 99),
( 19, 7, 89, 52, 88, 142, 71),
( 20, 5, 100, 7, 48, 107, 99),
( 21, 7, 50, 89, 31, 139, 69),
( 22, 5, 2, 28, 40, 29, 89),
( 23, 2, 78, 74, 52, 153, 39),
( 24, 3, 61, 65, 27, 126, 51),
( 25, 10, 48, 98, 93, 145, 93),
( 26, 9, 60, 21, 54, 81, 85),
( 27, 4, 19, 48, 19, 67, 25),
( 28, 3, 56, 26, 53, 83, 98),
( 29, 4, 32, 43, 86, 75, 67),
( 30, 6, 41, 1, 66, 43, 132),
( 31, 7, 97, 32, 91, 129, 4),
( 32, 0, 91, 85, 52, 176, 65),
( 33, 10, 59, 66, 18, 125, 88),
( 34, 5, 83, 35, 77, 118, 153),
( 35, 2, 38, 52, 32, 89, 24),
( 36, 0, 10, 0, 75, 10, 62),
( 37, 6, 19, 92, 31, 111, 53),
( 38, 7, 99, 3, 24, 102, 71),
( 39, 2, 54, 41, 61, 96, 17),
( 40, 6, 75, 61, 50, 136, 108),
( 41, 4, 88, 46, 68, 133, 38),
( 42, 0, 87, 60, 32, 146, 27),
( 43, 9, 41, 25, 91, 65, 34),
( 44, 7, 76, 28, 39, 103, 56),
( 45, 2, 18, 14, 13, 31, 29),
( 46, 8, 26, 38, 1, 64, 53),
( 47, 10, 90, 53, 30, 143, 56),
( 48, 4, 3, 20, 57, 23, 32),
( 49, 7, 28, 5, 71, 34, 98),
( 50, 0, 32, 37, 30, 69, 82),
( 51, 3, 69, 35, 63, 105, 125),
( 52, 4, 67, 70, 41, 136, 70),
( 53, 8, 53, 68, 67, 122, 58),
( 54, 8, 14, 71, 95, 85, 13),
( 55, 5, 55, 88, 67, 143, 133),
( 56, 7, 42, 80, 8, 121, 2),
( 57, 10, 69, 79, 30, 148, 99),
( 58, 3, 87, 57, 69, 144, 132),
( 59, 7, 59, 68, 80, 128, 58),
( 60, 5, 34, 36, 59, 70, 118),
( 61, 6, 54, 94, 25, 148, 44),
( 62, 10, 62, 45, 68, 107, 7),
( 63, 6, 97, 98, 99, 195, 201),
( 64, 0, 64, 25, 43, 89, 25),
( 65, 7, 15, 38, 51, 53, 21),
( 66, 3, 37, 15, 1, 51, 2),
( 67, 8, 34, 64, 12, 98, 68),
( 68, 3, 85, 58, 30, 143, 58),
( 69, 7, 21, 90, 79, 111, 55),
( 70, 5, 43, 64, 18, 107, 41),
( 71, 5, 53, 85, 22, 138, 45),
( 72, 10, 67, 49, 70, 117, 11),
( 73, 10, 97, 5, 6, 102, 11),
( 74, 7, 26, 59, 42, 85, 55),
( 75, 6, 5, 25, 92, 30, 176),
( 76, 4, 76, 26, 3, 102, 95),
( 77, 1, 18, 53, 95, 71, 8),
( 78, 0, 79, 51, 82, 130, 74),
( 79, 1, 72, 63, 3, 136, 48),
( 80, 9, 45, 24, 5, 70, 47),
( 81, 0, 46, 40, 19, 86, 73),
( 82, 8, 34, 72, 17, 107, 54),
( 83, 7, 36, 54, 47, 90, 3),
( 84, 3, 50, 57, 8, 107, 15),
( 85, 2, 66, 11, 7, 77, 53),
( 86, 0, 0, 82, 95, 83, 5),
( 87, 4, 13, 38, 16, 51, 14),
( 88, 6, 61, 10, 31, 72, 56),
( 89, 7, 21, 17, 34, 38, 34),
( 90, 3, 77, 31, 96, 108, 197),
( 91, 3, 90, 27, 44, 117, 79),
( 92, 0, 87, 24, 64, 110, 38),
( 93, 6, 90, 32, 98, 122, 193),
( 94, 10, 82, 65, 19, 147, 48),
( 95, 1, 58, 93, 68, 152, 9),
( 96, 7, 42, 43, 16, 84, 27),
( 97, 5, 29, 31, 1, 60, 9),
( 98, 2, 49, 70, 58, 119, 6),
( 99, 6, 51, 51, 86, 102, 162),
(100, 4, 18, 35, 85, 53, 30)
;
The solution can be found at:
Sample input
customer_id | segment | metric_1 | metric_2 | metric_3 | metric_4 | metric_5 |
---|---|---|---|---|---|---|
1 | 1 | 21 | 58 | 66 | 79 | 29 |
2 | 0 | 70 | 55 | 79 | 125 | 2 |
3 | 1 | 68 | 55 | 10 | 123 | 70 |
4 | 1 | 20 | 62 | 59 | 82 | 25 |
5 | 0 | 42 | 9 | 80 | 51 | 13 |
6 | 1 | 26 | 89 | 17 | 115 | 66 |
7 | 1 | 45 | 51 | 90 | 96 | 17 |
8 | 0 | 4 | 52 | 47 | 56 | 61 |
9 | 0 | 57 | 48 | 82 | 105 | 40 |
10 | 1 | 17 | 93 | 45 | 109 | 76 |
with metrics(customer_id, segment, metric_1, metric_2, metric_3, metric_4, metric_5) as (
values
( 1, 1, 21, 58, 66, 79, 29),
( 2, 0, 70, 55, 79, 125, 2),
( 3, 1, 68, 55, 10, 123, 70),
( 4, 1, 20, 62, 59, 82, 25),
( 5, 0, 42, 9, 80, 51, 13),
( 6, 1, 26, 89, 17, 115, 66),
( 7, 1, 45, 51, 90, 96, 17),
( 8, 0, 4, 52, 47, 56, 61),
( 9, 0, 57, 48, 82, 105, 40),
(10, 1, 17, 93, 45, 109, 76)
)
Sample output
segment | metric_pair | correlation |
---|---|---|
0 | metric_1, metric_3 | 0.9051 |
1 | metric_4, metric_5 | 0.8357 |
solution(segment, metric_pair, correlation) as (
values
(0, 'metric_1, metric_3', 0.9051),
(1, 'metric_4, metric_5', 0.8357)
)
Hint 1
Use a correlation function, usually called CORR
, to calculate the correlation between two metrics.
If you're using a database that doesn't have a built-in correlation function, you can try to calculate it manually -- but I'd instead recommend skipping this question.
Hint 2
To get every pair of metrics "side by side" for the CORR
function, unpivot the table so that each metric is in its own row and then join the resulting table to itself on the segment
and customer_id
columns.