Advanced aggregations 🥇
Success
We saw in the pivot table section that we can extend the GROUP BY
clause with the ROLLUP
modifier.
There are two additional modifiers that we can use to extend the GROUP BY
clause with even more flexibility: GROUPING SETS
and CUBE
.
Warning
The GROUPING SETS
and CUBE
modifiers are advanced, and there aren't Excel equivalents for these (without using advanced Excel features).
Make sure that you are comfortable with the main concepts before diving into these advanced concepts.
Let's start with a pivot table recap
We saw the ROLLUP
modifier in the pivot table section which allows us to create subtotals and grand totals in our GROUP BY
clause.
SELECT
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY ROLLUP (
FORMAT(OrderDate, 'yyyy-MM'),
OnlineOrderFlag
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
The output of this query would include subtotals for each combination of OrderMonth
and OnlineOrderFlag
, as well as a grand total for each OrderMonth
and a grand total for the entire dataset:
OrderMonth | OnlineOrderFlag | TotalSales | |
---|---|---|---|
null | null | 20996947.7407 | ← This is the grand total |
2013-01 | null | 2340061.5521 | ← This is the 2013-01 subtotal |
2013-01 | false | 1761132.8322 | |
2013-01 | true | 578928.7199 | |
2013-02 | null | 2600218.8667 | ← This is the 2013-02 subtotal |
2013-02 | false | 2101152.5476 | |
2013-02 | true | 499066.3191 | |
2013-03 | null | 3831605.9389 | ← This is the 2013-03 subtotal |
2013-03 | false | 3244501.4287 | |
2013-03 | true | 587104.5102 | |
2013-04 | null | 2840711.1734 | ← This is the 2013-04 subtotal |
2013-04 | false | 2239156.6675 | |
2013-04 | true | 601554.5059 | |
2013-05 | null | 3658084.9461 | ← This is the 2013-05 subtotal |
2013-05 | false | 3019173.6253 | |
2013-05 | true | 638911.3208 | |
2013-06 | null | 5726265.2635 | ← This is the 2013-06 subtotal |
2013-06 | false | 4775809.3027 | |
2013-06 | true | 950455.9608 |
One thing you might notice is that we only have the subtotals for OrderMonth
. Where are the subtotals for OnlineOrderFlag
?
As mentioned in the pivot tables section, the ROLLUP
modifier follows the same rules as Excel's pivot tables and don't create subtotals for every column.
This is where the GROUPING SETS
modifier comes in.
GROUPING SETS
allow us to specify the subtotals we want
We use the GROUPING SETS
modifier in a very similar way to the ROLLUP
modifier. We write GROUPING SETS
after the GROUP BY
, but rather than specify a list of columns, we specify a list of lists of columns.
I'll say that again: we specify a list of lists of columns.
This is a bit funky 😝
Each list of columns in the GROUPING SETS
clause will create a subtotal for the combination of columns in that list. To specify a grand total, we use an empty list.
To see this, let's rewrite the previous query using GROUPING SETS
instead of ROLLUP
:
SELECT
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY GROUPING SETS (
(),
(FORMAT(OrderDate, 'yyyy-MM')),
(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag)
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
Specifically:
()
corresponds to the grand total.(FORMAT(OrderDate, 'yyyy-MM'))
corresponds to the subtotals forOrderMonth
.(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag)
corresponds to the totals for each combination ofOrderMonth
andOnlineOrderFlag
.
Specifying the subtotals explicitly like this gives us more control over the output of the query. For example, it's super easy to add a subtotal for OnlineOrderFlag
by adding another list to the GROUPING SETS
clause:
SELECT
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY GROUPING SETS (
(),
(FORMAT(OrderDate, 'yyyy-MM')),
(OnlineOrderFlag),
(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag)
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
OrderMonth | OnlineOrderFlag | TotalSales | |
---|---|---|---|
null | null | 20996947.7407 | |
null | false | 17140926.4040 | ← This is new |
null | true | 3856021.3367 | ← This is new |
2013-01 | null | 2340061.5521 | |
2013-01 | false | 1761132.8322 | |
2013-01 | true | 578928.7199 | |
2013-02 | null | 2600218.8667 | |
2013-02 | false | 2101152.5476 | |
2013-02 | true | 499066.3191 | |
2013-03 | null | 3831605.9389 | |
2013-03 | false | 3244501.4287 | |
2013-03 | true | 587104.5102 | |
2013-04 | null | 2840711.1734 | |
2013-04 | false | 2239156.6675 | |
2013-04 | true | 601554.5059 | |
2013-05 | null | 3658084.9461 | |
2013-05 | false | 3019173.6253 | |
2013-05 | true | 638911.3208 | |
2013-06 | null | 5726265.2635 | |
2013-06 | false | 4775809.3027 | |
2013-06 | true | 950455.9608 |
Since we control exactly which subtotals we want, we can create a much more customised output than we could with ROLLUP
!
CUBE
is like ROLLUP
but for every combination of columns
In the last example above, we added a subtotal for OnlineOrderFlag
by adding another list to the GROUPING SETS
clause. This meant that we were creating subtotals for every combination of OrderMonth
and OnlineOrderFlag
:
()
for neither (the grand total).(FORMAT(OrderDate, 'yyyy-MM'))
for justOrderMonth
.(OnlineOrderFlag)
for justOnlineOrderFlag
.(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag)
for both.
With two columns, we write four lists in the GROUPING SETS
clause. With three columns, we'd write eight lists. With four columns, we'd write sixteen lists... So this can easily get out of hand!
The CUBE
modifier is like ROLLUP
but for every combination of columns. It's a shortcut for writing out all the combinations of columns in the GROUPING SETS
clause. It's also like ROLLUP
because we write a list of columns, not a list of lists of columns.
To see this, let's rewrite the previous query using CUBE
instead of GROUPING SETS
:
SELECT
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY CUBE (
FORMAT(OrderDate, 'yyyy-MM'),
OnlineOrderFlag
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
This will give us the same output as the previous query, but we didn't have to write out all the combinations of columns in the GROUPING SETS
clause.
We can still use GROUPING
/GROUPING_ID
to identify subtotals
Like with ROLLUP
, we can use the GROUPING
and GROUPING_ID
functions to identify which columns are subtotals. For example, here are the same queries as above but with the GROUPING_ID
function added to the SELECT
clause (they have the same output, so only one is shown):
SELECT
GROUPING_ID(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag) AS GroupingId,
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY GROUPING SETS (
(),
(FORMAT(OrderDate, 'yyyy-MM')),
(OnlineOrderFlag),
(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag)
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
SELECT
GROUPING_ID(FORMAT(OrderDate, 'yyyy-MM'), OnlineOrderFlag) AS GroupingId,
FORMAT(OrderDate, 'yyyy-MM') AS OrderMonth,
OnlineOrderFlag,
SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
WHERE '2013-01-01' <= OrderDate AND OrderDate < '2013-07-01'
GROUP BY CUBE (
FORMAT(OrderDate, 'yyyy-MM'),
OnlineOrderFlag
)
ORDER BY
OrderMonth,
OnlineOrderFlag
;
GroupingId | OrderMonth | OnlineOrderFlag | TotalSales |
---|---|---|---|
3 | null | null | 20996947.7407 |
2 | null | false | 17140926.4040 |
2 | null | true | 3856021.3367 |
1 | 2013-01 | null | 2340061.5521 |
0 | 2013-01 | false | 1761132.8322 |
0 | 2013-01 | true | 578928.7199 |
1 | 2013-02 | null | 2600218.8667 |
0 | 2013-02 | false | 2101152.5476 |
0 | 2013-02 | true | 499066.3191 |
1 | 2013-03 | null | 3831605.9389 |
0 | 2013-03 | false | 3244501.4287 |
0 | 2013-03 | true | 587104.5102 |
1 | 2013-04 | null | 2840711.1734 |
0 | 2013-04 | false | 2239156.6675 |
0 | 2013-04 | true | 601554.5059 |
1 | 2013-05 | null | 3658084.9461 |
0 | 2013-05 | false | 3019173.6253 |
0 | 2013-05 | true | 638911.3208 |
1 | 2013-06 | null | 5726265.2635 |
0 | 2013-06 | false | 4775809.3027 |
0 | 2013-06 | true | 950455.9608 |
"Cubes" are actually a well-known concept
Warning
This is no longer an SQL concept; this is a general concept in mathematics and computer science.
The CUBE
modifier creates what's known as an Online Analytical Processing (OLAP) cube.
These are used in data warehousing and business intelligence to enable more performant reporting and analytics. They're a bit more advanced than what we're covering here, but it's good to know that the SQL CUBE
modifier is based on a well-known concept.
Tip
If you've used Excel's Power Pivot, you've already used OLAP cubes!
Power Pivot creates OLAP cubes behind the scenes so that using the power functions (like CUBEVALUE
) can just "lookup" from these OLAP cubes.
Further reading
Check out the official Microsoft documentation for more information on GROUPING SETS
and CUBE
at: