Region precipitation โ
Scenario
Analysts are using a weather dataset to understand the precipitation levels in different regions.
They have a dataset with the average precipitation levels per month for various "grids" in the UK.
A "grid" consists of a region and a location, joined together with a hyphen. For example, AC-27
is a grid; the region is AC
and the location is 27
.
For grids that they are studying, they need the average of the monthly precipitation levels. Note that "average" here is just the mean of the values, not a weighted average or anything more complex.
They are also aware that the dataset has some gaps in it, so to fill some values for the grids not in the dataset, they use the following logic:
- If the grid exists in the dataset, use the average of the monthly precipitation levels for the grid.
- If the grid doesn't exist but other grids in the same region do, use the average of the monthly precipitation levels for the grids in the same region.
- Otherwise, use the average of the monthly precipitation levels for the whole dataset.
In each case, round the average to six decimal places.
Question
Find the average precipitation levels for the grids below:
AC-27
AQ-54
AQ-55
AQ-56
BK-45
BK-77
BR-18
X-17
The output should have a row for each of the grids above, with the columns:
grid_id
average_precipitation
as the average precipitation level for the grid, using the logic above
Order the output by grid_id
.
Expand for the DDL
create table precipitation (
grid_id varchar primary key,
pr_january numeric(14, 9),
pr_february numeric(14, 9),
pr_march numeric(14, 9),
pr_april numeric(14, 9),
pr_may numeric(14, 9),
pr_june numeric(14, 9),
pr_july numeric(14, 9),
pr_august numeric(14, 9),
pr_september numeric(14, 9),
pr_october numeric(14, 9),
pr_november numeric(14, 9),
pr_december numeric(14, 9)
);
insert into precipitation
values
('AC-60', 77.313659668, 62.118778229, 62.963756561, 56.346126556, 56.507827759, 69.902297974, 76.416755676, 89.419967651, 70.767288208, 91.376182556, 94.308906555, 92.126197815),
('AC-27', 185.549865723, 158.017700195, 135.172912598, 93.645156860, 78.538978577, 80.773406982, 94.989349365, 108.333747864, 135.921539307, 162.479400635, 162.757064819, 177.592956543),
('AC-62', 81.204818726, 69.791305542, 64.281105042, 58.210914612, 60.880943298, 69.017036438, 80.740959167, 85.535209656, 68.737182617, 84.757614136, 90.352188110, 84.427131653),
('AQ-54', 220.434295654, 175.195220947, 154.287719727, 116.504539490, 110.188606262, 119.045143127, 136.076614380, 157.373947144, 142.956726074, 210.527175903, 220.573928833, 225.163681030),
('AQ-56', 143.682556152, 115.634277344, 99.110107422, 76.086784363, 79.841613770, 83.851272583, 95.768104553, 108.682968140, 99.521377563, 145.173202515, 142.895782471, 146.305435181),
('AQ-89', 191.861526489, 142.057312012, 126.669769287, 102.783355713, 103.405494690, 107.407104492, 132.361694336, 149.649581909, 137.866043091, 196.463287354, 197.636199951, 211.932723999),
('BK-93', 95.801376343, 66.791961670, 52.871055603, 54.724102020, 54.952892303, 52.194873810, 58.420711517, 64.284889221, 63.347404480, 100.553070068, 99.549606323, 99.715141296),
('BK-77', 50.038841248, 38.004615784, 36.342658997, 41.668888092, 48.747943878, 55.387825012, 60.916931152, 63.336502075, 49.749183655, 63.458656311, 56.233974457, 53.735862732),
('BK-89', 59.065109253, 44.379257202, 36.788879395, 42.614151001, 46.291542053, 52.152229309, 47.220512390, 57.204319000, 51.107784271, 67.685745239, 65.681800842, 59.279094696),
('X-57', 120.805274963, 99.440124512, 90.794105530, 75.126007080, 77.335937500, 83.300735474, 95.619758606, 100.679786682, 92.218643188, 112.163681030, 121.698951721, 130.179748535),
('X-64', 110.184608459, 70.485427856, 83.828720093, 67.215316772, 83.502983093, 84.731437683, 76.147949219, 85.476707458, 78.607589722, 108.458976746, 117.972114563, 111.607467651),
('X-59', 150.341644287, 124.240226746, 110.852073669, 88.458465576, 89.581214905, 98.376708984, 105.874870300, 113.212600708, 106.017417908, 135.823379517, 147.751571655, 155.859909058)
;
The solution can be found at:
A worked example is provided below to help illustrate the average calculations.
Sample input
Find the average precipitation levels for the grids:
AB-12
AB-99
Z-17
...given the precipitation levels below:
grid_id | pr_january | pr_february | pr_march | pr_april | pr_may | pr_june | pr_july | pr_august | pr_september | pr_october | pr_november | pr_december |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AB-12 | 98.654982000 | 95.465774000 | 93.622460000 | 94.100401000 | 87.123098000 | 67.165477000 | 54.468731000 | 55.012740000 | 57.335890000 | 67.232145000 | 85.332001000 | 92.165432000 |
AB-34 | 154.119868000 | 125.977546000 | 101.024456000 | 134.523452000 | 99.456788000 | 95.025468000 | 92.135497000 | 93.653200000 | 98.126477000 | 103.332032000 | 111.360141000 | 125.216407000 |
C-56 | 56.963354000 | 76.455462000 | 61.879871000 | 87.666547000 | 85.931607000 | 83.636598000 | 51.258741000 | 65.165441000 | 71.636687000 | 94.654210000 | 92.632147000 | 101.300156000 |
with precipitation(grid_id, pr_january, pr_february, pr_march, pr_april, pr_may, pr_june, pr_july, pr_august, pr_september, pr_october, pr_november, pr_december) as (
values
('AB-12', 98.654982000, 95.465774000, 93.622460000, 94.100401000, 87.123098000, 67.165477000, 54.468731000, 55.012740000, 57.335890000, 67.232145000, 85.332001000, 92.165432000),
('AB-34', 154.119868000, 125.977546000, 101.024456000, 134.523452000, 99.456788000, 95.025468000, 92.135497000, 93.653200000, 98.126477000, 103.332032000, 111.360141000, 125.216407000),
('C-56', 56.963354000, 76.455462000, 61.879871000, 87.666547000, 85.931607000, 83.636598000, 51.258741000, 65.165441000, 71.636687000, 94.654210000, 92.632147000, 101.300156000)
)
Sample output
grid_id | average_precipitation |
---|---|
AB-12 | 78.973261 |
AB-99 | 95.067936 |
Z-17 | 89.189202 |
solution(grid_id, average_precipitation) as (
values
('AB-12', 78.973261),
('AB-99', 95.067936),
('Z-17', 89.189202)
)
Hint 1
Create a lookup table with the average precipitation levels for each grid in the dataset, as well as rolled up averages (by using ROLLUP
) for each region and the entire dataset.
Hint 2
Use the GROUPING
(DuckDB) or GROUPING_ID
(SQL Server) function to determine if a row is a total, subtotal, or grand total, and then join the lookup table to the dataset three timesโone for each total typeโusing this group ID (and any other required columns).
Success
If you're interested in using some real (and complete!) Met Office data, you can find plenty at:
Worked examples
To help illustrate the average calculations, consider the following grids:
AC-27
AC-28
Z-14
Let's briefly walk through the average calculation for each of these grids.
AC-27
The dataset has precipitation levels for AC-27
, so we use the mean of the twelve monthly precipitation levels for AC-27
: this is 131.147673
.
AC-28
The dataset doesn't have precipitation levels for AC-28
, but it does have precipitation levels for other AC
locations (namely, AC-27
, AC-60
, and AC-62
). We use the mean of the monthly precipitation levels for each grid in the AC
region: this is 93.646562
.
Z-14
The dataset doesn't have precipitation levels for Z-14
, and it doesn't have precipitation levels for any other Z
locations. We use the mean of the monthly precipitation levels for all grids in the dataset: this is 99.378632
.