Region precipitation ☔

Scenario

Analysts are using a weather dataset to understand the precipitation levels in different regions.

They have a dataset with the average precipitation levels per month for various "grids" in the UK.

A "grid" consists of a region and a location, joined together with a hyphen. For example, AC-27 is a grid; the region is AC and the location is 27.

For grids that they are studying, they need the average of the monthly precipitation levels. Note that "average" here is just the mean of the values, not a weighted average or anything more complex.

They are also aware that the dataset has some gaps in it, so to fill some values for the grids not in the dataset, they use the following logic:

If the grid exists in the dataset, use the average of the monthly precipitation levels for the grid.
If the grid doesn't exist but other grids in the same region do, use the average of the monthly precipitation levels for the grids in the same region.
Otherwise, use the average of the monthly precipitation levels for the whole dataset.

In each case, round the average to six decimal places.

Question

Find the average precipitation levels for the grids below:

AC-27
AQ-54
AQ-55
AQ-56
BK-45
BK-77
BR-18
X-17

The output should have a row for each of the grids above, with the columns:

grid_id
average_precipitation as the average precipitation level for the grid, using the logic above

Order the output by grid_id.

Expand for the DDL

create table precipitation (
    grid_id      varchar primary key,
    pr_january   numeric(14, 9),
    pr_february  numeric(14, 9),
    pr_march     numeric(14, 9),
    pr_april     numeric(14, 9),
    pr_may       numeric(14, 9),
    pr_june      numeric(14, 9),
    pr_july      numeric(14, 9),
    pr_august    numeric(14, 9),
    pr_september numeric(14, 9),
    pr_october   numeric(14, 9),
    pr_november  numeric(14, 9),
    pr_december  numeric(14, 9)
);
insert into precipitation
values
    ('AC-60',  77.313659668,  62.118778229,  62.963756561,  56.346126556,  56.507827759,  69.902297974,  76.416755676,  89.419967651,  70.767288208,  91.376182556,  94.308906555,  92.126197815),
    ('AC-27', 185.549865723, 158.017700195, 135.172912598,  93.645156860,  78.538978577,  80.773406982,  94.989349365, 108.333747864, 135.921539307, 162.479400635, 162.757064819, 177.592956543),
    ('AC-62',  81.204818726,  69.791305542,  64.281105042,  58.210914612,  60.880943298,  69.017036438,  80.740959167,  85.535209656,  68.737182617,  84.757614136,  90.352188110,  84.427131653),
    ('AQ-54', 220.434295654, 175.195220947, 154.287719727, 116.504539490, 110.188606262, 119.045143127, 136.076614380, 157.373947144, 142.956726074, 210.527175903, 220.573928833, 225.163681030),
    ('AQ-56', 143.682556152, 115.634277344,  99.110107422,  76.086784363,  79.841613770,  83.851272583,  95.768104553, 108.682968140,  99.521377563, 145.173202515, 142.895782471, 146.305435181),
    ('AQ-89', 191.861526489, 142.057312012, 126.669769287, 102.783355713, 103.405494690, 107.407104492, 132.361694336, 149.649581909, 137.866043091, 196.463287354, 197.636199951, 211.932723999),
    ('BK-93',  95.801376343,  66.791961670,  52.871055603,  54.724102020,  54.952892303,  52.194873810,  58.420711517,  64.284889221,  63.347404480, 100.553070068,  99.549606323,  99.715141296),
    ('BK-77',  50.038841248,  38.004615784,  36.342658997,  41.668888092,  48.747943878,  55.387825012,  60.916931152,  63.336502075,  49.749183655,  63.458656311,  56.233974457,  53.735862732),
    ('BK-89',  59.065109253,  44.379257202,  36.788879395,  42.614151001,  46.291542053,  52.152229309,  47.220512390,  57.204319000,  51.107784271,  67.685745239,  65.681800842,  59.279094696),
    ('X-57',  120.805274963,  99.440124512,  90.794105530,  75.126007080,  77.335937500,  83.300735474,  95.619758606, 100.679786682,  92.218643188, 112.163681030, 121.698951721, 130.179748535),
    ('X-64',  110.184608459,  70.485427856,  83.828720093,  67.215316772,  83.502983093,  84.731437683,  76.147949219,  85.476707458,  78.607589722, 108.458976746, 117.972114563, 111.607467651),
    ('X-59',  150.341644287, 124.240226746, 110.852073669,  88.458465576,  89.581214905,  98.376708984, 105.874870300, 113.212600708, 106.017417908, 135.823379517, 147.751571655, 155.859909058)
;

The solution can be found at:

region-precipitation.md

A worked example is provided below to help illustrate the average calculations.

Sample input

Find the average precipitation levels for the grids:

AB-12
AB-99
Z-17

...given the precipitation levels below:

grid_id	pr_january	pr_february	pr_march	pr_april	pr_may	pr_june	pr_july	pr_august	pr_september	pr_october	pr_november	pr_december
AB-12	98.654982000	95.465774000	93.622460000	94.100401000	87.123098000	67.165477000	54.468731000	55.012740000	57.335890000	67.232145000	85.332001000	92.165432000
AB-34	154.119868000	125.977546000	101.024456000	134.523452000	99.456788000	95.025468000	92.135497000	93.653200000	98.126477000	103.332032000	111.360141000	125.216407000
C-56	56.963354000	76.455462000	61.879871000	87.666547000	85.931607000	83.636598000	51.258741000	65.165441000	71.636687000	94.654210000	92.632147000	101.300156000

with precipitation(grid_id, pr_january, pr_february, pr_march, pr_april, pr_may, pr_june, pr_july, pr_august, pr_september, pr_october, pr_november, pr_december) as (
    values
        ('AB-12',  98.654982000,  95.465774000,  93.622460000,  94.100401000, 87.123098000, 67.165477000, 54.468731000, 55.012740000, 57.335890000,  67.232145000,  85.332001000,  92.165432000),
        ('AB-34', 154.119868000, 125.977546000, 101.024456000, 134.523452000, 99.456788000, 95.025468000, 92.135497000, 93.653200000, 98.126477000, 103.332032000, 111.360141000, 125.216407000),
        ('C-56',   56.963354000,  76.455462000,  61.879871000,  87.666547000, 85.931607000, 83.636598000, 51.258741000, 65.165441000, 71.636687000,  94.654210000,  92.632147000, 101.300156000)
)

Sample output

grid_id	average_precipitation
AB-12	78.973261
AB-99	95.067936
Z-17	89.189202

solution(grid_id, average_precipitation) as (
    values
        ('AB-12', 78.973261),
        ('AB-99', 95.067936),
        ('Z-17',  89.189202)
)

Hint 1

Create a lookup table with the average precipitation levels for each grid in the dataset, as well as rolled up averages (by using ROLLUP) for each region and the entire dataset.

Hint 2

Use the GROUPING (DuckDB) or GROUPING_ID (SQL Server) function to determine if a row is a total, subtotal, or grand total, and then join the lookup table to the dataset three times—one for each total type—using this group ID (and any other required columns).

Success

If you're interested in using some real (and complete!) Met Office data, you can find plenty at:

https://climatedataportal.metoffice.gov.uk/

Worked examples

To help illustrate the average calculations, consider the following grids:

AC-27
AC-28
Z-14

Let's briefly walk through the average calculation for each of these grids.

`AC-27`

The dataset has precipitation levels for AC-27, so we use the mean of the twelve monthly precipitation levels for AC-27: this is 131.147673.

`AC-28`

The dataset doesn't have precipitation levels for AC-28, but it does have precipitation levels for other AC locations (namely, AC-27, AC-60, and AC-62). We use the mean of the monthly precipitation levels for each grid in the AC region: this is 93.646562.

`Z-14`

The dataset doesn't have precipitation levels for Z-14, and it doesn't have precipitation levels for any other Z locations. We use the mean of the monthly precipitation levels for all grids in the dataset: this is 99.378632.