Skip to content

Region precipitation โ˜”

Scenario

Analysts are using a weather dataset to understand the precipitation levels in different regions.

They have a dataset with the average precipitation levels per month for various "grids" in the UK.

A "grid" consists of a region and a location, joined together with a hyphen. For example, AC-27 is a grid; the region is AC and the location is 27.

For grids that they are studying, they need the average of the monthly precipitation levels. Note that "average" here is just the mean of the values, not a weighted average or anything more complex.

They are also aware that the dataset has some gaps in it, so to fill some values for the grids not in the dataset, they use the following logic:

  • If the grid exists in the dataset, use the average of the monthly precipitation levels for the grid.
  • If the grid doesn't exist but other grids in the same region do, use the average of the monthly precipitation levels for the grids in the same region.
  • Otherwise, use the average of the monthly precipitation levels for the whole dataset.

In each case, round the average to six decimal places.

Question

Find the average precipitation levels for the grids below:

  • AC-27
  • AQ-54
  • AQ-55
  • AQ-56
  • BK-45
  • BK-77
  • BR-18
  • X-17

The output should have a row for each of the grids above, with the columns:

  • grid_id
  • average_precipitation as the average precipitation level for the grid, using the logic above

Order the output by grid_id.

Expand for the DDL
create table precipitation (
    grid_id      varchar primary key,
    pr_january   numeric(14, 9),
    pr_february  numeric(14, 9),
    pr_march     numeric(14, 9),
    pr_april     numeric(14, 9),
    pr_may       numeric(14, 9),
    pr_june      numeric(14, 9),
    pr_july      numeric(14, 9),
    pr_august    numeric(14, 9),
    pr_september numeric(14, 9),
    pr_october   numeric(14, 9),
    pr_november  numeric(14, 9),
    pr_december  numeric(14, 9)
);
insert into precipitation
values
    ('AC-60',  77.313659668,  62.118778229,  62.963756561,  56.346126556,  56.507827759,  69.902297974,  76.416755676,  89.419967651,  70.767288208,  91.376182556,  94.308906555,  92.126197815),
    ('AC-27', 185.549865723, 158.017700195, 135.172912598,  93.645156860,  78.538978577,  80.773406982,  94.989349365, 108.333747864, 135.921539307, 162.479400635, 162.757064819, 177.592956543),
    ('AC-62',  81.204818726,  69.791305542,  64.281105042,  58.210914612,  60.880943298,  69.017036438,  80.740959167,  85.535209656,  68.737182617,  84.757614136,  90.352188110,  84.427131653),
    ('AQ-54', 220.434295654, 175.195220947, 154.287719727, 116.504539490, 110.188606262, 119.045143127, 136.076614380, 157.373947144, 142.956726074, 210.527175903, 220.573928833, 225.163681030),
    ('AQ-56', 143.682556152, 115.634277344,  99.110107422,  76.086784363,  79.841613770,  83.851272583,  95.768104553, 108.682968140,  99.521377563, 145.173202515, 142.895782471, 146.305435181),
    ('AQ-89', 191.861526489, 142.057312012, 126.669769287, 102.783355713, 103.405494690, 107.407104492, 132.361694336, 149.649581909, 137.866043091, 196.463287354, 197.636199951, 211.932723999),
    ('BK-93',  95.801376343,  66.791961670,  52.871055603,  54.724102020,  54.952892303,  52.194873810,  58.420711517,  64.284889221,  63.347404480, 100.553070068,  99.549606323,  99.715141296),
    ('BK-77',  50.038841248,  38.004615784,  36.342658997,  41.668888092,  48.747943878,  55.387825012,  60.916931152,  63.336502075,  49.749183655,  63.458656311,  56.233974457,  53.735862732),
    ('BK-89',  59.065109253,  44.379257202,  36.788879395,  42.614151001,  46.291542053,  52.152229309,  47.220512390,  57.204319000,  51.107784271,  67.685745239,  65.681800842,  59.279094696),
    ('X-57',  120.805274963,  99.440124512,  90.794105530,  75.126007080,  77.335937500,  83.300735474,  95.619758606, 100.679786682,  92.218643188, 112.163681030, 121.698951721, 130.179748535),
    ('X-64',  110.184608459,  70.485427856,  83.828720093,  67.215316772,  83.502983093,  84.731437683,  76.147949219,  85.476707458,  78.607589722, 108.458976746, 117.972114563, 111.607467651),
    ('X-59',  150.341644287, 124.240226746, 110.852073669,  88.458465576,  89.581214905,  98.376708984, 105.874870300, 113.212600708, 106.017417908, 135.823379517, 147.751571655, 155.859909058)
;

The solution can be found at:

A worked example is provided below to help illustrate the average calculations.


Sample input

Find the average precipitation levels for the grids:

  • AB-12
  • AB-99
  • Z-17

...given the precipitation levels below:

grid_id pr_january pr_february pr_march pr_april pr_may pr_june pr_july pr_august pr_september pr_october pr_november pr_december
AB-12 98.654982000 95.465774000 93.622460000 94.100401000 87.123098000 67.165477000 54.468731000 55.012740000 57.335890000 67.232145000 85.332001000 92.165432000
AB-34 154.119868000 125.977546000 101.024456000 134.523452000 99.456788000 95.025468000 92.135497000 93.653200000 98.126477000 103.332032000 111.360141000 125.216407000
C-56 56.963354000 76.455462000 61.879871000 87.666547000 85.931607000 83.636598000 51.258741000 65.165441000 71.636687000 94.654210000 92.632147000 101.300156000
with precipitation(grid_id, pr_january, pr_february, pr_march, pr_april, pr_may, pr_june, pr_july, pr_august, pr_september, pr_october, pr_november, pr_december) as (
    values
        ('AB-12',  98.654982000,  95.465774000,  93.622460000,  94.100401000, 87.123098000, 67.165477000, 54.468731000, 55.012740000, 57.335890000,  67.232145000,  85.332001000,  92.165432000),
        ('AB-34', 154.119868000, 125.977546000, 101.024456000, 134.523452000, 99.456788000, 95.025468000, 92.135497000, 93.653200000, 98.126477000, 103.332032000, 111.360141000, 125.216407000),
        ('C-56',   56.963354000,  76.455462000,  61.879871000,  87.666547000, 85.931607000, 83.636598000, 51.258741000, 65.165441000, 71.636687000,  94.654210000,  92.632147000, 101.300156000)
)
Sample output
grid_id average_precipitation
AB-12 78.973261
AB-99 95.067936
Z-17 89.189202
solution(grid_id, average_precipitation) as (
    values
        ('AB-12', 78.973261),
        ('AB-99', 95.067936),
        ('Z-17',  89.189202)
)
Hint 1

Create a lookup table with the average precipitation levels for each grid in the dataset, as well as rolled up averages (by using ROLLUP) for each region and the entire dataset.

Hint 2

Use the GROUPING (DuckDB) or GROUPING_ID (SQL Server) function to determine if a row is a total, subtotal, or grand total, and then join the lookup table to the dataset three timesโ€”one for each total typeโ€”using this group ID (and any other required columns).

Success

If you're interested in using some real (and complete!) Met Office data, you can find plenty at:


Worked examples

To help illustrate the average calculations, consider the following grids:

  • AC-27
  • AC-28
  • Z-14

Let's briefly walk through the average calculation for each of these grids.

AC-27

The dataset has precipitation levels for AC-27, so we use the mean of the twelve monthly precipitation levels for AC-27: this is 131.147673.

AC-28

The dataset doesn't have precipitation levels for AC-28, but it does have precipitation levels for other AC locations (namely, AC-27, AC-60, and AC-62). We use the mean of the monthly precipitation levels for each grid in the AC region: this is 93.646562.

Z-14

The dataset doesn't have precipitation levels for Z-14, and it doesn't have precipitation levels for any other Z locations. We use the mean of the monthly precipitation levels for all grids in the dataset: this is 99.378632.