Zarr multi-scales with pyramid_resample

Zarr multi-scales with `pyramid_resample`#

This notebook demonstrates how to use pyramid_resample to generate pyramids using pyresample’s block based interpolators.

import dask  # noqa
import xarray as xr

from ndpyramid import pyramid_resample

Setup output paths#

You can set the output path to object storage or a memory store, after changing the S3 URI to a location that you have write access to.

S3 = False
if S3:
    output = "s3://carbonplan-scratch/pyramid_comparison/resampled.zarr"
else:
    import zarr

    output = zarr.storage.MemoryStore()

Open input dataset#

In this example, we’ll use some of CarbonPlan’s downscaled CMIP6 data for the input dataset. It’s helpful to use chunked Zarr data as the input because we won’t need to manually rechunk the data, which can lead to a complicated Dask task graph if the rechunked data aren’t cached before pyramid generation.

ds = xr.open_zarr(
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.month.GARD-SV.tasmax.zarr"
)
ds = ds.rio.write_crs("EPSG:4326")
ds

<xarray.Dataset> Size: 4GB
Dimensions:      (lat: 721, lon: 1440, time: 1020)
Coordinates:
  * lat          (lat) float32 3kB -90.0 -89.75 -89.5 -89.25 ... 89.5 89.75 90.0
  * lon          (lon) float32 6kB -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * time         (time) datetime64[ns] 8kB 2015-01-01 2015-02-01 ... 2099-12-01
    spatial_ref  int64 8B 0
Data variables:
    tasmax       (time, lat, lon) float32 4GB dask.array<chunksize=(780, 72, 144), meta=np.ndarray>
Attributes: (12/17)
    Conventions:                     CF-1.8
    activity_id:                     ScenarioMIP
    cmip6_downscaling_contact:       hello@carbonplan.org
    cmip6_downscaling_explainer:     https://carbonplan.org/research/cmip6-do...
    cmip6_downscaling_institution:   CarbonPlan
    cmip6_downscaling_license:       CC-BY-4.0
    ...                              ...
    institution_id:                  CCCma
    member_id:                       r1i1p1f1
    references:                      Eyring, V., Bony, S., Meehl, G. A., Seni...
    source_id:                       CanESM5
    timescale:                       day
    variable_id:                     tasmax

Generate pyramids#

We’ll use the pyramid_resample function to generate a pyramid with two levels. This function currently requires specifying which dimensions correspond to x and y. The data also must have longitude values between -180 and 180 and a dimension order of (time, y, x) for 3D datasets or (y, x) for 2D datasets. We’ll use the nearest and bilinear resampling algorithms for comparison with the other pyramid generation methods.

First, use the nearest neighbor method.

%%time
levels = 2
resampled_pyramid = pyramid_resample(ds, x="lon", y="lat", levels=levels, resampling="nearest")
resampled_pyramid.to_zarr(output, zarr_format=2, consolidated=True, mode="w")

CPU times: user 51.5 s, sys: 23.3 s, total: 1min 14s
Wall time: 2min 13s

Next, use the bilinear method.

%%time
levels = 2
resampled_pyramid = pyramid_resample(ds, x="lon", y="lat", levels=levels, resampling="bilinear")
resampled_pyramid.to_zarr(output, zarr_format=2, consolidated=True, mode="w")

CPU times: user 37.2 s, sys: 13.1 s, total: 50.3 s
Wall time: 1min 7s

Open and plot the result#

dt = xr.open_datatree(output, engine="zarr")
dt["1"].ds.isel(time=0).tasmax.plot()

<matplotlib.collections.QuadMesh at 0x17e32e180>

../_images/938237d6eab6c4d69d07632895b6692b19a961fa87fed85fcf6e97907f5a7ca7.png