Zarr multi-scales with pyramid_resample
#
This notebook demonstrates how to use pyramid_resample
to generate pyramids using pyresample
’s block based interpolators.
[1]:
import dask # noqa
import datatree
import xarray as xr
from ndpyramid import pyramid_resample
Setup output paths#
You can set the output path to object storage or a memory store, after changing the S3 URI to a location that you have write access to.
[2]:
S3 = False
if S3:
output = "s3://carbonplan-scratch/pyramid_comparison/resampled.zarr"
else:
import zarr
output = zarr.storage.MemoryStore()
Open input dataset#
In this example, we’ll use some of CarbonPlan’s downscaled CMIP6 data for the input dataset. It’s helpful to use chunked Zarr data as the input because we won’t need to manually rechunk the data, which can lead to a complicated Dask task graph if the rechunked data aren’t cached before pyramid generation.
[3]:
ds = xr.open_zarr(
'https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.month.GARD-SV.tasmax.zarr'
)
ds = ds.rio.write_crs("EPSG:4326")
ds
[3]:
<xarray.Dataset> Size: 4GB Dimensions: (lat: 721, lon: 1440, time: 1020) Coordinates: * lat (lat) float32 3kB -90.0 -89.75 -89.5 -89.25 ... 89.5 89.75 90.0 * lon (lon) float32 6kB -180.0 -179.8 -179.5 ... 179.2 179.5 179.8 * time (time) datetime64[ns] 8kB 2015-01-01 2015-02-01 ... 2099-12-01 spatial_ref int64 8B 0 Data variables: tasmax (time, lat, lon) float32 4GB dask.array<chunksize=(780, 72, 144), meta=np.ndarray> Attributes: (12/17) Conventions: CF-1.8 activity_id: ScenarioMIP cmip6_downscaling_contact: hello@carbonplan.org cmip6_downscaling_explainer: https://carbonplan.org/research/cmip6-do... cmip6_downscaling_institution: CarbonPlan cmip6_downscaling_license: CC-BY-4.0 ... ... institution_id: CCCma member_id: r1i1p1f1 references: Eyring, V., Bony, S., Meehl, G. A., Seni... source_id: CanESM5 timescale: day variable_id: tasmax
Generate pyramids#
We’ll use the pyramid_resample
function to generate a pyramid with two levels. This function currently requires specifying which dimensions correspond to x
and y
. The data also must have longitude values between -180 and 180 and a dimension order of (time
, y
, x
) for 3D datasets or (y
, x
) for 2D datasets. We’ll use the nearest
and bilinear
resampling algorithms for comparison with the other pyramid generation methods.
First, use the nearest neighbor method.
[4]:
%%time
levels = 2
resampled_pyramid = pyramid_resample(ds, x="lon", y="lat", levels=levels, resampling="nearest")
resampled_pyramid.to_zarr(output, consolidated=True, mode='w')
CPU times: user 21.5 s, sys: 6.88 s, total: 28.4 s
Wall time: 24.2 s
Next, use the bilinear method.
[5]:
%%time
levels = 2
resampled_pyramid = pyramid_resample(ds, x="lon", y="lat", levels=levels, resampling="bilinear")
resampled_pyramid.to_zarr(output, consolidated=True, mode='w')
CPU times: user 23.7 s, sys: 6.41 s, total: 30.1 s
Wall time: 21.5 s
Open and plot the result#
[6]:
dt = datatree.open_datatree(output, engine="zarr")
dt["1"].ds.isel(time=0).tasmax.plot()
[6]:
<matplotlib.collections.QuadMesh at 0x7fe28fce15d0>
[ ]: