Zarr-multiscales from GeoTiffs#

by Joe Hamman & Jeremy Freeman (CarbonPlan), September 27, 2021, Updated by Max Jones (CarbonPlan), 2024

This notebook demonstrates the production of Zarr data pyramids for use with @carbonplan/maps, an api for interactive multi-dimensional data-driven web maps.

Some of the libraries used here are in pre-release condition. Specifically ndpyramid and datatree are currently udergoing rapid development. Use the pattern below but expect changes to the specific apis.

All of the libraries used in this demonstration are included in this conda environment file.

[1]:
import matplotlib.pyplot as plt
import pandas as pd
import xarray as xr

from ndpyramid import pyramid_reproject
[2]:
VERSION = 2
LEVELS = 6
PIXELS_PER_TILE = 128
S3 = False
input_path = f"s3://carbonplan-maps/v{VERSION}/demo/raw"
if S3:
    base = f"s3://carbonplan-maps/v{VERSION}/demo/"
    store_2d = base + "2d/tavg"
    store_3d = base + "3d/tavg-prec"
    store_3d_1var = base + "3d/tavg-month"
    store_4d = base + "4d/tavg-prec-month"
else:
    import zarr

    store_2d = zarr.storage.MemoryStore()
    store_3d = zarr.storage.MemoryStore()
    store_3d_1var = zarr.storage.MemoryStore()
    store_4d = zarr.storage.MemoryStore()

2d (tavg)#

In this example we open a single 2d image (GeoTIFF) and create a Zarr pyramid.

[3]:
%%time
# input dataset
path = f"{input_path}/wc2.1_2.5m_tavg_10.tif"
# open and extract the input dataset
ds = (
    xr.open_dataarray(path, engine="rasterio")
    .to_dataset(name="tavg")
    .squeeze()
    .reset_coords(["band"], drop=True)
)

# create the pyramid
dt = pyramid_reproject(ds, levels=LEVELS, clear_attrs=True)

# write the pyramid to zarr
dt.to_zarr(store_2d, consolidated=True)
CPU times: user 4.07 s, sys: 863 ms, total: 4.94 s
Wall time: 12.1 s

3d, two variables (tavg and prec)#

In this example, we open two 2d images (temperature and precipitation), combine them into a single array (along the band dimension), and create a Zarr pyramid.

[4]:
%%time
# input datasets
path1 = f"{input_path}/wc2.1_2.5m_tavg_10.tif"
path2 = f"{input_path}/wc2.1_2.5m_prec_10.tif"

# open and extract the input datasets
ds1 = (
    xr.open_dataarray(path1, engine="rasterio")
    .to_dataset(name="climate")
    .squeeze()
    .reset_coords(["band"], drop=True)
)
ds2 = (
    xr.open_dataarray(path2, engine="rasterio")
    .to_dataset(name="climate")
    .squeeze()
    .reset_coords(["band"], drop=True)
)
ds2["climate"].values[ds2["climate"].values == ds2["climate"].values[0, 0]] = ds1["climate"].values[
    0, 0
]
ds = xr.concat([ds1, ds2], pd.Index(["tavg", "prec"], name="band"))

# create the pyramid
dt = pyramid_reproject(ds, levels=LEVELS, other_chunks={'band': 2}, clear_attrs=True)

# write the pyramid to zarr
dt.to_zarr(store_3d, consolidated=True)
CPU times: user 6.25 s, sys: 1.29 s, total: 7.54 s
Wall time: 13.4 s

3d, one variable, multiple time points#

In this example, we open 12 2d images (one map of temperature for each month), combine them into a single array (along the month dimension), and create a Zarr pyramid.

[5]:
%%time
# open and extract the input datasets
ds_all = []
months = list(map(lambda d: d + 1, range(12)))
for i in months:
    path = f"{input_path}/wc2.1_2.5m_tavg_{i:02g}.tif"
    ds = (
        xr.open_dataarray(path, engine="rasterio")
        .to_dataset(name="tavg")
        .squeeze()
        .reset_coords(["band"], drop=True)
    )
    ds_all.append(ds)
ds = xr.concat(ds_all, pd.Index(months, name="month"))

# create the pyramid
dt = pyramid_reproject(ds, levels=LEVELS, other_chunks={'month': 12}, clear_attrs=True)

# write the pyramid to zarr
dt.to_zarr(store_3d_1var, consolidated=True)
CPU times: user 41.9 s, sys: 8.3 s, total: 50.2 s
Wall time: 2min 12s

4d, multiple variables, multiple time points#

In this example, we open 12 2d images for 2 variables (one map of temperature and precipitation for each month), combine them into a single array (along the month and band dimensions), and create a Zarr pyramid.

[6]:
%%time
# open and extract the input datasets
ds1_all = []
ds2_all = []
months = list(map(lambda d: d + 1, range(12)))
for i in months:
    path = f"{input_path}/wc2.1_2.5m_tavg_{i:02g}.tif"
    ds = (
        xr.open_dataarray(path, engine="rasterio")
        .to_dataset(name="climate")
        .squeeze()
        .reset_coords(["band"], drop=True)
    )
    ds1_all.append(ds)
ds1 = xr.concat(ds1_all, pd.Index(months, name="month"))
for i in months:
    path = f"{input_path}/wc2.1_2.5m_prec_{i:02g}.tif"
    ds = (
        xr.open_dataarray(path, engine="rasterio")
        .to_dataset(name="climate")
        .squeeze()
        .reset_coords(["band"], drop=True)
    )
    ds2_all.append(ds)
ds2 = xr.concat(ds2_all, pd.Index(months, name="month"))
ds2["climate"].values[ds2["climate"].values == ds2["climate"].values[0, 0, 0]] = ds1[
    "climate"
].values[0, 0, 0]
ds = xr.concat([ds1, ds2], pd.Index(["tavg", "prec"], name="band"))

# create the pyramid
dt = pyramid_reproject(
    ds, levels=LEVELS, extra_dim="band", other_chunks={'band': 2, 'month': 12}, clear_attrs=True
)
dt.ds.attrs

# write the pyramid to zarr
dt.to_zarr(store_4d, consolidated=True, mode="w")
CPU times: user 1min 15s, sys: 23 s, total: 1min 38s
Wall time: 3min 19s

Plot the first time step at each level#

[7]:
for level in range(LEVELS):
    fig, ax = plt.subplots()
    dt[str(level)].ds.isel(band=0, month=0).climate.plot()
    ax.set_title(f"Level {level}")
../_images/examples_geotiff_12_0.png
../_images/examples_geotiff_12_1.png
../_images/examples_geotiff_12_2.png
../_images/examples_geotiff_12_3.png
../_images/examples_geotiff_12_4.png
../_images/examples_geotiff_12_5.png
[ ]: