Computation of global mean SST

In this example, we illustrate possible bad choices of chunk when computing horizontal mean SST time-series and time-average SST.

Extraction of data

First, the SST data is extracted, as well as the cell surfaces.

from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
import xarray as xr
import matplotlib.pyplot as plt
data = xr.open_dataset('data/surface_thetao.nc')
data = data.isel(olevel=0)
mesh = xr.open_dataset('data/mesh_mask_eORCA1_v2.2.nc')
mesh = mesh.isel(z=0, t=0)
surface = mesh['e2t'] * mesh['e1t']
surface

Bad choice of chunks

First, let’s make a try by divinding our dataset into tiles:

chunk = {'x': 150, 'y':100}
thetao = data['thetao'].chunk(chunk)
thetao

The surface array is also decomposed into the same chunks

surfbis = surface.chunk(chunk)
surfbis

Now, the mean time-series is computing by weighting using the cell surface:

tmean = (thetao * surfbis).sum(dim=['x', 'y']) / surfbis.sum(dim=['x', 'y'])
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    tmean.compute()
visualize([prof, rprof, cprof], show=False)
tmean.data.visualize()
l = tmean.plot()

Now, the time-average map is computed:

time_mean = thetao.mean(dim='time_counter')
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    time_mean.compute()
visualize([prof, rprof, cprof], show=False)
time_mean.data.visualize()
l = time_mean.plot(robust=True, cmap=plt.cm.jet)

Better choice of chunks

The performance in the above are disappointing. This this due to a bad chunking choice. If the SST is now chunked along the time only.

chunk = {'time_counter': 70}
thetao = data['thetao'].chunk(chunk)
thetao
tmean = (thetao * surface).sum(dim=['x', 'y']) / surface.sum(dim=['x', 'y'])
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    tmean.compute()
visualize([prof, rprof, cprof], show=False)
tmean.data.visualize()
l = tmean.plot()
time_mean = thetao.mean(dim='time_counter')
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    time_mean.compute()
visualize([prof, rprof, cprof], show=False)
time_mean.data.visualize()
l = time_mean.plot(robust=True, cmap=plt.cm.jet)