Computation of global mean SST¶

In this example, we illustrate possible bad choices of chunk when computing horizontal mean SST time-series and time-average SST.

Extraction of data¶

First, the SST data is extracted, as well as the cell surfaces.

from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
import xarray as xr
import matplotlib.pyplot as plt

data = xr.open_dataset('data/surface_thetao.nc')
data = data.isel(olevel=0)

mesh = xr.open_dataset('data/mesh_mask_eORCA1_v2.2.nc')
mesh = mesh.isel(z=0, t=0)

surface = mesh['e2t'] * mesh['e1t']
surface


First, let’s make a try by divinding our dataset into tiles:

chunk = {'x': 150, 'y':100}
thetao = data['thetao'].chunk(chunk)
thetao


The surface array is also decomposed into the same chunks

surfbis = surface.chunk(chunk)
surfbis


Now, the mean time-series is computing by weighting using the cell surface:

tmean = (thetao * surfbis).sum(dim=['x', 'y']) / surfbis.sum(dim=['x', 'y'])

%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
tmean.compute()

visualize([prof, rprof, cprof], show=False)

tmean.data.visualize()

l = tmean.plot()


Now, the time-average map is computed:

time_mean = thetao.mean(dim='time_counter')

%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
time_mean.compute()

visualize([prof, rprof, cprof], show=False)

time_mean.data.visualize()

l = time_mean.plot(robust=True, cmap=plt.cm.jet)


Better choice of chunks¶

The performance in the above are disappointing. This this due to a bad chunking choice. If the SST is now chunked along the time only.

chunk = {'time_counter': 70}
thetao = data['thetao'].chunk(chunk)
thetao

tmean = (thetao * surface).sum(dim=['x', 'y']) / surface.sum(dim=['x', 'y'])

%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
tmean.compute()

visualize([prof, rprof, cprof], show=False)

tmean.data.visualize()

l = tmean.plot()

time_mean = thetao.mean(dim='time_counter')

%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
time_mean.compute()

visualize([prof, rprof, cprof], show=False)

time_mean.data.visualize()

l = time_mean.plot(robust=True, cmap=plt.cm.jet)