Computation of global mean SST¶
In this example, we illustrate possible bad choices of chunk when computing horizontal mean SST time-series and time-average SST.
Extraction of data¶
First, the SST data is extracted, as well as the cell surfaces.
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
import xarray as xr
import matplotlib.pyplot as plt
data = xr.open_dataset('data/surface_thetao.nc')
data = data.isel(olevel=0)
mesh = xr.open_dataset('data/mesh_mask_eORCA1_v2.2.nc')
mesh = mesh.isel(z=0, t=0)
surface = mesh['e2t'] * mesh['e1t']
surface
Bad choice of chunks¶
First, let’s make a try by divinding our dataset into tiles:
chunk = {'x': 150, 'y':100}
thetao = data['thetao'].chunk(chunk)
thetao
The surface array is also decomposed into the same chunks
surfbis = surface.chunk(chunk)
surfbis
Now, the mean time-series is computing by weighting using the cell surface:
tmean = (thetao * surfbis).sum(dim=['x', 'y']) / surfbis.sum(dim=['x', 'y'])
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
tmean.compute()
visualize([prof, rprof, cprof], show=False)
tmean.data.visualize()
l = tmean.plot()
Now, the time-average map is computed:
time_mean = thetao.mean(dim='time_counter')
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
time_mean.compute()
visualize([prof, rprof, cprof], show=False)
time_mean.data.visualize()
l = time_mean.plot(robust=True, cmap=plt.cm.jet)
Better choice of chunks¶
The performance in the above are disappointing. This this due to a bad chunking choice. If the SST is now chunked along the time only.
chunk = {'time_counter': 70}
thetao = data['thetao'].chunk(chunk)
thetao
tmean = (thetao * surface).sum(dim=['x', 'y']) / surface.sum(dim=['x', 'y'])
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
tmean.compute()
visualize([prof, rprof, cprof], show=False)
tmean.data.visualize()
l = tmean.plot()
time_mean = thetao.mean(dim='time_counter')
%%time
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
time_mean.compute()
visualize([prof, rprof, cprof], show=False)
time_mean.data.visualize()
l = time_mean.plot(robust=True, cmap=plt.cm.jet)