Arrays¶

Usage¶

The Python list and dict objects are not suitable to manage multi-dimensional arrays.

Instead, the Numpy (Numerical Python) library should be used. It allows to:

Create multi-dimensional arrays
Access array attributes (shape, number of dimensions, data types)
Manipulate arrays (changing shape, tiling)
Manipulate missing values
Perform optimized numerical operations (pointwise or matrix) via broadcasting (no loops)

Array manipulation¶

Array initialisation¶

There are several methods to initialize 1D arrays. arange takes an argument a start, end end stride argument.

import numpy as np 

x = np.arange(0, 10, 1) # equivalent to Matlab 0:1:9
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

linspace takes a start, end and a number of elements:

x = np.linspace(10, 20, 41) # 10 to 20 with 41 values
x

array([10.  , 10.25, 10.5 , 10.75, 11.  , 11.25, 11.5 , 11.75, 12.  ,
       12.25, 12.5 , 12.75, 13.  , 13.25, 13.5 , 13.75, 14.  , 14.25,
       14.5 , 14.75, 15.  , 15.25, 15.5 , 15.75, 16.  , 16.25, 16.5 ,
       16.75, 17.  , 17.25, 17.5 , 17.75, 18.  , 18.25, 18.5 , 18.75,
       19.  , 19.25, 19.5 , 19.75, 20.  ])

array allows to init from list for instance:

x = np.array([1, 2, 3, 4, 5]) # initialisation from list 
x

array([1, 2, 3, 4, 5])

Initialization of ND arrays can be achieved using different methods, which take as arguments the shape of the array and eventually the type.

x = np.ones((2, 3, 8), dtype=int) # init with 1 integers
x

array([[[1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1]],

       [[1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1]]])

x = np.zeros((2, 3, 8), dtype=float) # init with 0 floats 
x

array([[[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]]])

x = np.full((2, 3, 8), -99, dtype=float)  # fill array with 10.1
print(x)

[[[-99. -99. -99. -99. -99. -99. -99. -99.]
  [-99. -99. -99. -99. -99. -99. -99. -99.]
  [-99. -99. -99. -99. -99. -99. -99. -99.]]

 [[-99. -99. -99. -99. -99. -99. -99. -99.]
  [-99. -99. -99. -99. -99. -99. -99. -99.]
  [-99. -99. -99. -99. -99. -99. -99. -99.]]]

Changing data type¶

Changing data type is done by using the astype function:

x = x.astype(int)  # conversion from floats to int
x

array([[[-99, -99, -99, -99, -99, -99, -99, -99],
        [-99, -99, -99, -99, -99, -99, -99, -99],
        [-99, -99, -99, -99, -99, -99, -99, -99]],

       [[-99, -99, -99, -99, -99, -99, -99, -99],
        [-99, -99, -99, -99, -99, -99, -99, -99],
        [-99, -99, -99, -99, -99, -99, -99, -99]]])

Getting attributes¶

Numpy arrays have several attributes. To get the number of dimensions:

x.ndim

to get the shape:

x.shape

(2, 3, 8)

to get the data type:

x.dtype

dtype('int64')

Indexing¶

Indexing is done as for lists, except that it can be done along several dimensions:

x = np.ones((10, 20, 15, 30), dtype=float)

x[:, :, 0, :].shape

(10, 20, 30)

x[:, -1, ::2, ::2].shape

(10, 8, 15)

x[:, :, 0:1, ::3].shape

(10, 20, 1, 10)

Using ..., you can also access an array without fully knowing it’s shape:

x[..., 0].shape 

(10, 20, 15)

x[..., 2:5, 0].shape 

(10, 20, 3)

x[-1, ...].shape

(20, 15, 30)

x[::2, ::3,  ...].shape

(5, 7, 15, 30)

numpy.array can be used to index numpy.array, but prefer using slice instead. Indeed, using the former may be error prone.

For instance, let’s create a 2D array.

z = np.arange(20 * 10)
z

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
       195, 196, 197, 198, 199])

x = np.reshape(z, (20, 10))
x.shape

(20, 10)

For extracting a subspan of an array, you might want to use:

i = [0, 1, 2, 3]
j = [5, 6, 7, 8]
x[i, j]

array([ 5, 16, 27, 38])

In this case, the output is 1D. It extracts the value for each (i, j) pair and is equivalent to:

x[i[0], j[0]], x[i[1], j[1]], x[i[2], j[2]], x[i[3], j[3]]

(5, 16, 27, 38)

Beside, this will fail if i and j have different sizes. The proper syntaxt would be:

i = slice(0, 4)
j = slice(5, 9)
x[i,j]

array([[ 5,  6,  7,  8],
       [15, 16, 17, 18],
       [25, 26, 27, 28],
       [35, 36, 37, 38]])

The values obtained in the first try are the elements in the diagonal of the subspan.

Copy¶

As for list, assigment of numpy arrays should not be used to make copies, since the reference is copied, not the values. In the following, the modification of x modifies y, and conversely:

x = np.arange(10)
y = x
y[0] = -1
x[-1] = 1000
x, y

(array([  -1,    1,    2,    3,    4,    5,    6,    7,    8, 1000]),
 array([  -1,    1,    2,    3,    4,    5,    6,    7,    8, 1000]))

To make copies, use the copy method:

x = np.arange(10)
y = x.copy()  # deep copy of x into y 
y[0] = -1
x[-1] = 1000
x, y

(array([   0,    1,    2,    3,    4,    5,    6,    7,    8, 1000]),
 array([-1,  1,  2,  3,  4,  5,  6,  7,  8,  9]))

Reshaping¶

If we create a 2D array:

x = np.zeros((6, 4))
for i in range(6):
    x[i, :] = np.arange(4) + i*4
x.shape

(6, 4)

The conversion into 1D can be achieved using ravel:

x1dc = np.ravel(x)   # converts ND to 1D following C order
x1dc

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.])

By default, the order of the reshaping follows the C order. You can specify an Fortran-like order as follows:

x1df = np.ravel(x, order="F")
x1df

array([ 0.,  4.,  8., 12., 16., 20.,  1.,  5.,  9., 13., 17., 21.,  2.,
        6., 10., 14., 18., 22.,  3.,  7., 11., 15., 19., 23.])

To convert an array from a given set of dimensions to another one, use the reshape method. The number of elements in the source and destination shapes must be the same.

xresc = np.reshape(x, (2, 3, 4)) 
xresc

array([[[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]],

       [[12., 13., 14., 15.],
        [16., 17., 18., 19.],
        [20., 21., 22., 23.]]])

xresf = np.reshape(x, (2, 3, 4), order="F")
xresf

array([[[ 0.,  1.,  2.,  3.],
        [ 8.,  9., 10., 11.],
        [16., 17., 18., 19.]],

       [[ 4.,  5.,  6.,  7.],
        [12., 13., 14., 15.],
        [20., 21., 22., 23.]]])

If we recreate a 3D array as follows:

x = np.reshape(np.arange(2 * 3 * 4), (2, 3, 4))
x.shape

(2, 3, 4)

We can reshape it from ND to 2D, by keeping one dimension unchanged, as follows:

xtest = np.reshape(x, (2, -1))
xtest.shape

(2, 12)

Here, we keep the first dimension but transform (3, 4) in 12.

xtest2 = np.reshape(x, (-1, 4))
xtest2.shape

(6, 4)

Here, we keep the last dimension but transform (2, 3) in 6.

Repeating along a given dimensions¶

To repeat an array along a dimension, use the tile function. If we create a 1D array:

x = np.zeros((10, 20, 30))
x.shape

(10, 20, 30)

We can repeat it 40 times as follows:

xtile = np.tile(x, (40, 1, 1, 1))
xtile.shape

(40, 10, 20, 30)

Note that the dimension argument of tile must always finish by 1. I.e. the array can only be extended along the first dimensions. If we try to extend the above array along the last dimension as follows:

xtile2 = np.tile(x, (1, 1, 1, 40))
xtile2.shape

(1, 10, 20, 1200)

Here the result has the wrong dimension. To make it work, the first syntax must be used with transpose:

xtile2 = np.tile(x, (40, 1, 1, 1))
xtile2.shape

(40, 10, 20, 30)

xtile2 = np.transpose(xtile2, (1, 2, 3, 0))
xtile2.shape

(10, 20, 30, 40)

Changing dimension order¶

x = np.ones((5, 10, 15))
x.shape

(5, 10, 15)

Changing dimension orders is achieved using the transpose method. By default, it reverses the dimensions.

xt = x.T
xt.shape

(15, 10, 5)

xt = np.transpose(x) # identical to xt = xt.T
xt.shape

(15, 10, 5)

But you can specify the new order as an argument:

xt = np.transpose(x, (2, 0, 1))
xt.shape

(15, 5, 10)

Testing conditions¶

xresc = np.reshape(np.arange(24), (2, 3, 4))

Testing conditions can be done by using operators on the array. It returns an array of boolean:

condition = (xresc <= 12) & (xresc >= 3)  # array of boolean (True if condition is met)
condition

array([[[False, False, False,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]],

       [[ True, False, False, False],
        [False, False, False, False],
        [False, False, False, False]]])

To display the values when the condition is met:

xresc[condition]

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

To display the values when condition is not met:

xresc[~condition]

array([ 0,  1,  2, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])

To get the indexes where the condition is met, use the nonzero method. It returns a tuple with the indexes where the condition is met for each axes.

ind = np.nonzero(condition)
ind

(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
 array([0, 1, 1, 1, 1, 2, 2, 2, 2, 0]),
 array([3, 0, 1, 2, 3, 0, 1, 2, 3, 0]))

Therefore, the condition is met for xresc[0, 0, 3], xresc[0, 1, 0], ..., xresc[0, 2, 3], xresc[1, 0, 0]

You can also get the values where condition is met using the indexes:

xresc[ind]

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

To extract the unique occurrences in an array, use the unique value:

x = np.array([0, 0, 0, 1, 2, 2, 3, 3, 3, 3])
np.unique(x)

array([0, 1, 2, 3])

Concatenation¶

Concatenation of arrays is achieved by usinging the concatenate function, which takes as arguments the arrays to concatenate, and the dimension along which to concatenate:

x = np.zeros((2, 3, 5))
y = np.ones((2, 7, 5))
x.shape, y.shape

((2, 3, 5), (2, 7, 5))

To concatenate the arrays along their 2nd dimension:

z = np.concatenate((x, y), axis=1)  
z.shape

(2, 10, 5)

Operations using arrays¶

Arhtimetical operations¶

In Python, standard operations are performed pointwise (contrary to Matlab for instance). To do matrix operations, you need to call specific methods (numpy.matmul for instance).

x = np.array([1, 2, 3, 4, 5]).astype(float)
x

array([1., 2., 3., 4., 5.])

y = np.array([6, 7, 8, 9, 10]).astype(float)
y

array([ 6.,  7.,  8.,  9., 10.])

x * y

array([ 6., 14., 24., 36., 50.])

y / x

array([6.        , 3.5       , 2.66666667, 2.25      , 2.        ])

y % x

array([0., 1., 2., 1., 0.])

y // x

array([6., 3., 2., 2., 2.])

x**y

array([1.000000e+00, 1.280000e+02, 6.561000e+03, 2.621440e+05,
       9.765625e+06])

In the above, the / operator returns the right value. However, if x has 0 values:

x[2] = 0.
x[3] = -1.
x

array([ 1.,  2.,  0., -1.,  5.])

z = y / x
z

/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.

array([ 6. ,  3.5,  inf, -9. ,  2. ])

Here, we have a warning message, and the output array contains an inf value. To avoid this, we can use the divide method with the where and out argument.

The first one specifies the condition when the operation should be applied.

The second one specifies an output array which will be used to replace missing values. In this case, the default output will be an array of NaNs of the same size as x.

z = np.divide(y, x, where=(x!=0), out=np.full(x.shape, np.nan, dtype=x.dtype))  # no more warning message
z

array([ 6. ,  3.5,  nan, -9. ,  2. ])

Here, we have no more warnings and inf has been replaced by nan. The same thing can be applied to other functions such as log10:

w = np.log10(x)
w

/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log10
  """Entry point for launching an IPython kernel.
/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log10
  """Entry point for launching an IPython kernel.

array([0.     , 0.30103,    -inf,     nan, 0.69897])

w = np.log10(x, where=(x > 0), out=np.full(x.shape, np.nan, dtype=x.dtype))
w

array([0.     , 0.30103,     nan,     nan, 0.69897])

Mathematical operations (mean, etc.)¶

Here we will work on the given array:

x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
             [7, 1, 3, 5, 6, 7, 8, 10, 4, 6],
             [5, 1, 0, 3, 5, 8, 12, 5, 8, 1]])
x.shape

(3, 10)

To compute the mean over the entire array:

out = np.mean(x)
out

5.333333333333333

To compute the standard deviation over second dimension:

out = np.std(x, axis=1) 
out

array([2.87228132, 2.45153013, 3.57211422])

To compute the sum along the first dimension:

out = np.sum(x, axis=0)
out

array([13,  4,  6, 12, 16, 21, 27, 23, 21, 17])

Same thing for cumulated sum:

out = np.cumsum(x, axis=0) # compute sum over first dimension
out

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 8,  3,  6,  9, 11, 13, 15, 18, 13, 16],
       [13,  4,  6, 12, 16, 21, 27, 23, 21, 17]])

Same thing for product:

out = np.prod(x, axis=0) 
out

array([ 35,   2,   0,  60, 150, 336, 672, 400, 288,  60])

This also works for multi-dimensional arrays:

x = np.reshape(np.arange(24), (2, 3, 4))  # lat, lon, time
x.shape

(2, 3, 4)

To compute the mean along the first and 2nd dimensions:

xmean = np.mean(x, axis=(0, 1))
xmean.shape

(4,)

To compute the mean along the 2nd and 3rd dimensions:

xmean = np.mean(x, axis=(1, 2))
xmean.shape

(2,)

Broadcasting¶

To make computation, it is highly advised to avoid loops. Numpy provides different broadcasting rules which allow to prevent from using loops (cf. docs.scipy)

Imagine that we have a ND array, with the dimensions being (time, lat, lon):

x = np.reshape(np.arange(24), (2, 3, 4))  # time, lat, lon
x.shape

(2, 3, 4)

If we want to compute the anomalies, we first compute the temporal mean:

xmean = np.mean(x, axis=0) 
xmean.shape

(3, 4)

Now we can compute the anomalies

anom = x - xmean
anom.shape

(2, 3, 4)

In this case, the code above works even though the x and xmean have different shapes. This is because the last two dimensions (i.e. trailing dimensions) are the same. Now, imagine that the array is in fact ordered as (lat, lon, time).

We first compute the mean over the time-dimensions:

xmean = np.mean(x, axis=-1)    # temporal mean
xmean.shape

(2, 3)

However, the computation of the anomalies will fail:

#anom = x - xmean

It fails because the first (i.e leading) dimensions are the same, which does not allow broadcasting. This can be fixed by using the keepdims argument. It will keep a virtual dimension on the mean output.

xmean = np.mean(x, axis=-1, keepdims=True)
xmean.shape

(2, 3, 1)

anom = x - xmean

If you are lazy to remember the broadcasting rules, you can use the numpy.newaxis method to add virtual dimensions. It allows to add degenerated dimensions on an array. If we look at the shape of our x array:

x.shape

(2, 3, 4)

If we create a 6D array of dimensions:

y = np.zeros((5, 2, 7, 3, 10, 4))
y.shape

(5, 2, 7, 3, 10, 4)

To multiply x and y without loops, we can add degenerated dimenions to x in order to align them with the dimensions of y:

xnd = x[np.newaxis, :, np.newaxis, :, np.newaxis, :]
xnd.shape

(1, 2, 1, 3, 1, 4)

We can see that now the shapes of the xnd array and of y are aligned:

y.shape

(5, 2, 7, 3, 10, 4)

Now we can multiply both arrays:

z = y * xnd
z.shape

(5, 2, 7, 3, 10, 4)

Managing filled values¶

Filled values are generally defined as numpy.nan values.

x = np.arange(1, 10).astype(float)
x[0:3] = np.nan
x

array([nan, nan, nan,  4.,  5.,  6.,  7.,  8.,  9.])

Checking whether values are NaN is achieved by using the np.isnan method

cond = np.isnan(x)
cond

array([ True,  True,  True, False, False, False, False, False, False])

To find the index of the NaN values:

np.nonzero(np.isnan(x))

(array([0, 1, 2]),)

An equivalent to the np.isnan method is to use the mathematical definition of NaNs.

np.nonzero(x != x) # Mathematical definition of NaN

(array([0, 1, 2]),)

Warning: NaNs can only be used with Float arrays (not integer arrays)

Operations on filled values¶

To perform operations on data containing NaNs requires the use of special functions:

np.nanmean(x)

6.5

np.nansum(x)

39.0

np.nanstd(x)

1.707825127659933

Since this can be a bit annoying, numpy.array objects can be converted into masked_array objects, whose operations automatically discards NaN.

Masked array¶

x = np.arange(0, 10).astype(int)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

To convert an array into a masked array, the following methods can be used:

x = np.ma.masked_where(x==0, x)
x

masked_array(data=[--, 1, 2, 3, 4, 5, 6, 7, 8, 9],
             mask=[ True, False, False, False, False, False, False, False,
                   False, False],
       fill_value=999999)

x = np.ma.masked_equal(x, 5)
x

masked_array(data=[--, 1, 2, 3, 4, --, 6, 7, 8, 9],
             mask=[ True, False, False, False, False,  True, False, False,
                   False, False],
       fill_value=5)

x = np.ma.masked_greater(x, 7)
x

masked_array(data=[--, 1, 2, 3, 4, --, 6, 7, --, --],
             mask=[ True, False, False, False, False,  True, False, False,
                    True,  True],
       fill_value=5)

x = np.ma.masked_less(x, 3)
x

masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
             mask=[ True,  True,  True, False, False,  True, False, False,
                    True,  True],
       fill_value=5)

x = np.ma.masked_where(np.isnan(x), x)
x

masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
             mask=[ True,  True,  True, False, False,  True, False, False,
                    True,  True],
       fill_value=5)

x[2] = np.ma.masked
x

masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
             mask=[ True,  True,  True, False, False,  True, False, False,
                    True,  True],
       fill_value=5)

This new object has additional attributes and method. Especially for assessing where the filled values are located.

x.mask

array([ True,  True,  True, False, False,  True, False, False,  True,
        True])

np.ma.getmaskarray(x)

array([ True,  True,  True, False, False,  True, False, False,  True,
        True])

np.mean(x)

5.0

np.sum(x)

np.std(x)

1.5811388300841898

It is strongly advised to use the np.ma.getmaskarray method rather than using the mask attribute. Indeed, the latter will return a bool instead of an array of bool if no data is missing.

# warning for masked_arrays
x = np.arange(10, 15).astype(float)
x = np.ma.masked_where(np.isnan(x), x)
x

masked_array(data=[10., 11., 12., 13., 14.],
             mask=False,
       fill_value=1e+20)

Therefore, extracting all non missing values from an array is not safe using mask:

iok = np.nonzero(x.mask == False) 
x[iok]

masked_array(data=[10.],
             mask=False,
       fill_value=1e+20)

In this case, only the first value is returned, instead of all values.

However, it works perfectly using the getmaskarray method:

iok = np.nonzero(np.ma.getmaskarray(x) == False)
x[iok]

masked_array(data=[10., 11., 12., 13., 14.],
             mask=False,
       fill_value=1e+20)

Description	Module
Special functions (mathematical physics)	scipy.special
Integration	scipy.integrate
Optimization	scipy.optimize
Interpolation	scipy.interpolate
Fourier Transforms	scipy.fft
Signal Processing	scipy.signal
Linear Algebra	scipy.linalg
Spatial data structures and algorithms	scipy.spatial
Statistics	scipy.stats
Multidimensional image processing	scipy.ndimage
File IO: Matlab, NetCDF, IDL	scipy.io

Introduction to Python