Arrays¶
Usage¶
The Python list
and dict
objects are not suitable to manage multi-dimensional arrays.
Instead, the Numpy (Numerical Python) library should be used. It allows to:
Create multi-dimensional arrays
Access array attributes (shape, number of dimensions, data types)
Manipulate arrays (changing shape, tiling)
Manipulate missing values
Perform optimized numerical operations (pointwise or matrix) via broadcasting (no loops)
Array manipulation¶
Array initialisation¶
There are several methods to initialize 1D arrays. arange
takes an argument a start, end end stride argument.
import numpy as np
x = np.arange(0, 10, 1) # equivalent to Matlab 0:1:9
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
linspace
takes a start, end and a number of elements:
x = np.linspace(10, 20, 41) # 10 to 20 with 41 values
x
array([10. , 10.25, 10.5 , 10.75, 11. , 11.25, 11.5 , 11.75, 12. ,
12.25, 12.5 , 12.75, 13. , 13.25, 13.5 , 13.75, 14. , 14.25,
14.5 , 14.75, 15. , 15.25, 15.5 , 15.75, 16. , 16.25, 16.5 ,
16.75, 17. , 17.25, 17.5 , 17.75, 18. , 18.25, 18.5 , 18.75,
19. , 19.25, 19.5 , 19.75, 20. ])
array
allows to init from list
for instance:
x = np.array([1, 2, 3, 4, 5]) # initialisation from list
x
array([1, 2, 3, 4, 5])
Initialization of ND arrays can be achieved using different methods, which take as arguments the shape of the array and eventually the type.
x = np.ones((2, 3, 8), dtype=int) # init with 1 integers
x
array([[[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]],
[[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]])
x = np.zeros((2, 3, 8), dtype=float) # init with 0 floats
x
array([[[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]]])
x = np.full((2, 3, 8), -99, dtype=float) # fill array with 10.1
print(x)
[[[-99. -99. -99. -99. -99. -99. -99. -99.]
[-99. -99. -99. -99. -99. -99. -99. -99.]
[-99. -99. -99. -99. -99. -99. -99. -99.]]
[[-99. -99. -99. -99. -99. -99. -99. -99.]
[-99. -99. -99. -99. -99. -99. -99. -99.]
[-99. -99. -99. -99. -99. -99. -99. -99.]]]
Changing data type¶
Changing data type is done by using the astype
function:
x = x.astype(int) # conversion from floats to int
x
array([[[-99, -99, -99, -99, -99, -99, -99, -99],
[-99, -99, -99, -99, -99, -99, -99, -99],
[-99, -99, -99, -99, -99, -99, -99, -99]],
[[-99, -99, -99, -99, -99, -99, -99, -99],
[-99, -99, -99, -99, -99, -99, -99, -99],
[-99, -99, -99, -99, -99, -99, -99, -99]]])
Getting attributes¶
Numpy arrays have several attributes. To get the number of dimensions:
x.ndim
3
to get the shape:
x.shape
(2, 3, 8)
to get the data type:
x.dtype
dtype('int64')
Indexing¶
Indexing is done as for lists, except that it can be done along several dimensions:
x = np.ones((10, 20, 15, 30), dtype=float)
x[:, :, 0, :].shape
(10, 20, 30)
x[:, -1, ::2, ::2].shape
(10, 8, 15)
x[:, :, 0:1, ::3].shape
(10, 20, 1, 10)
Using ...
, you can also access an array without fully knowing it’s shape:
x[..., 0].shape
(10, 20, 15)
x[..., 2:5, 0].shape
(10, 20, 3)
x[-1, ...].shape
(20, 15, 30)
x[::2, ::3, ...].shape
(5, 7, 15, 30)
numpy.array
can be used to index numpy.array
, but prefer using slice
instead. Indeed, using the former may be error prone.
For instance, let’s create a 2D array.
z = np.arange(20 * 10)
z
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103,
104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
195, 196, 197, 198, 199])
x = np.reshape(z, (20, 10))
x.shape
(20, 10)
For extracting a subspan of an array, you might want to use:
i = [0, 1, 2, 3]
j = [5, 6, 7, 8]
x[i, j]
array([ 5, 16, 27, 38])
In this case, the output is 1D. It extracts the value for each (i, j)
pair and is equivalent to:
x[i[0], j[0]], x[i[1], j[1]], x[i[2], j[2]], x[i[3], j[3]]
(5, 16, 27, 38)
Beside, this will fail if i
and j
have different sizes. The proper syntaxt would be:
i = slice(0, 4)
j = slice(5, 9)
x[i,j]
array([[ 5, 6, 7, 8],
[15, 16, 17, 18],
[25, 26, 27, 28],
[35, 36, 37, 38]])
The values obtained in the first try are the elements in the diagonal of the subspan.
Copy¶
As for list, assigment of numpy arrays should not be used to make copies, since the reference is copied, not the values. In the following, the modification of x
modifies y
, and conversely:
x = np.arange(10)
y = x
y[0] = -1
x[-1] = 1000
x, y
(array([ -1, 1, 2, 3, 4, 5, 6, 7, 8, 1000]),
array([ -1, 1, 2, 3, 4, 5, 6, 7, 8, 1000]))
To make copies, use the copy
method:
x = np.arange(10)
y = x.copy() # deep copy of x into y
y[0] = -1
x[-1] = 1000
x, y
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 1000]),
array([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
Reshaping¶
If we create a 2D array:
x = np.zeros((6, 4))
for i in range(6):
x[i, :] = np.arange(4) + i*4
x.shape
(6, 4)
The conversion into 1D can be achieved using ravel
:
x1dc = np.ravel(x) # converts ND to 1D following C order
x1dc
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.,
13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.])
By default, the order of the reshaping
follows the C
order. You can specify an Fortran-like order as follows:
x1df = np.ravel(x, order="F")
x1df
array([ 0., 4., 8., 12., 16., 20., 1., 5., 9., 13., 17., 21., 2.,
6., 10., 14., 18., 22., 3., 7., 11., 15., 19., 23.])
To convert an array from a given set of dimensions to another one, use the reshape
method. The number of elements in the source and destination shapes must be the same.
xresc = np.reshape(x, (2, 3, 4))
xresc
array([[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]],
[[12., 13., 14., 15.],
[16., 17., 18., 19.],
[20., 21., 22., 23.]]])
xresf = np.reshape(x, (2, 3, 4), order="F")
xresf
array([[[ 0., 1., 2., 3.],
[ 8., 9., 10., 11.],
[16., 17., 18., 19.]],
[[ 4., 5., 6., 7.],
[12., 13., 14., 15.],
[20., 21., 22., 23.]]])
If we recreate a 3D array as follows:
x = np.reshape(np.arange(2 * 3 * 4), (2, 3, 4))
x.shape
(2, 3, 4)
We can reshape it from ND to 2D, by keeping one dimension unchanged, as follows:
xtest = np.reshape(x, (2, -1))
xtest.shape
(2, 12)
Here, we keep the first dimension but transform (3, 4) in 12.
xtest2 = np.reshape(x, (-1, 4))
xtest2.shape
(6, 4)
Here, we keep the last dimension but transform (2, 3) in 6.
Repeating along a given dimensions¶
To repeat an array along a dimension, use the tile
function. If we create a 1D array:
x = np.zeros((10, 20, 30))
x.shape
(10, 20, 30)
We can repeat it 40 times as follows:
xtile = np.tile(x, (40, 1, 1, 1))
xtile.shape
(40, 10, 20, 30)
Note that the dimension argument of tile
must always finish by 1. I.e. the array can only be extended along the first dimensions. If we try to extend the above array along the last dimension as follows:
xtile2 = np.tile(x, (1, 1, 1, 40))
xtile2.shape
(1, 10, 20, 1200)
Here the result has the wrong dimension. To make it work, the first syntax must be used with transpose
:
xtile2 = np.tile(x, (40, 1, 1, 1))
xtile2.shape
(40, 10, 20, 30)
xtile2 = np.transpose(xtile2, (1, 2, 3, 0))
xtile2.shape
(10, 20, 30, 40)
Changing dimension order¶
x = np.ones((5, 10, 15))
x.shape
(5, 10, 15)
Changing dimension orders is achieved using the transpose
method. By default, it reverses the dimensions.
xt = x.T
xt.shape
(15, 10, 5)
xt = np.transpose(x) # identical to xt = xt.T
xt.shape
(15, 10, 5)
But you can specify the new order as an argument:
xt = np.transpose(x, (2, 0, 1))
xt.shape
(15, 5, 10)
Testing conditions¶
xresc = np.reshape(np.arange(24), (2, 3, 4))
Testing conditions can be done by using operators on the array. It returns an array of boolean:
condition = (xresc <= 12) & (xresc >= 3) # array of boolean (True if condition is met)
condition
array([[[False, False, False, True],
[ True, True, True, True],
[ True, True, True, True]],
[[ True, False, False, False],
[False, False, False, False],
[False, False, False, False]]])
To display the values when the condition is met:
xresc[condition]
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
To display the values when condition is not met:
xresc[~condition]
array([ 0, 1, 2, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])
To get the indexes where the condition is met, use the nonzero
method. It returns a tuple with the indexes where the condition is met for each axes.
ind = np.nonzero(condition)
ind
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
array([0, 1, 1, 1, 1, 2, 2, 2, 2, 0]),
array([3, 0, 1, 2, 3, 0, 1, 2, 3, 0]))
Therefore, the condition is met for xresc[0, 0, 3]
, xresc[0, 1, 0]
, ...
, xresc[0, 2, 3]
, xresc[1, 0, 0]
You can also get the values where condition is met using the indexes:
xresc[ind]
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
To extract the unique occurrences in an array, use the unique
value:
x = np.array([0, 0, 0, 1, 2, 2, 3, 3, 3, 3])
np.unique(x)
array([0, 1, 2, 3])
Concatenation¶
Concatenation of arrays is achieved by usinging the concatenate
function, which takes as arguments the arrays to concatenate, and the dimension along which to concatenate:
x = np.zeros((2, 3, 5))
y = np.ones((2, 7, 5))
x.shape, y.shape
((2, 3, 5), (2, 7, 5))
To concatenate the arrays along their 2nd dimension:
z = np.concatenate((x, y), axis=1)
z.shape
(2, 10, 5)
Operations using arrays¶
Arhtimetical operations¶
In Python, standard operations are performed pointwise (contrary to Matlab for instance). To do matrix operations, you need to call specific methods (numpy.matmul
for instance).
x = np.array([1, 2, 3, 4, 5]).astype(float)
x
array([1., 2., 3., 4., 5.])
y = np.array([6, 7, 8, 9, 10]).astype(float)
y
array([ 6., 7., 8., 9., 10.])
x * y
array([ 6., 14., 24., 36., 50.])
y / x
array([6. , 3.5 , 2.66666667, 2.25 , 2. ])
y % x
array([0., 1., 2., 1., 0.])
y // x
array([6., 3., 2., 2., 2.])
x**y
array([1.000000e+00, 1.280000e+02, 6.561000e+03, 2.621440e+05,
9.765625e+06])
In the above, the /
operator returns the right value. However, if x
has 0 values:
x[2] = 0.
x[3] = -1.
x
array([ 1., 2., 0., -1., 5.])
z = y / x
z
/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
array([ 6. , 3.5, inf, -9. , 2. ])
Here, we have a warning message, and the output array contains an inf
value. To avoid this, we can use the divide
method with the where
and out
argument.
The first one specifies the condition when the operation should be applied.
The second one specifies an output array which will be used to replace missing values. In this case, the default output will be an array of NaNs
of the same size as x
.
z = np.divide(y, x, where=(x!=0), out=np.full(x.shape, np.nan, dtype=x.dtype)) # no more warning message
z
array([ 6. , 3.5, nan, -9. , 2. ])
Here, we have no more warnings and inf
has been replaced by nan
. The same thing can be applied to other functions such as log10
:
w = np.log10(x)
w
/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log10
"""Entry point for launching an IPython kernel.
/home/barrier/Softwares/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in log10
"""Entry point for launching an IPython kernel.
array([0. , 0.30103, -inf, nan, 0.69897])
w = np.log10(x, where=(x > 0), out=np.full(x.shape, np.nan, dtype=x.dtype))
w
array([0. , 0.30103, nan, nan, 0.69897])
Mathematical operations (mean, etc.)¶
Here we will work on the given array:
x = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[7, 1, 3, 5, 6, 7, 8, 10, 4, 6],
[5, 1, 0, 3, 5, 8, 12, 5, 8, 1]])
x.shape
(3, 10)
To compute the mean over the entire array:
out = np.mean(x)
out
5.333333333333333
To compute the standard deviation over second dimension:
out = np.std(x, axis=1)
out
array([2.87228132, 2.45153013, 3.57211422])
To compute the sum along the first dimension:
out = np.sum(x, axis=0)
out
array([13, 4, 6, 12, 16, 21, 27, 23, 21, 17])
Same thing for cumulated sum:
out = np.cumsum(x, axis=0) # compute sum over first dimension
out
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 8, 3, 6, 9, 11, 13, 15, 18, 13, 16],
[13, 4, 6, 12, 16, 21, 27, 23, 21, 17]])
Same thing for product:
out = np.prod(x, axis=0)
out
array([ 35, 2, 0, 60, 150, 336, 672, 400, 288, 60])
This also works for multi-dimensional arrays:
x = np.reshape(np.arange(24), (2, 3, 4)) # lat, lon, time
x.shape
(2, 3, 4)
To compute the mean along the first and 2nd dimensions:
xmean = np.mean(x, axis=(0, 1))
xmean.shape
(4,)
To compute the mean along the 2nd and 3rd dimensions:
xmean = np.mean(x, axis=(1, 2))
xmean.shape
(2,)
Broadcasting¶
To make computation, it is highly advised to avoid loops. Numpy provides different broadcasting rules which allow to prevent from using loops (cf. docs.scipy)
Imagine that we have a ND array, with the dimensions being (time, lat, lon)
:
x = np.reshape(np.arange(24), (2, 3, 4)) # time, lat, lon
x.shape
(2, 3, 4)
If we want to compute the anomalies, we first compute the temporal mean:
xmean = np.mean(x, axis=0)
xmean.shape
(3, 4)
Now we can compute the anomalies
anom = x - xmean
anom.shape
(2, 3, 4)
In this case, the code above works even though the x
and xmean
have different shapes. This is because the last two dimensions (i.e. trailing dimensions) are the same. Now, imagine that the array is in fact ordered as (lat, lon, time)
.
We first compute the mean over the time-dimensions:
xmean = np.mean(x, axis=-1) # temporal mean
xmean.shape
(2, 3)
However, the computation of the anomalies will fail:
#anom = x - xmean
It fails because the first (i.e leading) dimensions are the same, which does not allow broadcasting. This can be fixed by using the keepdims
argument. It will keep a virtual dimension on the mean output.
xmean = np.mean(x, axis=-1, keepdims=True)
xmean.shape
(2, 3, 1)
anom = x - xmean
If you are lazy to remember the broadcasting rules, you can use the numpy.newaxis
method to add virtual dimensions. It allows to add degenerated dimensions on an array. If we look at the shape of our x
array:
x.shape
(2, 3, 4)
If we create a 6D array of dimensions:
y = np.zeros((5, 2, 7, 3, 10, 4))
y.shape
(5, 2, 7, 3, 10, 4)
To multiply x
and y
without loops, we can add degenerated dimenions to x
in order to align them with the dimensions of y
:
xnd = x[np.newaxis, :, np.newaxis, :, np.newaxis, :]
xnd.shape
(1, 2, 1, 3, 1, 4)
We can see that now the shapes of the xnd
array and of y
are aligned:
y.shape
(5, 2, 7, 3, 10, 4)
Now we can multiply both arrays:
z = y * xnd
z.shape
(5, 2, 7, 3, 10, 4)
Managing filled values¶
Filled values are generally defined as numpy.nan
values.
x = np.arange(1, 10).astype(float)
x[0:3] = np.nan
x
array([nan, nan, nan, 4., 5., 6., 7., 8., 9.])
Checking whether values are NaN
is achieved by using the np.isnan
method
cond = np.isnan(x)
cond
array([ True, True, True, False, False, False, False, False, False])
To find the index of the NaN
values:
np.nonzero(np.isnan(x))
(array([0, 1, 2]),)
An equivalent to the np.isnan
method is to use the mathematical definition of NaNs
.
np.nonzero(x != x) # Mathematical definition of NaN
(array([0, 1, 2]),)
Warning: NaNs
can only be used with Float arrays (not integer arrays)
Operations on filled values¶
To perform operations on data containing NaNs
requires the use of special functions:
np.nanmean(x)
6.5
np.nansum(x)
39.0
np.nanstd(x)
1.707825127659933
Since this can be a bit annoying, numpy.array
objects can be converted into masked_array
objects, whose operations automatically discards NaN.
Masked array¶
x = np.arange(0, 10).astype(int)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
To convert an array into a masked array, the following methods can be used:
x = np.ma.masked_where(x==0, x)
x
masked_array(data=[--, 1, 2, 3, 4, 5, 6, 7, 8, 9],
mask=[ True, False, False, False, False, False, False, False,
False, False],
fill_value=999999)
x = np.ma.masked_equal(x, 5)
x
masked_array(data=[--, 1, 2, 3, 4, --, 6, 7, 8, 9],
mask=[ True, False, False, False, False, True, False, False,
False, False],
fill_value=5)
x = np.ma.masked_greater(x, 7)
x
masked_array(data=[--, 1, 2, 3, 4, --, 6, 7, --, --],
mask=[ True, False, False, False, False, True, False, False,
True, True],
fill_value=5)
x = np.ma.masked_less(x, 3)
x
masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
mask=[ True, True, True, False, False, True, False, False,
True, True],
fill_value=5)
x = np.ma.masked_where(np.isnan(x), x)
x
masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
mask=[ True, True, True, False, False, True, False, False,
True, True],
fill_value=5)
x[2] = np.ma.masked
x
masked_array(data=[--, --, --, 3, 4, --, 6, 7, --, --],
mask=[ True, True, True, False, False, True, False, False,
True, True],
fill_value=5)
This new object has additional attributes and method. Especially for assessing where the filled values are located.
x.mask
array([ True, True, True, False, False, True, False, False, True,
True])
np.ma.getmaskarray(x)
array([ True, True, True, False, False, True, False, False, True,
True])
np.mean(x)
5.0
np.sum(x)
20
np.std(x)
1.5811388300841898
It is strongly advised to use the np.ma.getmaskarray
method rather than using the mask
attribute. Indeed, the latter will return a bool
instead of an array of bool
if no data is missing.
# warning for masked_arrays
x = np.arange(10, 15).astype(float)
x = np.ma.masked_where(np.isnan(x), x)
x
masked_array(data=[10., 11., 12., 13., 14.],
mask=False,
fill_value=1e+20)
Therefore, extracting all non missing values from an array is not safe using mask
:
iok = np.nonzero(x.mask == False)
x[iok]
masked_array(data=[10.],
mask=False,
fill_value=1e+20)
In this case, only the first value is returned, instead of all values.
However, it works perfectly using the getmaskarray
method:
iok = np.nonzero(np.ma.getmaskarray(x) == False)
x[iok]
masked_array(data=[10., 11., 12., 13., 14.],
mask=False,
fill_value=1e+20)
Scientific Python¶
Although the Numpy library allows to do some operations, it is rather limited. Other mathematical functions are provided by the Scipy library.
Submodules¶
Scipy comes with numerous submodules, which are listed below (source: scipy.org)
Description |
Module |
---|---|
Special functions (mathematical physics) |
|
Integration |
|
Optimization |
|
Interpolation |
|
Fourier Transforms |
|
Signal Processing |
|
Linear Algebra |
|
Spatial data structures and algorithms |
|
Statistics |
|
Multidimensional image processing |
|
File IO: Matlab, NetCDF, IDL |
Loops¶
As highlighted in the above, it is highly recommended to avoid the use of loops. However, if needed, the looping method in arrays is described below:
Multi-arrays in computer memory¶
Computer memory is inherently linear, i.e. multi-dimensional arrays are stored in memory as one-dimensional arrays. This can be done in two ways:
Row-major order: C/C++, Python
Column-major order: Fortran, Matlab, R, Julia
Row-major order:
Column-major order:
Loops¶
This has implications when writting loops. Indeed, imbricated loops should be consistent with the memory ordering:
Row-order (Python)
import numpy as np
x = np.empty((5, 10))
for i in range(5): # inner loop: 1st dim
for j in range(10): # outer loop: last dim
print(x[i, j])
Column-order (Julia)
x = zeros(Int8, 5, 10)
for j = 1:10 # inner loop: last dim
for i = 1:5 # outer loop: 1st dim
println(x[i, j])
end
end
Sources: Wikipedia, thegreenplace
Let’s see with a quick example, using the following array
import numpy as np
shape = (30, 40, 50, 800)
x = np.random.rand(shape[0], shape[1], shape[2], shape[3]).astype(float)
x.shape
(30, 40, 50, 800)
%%time
total = 0
N = 0
for i in range(shape[0]):
for j in range(shape[1]):
for k in range(shape[2]):
for l in range(shape[3]):
total += x[i, j, k, l]
N += 1
total /= N
total
CPU times: user 18.4 s, sys: 2 ms, total: 18.4 s
Wall time: 18.4 s
0.49996726640667866
%%time
total = 0
N = 0
for l in range(shape[3]):
for k in range(shape[2]):
for j in range(shape[1]):
for i in range(shape[0]):
total += x[i, j, k, l]
N += 1
total /= N
total
CPU times: user 24.2 s, sys: 36 ms, total: 24.3 s
Wall time: 24.3 s
0.4999672664065223
The last loop is slower than the first one because the loop order is not consistent with the C-order used in Python.
Note: The np.ndenumerate
method alows to loop in an array without risk.
cpt = 0
for k, v in np.ndenumerate(x):
if(cpt == 15):
break
print(k, x[k])
cpt += 1
(0, 0, 0, 0) 0.42457975076903165
(0, 0, 0, 1) 0.5377794661947678
(0, 0, 0, 2) 0.9744969920065903
(0, 0, 0, 3) 0.740293458634836
(0, 0, 0, 4) 0.13934592348230423
(0, 0, 0, 5) 0.09461858849650584
(0, 0, 0, 6) 0.4775997141298671
(0, 0, 0, 7) 0.8553440450168308
(0, 0, 0, 8) 0.1048005048148104
(0, 0, 0, 9) 0.885842543934454
(0, 0, 0, 10) 0.4107712279021991
(0, 0, 0, 11) 0.6931539244334584
(0, 0, 0, 12) 0.47570418127062775
(0, 0, 0, 13) 0.012965744408408142
(0, 0, 0, 14) 0.9019989978361336
The np.nditer
allows to so similar things:
cpt = 0
for v in np.nditer(x):
if(cpt == 15):
break
print(v)
cpt += 1
0.42457975076903165
0.5377794661947678
0.9744969920065903
0.740293458634836
0.13934592348230423
0.09461858849650584
0.4775997141298671
0.8553440450168308
0.1048005048148104
0.885842543934454
0.4107712279021991
0.6931539244334584
0.47570418127062775
0.012965744408408142
0.9019989978361336
Note that the use of nditer
is read-only. To modify the input array:
cpt = 0
with np.nditer(x, op_flags=['readwrite']) as it:
for v in it:
if(cpt == 15):
break
v[...] = 2 * v
cpt += 1
cpt = 0
for v in np.nditer(x):
if(cpt == 15):
break
print(v)
cpt += 1
0.8491595015380633
1.0755589323895356
1.9489939840131807
1.480586917269672
0.27869184696460847
0.18923717699301168
0.9551994282597343
1.7106880900336616
0.2096010096296208
1.771685087868908
0.8215424558043982
1.3863078488669167
0.9514083625412555
0.025931488816816284
1.8039979956722672
For more details about iteration, visit arrays.nditer.
Since the loops are not efficient in Python, it is better not to use them! Therefore, use vectorized functions.
%%time
tot = x.mean()
tot
CPU times: user 26.6 ms, sys: 15 µs, total: 26.7 ms
Wall time: 26.6 ms
0.49996742743360034