8. Processing Multidimensional Data

The open-access textbook Minimalist Data Wrangling with Python by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF; a printed version can be ordered from Amazon: AU CA DE ES FR IT JP NL PL SE UK US). It is a non-profit project. Although available online, it is a whole course; it should be read from the beginning to the end. Refer to the Preface for general introductory remarks. Any bug/typo reports/fixes are appreciated.

8.1. From Vectors to Matrices

Let us study how the vector operations that we discussed in, amongst others, Chapter 5 can be extended to matrices. In many cases, we will end up applying the same transform either on every matrix element separately, or on each row or column. They are all brilliant examples of the write less, do more principle in practice.

8.1.1. Vectorised Mathematical Functions

Applying vectorised functions such as numpy.round, numpy.log, and numpy.exp returns an array of the same shape, with all elements transformed accordingly.

A = np.array([
    [0.2, 0.6, 0.4, 0.4],
    [0.0, 0.2, 0.4, 0.7],
    [0.8, 0.8, 0.2, 0.1]
])  # example matrix that we will be using below

For example:

np.square(A)
## array([[0.04, 0.36, 0.16, 0.16],
##        [0.  , 0.04, 0.16, 0.49],
##        [0.64, 0.64, 0.04, 0.01]])

takes the square of every element.

More generally, we will be denoting such operations with:

\[\begin{split} f(\mathbf{X})= \left[ \begin{array}{cccc} f(x_{1,1}) & f(x_{1,2}) & \cdots & f(x_{1,m}) \\ f(x_{2,1}) & f(x_{2,2}) & \cdots & f(x_{2,m}) \\ \vdots & \vdots & \ddots & \vdots \\ f(x_{n,1}) & f(x_{n,2}) & \cdots & f(x_{n,m}) \\ \end{array} \right]. \end{split}\]

8.1.2. Componentwise Aggregation

Unidimensional aggregation functions (e.g., numpy.mean, numpy.quantile) can be applied to summarise:

  • all data into a single number (axis=None, being the default),

  • data in each column (axis=0), as well as

  • data in each row (axis=1).

Here are the examples corresponding to the above cases:

np.mean(A)
## 0.39999999999999997
np.mean(A, axis=0)
## array([0.33333333, 0.53333333, 0.33333333, 0.4       ])
np.mean(A, axis=1)
## array([0.4  , 0.325, 0.475])

Important

Let us repeat, axis=1 does not mean that we get the column means (even though columns constitute the 2nd axis, and we count starting at 0). It denotes the axis along which the matrix is sliced. Sadly, even yours truly sometimes does not get it right on the first attempt.

Exercise 8.1

Given the nhanes_adult_female_bmx_2020 dataset, compute the mean, standard deviation, minimum, and maximum of each body measurement.

We will get back to the topic of the aggregation of multidimensional data in Section 8.4.

8.1.3. Arithmetic, Logical, and Comparison Operations

Recall that for vectors, binary operators such as `+`, `*`, `==`, `<=`, and `&` as well as similar elementwise functions (e.g., numpy.minimum) can be applied if both inputs are of the same length, for example:

np.array([1, 10, 100, 1000]) * np.array([7, -6, 2, 8])  # elementwisely
## array([   7,  -60,  200, 8000])

Alternatively, one input can be a scalar:

np.array([1, 10, 100, 1000]) * -3
## array([   -3,   -30,  -300, -3000])

More generally, a set of rules referred to in the numpy manual as broadcasting describes how this package handles arrays of different shapes.

Important

Generally, for two matrices, their column/row numbers much match or be equal to 1. Also, if one operand is a one-dimensional array, it will be promoted to a row vector.

Let us explore all the possible scenarios.

8.1.3.1. Matrix vs Scalar

If one operand is a scalar, then it is going to be propagated over all matrix elements, for example:

(-1)*A
## array([[-0.2, -0.6, -0.4, -0.4],
##        [-0. , -0.2, -0.4, -0.7],
##        [-0.8, -0.8, -0.2, -0.1]])

changes the sign of every element, which is, mathematically, an instance of multiplying a matrix \(\mathbf{X}\) by a scalar \(c\):

\[\begin{split} c\mathbf{X}= \left[ \begin{array}{cccc} cx_{1,1} & cx_{1,2} & \cdots & cx_{1,m} \\ cx_{2,1} & cx_{2,2} & \cdots & cx_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ cx_{n,1} & cx_{n,2} & \cdots & cx_{n,m} \\ \end{array} \right]. \end{split}\]

Furthermore:

A**2
## array([[0.04, 0.36, 0.16, 0.16],
##        [0.  , 0.04, 0.16, 0.49],
##        [0.64, 0.64, 0.04, 0.01]])

takes the square1 of each element. Also:

A >= 0.25
## array([[False,  True,  True,  True],
##        [False, False,  True,  True],
##        [ True,  True, False, False]])

compares each element to 0.25.

8.1.3.2. Matrix vs Matrix

For two matrices of identical sizes, we act on the corresponding elements:

B = np.tri(A.shape[0], A.shape[1])  # just an example
B  # a lower triangular 0-1 matrix
## array([[1., 0., 0., 0.],
##        [1., 1., 0., 0.],
##        [1., 1., 1., 0.]])

And now:

A * B
## array([[0.2, 0. , 0. , 0. ],
##        [0. , 0.2, 0. , 0. ],
##        [0.8, 0.8, 0.2, 0. ]])

multiplies each \(a_{i,j}\) by the corresponding \(b_{i,j}\).

This extends on the idea from algebra that given \(\mathbf{A}\), \(\mathbf{B}\) with n rows and m columns each, the result of \(+\) (or \(-\)) would be for instance:

\[\begin{split} \mathbf{A} + \mathbf{B}= \left[ \begin{array}{cccc} a_{1,1}+b_{1,1} & a_{1,2}+b_{1,2} & \cdots & a_{1,m}+b_{1,m} \\ a_{2,1}+b_{2,1} & a_{2,2}+b_{2,2} & \cdots & a_{2,m}+b_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1}+b_{n,1} & a_{n,2}+b_{n,2} & \cdots & a_{n,m}+b_{n,m} \\ \end{array} \right]. \end{split}\]

Thanks to the matrix-matrix and matrix-scalar operations we can perform various tests on a per-element basis, e.g.,

(A >= 0.25) & (A <= 0.75)  # logical matrix & logical matrix
## array([[False,  True,  True,  True],
##        [False, False,  True,  True],
##        [False, False, False, False]])
Example 8.2

(*) Figure 8.1 depicts a (filled) contour plot of the Himmelblau’s function, \(f(x,y)=(x^{2}+y-11)^{2}+(x+y^{2}-7)^{2}\), for \(x\in[-5, 5]\) and \(y\in[-4, 4]\). To draw it, we probed 250 points from the two said ranges and called numpy.meshgrid to generate two matrices, both of shape 250 by 250, giving the x- and y-coordinates of all the points on the corresponding two-dimensional grid. Thanks to this, we were able to use vectorised mathematical operations to compute the values of \(f\) thereon.

x = np.linspace(-5, 5, 250)
y = np.linspace(-4, 4, 250)
xg, yg = np.meshgrid(x, y)
z = (xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
plt.contourf(x, y, z, levels=20)
CS = plt.contour(x, y, z, levels=[1, 5, 10, 20, 50, 100, 150, 200, 250])
plt.clabel(CS, colors="black")
plt.show()
../_images/ex-contourf-1.png

Figure 8.1 An example filled contour plot with additional labelled contour lines

To understand the result generated by numpy.meshgrid, here is its output for a smaller number of probe points:

x = np.linspace(-5, 5, 3)
y = np.linspace(-4, 4, 5)
xg, yg = np.meshgrid(x, y)
xg
## array([[-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.]])

Here, each column is the same.

yg
## array([[-4., -4., -4.],
##        [-2., -2., -2.],
##        [ 0.,  0.,  0.],
##        [ 2.,  2.,  2.],
##        [ 4.,  4.,  4.]])

In this case, each row is identical. Thanks to this, calling:

(xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
## array([[116., 306., 296.],
##        [208., 178., 148.],
##        [340., 170., 200.],
##        [320.,  90., 260.],
##        [340., 130., 520.]])

gives a matrix \(\mathbf{Z}\) such that \(z_{i,j}\) is generated by considering the \(i\)-th element in y and the \(j\)-th item in x, which is exactly what we desired.

8.1.3.3. Matrix vs Any Vector

An n×m matrix can also be combined with an n×1 column vector:

A * np.array([1, 10, 100]).reshape(-1, 1)
## array([[ 0.2,  0.6,  0.4,  0.4],
##        [ 0. ,  2. ,  4. ,  7. ],
##        [80. , 80. , 20. , 10. ]])

The above propagated the column vector over all columns (left to right).

Similarly, combining with a 1×m row vector:

A + np.array([1, 2, 3, 4]).reshape(1, -1)
## array([[1.2, 2.6, 3.4, 4.4],
##        [1. , 2.2, 3.4, 4.7],
##        [1.8, 2.8, 3.2, 4.1]])

recycles the row vector over all rows (top to bottom).

If one operand is a one-dimensional array or a list of length m, it will be treated as a row vector. For example:

np.round(A - np.mean(A, axis=0), 3)  # matrix - vector
## array([[-0.133,  0.067,  0.067, -0.   ],
##        [-0.333, -0.333,  0.067,  0.3  ],
##        [ 0.467,  0.267, -0.133, -0.3  ]])

On a side note, this is an instance of centring of each column. An explicit .reshape(1, -1) was not necessary.

Mathematically, although it is not necessarily a standard notation, we will allow adding and subtracting row vectors from matrices of compatible sizes:

\[\begin{split} \mathbf{X}+\mathbf{t}= \mathbf{X}+[ t_1\ t_2\ \cdots\ t_m ]= \left[ \begin{array}{cccc} x_{1,1} + t_1 & x_{1,2} + t_2 & \dots & x_{1,m} + t_m \\ x_{2,1} + t_1 & x_{2,2} + t_2 & \dots & x_{2,m} + t_m \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} + t_1 & x_{n,2} + t_2 & \dots & x_{n,m} + t_m \\ \end{array} \right]. \end{split}\]

This corresponds to shifting (translating) every row in the matrix.

Exercise 8.3

In the nhanes_adult_female_bmx_2020 dataset, standardise, normalise, and min-max scale every column. In every case, a single line of code is sufficient.

8.1.3.4. Row Vector vs Column Vector (*)

A row vector combined with a column vector results in an operation’s being performed on each combination of all pairs of elements in the two arrays (i.e., the cross-product; not just the corresponding pairs).

np.arange(1, 8).reshape(1, -1) * np.array([1, 10, 100]).reshape(-1, 1)
## array([[  1,   2,   3,   4,   5,   6,   7],
##        [ 10,  20,  30,  40,  50,  60,  70],
##        [100, 200, 300, 400, 500, 600, 700]])
Exercise 8.4

Check out that numpy.nonzero relies on similar shape broadcasting rules as the binary operators we discussed here, but not with respect to all three arguments.

Example 8.5

(*) Himmelblau’s function in Figure 8.1 is only defined by means of arithmetic operators, which all accept the kind of shape broadcasting that we discuss in this section. Consequently, calling numpy.meshgrid in that example to evaluate \(f\) on a grid of points was not really necessary:

x = np.linspace(-5, 5, 3)
y = np.linspace(-4, 4, 5)
xg = x.reshape(1, -1)
yg = y.reshape(-1, 1)
(xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
## array([[116., 306., 296.],
##        [208., 178., 148.],
##        [340., 170., 200.],
##        [320.,  90., 260.],
##        [340., 130., 520.]])

See also the sparse parameter in numpy.meshgrid and Figure 12.9 where this function turns out useful after all.

8.1.4. Other Row and Column Transforms (*)

Some functions that we discussed in the previous part of this course are equipped with the axis argument, which allows to process each row or column independently, for example:

np.sort(A, axis=1)
## array([[0.2, 0.4, 0.4, 0.6],
##        [0. , 0.2, 0.4, 0.7],
##        [0.1, 0.2, 0.8, 0.8]])

sorts every row (separately). Moreover:

scipy.stats.rankdata(A, axis=0)
## array([[2. , 2. , 2.5, 2. ],
##        [1. , 1. , 2.5, 3. ],
##        [3. , 3. , 1. , 1. ]])

computes the ranks of elements in each column.

Some functions have the default argument axis=-1, which means that they are applied along the last (i.e., columns in the matrix case) axis:

np.diff(A)  # axis=1 here
## array([[ 0.4, -0.2,  0. ],
##        [ 0.2,  0.2,  0.3],
##        [ 0. , -0.6, -0.1]])

Still, the aforementioned numpy.mean is amongst the many exceptions to this rule.

Compare the above with:

np.diff(A, axis=0)
## array([[-0.2, -0.4,  0. ,  0.3],
##        [ 0.8,  0.6, -0.2, -0.6]])

which gives the iterated differences for each column separately (along the rows).

If a function (built-in or custom) in not equipped with the axis argument and – instead – it was designed to work with individual vectors, we can propagate it over all the rows or columns by calling numpy.apply_along_axis.

For instance, here is another (did you solve the suggested exercise?) way to compute the column z-scores:

def standardise(x):
    return (x-np.mean(x))/np.std(x)

np.round(np.apply_along_axis(standardise, 0, A), 2)
## array([[-0.39,  0.27,  0.71, -0.  ],
##        [-0.98, -1.34,  0.71,  1.22],
##        [ 1.37,  1.07, -1.41, -1.22]])

But of course we prefer (x - np.mean(x, axis=0))/np.std(x, axis=0).

Note

(*) Matrices are iterable (in the sense of Section 3.4), but in an interesting way. Namely, an iterator traverses through each row in a matrix. Writing:

r1, r2, r3 = A  # A has 3 rows

creates three variables, each representing a separate row in A, the second of which is:

r2
## array([0. , 0.2, 0.4, 0.7])

8.2. Indexing Matrices

Recall that for unidimensional arrays, we have four possible choices of indexers (i.e., where performing filtering like x[i]):

  • scalar (extracts a single element),

  • slice (selects a regular subsequence, e.g., every 2nd element or the first 6 items; returns a view on existing data – it does not make an independent copy of the subsetted elements),

  • integer vector (selects elements at given indexes),

  • logical vector (selects elements that correspond to True in the indexer).

Matrices are two-dimensional arrays. Subsetting thereof will require two indexes. We write A[i, j] to select rows given by i and columns given by j. Both i and j can be one of the four above types, so we have at least 10 different cases to consider (skipping the symmetric ones).

Important

Generally:

  • each scalar index reduces the dimensionality of the subsetted object by 1;

  • slice-slice and slice-scalar indexing returns a view on the existing array, so we need to be careful when modifying the resulting object;

  • usually, indexing returns a submatrix (subblock), which is a combination of elements at given rows and columns;

  • indexing with two integer or logical vectors at the same time should be avoided.

Let us look at all the possible scenarios in greater detail.

8.2.1. Slice-Based Indexing

Our favourite example matrix again:

A = np.array([
    [0.2, 0.6, 0.4, 0.4],
    [0.0, 0.2, 0.4, 0.7],
    [0.8, 0.8, 0.2, 0.1]
])

Indexing based on two slices selects a submatrix:

A[::2, 3:]  # every second row, skip the first three columns
## array([[0.4],
##        [0.1]])

An empty slice selects all elements on the corresponding axis:

A[:, ::-1]  # all rows, reversed columns
## array([[0.4, 0.4, 0.6, 0.2],
##        [0.7, 0.4, 0.2, 0. ],
##        [0.1, 0.2, 0.8, 0.8]])

Let us stress that the result is always a matrix.

8.2.2. Scalar-Based Indexing

Indexing by a scalar selects a given row or column, reducing the dimensionality of the output object.

A[:, 3]
## array([0.4, 0.7, 0.1])

selects the 4th column and gives a flat vector (we can always use the reshape method to convert the resulting object back to a matrix).

Furthermore:

A[0, -1]
## 0.4

yields the element (scalar) in the first row and the last column.

8.2.3. Mixed Boolean/Integer Vector and Scalar/Slice Indexers

A logical and integer vector-like object can also be used for element selection. If the other indexer is a slice or a scalar, the result is quite predictable, for instance:

A[ [0, -1, 0], ::-1 ]
## array([[0.4, 0.4, 0.6, 0.2],
##        [0.1, 0.2, 0.8, 0.8],
##        [0.4, 0.4, 0.6, 0.2]])

selects the first, the last, and the first row again and reverses the order of columns.

A[ A[:, 0] > 0.1, : ]
## array([[0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

selects the rows from A where the values in the first column of A are greater than 0.1.

A[np.mean(A, axis=1) > 0.35, : ]
## array([[0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

selects the rows whose mean is greater than 0.35.

A[np.argsort(A[:, 0]), : ]
## array([[0. , 0.2, 0.4, 0.7],
##        [0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

orders the matrix with respect to the values in the first column (all rows permuted in the same way, together).

Exercise 8.6

In the nhanes_adult_female_bmx_2020 dataset, select all the participants whose heights are within their mean ± 2 standard deviations.

8.2.4. Two Vectors as Indexers (*)

With two vectors (logical or integer) things are a tad more horrible, as in this case not only some form of shape broadcasting comes into play but also all the headache-inducing exceptions listed in the perhaps not the most clearly written Advanced Indexing section of the numpy manual. Cheer up, though: things in pandas are much worse; see Section 10.5.

For the sake of our maintaining sanity, in practice, it is best to be extra careful when using two vector indexers and stick only to the scenarios discussed below.

For two flat integer indexers, we pick elementwisely:

A[ [0, -1, 0, 2, 0], [1, 2, 0, 2, 1] ]
## array([0.6, 0.2, 0.2, 0.2, 0.6])

yields A[0, 1], A[-1, 2], A[0, 0], A[2, 2], and A[0, 1].

To select a submatrix using integer indexes, it is best to make sure that the first indexer is a column vector, and the second one is a row vector (or some objects like these, e.g., compatible lists of lists).

A[ [[0], [-1]], [[1, 3]] ]  # column vector-like list, row vector-like list
## array([[0.6, 0.4],
##        [0.8, 0.1]])

Further, if indexing involves logical vectors, it is best to convert them to integer ones first (e.g., by calling numpy.flatnonzero).

A[ np.flatnonzero(np.mean(A, axis=1) > 0.35).reshape(-1, 1), [[0, 2, 3, 0]] ]
## array([[0.2, 0.4, 0.4, 0.2],
##        [0.8, 0.2, 0.1, 0.8]])

The necessary reshaping can be done automatically with the numpy.ix_ function:

A[ np.ix_( np.mean(A, axis=1) > 0.35, [0, 2, 3, 0] ) ]  # np.ix_(rows, cols)
## array([[0.2, 0.4, 0.4, 0.2],
##        [0.8, 0.2, 0.1, 0.8]])

Alternatively, we can always apply indexing twice instead:

A[np.mean(A, axis=1) > 0.45, :][:, [0, 2, 3, 0]]
## array([[0.8, 0.2, 0.1, 0.8]])

This is only a mild inconvenience. We will be forced to apply such double indexing anyway in pandas whenever selecting rows by position and columns by name is required; see Section 10.5.

8.2.5. Views on Existing Arrays (*)

Only the indexing involving two slices or a slice and a scalar returns a view on an existing array.

For example:

B = A[:, ::2]
B
## array([[0.2, 0.4],
##        [0. , 0.4],
##        [0.8, 0.2]])

Now B and A share memory. By modifying B in place, e.g.:

B *= -1

the changes will be visible in A as well:

A
## array([[-0.2,  0.6, -0.4,  0.4],
##        [-0. ,  0.2, -0.4,  0.7],
##        [-0.8,  0.8, -0.2,  0.1]])

This is time and memory efficient, but might lead to some unexpected results if we are being rather absent-minded. We have been warned.

8.2.6. Adding and Modifying Rows and Columns

With slice/scalar-based indexers, rows/columns/individual elements can be replaced by new content in a natural way:

A[:, 0] = A[:, 0]**2

With numpy arrays, however, brand new rows or columns cannot be added via the index operator. Instead, the whole array needs to be created from scratch using, e.g., one of the functions discussed in Section 7.1.4. For example:

A = np.column_stack((A, np.sqrt(A[:, 0])))
A
## array([[ 0.04,  0.6 , -0.4 ,  0.4 ,  0.2 ],
##        [ 0.  ,  0.2 , -0.4 ,  0.7 ,  0.  ],
##        [ 0.64,  0.8 , -0.2 ,  0.1 ,  0.8 ]])

8.3. Matrix Multiplication, Dot Products, and the Euclidean Norm

Matrix algebra is at the core of all the methods used in data analysis with the matrix multiply being the most fundamental operation therein (e.g., [18, 34]).

Given \(\mathbf{A}\in\mathbb{R}^{n\times p}\) and \(\mathbf{B}\in\mathbb{R}^{p\times m}\), their multiply is a matrix \(\mathbf{C}=\mathbf{A}\mathbf{B}\in\mathbb{R}^{n\times m}\) such that \(c_{i,j}\) is the sum of the \(i\)-th row in \(\mathbf{A}\) and the \(j\)-th column in \(\mathbf{B}\) multiplied elementwisely:

\[ c_{i,j} = a_{i, 1} b_{1, j} + a_{i, 2} b_{2, j} + \dots + a_{i, p} b_{p, j} = \sum_{k=1}^p a_{i,k} b_{k, j}, \]

for \(i=1,\dots,n\) and \(j=1,\dots,m\).

For example:

A = np.array([
    [1, 0, 1],
    [2, 2, 1],
    [3, 2, 0],
    [1, 2, 3],
    [0, 0, 1],
])
B = np.array([
    [1, 0, 0, 0],
    [0, 4, 1, 3],
    [2, 0, 3, 1],
])

And now:

C = A @ B  # or: A.dot(B)
C
## array([[ 3,  0,  3,  1],
##        [ 4,  8,  5,  7],
##        [ 3,  8,  2,  6],
##        [ 7,  8, 11,  9],
##        [ 2,  0,  3,  1]])

Mathematically, we can write the above as:

\[\begin{split} \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 2 & 2 & 1 \\ 3 & 2 & 0 \\ \mathbf{1} & \mathbf{2} & \mathbf{3} \\ 0 & 0 & 1 \\ \end{array} \right] \, \left[ \begin{array}{cccc} 1 & 0 & \mathbf{0} & 0 \\ 0 & 4 & \mathbf{1} & 3 \\ 2 & 0 & \mathbf{3} & 1 \\ \end{array} \right] = \left[ \begin{array}{cccc} 3 & 0 & 3 & 1 \\ 4 & 8 & 5 & 7 \\ 3 & 8 & 2 & 6 \\ 7 & 8 & \mathbf{11} & 9 \\ 2 & 0 & 3 & 1 \\ \end{array} \right]. \end{split}\]

For example, the element in the 4th row and 3rd column, \(c_{4,3}\) takes the 4th row in the left matrix \(\mathbf{a}_{4,\cdot}=[1\ 2\ 3]\) and the 3rd column in the right matrix \(\mathbf{b}_{\cdot,3}=[0\ 1\ 3]^T\) (they are marked in bold), multiplies the corresponding elements and computes their sum, i.e., \(c_{4,3}=1\cdot 0 + 2\cdot 1 + 3\cdot 3 = 11\).

Important

Matrix multiplication can only be performed on two matrices of compatible sizes – the number of columns in the left matrix must match the number of rows in the right operand.

Another example:

A = np.array([
    [1, 2],
    [3, 4]
])
I = np.array([  # np.eye(2)
    [1, 0],
    [0, 1]
])
A @ I  # or A.dot(I)
## array([[1, 2],
##        [3, 4]])

We matrix-multiplied \(\mathbf{A}\) by the identity matrix \(\mathbf{I}\), which is the neutral element of the said operation. This is why the result is identical to \(\mathbf{A}\).

Important

In most textbooks, just like in this one, \(\mathbf{A}\mathbf{B}\) always denotes the matrix multiplication. This is a very different operation from the elementwise multiplication.

Compare the above to:

A * I  # elementwise multiplication
## array([[1, 0],
##        [0, 4]])
Exercise 8.7

(*) Show that \((\mathbf{A}\mathbf{B})^T=\mathbf{B}^T \mathbf{A}^T\). Also notice that, typically, matrix multiplication is not commutative.

Note

By definition, matrix multiplication gives a convenient means for denoting sums of products of corresponding elements in many pairs of vectors, which we refer to as dot products.

Given two vectors \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^p\), their dot (or scalar) product is given by:

\[ \boldsymbol{x} \cdot \boldsymbol{y} = \sum_{i=1}^p x_i y_i. \]

In matrix multiplication terms, if \(\mathbf{x}\) is a row vector and \(\mathbf{y}^T\) is a column vector, then the above can be written as \(\mathbf{x} \mathbf{y}^T\). The result is a single number.

In particular, a dot product of a vector and itself:

\[ \boldsymbol{x} \cdot \boldsymbol{x} = \sum_{i=1}^p x_i^2, \]

is the square of the Euclidean norm of \(\boldsymbol{x}\), which – as we said in Section 5.3.2.3 – is used to measure the magnitude of a vector:

\[ \|\boldsymbol{x}\| = \sqrt{\sum_{i=1}^p x_i^2} =\sqrt{\boldsymbol{x} \cdot \boldsymbol{x}} =\sqrt{\mathbf{x} \mathbf{x}^T}. \]

It is worth pointing out that the Euclidean norm fulfils (amongst others) the condition that \(\|\boldsymbol{x}\|=0\) if and only if \(\boldsymbol{x}=\mathbf{0}=(0,0,\dots,0)\). The same of course holds for its square.

Exercise 8.8

Show that \(\mathbf{A}^T \mathbf{A}\) gives the matrix that consists of the dot products of all the pairs of columns in \(\mathbf{A}\) and \(\mathbf{A} \mathbf{A}^T\) stores the dot products of all the pairs of rows.

In Section 9.3.2, we will see that matrix multiplication can be used as a way to express certain geometrical transformations of points in a dataset, e.g., scaling and rotating.

Also, in Section 9.3.3, we briefly discuss the concept of the inverse of a matrix and in Section 9.3.4, we introduce its singular value decomposition.

8.5. Exercises

Exercise 8.10

Does numpy.mean(A, axis=0) compute rowwise or columnwise means?

Exercise 8.11

How does shape broadcasting work? List the most common pairs of shape cases when performing arithmetic operations like addition or multiplication.

Exercise 8.12

What are the possible matrix indexing schemes and how do they behave?

Exercise 8.13

Which kinds of matrix indexers return a view on an existing array?

Exercise 8.14

(*) How can we select a submatrix comprised of the first and the last row and the first and the last column?

Exercise 8.15

Why appropriate data preprocessing is required when computing the Euclidean distance between points?

Exercise 8.16

What is the relationship between the dot product, the Euclidean norm, and the Euclidean distance?

Exercise 8.17

What is a centroid? How is it defined by means of the Euclidean distance between the points in a dataset?

Exercise 8.18

What is the difference between the fixed-radius and few nearest-neighbours search?

Exercise 8.19

(*) When K-d trees or other spatial search data structures might be better than a brute-force search with scipy.spatial.distance.cdist?


1

This is not the same as matrix-multiply by itself which we cover in Section 8.3.

2

There are many possible distances, allowing to measure the similarity of points not only in \(\mathbb{R}^m\), but also character strings (e.g., the Levenshtein metric), ratings (e.g., cosine dissimilarity), etc.; there is even an encyclopedia of distances [21].

3

In our context, we should prefer referring to them as m-d trees, but let us stick with the traditional name.