8. Processing Multidimensional Data

The online version of the open-access textbook Minimalist Data Wrangling with Python by Marek Gagolewski is, and will remain, freely available for everyone’s enjoyment (also in PDF). Any bug/typos reports/fixes are appreciated. Although available online, this is a whole course; it should be read from the beginning to the end. In particular, refer to the Preface for general introductory remarks.

8.1. From Vectors to Matrices

Let us study how the vector operations that we discussed in, amongst others, Chapter 5 can be extended to matrices. In many cases, we will end up applying the same transform either on every matrix element separately, or on each row or column. They are all a brilliant example of the write less, do more principle in practice.

8.1.1. Vectorised Mathematical Functions

Applying vectorised functions such as numpy.round, numpy.log, and numpy.exp returns an array of the same shape, with all elements transformed accordingly.

A = np.array([
    [0.2, 0.6, 0.4, 0.4],
    [0.0, 0.2, 0.4, 0.7],
    [0.8, 0.8, 0.2, 0.1]
])  # example matrix that we will be using below

For example:

np.square(A)
## array([[0.04, 0.36, 0.16, 0.16],
##        [0.  , 0.04, 0.16, 0.49],
##        [0.64, 0.64, 0.04, 0.01]])

takes the square of every element.

More generally, we will be denoting such operations with:

\[\begin{split} f(\mathbf{X})= \left[ \begin{array}{cccc} f(x_{1,1}) & f(x_{1,2}) & \cdots & f(x_{1,m}) \\ f(x_{2,1}) & f(x_{2,2}) & \cdots & f(x_{2,m}) \\ \vdots & \vdots & \ddots & \vdots \\ f(x_{n,1}) & f(x_{n,2}) & \cdots & f(x_{n,m}) \\ \end{array} \right]. \end{split}\]

8.1.2. Componentwise Aggregation

Unidimensional aggregation functions (e.g., numpy.mean, numpy.quantile) can be applied to summarise:

  • data in each row (axis=1),

  • data in each column (axis=0),

as well as:

  • all data into a single number (axis=None, being the default),

Here are the examples corresponding to the above cases:

np.mean(A)
## 0.39999999999999997
np.mean(A, axis=1)
## array([0.4  , 0.325, 0.475])
np.mean(A, axis=0)
## array([0.33333333, 0.53333333, 0.33333333, 0.4       ])

Important

Let us repeat, axis=1 does not mean that we get the column means (even though columns constitute the 2nd axis, and we count starting at 0). It denotes the axis along which the matrix is sliced. Sadly, even yours truly sometimes does not get it right.

Exercise 8.1

Given the nhanes_adult_female_bmx_2020 dataset, compute the mean, standard deviation, minimum, and maximum of each body measurement.

We will get back to the topic of the aggregation of multidimensional data in Section 8.4.2 and Section 8.4.3.

8.1.3. Arithmetic, Logical, and Comparison Operations

Recall that for vectors, binary operators such as `+`, `*`, `==`, `<=`, and `&` and similar elementwise functions (e.g., numpy.minimum) can be applied if both inputs are of the same length, for example:

np.array([1, 10, 100, 1000]) * np.array([7, -6, 2, 8])
## array([   7,  -60,  200, 8000])

Alternatively, one input can be a scalar:

np.array([1, 10, 100, 1000]) * -3
## array([   -3,   -30,  -300, -3000])

More generally, a set of rules referred in the numpy manual to as broadcasting describes how this package handles arrays of different shapes.

Important

Generally, for two matrices, their column/row numbers much match or be equal to 1. Also, if one operand is a 1-dimensional array, it will be promoted to a row vector.

Let us explore all the possible cases.

8.1.3.1. Matrix vs Scalar

If one operand is a scalar, then it is going to be propagated over all matrix elements, for example:

(-1)*A
## array([[-0.2, -0.6, -0.4, -0.4],
##        [-0. , -0.2, -0.4, -0.7],
##        [-0.8, -0.8, -0.2, -0.1]])

changes the sign of every element, which is, mathematically, an instance of multiplying a matrix \(\mathbf{X}\) by a scalar \(c\)

\[\begin{split} c\mathbf{X}= \left[ \begin{array}{cccc} cx_{1,1} & cx_{1,2} & \cdots & cx_{1,m} \\ cx_{2,1} & cx_{2,2} & \cdots & cx_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ cx_{n,1} & cx_{n,2} & \cdots & cx_{n,m} \\ \end{array} \right]. \end{split}\]

Furthermore:

A**2
## array([[0.04, 0.36, 0.16, 0.16],
##        [0.  , 0.04, 0.16, 0.49],
##        [0.64, 0.64, 0.04, 0.01]])

takes the square of each element (this is not the same as matrix-multiply by itself which we cover in the matrix algebra section).

Also:

A >= 0.25
## array([[False,  True,  True,  True],
##        [False, False,  True,  True],
##        [ True,  True, False, False]])

compares each element to 0.25.

8.1.3.2. Matrix vs Matrix

For two matrices of identical sizes, we will be acting on the corresponding elements:

B = np.tri(A.shape[0], A.shape[1])  # just an example
B  # a lower triangular 0-1 matrix
## array([[1., 0., 0., 0.],
##        [1., 1., 0., 0.],
##        [1., 1., 1., 0.]])

And now:

A * B
## array([[0.2, 0. , 0. , 0. ],
##        [0. , 0.2, 0. , 0. ],
##        [0.8, 0.8, 0.2, 0. ]])

multiplies each \(a_{i,j}\) by the corresponding \(b_{i,j}\).

This extends on the idea from algebra that given \(\mathbf{A}\), \(\mathbf{B}\) with n rows and m columns each, the result of \(+\) (or \(-\)) would be for instance:

\[\begin{split} \mathbf{A} + \mathbf{B}= \left[ \begin{array}{cccc} a_{1,1}+b_{1,1} & a_{1,2}+b_{1,2} & \cdots & a_{1,m}+b_{1,m} \\ a_{2,1}+b_{2,1} & a_{2,2}+b_{1,2} & \cdots & a_{2,m}+b_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1}+b_{n,1} & a_{n,2}+b_{n,2} & \cdots & a_{n,m}+b_{n,m} \\ \end{array} \right]. \end{split}\]

Thanks to the matrix-matrix and matrix-scalar operations we can perform various tests on an per-element basis, e.g.,

(A >= 0.25) & (A <= 0.75)  # logical matrix & logical matrix
## array([[False,  True,  True,  True],
##        [False, False,  True,  True],
##        [False, False, False, False]])
Example 8.2

(*) Figure 8.1 depicts a (filled) contour plot of the Himmelblau’s function, \(f(x,y)=(x^{2}+y-11)^{2}+(x+y^{2}-7)^{2}\), for \(x\in[-5, 5]\) and \(y\in[-4, 4]\). In order to draw it, we have probed 250 points from the two said ranges and called numpy.meshgrid to generate two matrices, both of shape 250 by 250, giving the x- and y-coordinates of all the points on the corresponding a two-dimensional grid. Thanks to this, we were able to use vectorised mathematical operations to compute the values of \(f\) thereon.

x = np.linspace(-5, 5, 250)
y = np.linspace(-4, 4, 250)
xg, yg = np.meshgrid(x, y)
z = (xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
plt.contourf(x, y, z, levels=20)
CS = plt.contour(x, y, z, levels=[1, 5, 10, 20, 50, 100, 150, 200, 250])
plt.clabel(CS, colors="black")
plt.show()
../_images/ex-contourf-1.png

Figure 8.1 An example filled contour plot with additional labelled contour lines

In order to understand the result generated by numpy.meshgrid better here are the outputs generated by it for a smaller number of probe points:

x = np.linspace(-5, 5, 3)
y = np.linspace(-4, 4, 5)
xg, yg = np.meshgrid(x, y)
xg
## array([[-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.],
##        [-5.,  0.,  5.]])

Here, each column is the same.

yg
## array([[-4., -4., -4.],
##        [-2., -2., -2.],
##        [ 0.,  0.,  0.],
##        [ 2.,  2.,  2.],
##        [ 4.,  4.,  4.]])

In this case, each row is identical. Thanks to this, calling:

(xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
## array([[116., 306., 296.],
##        [208., 178., 148.],
##        [340., 170., 200.],
##        [320.,  90., 260.],
##        [340., 130., 520.]])

gives a matrix \(\mathbf{Z}\) such that \(z_{i,j}\) is generated by considering the \(i\)-th element in y and the \(j\)-th item in x, which is exactly what we have desired.

8.1.3.3. Matrix vs Any Vector

An n×m matrix can also be combined with an n×1 column vector:

A * np.array([1, 10, 100]).reshape(-1, 1)
## array([[ 0.2,  0.6,  0.4,  0.4],
##        [ 0. ,  2. ,  4. ,  7. ],
##        [80. , 80. , 20. , 10. ]])

The above propagated the column vector over all columns (left to right).

Similarly, combining with an 1×m row vector:

A + np.array([1, 2, 3, 4]).reshape(1, -1)
## array([[1.2, 2.6, 3.4, 4.4],
##        [1. , 2.2, 3.4, 4.7],
##        [1.8, 2.8, 3.2, 4.1]])

recycles the row vector over all rows (top to bottom).

If one operand is a 1-dimensional array or a list of length m, it will be treated as a row vector.

np.round(A - np.mean(A, axis=0), 3)
## array([[-0.133,  0.067,  0.067, -0.   ],
##        [-0.333, -0.333,  0.067,  0.3  ],
##        [ 0.467,  0.267, -0.133, -0.3  ]])

This resulted in each column mean’s being 0. An explicit .reshape(1, -1) was not necessary.

Mathematically, although it is not necessarily a standard notation, we will allow adding and subtracting row vectors from matrices of compatible sizes:

\[\begin{split} \mathbf{X}+\mathbf{t}= \left[ \begin{array}{cccc} x_{1,1} + t_1 & x_{1,2} + t_2 & \dots & x_{1,m} + t_m \\ x_{2,1} + t_1 & x_{2,2} + t_2 & \dots & x_{2,m} + t_m \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} + t_1 & x_{n,2} + t_2 & \dots & x_{n,m} + t_m \\ \end{array} \right]. \end{split}\]

This corresponds to shifting (translating) every row in the matrix.

However, subtracting the row means already requires some extra labour:

A - np.mean(A, axis=1).reshape(-1, 1)
## array([[-0.2  ,  0.2  ,  0.   ,  0.   ],
##        [-0.325, -0.125,  0.075,  0.375],
##        [ 0.325,  0.325, -0.275, -0.375]])

The reader is encouraged to verify that A - np.mean(A, axis=1) would raise an exception.

Exercise 8.3

Standardise, normalise, and min-max scale each column in the nhanes_adult_female_bmx_2020.csv dataset using a single line of code.

8.1.3.4. Row Vector vs Column Vector (*)

As a bonus, let us quickly mention that a row vector combined with a column vector results in an operation’s being performed on each combination of all pairs of elements in the two arrays (i.e., the cross-product; not just the corresponding pairs).

np.arange(1, 8).reshape(1, -1) * np.array([1, 10, 100]).reshape(-1, 1)
## array([[  1,   2,   3,   4,   5,   6,   7],
##        [ 10,  20,  30,  40,  50,  60,  70],
##        [100, 200, 300, 400, 500, 600, 700]])
Exercise 8.4

Check out that numpy.nonzero relies on similar shape broadcasting rules as the binary operators we have discussed here, however, not with regard to all the 3 arguments.

Example 8.5

Himmelblau’s function in Figure 8.1 is only defined by means of arithmetic operators, which all accept the kind of shape broadcasting that we discuss in this section. Therefore, calling numpy.meshgrid in that example was actually not necessary in order to evaluate \(f\) on a grid of points:

x = np.linspace(-5, 5, 3)
y = np.linspace(-4, 4, 5)
xg = x.reshape(1, -1)
yg = y.reshape(-1, 1)
(xg**2 + yg - 11)**2 + (xg + yg**2 - 7)**2
## array([[116., 306., 296.],
##        [208., 178., 148.],
##        [340., 170., 200.],
##        [320.,  90., 260.],
##        [340., 130., 520.]])

See also the sparse parameter in numpy.meshgrid and Figure 12.9 where this function turns out useful after all.

8.1.4. Other Row and Column Transforms (*)

Some functions that we have already discussed in the previous part of this course are equipped with the axis argument, which allows to process each row or column independently, for example:

np.sort(A, axis=1)
## array([[0.2, 0.4, 0.4, 0.6],
##        [0. , 0.2, 0.4, 0.7],
##        [0.1, 0.2, 0.8, 0.8]])

sorts every row (separately). Moreover:

scipy.stats.rankdata(A, axis=0)
## array([[2. , 2. , 2.5, 2. ],
##        [1. , 1. , 2.5, 3. ],
##        [3. , 3. , 1. , 1. ]])

computes the ranks of elements in each column.

A few functions have the default argument axis=-1, which means that they are applied along the last (i.e., columns in the matrix case) axis:

np.diff(A)  # axis=1 here
## array([[ 0.4, -0.2,  0. ],
##        [ 0.2,  0.2,  0.3],
##        [ 0. , -0.6, -0.1]])

However, the aforementioned numpy.mean is amongst the many exceptions to this rule.

Compare the above with:

np.diff(A, axis=0)
## array([[-0.2, -0.4,  0. ,  0.3],
##        [ 0.8,  0.6, -0.2, -0.6]])

which gives the iterated differences for each column separately (along the rows).

If a function (built-in or custom) in not equipped with the axis argument and – instead – it was designed to work with individual vectors, we can propagate it over all the rows or columns by calling numpy.apply_along_axis.

For instance, here is another (have we solved the suggested exercise?) way to compute the column z-scores:

def standardise(x):
    return (x-np.mean(x))/np.std(x)

np.round(np.apply_along_axis(standardise, 0, A), 2)
## array([[-0.39,  0.27,  0.71, -0.  ],
##        [-0.98, -1.34,  0.71,  1.22],
##        [ 1.37,  1.07, -1.41, -1.22]])

Note

(*) Matrices are of course iterable (in the sense of Section 3.4), but in an interesting way. Namely, an iterator traverses through each row in a matrix. Therefore, for example, writing

r1, r2, r3 = A  # A has 3 rows

creates three variables, each representing a separate row in A, second of which being

r2
## array([0. , 0.2, 0.4, 0.7])

8.2. Indexing Matrices

Recall that for 1-dimensional arrays we have four possible choices of indexers (i.e., where performing filtering like x[i]):

  • scalar (extracts a single element),

  • slice (selects a regular subsequence, e.g., every 2nd element or the first 6 items; returns a view on existing data – it does not make an independent copy of the subsetted elements),

  • integer vector (selects elements at given indices),

  • logical vector (select elements that correspond to True in the indexer).

Matrices are two-dimensional arrays, and hence subsetting thereof will require two indexes: we will thus be writing A[i, j] to select rows given by i and columns given by j.

Both i and j can be one of the four above types, so we at least 10 different cases to consider (skipping the symmetric ones).

Important

Generally:

  • each scalar indexer reduces the dimensionality of the subsetted object by 1;

  • slice-slice and slice-scalar indexing returns a view on the existing array, so we need to be careful when modifying the resulting object;

  • usually, indexing returns a submatrix (subblock), which is a combination of elements at given rows and columns, but two flat-vector indexers are vectorised elementwisely instead;

  • indexing with two integer or logical vectors at the same time should be avoided.

Let us look at all the possible scenarios in greater detail.

8.2.1. Slice-Based Indexing

Our favourite example matrix again:

A = np.array([
    [0.2, 0.6, 0.4, 0.4],
    [0.0, 0.2, 0.4, 0.7],
    [0.8, 0.8, 0.2, 0.1]
])

Indexing based on two slices selects a submatrix:

A[::2, 3:]
## array([[0.4],
##        [0.1]])

gives every second row and skips first three columns. Let us stress that the result is still a matrix.

An empty slice selects all elements on the corresponding axis, therefore:

A[:, ::-1]  # all rows, reversed columns
## array([[0.4, 0.4, 0.6, 0.2],
##        [0.7, 0.4, 0.2, 0. ],
##        [0.1, 0.2, 0.8, 0.8]])

simply reverses the order of columns.

8.2.2. Scalar-Based Indexing

Indexing by a scalar selects a given row or column, reducing the dimensionality of the output object.

A[1, :]
## array([0. , 0.2, 0.4, 0.7])

selects the 2nd row and gives a flat vector.

A[0, -1]
## 0.4

yields the element in the first row and last column.

However, we can always use the reshape method to convert the resulting object back to a matrix.

8.2.3. Mixed Boolean/Integer Vector and Scalar/Slice Indexers

A logical and integer vector-like object can also be used for element selection. If the other indexer is a slice or a scalar, the result is quite predictable, for instance:

A[ [0, -1, 0], ::-1 ]
## array([[0.4, 0.4, 0.6, 0.2],
##        [0.1, 0.2, 0.8, 0.8],
##        [0.4, 0.4, 0.6, 0.2]])

selects the first, the last, and the first row again and reverses the order of columns.

A[ A[:, 0] > 0.1, : ]
## array([[0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

selects rows such that in the first column the values are greater than 0.1.

A[np.mean(A, axis=1) > 0.35, : ]
## array([[0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

selects the rows whose mean is greater than 0.35.

A[np.argsort(A[:, 0]), : ]
## array([[0. , 0.2, 0.4, 0.7],
##        [0.2, 0.6, 0.4, 0.4],
##        [0.8, 0.8, 0.2, 0.1]])

orders the matrix with respect to the values in the first row (all rows permuted in the same way, together).

Exercise 8.6

In the nhanes_adult_female_bmx_2020 dataset, select all the participants whose heights is within their mean ± 2 standard deviations.

8.2.4. Two Vectors as Indexers (*)

With two vectors (logical or integer) things are a tad more horrible, as in this case not only some form shape broadcasting comes into play but also all the headache-inducing exceptions listed in the perhaps not the most clearly written Advanced Indexing section of the numpy manual. Cheer up, however, things in pandas are much worse (see Section 10.5).

For the sake of our maintaining sanity, in practice it is best to stick only to the scenarios below and be extra careful when using two vector indexers.

For two flat integer indexers, we pick elementwisely:

A[ [0, -1, 0, 2, 0], [1, 2, 0, 2, 1] ]
## array([0.6, 0.2, 0.2, 0.2, 0.6])

yields A[0, 1], A[-1, 2], A[0, 0], A[2, 2], and A[0, 1].

In order to select a submatrix using integer indexes, it is best to make sure that the first indexer is a column vector, and the second is a row vector (or some object like the two said ones, e.g., compatible lists of lists). Further, if indexing involves logical vectors, it is best to convert them to integer ones first (e.g., by calling numpy.nonzero).

The above transformations can be done automatically via the numpy.ix_ function, which is always the safest choice:

A[ np.ix_(np.mean(A, axis=1) > 0.35, [0, 2, 3, 0]) ]
## array([[0.2, 0.4, 0.4, 0.2],
##        [0.8, 0.2, 0.1, 0.8]])

Alternatively, we can always apply indexing twice instead (which we will anyway be forced to do in pandas whenever selecting rows by number and columns by names is required).

A[np.mean(A, axis=1) > 0.45, :][:, [0, 2, 3, 0]]
## array([[0.8, 0.2, 0.1, 0.8]])

8.2.5. Views on Existing Arrays (*)

Only indexing involving two slices or a slice and a scalar returns a view on an existing array.

For example:

B = A[:, ::2]
B
## array([[0.2, 0.4],
##        [0. , 0.4],
##        [0.8, 0.2]])

Now B and A share memory. Therefore, by modifying B in-place, e.g.,

B *= -1

the changes will be visible in A as well:

A
## array([[-0.2,  0.6, -0.4,  0.4],
##        [-0. ,  0.2, -0.4,  0.7],
##        [-0.8,  0.8, -0.2,  0.1]])

This is time and memory efficient, but might lead to some unexpected results if we are not focused enough. We have been warned.

8.2.6. Adding and Modifying Rows and Columns

With slice/scalar-based indexers, rows/columns/individual elements can be replaced by new content in a natural way:

A[:, 0] = A[:, 0]**2

With numpy arrays, however, brand new rows or columns cannot be added using the index operator. Instead, the whole array needs to be created from scratch using, for example, one of the functions discussed in Section 7.1.4.

Here is an example where we add a new column to the A matrix being a function of the first column:

A = np.column_stack((A, np.sqrt(A[:, 0])))
A
## array([[ 0.04,  0.6 , -0.4 ,  0.4 ,  0.2 ],
##        [ 0.  ,  0.2 , -0.4 ,  0.7 ,  0.  ],
##        [ 0.64,  0.8 , -0.2 ,  0.1 ,  0.8 ]])

8.3. Matrix Multiplication, Dot Products, and the Euclidean Norm

Matrix algebra is at the core of most methods used in data analysis and matrix multiply is one of the most fundamental operations therein (e.g., [DFO20, Gen17]).

Given \(\mathbf{A}\in\mathbb{R}^{n\times p}\) and \(\mathbf{B}\in\mathbb{R}^{p\times m}\), their multiply is a matrix \(\mathbf{C}=\mathbf{A}\mathbf{B}\in\mathbb{R}^{n\times m}\) such that \(c_{i,j}\) is the sum of the \(i\)-th row in \(\mathbf{A}\) and the \(j\)-th column in \(\mathbf{B}\) multiplied elementwisely:

\[ c_{i,j} = a_{i, 1} b_{1, j} + a_{i, 2} b_{2, j} + \dots + a_{i, p} b_{p, j} = \sum_{k=1}^p a_{i,k} b_{k, j} \]

for \(i=1,\dots,n\) and \(j=1,\dots,m\).

For example:

A = np.array([
    [1, 0, 1],
    [2, 2, 1],
    [3, 2, 0],
    [1, 2, 3],
    [0, 0, 1],
])
B = np.array([
    [1, -1, 0, 0],
    [0,  4, 1, 3],
    [2,  0, 3, 1],
])
C = A @ B  # or: A.dot(B)
C
## array([[ 3, -1,  3,  1],
##        [ 4,  6,  5,  7],
##        [ 3,  5,  2,  6],
##        [ 7,  7, 11,  9],
##        [ 2,  0,  3,  1]])

Mathematically, we can write the above as:

\[\begin{split} \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 2 & 2 & 1 \\ 3 & 2 & 0 \\ \color{red}{1} & \color{red}{2} & \color{red}{3} \\ 0 & 0 & 1 \\ \end{array} \right] \, \left[ \begin{array}{cccc} 1 & -1 & \color{red}{0} & 0 \\ 0 & 4 & \color{red}{1} & 3 \\ 2 & 0 & \color{red}{3} & 1 \\ \end{array} \right] = \left[ \begin{array}{cccc} 3 & -1 & 3 & 1 \\ 4 & 6 & 5 & 7 \\ 3 & 5 & 2 & 6 \\ 7 & 7 & \color{red}{11} & 9 \\ 2 & 0 & 3 & 1 \\ \end{array} \right]. \end{split}\]

For example, the element in the 4th row and 3rd column, \(c_{4,3}\) takes the 4th row in the left matrix \(\mathbf{a}_{4,\cdot}=[1\, 2\, 3]\) and the 3rd column in the right matrix \(\mathbf{b}_{\cdot,3}=[0\, 1\, 3]^T\) (they are marked in red), multiplies the corresponding elements and computes their sum, i.e., \(c_{4,3}=1\cdot 0 + 2\cdot 1 + 3\cdot 3 = 11\).

Important

Matrix multiplication can only be performed on two matrices of compatible sizes – the number of columns in the left matrix must match the number of rows in the right operand.

Another example:

A = np.array([
    [1, 2],
    [3, 4]
])
I = np.array([
    [1, 0],
    [0, 1]
])
A @ I  # or A.dot(I)
## array([[1, 2],
##        [3, 4]])

We matrix-multiplied \(\mathbf{A}\) by the identity matrix \(\mathbf{I}\), which is a neutral element of the said operation, and hence the result is identical to \(\mathbf{A}\).

Important

In most textbooks, just like in this one, \(\mathbf{A}\mathbf{B}\) almost always denotes the matrix multiplication which is a very different operation to the elementwise multiplication.

Compare the above to:

A * I  # elementwise multiplication
## array([[1, 0],
##        [0, 4]])
Exercise 8.7

Show that \(\mathbf{A}^T \mathbf{A}\) gives the matrix that consists of the dot products of all the pairs of columns in \(\mathbf{A}\) and \(\mathbf{A} \mathbf{A}^T\) stores the dot products of all the pairs of rows.

Exercise 8.8

(*) Show that \((\mathbf{A}\mathbf{B})^T=\mathbf{B}^T \mathbf{A}^T\). Also notice that, typically, matrix multiplication is not commutative.

Note

Matrix multiplication gives a convenient means for denoting sums of products of corresponding elements in many pairs of vectors, which we refer to as dot products.

Given two vectors \(\boldsymbol{x}, \boldsymbol{y} \in \mathbb{R}^p\), their dot (or scalar) product is given by

\[ \boldsymbol{x} \cdot \boldsymbol{y} = \sum_{i=1}^p x_i y_i. \]

Thus, in matrix multiplication terms, if \(\mathbf{x}\) is a row vector and \(\mathbf{y}^T\) is a column vector, then the above can be written as \(\mathbf{x} \mathbf{y}^T\). The result is a single number.

In particular, a dot product of a vector and itself,

\[ \boldsymbol{x} \cdot \boldsymbol{x} = \sum_{i=1}^p x_i^2, \]

is the square of the Euclidean norm of \(\boldsymbol{x}\), which – as we have said in Section 5.3.2.3 – we use to measure the length of a vector,

\[ \|\boldsymbol{x}\| = \sqrt{\sum_{i=1}^p x_i^2} =\sqrt{\boldsymbol{x} \cdot \boldsymbol{x}} =\sqrt{\mathbf{x} \mathbf{x}^T}. \]

The Euclidean norm fulfils (amongst others) the condition that \(\|\boldsymbol{x}\|=0\) if and only if \(\boldsymbol{x}=\boldsymbol{0}=(0,0,\dots,0)\). The same of course holds for its square.

In Section 9.3.2 we will see that matrix multiplication can be used as a way to express certain geometrical transformations of points in a dataset, e.g., scaling and rotating.

Also, in Section 9.3.3 we briefly discuss the concept of the inverse of a matrix and in Section 9.3.4 its singular value decomposition.

8.5. Exercises

Exercise 8.10

Does numpy.mean(A, axis=0) compute rowwise or columnwise means?

Exercise 8.11

How does shape broadcasting work? List the most common pairs of shape cases when performing arithmetic operations.

Exercise 8.12

What are the possible matrix indexing schemes and how do they behave?

Exercise 8.13

Which kinds of matrix indexers return a view on an existing array?

Exercise 8.14

(*) How to select a submatrix comprised of the first and the last row and the first and the last column?

Exercise 8.15

Why appropriate data preprocessing is required when computing the Euclidean distance between points?

Exercise 8.16

What is the relationship between the dot product, the Euclidean norm, and the Euclidean distance?

Exercise 8.17

What is a centroid? How is it defined by means of the Euclidean distance between the points in a dataset?

Exercise 8.18

When K-d tress or other spatial search data structures might be better than a brute-force search with scipy.spatial.distance.cdist?