The Math Needed - Explained

Sigma Notation

In this section we'll take a look at sigma notation (often called summation notation) - something we'll see a lot of throughout this book. It's something you've seen before - whether you understand it or not. However, in this section I'll clear it all up.

The sigma operator, \(\sum\), is used to describe sums with multiple terms. It's that simple - it's a way of writing long sums in a concise way. To make this better understood, let's take a look at some Python code. The below snippet shouldn't be anything new to you:

total = 0
for n in range(1, 5):
    total += n**2

The above snippet of code loops over the integers 1 to 4 and sums up the square of each one. We can write this in sigma notation as follows: \[ \sum_{n=1}^{4} {n^{2}} \] The n=1 below the operator says we should start at 1. In sigma notation n is called the index. The 4 above the operator says that 4 is the last value of the index, so it is included in the sum. Note the difference from the range function: the second argument of range(1, 5) is exclusive, so the loop stops before reaching 5, which is why the last value it uses is also 4. The expression following the operator is the expression we substitute each index value into and evaluate, summing up the results.

The above summation expression can be written in long form as: \[ 1^2 + 2^2 + 3^2 + 4^2 \] We can have expressions with multiple variables too. Any variable other than the index, such as \(k\) below, is an unknown and remains as such: \[ \sum_{n=1}^{3} {\frac{k^2}{n + 1}} \] This expression can be written in long form as: \[ \begin{align*} \sum_{n=1}^{3} {\frac{k^2}{n + 1}} &= \frac{k^2}{1 + 1} + \frac{k^2}{2 + 1} + \frac{k^2}{3 + 1}\\ &= {\frac{k^2}{2}} + {\frac{k^2}{3}} + {\frac{k^2}{4}} \end{align*} \] It really is that simple and there's not much more to say about it. There will be some questions at the end of this chapter to help you practice a few times, just so you can be certain you understand it.
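As a quick check, both sums above can be evaluated in Python. This is only a minimal sketch: the helper name sigma_example and the choice of k = 2 are assumptions made for illustration, and Fraction from the standard library is used so the result stays an exact fraction.

```python
from fractions import Fraction

# Sum of n^2 for n = 1..4; range(1, 5) stops before 5.
squares_total = sum(n**2 for n in range(1, 5))


def sigma_example(k):
    """Evaluate the sum of k^2 / (n + 1) for n = 1..3.

    Hypothetical helper for illustration; k is assumed to be an integer.
    """
    return sum(Fraction(k**2, n + 1) for n in range(1, 4))


print(squares_total)     # 30
print(sigma_example(2))  # 13/3, i.e. 4/2 + 4/3 + 4/4
```

Until we pick a value for k, the sum stays in its symbolic form; the code can only evaluate it once k is known.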

Matrices

In this section we'll give a brief introduction to matrices and how to do some basic matrix math. We don't need to go into great detail; the basics are enough to get you through this book with a good understanding of what's going on.

Introduction to Matrices

A matrix (matrices for plural) is a rectangular arrangement of numbers into rows and columns. For example, the below matrix has 4 rows and 3 columns. \[ A = \begin{bmatrix} 3 & 8 & -4\\ 7 & -9 & 14\\ -3 & 1 & 11\\ 7 & 4 & 6\\ \end{bmatrix} \] The dimensions of a matrix tell us its size: the number of rows and the number of columns, in that order.

Since matrix A has four rows and three columns, we write its dimensions as 4 x 3, pronounced "four by three".

An element of the matrix is simply an entry in the matrix. Each element in the matrix is identified by naming the row and column in which it appears. For example, in the above matrix A, the element \(a_{2, 1}\) is the element in the second row, first column, which is 7. The element \(a_{4, 3}\) is the element in the fourth row, third column, which is 6.

These are represented in code as an array of arrays (multidimensional arrays) and they're something you've probably worked with before even if you didn't know it. We can represent the above matrix in Python as follows:

A = [[3, 8, -4],
     [7, -9, 14],
     [-3, 1, 11],
     [7, 4, 6]]

# OUTPUT: 6 - remember indexes start from 0 in code, rather than 1
# A is a list of lists, so we index the row first and then the column
print(A[3][2])

If we're ever translating a matrix index from the way it's done in maths, such as \(a_{4, 3}\), we must subtract 1 from each index when writing in code. So, from maths - \(a_{x, y}\) - we index as \(a_{x-1, y-1}\) in code, which for a list of lists is written A[x-1][y-1].
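The index translation described above can be sketched with two small helpers. The names dimensions and element are hypothetical and used only for illustration; the sketch assumes the matrix is a non-empty list of equal-length rows.

```python
A = [[3, 8, -4],
     [7, -9, 14],
     [-3, 1, 11],
     [7, 4, 6]]


def dimensions(matrix):
    """Return (rows, columns) for a non-empty list of equal-length rows."""
    return len(matrix), len(matrix[0])


def element(matrix, row, col):
    """Look up an element using 1-based maths-style indices."""
    return matrix[row - 1][col - 1]


print(dimensions(A))     # (4, 3)
print(element(A, 2, 1))  # 7
print(element(A, 4, 3))  # 6
```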

To generalise, we represent an \(m \times n\) matrix as: \[ A_{m,n} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} \]

Transpose of a Matrix

In this section we'll look at the property of matrices called the transpose. The transpose of a matrix is simply a flipped version of the original matrix. We can transpose a matrix by switching its rows with its columns.

We denote the transpose of a matrix \(A\) as \(A^T\). For example, let's take the below matrix: \[ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \] Its transpose would be: \[ A^T = \begin{bmatrix} 1 & 4\\ 2 & 5\\ 3 & 6\\ \end{bmatrix} \] If we take a look at the matrix from the previous section: \[ A = \begin{bmatrix} 3 & 8 & -4\\ 7 & -9 & 14\\ -3 & 1 & 11\\ 7 & 4 & 6\\ \end{bmatrix} \] Its transpose would be: \[ A^T = \begin{bmatrix} 3 & 7 & -3 & 7\\ 8 & -9 & 1 & 4\\ -4 & 14 & 11 & 6\\ \end{bmatrix} \] As you may have noticed, if we take a matrix of dimensions 4 x 3, then the dimensions of the transpose would be 3 x 4. In a more general sense, if we take a matrix of dimensions \(n \times m\), then its transpose will be of dimensions \(m \times n\).

This will become really important later on so be sure you understand how the transpose is done. You'll have a chance to practice at the end of this chapter.
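For readers who want to see the row and column swap in code, here is a minimal pure-Python sketch. The function name transpose is my own choice for illustration, and the matrix is assumed to be a list of equal-length rows.

```python
def transpose(matrix):
    """Swap rows and columns: zip(*matrix) yields the columns one by one."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 2, 3],
     [4, 5, 6]]

print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
```

A 2 x 3 matrix goes in and a 3 x 2 matrix comes out, matching the dimension rule above.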

Matrix Multiplication

In this section we'll look at how to multiply matrices. This is probably the most important of the matrix sections to understand, as we're going to be doing a lot of it. For code samples here, I will be using a library called numpy.

To begin, let's take the matrix from earlier: \[ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \] We can multiply this matrix by a scalar. You can think of a scalar as a regular old number, \(2\) for example.

If we want to multiply the matrix \(A\) by the value \(2\), we write this as: \[ B = 2 \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \] To multiply these, we simply multiply each entry in the matrix by \(2\) as shown below: \[ \begin{align*} B &= \begin{bmatrix} 1 \times 2 & 2 \times 2 & 3 \times 2\\ 4 \times 2 & 5 \times 2 & 6 \times 2\\ \end{bmatrix}\\ &= \begin{bmatrix} 2 & 4 & 6\\ 8 & 10 & 12\\ \end{bmatrix}\\ \end{align*} \] When we multiply two matrices together, we call this the dot product of the two matrices (it is also commonly known as the matrix product). There is, however, one limitation on this. The number of columns in the first matrix must be the same as the number of rows in the second matrix. Let's take a matrix that has dimensions \(2 \times 3\). We can only multiply this by a matrix that has \(3\) rows. The second matrix can, however, have as many columns as we like. We can also have as many rows as we like in the first matrix.

I can't stress this enough, so I'll say it again. If we want to multiply two matrices, the number of columns in the first matrix must be the same as the number of rows in the second.

The dimensions of the resulting matrix are the number of rows in the first matrix and the number of columns in the second matrix. For example, if we multiplied two matrices of dimensions \(2 \times 3\) and \(3 \times 2\) then the resulting matrix will have dimensions \(2 \times 2\).
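The rules covered so far in this section can be sketched in plain Python before any real multiplication is done. The helper names scale and product_shape are hypothetical, chosen only for this illustration; shapes are passed as (rows, columns) tuples.

```python
def scale(matrix, scalar):
    """Multiply every entry of the matrix by the scalar."""
    return [[scalar * entry for entry in row] for row in matrix]


def product_shape(shape_a, shape_b):
    """Return the shape of the product, or raise if the shapes don't match."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply: first matrix has {cols_a} columns, "
            f"second matrix has {rows_b} rows"
        )
    return rows_a, cols_b


print(scale([[1, 2, 3], [4, 5, 6]], 2))  # [[2, 4, 6], [8, 10, 12]]
print(product_shape((2, 3), (3, 2)))     # (2, 2)

try:
    product_shape((2, 3), (2, 3))
except ValueError as error:
    print(error)  # the shapes are incompatible
```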

In maths, we write the dot product of matrices \(A\) and \(B\) as \((A \cdot B)\).

If we take the two matrices shown below: \[ A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} , B = \begin{bmatrix} 1 & 2\\ 3 & 4\\ 5 & 6\\ \end{bmatrix} \] We multiply them as follows: \[ AB = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \begin{bmatrix} 1 & 2\\ 3 & 4\\ 5 & 6\\ \end{bmatrix} \] Firstly, we take the first column of \(B\), which contains 1, 3, and 5, and we multiply it element-wise by the first row of \(A\), which contains 1, 2, and 3. We then add the products up. This is shown below: \[ (1 \times 1) + (2 \times 3) + (3 \times 5) = 22 \]

\[ AB = \begin{bmatrix} 22 & \\ \\ \end{bmatrix} \]

This is the first entry of the first row in \(AB\) at index \(AB_{1, 1}\) in our new matrix.

Next, we take the second column of \(B\) and multiply it by the first row of \(A\) which is shown below: \[ (1 \times 2) + (2 \times 4) + (3 \times 6) = 28 \]

\[ AB = \begin{bmatrix} 22 & 28\\ \\ \end{bmatrix} \]

This gives us the second entry of the first row of \(AB\) at index \(AB_{1, 2}\). Now we move to the second row of \(A\) and go back to the first column of \(B\) and multiply them together element wise as follows: \[ (4 \times 1) + (5 \times 3) + (6 \times 5) = 49 \]

\[ AB = \begin{bmatrix} 22 & 28\\ 49 & \\ \end{bmatrix} \]

This gives us the first entry in our second row of \(AB\) at index \(AB_{2, 1}\). You can probably guess what we're going to do next at this point. That's right - we multiply element wise the elements in the second row of \(A\) and the elements in the second column of \(B\) and add them together: \[ (4 \times 2) + (5 \times 4) + (6 \times 6) = 64 \]

\[ AB = \begin{bmatrix} 22 & 28\\ 49 & 64\\ \end{bmatrix} \]
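The row-by-column procedure walked through above can be written out as a short pure-Python sketch. The function name matmul is an assumption for illustration; the matrices are assumed to be lists of equal-length rows.

```python
def matmul(a, b):
    """Multiply two matrices using the row-by-column procedure."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first must equal rows of the second")
    result = []
    for row in a:                 # each row of the first matrix
        new_row = []
        for column in zip(*b):    # each column of the second matrix
            # Multiply element-wise, then add the products up.
            new_row.append(sum(x * y for x, y in zip(row, column)))
        result.append(new_row)
    return result


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 2],
     [3, 4],
     [5, 6]]

print(matmul(A, B))  # [[22, 28], [49, 64]]
```

The outer loop picks a row of the first matrix and the inner loop pairs it with every column of the second, which is the same order the entries were filled in above.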

I think that was straight forward enough. Let's take a look at another example. What if we tried to do the following? \[ CD = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \begin{bmatrix} 7 & 8 & 9\\ 10 & 11 & 12\\ \end{bmatrix} \] We can't do this because our dimensions don't match up: we can't multiply a \(2 \times 3\) matrix by a \(2 \times 3\). However, we can use a technique we learned earlier to allow this multiplication to happen. We can transpose the matrix \(D\).

This would give us: \[ (C \cdot D^T) = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ \end{bmatrix} \begin{bmatrix} 7 & 10\\ 8 & 11\\ 9 & 12\\ \end{bmatrix} \] Now we have two matrices of dimensions \(2 \times 3\) and \(3 \times 2\) which we can multiply. I'm not going to walk through the process again, but we multiply the first row of \(C\) by the first column of \(D^T\), then the first row of \(C\) by the second column of \(D^T\), then the second row of \(C\) by the first column of \(D^T\) and finally the second row of \(C\) by the second column of \(D^T\). Each time we multiply element-wise and add the results up to give us the entry. Try it yourself and compare it to the below result: \[ (C \cdot D^T) = \begin{bmatrix} 50 & 68\\ 122 & 167\\ \end{bmatrix} \]

It can be a lengthy process but it is made simple for us in Python by using the numpy library. We can use the dot() function from numpy to do this. Let's take a look:

import numpy as np 

C = np.array([[1, 2, 3],
              [4, 5, 6]]) 
D = np.array([[7, 10],
              [8, 11],
              [9, 12]]) 
np.dot(C,D)

#OUTPUT
[[ 50,  68],
 [122, 167]]

The array() function creates an array that numpy can use to perform matrix math on. In the above example I had already transposed the matrix, however we can use numpy to transpose the matrix \(D\) from our above example:

C = np.array([[1, 2, 3],
              [4, 5, 6]])

D = np.array([[7, 8, 9],
              [10, 11, 12]])
dT = D.transpose() # We can do D.T for shorthand i.e. dT = D.T

np.dot(C, dT)

# OUTPUT
[[ 50,  68],
 [122, 167]]

As you can see, we simply transpose D and perform a dot product.

Logarithms

I'm not going to go into a huge amount of detail in this section, but in its simplest form, a logarithm answers the question: how many copies of one number do we multiply together to get another number? By definition, this makes the logarithm the inverse of exponentiation, much like division is the inverse of multiplication, or subtraction is the inverse of addition. For example \(\log_{2}64 = 6\), since \(2^6 = 64\).

More generally, \(\log_{b}(x) = y\) exactly if \(b^y = x\) and \(x > 0\) and \(b > 0\) and \(b \neq 1\). The conditions on this may be a little confusing at first but I encourage you to try some logarithms that break those conditions to gain a better understanding. I find that playing around like that when learning something new gives me a much better understanding.

Let's take a look at one final example. We'll take the following exponential: \[ 5^4 = 625 \] Now we want to ask: "how many 5s need to be multiplied together to get 625?". We solve this using the logarithm: \[ \log_{5}(625) = 4 \] You should use a calculator when doing these! That's all I have to say on them for now and we'll see them later on in a more practical context. There are also some practice questions at the end of this section.
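If you'd rather use Python than a calculator, the sketch below checks the examples from this section. The helper count_multiplications is hypothetical and only handles positive integers with a base greater than 1; math.log and math.log2 are from the standard library. Because math.log works in floating point, the comparison uses math.isclose rather than exact equality.

```python
import math


def count_multiplications(base, target):
    """Count how many copies of base multiply together to give target.

    Illustrative helper: assumes positive integers with base > 1, and
    returns None when target is not an exact power of base.
    """
    count = 0
    value = 1
    while value < target:
        value *= base
        count += 1
    return count if value == target else None


print(count_multiplications(5, 625))  # 4
print(count_multiplications(2, 64))   # 6
print(math.log2(64))                  # 6.0

# General bases go through floating point, so compare with a tolerance.
print(math.isclose(math.log(625, 5), 4))  # True
```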