NumPy Operations
Programming for Data Science
import numpy as np
Element-wise Arithmetic
NumPy arrays can be transformed with arithmetic operations.
These are all element-wise operations.
Let’s start with a couple of \(2\)-D arrays.
arr1 = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr1, arr2
(array([[1., 2., 3.],
        [4., 5., 6.]]),
 array([[ 0.,  4.,  1.],
        [ 7.,  2., 12.]]))
If we multiply these two arrays, NumPy multiplies each pair of cells that share the same index, or coordinate. This is not matrix multiplication.
arr1 * arr2
array([[ 0.,  8.,  3.],
       [28., 10., 72.]])
You can think of it this way:
coordinate   arr1   arr2   arr1 * arr2
0, 0         1.     0.     0.
0, 1         2.     4.     8.
0, 2         3.     1.     3.
...
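As a minimal sketch of what the table describes, the loop below builds the same product one coordinate at a time. The name `manual` is introduced here for illustration only; in practice you would simply write `arr1 * arr2`.

```python
import numpy as np

arr1 = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

# Build the product cell by cell, pairing values that share a coordinate.
manual = np.empty_like(arr1)
for i in range(arr1.shape[0]):
    for j in range(arr1.shape[1]):
        manual[i, j] = arr1[i, j] * arr2[i, j]

# The explicit loop and the * operator give the same array.
print(np.array_equal(manual, arr1 * arr2))  # True
```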
Of course, this works for the other operations, too.
arr1 - arr2
array([[ 1., -2.,  2.],
       [-3.,  3., -6.]])
arr2 / arr1
array([[0.        , 2.        , 0.33333333],
       [1.75      , 0.4       , 2.        ]])
1 / arr1
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])
arr1 ** arr2
array([[1.00000000e+00, 1.60000000e+01, 3.00000000e+00],
       [1.63840000e+04, 2.50000000e+01, 2.17678234e+09]])
arr2 ** 0.5
array([[0.        , 2.        , 1.        ],
       [2.64575131, 1.41421356, 3.46410162]])
arr2 > arr1
array([[False,  True, False],
       [ True, False,  True]])
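Each of the operators above also has a function form in NumPy that works element by element. The sketch below checks a few of them against the operators; the `same_*` names are made up for this example.

```python
import numpy as np

arr1 = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

# Compare each operator with its function form on the same inputs.
same_product = np.array_equal(arr1 * arr2, np.multiply(arr1, arr2))
same_power = np.array_equal(arr1 ** arr2, np.power(arr1, arr2))
same_compare = np.array_equal(arr2 > arr1, np.greater(arr2, arr1))

print(same_product, same_power, same_compare)  # True True True
```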
Broadcasting
What happens when you try to perform an element-wise operation on two arrays of different shape?
When the shapes are compatible, NumPy treats the smaller or lower-dimensional array as if it were stretched to the shape of the larger one, so that the operation can take place.
This is called broadcasting.
Let’s look at an example.
foo = np.ones((6, 4))
foo
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
If we multiply it by \(5\), the scalar is converted into an array of the same shape as foo with the value \(5\) “broadcast” to populate the entire array.
foo * 5
array([[5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.],
       [5., 5., 5., 5.]])
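One way to picture the scalar case is to build the array of fives by hand and compare results. This is only a sketch of the mental model: the name `fives` is introduced here, and NumPy does not need to create such an array in memory to do the multiplication.

```python
import numpy as np

foo = np.ones((6, 4))

# An explicit array of fives with the same shape as foo.
fives = np.full(foo.shape, 5.0)

# Multiplying by the scalar gives the same result as multiplying by `fives`.
print(np.array_equal(foo * 5, foo * fives))  # True
```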
We actually saw this already when we looked at slices.
If we want to multiply an array by a vector, the vector is broadcast to become a 2D array.
foo * np.array([5, 10, 6, 8])
array([[ 5., 10.,  6.,  8.],
       [ 5., 10.,  6.,  8.],
       [ 5., 10.,  6.,  8.],
       [ 5., 10.,  6.,  8.],
       [ 5., 10.,  6.,  8.],
       [ 5., 10.,  6.,  8.]])
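Note that NumPy can’t always make the adjustment. In the next example the last dimensions of the two shapes, 4 and 2, differ and neither is 1, so the operation fails: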
foo * np.array([5, 10])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 foo * np.array([5, 10])

ValueError: operands could not be broadcast together with shapes (6,4) (2,)
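The sketch below probes which shapes are compatible with `foo`. NumPy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1. It assumes NumPy 1.20 or later for `np.broadcast_shapes`; the names `col` and `compatible` are made up for this example.

```python
import numpy as np

foo = np.ones((6, 4))

# Trailing dimensions match (4 and 4), so a length-4 vector broadcasts.
print(np.broadcast_shapes(foo.shape, (4,)))  # (6, 4)

# A dimension of size 1 stretches, so a (6, 1) column also broadcasts.
col = np.arange(6).reshape(6, 1)
print((foo * col).shape)  # (6, 4)

# 4 and 2 differ and neither is 1, so this pair is rejected.
try:
    np.broadcast_shapes(foo.shape, (2,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```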
Boolean Indexing
Another crucial topic in NumPy is boolean indexing.
In brief, you can pass a boolean array to the array indexer (i.e. the [] suffix), and it will return only those elements whose position is marked True.
This is a technique we will use frequently in Pandas and R.
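Here is a minimal sketch of the idea on a one-dimensional array. The names `values` and `keep` are hypothetical and used only for this illustration.

```python
import numpy as np

values = np.array([10, 20, 30, 40])
keep = np.array([True, False, True, False])

# Only the positions where `keep` is True are returned.
print(values[keep])  # [10 30]
```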
Let’s assume that we have two related arrays:
names, which holds the names associated with the data in each row, or observation, of a table.
data, which holds the data associated with each feature of a table.
There are \(7\) observations and \(4\) features.
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
data = np.random.randn(7, 4)
data
array([[-1.0441337 , -0.8075191 , -1.89069483, -0.9007478 ],
       [ 1.57167643, -0.50688096,  0.78120542,  0.22558685],
       [-1.45769989, -0.49824512, -1.2056539 , -0.43596557],
       [-0.33496411, -0.25575133, -0.61952407,  0.88984652],
       [-0.59001244, -2.23411426, -0.31123889, -0.86358338],
       [ 1.23662366, -0.90041871,  0.63348956, -0.58677799],
       [-3.09362317,  0.92042332,  0.53013723,  0.24224835]])
A comparison operation for an array returns an array of booleans.
Let’s see which names are 'Bob':
names == 'Bob'
array([ True, False, False,  True, False, False, False])
Now this boolean expression can be passed to the indexer of data to select the matching rows:
data[names == 'Bob']
array([[-1.0441337 , -0.8075191 , -1.89069483, -0.9007478 ],
       [-0.33496411, -0.25575133, -0.61952407,  0.88984652]])
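To see which rows a mask picks out, it can help to convert it to integer positions. The sketch below does that with `np.flatnonzero`. The generator is seeded only to make the sketch repeatable, so its values differ from the table above; the names `rng`, `mask`, and `rows` are introduced here for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
data = rng.standard_normal((7, 4))

mask = names == 'Bob'
rows = np.flatnonzero(mask)  # integer positions where mask is True
print(rows)  # [0 3]

# Indexing with the mask or with the integer positions selects the same rows.
print(np.array_equal(data[mask], data[rows]))  # True
```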
Along the second axis, we can use a slice or integer to select data.
data[names == 'Bob', 2:]
array([[-1.89069483, -0.9007478 ],
       [-0.61952407,  0.88984652]])
data[names == 'Bob', 3]
array([-0.9007478 ,  0.88984652])
If you know SQL, selecting the last two columns of the 'Bob' rows is like the query:
SELECT col3, col4 FROM data WHERE name = 'Bob'
Negation
Here are some examples of negated boolean operations being applied.
bix = names != 'Bob'
bix
array([False,  True,  True, False,  True,  True,  True])
data[bix]
array([[ 1.57167643, -0.50688096,  0.78120542,  0.22558685],
       [-1.45769989, -0.49824512, -1.2056539 , -0.43596557],
       [-0.59001244, -2.23411426, -0.31123889, -0.86358338],
       [ 1.23662366, -0.90041871,  0.63348956, -0.58677799],
       [-3.09362317,  0.92042332,  0.53013723,  0.24224835]])
data[~bix] # Back to Bob
array([[-1.0441337 , -0.8075191 , -1.89069483, -0.9007478 ],
       [-0.33496411, -0.25575133, -0.61952407,  0.88984652]])
data[~(names == 'Bob')]
array([[ 1.57167643, -0.50688096,  0.78120542,  0.22558685],
       [-1.45769989, -0.49824512, -1.2056539 , -0.43596557],
       [-0.59001244, -2.23411426, -0.31123889, -0.86358338],
       [ 1.23662366, -0.90041871,  0.63348956, -0.58677799],
       [-3.09362317,  0.92042332,  0.53013723,  0.24224835]])
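The negated masks used above can be written in more than one way. This short sketch checks that three spellings agree; the names `is_bob` and `not_bob_*` are made up for the example.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

is_bob = names == 'Bob'

# Three ways to build the same "not Bob" mask.
not_bob_a = names != 'Bob'
not_bob_b = ~is_bob
not_bob_c = np.logical_not(is_bob)

print(np.array_equal(not_bob_a, not_bob_b))  # True
print(np.array_equal(not_bob_b, not_bob_c))  # True
```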
Note that we don’t use not but instead the tilde ~ sign to negate (flip) a value.
Nor do we use and and or; instead we use & and |.
Also, expressions joined by these operators must be in parentheses, because & and | bind more tightly than comparison operators such as ==.
mask = (names == 'Bob') | (names == 'Will')
mask
data[mask]
array([[-1.0441337 , -0.8075191 , -1.89069483, -0.9007478 ],
       [-1.45769989, -0.49824512, -1.2056539 , -0.43596557],
       [-0.33496411, -0.25575133, -0.61952407,  0.88984652],
       [-0.59001244, -2.23411426, -0.31123889, -0.86358338]])
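The sketch below illustrates two points about combining conditions. First, `np.isin` is one alternative for this kind of "any of these names" test, and it produces the same mask here. Second, Python's `or` fails on arrays with more than one element. The flag `raised` is a name introduced only for this example.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

mask = (names == 'Bob') | (names == 'Will')

# np.isin tests membership in a list of values and gives the same mask here.
print(np.array_equal(mask, np.isin(names, ['Bob', 'Will'])))  # True

# Python's `or` asks for a single truth value, which a multi-element array lacks.
try:
    (names == 'Bob') or (names == 'Will')
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```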
We can also assign to the elements that a boolean expression selects. Here, every negative value is set to 0:
data[data < 0] = 0
data
array([[0.        , 0.        , 0.        , 0.        ],
       [1.57167643, 0.        , 0.78120542, 0.22558685],
       [0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.88984652],
       [0.        , 0.        , 0.        , 0.        ],
       [1.23662366, 0.        , 0.63348956, 0.        ],
       [0.        , 0.92042332, 0.53013723, 0.24224835]])
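The assignment above changes `data` in place. If the original values need to be kept, one possible alternative is `np.where`, which returns a new array. The sketch uses a small stand-in array rather than the table above, and the name `clipped` is hypothetical.

```python
import numpy as np

data = np.array([[-1.5, 0.5], [2.0, -0.25]])  # small stand-in for the table above

# np.where builds a new array: 0 where the condition holds, the original value elsewhere.
clipped = np.where(data < 0, 0, data)

print(clipped.tolist())  # [[0.0, 0.5], [2.0, 0.0]]
print(data[0, 0])        # -1.5, the original array is unchanged
```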
A one-dimensional boolean array can also be used to set whole rows at once, just as we did with slices.
data[names != 'Joe'] = 7
data
array([[7.        , 7.        , 7.        , 7.        ],
       [1.57167643, 0.        , 0.78120542, 0.22558685],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [1.23662366, 0.        , 0.63348956, 0.        ],
       [0.        , 0.92042332, 0.53013723, 0.24224835]])
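Reading with a boolean mask and assigning through one behave differently: a read returns a copy, while an assignment writes into the original array. The sketch below uses a small array of zeros as a stand-in, and the name `selected` is introduced only for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will'])
data = np.zeros((3, 2))

# Reading with a boolean mask returns a copy, so changing it leaves data alone.
selected = data[names != 'Joe']
selected[:] = 7
print(data.sum())  # 0.0

# Assigning through the mask writes into data itself.
data[names != 'Joe'] = 7
print(data.tolist())  # [[7.0, 7.0], [0.0, 0.0], [7.0, 7.0]]
```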