Introduction
- NumPy stands for Numerical Python. It provides fast n-dimensional arrays for storing data as numbers, which the computer can then analyze to find patterns
- We use NumPy because it's fast: its core routines are implemented in optimized C behind the scenes
Data Types and Attributes
- ndarray: n-dimensional array
- .ndim: number of dimensions
- .dtype: data type of the elements
- .shape: size of the array along each dimension (rows, columns, and beyond)
- .size: total number of elements in the array
import numpy as np
# 1-dimensional array
a1 = np.array([1, 2, 3])
# 2-dimensional array
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])
# 3-dimensional array
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               [[10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]]])
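To connect the attributes with the three arrays, here is a minimal sketch that prints each attribute. The loop and the printed layout are just for illustration, and the exact integer dtype depends on the platform, so no specific dtype is assumed.

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])
a3 = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
               [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

# Print the number of dimensions, shape, element count and dtype of each array
for name, arr in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype)

# a1 has shape (3,), a2 has shape (2, 3), a3 has shape (2, 3, 3)
```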
Creating arrays
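- Tip: pressing shift + tab in a Jupyter notebook shows the documentation for the function under the cursor
- np.ones(shape): array of the specified shape filled with ones
- np.zeros(shape): array of the specified shape filled with zeros
- np.arange(start, stop, step): evenly spaced values from start up to (but not including) stop
- np.random.randint(low, high, size): random integers from low up to (but not including) high, in the shape given by size
- np.random.random(size): random floats in [0.0, 1.0), in the shape given by size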
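A short sketch of the creation functions side by side. The shapes and ranges are arbitrary choices for illustration, and the variable names are made up for this example.

```python
import numpy as np

ones = np.ones((2, 3))                              # 2x3 array filled with 1.0
zeros = np.zeros((2, 3))                            # 2x3 array filled with 0.0
evens = np.arange(0, 10, 2)                         # 0, 2, 4, 6, 8 (stop is excluded)
rand_ints = np.random.randint(0, 10, size=(3, 5))   # integers in [0, 10)
rand_floats = np.random.random((5, 3))              # floats in [0.0, 1.0)

print(ones)
print(zeros)
print(evens)
print(rand_ints.shape, rand_floats.shape)  # (3, 5) (5, 3)
```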
Random Seed
- Note: random numbers are pseudo-random
- np.random.seed(seed=num): seeds the generator so that the same "random" values are generated on every run, which makes results reproducible
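One way to see the effect of seeding is to reset the seed and draw again. This is a minimal sketch; the seed value 0 and the array shape are arbitrary.

```python
import numpy as np

np.random.seed(seed=0)
first = np.random.randint(0, 10, size=(2, 3))

# Resetting the same seed restarts the pseudo-random sequence
np.random.seed(seed=0)
second = np.random.randint(0, 10, size=(2, 3))

print(np.array_equal(first, second))  # True
```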
Viewing arrays and matrices
- np.unique(array): returns the sorted unique values of the array
- You can index and slice arrays much as you would Python lists
- Tip: the last number in the shape is the size of the innermost array
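A small sketch of unique values, indexing and slicing. The array contents are invented for illustration.

```python
import numpy as np

a = np.array([[1, 2, 2],
              [3, 3, 4]])

unique_values = np.unique(a)   # array([1, 2, 3, 4])
first_row = a[0]               # array([1, 2, 2])
single_item = a[1, 2]          # 4 (row 1, column 2)
first_two_cols = a[:, :2]      # every row, first two columns

# The last number in the shape is the length of the innermost array
cube = np.arange(24).reshape(2, 3, 4)
innermost = cube[0, 0]         # array([0, 1, 2, 3]), length 4

print(unique_values, first_row, single_item)
print(first_two_cols)
print(cube.shape, len(innermost))  # (2, 3, 4) 4
```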
Manipulating & comparing arrays
- arr1 + arr2 or np.add(arr1, arr2): adds the two arrays together element by element
- Most standard arithmetic operators (-, *, /, //, **, %) work on arrays as well
- Important note: not all shapes are compatible for arithmetic, so keep that in mind!
- The mechanism that decides shape compatibility is called broadcasting
- Shapes are compared dimension by dimension, starting from the last one. Two dimensions are compatible if:
  - they are equal to each other
  - one of them is 1
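The broadcasting rules can be sketched with one compatible row, one compatible column and one incompatible array. The values and variable names are arbitrary and only meant to illustrate the shape check.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

added_row = a + row                # (2, 3) + (3,)   -> (2, 3)
added_col = np.add(a, col)         # (2, 3) + (2, 1) -> (2, 3)

try:
    a + np.array([1, 2])           # (2, 3) + (2,): last dimensions 3 and 2 clash
    compatible = True
except ValueError:
    compatible = False

print(added_row)    # [[11 22 33] [14 25 36]]
print(added_col)    # [[101 102 103] [204 205 206]]
print(compatible)   # False
```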
Aggregation
- Aggregation is performing the same operation on a number of values to produce a summary (for example a sum or a mean)
- sum(arr1) vs np.sum(arr1): use Python's built-in functions on Python datatypes and NumPy's functions on NumPy arrays (the NumPy versions are typically much faster on large arrays)
- np.mean(): mean (average)
- np.max(): maximum value
- np.min(): minimum value
- np.std(): standard deviation, a measure of how spread out a group of numbers is from the mean (the square root of the variance)
- np.var(): variance, the average of the squared differences from the mean (i.e. higher variance = wider range of numbers)
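A worked sketch of the aggregation functions on a tiny array, chosen so the numbers are easy to check by hand: the mean is 3, the squared differences are 4, 1, 0, 1, 4, so the variance is 10 / 5 = 2.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

python_total = sum(arr)       # Python built-in, iterates element by element
numpy_total = np.sum(arr)     # NumPy aggregation, runs in compiled code

mean = np.mean(arr)           # 3.0
variance = np.var(arr)        # mean of squared differences from the mean: 2.0
std_dev = np.std(arr)         # square root of the variance, about 1.414

print(python_total, numpy_total, mean, variance, std_dev)
print(np.max(arr), np.min(arr))  # 5 1
```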
Reshaping & transposing arrays
- .reshape(shape): reshapes the array to the desired shape (the total number of elements must stay the same)
- .T: transposes the array, meaning that it swaps the axes
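A minimal sketch of reshaping and transposing, using a six-element array picked for illustration.

```python
import numpy as np

flat = np.arange(6)            # array([0, 1, 2, 3, 4, 5]), shape (6,)
grid = flat.reshape(2, 3)      # [[0 1 2] [3 4 5]], shape (2, 3)
flipped = grid.T               # [[0 3] [1 4] [2 5]], shape (3, 2)

print(grid.shape, flipped.shape)  # (2, 3) (3, 2)
```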
Dot product
- np.dot(arr1, arr2): dot product; for 2-dimensional arrays this is matrix multiplication
- Requirement: the inside numbers of the two shapes have to be equal, e.g. (2, 3) and (3, 2)
- Result: a matrix with the shape of the outside numbers, e.g. (2, 2)
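The shape rule for the dot product can be checked with two small matrices. This is a sketch with invented values: shapes (2, 3) and (3, 2) share the inside number 3, so the result has shape (2, 2).

```python
import numpy as np

m1 = np.array([[1, 2, 3],
               [4, 5, 6]])     # shape (2, 3)
m2 = np.array([[1, 2],
               [3, 4],
               [5, 6]])        # shape (3, 2)

product = np.dot(m1, m2)       # inside numbers match (3 and 3)
print(product)                 # [[22 28] [49 64]]
print(product.shape)           # (2, 2)

try:
    np.dot(m1, m1)             # (2, 3) and (2, 3): inside numbers 3 and 2 differ
    aligned = True
except ValueError:
    aligned = False
print(aligned)                 # False
```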
Sorting arrays
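- np.sort(arr): sorts along the last axis by default, so each row of a 2-dimensional array is sorted
- np.argsort(arr): returns the indices that would sort the array
- np.argmin(arr, axis), np.argmax(arr, axis): return the index of the minimum or maximum value along the specified axis
- Axes can feel "flipped":
  - axis=0 moves down the rows, so it gives one result per column
  - axis=1 moves across the columns, so it gives one result per row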
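A small sketch of the sorting functions and the axis argument on a 2x3 array. The values are made up so that each result can be verified by eye.

```python
import numpy as np

arr = np.array([[3, 7, 2],
                [9, 1, 8]])

sorted_rows = np.sort(arr)            # [[2 3 7] [1 8 9]]
sort_order = np.argsort(arr)          # [[2 0 1] [1 2 0]]

min_per_column = np.argmin(arr, axis=0)   # compares down the rows: [0 1 0]
max_per_row = np.argmax(arr, axis=1)      # compares across the columns: [1 0]

print(sorted_rows)
print(sort_order)
print(min_per_column, max_per_row)
```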
Practical example: turn an image into a NumPy array
from matplotlib.image import imread
# turn panda image into an array
panda = imread("file-name.png")
Other methods & functions
- np.linspace(): returns evenly spaced numbers over a specified interval
- np.random.randn(d0, d1, ...): returns samples drawn from the standard normal distribution (mean 0, standard deviation 1), in the shape given by the arguments
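A brief sketch of both functions. The interval, the sample count and the seed are arbitrary choices; the sample mean and standard deviation are only expected to be close to 0 and 1, not exactly equal.

```python
import numpy as np

# Five evenly spaced points from 0 to 1, both endpoints included
points = np.linspace(0, 1, 5)
print(points)  # [0.   0.25 0.5  0.75 1.  ]

# randn takes the dimensions as separate arguments, not as a tuple
np.random.seed(0)
samples = np.random.randn(1000, 3)
print(samples.shape)                   # (1000, 3)
print(samples.mean(), samples.std())   # roughly 0 and roughly 1
```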