3.1.2. A Quick Guide to Numpy

At the top of your Python code, load Numpy like this:

import numpy as np

Now you can use numpy throughout your file via the np alias.

Tip

  1. Jupyter Lab’s “Help” menu has a link to Numpy documentation.

  2. You can access an element of a numpy array just like a list:

    x=np.arange(1,5,1)
    x[1]
    

    If the array is a matrix, x[row,col] works.

  3. Whirlwind has a more comprehensive dive into splitting, slicing, and other numpy operations.
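As a small sketch of the indexing described in item 2: the matrix m below is a made-up example, not something defined elsewhere in this guide.

```python
import numpy as np

x = np.arange(1, 5, 1)   # array([1, 2, 3, 4])
second = x[1]            # positions start at 0, so this is 2

# hypothetical 2x3 matrix, used only for illustration
m = np.array([[10, 20, 30],
              [40, 50, 60]])
element = m[1, 2]        # row 1, column 2 -> 60
first_row = m[0]         # a single index selects a whole row

print(second, element, first_row)
```

Rows and columns are both counted from 0, exactly like list positions.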

3.1.2.1. Common methods:

Tip

Copy this table into your cookbook notes folder.

| Function | Description |
| --- | --- |
| np.array([user defined list, or list of lists]) | creates an array or matrix |
| np.ones(how many) and np.ones([rows,cols]) | same, but all elements are 1 |
| np.zeros(how many) and np.zeros([rows,cols]) | same, but all elements are 0 |
| np.arange(start,end,stepsize) | creates an array running from start in steps of stepsize; note that the array will not include any elements >=end |
| np.linspace(from,to,# of elements) | creates an array of evenly spaced values covering the range specified (both endpoints are included by default) |
| np.eye(#) | creates an identity matrix of size # by # |
| np.concatenate([x, y]) | combines arrays x and y |
| np.nan | is a NaN object (e.g. like a missing element in a data table). We will definitely use this in pandas |
| np.ceil(#), np.floor(#) | if #=3.4, ceil will return 4, and floor will return 3 |
| np.max(x), np.min(x), np.average(x), np.median(x) | many statistical operations work as you would expect |
| np.reshape(x,[rows,cols]) | rearranges the elements of x into the given shape; rows*cols must equal the number of elements in x |
| np.random.<dist> | can draw random numbers from many distributions. Use tab autocompletion to see all the options (type np.random. and then hit TAB). YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!! |
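To make a few of the table entries concrete, here is a minimal sketch. All variable names are illustrative choices, and np.isnan is an extra helper that the table does not list.

```python
import numpy as np

a = np.arange(0, 10, 2)        # [0 2 4 6 8]: the end value 10 is excluded
b = np.linspace(0, 10, 5)      # [ 0.   2.5  5.   7.5 10. ]: both endpoints included
ones = np.ones([2, 3])         # 2 rows, 3 columns, every element 1.0
ident = np.eye(3)              # 3x3 identity matrix

joined = np.concatenate([a, a])            # 10 elements: a followed by a
grid = np.reshape(np.arange(6), [2, 3])    # [[0 1 2], [3 4 5]]

up, down = np.ceil(3.4), np.floor(3.4)     # 4.0 and 3.0 (returned as floats)

# np.nan marks a missing value; it never compares equal to anything,
# so use np.isnan (not listed in the table) to find it
with_gap = np.array([1.0, np.nan, 3.0])
missing = np.isnan(with_gap)               # [False  True False]

print(a, b, joined.shape, grid, up, down, missing, sep="\n")
```

The contrast between np.arange (end excluded) and np.linspace (end included) is the detail most likely to cause off-by-one surprises.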

3.1.2.2. A warning about “random” numbers

Warning

Let me repeat that: YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!

If you don’t, your code will produce different outputs every single time you run it. And other people will get different answers too!

And the point of code is that it is reproducible.

import numpy as np
np.set_printoptions(2) # just to control # of decimal places shown

np.random.seed(0) # this is how you set a seed
print("original random draw:    ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
np.random.seed(0)
print("now it's the same again: ",np.random.rand(4))

original random draw:     [0.55 0.72 0.6  0.54]
now it's different:       [0.42 0.65 0.44 0.89]
now it's different:       [0.96 0.38 0.79 0.53]
now it's the same again:  [0.55 0.72 0.6  0.54]
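As an aside, NumPy 1.17 and later also provide np.random.default_rng, which returns a generator object carrying its own seed. The sketch below is an alternative to the np.random.seed approach used in this section, not a replacement for it, and the variable names are illustrative.

```python
import numpy as np

rng_a = np.random.default_rng(0)   # generator with its own seed
rng_b = np.random.default_rng(0)   # second generator, same seed

draw_a = rng_a.random(4)
draw_b = rng_b.random(4)
same = np.array_equal(draw_a, draw_b)   # same seed, same first draw

later = rng_a.random(4)                 # the next draw from rng_a differs

print("same seed gives same draw:", same)
print("next draw is different:   ", not np.array_equal(draw_a, later))
```

Either way, the rule is the same: fix the seed once, near the top of your code, so that reruns give identical numbers.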

3.1.2.3. Using Numpy within Pandas

Because pandas is built on top of numpy, most of these numpy functions also work on pandas objects such as Series and DataFrames.

Numpy 🤝 Pandas
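A quick sketch of what that looks like in practice, assuming pandas is installed. The Series and DataFrame here are made-up data for illustration only.

```python
import numpy as np
import pandas as pd

# hypothetical data, for illustration only
s = pd.Series([1.0, 4.0, 9.0], name="value")
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

roots = np.sqrt(s)     # element-wise; the result is still a pandas Series
biggest = np.max(s)    # 9.0
logged = np.log(df)    # element-wise; the result is still a DataFrame

print(type(roots).__name__, biggest, type(logged).__name__)
```

Element-wise functions keep the pandas index and column labels, which is usually what you want.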

3.1.2.4. The dark side of vectors and numpy

  1. You can’t vectorize every operation :(

  2. Numpy is a great solution for the issue of speed, but not for the issue of memory.

Numpy can be prohibitive, memory-wise: when you run an array operation, Python first builds the entire array in memory and only then operates on it. A vector of length 1,000,000,000,000 is huge: at 8 bytes per 64-bit number, it would need about 8 TB of memory just to exist. By contrast, you can execute for i in range(1_000_000_000_000): pass without running out of memory (although the loop would take a very long time to finish), because Python never creates that vector; it just iterates over the numbers one at a time. This is because range(#) is lazy: like a “generator”, it produces each value on demand instead of storing them all as an explicit object.
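The difference can be measured directly. This is a minimal sketch: the size n is an arbitrary choice, kept small enough to allocate safely, and the exact size reported for the range object depends on your Python build.

```python
import sys
import numpy as np

n = 1_000_000                          # arbitrary, small enough to allocate

arr = np.arange(n, dtype=np.int64)     # every element is stored in memory
array_bytes = arr.nbytes               # 8 bytes per element -> 8,000,000

lazy = range(1_000_000_000_000)        # a trillion numbers, none stored
range_bytes = sys.getsizeof(lazy)      # small and independent of the length

# a generator expression also yields one value at a time
total = sum(i for i in range(n))

print("array bytes:", array_bytes)
print("range bytes:", range_bytes)
print("sum via generator:", total)
```

The array's footprint grows with its length, while the range object stays tiny no matter how many numbers it represents.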