3.1.2. A Quick Guide to Numpy

At the top of your Python code, load Numpy like this:

```
import numpy as np
```

Now you can use numpy throughout your file via the np object.

Tip

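Jupyter Lab’s “Help” menu has a link to the Numpy documentation.

You can access an element of a numpy array just like an element of a list:
```
x = np.arange(1, 5, 1)  # array([1, 2, 3, 4])
x[1]                    # second element (indexing starts at 0): 2
```
If the array is two-dimensional (a matrix), `x[row,col]` works.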
Whirlwind has a more comprehensive dive into splitting, slicing, and other numpy operations.
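
As a brief illustration of the indexing described above, here is a minimal sketch. The variable names (`first`, `middle`, `m`, and so on) are made up for this example.

```python
import numpy as np

# One-dimensional array: indexing and slicing work like a list
x = np.arange(1, 5, 1)   # array([1, 2, 3, 4])
first = x[0]             # 1
last = x[-1]             # 4 (negative indices count from the end)
middle = x[1:3]          # array([2, 3]); the end of a slice is excluded

# Two-dimensional array (matrix): use x[row, col]
m = np.arange(1, 7).reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]
element = m[1, 2]        # row 1, column 2 -> 6
first_row = m[0, :]      # array([1, 2, 3])
second_col = m[:, 1]     # array([2, 5])
```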

3.1.2.1. Common methods:

Tip

Copy this table into your cookbook notes folder.

Function	Description
`np.array([user defined list, or lists of lists])`	creates an array (from a list) or a matrix (from a list of lists)
`np.ones(how many)` and `np.ones([rows,cols])`	creates an array or matrix in which every element is 1
`np.zeros(how many)` and `np.zeros([rows,cols])`	same, but every element is 0
`np.arange(start,end,stepsize)`	creates an array running from `start` up to, but not including, `end`; with a positive step, no element will be `>=end`
`np.linspace(from,to,# of elements)`	creates an array of evenly spaced elements covering the range specified, including both endpoints by default
`np.eye(#)`	creates an identity matrix of size `#`
`np.concatenate([x, y])`	combines arrays `x` and `y`
`np.nan`	is a NaN (“not a number”) value, e.g. a missing element in a data table. We will definitely use this in pandas.
`np.ceil(#)`, `np.floor(#)`	round up and down: if `#=3.4`, `ceil` will return 4.0 and `floor` will return 3.0
`np.max(x)`, `np.min(x)`, `np.average(x)`, `np.median(x)`	many statistical operations work as you would expect
`np.reshape(x,[rows,cols])`	returns the elements of `x` rearranged into the given number of rows and columns (the total number of elements must stay the same)
`np.random.<dist>`	draws random numbers from many distributions. Use tab autocompletion to see all the options (type `np.random.` and then hit TAB). YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!
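
To see several rows of the table in action, here is a short sketch. The inputs are arbitrary values chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])       # 2x2 matrix from a list of lists
ones = np.ones([2, 3])               # 2 rows, 3 columns, every element 1.0
zeros = np.zeros(3)                  # array([0., 0., 0.])

steps = np.arange(0, 10, 2.5)        # 0, 2.5, 5, 7.5 (10 is NOT included)
grid = np.linspace(0, 10, 5)         # 0, 2.5, 5, 7.5, 10 (10 IS included)

identity = np.eye(2)                 # [[1, 0], [0, 1]]
combined = np.concatenate([zeros, steps])     # 3 + 4 = 7 elements
reshaped = np.reshape(np.arange(6), [2, 3])   # [[0, 1, 2], [3, 4, 5]]

up, down = np.ceil(3.4), np.floor(3.4)        # 4.0 and 3.0
missing = np.isnan(np.array([1.0, np.nan]))   # array([False, True])
```

Note the difference between `steps` and `grid`: `np.arange` stops before its end value, while `np.linspace` includes it.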

3.1.2.2. A warning about “random” numbers

Warning

Let me repeat that: YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!

If you don’t, your code will produce different outputs every single time you run it. And other people will get different answers too!

And the point of code is that it is reproducible.

```
import numpy as np
np.set_printoptions(precision=2) # just to control # of decimal places shown

np.random.seed(0) # this is how you set a seed
print("original random draw:    ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
print("now it's different:      ",np.random.rand(4))
np.random.seed(0)
print("now it's the same again: ",np.random.rand(4))
```

```
original random draw:     [0.55 0.72 0.6  0.54]
now it's different:       [0.42 0.65 0.44 0.89]
now it's different:       [0.96 0.38 0.79 0.53]
now it's the same again:  [0.55 0.72 0.6  0.54]
```
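
The same idea can also be sketched with NumPy's newer `Generator` interface, `np.random.default_rng`, which the NumPy documentation recommends for new code. This is an alternative to the `np.random.seed` approach, not a replacement for what the text uses. The variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)   # a random number generator seeded with 0
first_draw = rng.random(4)       # four draws from [0, 1)
second_draw = rng.random(4)      # the next four draws, so different numbers

rng = np.random.default_rng(0)   # same seed, fresh generator
repeat_draw = rng.random(4)      # identical to first_draw

print(np.array_equal(first_draw, repeat_draw))    # True
print(np.array_equal(first_draw, second_draw))    # False
```

One design difference: the seed lives in the `rng` object rather than in a global state, so other code that draws random numbers cannot disturb your sequence.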

3.1.2.3. Using Numpy within Pandas

Because pandas is built on top of numpy, these numpy functions work on pandas objects (Series and DataFrames) too, although some of them return a plain numpy array rather than a pandas object.
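
A minimal sketch of numpy functions applied to pandas objects follows. The Series and DataFrame contents are made-up numbers for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0])   # hypothetical data
roots = np.sqrt(s)       # element-wise; the result is still a Series
largest = np.max(s)      # 16.0

df = pd.DataFrame({"a": [1.4, 2.6], "b": [3.5, 4.1]})
rounded_up = np.ceil(df)  # element-wise; the result is still a DataFrame
```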

Numpy 🤝 Pandas

3.1.2.4. The dark side of vectors and `numpy`

You can’t vectorize every operation :(
Numpy is a great solution for the issue of speed, but not for the issue of memory.

Numpy can be prohibitive, memory-wise: when you run an array operation, numpy creates the entire array in memory and then operates on it. A vector of length 1,000,000,000,000 is huge (about 8 terabytes if each element is a 64-bit float) and requires substantial memory to create. By contrast, you can execute `for i in range(1_000_000_000_000): pass` without causing a memory issue, because Python never creates that vector; it just iterates over the numbers one at a time. This is because `range(#)` is “lazy”: like a generator, it produces each number only when it is asked for, instead of storing them all.
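
The memory difference can be checked directly. This is a rough sketch: it uses a much smaller array so that it runs on any machine, and the exact size reported for the `range` object depends on your Python build.

```python
import sys
import numpy as np

n = 1_000_000
vector = np.zeros(n, dtype=np.float64)   # numpy reserves space for all n elements
vector_bytes = vector.nbytes             # 8 bytes per element -> 8_000_000

lazy = range(1_000_000_000_000)          # no elements are stored
lazy_bytes = sys.getsizeof(lazy)         # a few dozen bytes, whatever the length

# Bytes a float64 vector as long as `lazy` would need: 8_000_000_000_000 (8 TB)
needed_bytes = len(lazy) * np.dtype(np.float64).itemsize

print(vector_bytes, lazy_bytes, needed_bytes)
```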

LeDataSciFi-2024