3.1.2. A Quick Guide to Numpy
At the top of your Python code, load Numpy like this:
import numpy as np
Now you can use numpy throughout your file via the np alias.
Tip
Jupyter Lab’s “Help” menu has a link to Numpy documentation.
You can access an element of a numpy array just like a list:
x = np.arange(1, 5, 1)
x[1]
If the array is a matrix, x[row,col] works.
Whirlwind has a more comprehensive dive into splitting, slicing, and other numpy operations.
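As a small illustration of the indexing described above, here is a minimal sketch. The variable names and the example matrix are made up for demonstration.

```python
import numpy as np

# 1-D array: indexing works like a Python list (zero-based)
x = np.arange(1, 5, 1)           # array([1, 2, 3, 4])
second = x[1]                    # 2

# 2-D array (matrix): index with [row, col]
m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
corner = m[1, 2]                 # row 1, column 2 -> 5
first_row = m[0, :]              # a colon selects the whole row -> [0, 1, 2]

print(second, corner, first_row)
```

Note that `m[1, 2]` uses a single pair of brackets; a nested list would need `m[1][2]` instead.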
3.1.2.1. Common methods:
Tip
Copy this table into your cookbook notes folder.
| Function | Description |
|---|---|
| | creates an array or matrix |
| | same but all elements are 1 |
| | same but all elements are 0 |
| | creates array, note that the array will not include any elements |
| | creates array covering the range specified |
| | creates an identity matrix of size |
| | combines arrays |
| | is a NaN object (e.g. like a missing element in a data table) |
| | if #=3.4, ceil will return 4, and floor will return 3. |
| | many statistical operations work as you would expect |
| | works like it looks |
| | can draw random numbers from many distributions |
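The sketch below shows Numpy calls that fit most of the descriptions in the table. Which call goes with which row is an assumption made here for illustration (for example, `np.concatenate` for "combines arrays" and `np.linspace` for "covering the range specified"), and all the values are made up.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])        # array or matrix built from nested lists
ones = np.ones((2, 2))                # 2x2 array, all elements 1
zeros = np.zeros((2, 2))              # 2x2 array, all elements 0
steps = np.arange(0, 10, 2)           # start, stop, step; the stop value is excluded
grid = np.linspace(0, 1, 5)           # 5 evenly spaced points, endpoints included
eye = np.eye(3)                       # 3x3 identity matrix
both = np.concatenate([steps, grid])  # joins the two arrays end to end
missing = np.nan                      # NaN, e.g. a missing value in a data table
up, down = np.ceil(3.4), np.floor(3.4)
avg = a.mean()                        # same result as np.mean(a)

print(steps, grid, up, down, avg)
```

Here `steps` stops at 8 because `np.arange` never includes its stop value, while `np.linspace` includes both endpoints by default.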
3.1.2.2. A warning about “random” numbers
Warning
Let me repeat that: YOU MUST NEVER EVER EVER EVER EVER DRAW RANDOM NUMBERS WITHOUT SETTING A SEED!!!
If you don’t, your code will produce different outputs every single time you run it. And other people will get different answers too!
And the point of code is that it is reproducible.
import numpy as np
np.set_printoptions(2) # just to control # of decimal places shown
np.random.seed(0) # this is how you set a seed
print("original random draw: ",np.random.rand(4))
print("now it's different: ",np.random.rand(4))
print("now it's different: ",np.random.rand(4))
np.random.seed(0)
print("now it's the same again: ",np.random.rand(4))
original random draw: [0.55 0.72 0.6 0.54]
now it's different: [0.42 0.65 0.44 0.89]
now it's different: [0.96 0.38 0.79 0.53]
now it's the same again: [0.55 0.72 0.6 0.54]
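Numpy also offers a newer interface, `np.random.default_rng`, where the seed is attached to a generator object instead of being set globally. The sketch below is an alternative to the `np.random.seed` approach shown above, not a replacement for it; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # generator with its own seed
first = rng.random(4)            # four draws from U[0, 1)
second = rng.random(4)           # continues the stream, so the values differ

rng = np.random.default_rng(0)   # re-create the generator with the same seed
again = rng.random(4)            # reproduces the first draw exactly

print(np.array_equal(first, again))    # True
print(np.array_equal(first, second))   # False
```

The numbers from `default_rng(0)` will not match those from `np.random.seed(0)`, because the two interfaces use different underlying generators. Either way, the rule stands: set a seed.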
3.1.2.3. Using Numpy within Pandas
Because pandas is built on top of numpy, most of these numpy functions work directly on pandas objects.
Numpy 🤝 Pandas
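For instance, element-wise Numpy functions can be applied straight to a Series or DataFrame and return a pandas object with the index and column labels intact. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.0, 10.0, 100.0], name="price")
logs = np.log10(prices)      # still a pandas Series, same index as prices

df = pd.DataFrame({"a": [1.2, 2.7], "b": [3.5, np.nan]})
rounded = np.ceil(df)        # still a DataFrame; the NaN stays NaN

print(type(logs).__name__, type(rounded).__name__)
print(rounded)
```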
3.1.2.4. The dark side of vectors and numpy
You can’t vectorize every operation :(
Numpy is a great solution for the issue of speed, but not for the issue of memory.
Numpy can be prohibitive, memory-wise: when you run an array operation, numpy allocates the entire array in memory before operating on it. A vector of length 1,000,000,000,000 is huge (about 8 TB at 8 bytes per element) and requires substantial memory to create. By contrast, you can execute for i in range(1_000_000_000_000): pass without running out of memory (though it would take a long time to finish), because Python never creates that vector; it just iterates over the numbers one at a time. This is because range(#) is lazy, much like a “generator”: it produces each value on demand instead of storing them all.
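One rough way to see the difference is to compare memory footprints at a much smaller size. The sketch below uses one million elements, and the `dtype` is fixed to `int64` only so that the byte count is the same on every platform.

```python
import sys
import numpy as np

n = 1_000_000

arr = np.arange(n, dtype=np.int64)   # all n values are stored in memory
array_bytes = arr.nbytes             # 8 bytes per element -> 8,000,000 bytes

lazy = range(n)                      # stores only start, stop, and step
range_bytes = sys.getsizeof(lazy)    # a few dozen bytes, regardless of n

print(f"array: {array_bytes:,} bytes; range object: {range_bytes} bytes")
```

The array's footprint grows linearly with `n`, while the `range` object stays the same size, which is why looping over `range` never runs into the memory wall.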