3.2.4. Temporary vs. Permanent Methods


This page is especially useful in the context of developing code. While I’m figuring out a step, I rarely save the output to my original data; instead, I use temporary methods and print the output. Once I know it’s right, I make the changes permanent and proceed.

Temporary Methods

When you call a method on a pandas object (e.g. a DataFrame) in Python, <object>.<method>(<args>) usually performs the method and returns a new, modified object, leaving the original alone, as you can see here:

import pandas as pd

# define a df
df = pd.DataFrame({'height':[72,60,68],'gender':['M','F','M'],'weight':[175,110,150]})

# call a method on df and print the result: df.assign returns the modified object!
print(df.assign(feet=df['height']//12))
   height gender  weight  feet
0      72      M     175     6
1      60      F     110     5
2      68      M     150     5

This is useful if you only need the altered version temporarily (e.g. for a graph, or just to print it out, like I literally just did!).
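Because each temporary method returns a new object, you can chain several of them to build a one-off view, say for a quick look before plotting. The sketch below reuses the toy df from above; the particular steps (`query`, `sort_values`) are just illustrative choices, not something the rest of this page relies on.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame({'height': [72, 60, 68],
                   'gender': ['M', 'F', 'M'],
                   'weight': [175, 110, 150]})

# Chain temporary methods into a one-off view:
# add a feet column, keep only the men, then sort by weight.
preview = (df.assign(feet=df['height'] // 12)
             .query("gender == 'M'")
             .sort_values('weight'))
print(preview)

# Nothing was saved back to df: it still has its original 3 rows and 3 columns
print(df.shape)  # (3, 3)
```

Each step hands its result to the next, and only `preview` holds the final view, so you can tweak the chain and re-run it without ever touching `df`.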


But calling df.<method> above did not change the object in memory. See, here is df as it is stored, unchanged:

print(df)  # see, the object has no feet! this is the original obj!
   height gender  weight
0      72      M     175
1      60      F     110
2      68      M     150

Permanent changes


If you want to change the object permanently, you have two options:

# option 1: explicitly define the df as the prior df after the method was called
# here, that means to add "df = " before the df.method 
df = df.assign(feet1=df['height']//12) 

# option 2: define a new feature of the df
# here, that means "df['newcolumnname'] = " (some operation)
df['feet2'] = df['height']//12

print(df) # both new columns were added to the object in memory
   height gender  weight  feet1  feet2
0      72      M     175      6      6
1      60      F     110      5      5
2      68      M     150      5      5
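To see that the two options really do the same thing, here is a small self-contained sketch that applies both to a fresh copy of the toy df and compares the resulting columns.

```python
import pandas as pd

df = pd.DataFrame({'height': [72, 60, 68],
                   'gender': ['M', 'F', 'M'],
                   'weight': [175, 110, 150]})

# Option 1: reassign the method's returned object back to the same name
df = df.assign(feet1=df['height'] // 12)

# Option 2: assign directly to a new column of the existing object
df['feet2'] = df['height'] // 12

# Both columns are now stored in df and hold identical values
print((df['feet1'] == df['feet2']).all())  # True
print(df['feet1'].tolist())                # [6, 5, 5]
```

Option 1 keeps the method-call style, so it slots naturally into a chain you built while experimenting; option 2 is the shorter bracket-assignment idiom for adding a single column.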


You can also do some pandas operations “in place” (many methods accept an inplace=True argument), without explicitly writing df = at the start of the line. However, I discourage this for reasons I won’t belabor here.
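One well-known pitfall, offered here as an illustration rather than as a summary of the reasons alluded to above: a method called with `inplace=True` mutates the object and returns `None`, so combining it with the `df =` habit silently throws your data away. The sketch uses `drop` only as an example method.

```python
import pandas as pd

df = pd.DataFrame({'height': [72, 60, 68],
                   'gender': ['M', 'F', 'M'],
                   'weight': [175, 110, 150]})

# With inplace=True, drop() changes df directly and returns None
result = df.drop(columns='weight', inplace=True)
print(result)            # None
print(list(df.columns))  # ['height', 'gender']

# The trap: mixing both styles would replace df with None
# df = df.drop(columns='gender', inplace=True)   # df is now None!

# The reassignment style from option 1 avoids this and can be chained
df = df.drop(columns='gender')
print(list(df.columns))  # ['height']
```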