3.2.3. Common Functions/Methods
Note
Some pandas methods work on a dataframe, like df.assign(feet=df['height']//12). These methods operate on an existing dataframe, and you use them like this: <dfname>.<method>(<arguments>).
Note: Delete the < and > when you type in the dataframe name, method, and arguments. They only indicate that the text inside them is a placeholder.
Some pandas methods are called on the pandas module itself (e.g. pd.merge). These methods do tasks outside a single dataframe (like loading or merging datasets), and you use them like this: pd.<method>(<arguments>).
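To make the two calling styles concrete, here is a minimal sketch. The dataframes people and ages and their values are made up for illustration; only assign and pd.merge come from the text above.

```python
import pandas as pd

# Hypothetical toy data (heights in inches)
people = pd.DataFrame({"name": ["Ann", "Bo"], "height": [62, 71]})
ages = pd.DataFrame({"name": ["Ann", "Bo"], "age": [30, 41]})

# Style 1: a method called on a dataframe, <dfname>.<method>(<arguments>)
with_feet = people.assign(feet=people["height"] // 12)

# Style 2: a function called on the pandas module, pd.<method>(<arguments>)
merged = pd.merge(people, ages, on="name")

print(with_feet)
print(merged)
```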
Remember the SHIFT+TAB trick to see function help!
Type import pandas as pd, then run that to load pandas. Then type pd.merge( as if you want to merge two dataframes but don’t remember the arguments to use. Press SHIFT+TAB to see the function’s documentation!
Loading and saving data
Function | Pandas method | Example (see official syntax for more) |
---|---|---|
loading data | read_csv, read_stata, etc | |
saving data | to_csv, to_stata, etc | |
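As an illustration of the load/save pair, the sketch below writes a small dataframe to a CSV file and reads it back. The data and the file name example.csv are hypothetical, and a temporary folder is used so nothing is left on disk. The Stata equivalents, pd.read_stata and df.to_stata, follow the same pattern.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"firm": ["A", "B"], "sales": [10.5, 20.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")  # hypothetical file name
    # Saving is a dataframe method; index=False skips writing the row index
    df.to_csv(path, index=False)
    # Loading is a function on the pandas module
    loaded = pd.read_csv(path)

print(loaded)
```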
Manipulating data ⭐
Warning
df.assign(feet=df['height']//12) will not add a “feet” variable to df permanently. This is true of almost all dataframe methods (e.g. filter, rename, …): they return a new dataframe and leave the original unchanged. If you want to save the new variable, you need to type df = df.assign(feet=df['height']//12). See the next page for more.
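A quick way to convince yourself of this is to check the columns before and after reassigning. This is a small sketch with made-up heights:

```python
import pandas as pd

df = pd.DataFrame({"height": [62, 71]})  # hypothetical data, in inches

# The result is a new dataframe; since it is not stored, df is unchanged
df.assign(feet=df["height"] // 12)
has_feet_before = "feet" in df.columns
print(has_feet_before)  # False

# Assigning the result back to df keeps the new variable
df = df.assign(feet=df["height"] // 12)
has_feet_after = "feet" in df.columns
print(has_feet_after)  # True
```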
Note
Remember: replace df below with the name of the dataframe you’re working on!
Function | Pandas method | Examples (see official syntax for more) |
---|---|---|
new variables or replace existing | assign | |
filter or get subset of observations | ⭐ query / loc / iloc | |
get subset of columns | filter | |
rename columns | rename | |
sort | sort_values | |
do an operation on groups of observations | groupby ⭐ | |
summary stats | agg / pivot_table | |
summary stats on groups | groupby + agg / pivot_table | |
create a variable based on its group | groupby + transform | |
delete column | drop | |
use non-pd function on df | pipe | |
combine dataframes | merge | |
change time frequency of data | resample | |
window/rolling calculations | rolling / expanding / ewm | |
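The sketch below runs one short call for most rows of the table above. The data, the column names, and the helper add_margin are all hypothetical; the pandas methods themselves are the ones listed in the table, shown with only their most basic arguments.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "firm":  ["A", "A", "B", "B"],
    "year":  [2020, 2021, 2020, 2021],
    "sales": [10, 12, 20, 18],
    "cost":  [7, 8, 15, 16],
})
industries = pd.DataFrame({"firm": ["A", "B"], "industry": ["tech", "retail"]})

# assign: create a new variable (reassign to df to keep it)
df = df.assign(profit=df["sales"] - df["cost"])

# query / loc / iloc: subset of observations
big = df.query("sales > 11")          # condition written as a string
big_loc = df.loc[df["sales"] > 11]    # same rows via a boolean mask
first_two = df.iloc[:2]               # rows by position

# filter: subset of columns
subset = df.filter(items=["firm", "profit"])

# rename, sort_values, drop
renamed = df.rename(columns={"sales": "revenue"})
sorted_df = df.sort_values("profit", ascending=False)
no_cost = df.drop(columns=["cost"])

# summary stats on groups: groupby + agg, or pivot_table
by_firm = df.groupby("firm")["sales"].agg(["mean", "max"])
pivot = df.pivot_table(index="firm", values="sales", aggfunc="mean")

# create a variable based on its group: groupby + transform
df = df.assign(firm_avg_sales=df.groupby("firm")["sales"].transform("mean"))

# pipe: apply your own function to a dataframe
def add_margin(d):
    """Hypothetical helper: profit as a share of sales."""
    return d.assign(margin=d["profit"] / d["sales"])

with_margin = df.pipe(add_margin)

# merge: combine dataframes on a shared key
merged = df.merge(industries, on="firm", how="left")

# rolling: two-row moving average (first value is missing)
rolling_avg = df["sales"].rolling(window=2).mean()

# resample: needs a datetime index; here daily data summed into 2-day bins
daily = pd.Series([1, 2, 3, 4],
                  index=pd.date_range("2021-01-01", periods=4, freq="D"))
two_day = daily.resample("2D").sum()
```

Each line that does not reassign df leaves df untouched, which is why the results are stored under new names.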
Reshaping data and changing index
Function | Pandas method | Example (see official syntax for more) |
---|---|---|
convert wide to long/tall (“stack!”) | stack | |
convert long/tall to wide (“unstack!”) | unstack | |
turn a variable column into the index | set_index | |
turn the index into a variable | reset_index | |
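Here is a small round trip through the four reshaping methods, as a sketch on made-up data (one row per firm, one column per year):

```python
import pandas as pd

# Hypothetical wide data: one row per firm, one column per year
wide = pd.DataFrame({
    "firm": ["A", "B"],
    "2020": [10, 20],
    "2021": [12, 18],
})

# set_index: turn the firm column into the index
indexed = wide.set_index("firm")

# stack: wide to long; the result is a Series indexed by (firm, year label)
long = indexed.stack()

# unstack: long back to wide
back_to_wide = long.unstack()

# reset_index: turn the index back into an ordinary column
flat = indexed.reset_index()

print(long)
print(back_to_wide)
```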
Statistical operations
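These functions can be called for a single variable in this form: <dfname>['<columnname>'].<function>(), or for all numerical columns at once using <dfname>.<function>(). In pandas 2.0 and later, some of them (such as mean) may need numeric_only=True if the dataframe also contains non-numeric columns. These functions also work within groups. ⭐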
Function | Description |
---|---|
count | Number of non-null observations |
sum | Sum of values |
mean | Mean of values |
mad | Mean absolute deviation (removed in pandas 2.0) |
median | Arithmetic median of values |
min | Minimum |
max | Maximum |
mode | Mode |
abs | Absolute value |
prod | Product of values |
std | Unbiased standard deviation |
var | Unbiased variance |
sem | Unbiased standard error of the mean |
skew | Unbiased skewness (3rd moment) |
kurt | Unbiased kurtosis (4th moment) |
quantile | Sample quantile (value at %) |
cumsum | Cumulative sum |
cumprod | Cumulative product |
cummax | Cumulative maximum |
cummin | Cumulative minimum |
nunique | How many unique values? |
value_counts | How many of each unique value are there? |
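The three ways of calling these functions (one column, all numerical columns, within groups) can be sketched as follows. The data is made up, and mean, cumsum, nunique, and value_counts stand in for any function in the table.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "firm":  ["A", "A", "B", "B"],
    "sales": [10, 12, 20, 18],
    "cost":  [7, 8, 15, 16],
})

# One column: <dfname>['<columnname>'].<function>()
col_mean = df["sales"].mean()

# All numerical columns at once; numeric_only=True skips the text column "firm"
all_means = df.mean(numeric_only=True)

# Within groups: put a groupby in front of the same function
group_means = df.groupby("firm")["sales"].mean()
running_total = df.groupby("firm")["sales"].cumsum()

# Counting unique values
n_firms = df["firm"].nunique()
firm_counts = df["firm"].value_counts()

print(all_means)
print(group_means)
```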