{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Common tasks\n", "\n", "```{important}\n", "\n", "Yes, this page is kind of long. But that's because it has a lot of useful info!\n", "\n", "Use the page's table of contents to the right to jump to what you're looking for. \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reshaping data\n", "\n", "In the [shape of data](02b_pandasVocab) page, I explained the concept of wide vs. tall data with this example: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tall:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FirmYearSales
0Ford200010
1Ford200112
2Ford200214
3Ford200316
4GM200011
5GM200113
6GM200213
7GM200315
\n", "
" ], "text/plain": [ " Firm Year Sales\n", "0 Ford 2000 10\n", "1 Ford 2001 12\n", "2 Ford 2002 14\n", "3 Ford 2003 16\n", "4 GM 2000 11\n", "5 GM 2001 13\n", "6 GM 2002 13\n", "7 GM 2003 15" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "\n", "df = (pd.Series({ ('Ford',2000):10,\n", " ('Ford',2001):12,\n", " ('Ford',2002):14,\n", " ('Ford',2003):16,\n", " ('GM',2000):11,\n", " ('GM',2001):13,\n", " ('GM',2002):13,\n", " ('GM',2003):15})\n", " .to_frame()\n", " .rename(columns={0:'Sales'})\n", " .rename_axis(['Firm','Year'])\n", " .reset_index()\n", " )\n", "print(\"Tall:\")\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "To reshape dataframes, you have to work with index and column names. \n", "```\n", "\n", "So before we use `stack` and `unstack` here, put the firm and year into the index." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "tall = df.set_index(['Firm','Year'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To convert a tall dataframe to wide: `df.unstack()`.\n", "\n", "If your index has multiple levels, the level parameter is used to pick which to unstack. \"0\" is the innermost level of the index. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Unstack (make it shorter+wider) on level 0/Firm:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
FirmFordGM
Year
20001011
20011213
20021413
20031615
\n", "
" ], "text/plain": [ " Sales \n", "Firm Ford GM\n", "Year \n", "2000 10 11\n", "2001 12 13\n", "2002 14 13\n", "2003 16 15" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Unstack (make it shorter+wider) on level 1/Year:\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
Year2000200120022003
Firm
Ford10121416
GM11131315
\n", "
" ], "text/plain": [ " Sales \n", "Year 2000 2001 2002 2003\n", "Firm \n", "Ford 10 12 14 16\n", "GM 11 13 13 15" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\"\\n\\nUnstack (make it shorter+wider) on level 0/Firm:\\n\") \n", "display(tall.unstack(level=0))\n", "print(\"\\n\\nUnstack (make it shorter+wider) on level 1/Year:\\n\") \n", "display(tall.unstack(level=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To convert a wide dataframe to tall/long: `df.stack()`.\n", "\n", "```{tip}\n", "Pay attention after reshaping to the order of your index variables and how they are sorted. \n", "```" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Stack it back (make it tall): wide_year.stack()\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
YearFirm
2000Ford10
GM11
2001Ford12
GM13
2002Ford14
GM13
2003Ford16
GM15
\n", "
" ], "text/plain": [ " Sales\n", "Year Firm \n", "2000 Ford 10\n", " GM 11\n", "2001 Ford 12\n", " GM 13\n", "2002 Ford 14\n", " GM 13\n", "2003 Ford 16\n", " GM 15" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Year-then-firm doesn't make much sense.\n", "Reorder to firm-year: wide_year.stack().swaplevel()\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
FirmYear
Ford200010
GM200011
Ford200112
GM200113
Ford200214
GM200213
Ford200316
GM200315
\n", "
" ], "text/plain": [ " Sales\n", "Firm Year \n", "Ford 2000 10\n", "GM 2000 11\n", "Ford 2001 12\n", "GM 2001 13\n", "Ford 2002 14\n", "GM 2002 13\n", "Ford 2003 16\n", "GM 2003 15" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Year-then-firm sorting make much sense.\n", "Sort to firm-year: wide_year.stack().swaplevel().sort_index()\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sales
FirmYear
Ford200010
200112
200214
200316
GM200011
200113
200213
200315
\n", "
" ], "text/plain": [ " Sales\n", "Firm Year \n", "Ford 2000 10\n", " 2001 12\n", " 2002 14\n", " 2003 16\n", "GM 2000 11\n", " 2001 13\n", " 2002 13\n", " 2003 15" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# save the wide df above to this name for subseq examples\n", "wide_year = tall.unstack(level=0) \n", "\n", "print(\"\\n\\nStack it back (make it tall): wide_year.stack()\\n\") \n", "display(wide_year.stack())\n", "print(\"\\n\\nYear-then-firm doesn't make much sense.\\nReorder to firm-year: wide_year.stack().swaplevel()\") \n", "display(wide_year.stack().swaplevel())\n", "print(\"\\n\\nYear-then-firm sorting make much sense.\\nSort to firm-year: wide_year.stack().swaplevel().sort_index()\") \n", "display(wide_year.stack().swaplevel().sort_index())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Beautiful!**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lambda (in `assign` or after `groupby`)\n", "\n", "You will see this inside pandas chains a lot: `lambda x: someFunc(x)`, e.g.:\n", "- `.assign(lev = lambda x: (x['dltt']+x['dlc'])/x['at'] )`\n", "- `.groupby('industry').assign(avglev = lambda x: x['lev'].mean() )`\n", "\n", "Q1: What is that \"lambda\"?\n", "\n", "A1: A lambda function is an anonymous function that is usually one line and usually defined without a name. 
You write it like this:\n", "\n", "```py\n", "lambda <arguments> : <expression>\n", "```\n", "\n", "Here, you can see how the lambda function takes inputs and creates output the same way a function does:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dumb_prog = lambda a: a + 10 # I added \"dumb_prog =\" to name the lambda function and use it\n", "dumb_prog(5)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we could define a function to do the exact same thing\n", "def dumb_prog(a):\n", " return a + 10\n", "dumb_prog(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Q2: Why is that lambda there? \n", "\n", "A2: We use lambdas when we need a function for a short period of time and when the name of the function doesn't matter. \n", "\n", "\n", " \n", "In the example above, `df.groupby('industry').apply(lambda x: x['lev'].mean() )`, \n", "1. groupby **splits** the dataframe into groups, \n", "2. then, within each group, it **applies** a function (here: the mean), \n", "3. and then returns a new object with one observation for each group (the average leverage for the industry). Visually, this **split-apply-combine**[^ref] process looks like this:\n", "\n", "![](https://jakevdp.github.io/PythonDataScienceHandbook/figures/03.08-split-apply-combine.png)\n", "\n", "[^ref]: This figure is yet another resource I'm borrowing from the awesome [PythonDataScienceHandbook](https://jakevdp.github.io/PythonDataScienceHandbook).\n", "\n", "But notice! The lambda function is working on these tiny, split-up pieces of the dataframe created by `df.groupby('industry')`. Those pieces are dataframe objects that don't have names! 
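To make the split-apply-combine steps above concrete, here is a minimal, self-contained sketch. The toy Ford/GM numbers are hypothetical stand-ins (echoing the data at the top of this page), not real firm data:

```python
import pandas as pd

# hypothetical toy data: two firms, two years of sales each
df = pd.DataFrame({'Firm':  ['Ford', 'Ford', 'GM', 'GM'],
                   'Sales': [10, 12, 11, 13]})

# split into per-firm pieces, apply the lambda to each (unnamed!) piece,
# then combine the results into one object with one row per firm
avg = df.groupby('Firm').apply(lambda x: x['Sales'].mean())
print(avg)  # Ford -> 11.0, GM -> 12.0
```

Inside the lambda, `x` is each firm's chunk of the dataframe, which is exactly the "unnamed object" issue discussed next.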
\n", "\n", "**So lambda functions let us refer to an unnamed dataframe object!** When you type `.assign(newVar = lambda x: someFunc(x))`, `x` is the object (\"some df object\") that assign is working on. Ta da!\n", "\n", "```python\n", "# common syntax within pandas\n", ".assign( = lambda : ) \n", "\n", "# often, tempname is just \"x\" for short\n", ".assign( = lambda x: ) \n", "\n", "# example:\n", ".assign(lev = lambda x: (x['dltt']+x['dlc'])/x['at'] )\n", "\n", "```\n", "\n", "```{note}\n", "It turns out that lambda functions are very useful in python programming, and not just within pandas. For example, some functions take functions as inputs, like [csnap()](#printing-inside-of-chains), `map()`, and `filter()`, and lambda functions let us give them custom functions quickly. \n", "\n", "But pandas is where we will use lambda functions most in this class.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `.transform()` after groupby\n", "\n", "Sometimes you get a statistic for a group, but you want that statistic in every single row of your original dataset.\n", "\n", "But `groupby` creates a new dataframe that is smaller, with only one row per row.\n", "\n", "```{admonition}\n", ":class: tip\n", "\n", "Use `.transform()` after `groupby` to \"cast\" those statistics back to the original \n", "\n", "```\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data
key
A1
A4
B2
B5
C3
C6
\n", "
" ], "text/plain": [ " data\n", "key \n", "A 1\n", "A 4\n", "B 2\n", "B 5\n", "C 3\n", "C 6" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd \n", "import numpy as np\n", "df = pd.DataFrame({'key':[\"A\",'B','C',\"A\",'B','C'],\n", " 'data':np.arange(1,7)}).set_index('key').sort_index()\n", "\n", "display(df) # the input" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data
key
A5
B7
C9
\n", "
" ], "text/plain": [ " data\n", "key \n", "A 5\n", "B 7\n", "C 9" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# groupby().sum() shrinks the dataset\n", "display(df.groupby(level='key')['data'].sum()\n", " .to_frame() ) # just added this line bc df prints prettier than series" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data
key
A5
A5
B7
B7
C9
C9
\n", "
" ], "text/plain": [ " data\n", "key \n", "A 5\n", "A 5\n", "B 7\n", "B 7\n", "C 9\n", "C 9" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# groupby().transform(sum) does NOT shrink the dataset\n", "\n", "df.groupby(level='key').transform(sum) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One last trick: Let's add that new variable to the original dataset!" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datagroupsum
key
A15
A45
B27
B57
C39
C69
\n", "
" ], "text/plain": [ " data groupsum\n", "key \n", "A 1 5\n", "A 4 5\n", "B 2 7\n", "B 5 7\n", "C 3 9\n", "C 6 9" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# option 1: create the var\n", "df['groupsum'] = df.groupby(level='key').transform(sum)\n", "\n", "# option 2: create the var with assign (can be used inside chains)\n", "df = df.assign(groupsum = df.groupby(level='key')['data'].transform(sum))\n", "\n", "display(df) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using non-pandas functions inside chains \n", "\n", "One problem with writing chains on dataframes is that you can only use methods that work on the object (a dataframe) that is getting chained. \n", "\n", "So for example, you've formatted dataframe to plot. You can't directly add a seaborn function to the chain: _Seaborn functions are methods of the package seaborn, not the dataframe._ (It's `sns.lmplot`, not `df.lmplot`.) \n", "\n", "`.pipe()` allows you to hand a dataframe to functions that don't work directly on dataframes. \n", "\n", "\n", "````{admonition} The syntax of .pipe()\n", "```python\n", "df.pipe(<'outside function'>, \n", " <'if the first parameter of the outside function isnt the df, '\n", " 'the name of the parameter that is expecting the dataframe'>,\n", " <'any other parameters youd give the outside function'>\n", "```\n", "\n", "Note that the object after the pipe command is run might not be a dataframe anymore! It's whatever object the piped function produces!\n", "````\n", "\n", "### Example 1\n", "\n", "[From one of the pandas devs:](https://tomaugspurger.github.io/method-chaining)\n", "\n", "> ```python\n", "> jack_jill = pd.DataFrame()\n", "> (jack_jill.pipe(went_up, 'hill')\n", "> .pipe(fetch, 'water')\n", "> .pipe(fell_down, 'jack')\n", "> .pipe(broke, 'crown')\n", "> .pipe(tumble_after, 'jill')\n", "> )\n", "> ```\n", "> \n", "> This really is just right-to-left function execution. 
The first argument to pipe, a callable, is called with the DataFrame on the left as its first argument, and any additional arguments you specify.\n", "> \n", "> I hope the analogy to data analysis code is clear. Code is read more often than it is written. When you or your coworkers or research partners have to go back in two months to update your script, having the story of raw data to results be told as clearly as possible will save you time.\n", "\n", "### Example 2\n", "\n", "[From Steven Morse:](https://stmorse.github.io/journal/tidyverse-style-pandas.html)\n", "\n", "> ```python\n", "> (sns.load_dataset('diamonds')\n", "> .query('cut in [\"Ideal\", \"Good\"] & \\\n", "> clarity in [\"IF\", \"SI2\"] & \\\n", "> carat < 3')\n", "> .pipe((sns.FacetGrid, 'data'),\n", "> row='cut', col='clarity', hue='color',\n", "> hue_order=list('DEFGHIJ'),\n", "> height=6,\n", "> legend_out=True)\n", "> .map(sns.scatterplot, 'carat', 'price', alpha=0.8)\n", "> .add_legend())\n", "> ```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Printing inside of chains\n", "\n", "```{tip}\n", "One thing about chains is that it's sometimes hard to know what's going on within them without just commenting out all the code and running it bit-by-bit. \n", "\n", "This function, `csnap` (meaning \"C\"hain \"SNAP\"shot), will let you print messages from inside the chain, by exploiting the `.pipe()` function we just covered!\n", "```\n", "\n", "![](https://media.giphy.com/media/Buy7YdhkyHBCM/source.gif)\n", "\n", "Copy this into your code:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def csnap(df, fn=lambda x: x.shape, msg=None):\n", " \"\"\" Custom help function to print things in method chaining. 
\n", " Will also print a message, which helps if you're printing a bunch of these, so that you know which csnap print happens at which point.\n", " Returns back the df to further use in chaining.\n", " \n", " Usage examples - within a chain of methods:\n", " df.pipe(csnap)\n", " df.pipe(csnap, lambda x: )\n", " df.pipe(csnap, msg=\"Shape here\")\n", " df.pipe(csnap, lambda x: x.sample(10), msg=\"10 random obs\")\n", " \"\"\"\n", " if msg:\n", " print(msg)\n", " display(fn(df))\n", " return df\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of this in use:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape before describe\n" ] }, { "data": { "text/plain": [ "(6, 2)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Shape after describe and pick one var\n" ] }, { "data": { "text/plain": [ "(8,)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Random sample of df at point #3\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dataones
max6.01
min1.01
\n", "
" ], "text/plain": [ " data ones\n", "max 6.0 1\n", "min 1.0 1" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dataonestwosthrees
count6.000000123
mean3.500000123
std1.870829123
min1.000000123
25%2.250000123
50%3.500000123
75%4.750000123
max6.000000123
\n", "
" ], "text/plain": [ " data ones twos threes\n", "count 6.000000 1 2 3\n", "mean 3.500000 1 2 3\n", "std 1.870829 1 2 3\n", "min 1.000000 1 2 3\n", "25% 2.250000 1 2 3\n", "50% 3.500000 1 2 3\n", "75% 4.750000 1 2 3\n", "max 6.000000 1 2 3" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df\n", " .pipe(csnap, msg=\"Shape before describe\")\n", " .describe()['data'] # get the distribution stats of a variable (I'm just doing something to show csnap off)\n", " .pipe(csnap, msg=\"Shape after describe and pick one var\") # see, it prints a message from within the chain!\n", " .to_frame()\n", " .assign(ones = 1)\n", " .pipe(csnap, lambda x: x.sample(2), msg=\"Random sample of df at point #3\") # see, it prints a message from within the chain! \n", " .assign(twos=2,threes=3)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prettier pandas output\n", "\n", "A few random things:\n", "\n", "- Want to change the order of rows in an output table? `.reindex()`\n", "- Want to format the numbers shown by pandas?\n", " 1. Permanent: Add this line of code to the top of your file: `pd.set_option('display.float', '{:.2f}'.format)`\n", " 2. Temp: Add `style.format` to the end of your table command. E.g.: `df.describe().style.format(\"{:.2f}\")`\n", "- Want to control the number of columns / rows pandas shows? \n", " 1. `pd.set_option('display.max_columns', 50)`\n", " 2. `pd.set_option('display.max_rows', 50)`\n", "- More formatting controls: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }