3.2.8. Exercises

Try these and compare your answers with a friend. There are many ways to solve each, so if your approaches differ, explain them to each other!

import pandas as pd
import pandas_datareader as pdr # IF NECESSARY, from terminal: pip install pandas_datareader 
import datetime
import numpy as np

start = datetime.datetime(2017, 1, 1) # you can specify start and end dates this way
end = datetime.datetime(2021, 1, 27)
macro_df = pdr.data.DataReader(['GDP','CPIAUCSL','UNRATE'], 'fred', start, end)

Part 1

During class, I used this dataframe to go over Pandas vocab, and we showed how to

  • access 1 variable (note: pd calls this a “series” object, which is a 1D object instead of a 2D object)

  • access multiple vars

  • access, print, and change column names

  • access, print, reset, and set the index
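As a quick refresher on the operations listed above, here is a minimal sketch. It uses a small made-up dataframe (`toy_df`, with invented values) in place of `macro_df` so that it runs without a download. The variable names are only for illustration.

```python
import pandas as pd

# Toy stand-in for macro_df: the values are made up, not FRED data
toy_df = pd.DataFrame(
    {
        "GDP": [100.0, 102.0, 104.0],
        "CPIAUCSL": [240.0, 241.0, 242.5],
        "UNRATE": [4.7, 4.6, 4.4],
    },
    index=pd.to_datetime(["2017-01-01", "2017-04-01", "2017-07-01"]),
)
toy_df.index.name = "DATE"

# access 1 variable: a single name gives back a Series (1D)
one_var = toy_df["GDP"]

# access multiple vars: a list of names gives back a DataFrame (2D)
two_vars = toy_df[["GDP", "UNRATE"]]

# access, print, and change column names
print(toy_df.columns.tolist())
renamed = toy_df.rename(columns={"CPIAUCSL": "CPI"})

# access, print, reset, and set the index
print(toy_df.index)
flat = toy_df.reset_index()    # DATE becomes an ordinary column
back = flat.set_index("DATE")  # and DATE is the index again
```

Note that `rename`, `reset_index`, and `set_index` return new objects here, so `toy_df` itself is unchanged.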


  • Q0: Apply each of the four new golden rules for initial data exploration, from the lecture.

  • Q1: What is the second series above?

  • Q2: What is the frequency of the series?

  • Q3: What is the average ANNUAL GDP, based on the data?
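If you are unsure how to check a frequency or go from sub-annual to annual numbers, the sketch below shows one possible pattern on a made-up quarterly series. Here `toy_gdp` and its values are invented for illustration and are not the FRED data.

```python
import pandas as pd

# Made-up quarterly series (NOT real GDP figures)
idx = pd.date_range("2017-01-01", periods=8, freq="QS")
toy_gdp = pd.Series(
    [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0],
    index=idx,
    name="GDP",
)

# ask pandas to guess the spacing of the dates (returns a frequency alias string)
print(pd.infer_freq(toy_gdp.index))

# average within each calendar year, then average across years
by_year = toy_gdp.groupby(toy_gdp.index.year).mean()
avg_annual = by_year.mean()
print(by_year)
print(avg_annual)
```

Whether the within-year step should be a mean or a sum depends on the units of the series, so check the series notes on FRED before deciding.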

# do your work here

Part 2

  • Q4: Download annual real GDP from 1960 to 2018 from FRED and compute the average annual percent change

  • Q5: Compute the average GDP percent change within each decade
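One possible way to approach percent changes and decade averages is sketched below on a made-up annual series that grows 2% per year. The name `toy` and the numbers are assumptions for illustration; you would swap in the series you download.

```python
import pandas as pd

# Made-up annual series indexed by year, growing exactly 2% per year
years = pd.Index(range(1960, 1980), name="year")
toy = pd.Series([100.0 * 1.02 ** i for i in range(len(years))],
                index=years, name="real_gdp")

# year-over-year percent change; the first year has no prior year, so it is NaN
pct = toy.pct_change() * 100

# overall average (mean() skips the NaN)
avg_pct = pct.mean()

# integer division turns 1967 into 1960, 1973 into 1970, and so on
decade = (pct.index // 10) * 10
by_decade = pct.groupby(decade).mean()

print(avg_pct)
print(by_decade)
```

If your index is a date rather than an integer year, build the decade key from `index.year` instead.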

# do your work here

Part 3

First, I’ll do the work to load January data on unemployment, the Case-Shiller housing index, and median household income in three states (CA/MI/PA).

Then, we’ll answer some questions


Run this block yourself, line-by-line, and part-by-part to figure out what I’m doing.

For example, just run the first three lines to download the data, then run


Try other arguments inside resample to see what works (and what it does) and what doesn’t work.
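To experiment with resample without waiting on a download, a small sketch like the one below may help. It uses made-up monthly data (`toy`) and the "start of period" aliases `'YS'` and `'QS'`. The course code uses `'Y'`, which labels each group by year end, and recent pandas releases may warn about that alias and suggest `'YE'`.

```python
import pandas as pd

# Made-up monthly data: x counts 1..24 over two years
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
toy = pd.DataFrame({"x": range(1, 25)}, index=idx)

# the first argument sets the bucket size; the method after it
# decides how to summarize the rows that fall in each bucket
first_of_year = toy.resample("YS").first()   # first observation in each year
mean_of_year = toy.resample("YS").mean()     # average over each year
last_of_quarter = toy.resample("QS").last()  # last observation in each quarter

print(first_of_year)
print(mean_of_year)
print(last_of_quarter)
```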


start = 1990 # pandas datareader can infer these are years
end = 2018
macro_data = pdr.data.DataReader(['CAUR','MIUR','PAUR', # unemployment 
                                  'LXXRSA','DEXRSA','WDXRSA', # Case-Shiller index in LA, Detroit, DC (no PA available!)
                                  'MEHOINUSCAA672N','MEHOINUSMIA672N','MEHOINUSPAA672N'], # median household income
                                 'fred', start, end)
macro_data = macro_data.resample('Y').first() # gets the first observation for each variable in a given year


macro_data.index = macro_data.index.year
print("\n\n DATA BEFORE FORMATTING: \n\n")
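print(macro_data[:20]) # see how the data looks now? ugly variable names, but it's an annual dataset at least
# relabel the columns with two levels: (variable, state)
# NOTE: these labels are assumed for illustration, and they assume the
# columns come back in the same order they were requested above
macro_data.columns = pd.MultiIndex.from_tuples([
    ('Unemployment','CA'),('Unemployment','MI'),('Unemployment','PA'),
    ('HouseIdx','CA'),('HouseIdx','MI'),('HouseIdx','PA'), # the DC index stands in for PA
    ('MedIncome','CA'),('MedIncome','MI'),('MedIncome','PA')])
print("\n\n DATA AFTER FORMATTING: \n\n")
print(macro_data[:20]) # this is a dataset that is "wide", and now 
                       # the column variable names have 2 levels - var name, 
                       # and unit/state that variable applies to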
  • Q6: for each decade and state, report the average annual CHANGE (level, not percent) in unemployment

  • Q7: for each decade and state, report the average annual PERCENT CHANGE in house prices and household income
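The difference between a level change and a percent change, grouped by decade, can be sketched on a made-up wide dataset with two-level column names. The labels (`Unemployment`, `HouseIdx`, `CA`, `MI`) and all values below are assumptions for illustration, not the downloaded data.

```python
import pandas as pd

# Made-up wide data: columns have two levels, (variable, state)
years = pd.Index(range(1990, 2010), name="year")
n = len(years)
toy = pd.DataFrame(
    {
        ("Unemployment", "CA"): [5.0 + 0.1 * i for i in range(n)],
        ("Unemployment", "MI"): [7.0 - 0.2 * i for i in range(n)],
        ("HouseIdx", "CA"): [100.0 * 1.05 ** i for i in range(n)],
        ("HouseIdx", "MI"): [80.0 * 1.01 ** i for i in range(n)],
    },
    index=years,
)

# decade key: 1990..1999 -> 1990, 2000..2009 -> 2000
decade = ((toy.index // 10) * 10).rename("decade")

# selecting the top level leaves one column per state
# diff() is the change in LEVEL from the prior year
level_change = toy["Unemployment"].diff().groupby(decade).mean()

# pct_change() is the change relative to the prior year, scaled here to percent
percent_change = toy["HouseIdx"].pct_change().mul(100).groupby(decade).mean()

print(level_change)    # rows are decades, columns are states
print(percent_change)
```

Both results have one row per decade and one column per state, which is the shape Q6 and Q7 ask for.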

# do your work here