3.2.8. Exercises

Try these and compare your answers with a friend. There are many ways to solve each one, so if your approaches differ, explain them to each other!

import pandas as pd
import pandas_datareader as pdr # IF NECESSARY, from terminal: pip install pandas_datareader 
import datetime
import numpy as np

start = datetime.datetime(2017, 1, 1) # you can specify start and end dates this way
end = datetime.datetime(2021, 1, 27)
macro_df = pdr.data.DataReader(['GDP','CPIAUCSL','UNRATE'], 'fred', start, end)

3.2.8.1. Part 1

During class, I used this dataframe to go over pandas vocab, and we showed how to

  • access 1 variable (note: pandas calls this a “Series” object, which is a 1D object instead of a 2D object)

  • access multiple vars

  • access, print, and change column names

  • access, print, reset, and set the index
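
As a refresher, the four operations listed above might look like the sketch below. It uses a tiny made-up dataframe (`toy_df` is a hypothetical stand-in with the same column names as `macro_df`, not real FRED data) so it runs without a download.

```python
import pandas as pd

# Made-up numbers standing in for macro_df (same column names, monthly dates)
toy_df = pd.DataFrame(
    {"GDP": [100.0, 101.0, 102.0],
     "CPIAUCSL": [240.0, 241.0, 242.0],
     "UNRATE": [4.7, 4.6, 4.4]},
    index=pd.date_range("2017-01-01", periods=3, freq="MS"),
)
toy_df.index.name = "DATE"

one_var = toy_df["UNRATE"]              # one variable -> a Series (1D)
two_vars = toy_df[["GDP", "UNRATE"]]    # list of names -> a DataFrame (2D)

print(toy_df.columns.tolist())          # access and print the column names
renamed = toy_df.rename(columns={"CPIAUCSL": "CPI"})  # change a column name

print(toy_df.index)                     # access and print the index
flat = toy_df.reset_index()             # index becomes a regular column named DATE
back = flat.set_index("DATE")           # and set it as the index again
```

Note that `rename`, `reset_index`, and `set_index` return new objects here, so `toy_df` itself is left unchanged.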

Questions:

  • Q0: Apply each of the four new golden rules for initial data exploration from the lecture.

  • Q1: What is the second series above?

  • Q2: What is the frequency of the series?

  • Q3: What is the average ANNUAL GDP, based on the data?

# do your work here

3.2.8.2. Part 2

  • Q4: Download annual real GDP from 1960 to 2018 from FRED and compute the average annual percent change.

  • Q5: Compute the average GDP percent change within each decade.
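
If you get stuck on the mechanics, here is one possible approach sketched on made-up numbers (the series `toy` is hypothetical, not real GDP). It assumes the index holds integer years; with a datetime index you would pull the year out first.

```python
import pandas as pd

# Made-up annual values indexed by year (not real GDP)
toy = pd.Series([100.0, 110.0, 121.0, 121.0, 133.1],
                index=[1968, 1969, 1970, 1971, 1972], name="toy_gdp")

growth = toy.pct_change() * 100       # percent change from the prior year; first value is NaN
decade = (growth.index // 10) * 10    # 1968 -> 1960, 1971 -> 1970

overall_avg = growth.mean()                   # average over all years (NaN is skipped)
by_decade = growth.groupby(decade).mean()     # average within each decade

print(overall_avg)
print(by_decade)
```

Integer division by 10 is just one way to label decades; other approaches can work equally well.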

# do your work here

3.2.8.3. Part 3

First, I’ll do the work to load January data on unemployment, the Case-Shiller housing index, and median household income in three states (CA/MI/PA).

Then, we’ll answer some questions.

Tip

Run this block yourself, line-by-line, and part-by-part to figure out what I’m doing.

For example, just run the first three lines to download the data, then run

macro_data.resample('Y')

Try other arguments inside resample to see what works (and what it does) and what doesn’t work.
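
To see what `resample` does on its own versus with an aggregation, here is a small sketch on made-up monthly data (`toy` is hypothetical). As an assumption of convenience it uses `'YS'` (yearly bins labeled by the start of the year) rather than `'Y'`, because newer pandas versions may warn about `'Y'`.

```python
import pandas as pd

# Made-up monthly values over two years (hypothetical, not FRED data)
toy = pd.DataFrame(
    {"x": range(1, 25)},
    index=pd.date_range("2000-01-01", periods=24, freq="MS"),
)

groups = toy.resample("YS")   # just a Resampler object: no numbers until you aggregate
first = groups.first()        # first observation in each year (January here)
last = groups.last()          # last observation in each year (December here)
avg = groups.mean()           # average of the 12 months in each year

print(type(groups).__name__)
print(first, last, avg, sep="\n\n")
```

The choice of aggregation matters: `first()`, `last()`, and `mean()` all give one row per year but different values.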

# LOAD DATA AND CONVERT TO ANNUAL

start = 1990 # pandas datareader can infer these are years
end = 2018
macro_data = pdr.data.DataReader(['CAUR','MIUR','PAUR', # unemployment 
                                  'LXXRSA','DEXRSA','WDXRSA', # Case-Shiller index in LA, Detroit, DC (no PA available!)
                                  'MEHOINUSCAA672N','MEHOINUSMIA672N','MEHOINUSPAA672N'], # median household income
                                 'fred', start, end)
macro_data = macro_data.resample('Y').first() # gets the first observation for each variable in a given year

# CLEAN UP THE FORMATTING SOMEWHAT

macro_data.index = macro_data.index.year
print("\n\n DATA BEFORE FORMATTING: \n\n")
print(macro_data[:20]) # see how the data looks now? ugly variable names, but it's an annual dataset at least
macro_data.columns=pd.MultiIndex.from_tuples([
    ('Unemployment','CA'),('Unemployment','MI'),('Unemployment','PA'),
    ('HouseIdx','CA'),('HouseIdx','MI'),('HouseIdx','PA'),
    ('MedIncome','CA'),('MedIncome','MI'),('MedIncome','PA')
    ])
print("\n\n DATA AFTER FORMATTING: \n\n")
print(macro_data[:20]) # this is a dataset that is "wide", and now 
                       # the column variable names have 2 levels - var name, 
                       # and unit/state that variable applies to

  • Q6: For each decade and state, report the average annual CHANGE (level, not percent) in unemployment.

  • Q7: For each decade and state, report the average annual PERCENT CHANGE in house prices and household income.
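
Two-level column names can be awkward at first, so here is a hedged sketch of how you might navigate them. It uses a made-up dataframe (`toy` is hypothetical, with only two states and two variables) and shows one possible approach, not the only one.

```python
import pandas as pd

# Made-up numbers with the same two-level column layout (variable, state)
toy = pd.DataFrame(
    [[5.0, 7.0, 100.0, 80.0],
     [6.0, 6.5, 110.0, 84.0],
     [5.5, 6.0, 121.0, 88.2]],
    index=[1999, 2000, 2001],
    columns=pd.MultiIndex.from_tuples([
        ("Unemployment", "CA"), ("Unemployment", "MI"),
        ("HouseIdx", "CA"), ("HouseIdx", "MI"),
    ]),
)

unemp = toy["Unemployment"]             # all states for one variable (columns: CA, MI)
ca_house = toy[("HouseIdx", "CA")]      # one variable for one state -> a Series

level_change = unemp.diff()                      # year-over-year change in levels
house_pct = toy["HouseIdx"].pct_change() * 100   # year-over-year percent change

decade = (toy.index // 10) * 10         # 1999 -> 1990, 2000 -> 2000
by_decade = level_change.groupby(decade).mean()  # average change per decade and state
print(by_decade)
```

The first year has no prior year to compare against, so its change is missing; decide how you want to treat that when averaging within a decade.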

# do your work here