3.2.8. Exercises
Try these and compare with a friend. There are many ways to solve each one, so if your approaches differ, explain them to each other!
The answers are available.[1]
import pandas as pd
import pandas_datareader as pdr # IF NECESSARY, from terminal: pip install pandas_datareader
import datetime
import numpy as np
start = datetime.datetime(2017, 1, 1) # you can specify start and end dates this way
end = datetime.datetime(2021, 1, 27)
macro_df = pdr.data.DataReader(['GDP','CPIAUCSL','UNRATE'], 'fred', start, end)
3.2.8.1. Part 1
During class, I used this dataframe to go over Pandas vocabulary and showed how to:
- access one variable (note: pd calls this a “series” object, which is 1D rather than 2D)
- access multiple variables
- access, print, and change column names
- access, print, reset, and set the index
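Below is a minimal sketch of those four operations on a small, made-up DataFrame. The values and the new column name unemp are invented for illustration and are not FRED data. It lets you see each step without downloading anything:

```python
import pandas as pd

# A tiny invented frame standing in for macro_df (made-up values, not FRED data)
idx = pd.to_datetime(["2017-01-01", "2017-04-01", "2017-07-01"])
df = pd.DataFrame({"GDP": [100.0, 101.0, 102.5],
                   "UNRATE": [4.7, 4.4, 4.3]}, index=idx)
df.index.name = "DATE"

one_var = df["GDP"]                 # single brackets -> 1D Series
two_vars = df[["GDP", "UNRATE"]]    # list of names -> 2D DataFrame

print(df.columns)                   # access/print the column names
df = df.rename(columns={"UNRATE": "unemp"})  # change one column name

df = df.reset_index()               # DATE becomes an ordinary column
df = df.set_index("DATE")           # ...and then goes back to being the index
```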
Questions:
Q0: Apply each of the four new golden rules for initial data exploration from the lecture.
Q1: What is the second series above?
Q2: What is the frequency of the series?
Q3: What is the average ANNUAL GDP, based on the data?
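For Q2 and Q3, one possible approach is sketched below on invented quarterly numbers (not real GDP). pd.infer_freq guesses a frequency from the dates, and grouping on the index's year averages the observations within each calendar year:

```python
import pandas as pd

# Invented quarterly values (NOT real GDP), dated at the start of each quarter
idx = pd.date_range("2017-01-01", periods=8, freq="QS")
s = pd.Series([10.0, 12.0, 14.0, 16.0, 20.0, 20.0, 22.0, 22.0], index=idx)

# Let pandas guess the frequency from the spacing of the dates
freq = pd.infer_freq(s.index)
print(freq)

# Average the quarterly observations within each calendar year
annual_avg = s.groupby(s.index.year).mean()
print(annual_avg)
```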
# do your work here
3.2.8.2. Part 2
Q4: Download annual real GDP from 1960 to 2018 from FRED and compute the average annual percent change.
Q5: Compute the average GDP percent change within each decade.
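Here is a minimal sketch of the mechanics for Q4 and Q5. It assumes annual data indexed by integer year, and the numbers are invented rather than taken from FRED. pct_change gives the year-over-year changes, and integer division by 10 turns each year into a decade label for groupby:

```python
import pandas as pd

# Invented annual levels (NOT FRED data), indexed by integer year
s = pd.Series([100.0, 110.0, 121.0, 121.0, 133.1],
              index=[1998, 1999, 2000, 2001, 2002])

growth = s.pct_change() * 100        # percent change from the prior year; first value is NaN
avg_growth = growth.mean()           # mean() skips the NaN by default

decade = (growth.index // 10) * 10   # 1998 -> 1990, 2001 -> 2000
by_decade = growth.groupby(decade).mean()
print(avg_growth)
print(by_decade)
```

This takes the arithmetic mean of the yearly growth rates. A compound (geometric) average growth rate would generally give a slightly different number.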
# do your work here
3.2.8.3. Part 3
First, I’ll load data on unemployment, the Case-Shiller house price index, and median household income for three states (CA/MI/PA), keeping the first observation of each year (January, for the monthly series).
Then, we’ll answer some questions.
Tip
Run this block yourself, line by line and part by part, to figure out what I’m doing.
For example, run just the lines that download the data, then run macro_data.resample('Y').
Try other arguments inside resample to see what works (and what it does) and what doesn’t.
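Here is a quick sketch of what resample does with a few other arguments, on an invented monthly series. It uses 'YS' (year start) and 'QS' (quarter start), so each bin is labeled by its first day. The 'Y' in the code below labels each year by its last day instead, and pandas 2.2 and later prefer spelling that year-end alias 'YE'.

```python
import pandas as pd

# Invented monthly series: values 0, 1, ..., 23 for Jan 2017 through Dec 2018
idx = pd.date_range("2017-01-01", periods=24, freq="MS")
monthly = pd.DataFrame({"x": range(24)}, index=idx, dtype=float)

resampler = monthly.resample("YS")   # on its own this is a grouping object, not data
first_obs = resampler.first()        # first (January) value in each year
yearly_mean = resampler.mean()       # average of the 12 months in each year
quarterly_mean = monthly.resample("QS").mean()  # one row per quarter
print(first_obs)
print(yearly_mean)
print(quarterly_mean)
```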
# LOAD DATA AND CONVERT TO ANNUAL
start = 1990 # pandas datareader can infer these are years
end = 2018
macro_data = pdr.data.DataReader(['CAUR','MIUR','PAUR',  # unemployment
                                  'LXXRSA','DEXRSA','WDXRSA',  # Case-Shiller index in LA, Detroit, DC (no PA available!)
                                  'MEHOINUSCAA672N','MEHOINUSMIA672N','MEHOINUSPAA672N'],  # median household income
                                 'fred', start, end)
macro_data = macro_data.resample('Y').first() # gets the first observation of each variable in each year
# CLEAN UP THE FORMATTING SOMEWHAT
macro_data.index = macro_data.index.year
print("\n\n DATA BEFORE FORMATTING: \n\n")
print(macro_data[:20]) # see how the data looks now? Ugly variable names, but it's an annual dataset at least
macro_data.columns = pd.MultiIndex.from_tuples([
    ('Unemployment','CA'), ('Unemployment','MI'), ('Unemployment','PA'),
    ('HouseIdx','CA'), ('HouseIdx','MI'), ('HouseIdx','PA'),  # the 'PA' house index is really DC (WDXRSA)
    ('MedIncome','CA'), ('MedIncome','MI'), ('MedIncome','PA')
])
print("\n\n DATA AFTER FORMATTING: \n\n")
print(macro_data[:20]) # this is a dataset that is "wide", and now
# the column variable names have 2 levels - var name,
# and unit/state that variable applies to
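With two-level column names, you can pull out variables by the top level, one exact column by a tuple, or one state across all variables with xs. Below is a sketch on a small invented frame shaped like macro_data. The numbers are made up.

```python
import pandas as pd

# Invented two-year frame shaped like macro_data (made-up values)
cols = pd.MultiIndex.from_tuples([
    ("Unemployment", "CA"), ("Unemployment", "MI"),
    ("MedIncome", "CA"), ("MedIncome", "MI"),
])
wide = pd.DataFrame([[5.0, 4.0, 70000.0, 55000.0],
                     [4.5, 4.2, 72000.0, 56000.0]],
                    index=[2017, 2018], columns=cols)

unemp = wide["Unemployment"]              # top level only: columns become CA, MI
ca_unemp = wide[("Unemployment", "CA")]   # one exact column, returned as a Series
ca_all = wide.xs("CA", axis=1, level=1)   # every variable for a single state
print(unemp)
print(ca_all)
```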
Q6: For each decade and state, report the average annual CHANGE (level, not percent) in unemployment.
Q7: For each decade and state, report the average annual PERCENT CHANGE in house prices and household income.
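One way to get a decade-by-state table for Q6 is sketched here on invented unemployment numbers, not the real download. diff computes each year's level change column by column, and grouping the rows by decade keeps the (variable, state) columns intact:

```python
import pandas as pd

# Invented annual unemployment rates for two states (NOT FRED data)
cols = pd.MultiIndex.from_tuples([("Unemployment", "CA"), ("Unemployment", "MI")])
wide = pd.DataFrame([[6.0, 5.0], [5.0, 5.5], [4.0, 6.0], [4.5, 5.0]],
                    index=[1998, 1999, 2000, 2001], columns=cols)

level_change = wide.diff()        # this year minus last year, per column (first row NaN)
decade = (wide.index // 10) * 10  # 1998 -> 1990, 2001 -> 2000

# Rows: decades; columns: (variable, state). mean() skips the leading NaN.
avg_change = level_change.groupby(decade).mean()
print(avg_change)
```

For Q7, replacing .diff() with .pct_change() * 100 gives percent changes instead of level changes.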
# do your work here