Resources, tutorials, data
Note
Anything that is bolded/underlined below is also considered essential.
If you have favorite resources, or ones you found helpful, please let me know and I’ll add them!
THE MOST ESSENTIAL RESOURCES
Help: Google, Stack Overflow, GitHub help, JupyterLab documentation, Python help
Cheat sheets to bookmark/print! Better yet, download these to your Class Notes repo/folder and put them inside a folder called “Cheatsheets”!
Included in this folder: python basics, jupyter notebook, importing data, numpy, pandas, seaborn, and scikit-learn
Supercharged finance packages (for use on the class project or after the semester), including
Packages to easily build apps/dashboards
A free way to get most of the functionality of a Bloomberg terminal
A coding co-pilot
Jobs, internships, grad school
Note: The tips for econ grad school apply to other fields, including finance.
Data Sources
To make this easy to update, I’ve put the best data sources in this Google Sheet. This will be especially useful when we get to thinking about projects!
Python
Note
If you use any of these and LIKE or DISLIKE them, PLEASE let me know so I can guide future students to resources.
Comprehensive, free class: Kaggle Learn covers basic programming concepts, Python, pandas, visualization, ML modeling, and more through tutorials with short exercises.
Essential: A Whirlwind Tour of Python
Lessons 3-5 of the official Python tutorial
datacamp.com has many self-guided lessons
Codecademy is great for beginners who just want to practice writing Python code, but the code does not run on your own machine or in JupyterLab, and it isn’t saved for later reference. Still, it’s an easy on-ramp, and you can probably blast through the key lessons before the free trial expires (currently 7 days).
The best compilation of coding resources on the web, including:
Data Science
Visualization
Essential: Kaggle’s Data Visualization tutorial is excellent. It has reproducible Python code and data.
Essential: An Economist’s Guide to Visualizing Data is excellent as well.
Essential: Data Visualization: A Practical Introduction, by Kieran Healy, is especially smart about the “whys” of visualization. The walkthroughs are in R, not Python, however.
GitHub, Git, and version control
Getting started on GitHub, and a Twitter-length description of how a project flows
The most thorough yet simple walkthrough of Git and GitHub on the web. It applies to Python projects for the most part.
Practicing ML
The datasets below are great for learning ML techniques: they tend to be small enough to explore and handle by hand.
scikit-learn (a Python package) can load several built-in datasets, including wine data and a larger California housing dataset. (The Boston real estate dataset was removed in scikit-learn 1.2.)
Essential: pandas, through the companion pandas-datareader package, can read in a LOT of useful data! Data providers include the Federal Reserve (“FRED”), Ken French, NASDAQ, OECD, Quandl, TSP, World Bank, and more!
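As a quick illustration of the scikit-learn datasets mentioned above, here is a minimal sketch that loads the bundled wine data into a pandas DataFrame for a first look. It assumes a recent scikit-learn (the `as_frame` option) with pandas installed. The California housing data would come from `fetch_california_housing()`, which downloads the file the first time it is called, so this sketch sticks to the wine data that ships with the package.

```python
from sklearn.datasets import load_wine

# as_frame=True returns the features and the target together as a pandas DataFrame
wine = load_wine(as_frame=True)
wine_df = wine.frame

# Rows and columns: 13 chemical measurements plus a "target" class column
print(wine_df.shape)

# How many wines fall into each of the three classes
print(wine_df["target"].value_counts().sort_index())

# First few rows, to eyeball the features
print(wine_df.head())
```

Small datasets like this are easy to inspect row by row, which is exactly why they are good for practicing.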
ML competitions with serious prizes at drivendata.org
This competition was interesting. You could start analyzing it here; this page has a good example of the process you might follow. Afterward, you can see the winner’s code and a discussion of the winning approach.
Essential: kaggle.com has ML competitions, FAQs, tutorials, and datasets
Real estate data, a tutorial exploring that data, and a pass at a model
Philly-based data would be fun. Here is one option for real estate data; it seems OK (N=805)
Predict box office revenue for movies. VaultML claims it can do this by reading screenplays and applying textual analysis tools
UC Irvine has a data repository; some of these datasets are available through the scikit-learn package
Predicting where a wine is from (wine/location): an easy starter challenge
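For the wine starter challenge, here is a minimal sketch of a first model. It assumes the UCI wine data that scikit-learn bundles as `load_wine` (its three classes label the grape cultivar of wines from one region of Italy); the linked challenge may use a different version of the data. The choice of a scaled logistic regression, a 25% test split, and `random_state=0` is illustrative, not a recommended final approach.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features are chemical measurements; y is the wine's class (0, 1, or 2)
X, y = load_wine(return_X_y=True)

# Hold out 25% of the wines for testing, keeping class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize features, then fit a multiclass logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Share of held-out wines classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is fit only on the training data, so no information from the test set leaks into the model.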
Books
Range, by David Epstein is a very interesting book generally, and it touches on prediction skill too
Superforecasters. Here is a decent free summary