Resources, Tutorials, and Data Sources
Anything that is bolded/underlined below is also considered essential.
If you have any favorite resources, or ones you found helpful, please let me know!
THE MOST ESSENTIAL RESOURCES
Cheat sheets to bookmark/print! Better yet, download these to your Class Notes repo/folder, and put them inside a folder called “Cheatsheets”!
Included in this folder: python basics, jupyter notebook, importing data, numpy, pandas, seaborn, and scikit-learn
Jobs, internships, grad school
Note: The tips for econ grad school apply to other fields, including finance.
If you use any of these and LIKE or DISLIKE them, PLEASE let me know so I can guide future students to resources.
Essential: A whirlwind tour of python
Lessons 3 - 5 of the official tutorial
datacamp.com has many self-guided lessons
Codecademy is great for beginners just trying to write python code, but the code is not run on your own machine or in JupyterLab, and isn’t saved for later reference. Still, it’s an easy on-ramp and you can probably blast through the key lessons before a free trial expires (currently 7 days).
Essential: Kaggle’s Data viz tutorial is excellent. It has reproducible code and data, using python.
Essential: An Economist’s Guide to Visualizing Data is excellent as well.
Essential: Data Visualization: A practical introduction, by Kieran Healy, is especially good at discussing the “whys” of visualization in a smart way. The walkthroughs are in R, not python, however.
Github, Git, and Version control
The most thorough yet simple walkthrough of Git and Github use on the web. Applies to python use for the most part.
Scikit-learn (a python package) can load some built-in datasets, including Boston real estate, wine, and a larger California housing dataset.
Essential: Pandas (via the pandas-datareader package) can read in a LOT of useful data! Data providers include: Federal Reserve (“FRED”), Ken French, NASDAQ, OECD, Quandl, TSP, World Bank, and more!
This comp was interesting. You could start trying to analyze it here. This has a good example of the process you might follow. After you’re done, you can see the winner’s code and discussion of the winning approach.
Essential: kaggle.com has ML competitions, some FAQs, tutorials, and data
Philly-based data would be fun. Here is one option, a real estate dataset that seems OK (N=805).
Predict box office for movies. VaultML claims they can do this by reading the screenplays and using textual analysis tools.
UC Irvine has a data repo; some of its datasets are available via the scikit-learn package.
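As a minimal sketch of loading one of the scikit-learn datasets mentioned above: this assumes scikit-learn (0.23 or later, for the `as_frame` option) and pandas are installed. It uses the wine data, which ships with the package and needs no download. The variable names are only illustrative.

```python
from sklearn.datasets import load_wine

# as_frame=True returns pandas objects instead of numpy arrays
wine = load_wine(as_frame=True)

# wine.frame holds the feature columns plus a "target" column
df = wine.frame

print(df.shape)                      # rows and columns
print(df["target"].value_counts())   # observations per wine class
print(df.describe().T.head())        # quick summary of the first few features
```

The California housing data can be loaded in a similar way with `fetch_california_housing`, though that function downloads the data on first use. The Boston loader was removed in scikit-learn 1.2, so older tutorials that call `load_boston` will fail on current versions.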