LeDataSciFi-2022
Syllabus
Objectives
Outcomes
About us + office hours
Course structure + policies
Grading
Gratitude
Schedule, tips, resources
Dashboard, key links, schedule
Help
Resources
Tips and hacks
Assignments and participation
How to start and turn in assignments
How to do your peer reviews
Midterm project AKA ASGN 5
Final projects
Textbook
1. Motivation and Getting Started
1.1. Motivation
1.2. Set up
1.3. GitHub Basics
1.4. The GitHub Workflow
1.5. Markdown basics
1.6. Jupyter Lab Basics
1.7. Python Basics
1.8. Debugging
1.9. Errors
1.10. Libraries/Packages
1.11. Gitignore Files
2. Good Analysis Practices
2.1. A case study of bad research
2.2. The golden rules
2.3. Organizing your projects
2.4. Filepaths
2.5. Storing data smartly
2.6. Functions
2.7. Writing good code comments
2.8. Writing good code
3. Wrangling with Data
3.1. Numpy
3.1.1. Numpy + Scientific Computing
3.1.2. A (Very) Short Introduction
3.1.3. Exercises
3.1.4. More Resources
3.2. Pandas
3.2.1. Tips
3.2.2. Vocab and Long vs Wide Data
3.2.3. Common Functions
3.2.4. Temp. vs Perm. Objects
3.2.5. Golden Rules + EDA
3.2.6. Pandas Chains
3.2.7. Common Tasks
3.2.8. Exercises
3.2.9. Summary and Resources
3.3. Data Visualization
3.3.1. The Role of Viz in Analysis
3.3.2. Making a Plot
3.3.3. Which Plot Type Should I Use?
3.3.4. Visual EDA
3.3.5. Better Plots
3.4. Other (Important) Data Wrangling Skills
3.4.1. Merging
3.4.2. Dealing with Missing Data
3.4.3. Dealing with Outliers
4. Accessing the World of Data
4.1. Getting Data off the Web
4.2. Opening + Parsing a Webpage
4.3. Building a spider
4.4. The Power of Textual Data
4.4.1. Working with Python Strings
4.4.2. Regex basics
4.4.3. Developing a regex
4.4.4. Finding words near each other
5. Data Science for Finance
5.1. Modeling and Teamwork
5.1.1. Machine Learning Gone Wrong
5.1.2. The modeling process
5.1.3. Coding in Teams
5.1.4. Sharing large files
5.2. Regression
5.2.1. Basics and Notation
5.2.2. Mechanics of running regressions
5.2.3. Goodness of Fit
5.2.4. Interpreting regression coefficients
5.2.5. Statistical significance
5.2.6. Summary and Resources
5.3. Intro to Machine Learning
5.3.1. The objective of machine learning
5.3.2. Model evaluation via cross validation (CV)
5.3.3. Model evaluation via Out-of-Sample (OOS)
5.3.4. Evaluating models
5.3.5. Model selection
5.3.6. The Cardinal Sin of ML: Data Leakage
5.4. Scikit-Learn for ML
5.4.1. Best Practice Pseudo Code
5.4.2. SKLearn Intro
5.4.3. Cross-Validation
5.4.4. Pipelines
5.4.5. Preprocessing
5.4.6. Optimizing a Model
5.4.7. Many Models
5.5. Finance Applications
5.5.1. Compounding returns
5.5.2. Estimating CAPM
5.5.3. Estimating better models
6. The Future
Index