LeDataSciFi-2024
Syllabus
Objectives
Outcomes
About us + office hours
Course structure + policies
ChatGPT and other AI
Grading
Interested but unsure?
Pre-Class Bootcamp
The Hall of Awesomeness
Gratitude
Schedule, tips, resources
Dashboard, key links, schedule
Help
Resources
Tips and hacks
Community Codebook
Class Handouts
Assignments and participation
How to start and turn in assignments
How to do your peer reviews
Midterm project AKA ASGN 5
Personal Website AKA ASGN 9
Final projects
Textbook
1. Motivation and Getting Started
1.1. Motivation
1.2. Set up
1.3. GitHub Basics
1.4. The GitHub Workflow
1.5. Markdown basics
1.6. Jupyter Lab Basics
1.7. Python Basics
1.8. Debugging
1.9. Errors
1.10. Libraries/Packages
1.11. Gitignore Files
2. Good Analysis Practices
2.1. A case study of bad research
2.2. The golden rules
2.3. Organizing your projects
2.4. Filepaths
2.5. Storing data smartly
2.6. Functions
2.7. Writing good code comments
2.8. Writing good code
3. Wrangling with Data
3.1. Numpy
3.1.1. Numpy + Scientific Computing
3.1.2. A (Very) Short Introduction
3.1.3. Exercises
3.1.4. More Resources
3.2. Pandas
3.2.1. Tips
3.2.2. Vocab and Long vs Wide Data
3.2.3. Common Functions
3.2.4. Temp. vs Perm. Objects
3.2.5. Golden Rules + EDA
3.2.6. Pandas Chains
3.2.7. Common Tasks
3.2.8. Exercises
3.2.9. Summary and Resources
3.3. Data Visualization
3.3.1. The Role of Viz in Analysis
3.3.2. Making a Plot
3.3.3. Which Plot Type Should I Use?
3.3.4. Visual EDA Walkthrough
3.3.5. Visual EDA Tools
3.3.6. Better Plots
3.3.7. Customizing figures
3.3.8. Interactive plots: plotly
3.3.9. Exercises
3.3.10. Chapter Summary
3.4. Other (Important) Data Wrangling Skills
3.4.1. Merging
3.4.2. Dealing with Missing Data
3.4.3. Dealing with Outliers
3.4.4. Chapter Summary
4. Building and Using Large (Textual) Data
4.1. Getting Data off the Web
4.2. Opening + Parsing a Webpage
4.3. Building a spider
4.4. Exploiting Textual Data
4.4.1. Working with Python Strings
4.4.2. Regex basics
4.4.3. Developing a regex
4.4.4. Intro to NLP - The Anchor Phase Technique
5. Data Science Intro
5.1. The promise of ML
5.2. Planning a project
5.3. ML Gone Wrong
5.4. Coding in Teams
5.5. Sharing large files
6. Regression
6.1. Basics and Notation
6.2. Mechanics of running regressions
6.3. Goodness of Fit
6.4. Interpreting regression coefficients
6.5. Statistical significance
6.6. Significant warnings about “statistical significance”
6.7. Fixed effects, categorical variables, and prettier regression tables
6.8. Summary and Resources
7. ML - Intro & Discussion
7.1. The objective of machine learning
7.2. Data Leakage - Illustration
7.3. Data Leakage - Illustration 2
7.4. Data Leakage Defined
7.5. Model evaluation via cross validation (CV)
7.6. Model evaluation via Out-of-Sample (OOS)
7.7. Evaluating models
7.8. Model selection
8. ML - Code & Implementation
8.1. Best Practice Pseudo Code
8.2. SKLearn Intro
8.3. Cross-Validation
8.4. Pipelines
8.5. Preprocessing
8.6. Optimizing a Model
8.7. Many Models
9. Finance Applications
9.1. Compounding returns
9.2. Rolling returns
9.3. Expanding returns
9.4. Estimating CAPM
9.5. Estimating better models
9.6. Supercharged resources/packages
10. The Future
10.1. Saving Your Work
10.2. Faster, better, stronger
10.3. Dashboards
Slides and Memes
Slides from previous classes
Memes
Index