LeDataSciFi-2025
Syllabus
Objectives - Spring 2025 (NOT fall)
Outcomes
ChatGPT and other AI
Interested but unsure?
The Hall of Awesomeness
Gratitude
Tips and resources
Help
Resources
Tips and hacks
Memes
Textbook
1. Motivation and Getting Started
1.1. Motivation
1.2. Set up - Spring 2025 (NOT fall 2025)
1.3. GitHub Basics
1.4. The GitHub Workflow
1.5. Markdown basics
1.6. Jupyter Lab Basics
1.7. Python Basics
1.8. Debugging
1.9. Errors
1.10. Libraries/Packages
1.11. Gitignore Files
2. Good Analysis Practices
2.1. A case study of bad research
2.2. The golden rules
2.3. Organizing your projects
2.4. Filepaths
2.5. Storing data smartly
2.6. Functions
2.7. Writing good code comments
2.8. Writing good code
3. Wrangling with Data
3.1. Numpy
3.1.1. Numpy + Scientific Computing
3.1.2. A (Very) Short Introduction
3.1.3. Exercises
3.1.4. More Resources
3.2. Pandas
3.2.1. Tips
3.2.2. Vocab and Long vs Wide Data
3.2.3. Common Functions
3.2.4. Temp. vs Perm. Objects
3.2.5. Golden Rules + EDA
3.2.6. Pandas Chains
3.2.7. Common Tasks
3.2.8. Exercises
3.2.9. Summary and Resources
3.3. Data Visualization
3.3.1. The Role of Viz in Analysis
3.3.2. Making a Plot
3.3.3. Which Plot Type Should I Use?
3.3.4. Visual EDA Walkthrough
3.3.5. Visual EDA Tools
3.3.6. Bin Scatter Plots
3.3.7. Bin Scatter + Pair Plots
3.3.8. Better Plots
3.3.9. Customizing figures
3.3.10. Interactive plots: plotly
3.3.11. Exercises
3.3.12. Chapter Summary
3.4. Other (Important) Data Wrangling Skills
3.4.1. Merging
3.4.2. Dealing with Missing Data
3.4.3. Dealing with Outliers
3.4.4. Chapter Summary
4. Building and Using Large (Textual) Data
4.1. Getting Data off the Web
4.2. Opening + Parsing a Webpage
4.3. Building a spider
4.4. Exploiting Textual Data
4.4.1. Working with Python Strings
4.4.2. Regex basics
4.4.3. Developing a regex
4.4.4. Intro to NLP - The Anchor Phrase Technique
5. Data Science Intro
5.1. The promise of ML
5.2. Planning a project
5.3. ML Gone Wrong
5.4. Coding in Teams
5.5. Sharing large files
6. Regression
6.1. Basics and Notation
6.2. Mechanics of running regressions
6.3. Goodness of Fit
6.4. Interpreting regression coefficients
6.5. Statistical significance
6.6. Significant warnings about “statistical significance”
6.7. Fixed effects, categorical variables, and prettier regression tables
6.8. Summary and Resources
7. ML - Intro & Discussion
7.1. The objective of machine learning
7.2. Data Leakage - Illustration
7.3. Data Leakage - Illustration 2
7.4. Data Leakage Defined
7.5. Model evaluation via cross validation (CV)
7.6. Model evaluation via Out-of-Sample (OOS)
7.7. Evaluating models
7.8. Model selection
8. ML - Code & Implementation
8.1. Best Practice Pseudo Code
8.2. SKLearn Intro
8.3. Cross-Validation
8.4. Pipelines
8.5. Preprocessing
8.6. Optimizing a Model
8.7. Many Models
9. Finance Applications
9.1. Compounding returns
9.2. Rolling returns
9.3. Expanding returns
9.4. Estimating CAPM
9.5. Estimating better models
9.6. Does this signal make a tradeable anomaly?
9.7. Open Asset Pricing
9.8. Trading on stock return predictions
9.9. Supercharged resources/packages
10. The Future
10.1. Saving Your Work
10.2. Faster, better, stronger
10.3. Dashboards