LeDataSciFi-2021
About the class
Objectives
Outcomes
About us + office hours
Course structure + policies
Grading
Gratitude
Schedule, tips, resources
Dashboard, key links, schedule
Help + resources
Tips and hacks
Assignments and participation
How to start and turn in assignments
How to do your peer reviews
ASGN 5: Our first full data science assignment
Final projects
Textbook
1. Motivation and Getting Started
1.1. Motivation
1.2. Set up
1.3. GitHub Basics
1.4. Markdown basics
1.5. Jupyter Lab Basics
1.6. Python Basics
1.7. Debugging
1.8. Errors
1.9. Installing libraries
1.10. Gitignore Files
2. Good Analysis Practices
2.1. A case study of bad research
2.2. The golden rules
2.3. Good data
2.4. When to write functions
2.5. Reorganizing the bad research folder
2.6. Filepaths
3. Wrangling with Data
3.1. Numpy
3.1.1. Numpy + Scientific Computing
3.1.2. A (Very) Short Introduction
3.1.3. Exercises
3.1.4. More Resources
3.2. Pandas
3.2.1. Tips
3.2.2. Vocab and Long vs Wide Data
3.2.3. Common Functions
3.2.4. Temp. vs Perm. Objects
3.2.5. Golden Rules + EDA
3.2.6. Pandas Chains
3.2.7. Common Tasks
3.2.8. Exercises
3.2.9. Summary and Resources
3.3. Data Visualization
3.3.1. The Role of Viz in Analysis
3.3.2. Making a Plot
3.3.3. Which Plot Type Should I Use?
3.3.4. Visual EDA
3.3.5. Better Plots
3.4. Other (Important) Data Wrangling Skills
3.4.1. Merging
3.4.2. Dealing with Missing Data
3.4.3. Dealing with Outliers
4. Accessing the World of Data
4.1. Getting Data off the Web
4.2. Opening + Parsing a Webpage
4.3. Building a spider
4.4. Strings
4.4.1. Working with Python Strings
4.4.2. Regex basics
4.4.3. Developing a regex
4.4.4. Finding words near each other
5. Data Science for Finance
5.1. Modeling and Teamwork
5.1.1. Machine Learning Gone Wrong
5.1.2. The modeling process
5.1.3. Coding in Teams
5.1.4. Sharing large files
5.2. Regression
5.2.1. Basics and Notation
5.2.2. Mechanics of running regressions
5.2.3. Goodness of Fit
5.2.4. Interpreting regression coefficients
5.2.5. Statistical significance
5.2.6. Summary and Resources
5.3. Intro to Machine Learning
5.3.1. The objective of machine learning
5.3.2. Model evaluation via cross validation (CV)
5.3.3. Model evaluation via Out-of-Sample (OOS)
5.3.4. Evaluating models
5.3.5. Model selection
5.3.6. Doing ML Wrong is Easy
5.4. Scikit-Learn for ML
5.4.1. Best Practice Pseudo Code
5.4.2. SKLearn Intro
5.4.3. Cross-Validation
5.4.4. Pipelines
5.4.5. Preprocessing
5.4.6. Optimizing a Model
5.4.7. Many Models
5.5. The Future
5.5.2.1. Saving Your Work
Index