1.1. Motivation

1.1.1. Or: Should I take this class (A: YES!)

Employers are starving for talented students who can use Python, and they are willing to pay for it:

Python is #2 for jobs with a high salary:


Within “data science” itself, it has the most jobs:


Among the common data science languages, it is the most popular on Stack Overflow (SO):


Why is that? Well, it is not just about rote coding. It’s about a set of complementary skills, used within a framework for problem solving.

1.1.2. The data scientist approach

  1. Define a problem/project in a valuable way: A clearly specified question with metrics for success and idea of impact. Always keep the big picture in mind!

    • Valuable skill even if you never code again!

  2. Work collaboratively on the problem.

    • Interesting problems are big and rarely solo: Sir Edmund Hillary needed Tenzing Norgay to climb Everest.

    • Valuable skill even if you never code again! Understanding how to manage the way team projects evolve (whether the team is producing a document, slides, or whatnot) is extremely valuable. Imagine spending a week on a report, and then your boss Jan [1] or your coworker Jean-Ralphio [2] undoes your week of work by editing a version of the report that is a week old. ARRRRRG!

  3. Acquire data and clean it. Age-old wisdom tells us that if the input is crap, the output will be… Time spent on cleaning is often more valuable than time spent on modeling.

  4. Explore the data.

  5. Analyze the data using appropriate modeling tools. This is <25% of the work on most projects.

  6. Deliver the project conclusions to higher-ups in the form of clear business recommendations. Writing should always be geared to the audience: managers typically want bottom lines, whereas technical leads need more technical justification.

    • Valuable skill even if you never code again!
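To make steps 3 through 5 a little more concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the tiny dataset is made up, and the names `RAW_CSV`, `load_rows`, `clean`, and `fit_line` are hypothetical helpers, not tools this class prescribes. Real projects will use bigger data and richer libraries, but the shape of the work is similar.

```python
import csv
import io
from statistics import mean

# Step 3a: acquire. This made-up CSV text stands in for a real download or scrape.
RAW_CSV = """hours_studied,exam_score
1,52
2,58
,61
4,70
5,
6,83
"""


def load_rows(text):
    """Read CSV text into a list of dicts (every value is still a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 3b: drop rows with a missing value and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if row["hours_studied"] and row["exam_score"]:
            cleaned.append((float(row["hours_studied"]), float(row["exam_score"])))
    return cleaned


def fit_line(pairs):
    """Step 5: ordinary least squares for score = intercept + slope * hours."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


rows = load_rows(RAW_CSV)
data = clean(rows)

# Step 4: explore. Look at simple summaries before modeling anything.
print(f"kept {len(data)} of {len(rows)} rows after cleaning")
print(f"average score: {mean(y for _, y in data):.1f}")

# Step 5: analyze. Note how few lines this is compared to the steps before it.
intercept, slope = fit_line(data)
print(f"each extra hour is associated with about {slope:.1f} more points")
```

Notice that the modeling function is the smallest part of the script; most of the lines deal with getting the data into usable shape, which matches the point above about where the time goes.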

1.1.3. From here to there…

I’ve designed this class with the hope that you’ll be prepared and able to execute each of those steps.

By the end of the semester, your resume can (should you choose) include your (burgeoning) proficiency with Python, GitHub, machine learning (ML) tools, web scraping, and data viz, in addition to describing your exploits on GitHub and the final course project.

So, in terms of data scientists, your journey this semester is hopefully something like: [3]

Our bargain (not Faustian, I hope)

This class is ambitious! You will need to learn skills from computer science, statistics, and econometrics just so that we have the toolkit needed to begin analysis. I’m aiming to make each of those components accessible (e.g. we won’t prove any theorems, and I’m boiling down programming to essentials). Still, that menu of skills is not easy to acquire (that’s why employers pay $$$ for it!), and…

THUS: You will have to work outside of class quite a bit.

And if you’ve never programmed:

  • I swear, young’uns these days have it so much easier! [4]

  • Seriously, getting Python up and running has never been quicker, and we will have some working code soon!

  • You will be frustrated at times. This is natural! No programmer exists who has not cursed their computer to the depths of hell.

    • This is completely true: Half the time, it’s a silly typo on line 42 of your code. Like, you literally misspelled “regression” as “regresion”.

    • Corollary: A lot of programming takes place after dark, under the influence of coffee and Red Bull. This is why you misspelled “regresion”. Try to program at times when you have a clearer mind :)

  • Overcoming those frustrating issues feels soooooo good. You’ll feel a sense of accomplishment. Fight for that!

  • Your classmates are in it too, and they can, and surely will, help.

My half of the bargain: I will work just as hard as you throughout the semester to improve this new class. It’s experimental, so some things (lectures, assignments) will succeed and aid you along that journey towards being Terminator 2 Linda Hamilton, and some things I try will fail. When something doesn’t work out, I’ll try to improve it.

Related: When you have questions in class, ask! Falling behind is costly, and asking a question is cheap. If you’re confused or having computer issues, someone else surely is too. If you’re stuck outside of class (homework, assignments, etc.), see the resources section of the website for a set of things you can do. After trying the options there, you can always… (come to the drop-in hours!)



I guess that makes me your old assistant on the journey…