The final project

There is no one way to do all of this, but if you feel as though you can go further, you probably can, and you should.

The project will have several deliverable stages to keep us on track, according to the schedule. Changes to that master schedule supersede any dates below.

  1. 15% Project proposal. Discussed below.

  2. 5% Project proposal (final revision). Discussed below.

  3. 20% Project status report. Discussed below.

  4. 45% Written component - report and presentation files. Discussed below.

  5. 15% Presentations. Discussed below.

A note on ambition

Ambition will be considered when grading the written component and results.

The goal of the project is not to simply take a pre-cleaned dataset and run a basic analysis. You should collect data, from one or many sources, and combine them into a usable, clean dataset. Consider the depth of your analysis as you gather and clean this data.

As a silly example: Are you just gathering data about the height and weight of a sample of individuals and looking at the correlation between these two variables? That level is not sufficient for the end goal of this project. You would be better off using the data to build a model that predicts an individual's gender from a height and weight entered by a user.
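
To make the difference concrete, here is a minimal sketch of the "one step further" version of the silly example. Everything in it is an assumption for illustration: the six data points are made up, the labels "F" and "M" are placeholders, and the nearest-centroid rule is just one simple method, not the method you must use. A real project would use collected data, scale the features, and check predictions on held-out observations.

```python
from statistics import mean

# Hypothetical toy sample, made up for illustration only:
# (height_cm, weight_kg, label)
SAMPLE = [
    (160.0, 55.0, "F"),
    (165.0, 60.0, "F"),
    (158.0, 52.0, "F"),
    (178.0, 80.0, "M"),
    (183.0, 85.0, "M"),
    (175.0, 76.0, "M"),
]

def fit_centroids(rows):
    """Return {label: (mean_height, mean_weight)} for each label in rows."""
    by_label = {}
    for height, weight, label in rows:
        by_label.setdefault(label, []).append((height, weight))
    return {
        label: (mean(h for h, _ in points), mean(w for _, w in points))
        for label, points in by_label.items()
    }

def predict(centroids, height, weight):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        ch, cw = centroids[label]
        return (height - ch) ** 2 + (weight - cw) ** 2
    return min(centroids, key=dist)

centroids = fit_centroids(SAMPLE)
# A user-entered height and weight is classified by the nearest group average.
print(predict(centroids, 162, 57))
```

The correlation tells you the two variables move together; the classifier turns the same data into a prediction you can evaluate, which is the kind of extra step the project is looking for.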

Always ask yourself if you can take your analysis one step further. If you answered yes, go for it! Use your data to extract information, gather this information, and draw conclusions.

Initial proposals

  • General idea: The question you are interested in, the data you need to acquire, the variables you’ll use, the plan for how you’ll analyze it (what methods you’ll try and why you think they apply to your problem), and considerations about how the data might affect that plan.

  • Treat this document as if it is public facing, and a proposal for which you would like research funding. That is, the proposal document should be polished (both in visual formatting and editing) for external audiences.

  • Graded on: question viability, creativity, finance application, plan sketch, writing quality.

  • Instructions for the proposals are here.

Final proposals

  • Graded on: The improvement from the prior version, how feedback was incorporated, and current status.

Project status report

  • General idea: You’ve now acquired the key data and finished most of the data cleaning.

  • Purpose: Needs to show progress and that you’re on track!

  • Ideal deliverable: A notebook file with nice data sections describing data source(s) and how you got/cleaned the data. This section could go straight into your final report if it’s polished enough.

  • Actual deliverable: A notebook file that

    • describes (short bullet points) your data sources,

    • outlines (numbered list, broad steps, not minutiae) how you acquired the data (for many groups, the downloading is in a separate file), got the data into Python, and cleaned up any issues you found in the data (again, possibly in a different file)

    • includes a bullet point list of your main observations from your EDA

    • shows your exploratory data analysis (EDA) (tables and figures and whatnot, does not need to be pretty or formatted)
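
As a rough sketch of what the load, clean, and explore steps can look like in a notebook cell, the example below uses only the Python standard library. The CSV text, the column names (ticker, year, revenue), and the rule of dropping rows with a blank value are all hypothetical choices made for illustration; your own data and cleaning decisions will differ, and many groups will prefer a dataframe library.

```python
import csv
import io
from statistics import mean, median

# Hypothetical raw extract, made up for illustration.
# A real project would read its own downloaded file instead.
RAW_CSV = """ticker,year,revenue
AAA,2020,100.0
AAA,2021,
BBB,2020,250.0
BBB,2021,300.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts (all values are still strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows, numeric_col):
    """Drop rows where numeric_col is blank and convert the rest to float."""
    cleaned = []
    for row in rows:
        value = row[numeric_col].strip()
        if value == "":
            continue  # a blank cell is treated as missing and dropped
        cleaned.append({**row, numeric_col: float(value)})
    return cleaned

def summarize(rows, numeric_col):
    """Return count, mean, median, min, and max of one numeric column."""
    values = [row[numeric_col] for row in rows]
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

raw = load_rows(RAW_CSV)
clean = clean_rows(raw, "revenue")
summary = summarize(clean, "revenue")
print(f"dropped {len(raw) - len(clean)} row(s) with missing revenue")
print(summary)
```

Printing how many rows each cleaning step removed gives you ready-made material for the bullet list of issues you found and cleaned up.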

Report and Presentation Files

On the due date (listed in the schedule), your repo should be cleaned and polished for publication. That means it should be free of excess and random files, the folders should be sensible (data, temporary, code), and the readme should help me, the TA, and future visitors explore your repo easily. Your folder structure is up to you and will depend on the nature of your particular project, but I should be able to easily find:

  • Your final report

  • Your presentation file

  • The code used to scrape and download data (if you click-and-download anything, include a link to the source), and the code used to load, clean, merge, and explore the data. These can be separate files.
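
If you want a quick self-check before the due date, something like the sketch below can flag missing top-level items. Only the folder names data, temporary, and code come from the description above; the file name README.md and the helper name missing_items are assumptions, so adjust the list to match your own structure.

```python
from pathlib import Path

# Assumed layout: the folder names come from the text above,
# README.md is a hypothetical file name; edit this list for your repo.
EXPECTED = ["data", "temporary", "code", "README.md"]

def missing_items(repo_root, expected=EXPECTED):
    """Return the expected top-level names that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).exists()]

if __name__ == "__main__":
    # Run from the top of the repo; prints nothing if everything is present.
    for name in missing_items("."):
        print(f"missing: {name}")
```

This only checks that the pieces exist. Whether a visitor can actually find your report, presentation, and code easily is still something to test by reading your readme with fresh eyes.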