The final project
This project wraps up all the skills in the class so far:
Ask an interesting question or highlight a problem to be solved
Get the necessary data to address it, performing EDA as needed to clean it
Analysis - question dependent
Communicating your results: notebook + presentation + website
A note on ambition
The goal of the project is not to simply take a pre-cleaned dataset and run a basic analysis. You should collect data, from one or many sources, and combine them into a usable, clean dataset.
Ambition is a non-trivial portion of the grade. Ambition in data acquisition, analysis methods, and website presentation will be considered and rewarded.
Components
As with any (interesting) project, the path from now to completion will not be a straight line.
So, to keep us on track, the project will have several deliverable stages, according to the schedule. Changes to that master schedule supersede any dates below.
Initial proposals (10%)
General idea: The question/problem you are interested in, the data you need to acquire, the variables you’ll use, the plan for how you’ll analyze it (what methods you’ll try and why you think they apply to your problem), and considerations about how the data might affect that plan.
Treat this document as if it is public facing, and a proposal for which you would like funding. That is, the proposal document should be polished (both in visual formatting and editing) for external audiences.
Graded on: question viability, creativity, finance application, plan sketch, writing quality.
Instructions for the proposals are here.
Final proposals (10%)
I will provide feedback on your proposals to help you “right size” your goal and avoid some potholes.
Graded on: The improvement from the prior version, how feedback was incorporated, and quality after revisions
Project status report (15%)
General idea: You’ve now acquired the key data and finished most of the data cleaning.
Purpose: Needs to show progress and that you’re on track!
Ideal deliverable: A notebook file with nice data sections describing data source(s) and how you got/cleaned the data. This section could go straight into your final report if it’s polished enough.
Actual deliverable: A notebook file that
describes (short bullet points) your data sources,
outlines (numbered list, broad steps, not minutiae) how you acquired the data (for many groups, the downloading is in a separate file), got the data into Python, and cleaned up any issues you found with the data (again, possibly in a different file)
includes a bullet point list of the main observations from your EDA
shows your exploratory data analysis (EDA) (tables and figures and whatnot, does not need to be pretty or formatted)
Graded on: Data you have, EDA shown and discussed
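The load, clean, merge, and explore steps described above could look roughly like the sketch below. It is only an illustration under assumptions: the two small tables, their column names (`ticker`, `year`, `ret`, `industry`), and the helper names `clean_returns` and `merge_sources` are all hypothetical stand-ins, and a real project would load its own files and make its own cleaning choices.

```python
import pandas as pd

# Hypothetical stand-ins for two data sources; a real project loads its own files.
prices = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "year": [2020, 2021, 2020, 2021, 2021],
    "ret": [0.10, -0.05, 0.02, None, 0.07],
})
firms = pd.DataFrame({
    "ticker": ["AAA", "BBB", "DDD"],
    "industry": ["Tech", "Retail", "Energy"],
})


def clean_returns(df):
    """Drop rows with a missing return and duplicate ticker-year rows."""
    out = df.dropna(subset=["ret"])
    out = out.drop_duplicates(subset=["ticker", "year"])
    return out.reset_index(drop=True)


def merge_sources(returns, info):
    """Left-merge firm info onto returns and flag rows that found no match."""
    return returns.merge(
        info,
        on="ticker",
        how="left",
        validate="many_to_one",  # raises if `info` has repeated tickers
        indicator=True,          # adds a `_merge` column for diagnostics
    )


clean = clean_returns(prices)
merged = merge_sources(clean, firms)

# Quick EDA: how many rows matched, and summary statistics by industry.
match_counts = merged["_merge"].value_counts()
summary = merged.groupby("industry")["ret"].describe()
print(match_counts)
print(summary)
```

Checking the `_merge` counts before moving on is one way to spot rows that silently failed to match, which is the kind of observation that belongs in the bullet list of EDA findings.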
Repo at submission (20%)
On the due date (listed in the schedule), your repo should be cleaned and polished for publication. That means it should be free of excess and random files, the folders should be sensible (data, temporary, code), and the readme should help me, the TA, and future visitors explore your repo easily. Your folder structure is up to you and will depend on the nature of your particular project, but I should be able to easily find:
The readme, which should contain a link to the website built off this analysis
The code used to scrape and download data (and, if you click-and-download anything, a link to the source), and the code used to load, clean, merge, and explore the data. These can be separate files.
The code used to do the analysis
Your presentation file needs to be in this repo. If you use Google Slides, you should include them as a PDF in this repo or put a link to the slides in the readme.
Graded on: Folder organization, readme, code readability/structure
Obvious caveats for grading: Form matters, check grammar, and cite work you build on. Plagiarism is not acceptable.
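Before submitting, a group could run a quick self-check on the repo layout. The sketch below is one possible way to do that, not a grading tool: the function name `check_repo` and the default folder names are assumptions, since the folder structure is up to each group, and the link test only looks for any `http(s)` URL in the readme.

```python
import re
from pathlib import Path


def check_repo(root, folders=("data", "code")):
    """Return a list of problems found in the repo layout (empty if none).

    The folder names are hypothetical defaults; pass your own to match
    whatever structure suits your project.
    """
    root = Path(root)
    problems = []

    # Each expected folder should exist as a directory.
    for name in folders:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")

    # Look for a readme file at the top level (any extension, any case).
    readmes = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.lower().startswith("readme")
    )
    if not readmes:
        problems.append("missing readme")
    elif not re.search(r"https?://\S+", readmes[0].read_text(encoding="utf-8")):
        # The readme should link to the project website.
        problems.append("readme has no link")

    return problems
```

Calling `check_repo(".")` from the repo root returns a list of problems, which is empty when the folders and a readme with a link are all present. It does not judge readability or whether the folders are sensible, so it complements a manual read-through rather than replacing it.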
Website / Dashboard (25%)
We will talk about this more later.
Obvious caveats for grading: Form matters, check grammar, and cite work you build on. Plagiarism is not acceptable.
Presentation (20%)
I’ll discuss scheduling later.
You have 15 minutes
Everyone should contribute
There will be Q&A (from me)
Teach your classmates and me something! Strive for clarity and try to make something about it memorable.
Method: You can present a PowerPoint, a Jupyter file, or Jupyter slides (nice!). I’ll leave it up to your group to present in the manner you consider most effective for your project.
Time: Each group will have up to 15 minutes to present your project, so build your presentation file accordingly. Try to avoid “speed talking” to make the time work. Less is more, usually. Sadly, 15 minutes won’t be enough to show everything you did, so focus on big picture details rather than on the syntax of line 89 of your code.
Content: A presentation’s structure is tailored even more to its material than a report is, so what your slides show is up to you. Be creative, and have fun. Try to convey to me and your peers why the question is interesting, describe plainly your approach and why your approach makes sense, what your main analytical findings are, and what you concluded from the exercise. You can even show/use your website during your presentation if you want.
Enjoyment: Don’t be afraid to “market yourselves”! If you did something impressive (tons and tons of data, or an impressive scraper, or a great model), find a way to tastefully show your classmates (and me) the cool stuff you did!
Obvious caveats for grading: Form matters, check grammar, and cite work you build on. Plagiarism is not acceptable.