Full instructions for the proposals

In the project repo, create a file called “proposal.md”. It should cover two big things:

  1. The research question. It should be precise (NOT VAGUE), the hypothesis clear, and the metrics well defined.

  2. The necessary data. This should be realistically acquirable within our time frame. There are a lot of data resources on the website, including FRED, ourworldindata.org, and the SEC’s EDGAR.

The template below is only a starting point. You can modify it; just remember to sell me on the idea and explain why it’s both interesting and feasible.


Research Proposal: < Title >

By X, Y, and Z

Research Question

This section should cover:

  1. What do we want to know or what problems are we trying to solve? As in the midterm, you should list (1) the “bigger” question/debate/problem you’re interested in, and also (2) the specific research question(s) you’ll actually try to answer.

    • The research question will be smaller in scope than the big picture question. But the answer to your specific research question should shed light on the bigger question (although it likely won’t conclusively answer it).

  2. If your project is about relationships, what are the hypotheses you’re testing?

  3. If your project is about prediction, what are your metrics of success? (What are you maximizing?) Can you find a baseline from prior work to give you a ballpark to aim for?
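
To make "metric of success" and "baseline" concrete, here is a minimal sketch for a prediction project. Everything in it is an assumption for illustration: the numbers are made up, RMSE is just one possible metric, and "always predict the mean" is only one possible naive baseline (a figure from prior work is better if you can find one).

```python
from statistics import mean

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted)) ** 0.5

# Made-up outcome values and model predictions, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
model_pred = [2.5, 3.5, 6.5, 7.5]

# Naive baseline: always predict the average outcome.
# (In a real project, compute this average on training data only.)
baseline_pred = [mean(actual)] * len(actual)

baseline_rmse = rmse(actual, baseline_pred)
model_rmse = rmse(actual, model_pred)

print(f"baseline RMSE: {baseline_rmse:.3f}")
print(f"model RMSE:    {model_rmse:.3f}")
```

A proposal can then state its goal in these terms: which metric you will report, and which baseline number you need to beat for the result to count as a success.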

Necessary Data

This section should cover:

  1. What does the final dataset need to look like? (This is mostly dictated by the question and the availability of data.)

    • What is an observation (e.g. a firm, a firm-year, etc.)?

    • What is the sample period?

    • What are the sample conditions? (Years, plus any restrictions you anticipate, e.g. excluding or requiring some industries.)

    • What variables are absolutely necessary and what would you like to have if possible?

  2. What data do we have and what data do we need?

  3. How will we collect more data?

  4. What are the raw inputs and how will you store them (the folder structure(s) for each input type)?

  5. Speculate at a high level (not specific code!) about how you’ll transform the raw data into the final form.
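
The proposal itself should stay high level, but the following sketch may help you picture what items 1 and 4 are asking for. It is only an illustration under stated assumptions: the folder names (`inputs/`, `outputs/`), the source names, the `firm`/`year` columns, and both helper functions are hypothetical, not a required layout.

```python
from pathlib import Path
import tempfile

# Hypothetical raw input types; one subfolder per type.
RAW_INPUTS = ["fred", "edgar"]

def make_layout(root):
    """Create inputs/<source>/ for each raw source, plus an outputs/ folder."""
    root = Path(root)
    for source in RAW_INPUTS:
        (root / "inputs" / source).mkdir(parents=True, exist_ok=True)
    (root / "outputs").mkdir(parents=True, exist_ok=True)
    return root

def duplicate_keys(rows, key=("firm", "year")):
    """Return the key values that appear more than once.

    If an observation is a firm-year, each (firm, year) pair
    should appear exactly once in the final dataset.
    """
    seen, dupes = set(), set()
    for row in rows:
        k = tuple(row[col] for col in key)
        if k in seen:
            dupes.add(k)
        else:
            seen.add(k)
    return dupes

# Made-up firm-year rows standing in for the final dataset.
panel = [
    {"firm": "A", "year": 2019, "sales": 10.0},
    {"firm": "A", "year": 2020, "sales": 12.0},
    {"firm": "B", "year": 2019, "sales": 7.0},
]

# Build the layout in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = make_layout(tmp)
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )

print(created)
print(duplicate_keys(panel))
```

Stating the unit of observation up front gives you a cheap sanity check later: after every merge or reshape, the key that defines an observation should still be unique.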


Acknowledgment: We are effectively answering questions 1.1-1.3 and 2.1-2.3 from DS100 in this proposal.