5.4. Coding in Teams
Collaborative coding is so essential to solving interesting finance problems that it underlies the objectives at the front of this website.
This page is focused on helping your team attack the project in the most effective way. It also includes a few things that will push your existing GitHub comfort level up and make your team more productive.
I would love your feedback (on the discussion board) on how you deal with the asynchronous work problem!
How did you decide to approach the collaboration on this project? (See the questions below.)
Please let me know what issues/problems your group runs into.
What solutions did you try? Did they work well or only barely or not at all?
Flip side: If your group has an easy time, or finds something that works well, please let me and your classmates know!
Any comments will probably help your classmates and future years!
Q: How should you “meet”?
A: It’s up to you! Be entrepreneurial and run your group as you all see fit. (WhatsApp, GroupMe, Google Docs, Zoom, Skype…)
Q: How should you approach working concurrently on the project?
A: You basically have three approaches:
Sequentially divide tasks and conquer: Person A does part 1, then Person B does part 2 after A is done.
For example, in asgn-05 we had three files: download_wiki, measure_risk, and analysis. You can split up your project in a similar fashion.
Main advantages: Specialization + this “gives ownership” to one person for each part
Co-work on a task simultaneously: Persons A and B hold a Zoom meeting and co-code on Person A’s computer via screen share + remote control. Advantage: More brainpower, and good when the whole group is stuck.
Separately attack the same task, then combine your answers: Persons A and B separately do part 1, compare answers/approach, and put together a finalized solution to part 1. This creates duplicate and discarded work product, but will generate more ideas for getting to the solution.
Q: How do we work in the project repo “at the same time”?
The main issue is that two people might make conflicting changes. E.g., Johnny added a line to data.py but Cindy deleted a line from
A: You have, basically, two approaches, and you might use both at different points of the project:
Free-for-all approach. Everyone works in the “master” branch of the repo all the time. This is what your default instinct might be. It can work, but you will probably have to fix merge conflicts to proceed at some point.
The “branching” approach. Basically, you create a branch (your own working copy of the “master” branch), and when you’ve finished your changes, you create a “pull request” where you ask the main project’s owner (you and your own team, in this case) to pull your branch’s changes into the master branch. See the demo video below.
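To make the conflict problem described above concrete, here is a minimal Python sketch. It is a toy approximation, not Git's actual merge algorithm: it flags a likely conflict when two teammates' edits touch the same or adjacent lines of a shared starting file. The file contents and the function names (`touched_regions`, `likely_conflict`) are hypothetical.

```python
from difflib import SequenceMatcher


def touched_regions(base, edited):
    """Return the (start, end) line ranges of `base` that `edited` changed."""
    matcher = SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag != "equal"]


def likely_conflict(base, mine, theirs):
    """True if the two edits overlap or touch the same spot in `base`."""
    for s1, e1 in touched_regions(base, mine):
        for s2, e2 in touched_regions(base, theirs):
            if s1 <= e2 and s2 <= e1:
                return True
    return False


# Hypothetical contents of data.py at the last sync.
base = [
    "import pandas as pd",
    "df = pd.read_csv('prices.csv')",
    "df = df.dropna()",
    "df['ret'] = df['price'].pct_change()",
    "df.to_csv('clean.csv')",
]

# Johnny adds a line right after the read_csv line.
johnny = base[:2] + ["df = df.sort_values('date')"] + base[2:]

# Cindy deletes the dropna line (right next to Johnny's edit)...
cindy_near = base[:2] + base[3:]
# ...or, in another scenario, only the final line (far from Johnny's edit).
cindy_far = base[:4]

print(likely_conflict(base, johnny, cindy_near))
print(likely_conflict(base, johnny, cindy_far))
```

The first call reports a likely conflict because both edits land on the same spot in the file; the second does not, because the edits are several lines apart. Edits like the second pair can usually be combined automatically, while edits like the first pair are the kind someone has to resolve by hand.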
Warning! Warning! Warning!
FOLLOW THE GitHub WORKFLOW RULES EVERY SINGLE TIME YOU WORK ON CODE OR DO ANYTHING IN THE REPO
BEFORE YOU START ANY WORK FOR THE DAY: Go to GH Desktop and “Fetch/Pull” origin
WHEN YOU ARE DONE WITH A WORKING SESSION: Clear your code, rerun all, save file, then commit and push to cloud
If you forget to fetch/pull before you start (and someone made a change on the GitHub repo since you last synced), or if someone is working at the same time (and pushes a change to the GitHub repo that conflicts with a change you made), you are likely to receive a “Merge Conflict” notification from GH Desktop.
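For anyone who prefers a terminal, the two workflow rules above map roughly onto a handful of Git commands. The sketch below is an assumed command-line equivalent, not a description of what GH Desktop runs internally; the function names and the commit message are made up for illustration. It only prints the commands unless `dry_run=False` is passed, in which case Git must be installed and the script must be run from inside the repo folder.

```python
import shlex
import subprocess


def start_of_session():
    """Commands to run BEFORE starting work: sync with the GitHub copy."""
    return [
        ["git", "fetch", "origin"],  # see what changed on GitHub
        ["git", "pull"],             # bring those changes into your local branch
    ]


def end_of_session(message):
    """Commands to run AFTER a working session: save and share your work."""
    return [
        ["git", "add", "--all"],           # stage every changed file
        ["git", "commit", "-m", message],  # record the changes locally
        ["git", "push"],                   # send them to GitHub
    ]


def run(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


run(start_of_session())
run(end_of_session("Add risk measures"))
```

Keeping the default at `dry_run=True` means nothing touches the repo until you have read the printed commands and decided they are what you want.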
Other Recommendations and Advice
Your most experienced coder might be given “CEO” status over the repo, lead the way on pull requests, and give guidance on merge conflicts.
Follow the golden rules in Chapter 2. It is all about tips to make building a project less painful, more productive, and more rewarding.
For example: Instead of putting the entire project in one ipynb file, structure the project like the midterm assignment:
One code file to download each input needed,
One code file to parse/transform each input,
One “get_all_data” code file that, if executed, would run all files above
One code file to build the analysis sample, explore it, and analyze it
It’s better to over-communicate than under-communicate, especially in our virtual world
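The “get_all_data” file from the project structure recommended above could look something like the following. This is only a sketch under assumptions: the script names in `PIPELINE` are hypothetical placeholders for your own download and parse files, and `run_all` is a made-up helper name. It uses the standard library's `runpy.run_path` to execute each file in order, as if you had run them one by one.

```python
import runpy
from pathlib import Path

# Hypothetical file names: replace with your project's own scripts,
# listed in the order they need to run (downloads first, then parsing).
PIPELINE = [
    "download_prices.py",
    "download_filings.py",
    "parse_prices.py",
    "parse_filings.py",
]


def run_all(scripts=PIPELINE, folder="."):
    """Run each script in order; raise if a listed script is missing."""
    completed = []
    for name in scripts:
        path = Path(folder) / name
        if not path.exists():
            raise FileNotFoundError(f"Expected pipeline step is missing: {path}")
        # Execute the file as if it were launched directly as a script.
        runpy.run_path(str(path), run_name="__main__")
        completed.append(name)
    return completed
```

Calling `run_all()` from the project folder would then rebuild every input in one go. Because each step lives in its own file, teammates can own different steps without editing the same file at the same time.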
5.4.1. Branching Demo
Above, I mentioned that one way that multiple people can work in the same repo at the same time is by “branching”. Rather than explaining it, let’s let one of our TAs do a walkthrough on how this can work!
Here’s the side text from the video:
Open GH Desktop and create a toy repo
start new branch “my work”
add data/data.txt into folder (to simulate some work you’ve done)
see how GH Desktop sees a change?
click to try to switch branch (don’t though)
it says “leave changes on this branch or bring to master” –> only the branch you’re “in” can see/push changes you make
commit to branch
publish up to GH website
view on GH
switch branches to see the new files
compare: you’re able to merge
can explain your argument for changes (to convince others to adopt in distributed projects), submit
look at master branch - it should have data/data.txt
create “cynthia_did_some_work.txt” which says inside: “while i was sleeping”
go back to desktop like you’re going to work on the project
go to master… pulling origin would sync it, but don’t
go to “my work” branch
fetch / update from master: this gets the cynthia file, and I can continue
push this new file back up to my own branch on GH’s servers
make a new fake work file
merge into main one more time
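The walkthrough above uses GH Desktop, but the same flow can be approximated with Git commands. The mapping below is a rough sketch under assumptions: the branch is written `my-work` because Git branch names cannot contain spaces, the default branch is called `master` as in the demo, the commit message is made up, and the pull request itself is still opened and merged on the GitHub website. The commands are only printed, never executed.

```python
import shlex

# Each entry pairs a step from the demo with an approximate Git equivalent.
DEMO_STEPS = [
    ("start new branch", ["git", "checkout", "-b", "my-work"]),
    ("stage the new file", ["git", "add", "data/data.txt"]),
    ("commit to branch", ["git", "commit", "-m", "Add data file"]),
    ("publish branch to GitHub", ["git", "push", "--set-upstream", "origin", "my-work"]),
    # (open and merge the pull request on the GitHub website here)
    ("fetch what teammates added", ["git", "fetch", "origin"]),
    ("update my branch from master", ["git", "merge", "origin/master"]),
    ("push the updated branch", ["git", "push"]),
]


def cheat_sheet(steps=DEMO_STEPS):
    """Return one printable line per step: description, then command."""
    width = max(len(desc) for desc, _ in steps)
    return [f"{desc.ljust(width)}  {shlex.join(cmd)}" for desc, cmd in steps]


print("\n".join(cheat_sheet()))
```

The “update my branch from master” step is the command-line counterpart of picking up the cynthia file in the demo: your branch absorbs what landed on master while you were working, so your next pull request starts from the current state of the project.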