1.3. GitHub Basics¶
1.3.1. Why Github?¶
Note: I use Windows and will be of less help for Mac users. The TA, however, uses Mac and will be more helpful when issues stem from OS differences. GitHub, in particular, might be finicky if you have outdated versions of Safari or any other browser.
We will be using GitHub a lot in this course:
All of your course-related work will go on GitHub.
Discussion / help / announcements will happen on GitHub. (Yes, announcements!)
This entire website is on GitHub!
Assignments are posted on GitHub.
But why GitHub? Because it’s tremendously effective for developing a project. It is used by Apple, Uber, Netflix, Google, Microsoft, Bitcoin, CERN, Chinese censors (wait, what?), and many more large, sophisticated, multi-billion dollar entities.
It’s useful for (1) cloud storage, (2) collaboration, and (3) version control.
Let’s get started!
1.3.2. GitHub as cloud storage¶
At the very least, GitHub allows for cloud storage, like Google Drive and Dropbox do. There’s a bit more structure than just storing files under your account:
Repositories (aka “repo”): All files must be organized into repositories. Think of these as folders with self-contained projects. These can either be public or private.
User Accounts vs. Organization Accounts (aka “Org”): All repositories belong to an account:
A user account is the account you just made, and typically holds repositories related to your own work.
An Organization account can be owned by multiple people, and typically holds repositories relevant to a group (like STAT 545).
The awesome-python repo is a “curated list of awesome Python frameworks, libraries, software and resources”.
The ledatascifi-2021 repo, within its corresponding LeDataSciFi Org, contains these lectures.
By the end of lecture one, and before the next class, you should have completed the following:
“Clone” a GitHub repo to your computer. The folder and files on your computer are called the “local” repo; the copy on GitHub is the “remote” repo.
Modify the repo in the cloud and, then, “fetch” those changes to your computer. Think of “fetch” as syncing your computer to catch up with any changes in the master.
Modify the repo on your computer, and then, “push” those changes to the cloud. Think of “push” as syncing the master to catch up with any changes from your computer.
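The clone, fetch, and push exercises above can be sketched as a toy model. This is only an illustration under loud assumptions: each "repo" is a plain Python dict of file names and contents, and the function names `clone`, `fetch`, and `push` are hypothetical stand-ins, not calls into Git or GitHub Desktop. Real Git tracks commits and history rather than overwriting files.

```python
# Toy model only: a "repo" is a dict mapping file names to file contents.
# Real Git tracks commits and history; this just mimics the idea of syncing.

def clone(cloud_repo):
    """Make an independent copy of the cloud repo on your computer."""
    return dict(cloud_repo)

def fetch(cloud_repo, computer_repo):
    """Sync the computer copy so it catches up with the cloud copy."""
    computer_repo.clear()
    computer_repo.update(cloud_repo)

def push(computer_repo, cloud_repo):
    """Sync the cloud copy so it catches up with the computer copy."""
    cloud_repo.clear()
    cloud_repo.update(computer_repo)

cloud = {"README.md": "# My first repo\n"}
computer = clone(cloud)

# Modify the repo in the cloud, then fetch that change to the computer.
cloud["notes.txt"] = "written on github.com\n"
fetch(cloud, computer)

# Modify the repo on the computer, then push that change to the cloud.
computer["analysis.py"] = "print('hello')\n"
push(computer, cloud)

print(sorted(cloud))  # ['README.md', 'analysis.py', 'notes.txt']
```

After the final push, both copies hold the same three files, which is the state you want before you stop working.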
With those exercises done, you’re ready to work with GitHub. We will also explore:
What a good
Understand what the .gitignore file is and how/why to use it
What a merge conflict is and how to resolve it
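To give a feel for what a .gitignore file does, here is a rough sketch in Python. It assumes a simplified matching rule based on the standard library's `fnmatch`; Git's real rules are richer (negation with `!`, directory-only patterns, and more). The patterns, file names, and the `is_ignored` helper are hypothetical examples, not taken from the course repos.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns, as they might appear one per line in a .gitignore file.
IGNORE_PATTERNS = ["*.pyc", ".ipynb_checkpoints", "data/raw/*"]

def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the whole path, or any single folder/file name in it,
    matches one of the patterns (a simplification of Git's actual rules)."""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )

files = [
    "analysis.py",
    "cache/module.pyc",
    "data/raw/big_download.csv",
    ".ipynb_checkpoints/notebook-checkpoint.ipynb",
    "README.md",
]

# Only files that match no pattern would be tracked and uploaded.
tracked = [f for f in files if not is_ignored(f)]
print(tracked)  # ['analysis.py', 'README.md']
```

The point of ignoring files is to keep clutter (caches, checkpoints) and large or private data out of the repo while still tracking your actual work.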
1.3.3. GitHub for collaboration¶
The “traditional” way to collaborate involves sending files over email. But emails get buried, and, also… who has the most recent version, and what is it?
Git(Hub) solves this!
Git (just “Git”) is a distributed version control system. Basically: “Imagine if Dropbox and the “Track changes” feature in MS Word had a baby. Git would be that baby.” It’s great for us because it’s optimized for code.
GitHub (not just “Git”) is built on top of the Git system. Among the many added features that make collaboration easier, two are worth highlighting:
The GitHub repository is treated as the “master version”.
You can (and probably should!) use GitHub Issues instead of email to track open tasks.
Issues are a discussion board corresponding to a particular repository.
One “thread” is called an Issue. Some features:
You can tag other GitHub users using @username
Get email notifications if you are tagged, or are Watching a repository.
As an example, check out the Issues in the ggplot2 repository. People raise issues of all kinds and, when they are solved, “Close” the issue. You can
We will talk about collaboration later. Suffice it to say, managing group tasks is of paramount importance in virtually all jobs you might have after college.
1.3.3.1. Collaboration practice¶
“Exercise 1”: VERY IMPORTANT
classmates team within the LeDataSciFi org. (You have to click on the gradebook link in coursesite (classroom.github.com/…) and then I’ll invite you.)
classmates team. THIS IS WHERE CLASS ANNOUNCEMENTS WILL BE POSTED.
You should now get an email notification whenever the TA or I post an Issue, or whenever a classmate asks a question.
“Exercise 2”: Use the discussion board
Introduce yourself with 2 truths and a lie, and we can go from there.
1.3.4. GitHub for version control with Git¶
Why version control? In addition to the awful “file naming conundrum” in the comic above,
Don’t fret removing stuff
Leave a breadcrumb trail for troubleshooting
“Undo” and navigate a previous state
Helps you define your work
The way you work on a project with GitHub is by following what I will call the GitHub Workflow:
Fetch early, commit frequently, push often!
This habit will help you avoid disasters, so that you get the positive features of GitHub without the headaches.
1. Make your coffee, open GitHub Desktop, and FETCH the project you’ll work on.
Change the “current repository” to the assignment you want to work on (or project, or your notes repo, etc.)
Click “Fetch origin” to download any changes from the master repo on the GitHub servers. This is important, because if someone else changed the files while you were sleeping, you’ll get the most updated files to work on.
Start your work on your computer.
If you don’t “fetch” before you start, it becomes easier to change a file that someone else has already changed differently, creating a conflict. When this happens, you have to resolve the conflicting files before moving on.
2. “COMMIT” FREQUENTLY (say every 30 minutes or so, but depends on the team/task):
Save the files you’re working on. (Just like you would while working on a PowerPoint or Word document.)
When you save the file, GitHub Desktop (GHD) will notice it has been changed.
Go to GHD. Notice that your file is listed as a “changed” file.
Describe those changes in the “Summary” and (optionally) “Description” boxes, and click the blue “Commit” button.
Try to do this every time you save your files! It will make rolling back changes easier.
Do this early and often
3. “PUSH” OFTEN, but probably less than you commit (say every 60-90 minutes or so, but depends on the team/task):
Push your changes to the cloud by clicking the blue “push” button in GHD.
Now, you’ve got an up-to-date backup and teammates can see the changes and work with the latest files.
GHD will warn you if someone else made a change in the meantime. If this happens, click “fetch” to download what they did. If there is a conflict between your work and your teammate’s, you’ll have to resolve it.
Being careful about these steps might seem pointless during solo projects, but I encourage you to practice these good habits now, so that when you do collaborative work, you’re protected from mistakes.
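To see why fetching first matters, here is a minimal sketch of how a conflict arises. It assumes a deliberately simplified rule: each file is compared as a whole against the last version both people shared. Git itself merges changes within a file, so two people can often edit different parts of the same file without a conflict. The `merge_file` function is a hypothetical illustration, not part of Git or GitHub Desktop.

```python
def merge_file(base, mine, theirs):
    """Toy three-way merge of one file, treated as a single unit.

    base   -- the last version both people had
    mine   -- my current version
    theirs -- my teammate's current version
    Returns (merged_text, conflict), where merged_text is None on conflict.
    """
    if mine == theirs:
        return mine, False      # same result on both sides, nothing to resolve
    if mine == base:
        return theirs, False    # only my teammate changed it: take theirs
    if theirs == base:
        return mine, False      # only I changed it: keep mine
    return None, True           # both changed it differently: conflict

base = "x = 1\n"

# I fetched before starting, so only my teammate's change is new to me.
print(merge_file(base, base, "x = 3\n"))      # ('x = 3\n', False)

# I skipped the fetch, and we both edited the same file in different ways.
print(merge_file(base, "x = 2\n", "x = 3\n"))  # (None, True)
```

In the second case neither version can be chosen automatically, which is the situation you then have to resolve by hand.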
Fact: Git records each commit as a snapshot of your project, but it stores any unchanged file only once, so a commit adds little beyond what you actually changed. The set of changes in a commit is called a diff.
View a recent “diff” by clicking on the description of the commit or the button with the SHA or hash code (something like
This is also useful for collaborators to see exactly what you changed.
View the repository from a while back with the
990cf9a commit, this folder looked VERY different!
View the history of a file by clicking on any file, then clicking “History”.
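To make “diff” and “hash code” less abstract, here is a small sketch using only Python's standard library. The file contents and file name are made up for illustration. The diff is produced by `difflib`, which yields output in the same general style GitHub shows. The hash follows the way Git hashes a file's contents (a “blob”); a commit's hash is computed over a larger commit object instead, so treat this only as an illustration of where a 7-character code like the ones shown on GitHub can come from.

```python
import difflib
import hashlib

# Hypothetical "before" and "after" versions of one file.
old = "import pandas as pd\ndf = pd.read_csv('data.csv')\n"
new = "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())\n"

# A diff lists only what changed: lines starting with "+" were added,
# lines starting with "-" were removed.
diff_lines = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="analysis.py (before)",
    tofile="analysis.py (after)",
))
print("".join(diff_lines))

def git_blob_sha1(data: bytes) -> str:
    """SHA-1 of file contents, computed the way Git hashes a blob:
    the header 'blob <size>' plus a null byte, followed by the contents."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

full_hash = git_blob_sha1(new.encode("utf-8"))
print(full_hash[:7])  # short form: the first 7 of 40 hex characters
```

Because the hash depends on the exact contents, any change to the file produces a different code, which is what lets you point to one precise version.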