1.3. GitHub Basics

1.3.1. Why GitHub?

Note: I use Windows and will be of less help for Mac users. The TA, however, uses Mac and will be more helpful when issues stem from OS differences. GitHub, in particular, might be finicky if you have outdated versions of Safari or any other browser.

We will be using GitHub a lot in this course:

  • All of your course-related work will go on GitHub.

  • Discussion / help / announcements will happen on GitHub. (Yes, announcements!)

  • This entire website is on GitHub!

  • Assignments are posted on GitHub.

But why GitHub? Because it’s tremendously effective for developing a project. It is used by Apple, Uber, Netflix, Google, Microsoft, Bitcoin, CERN, Chinese censors (wait, what?), and many more large, sophisticated, multibillion-dollar entities.

It’s useful for (1) cloud storage, (2) collaboration, and (3) version control.

Let’s get started!

1.3.2. GitHub as cloud storage

At the very least, GitHub allows for cloud storage, like Google Drive and Dropbox do. There’s a bit more structure than just storing files under your account:

  • Repositories (aka “repo”): All files must be organized into repositories. Think of these as folders with self-contained projects. These can either be public or private.

  • User Accounts vs. Organization Accounts (aka “Org”): All repositories belong to an account:

    • A user account is the account you just made, and typically holds repositories related to your own work.

    • An Organization account can be owned by multiple people, and typically holds repositories relevant to a group (like STAT 545).


Examples:

  • The awesome-python repo is a “curated list of awesome Python frameworks, libraries, software and resources”.

  • The ledatascifi-2021 repo, within its corresponding LeDataSciFi Org, contains these lectures.

Practice

By the end of lecture one, and before the next class, you should have completed the following steps and be able to repeat them:

  1. Make a participation repo called “Class Notes” (with a blank README.md file). The repo on GitHub.com is called the “master” or “origin” repo; Git’s own term for it is the “remote” repo.

  2. “Clone” that to your computer. The folder and files on your computer are called the “local” repo.

  3. Modify the repo in the cloud and, then, “fetch” those changes to your computer. Think of “fetch” as syncing your computer to catch up with any changes in the master.

  4. Modify the repo on your computer, and then, “push” those changes to the cloud. Think of “push” as syncing the master to catch up with any changes from your computer.
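Steps 2 through 4 can also be done from a command line. Here is a minimal sketch of the Git commands involved, wrapped in Python. Treat it as an illustration under assumptions, not the required way to do the exercise: the repo URL and folder name are hypothetical placeholders, the helper function names are my own, and by default the commands are only printed, not run.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def clone_command(repo_url: str, folder: str) -> List[str]:
    """Step 2: copy the repo on GitHub to a folder on your computer."""
    return ["git", "clone", repo_url, folder]


def sync_down_commands() -> List[List[str]]:
    """Step 3: download changes from GitHub, then merge them into your files."""
    return [["git", "fetch", "origin"], ["git", "pull"]]


def sync_up_commands(message: str) -> List[List[str]]:
    """Step 4: record your local changes, then send them to GitHub."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_all(commands: List[List[str]], cwd: Optional[str] = None,
            dry_run: bool = True) -> None:
    """Print each command; execute it only if dry_run is False and git is installed."""
    for command in commands:
        print("$ " + shlex.join(command))
        if not dry_run and shutil.which("git"):
            subprocess.run(command, cwd=cwd, check=True)


# Hypothetical URL: replace "your-username" with your own GitHub account name.
REPO_URL = "https://github.com/your-username/Class-Notes.git"

run_all([clone_command(REPO_URL, "Class-Notes")])
run_all(sync_down_commands(), cwd="Class-Notes")
run_all(sync_up_commands("Add notes from lecture one"), cwd="Class-Notes")
```

On the command line, “git fetch” only downloads the changes; “git pull” is what merges them into the files on your computer (and it runs a fetch itself, so the separate fetch is shown only to mirror the step above).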

With those exercises done, you’re ready to work with GitHub. We will also explore:

  • What a good README file is

  • What the .gitignore file is and how/why to use it

  • What a merge conflict is and how to resolve it
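To give a feel for the .gitignore file: it is a plain text file of patterns, and Git leaves untracked files that match those patterns out of the repo. The sketch below is a deliberately simplified imitation of that matching, not Git’s actual implementation. It assumes only two kinds of patterns (a folder name ending in “/”, and a file-name pattern with no slashes), and the example patterns and paths are made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Simplified .gitignore check (an approximation, not Git's real rules).

    Supported pattern shapes:
      "name/"  ignores anything inside a folder matching that name
      "*.ext"  (no slash) is matched against the file name at any depth
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    folders, file_name = parts[:-1], parts[-1]
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if pattern.endswith("/"):
            folder_pattern = pattern.rstrip("/")
            if any(fnmatchcase(folder, folder_pattern) for folder in folders):
                return True
        elif fnmatchcase(file_name, pattern):
            return True
    return False


# Hypothetical .gitignore contents for a data project.
PATTERNS = ["# large or machine-generated files", "*.csv", ".ipynb_checkpoints/", ".DS_Store"]

for example in ["data/raw/prices.csv",
                "notebooks/.ipynb_checkpoints/hw1-checkpoint.ipynb",
                "README.md"]:
    print(example, "->", "ignored" if is_ignored(example, PATTERNS) else "not ignored")
```

Real .gitignore files support more than this (for example, negation with “!” and patterns that contain slashes), so use the sketch only to build intuition.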

1.3.3. GitHub for collaboration

The “traditional” way to collaborate involves sending files over email. But emails get buried, and, also… who has the most recent version, and what is it? You don’t want this.

Git(Hub) solves this!

Git (just “Git”) is a distributed version control system. Basically: “Imagine if Dropbox and the ‘Track changes’ feature in MS Word had a baby. Git would be that baby.” It’s great for us because it’s optimized for code.

GitHub (not just “Git”) is built on top of the Git system. Among the many added features that make collaboration easier, two are worth highlighting:

  • The GitHub repository is treated as the “master version”.

  • You can (and probably should!) use GitHub Issues instead of email to track open tasks.

    • Issues are a discussion board corresponding to a particular repository.

    • One “thread” is called an Issue. Some features:

      • You can tag other GitHub users using @username.

      • You get email notifications if you are tagged or are Watching a repository.
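Issues can also be read by a program, not just in the browser. The sketch below builds the address for GitHub’s public REST API endpoint that lists a repository’s issues and, if you call fetch_issues, downloads them. It is an illustration under assumptions: the function names are my own, the tidyverse/ggplot2 repository in the usage comment is just a well-known public example, and unauthenticated requests are rate limited by GitHub.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def issues_url(owner: str, repo: str, state: str = "open", per_page: int = 5) -> str:
    """Build the REST API address that lists issues for owner/repo."""
    query = urllib.parse.urlencode({"state": state, "per_page": per_page})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def fetch_issues(owner: str, repo: str, state: str = "open",
                 per_page: int = 5) -> List[Dict[str, Any]]:
    """Download issues as a list of dictionaries (needs an internet connection)."""
    request = urllib.request.Request(
        issues_url(owner, repo, state, per_page),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "class-notes-example",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage (commented out so that nothing is downloaded by accident):
# for issue in fetch_issues("tidyverse", "ggplot2"):
#     print(f'#{issue["number"]} [{issue["state"]}] {issue["title"]}')
```

One quirk to be aware of: this endpoint also returns pull requests, because GitHub treats every pull request as an issue.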

As an example, check out the Issues in the ggplot2 repository. People raise issues of all kinds, and then when they are solved, “Close” the issue. You can

We will talk about collaboration later. Suffice it to say, managing group tasks is of paramount importance in virtually all jobs you might have after college.

Collaboration practice

“Exercise 1”: VERY IMPORTANT

  1. Find the classmates team within the LeDataSciFi org. (You have to click on the gradebook link in coursesite (classroom.github.com/…), and then I’ll invite you.)

  2. Start Watching the classmates team. THIS IS WHERE CLASS ANNOUNCEMENTS WILL BE POSTED.

You should now get an email notification whenever an Issue is posted by me, the TA, or a classmate asking a question.

“Exercise 2”: Use the discussion board

  • Introduce yourself with 2 truths and a lie, and we can go from there.

1.3.4. GitHub for version control with Git

Why version control? In addition to avoiding the awful “file naming conundrum” in the comic above, version control lets you:

  • Stop fretting about removing stuff

  • Leave a breadcrumb trail for troubleshooting

  • “Undo” and navigate to a previous state

  • Define your work

The way you work on a project with GitHub is by following what I will call the GitHub Workflow:


Fetch early, commit frequently, push often!

This habit will help you avoid disasters, so that you get the positive features of GitHub without the headaches.

Being careful about these steps might seem pointless during solo projects, but I encourage you to practice these good habits now, so that when you do collaborative work, you’re protected from mistakes.

Practice

  • Fact: When you push, Git only sends what has changed since your last push, so it doesn’t need to upload a copy of all your files each time. The set of changes associated with a commit is called a diff.

  • View the commit history of the LeDataSciFi.github.io repository by clicking on the “commits” button on the repo home page. (You’ll end up on the commit history page.)

  • View a recent “diff” by clicking on the description of the commit or the button with the SHA or hash code (something like 990cf9a).

    • This is also useful for collaborators to see exactly what you changed.

  • View the repository from a while back with the <> button.

    • Before the 990cf9a commit, this folder looked VERY different!

  • View the history of a file by clicking on any file, then clicking “History”.
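To make “diff” and “hash code” less mysterious, here is a small sketch using only Python’s standard library. It does not call Git: difflib produces a diff in the same “unified” style that GitHub displays, and the hash function follows the recipe Git uses to name a file’s contents in a default (SHA-1) repository. The file name and file contents are made up for illustration. The hash on a commit page identifies a whole commit rather than one file, but it is built the same basic way.

```python
import difflib
import hashlib


def show_diff(old: str, new: str, name: str = "README.md") -> str:
    """Return a unified diff between two versions of a text file."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(lines)


def git_blob_hash(content: bytes) -> str:
    """Hash file contents the way Git names a blob: SHA-1 of a short header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical "before" and "after" versions of a README file.
old_version = "# Class Notes\n"
new_version = "# Class Notes\nNotes from lecture one.\n"

# Lines starting with "+" were added; lines starting with "-" were removed.
print(show_diff(old_version, new_version))

# GitHub usually shows only the first 7 characters of a hash.
print(git_blob_hash(new_version.encode())[:7])
```

Because the hash depends on every byte of the content, changing even one character gives a completely different code, which is why a hash can serve as a unique name for a specific version.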

1.3.5. Credits

  • I have drawn heavily from STAT545

  • QuantEcon.org

  • EC607