1.3. GitHub Basics

1.3.1. Why are we using GitHub?

Note: I use Windows and will be of less help to Mac users. The TA, however, uses a Mac and will be more helpful when issues stem from OS differences. GitHub, in particular, might be finicky if you have an outdated version of Safari or any other browser.

We will be using GitHub a lot in this course:

  • All of your course-related work will go on GitHub.

  • Discussion / help / announcements will happen on GitHub. (Yes, announcements!)

  • This entire textbook is on GitHub!

  • Assignments are “posted” and “submitted” on GitHub.

But why GitHub? Because it’s tremendously effective for developing projects. It is used by Apple, Uber, Netflix, Google, Microsoft, Bitcoin, CERN, Chinese censors (wait, what?), and many more large, sophisticated, multi-billion-dollar entities.

It’s useful for

  1. cloud storage,

  2. collaboration,

  3. and version control.

Let’s get started!

1.3.1.1. GitHub as cloud storage

At the very least, GitHub allows for cloud storage, like Google Drive and Dropbox do. There’s a bit more structure than just storing files under your account:

  • Repositories (aka “repo”): All files must be organized into repositories. A repo is just a folder for a self-contained project. Repos can either be public (like this textbook) or private (like your assignments).

  • All repositories belong to a user account or an organization account (aka “org”):

    • A user account is the account you just made, and typically holds repositories related to your own work. Here is mine.

    • An organization account can be owned by multiple people, and typically holds repositories relevant to a group. I own the LeDataSciFi organization account and it contains the textbook and all of our assignments.

Examples:

  • The awesome-python repo is a “curated list of awesome Python frameworks, libraries, software, and resources”

  • The ledatascifi-2024 repo, within its corresponding LeDataSciFi org, contains this textbook.

1.3.1.2. GitHub for collaboration

The “traditional” way to collaborate involves sending files over email. But emails get buried, and, also… who has the most recent version, and what is it?

You don't want this

Git(Hub) solves this!

Git (just “Git”) is a distributed version control system. Basically: “Imagine if Dropbox and the “Track changes” feature in MS Word had a baby. Git would be that baby.” It’s great for us because it’s optimized for code.

GitHub (not just “Git”) is built on top of the Git system. Among the many added features that make collaboration easier, three are worth highlighting:

  1. The repo hosted on GitHub’s servers is the “remote” version.

    • You and your collaborators can download (“fetch”) the remote repo to your computer. The folder and files on your computer are called the “local” version.

    • You work on the files on your computer. When you’re done, you upload (“push”) your work back to the cloud to update the remote version.

    • If you’ve ever used a shared Dropbox folder, you’ve seen this before: Dropbox does the fetching and pushing automatically.

    • We have to do these steps manually using GitHub Desktop, but for good reason.

    • Use the GitHub Workflow to make sure you and your collaborators don’t make conflicting changes to a file.

  2. GitHub Issues make it easier to track project tasks than using email.

  3. GitHub repos have discussion boards. Our textbook repo has one!

    • Each repo has a Discussion tab for conversations.

    • You can tag other GitHub users by typing @username, e.g., @donbowen.
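The fetch / work / push cycle from item 1 can be sketched in a few lines of Python. This is a toy model only, assuming a repo is nothing more than a dictionary of file names and contents; the `fetch` and `push` functions are hypothetical stand-ins for what GitHub Desktop does, not real Git commands.

```python
# Toy model only: real Git tracks commits, not plain dictionaries.
remote = {"README.md": "Class notes"}  # the copy on GitHub's servers


def fetch(remote_repo):
    """Download: return an independent local copy of the remote files."""
    return dict(remote_repo)


def push(local_repo, remote_repo):
    """Upload: overwrite the remote files with the local versions."""
    remote_repo.update(local_repo)


local = fetch(remote)                       # 1. fetch the remote version
local["README.md"] = "Class notes, week 1"  # 2. work on your computer
before_push = remote["README.md"]           # remote is unchanged so far
push(local, remote)                         # 3. push your work back
after_push = remote["README.md"]

print(before_push, "->", after_push)
```

Notice that nothing changes on the remote until the explicit `push`. That manual step is what lets you decide when your work is ready for collaborators to see.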

Tip

When you have a question about your grades or a private question about an assignment, you can open that repo, click on the “Issues” tab, and create a new issue. Make sure to tag me and the TA in the text so we see your post.

We will talk more about specific methods of collaboration later in the class. Suffice it to say, managing group tasks is of paramount importance in virtually all jobs you might have after college.

1.3.1.3. GitHub for version control

Version control is a time machine!

  • Instead of the awful “file naming conundrum” in the comic above, you just keep one file, “FINAL.doc”!

  • Delete stuff and recover it later easily.

  • Leave a breadcrumb trail for troubleshooting: if some update broke your code, you can go back and see what changed when it broke.

  • “Undo” and navigate to a previous state.

  • Helps you define your work and show contributions.
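To make the “time machine” idea concrete, here is a minimal sketch in Python. The `TinyHistory` class is hypothetical and stores a full copy of one file per version; Git is far more sophisticated, but the ideas of saving labeled versions and going back to them are the same.

```python
class TinyHistory:
    """A hypothetical, minimal stand-in for version control of one file."""

    def __init__(self):
        self.snapshots = []  # (message, text) pairs, in the order saved

    def commit(self, message, text):
        """Save a labeled version and return its version number."""
        self.snapshots.append((message, text))
        return len(self.snapshots) - 1

    def checkout(self, version):
        """Return the file text exactly as it was at that version."""
        return self.snapshots[version][1]

    def log(self):
        """Return the breadcrumb trail: one message per saved version."""
        return [message for message, _ in self.snapshots]


history = TinyHistory()
v0 = history.commit("First draft", "Intro\nMethods\n")
v1 = history.commit("Delete methods section", "Intro\n")

recovered = history.checkout(v0)  # the deleted section is still recoverable
print(history.log())
print(recovered)
```

There is only one “file” here, with no FINAL_v2_really_final copies, yet every earlier state can still be reached.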

1.3.2. Using GitHub

1.3.2.1. Necessary skills

These skills require a mix of using the GitHub.com website and the GitHub Desktop app. We will cover these in class, by example. You should have all these covered by the third class:1










1.3.2.2. Necessary exercises

In class, I’ll give you a handout with a list of steps to follow. In summary, before the second class, you should have

  1. Cloned the textbook to your computer (fetching this repo after each class will download the slides as I post them)

  2. Created your own “Class Notes” repo, and

    • Invited me and the TA to it

    • Created a file named README.md, and added useful content to it from the GitHub browser page for the repo

    • Cloned it to your computer and worked on it there

    • Synced your work back to the cloud

  3. Joined the classmates “team” so you GET ANNOUNCEMENTS! This is VERY IMPORTANT. Do these:

    1. Click on the link in coursesite. Wait for an email.

    2. When I invite you to the LeDataSciFi organization, accept it.

    3. Find our discussion board within the textbook’s repo. This is where class announcements will be posted. Make sure you are “watching” it. You should now get an email notification whenever the TA or I post an Issue, or whenever a classmate asks a question.

  4. Introduced yourself to your classmates on the textbook’s discussion board. Look for the “introductions” topic.

  5. Downloaded a file from GitHub.com: Go to any repo, and click on any file. On the next page that opens, right-click the “Raw” button and “Save Link As”.
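The “Raw” download in step 5 can also be done from Python, which is handy once you start loading data files. This is a sketch under assumptions: at the time of writing, the “Raw” button for a public repo typically points to the raw.githubusercontent.com host with the pattern below, and the owner, repo, branch, and file names used here are placeholders, not real course files.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def raw_url(owner, repo, branch, path):
    """Build the address that the "Raw" button typically links to."""
    # quote() escapes characters such as spaces but keeps "/" in the path.
    return (
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{quote(path)}"
    )


def download(owner, repo, branch, path, save_as):
    """Save the file locally. Needs internet access and a public repo."""
    urlretrieve(raw_url(owner, repo, branch, path), save_as)
    return save_as


# Placeholder names for illustration; swap in a real public repo to try it.
url = raw_url("some-user", "some-repo", "main", "data/my file.csv")
print(url)
```

Calling `download(...)` with the details of a real public repo would save the file to `save_as`. Private repos (like your assignments) need authentication, which this sketch does not handle.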

1.3.2.3. Optional exercises

When you push changes to a repo, Git doesn’t upload the whole file. It sends “commits”, which are records of changes to files in the repo (the line-by-line changes themselves are called a “diff”). This reduces bandwidth needs and makes it easier to track specific changes.
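To see what a diff looks like, here is a small sketch using Python’s built-in `difflib` module. The file name and contents are made up for illustration, and Git computes and stores changes in its own way; the point is only the format, which resembles what GitHub shows: removed lines start with `-` and added lines start with `+`.

```python
import difflib

# Two versions of a hypothetical file, before and after an edit.
old = "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())\n"
new = "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.describe())\n"

# unified_diff compares lists of lines and yields the diff line by line.
diff = list(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="analysis.py (before)",
        tofile="analysis.py (after)",
    )
)

print("".join(diff))
```

Only the last line changed, so the diff marks one removed line and one added line, and shows the unchanged lines around them as context.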

These exercises will help you with the time-machine aspects of GitHub:

  1. View the “commit” history of the LeDataSciFi.github.io repository by clicking on the “commits” button on the repo home page. (You’ll end up here.)

  2. View a recent “diff” by clicking on the description of the commit or the button with the SHA or hash code (something like 990cf9a).

    • This is also useful for collaborators to see exactly what you changed.

  3. View the repository from a while back with the <> button.

    • Before the 990cf9a commit, this folder looked VERY different!

  4. View the history of a file by clicking on any file, then clicking “History”.
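Where do codes like 990cf9a come from? They are the first few characters of a hash, a fingerprint computed from content. The sketch below shows how Git fingerprints the contents of a single file (a “blob”) in repositories that use SHA-1. Commit hashes are computed over more information (files, author, message, parent commit), so this illustrates the idea rather than reproducing a commit hash.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash file contents the way Git does for a blob (SHA-1 repos)."""
    # Git prepends a short header: the word "blob", the size, and a zero byte.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


full_hash = git_blob_hash(b"hello world\n")
short_hash = full_hash[:7]  # GitHub usually displays a short prefix like this

changed_hash = git_blob_hash(b"hello world!\n")  # one character added

print(full_hash)
print(short_hash)
print(changed_hash)
```

Even a one-character change produces a completely different hash, which is why a hash works as a reliable label for one exact state of your work.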

1.3.3. Credits


1

Students with GitHub experience, or those looking into issues later, will notice I am skipping past forks, branching, and using git commands. This is because GitHub is a means to getting the class going and not the target skill for the course.