1.3. GitHub Basics
1.3.1. Why are we using GitHub?
Note: I use Windows and will be of less help to Mac users. The TA, however, uses Mac and will be more helpful when issues stem from OS differences. GitHub, in particular, might be finicky if you have outdated versions of Safari or any other browser.
We will be using GitHub a lot in this course:
All of your course-related work will go on GitHub.
Discussion / help / announcements will happen on GitHub. (Yes, announcements!)
This entire textbook is on GitHub!
Assignments are “posted” and “submitted” on GitHub.
But why GitHub? Because it’s tremendously effective for developing projects. It is used by Apple, Uber, Netflix, Google, Microsoft, Bitcoin, CERN, Chinese censors (wait, what?), and many more large, sophisticated, multi-billion-dollar entities.
It’s useful for cloud storage, collaboration, and version control.
Let’s get started!
1.3.1.1. GitHub as cloud storage
At the very least, GitHub allows for cloud storage, like Google Drive and Dropbox do. There’s a bit more structure than just storing files under your account:
Repositories (aka “repo”): All files must be organized into repositories. A repo is just a folder for a self-contained project. Repos can either be public (like this textbook) or private (like your assignments).
All repositories belong to a user account or an organization account (aka “org”):
A user account is the account you just made, and typically holds repositories related to your own work. Here is mine.
An organization account can be owned by multiple people, and typically holds repositories relevant to a group. I own the LeDataSciFi organization account and it contains the textbook and all of our assignments.
The awesome-python repo is a “curated list of awesome Python frameworks, libraries, software, and resources.”
The ledatascifi-2023 repo, within its corresponding LeDataSciFi org, contains this textbook.
1.3.1.2. GitHub for collaboration
The “traditional” way to collaborate involves sending files over email. But emails get buried, and, also… who has the most recent version, and what is it?
Git(Hub) solves this!
Git (just “Git”) is a distributed version control system. Basically: “Imagine if Dropbox and the ‘Track changes’ feature in MS Word had a baby. Git would be that baby.” It’s great for us because it’s optimized for code.
GitHub (not just “Git”) is built on top of the Git system. Among the many added features that make collaboration easier, three are worth highlighting:
The repo hosted on GitHub’s servers is the “remote” version.
You and your collaborators can download (“fetch”) the remote repo to your computers. The folder and files on your computer are called the “local” version.
You work on the files on your computer. When you’re done, you upload (“push”) your work back to the cloud to update the remote version.
If you’ve ever used a shared Dropbox folder, Dropbox does the equivalent of fetching and pushing for you automatically.
We have to do these steps manually using GitHub Desktop, but for good reason.
Use the GitHub Workflow to make sure you and your collaborators don’t make conflicting changes to a file.
GitHub Issues make it easier to track project tasks than using email.
Each repo has an Issues tab that functions as a discussion board.
One “thread” is called an Issue.
You can tag other GitHub users by typing @username.
You get email notifications if you are tagged, or if you are “Watching” a repository.
As an example, check out the Issues in the ggplot2 repository. People raise issues of all kinds, then “Close” them once they are solved.
You can create issues on the repo for this textbook to point out errors or suggest edits. Contributions are appreciated and rewarded!
GitHub orgs have “team” discussion boards. Our class has one!
When you have a question about your grades or a private question about an assignment, you can open that repo, click on the “Issues” tab, and create a new issue. Make sure to tag me and the TA in the text so we see your post.
We will talk more about specific methods of collaboration later in the class. Suffice it to say, managing group tasks is of paramount importance in virtually all jobs you might have after college.
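To see why pushing and fetching are deliberate, manual steps, here is a toy Python model of a remote repo and two local copies. It is a hypothetical simplification (the `Repo`, `push`, and `pull` names are made up, and each history is just a list of commit labels), but it follows the same basic rule Git uses by default: a push is accepted only if it adds commits on top of what the remote already has. If a collaborator pushed first, your push is rejected until you bring in their work, which is how conflicting changes get caught.

```python
# Toy model (hypothetical names): a repo's history is just a list of commit labels.
# Real Git stores a graph of commit objects, but the push rule below is similar.

class Repo:
    def __init__(self, history=None):
        # Copy the list so each Repo has its own independent history.
        self.history = list(history or [])

    def commit(self, label):
        """Record a new commit on top of the current history."""
        self.history.append(label)


def is_prefix(shorter, longer):
    """True if `shorter` is an initial segment of `longer`."""
    return longer[: len(shorter)] == shorter


def push(local, remote):
    """Update the remote only if the local copy already contains all of its commits."""
    if not is_prefix(remote.history, local.history):
        return False  # rejected: someone pushed commits you don't have yet
    remote.history = list(local.history)
    return True


def pull(local, remote):
    """Bring the local copy up to date if it has no unpushed commits of its own."""
    if not is_prefix(local.history, remote.history):
        return False  # histories diverged; real Git would merge them here
    local.history = list(remote.history)
    return True


remote = Repo(["initial"])
alice = Repo(remote.history)  # Alice's local copy
bob = Repo(remote.history)    # Bob's local copy

alice.commit("alice edits README")
print(push(alice, remote))  # True: remote now has Alice's commit
bob.commit("bob edits README")
print(push(bob, remote))    # False: Bob's copy lacks Alice's commit
```

In real Git, Bob would now pull, which merges Alice’s commit into his copy (and flags a conflict if both edited the same lines), and then push again. This toy `pull` simply refuses in that case.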
1.3.1.3. GitHub for version control
Version control is a time machine!
Instead of the awful “file naming conundrum” in the comic above, you just keep one file, “FINAL.doc”!
Delete stuff and recover it later easily
Leave a breadcrumb trail for troubleshooting - if some update broke your code, you can go back and see what changed when it broke
“Undo” and navigate a previous state
Helps you define your work and show contributions.
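The breadcrumb trail also lets you hunt down the change that broke your code systematically rather than by guessing. Below is a Python sketch, assuming you have an ordered list of versions and a check that tells a good version from a bad one: binary search finds the first bad version in about log2(n) checks instead of n. The versions and the `works` check are made up for illustration; Git itself offers this search over real commits as `git bisect`.

```python
# Hypothetical sketch: find the first version in a history that breaks a check.
# Assumes every version before the culprit passes and every version from it on fails.

def first_bad_version(versions, works):
    """Return the index of the earliest version for which works(version) is False.

    Uses binary search, so it needs only about log2(len(versions)) checks.
    Returns None if every version works.
    """
    lo, hi = 0, len(versions)  # the answer lies in [lo, hi); hi == len means "none"
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid + 1  # mid is fine, so the culprit comes later
        else:
            hi = mid      # mid is bad, so the culprit is mid or earlier
    return lo if lo < len(versions) else None


# Made-up history: each "version" is the source of a tiny file; version 3 adds a bug.
versions = [
    "def add(a, b): return a + b",
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    # add two numbers\n    return a + b",
    "def add(a, b):\n    # add two numbers\n    return a - b",
    "def add(a, b):\n    # add two numbers!\n    return a - b",
]


def works(source):
    namespace = {}
    exec(source, namespace)  # define `add` from this version's code
    return namespace["add"](2, 3) == 5


print(first_bad_version(versions, works))  # 3
```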
1.3.2. Using GitHub
1.3.2.1. Necessary skills
These skills require a mix of using the GitHub.com website and the GitHub Desktop app. We will cover these in class, by example. You should have all of these covered by the third class.[1]
1.3.2.2. Necessary exercises
In class, I’ll give you a handout with a list of steps to follow. In summary, before the second class, you should have:
Cloned the textbook to your computer (fetching this repo after each class will download the slides as I post them)
Created your own “Class Notes” repo, and
Invited me and the TA to it
Created a file named README.md, and added useful content to it from the GitHub browser page for the repo
Cloned it to your computer and worked on it there
Synced your work back to the cloud
It is VERY IMPORTANT to join the classmates “team” so you GET ANNOUNCEMENTS! Do these:
Click on the link in coursesite. Wait for an email.
When I invite you to the LeDataSciFi organization, accept it.
Find the classmates team within the LeDataSciFi org. This is where class announcements will be posted. Make sure you are “watching” it. You should then get an email notification whenever an Issue is posted by me, the TA, or a classmate with a question.
Introduce yourself to your classmates on the team discussion board. Look for the “introductions” topic.
Download a file from GitHub.com: Go to any repo and click on any file. On the page that opens, right-click the “Raw” button and select “Save Link As”.
1.3.2.3. Optional exercises
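When you push changes to a repo, you send “commits.” Each commit records the state of the repo’s files at that point, along with a message, its author, and a link to the previous commit. GitHub displays each commit as a “diff”: the lines added and removed relative to the previous commit. Git only uploads data the remote doesn’t already have, which keeps transfers small, and the commit-by-commit history makes it easy to track specific changes.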
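To make “diff” concrete, Python’s standard `difflib` module can produce a unified diff between two versions of a file, using the same `-`/`+` line notation as GitHub’s diff view. The README contents below are made up; GitHub adds colors and line numbers, and Git uses its own diff algorithm, so details can differ.

```python
import difflib

# Two versions of a made-up README.md, as lists of lines (newlines kept).
old = [
    "# Class Notes\n",
    "Notes for week 1.\n",
    "TODO: add links\n",
]
new = [
    "# Class Notes\n",
    "Notes for week 1.\n",
    "Notes for week 2.\n",
]

# unified_diff yields the diff line by line: a header, a "@@" hunk marker,
# then unchanged lines (leading space), removed lines ("-"), and added lines ("+").
diff = list(difflib.unified_diff(old, new, fromfile="a/README.md", tofile="b/README.md"))
print("".join(diff), end="")
# --- a/README.md
# +++ b/README.md
# @@ -1,3 +1,3 @@
#  # Class Notes
#  Notes for week 1.
# -TODO: add links
# +Notes for week 2.
```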
These exercises will help you with the time-machine aspects of GitHub:
View the “commit” history of the LeDataSciFi.github.io repository by clicking on the “commits” button on the repo home page. (You’ll end up here.)
View a recent “diff” by clicking on the description of a commit or on the button showing its SHA (its hash code).
This is also useful for collaborators to see exactly what you changed.
View the repository from a while back by browsing its files at an older commit. For example, at the 990cf9a commit, this folder looked VERY different!
View the history of a file by clicking on any file, then clicking “History”.
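The SHA on a commit button is a hash: a fingerprint computed from the content it names, so any change yields a different ID. As a minimal sketch, by default Git names a stored file (a “blob”) by the SHA-1 hash of a short header plus the file’s bytes, which Python’s `hashlib` can reproduce. A commit ID such as 990cf9a is computed the same way, but over the commit’s own record (the snapshot it points to, its parent commit, author, and message), and GitHub shows only its first few characters. The function name `git_blob_sha1` is made up for illustration.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 name Git gives a file's contents (a "blob") by default.

    Git hashes a header, b"blob <size in bytes>" plus a NUL byte, followed by the bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Should match `printf 'test content\n' | git hash-object --stdin`
print(git_blob_sha1(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_blob_sha1(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# Changing a single character gives a completely different hash.
print(git_blob_sha1(b"test content!\n"))
```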
I have drawn heavily from STAT545.
Students with GitHub experience, or those looking into issues later, will notice I am skipping past forks, branching, and using git commands. This is because GitHub is a means to getting the class going and not the target skill for the course.