Part 2, Lesson 1: Branching concepts

If you've finished Part 1, you should now be comfortable working with a linear timeline of commits in Git. (If not, you should really go back and do Part 1 - you need to be comfortable with those topics before tackling this part.) Part 2 will focus on working with branches in Git. Branches are to a linear history as parallel universes are to a single timeline. That is, each branch can have a different version of your code, which split off from the original timeline at some point. We can also merge branches back together, combining changes made in both of them.

Understanding branches will also be important for Part 3, where we learn how to work with remote repositories (like those on GitHub). We'll go into more detail in Part 3, but to Git, the branches in a remote repo look much like extra branches in your own repo, so if you're comfortable working with branches locally, you can transfer all of that knowledge to working with a remote repo.

Why use branches - an example

I think this is best explained with an example. Let's say that you have some data analysis code that you use to carry out an analysis for a journal article. You submit the article, and it goes into peer review, and (as often seems to happen) gets stuck there for a while. While that's going on, you start working on a new project. This new project needs some updates and changes to your analysis code, so you make those and commit them. Then the reviews come back on your article, and reviewer 2 has pointed out a weakness in your analysis. You can fix it, but you need to start from the version of your code that you used to do the original analysis.

This is a great situation for a branch. First, you would check out the commit that corresponds to the version of your code you used for the first version of the article. Then you create a branch starting from that commit; let's call the branch paper-revision. This new branch now has its own history, separate from your main branch. You can make whatever changes you need without affecting the changes you made for the new project. Plus, you can do incremental work on the code for the revision: write new code, test, commit, repeat.
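As a rough sketch of that workflow (the next lesson covers the branching commands properly), the script below builds a throwaway repository and starts paper-revision from the older commit. The file name analysis.py, the commit messages, and the demo identity are all made up for illustration. I use git checkout -b here because it works on older Git versions; on Git 2.23 or newer, git switch -c paper-revision <commit> should do the same job.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch cannot touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" on any Git version
git config user.name "Demo User"         # placeholder identity for the demo commits
git config user.email "demo@example.com"

# Commit 1: the code used for the submitted article.
echo "original analysis" > analysis.py
git add analysis.py
git commit -q -m "Analysis for submitted article"
submitted=$(git rev-parse HEAD)

# Commit 2: changes made later for the new project.
echo "new project changes" >> analysis.py
git commit -q -am "Update analysis for new project"

# Start paper-revision from the submitted version, not from the tip of main.
git checkout -q -b paper-revision "$submitted"

# Work on the revision: edit, test, commit, repeat.
echo "fix for reviewer 2" >> analysis.py
git commit -q -am "Address reviewer 2 comments"

# List the branches; the current one is marked with an asterisk.
git branch --list
```

After this runs, analysis.py on paper-revision contains the reviewer fix but none of the new-project changes, while main is untouched.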

TODO: image illustrating the commit history

When you finish the revision and resubmit the paper, you can choose what to do with the paper-revision branch. You could merge it into the main branch when you have time to make sure those changes don't break your analysis for the new project. You could just leave it as another branch and switch between them as needed. Or you could tag the tip of the revision branch and delete the branch itself. In this last approach, the commits will still be there, reachable through the tag, but because you won't be actively developing that version of the code, deleting the branch keeps your repo's branch list limited to the branches you are actively developing.
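The tag-then-delete option might look something like the sketch below. It again uses a throwaway repository; the tag name resubmitted-v2, the file name, and the commit messages are hypothetical. Merging and deleting branches are covered in lesson 4, and tags in lesson 5, so treat this as a preview rather than a recipe.

```shell
#!/bin/sh
set -e

# Throwaway repository with one commit on main and one on paper-revision.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

echo "original analysis" > analysis.py
git add analysis.py
git commit -q -m "Analysis for submitted article"

git checkout -q -b paper-revision
echo "fix for reviewer 2" >> analysis.py
git commit -q -am "Address reviewer 2 comments"
revised=$(git rev-parse HEAD)

# Tag the tip of the revision branch so its commits stay reachable by name.
git tag resubmitted-v2 paper-revision    # tag name is hypothetical

# Leave the branch before deleting it. -D (force) is needed here because
# the branch was never merged into main.
git checkout -q main
git branch -q -D paper-revision

# The branch is gone, but the tag still leads to the revision commits.
git tag --list
git log --oneline resubmitted-v2
```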

General use of branches

The pattern we went through in the example is a very common use of branches. Frequently, we have some larger set of changes we want to make to our code that will take a comparatively long time to finish. A branch makes sense in this case when:

  1. We need to be able to switch back to our original version of the code easily, and
  2. We need to be able to still make changes to the original version, separately from the in-progress development.

If you don't need to switch back to the original version of the code, you may as well just keep all your work in the main branch. Likewise, if you don't need to make changes to that original version, you might as well just tag the version you want to go back to and use the methods we covered in part 1, lesson 6 to do so.

A helpful command

As we start working with multiple branches, it's very helpful to be able to visualize the layout of our branches. To do this on the command line, I use this command:

git log --all --decorate --graph --oneline --topo-order

Obviously I don't type this in each time. Instead I use a feature of Git called aliases to give myself a shortcut - I can run the command git glog and Git knows I mean that whole long command.

That glog is "g-log" as in "graph log," not the word "glog" that rhymes with "clog."

To make this alias, I run the command:

$ git config --global alias.glog 'log --all --decorate --graph --oneline --topo-order'

This will add the glog alias to your global Git configuration file (usually ~/.gitconfig). That means you can use it in all your repos, on this computer at least. (If you work on multiple computers, like a home computer and work laptop, or laptop and computing cluster, you'd need to do this on each computer.) Let's break down the config command:

  • config is the subcommand, used for modifying Git's configuration.
  • --global is a flag that indicates we want to modify the global (per-user) configuration file, rather than the configuration of a single repo.
  • alias.glog means we want to set the value of the entry glog in the alias section of the configuration.
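If you'd like to see what that config command actually writes, here is a small sandboxed sketch. The first lines after set -e point HOME at a temporary directory so the demo writes to a scratch .gitconfig instead of your real one; that sandboxing is my addition for safety, and you would skip those lines to set the alias for real.

```shell
#!/bin/sh
set -e

# Sandbox only: use a scratch home directory so the real ~/.gitconfig
# is left alone. Skip these three lines to set the alias for real.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME
unset GIT_CONFIG_GLOBAL

# Set the alias in the global (per-user) configuration.
git config --global alias.glog 'log --all --decorate --graph --oneline --topo-order'

# Read the value back: "alias" is the section, "glog" is the entry.
git config --global --get alias.glog

# Show the same setting as it is stored in the configuration file.
cat "$HOME/.gitconfig"
```

The file should contain an [alias] section with a single glog entry, which is the section and entry named in alias.glog.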

As for the log command itself:

  • --all tells Git to show all ancestors of all named commits (HEAD, branches, and tags). By default, log will only show ancestors of the HEAD.
  • --decorate makes Git show the branch and tag names next to their respective commits, as well as show HEAD next to its commit.
  • --graph will format the list of commits with lines connecting each commit to its ancestors.
  • --oneline will only print the short commit hash, the names (if any), and the first line of the commit message; it will not give the full commit message, author, datetime, etc.
  • --topo-order helps put the commits in a better order for this view. In current versions of Git, this should be automatically included with --graph, but it doesn't hurt to add.
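To get a feel for what these flags do together, the sketch below builds a tiny repository whose history splits into two branches and then runs the full log command. The branch name paper-revision comes from the earlier example; the commit messages and demo identity are made up, and the commits are empty (--allow-empty) just to keep the script short.

```shell
#!/bin/sh
set -e

# Throwaway repository with a history that splits into two branches.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Analysis for submitted article"
git checkout -q -b paper-revision
git commit -q --allow-empty -m "Address reviewer 2 comments"
git checkout -q main
git commit -q --allow-empty -m "Update analysis for new project"

# The full command behind the glog alias: one line per commit, with branch
# names attached and lines joining each commit to its parent.
graph=$(git log --all --decorate --graph --oneline --topo-order)
printf '%s\n' "$graph"

# For comparison: without --all, only ancestors of HEAD (here, main) appear.
git log --decorate --graph --oneline
```

The first listing shows all three commits with both branch names; the second leaves out the commit that exists only on paper-revision.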

I really recommend you add this alias; we'll be making use of it throughout parts 2 and 3.

Other aliases: I usually define two other aliases: st as shorthand for status and idiff as shorthand for diff --staged. As you've probably found already, we run status a lot, so it's well worth shortening it. diff --staged compares the changes you've staged for commit against HEAD, i.e. it shows you exactly what you have staged. I don't use this one that often, but when I started using Git, the option was spelled diff --cached (which still works as a synonym) and was harder to remember. So I created the idiff alias (for "index diff") to make it easier for me to remember.
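Here is one way those two aliases could be set up and tried out. To keep the sketch from touching your real configuration, it leaves off --global, so the aliases exist only inside a scratch repository; add --global to each config command to make them available everywhere. The file name notes.txt and its contents are made up.

```shell
#!/bin/sh
set -e

# Throwaway repository for trying the aliases.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"         # placeholder identity
git config user.email "demo@example.com"

# No --global here, so these aliases live only in this scratch repo's config.
git config alias.st status
git config alias.idiff 'diff --staged'

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Stage one change, then make a second change without staging it.
echo "staged line" >> notes.txt
git add notes.txt
echo "unstaged line" >> notes.txt

git st                 # same as: git status
staged=$(git idiff)    # same as: git diff --staged
printf '%s\n' "$staged"
```

The idiff output includes the staged line but not the unstaged one, since it only compares what is staged against HEAD.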

Lessons in part 2

We have 4 topics to cover:

  • In lesson 2, we'll introduce basic commands to create and switch between branches.
  • In lesson 3, we'll cover stashing, which is helpful when we have uncommitted changes and need to switch branches. We'll also discuss why uncommitted changes cause issues when switching branches.
  • In lesson 4, we'll go over merging branches together, as well as how to delete branches when we're done with them.
  • Finally, in lesson 5, we'll cover tags, which are similar to branches in that they allow you to identify a particular version of code, but unlike branches, tags don't move.
