Joshua L. Laughner - Part 2, Lesson 2: Basic branching and merging

Part 2, Lesson 2: Basic branching and merging

Lesson goal: learn how to create new branches, commit on different branches, and switch between branches.

Git commands:

git switch changes branches, and can create new ones as well.
git branch lists branches, and can also create new ones or delete existing ones.
git diff BRANCH1 BRANCH2 shows the differences between the two branches given.
git glog is an alias created in the previous lesson that will be useful for visualizing the history of all our branches.

Git concepts:

A branch is a sequence of one or more commits that can diverge from your main history.

Downloads:

Demo directory

Setting up our repository

We're going to keep things simple to start, so create a new repository with four files, named File1.txt, File2.txt, File3.txt, and File4.txt, each containing a single line with the word "File" and its own number (so File1.txt contains just "File 1"):

$ ls
File1.txt  File2.txt  File3.txt  File4.txt

$ for f in *.txt; do echo "$f: '$(cat $f)'"; done
File1.txt: 'File 1'
File2.txt: 'File 2'
File3.txt: 'File 3'
File4.txt: 'File 4'

You can download these files, but the zip file does not have a Git repo set up yet - you'll need to initialize it, add the files, and make an initial commit. You should now have a repo that is "clean", i.e. there are no files not being tracked or that have uncommitted changes. The output of git status, git ls-files, and git glog should look similar to this:

$ git status
On branch main
nothing to commit, working tree clean

$ git ls-files
File1.txt
File2.txt
File3.txt
File4.txt

$ git glog
* 074c6fe (HEAD -> main) Initial commit for branching demo

The commit hash (074c6fe) in the output for git glog will be different for you, and of course you may have written a different commit message. The main thing to check is that git glog outputs one line with (HEAD -> main) or, depending on your defaults, (HEAD -> master) in it.

Remember, git glog isn't a standard Git command; it's an alias we added in the previous lesson. If you get an error message saying that 'glog' is not a git command, go back and add that alias - we'll be using it a lot.

Whether your repository defaults to using "master" or "main" for the first branch depends on your global settings. "master" was the default for a long time, but in 2020 the decision was made to move away from this term and its negative connotations. "main" has become the most common replacement, and it's what I'll be using here.
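If you'd rather build the demo repository from scratch than download the zip file, the setup described above can be sketched roughly as follows. This is just one way to do it, and it makes a few assumptions of mine: it works in a throwaway directory from mktemp, it assumes Git 2.28 or newer so that git init -b is available, and the "Demo User" name and email are placeholders set only for this repository.

```shell
# Sketch of the setup steps; assumes Git 2.28+ for "git init -b".
demo=$(mktemp -d)    # throwaway directory (an assumption; any empty folder works)
cd "$demo"

# Create the four files, each holding one line such as "File 1".
for i in 1 2 3 4; do
    echo "File $i" > "File$i.txt"
done

git init -q -b main                          # name the first branch "main" explicitly
git config user.name "Demo User"             # placeholder identity, local to this repo
git config user.email "demo@example.com"     # placeholder as well
git add File1.txt File2.txt File3.txt File4.txt
git commit -q -m "Initial commit for branching demo"

git status --short    # prints nothing when the working tree is clean
git ls-files          # lists the four tracked files
```

If your Git is older than 2.28, drop the -b main part and the first branch will simply be named according to your defaults.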

Our first branch

Let's say that we want to edit File1.txt, but we're not sure that we'll want to use those edits in the end. We should create a branch and make our edits on that branch. By doing so, we can always switch back to our main branch if we want to use the unedited version.

To create a new branch and immediately switch to it, we use the git switch command with the -c flag (for "create"). We'll call our branch "child", so run:

$ git switch -c child
Switched to a new branch 'child'

If you run git glog now, you'll see something like:

$ git glog
* 074c6fe (HEAD -> child, main) Initial commit for branching demo

Remember from part 1 lesson 3 that the HEAD is a special reference that points to the parent commit of our working directory. We see that it's changed from pointing to main (the first time we ran git glog above) to pointing to child. We can also check our current branch with the git branch command:

$ git branch
* child
  main

The * next to child indicates that it is the active branch.
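Since git branch can also create branches, git switch -c child can be thought of as shorthand for two separate steps. Here's a minimal sketch of that two-step form in a throwaway repository. The temporary directory, the placeholder identity, and the use of git init -b (Git 2.28+) are my assumptions for the sake of a self-contained example, not part of the lesson's repo.

```shell
# Throwaway repository with a single commit (setup is illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "File 1" > File1.txt
git add File1.txt
git commit -q -m "Initial commit"

# Two-step form of "git switch -c child":
git branch child     # create the branch; we stay on main for now
git switch child     # then make it the active branch

git branch --show-current    # prints: child
git branch                   # lists both branches, with * next to child
```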

Let's go ahead and edit File1.txt to add a new line that says "Edited on the child branch".

$ git diff
diff --git a/File1.txt b/File1.txt
index 50fcd26..bfebdec 100644
--- a/File1.txt
+++ b/File1.txt
@@ -1 +1,2 @@
 File 1
+Edited on the child branch

Go ahead and commit this change. Now if you run git glog you'll see something like:

$ git glog
* 47962d4 (HEAD -> child) Edited File1.txt on child
* 074c6fe (main) Initial commit for branching demo

Remember that Git shows history with the more recent entries at the top. Here, our child branch has the new commit, and main has not moved. Let's repeat this and add "Also edited on the child branch" to File2.txt:

$ git diff
diff --git a/File2.txt b/File2.txt
index 4475433..5d93118 100644
--- a/File2.txt
+++ b/File2.txt
@@ -1 +1,2 @@
 File 2
+Also edited on the child branch

Commit this and check git glog again:

$ git glog
* d169551 (HEAD -> child) Edited File2.txt on the child branch
* 47962d4 Edited File1.txt on child
* 074c6fe (main) Initial commit for branching demo

Again, child moved while main stayed behind. We can check the differences between two branches by using the git diff command with the two branch names as positional arguments:

$ git diff main child
diff --git a/File1.txt b/File1.txt
index 50fcd26..bfebdec 100644
--- a/File1.txt
+++ b/File1.txt
@@ -1 +1,2 @@
 File 1
+Edited on the child branch
diff --git a/File2.txt b/File2.txt
index 4475433..5d93118 100644
--- a/File2.txt
+++ b/File2.txt
@@ -1 +1,2 @@
 File 2
+Also edited on the child branch

There are our two additions!
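When two branches differ in a lot of files, the full diff gets long. As a rough sketch, git diff also accepts --name-only and --stat to summarize the differences instead. The repository below is a cut-down stand-in for the demo repo (two files instead of four), built in a temporary directory with a placeholder identity; those details are my assumptions.

```shell
# Cut-down stand-in for the demo repo (illustrative setup).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "File 1" > File1.txt
echo "File 2" > File2.txt
git add .
git commit -q -m "Initial commit"

# Two commits on child, as in the lesson.
git switch -q -c child
echo "Edited on the child branch" >> File1.txt
git commit -q -am "Edit File1.txt"
echo "Also edited on the child branch" >> File2.txt
git commit -q -am "Edit File2.txt"

git diff --name-only main child    # only the names of files that differ
git diff --stat main child         # per-file summary of added/removed lines
```

The order of the branch names matters: git diff child main would show the same two lines as removals rather than additions.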

Switching between branches

Now let's say we need to go back to the main branch. First let's look at the contents of our files before we switch:

# Feel free to just open the files in a text editor and check,
# I'm just using this command to show the contents all in one place.
$ for f in *.txt; do echo $f; cat $f; echo ""; done
File1.txt
File 1
Edited on the child branch

File2.txt
File 2
Also edited on the child branch

File3.txt
File 3

File4.txt
File 4

It's also a good idea to check that your repo is mostly clean, that is, has no uncommitted changes to tracked files. We'll see in a later lesson that Git usually does not let us change branches if a tracked file has uncommitted changes. This is a precaution to avoid accidentally overwriting uncommitted work when Git changes the file contents to match the new branch.

$ git status
On branch child
nothing to commit, working tree clean

Good, no files are listed as modified, so we can switch to main. We'll use git switch again, but without the -c flag since we're not creating a new branch:

$ git switch main
Switched to branch 'main'

$ git glog
* d169551 (child) Edited File2.txt on the child branch
* 47962d4 Edited File1.txt on child
* 074c6fe (HEAD -> main) Initial commit for branching demo

We get a confirmation message that we switched to the "main" branch, and git glog shows that HEAD now points to main as well. Let's look at our files' contents again:

$ for f in *.txt; do echo $f; cat $f; echo ""; done
File1.txt
File 1

File2.txt
File 2

File3.txt
File 3

File4.txt
File 4

As we wanted, everything has now been restored to the way it was when we left our main branch.
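If you ever want to script the "check that the repo is clean, then switch" habit, one possible sketch is below. It uses git status --porcelain, which prints nothing when there is nothing to report. Note that this check is stricter than Git itself, and that the temporary repository and placeholder identity are my assumptions to keep the example self-contained.

```shell
# Illustrative setup: main with one commit, child with one more.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "File 1" > File1.txt
git add File1.txt
git commit -q -m "Initial commit"
git switch -q -c child
echo "Edited on the child branch" >> File1.txt
git commit -q -am "Edit File1.txt"

# Switch only if no tracked file has uncommitted changes.
# --untracked-files=no ignores files Git isn't tracking.
if [ -z "$(git status --porcelain --untracked-files=no)" ]; then
    git switch main
else
    echo "Uncommitted changes found; commit them before switching." >&2
fi

cat File1.txt    # back on main, so only the original line is present
```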

Parallel branches

We can make new commits in main too. Let's add the line "Appended in the main branch" to File3.txt:

$ git diff
diff --git a/File3.txt b/File3.txt
index 8cf9e18..12f5f87 100644
--- a/File3.txt
+++ b/File3.txt
@@ -1 +1,2 @@
 File 3
+Appended in the main branch

Commit this, and check git glog:

$ git glog
* 264999f (HEAD -> main) Added to File3.txt in main
| * d169551 (child) Edited File2.txt on the child branch
| * 47962d4 Edited File1.txt on child
|/  
* 074c6fe Initial commit for branching demo

Notice how git glog shows the diverging history. Now we have two new commits on child and one new one on main since their last common commit (074c6fe in my example).
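You don't have to read the last common commit off the graph by eye. As a small sketch, git merge-base reports it directly, and git rev-list --count with the A..B range syntax counts the commits that B has and A lacks. The repository here is a three-file stand-in for the demo repo, created in a temporary directory with a placeholder identity (my assumptions, not the lesson's actual repo).

```shell
# Stand-in repo with the same shape of history as in the lesson.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
for i in 1 2 3; do echo "File $i" > "File$i.txt"; done
git add .
git commit -q -m "Initial commit"

git switch -q -c child
echo "Edited on the child branch" >> File1.txt
git commit -q -am "Edit File1.txt"
echo "Also edited on the child branch" >> File2.txt
git commit -q -am "Edit File2.txt"

git switch -q main
echo "Appended in the main branch" >> File3.txt
git commit -q -am "Edit File3.txt"

git merge-base main child           # hash of the last common commit
git rev-list --count main..child    # prints: 2 (commits only on child)
git rev-list --count child..main    # prints: 1 (commits only on main)
```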

Branches off branches

We're also free to branch off of any existing commit, not just the main branch. Let's switch back to the child branch:

$ git switch child
Switched to branch 'child'

$ git glog
* 264999f (main) Added to File3.txt in main
| * d169551 (HEAD -> child) Edited File2.txt on the child branch
| * 47962d4 Edited File1.txt on child
|/  
* 074c6fe Initial commit for branching demo

We can create a new branch from here, just as we did before. Let's call our new branch "grandchild":

$ git switch -c grandchild
Switched to a new branch 'grandchild'

$ git glog
* 264999f (main) Added to File3.txt in main
| * d169551 (HEAD -> grandchild, child) Edited File2.txt on the child branch
| * 47962d4 Edited File1.txt on child
|/  
* 074c6fe Initial commit for branching demo

Let's edit File4.txt, since it's been left out so far, by adding "Added on the grandchild branch":

$ git diff
diff --git a/File4.txt b/File4.txt
index 442dd61..897f0ee 100644
--- a/File4.txt
+++ b/File4.txt
@@ -1 +1,2 @@
 File 4
+Added on the grandchild branch

Go ahead and commit so that git glog now shows:

$ git glog
* b1b5e42 (HEAD -> grandchild) Modified File4.txt on grandchild
* d169551 (child) Edited File2.txt on the child branch
* 47962d4 Edited File1.txt on child
| * 264999f (main) Added to File3.txt in main
|/  
* 074c6fe Initial commit for branching demo

Notice that now child and grandchild are shown on higher lines than main, whereas before this was the opposite. In this view, commits higher up aren't always more recent, as it tries to keep related commits together.

Let's make one more commit, editing File1.txt to add a third line, "Also edited on the grandchild branch":

$ git diff
diff --git a/File1.txt b/File1.txt
index bfebdec..4a5b31e 100644
--- a/File1.txt
+++ b/File1.txt
@@ -1,2 +1,3 @@
 File 1
 Edited on the child branch
+Also edited on the grandchild branch

Commit this change too, and then check git glog one more time.

$ git glog
* d8e56f9 (HEAD -> grandchild) Edited File1.txt on the grandchild branch
* b1b5e42 Modified File4.txt on grandchild
* d169551 (child) Edited File2.txt on the child branch
* 47962d4 Edited File1.txt on child
| * 264999f (main) Added to File3.txt in main
|/  
* 074c6fe Initial commit for branching demo
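In this section we switched to child first and then created grandchild from there. As a sketch of an alternative, git switch -c also takes an optional starting point after the new branch name, so you can branch off child while main is still checked out. The small repository, placeholder identity, and the plain git log flags (used in place of the glog alias) are my assumptions for a self-contained example.

```shell
# Illustrative setup: child is one commit ahead of main; main is checked out.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "File 1" > File1.txt
git add File1.txt
git commit -q -m "Initial commit"
git switch -q -c child
echo "Edited on the child branch" >> File1.txt
git commit -q -am "Edit File1.txt"
git switch -q main

# Create grandchild starting at child's latest commit, in one step.
git switch -c grandchild child

git log --oneline --graph --all    # grandchild and child share the top commit
```

The starting point can be a branch name or any commit hash, which is what makes branching off "any existing commit" possible.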

Summary

We've gone through how to create new branches and switch between them. We now have three different branches (main, child, and grandchild), and our files are slightly different in each. This allows us to essentially create parallel histories, which is useful in itself. A very common use of a branch is to try out changes to code and still be able to get back to the original version easily while you're working on the changes. Imagine this scenario:

You have a plotting function that took a while to get to work. Now it does what it's supposed to, but doesn't look very nice. You don't want to risk breaking the function after all that work, and you're not too comfortable with the options to make these plots look better, so you create a branch.
You've been tinkering with this function off and on for a few days. Its aesthetics are much better, but now the legend isn't showing the right symbols. Then you get a message from a colleague, asking for this plot for a meeting starting in 10 minutes! No problem, you just switch back to your original branch, make the plot, and switch back to the aesthetics branch.
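The back-and-forth in this scenario might look something like the sketch below. Everything specific in it is hypothetical: plot.txt stands in for the plotting code, and the in-progress work is committed before switching (stashing uncommitted work is a separate topic). The one Git feature worth pointing out is git switch -, which returns to whichever branch was checked out previously.

```shell
# Hypothetical project: plot.txt stands in for the plotting function.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "working plot" > plot.txt
git add plot.txt
git commit -q -m "Add working plot function"

# Tinker with the aesthetics on a separate branch.
git switch -q -c fix-plot-aesthetics
echo "nicer colors, legend still wrong" >> plot.txt
git commit -q -am "Work in progress on plot colors"

# The urgent request arrives: go back to the known-good version.
git switch main
cat plot.txt                 # only the original, working version
# ... make the plot for the meeting here ...

git switch -                 # "-" means the previously checked-out branch
git branch --show-current    # prints: fix-plot-aesthetics
```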

Now, eventually you would finish this update and want to put that change back into the main branch. That's called merging in Git, and will be covered in Lesson 4. Our next lesson is on stashing, which is a way to temporarily put away uncommitted changes that would otherwise keep you from switching between branches.

Some last notes

Before we leave this lesson, here's some miscellaneous advice for when you go to use branches in a real project.

Naming branches

We used "child" and "grandchild" as our branch names here just to illustrate how they split off from their "parent" branches. In practice, you should choose branch names that describe the purpose of the branch, rather than its relationship to other branches. For our plotting example in the summary, your branch might be called fix-plot-aesthetics. If you make a branch to deal with a bug with timestamps around the daylight saving time switch, it could be named dst-bugfix.

Branch names don't need to be long, but they definitely should give you an idea about why a branch exists. Sometimes it makes sense to prefix a branch with its category; for example feature/pretty-plot or bugfix/dst-timestamps (yes, you can use slashes in branch names, though a branch named exactly "feature" or "bugfix" could not exist alongside these, as the names would conflict). This is especially helpful as the number of branches in a project grows, as it helps keep them grouped up. Speaking of which...

Pruning the number of branches

It's really tempting to go wild with branches and create a bunch of different branches, one for each change you want to make. It's also really easy to let branches accumulate. You start one idea, which doesn't work, so you leave that branch and go back to main. Then a different idea comes up, you try that - it seems promising, but will take time to implement correctly. A deadline comes up, so you switch to yet another branch to make some minor edits to finish up some plots or update an analysis. And once that project is wrapping up it's on to the next...

In my experience, it gets really difficult to remember what the ideas behind more than two or three branches are. As a general rule of thumb, try to keep to no more than three branches (not counting main) at any one time. If you get more than that, either get one wrapped up, tested, and merged back into main, or mark some as "inactive." You can use git branch -m to rename branches, so you could rename a branch X to inactive/X, just to give yourself a clue that said branch is in your backlog and not the most relevant one to deal with at the moment. (It's also probably a good idea to commit a note on that branch that describes where you left work on that branch, what's left to do, etc.)
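As a sketch of that "mark it inactive" routine: the example below leaves a note on the branch, renames it with git branch -m, and then lists everything under the inactive/ prefix. Using an empty commit (--allow-empty) for the note is just one option I'm assuming here; you could equally commit a real notes file. The branch name, note text, temporary repository, and placeholder identity are all hypothetical.

```shell
# Illustrative setup with one hypothetical work-in-progress branch.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                          # assumes Git 2.28+
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "File 1" > File1.txt
git add File1.txt
git commit -q -m "Initial commit"
git switch -q -c dst-bugfix

# Leave a note on the branch describing where work stopped.
# An empty commit is one way to do this without touching any files.
git commit -q --allow-empty -m "Paused: DST fix still needs tests"
git switch -q main

# Rename the branch so it is clearly part of the backlog.
git branch -m dst-bugfix inactive/dst-bugfix

git branch --list 'inactive/*'    # shows only the shelved branches
```

When you pick the work back up, the same git branch -m command with the names reversed puts the branch back under its original name.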

Commit messages

Throughout this lesson, I included which branch I made a commit on in each message. That's only for illustration in this lesson; in practice I'd never do that. Keep to the points about what makes a good commit message we discussed at the end of part 1, lesson 3. Remember, branches are available to help you separate different tasks during development, whereas commit messages are there to explain where each change fits in the overall history.