Version Control Systems

Reproducible software pipelines

The problem

Whenever we work on any project, we have to keep track of at least one evolving artifact.

The situation is a lot worse if there are multiple files involved:

  • code
  • data
  • reports

Version Control to the rescue

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. — Pro Git

  • git was created in 2005 by the Linux kernel developers
  • Handles large software projects
  • Handles thousands of collaborators

What does version control allow you to do?

  • Recording history
  • Undoing mistakes
  • Experimenting freely
  • Travelling in time
  • Sharing work

Source: Excuse me, do you have a moment to talk about version control?

Initial setup

  • Install git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
  • Create an account at github.com
  • (optional) Install a GUI like GitHub Desktop
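After installing, git needs to know who you are before it will record a commit. A minimal sketch of that first-time configuration; the name and email below are placeholders, not values from these slides:

```shell
# check that git is installed and on the PATH
git --version

# tell git who you are (placeholder values: replace with your own)
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# review the resulting global settings
git config --global --list
```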

Basic concepts

  • repository: collection of files and directories managed by git
  • commit: the state of the repo at a particular recorded point in time
  • diff: line-by-line difference between two commits
  • sha: cryptographic hash (SHA-1 by default) of a commit's content and metadata, used as the identifier for that commit
  • commit message: arbitrary text tied to a commit
  • working directory: the actual files in the directory managed by git
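The concepts above can be seen together in a throwaway repository. This is only a sketch: the file name report.txt and the identity are made up for illustration.

```shell
set -eu
repo=$(mktemp -d)                  # throwaway directory, becomes the working directory
cd "$repo"
git init -q                        # turn it into a repository
git config user.name "Demo User"   # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "first draft" > report.txt
git add report.txt
git commit -q -m "Add report"      # first commit, with its commit message

echo "second draft" > report.txt
git commit -q -am "Revise report"  # second commit

sha=$(git rev-parse HEAD)          # sha identifying the latest commit
echo "latest commit: $sha"
git diff HEAD~1 HEAD               # line-by-line diff between the two commits
```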

Staging changes for later commit

git add <FILES>

  • useful to split a large set of changes to the working directory into many smaller commits
  • useful to add a file to the repository upon creation
  • to stage all changes in the current directory and below, run git add .
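As a sketch of the first point, two unrelated new files can be staged and committed separately. The file names and identity are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity
git config user.email "demo@example.com"

# two unrelated changes sitting in the working directory
echo "print('hello')" > analysis.py
echo "# Notes" > notes.md
git status --short                 # both files show up as untracked (??)

# stage and commit them one at a time
git add analysis.py
git commit -q -m "Add analysis script"
git add notes.md
git commit -q -m "Add notes"

git log --oneline                  # two small commits instead of one large one
```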

Record the staged changes

git commit

  • commit messages should be meaningful

Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary.  Wrap it to about 72
characters or so.  In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body.  The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase will confuse you if you run the
two together.

Write your commit message in the imperative: "Fix bug" and not "Fixed bug"
or "Fixes bug."  This convention matches up with commit messages generated
by commands like git merge and git revert.

- Bullet points are okay, too
- Basically, you can use the Markdown conventions

A look through history

# show all the commits
git log

# go to a specific commit
git checkout <commit-ref>
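A sketch of both commands in a throwaway repository. Checking out a commit by its sha leaves you in "detached HEAD" state, so the example switches back to the branch afterwards. The file name is hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity
git config user.email "demo@example.com"

echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data"
echo "v2" > data.txt
git commit -q -am "Update data"

git log --oneline                            # newest commit first

first=$(git rev-list --max-parents=0 HEAD)   # sha of the very first commit
git checkout -q "$first"                     # working directory now shows the old state
old=$(cat data.txt)

git switch -q -                              # return to the branch we came from
new=$(cat data.txt)
echo "then: $old, now: $new"
```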

Collaborating with git

  • Centralized workflow
  • Integration-manager workflow
  • Benevolent dictator and lieutenants workflow

# get and apply changes from remote
git pull

# get (but not apply) changes from remote
git fetch

# send changes to remote
git push
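The three commands can be tried without a server by letting a local bare repository play the role of the remote. This is a sketch under assumptions: the names hub.git, alice and bob are invented, and init -b needs Git 2.28 or newer.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare -b main hub.git     # stands in for the remote (e.g. on GitHub)

git init -q -b main alice
cd alice
git config user.name "Alice"           # hypothetical identity
git config user.email "alice@example.com"
git remote add origin "$work/hub.git"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin main             # send changes to remote

cd "$work"
git clone -q hub.git bob               # Bob starts from what is on the remote

cd "$work/alice"
echo "two" >> notes.txt
git commit -q -am "Extend notes"
git push -q

cd "$work/bob"
git fetch -q origin                    # get, but do not apply
behind=$(git rev-list --count main..origin/main)
echo "bob is behind by $behind commit(s)"
git pull -q --ff-only                  # get and apply
```

After the fetch, Bob's main is unchanged and only origin/main has moved; the pull is what brings his branch and files up to date.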

Branching

  • git makes it very easy to create multiple branches
  • branching is easy, it’s merging that is difficult
  • care must be taken, or things will spiral out of control

Further reading: Patterns for Managing Source Code Branches, by Martin Fowler

Branching for small teams

  • Keep the main branch as the reference implementation for everybody
  • Merge conflicts are possible when two people change the same lines
  • Before trying to push, always pull the latest changes
  • Resolve any merge conflicts, then push
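What resolving a conflict looks like can be sketched locally, with a second branch standing in for a teammate's work. The branch and file names are hypothetical, and init -b assumes Git 2.28 or newer.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

echo "colour = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

# the "teammate" changes the line one way...
git switch -q -c teammate
echo "colour = blue" > settings.txt
git commit -q -am "Use blue"

# ...while we change the same line another way
git switch -q main
echo "colour = green" > settings.txt
git commit -q -am "Use green"

# the merge stops with a conflict and a non-zero exit status
git merge teammate >/dev/null 2>&1 || echo "merge stopped: conflict in settings.txt"
git status --short                     # UU marks the unmerged file

# resolve by writing the content we want, then stage and conclude the merge
echo "colour = blue" > settings.txt
git add settings.txt
git commit -q -m "Merge branch 'teammate'"
```

In real use you would edit the file between the conflict markers rather than overwrite it; the overwrite just keeps the sketch short.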

Warning

fetch frequently!

A tale of two committers

Once upon a time, Jessica and John were working on a project together

# Jessica's Machine
$ git push origin master
...
To jessica@githost:simplegit.git
   1edee6b..fbff5bc  master -> master

# John's Machine
$ git push origin master
To john@githost:simplegit.git
 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'john@githost:simplegit.git'

# John's machine
$ git fetch origin
...
From john@githost:simplegit
 + 049d078...fbff5bc master     -> origin/master

# John's machine
$ git merge origin/master
Merge made by the 'recursive' strategy.
 TODO |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

# John's machine
$ git push origin master
...
To john@githost:simplegit.git
   fbff5bc..72bbc59  master -> master

In the meantime, Jessica continued to work

# Jessica's Machine
$ git fetch origin
...
From jessica@githost:simplegit
   fbff5bc..72bbc59  master     -> origin/master

Jessica, who has been working on a topic branch called issue54, can list the commits from John that she still needs to merge:

$ git log --no-merges issue54..origin/master
commit 738ee872852dfaa9d6634e0dea7a324040193016
Author: John Smith <jsmith@example.com>
Date:   Fri May 29 16:01:27 2009 -0700

   Remove invalid default value

$ git checkout master
Switched to branch 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
$ git merge issue54
Updating fbff5bc..4af4298
Fast forward
 README           |    1 +
 lib/simplegit.rb |    6 +++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

$ git merge origin/master
Auto-merging lib/simplegit.rb
Merge made by the 'recursive' strategy.
 lib/simplegit.rb |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

$ git push origin master
...
To jessica@githost:simplegit.git
   72bbc59..8059c15  master -> master

Branching for experimenting

# create a new branch
git switch -c crazy-feature
# do some edits
git commit -a -m 'short meaningless message'
# some other edits
git commit -a -m 'blah blah'
# now we rewrite the history of the branch, reducing it to a single commit
git rebase --interactive main
# do the edits following the instructions provided, and give a meaningful message

# merge the work, if you are happy with it
git switch main
git merge crazy-feature
git branch -d crazy-feature
# alternatively, if things didn't work out, discard the branch without merging
git switch main
git branch -D crazy-feature
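Interactive rebase needs an editor, so it is awkward to script. As a sketch of the same outcome (the branch reduced to a single commit before merging), the example below swaps in a different technique: git reset --soft followed by one fresh commit. File names and messages are made up, and init -b assumes Git 2.28 or newer.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

echo "base" > model.txt
git add model.txt
git commit -q -m "Add model"

git switch -q -c crazy-feature
echo "idea 1" >> model.txt
git commit -q -am "wip"
echo "idea 2" >> model.txt
git commit -q -am "blah blah"

# move the branch pointer back to main but keep all changes staged,
# then record them as one commit with a meaningful message
git reset -q --soft main
git commit -q -m "Add experimental ideas to model"

# merge the work (a fast-forward) and delete the branch
git switch -q main
git merge -q crazy-feature
git branch -q -d crazy-feature
git log --oneline
```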

Getting yourself out of trouble

An invaluable resource is https://dangitgit.com/en

Travelling in time

git reflog
# you will see a list of everything you've
# done in git, across all branches!
# each one has an index HEAD@{index}
# find the one before you broke everything
git reset HEAD@{index}
# magic time machine
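A sketch of the reflog rescuing a commit that was "lost" by a careless hard reset. The example uses reset --hard for the recovery so the files are restored as well, which is an assumption about what you want; the file name is hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

echo "one" > log.txt
git add log.txt
git commit -q -m "Add log"
echo "two" >> log.txt
git commit -q -am "Extend log"

# the mistake: throw away the latest commit
git reset -q --hard HEAD~1

# HEAD@{0} is the reset itself, HEAD@{1} is where HEAD was just before it
git reflog

# go back to the state before the mistake
git reset -q --hard 'HEAD@{1}'
git log --oneline
```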

Tweaking commits after the fact

# make your change
git add . # or add individual files
git commit --amend --no-edit
# now your last commit contains that change!
# WARNING: never amend public commits
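A sketch of the typical case: a file was forgotten in the last commit. Note that the amended commit gets a new sha, which is why amending public commits causes trouble. File names are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

echo "import sys" > tool.py
echo "docs" > README.txt
git add tool.py
git commit -q -m "Add tool and README"  # oops: README.txt was never staged
before=$(git rev-parse HEAD)

git add README.txt
git commit -q --amend --no-edit         # fold the file into the last commit
after=$(git rev-parse HEAD)

echo "sha before: $before"
echo "sha after:  $after"               # different: the commit was rewritten
git log --oneline                       # still a single commit
```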

Reword a commit

git commit --amend
# follow prompts to change the commit message

Remove the last commit, keeping changes

# undo the last commit, but leave the changes available
git reset HEAD~ --soft
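A sketch showing what "keeping changes" means here: after the soft reset the commit is gone from history, but its changes are still staged. File names are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"

git reset -q --soft HEAD~              # drop the last commit, keep its changes

git log --oneline                      # only "Add a" remains
git status --short                     # b.txt is listed as staged (A)
```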

Undo a commit from the past

# find the commit you need to undo
git log
# use the arrow keys to scroll up and down in history
# once you've found your commit, save the hash
git revert <saved-hash>
# git will create a new commit that undoes that commit
# follow prompts to edit the commit message
# or just save and commit
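A sketch of reverting a commit in the middle of the history. The --no-edit flag accepts the default message so no editor opens. File names are made up, and each commit touches its own file so the revert cannot conflict.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"       # hypothetical identity
git config user.email "demo@example.com"

for name in alpha beta gamma; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Add $name"
done

# find the commit to undo (here by its message) and save the hash
target=$(git log --format=%H --grep="Add beta")

# create a new commit that undoes it; history is extended, not rewritten
git revert --no-edit "$target" >/dev/null

git log --oneline                      # four commits, the newest is the revert
ls                                     # beta.txt is gone, the others remain
```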

Finding bad commits

# start the search
git bisect start

# mark the currently checked-out commit as good
git bisect good

# mark the currently checked-out commit as bad
git bisect bad

# exit the bisection session
git bisect reset

  • At each step, git checks out a commit; you run your code and mark the result as good or bad

The nuke option

# WARNING: this throws away any local work that was never pushed
cd ..
rm -r stupid-git-repo-dir
git clone https://some.github.url/stupid-git-repo-dir.git
cd stupid-git-repo-dir