paulgorman.org

Git version control

Git is a distributed version control system. Users clone the entire code repository, make any number of changes and commits, and can eventually merge their clone/branch back to the original branch. Git makes branching and merging easy.

Each Git repository has a .git directory, which contains the repo's config, index, and object store. The object store is what gets duplicated when a repo is cloned. The object store contains everything required to reconstruct any version/branch of the project. The index describes the pending commit. Adds, deletes, and edits are recorded and staged in the index before you commit them to the repository. Staging changes in the index allows commits to the repository to be atomic. A commit is just a snapshot of the staged index.


        working files
              |
              v
        index/staging
              |
              v
       commit to branch
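
As a rough illustration of the flow above, this sketch walks one file from the working directory through the index into a commit. The scratch repo, the file name notes.txt, and the user identity are placeholders invented for the example. It uses git status --porcelain (a stable short format) so the state at each step is easy to read.

```shell
# Sketch: follow one file through working directory -> index -> commit.
# The temp repo, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
untracked=$(git status --porcelain)   # "?? notes.txt": only in the working directory

git add notes.txt
staged=$(git status --porcelain)      # "A  notes.txt": recorded in the index

git commit -q -m "Add notes.txt"
committed=$(git status --porcelain)   # empty: working files, index, and commit agree

printf '%s\n' "$untracked" "$staged" "[$committed]"
```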

Objects in Git (including commits) are identified by 40-digit hexadecimal numbers like 8db571d910c9c4ac93c57ce4359e767f5caf5166. These IDs are SHA-1 hashes of the objects themselves (i.e., of the contents of the objects, rather than their file names or other metadata). Any two objects with the same hash are identical to each other, even across different repositories.
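
To see that IDs depend on content rather than file names, something like the following should work. It uses git hash-object, which prints the ID Git would assign to a file's contents. The scratch directory and file names are made up for the example.

```shell
# Sketch: object IDs depend on content, not on file names.
# The scratch directory and file names are placeholders.
scratch=$(mktemp -d)
printf 'same content\n' > "$scratch/a.txt"
printf 'same content\n' > "$scratch/renamed-copy.txt"
printf 'different content\n' > "$scratch/c.txt"

# git hash-object prints the ID for a file's contents without storing anything.
id_a=$(git hash-object "$scratch/a.txt")
id_b=$(git hash-object "$scratch/renamed-copy.txt")
id_c=$(git hash-object "$scratch/c.txt")

echo "$id_a  a.txt"
echo "$id_b  renamed-copy.txt"   # same ID as a.txt
echo "$id_c  c.txt"              # different ID
```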

Using Git

Help:
git help [commit add etc]

Also, the Pro Git and Version Control With Git books are very helpful.

Creating a new repository from an existing project:
cd myproject
git init
git add .
git commit -m "Initial import of existing source."

Adding/staging/caching a file:
git add foo.txt [file2 file3 etc]

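Un-staging a newly added file you decided not to commit:
git rm --cached bad.txt
(removes from index/cache, but leaves working copy. For a file that is already tracked, this stages a deletion instead; use git reset HEAD bad.txt to un-stage a modification.)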

Delete a file (working copy) and stop tracking it:
git rm foo.txt

Check the index for changes staged/cached for next commit:
git status

Diff all changed files that are not yet staged/cached:
git diff
Diff files that have been staged:
git diff --cached
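
A small sketch of how an edit moves from git diff to git diff --cached once it is staged. The scratch repo, file name, and identity are placeholders. The --name-only flag just keeps the output short.

```shell
# Sketch: the same edit shows up in "git diff" before staging
# and in "git diff --cached" after staging.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

echo "two" >> foo.txt
unstaged_before=$(git diff --name-only)          # foo.txt: edited, not yet staged
staged_before=$(git diff --cached --name-only)   # empty: nothing staged yet

git add foo.txt
unstaged_after=$(git diff --name-only)           # empty: the edit moved to the index
staged_after=$(git diff --cached --name-only)    # foo.txt: staged for the next commit

echo "before add: unstaged=[$unstaged_before] staged=[$staged_before]"
echo "after add:  unstaged=[$unstaged_after] staged=[$staged_after]"
```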

Commit staged changes:
git commit

You can provide a brief commit message with git commit -m "My message", or omit the flag to have git drop you into an editor.

Stage and commit any changed files in one step (ignoring files not under version control):
git commit -a

Show commit logs (listed from most recent to oldest):
git log [master]
The log starts at the supplied commit name; HEAD (the tip of the current branch) is the default. The --stat flag shows how many changes were made to each file.

To see details of a particular commit:
git show 8ca531d910c9c4ac935b7ce4859e467f5caf5167
(Omitting the hash shows details of the most recent commit.)

Compare two commits by hashes:
git diff 8ca531d910c9c4ac935b7ce4859e467f5caf5167 922c38d910c9b4ac935b7ce4859e467f5caf51b4

Restore erroneously removed file:
git checkout HEAD -- foo.txt
(HEAD is a reference to the latest commit on the current branch. A bare double-dash " -- " avoids ambiguity by separating options and revisions from a list of file paths.)
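
For example, a deleted working copy can be brought back roughly like this. The scratch repo, file contents, and identity are placeholders.

```shell
# Sketch: delete a tracked file by mistake, then restore it from HEAD.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "keep me" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

rm foo.txt                                # oops
after_rm=$(git status --porcelain)        # " D foo.txt": deleted in the working directory

git checkout HEAD -- foo.txt              # copy the committed version back
restored=$(cat foo.txt)
echo "$restored"
```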

Rename a file:
git mv foo.txt bar.txt

Deletions and renames take effect upon commit.

Clone a repo locally for a quick experiment:
git clone myproject projectcopy

Git config files and initial post-install setup

List files (or file patterns) you want git to ignore in a .gitignore file. You can use multiple ignore files in a repo; each ignore file affects files in its own directory and in subdirectories of that directory (i.e., the .gitignore in the root of the repo affects all files in the repository).

myproject/.git/config contains settings specific to this repo, and these settings override options in ~/.gitconfig and /etc/gitconfig. git config --local settings end up here (if you're in the repository, omit the --local flag, as the local repo is the default for config commands). These settings are not copied when you clone the repo.

~/.gitconfig contains per-user settings. git config --global settings end up here.

/etc/gitconfig contains settings for the box. Settings in this file are overridden by settings in the above files. git config --system settings end up here.

Show the effective settings with:
git config -l
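
A hedged sketch of the override order between the per-user and per-repo files. To avoid touching a real ~/.gitconfig, it points HOME (and XDG_CONFIG_HOME) at a scratch directory through a small wrapper function, sgit, which is invented for this example. The names "Global Name" and "Repo Name" are placeholders too.

```shell
# Sketch: a repo-level setting overrides the per-user setting.
# sgit is a hypothetical wrapper that sandboxes HOME so the real
# ~/.gitconfig is left alone.
sandbox=$(mktemp -d)
sgit() { HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config" git "$@"; }

sgit init -q "$sandbox/myproject"
sgit config --global user.name "Global Name"                   # lands in $sandbox/.gitconfig
global_view=$(sgit -C "$sandbox/myproject" config user.name)   # per-user value is in effect

sgit -C "$sandbox/myproject" config user.name "Repo Name"      # lands in myproject/.git/config
local_view=$(sgit -C "$sandbox/myproject" config user.name)    # repo value now wins

echo "before repo setting: $global_view"
echo "after repo setting:  $local_view"
```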

Initial post-install setup:
git config --global user.name "My Name"
git config --global user.email "me@example.com"
git config --global credential.helper cache
git config --global credential.helper 'cache --timeout=3600'
git config --global core.editor vim
git config --global merge.tool vimdiff

Store command aliases in config files. For example:
git config --global alias.show-graph 'log --graph --abbrev-commit --pretty=oneline'
so you can thereafter simply type git show-graph.

Branching and Merging (and Tags)

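The default branch in a repository is traditionally named master (newer versions of Git let you pick another name with the init.defaultBranch setting). Git makes branching and merging easy, so it's not uncommon to casually create branches to tackle a particular bug or develop a new feature.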

Branches change all the time. Tags do not. A tag is a signpost in the history of the project, whereas a branch is the road. A tag is a symbolic name for a particular release; a branch is a symbolic name for a line of development. Don't name a branch the same as a tag.

git tag shows all the tags in alphabetical order. Create an annotated tag with git tag -a v2.0 -m 'Release version 2.0', or a lightweight tag like git tag v2.0. Tag a particular past commit by ID with git tag -a v2.0 9fceb02. See the details of a tag with git show v2.0.
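
One way to see the difference between the two kinds of tag is to ask Git what type of object each name refers to. This is only a sketch: the scratch repo, the VERSION file, and the tag name v2.0-light are invented for the example.

```shell
# Sketch: an annotated tag is its own object; a lightweight tag is
# just a name for a commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "2.0" > VERSION
git add VERSION
git commit -q -m "Release prep"

git tag -a v2.0 -m 'Release version 2.0'   # annotated
git tag v2.0-light                         # lightweight (hypothetical name)

annotated_type=$(git cat-file -t v2.0)            # "tag"
lightweight_type=$(git cat-file -t v2.0-light)    # "commit"

# Both names still lead to the same commit.
annotated_commit=$(git rev-parse 'v2.0^{commit}')
lightweight_commit=$(git rev-parse v2.0-light)

git tag   # lists both tags
```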

Show the branches in the project:
git branch
(The branch you're working on is highlighted with an asterisk.)

Create a new branch from the current commit:

git branch mynewbranch

Switch to the new branch:
git checkout mynewbranch
(Have a clean working directory before switching branches.)

Switch back to the master branch, and merge your changes from mynewbranch:
git checkout master
git merge mynewbranch
(Have a clean working directory and index before merging.)

If there are no conflicts, it just works. If there are conflicts, you can see them with:
git diff
Edit the files with conflicts, resolve the conflicts, and git add the resolved files.
You can run git mergetool to walk through the conflicts instead of editing the files directly.
Run git status to verify the conflicts are resolved.

After you resolve any conflicts:
git commit
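
The conflict workflow above can be rehearsed in a throwaway repo. In this sketch the repo, file contents, and identity are placeholders. The git symbolic-ref line is only there to make sure the initial branch is called master, whatever the local default is.

```shell
# Sketch: create a conflict on purpose, then resolve it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Example User"
git config user.email "user@example.com"
echo "original" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git checkout -q -b mynewbranch
echo "branch version" > foo.txt
git commit -q -a -m "Edit on mynewbranch"

git checkout -q master
echo "master version" > foo.txt
git commit -q -a -m "Edit on master"

# Both branches changed the same line, so the merge stops.
git merge mynewbranch || echo "merge stopped on a conflict"
conflict_status=$(git status --porcelain)   # "UU foo.txt": both sides modified

# Resolve by hand (here: just pick new contents), stage, and commit.
echo "resolved version" > foo.txt
git add foo.txt
git commit -q -m "Merge mynewbranch, resolving conflict in foo.txt"
```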

If you're working on a side branch and need to pull in the latest changes from the master branch:
git merge master

To see which branches have or have not yet been merged, run git branch with the --merged or --no-merged flag.

Delete a branch (with due caution):
git branch -d mynewbranch

Git has a crap-ton of ways to refer to a commit. The 40-hex-digit SHA1 hash of a commit is authoritative and globally unique. Git also creates a few handy references: HEAD points to the latest commit in the current branch, ORIG_HEAD points to the previous HEAD after a merge or reset, FETCH_HEAD points to the HEAD of a remote repo that was just fetched, and MERGE_HEAD points to the other HEAD while a merge is in progress. Git can also refer to commits with relative names: if master is the latest commit, master~1 is the penultimate/parent commit, master~2 points to the commit before that (grandparent). If a commit has more than one parent, master^1 is parent A, master^2 is parent B, and so forth. master^3~2, for example, would be the third parent's grandparent.
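
The relative names are easier to follow against a tiny history with one merge in it. This sketch builds one from empty commits (--allow-empty) so no files are needed. The branch name side and the commit messages are made up.

```shell
# Sketch: resolve ~ and ^ names against a small history:
#   first -- second -- (merge)   <- master
#                 \     /
#                side work       <- side
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git checkout -q -b side
git commit -q --allow-empty -m "side work"
git checkout -q master
git merge -q --no-ff -m "Merge side" side

subject() { git log -1 --format=%s "$1"; }

parent=$(subject master~1)         # first parent of the merge: "second"
grandparent=$(subject master~2)    # "first"
second_parent=$(subject master^2)  # the merged-in branch tip: "side work"
via_side=$(subject master^2~1)     # parent of the side tip: "second"

printf '%s\n' "$parent" "$grandparent" "$second_parent" "$via_side"
```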

The gitk tool can, among other things, draw a graph of the repo.

Remote Repositories (Push/Pull)

A remote repo is a clone. Specifying a remote repo just creates a handle linking one clone to its fellow. A repository can have multiple remotes. Remotes need not be on a different machine; they can just be in a different directory on the same filesystem.

Development repos have a working directory with checked out files. Bare repositories do not (they basically just have the contents of the .git directory). If a few developers are working on their own cloned repos, and coordinating through a central repository, that central repo should be bare. By convention, bare repositories are named with a .git extension, like myprojectdirectory.git.

Creating a new remote git repository:
local$ ssh example.com
remote$ mkdir /home/me/git/myproject
remote$ cd /home/me/git/myproject
remote$ git init --bare

Link your local repo to that remote repository:
local$ cd /home/me/git/myproject
local$ git remote add origin ssh://example.com/home/me/git/myproject
local$ git push origin master
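
Since a remote can simply be another directory on the same filesystem, the whole setup can be tried without a server. In this sketch a bare repo in a scratch directory stands in for the ssh:// remote. All paths and the README file are placeholders.

```shell
# Sketch: a bare repo in a local directory plays the role of the remote.
base=$(mktemp -d)
git init -q --bare "$base/myproject.git"   # the "central" repo, no working files

git init -q "$base/myproject"              # the development repo
cd "$base/myproject"
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

git remote add origin "$base/myproject.git"
git push -q -u origin master               # -u also records origin/master as upstream

local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$base/myproject.git" rev-parse master)
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')

echo "local:    $local_tip"
echo "remote:   $remote_tip"
echo "upstream: $upstream"
```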

Push local changes to remote repository:
cd myproject
git push ssh://example.com/home/me/git/myproject master

Update your existing local repository with any changes from remote:
git pull ssh://paulgorman.org/home/paulgorman/git/blelo master

Grab a fresh local copy from remote server:
git clone ssh://example.com/home/me/git/myproject

Push your new branch upstream:
git push -u origin mynewbranch
(The -u flag sets origin as the upstream default for mynewbranch.)

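Pull a new branch from upstream:
git fetch origin anewbranch
(This only downloads the branch. Run git checkout anewbranch afterwards to start working on a local copy of it.)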

If git complains about
There is no tracking information for the current branch. Please specify which branch you want to merge with.
run
git pull origin master
git push -u origin master

Finding Problems with blame, pickaxe, or bisect

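To find who last modified a particular line and when it was modified:
git blame -L 1427,1427 mydir/foo.py
(-L takes a start,end range. Written as -L 1427, with nothing after the comma, it covers line 1427 through the end of the file.)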

To find when a string changed (was added or deleted) in a file:
git log -Smystring mydir/foo.py
(The -S flag is known as pickaxe. Note that there is an edge case where pickaxe will not work: if a commit had exactly the same number of additions and deletions of the string in the file.)
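
A quick way to watch pickaxe at work is a scratch repo where a string is added in one commit and removed in a later one. The repo, commit messages, and identity here are placeholders. The --format=%s flag just prints commit subjects.

```shell
# Sketch: -S reports the commits where the number of occurrences
# of the string changed (here: one addition, one removal).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "alpha" > foo.py
git add foo.py
git commit -q -m "Add foo.py"

echo "mystring" >> foo.py
git commit -q -a -m "Introduce mystring"

echo "omega" >> foo.py
git commit -q -a -m "Unrelated edit"

grep -v mystring foo.py > foo.tmp && mv foo.tmp foo.py
git commit -q -a -m "Remove mystring"

# Most recent first; the unrelated commits are skipped.
hits=$(git log -Smystring --format=%s -- foo.py)
echo "$hits"
```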

git bisect is a way to find a faulty commit or regression. To use it, you need to know a bad commit (often your current master, where you noticed the problem) and a good commit free of the problem (perhaps the last major release version).

Be sure to start with a clean working directory.

cd myproject
git bisect start
git bisect bad        # The default commit, HEAD, is the bad end of the range
git bisect good v3.0-release        # Define a known good start of the range

Test to see if the fault is present in this version. For each version that is free from the fault, tell git:
git bisect good
or, if this version is still faulty:
git bisect bad

Each time you tell git good or bad, git halves the remaining range of suspect commits and checks out one near the middle, narrowing in on where the problem was introduced. Review the log of your answers with:
git bisect log

When you've found the problem, tell git you're finished by resetting to your original branch:
git bisect reset
(You should now see * master if you run git branch.)
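
When the good/bad test can be scripted, git bisect run can answer for you. The notes above don't cover it, so treat this as an optional sketch. The scratch repo, the app.txt file, and the grep check standing in for a real test are all invented. The tag name v3.0-release follows the example above.

```shell
# Sketch: let "git bisect run" find the commit that introduced "bug".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Example User"
git config user.email "user@example.com"

# Eight commits; the fifth one introduces the fault.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -eq 5 ]; then echo "bug" >> app.txt; else echo "line $i" >> app.txt; fi
  git add app.txt
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then git tag v3.0-release; fi
done

git bisect start
git bisect bad                   # HEAD ("commit 8") is known bad
git bisect good v3.0-release     # "commit 1" is known good

# The script's exit status answers for us: 0 means good, 1 means bad.
git bisect run sh -c '! grep -q bug app.txt'

first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

git bisect reset                 # back to master
```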

Links