Git Internals – The commit tree

Git stores the contents of a commit as trees and blobs, and a tree object can have both trees and blobs as its children. An example of storage as a blob can be seen here.

Consider the file structure of the project below:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git$ tree
.
|-- about.html
|-- css
| `-- simple.css
|-- index.html
`-- list.html

1 directory, 4 files

It is pretty straightforward, with a few .html files and one .css file in a separate directory. Please note that all these files are committed, which means they have already been written out to the underlying object storage. To confirm this, we can use the really useful git log command:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git log
commit 300f5c42a5aed68268547a95db4f40b6b122fb5b
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 16:52:54 2014 +0530

initial commit

It shows that only one commit has been made. It also shows the hash value that identifies the commit.

The cat-file command we overused and abused in the previous post can be used to find the tree corresponding to the last commit. We pass the hash value of the commit as a parameter, along with -p to pretty-print the object:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 300f5c42a5aed68268547a95db4f40b6b122fb5b
tree 5d0d7785b65180c195f7a1bf3cf02218b56f6f0a
author nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530
committer nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530

initial commit
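
The pretty-printed commit above has a simple shape: header lines of the form "key value", a blank line, and then the message. As a rough illustration (this is not how Git itself reads objects), the Python sketch below splits that text into fields. The name parse_commit and the dictionary layout are made up for this example, and multi-line headers such as signatures are not handled.

```python
def parse_commit(text):
    """Split `git cat-file -p <commit>` output into headers and message."""
    # Headers and message are separated by the first blank line.
    header_part, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "message": message.strip()}
    for line in header_part.splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            commit["tree"] = value
        elif key == "parent":
            # A commit may carry zero, one, or several parent lines.
            commit["parents"].append(value)
        else:
            # author, committer, and any other single-line header
            commit[key] = value
    return commit


# Text copied from the cat-file output above.
first_commit_text = """tree 5d0d7785b65180c195f7a1bf3cf02218b56f6f0a
author nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530
committer nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394968974 +0530

initial commit
"""

first = parse_commit(first_commit_text)
print(first["tree"])     # the tree this commit points to
print(first["parents"])  # empty list: this commit has no parent line
```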

So, the commit object points to a tree, whose hash is displayed. Let us see what the tree contains:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 5d0d7785b65180c195f7a1bf3cf02218b56f6f0a
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob 9015a7a32ca0681be64471d3ac2f8c1f24c1040d index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html

This tree contains three blobs for the three .html files and another tree for the css directory. Let us go into the css tree:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 02ff2e2946f969bc640886861ff8c7039e1a2339
100644 blob dac138d9e013a2e9a10e67d793bd4703c1b86bd1 simple.css

It contains the .css file. So, the entire structure looks something like this.

[Figure: commit one]
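
The same structure can be written down as data. The sketch below is only an illustration: it copies the two tree listings from the first commit into a Python dictionary (the TREES name and tuple layout are assumptions of this example, not anything Git exposes) and walks it recursively to recover the file paths.

```python
# Tree objects from the first commit, keyed by hash. Each entry is
# (mode, type, hash, name), copied from the cat-file output above.
TREES = {
    "5d0d7785b65180c195f7a1bf3cf02218b56f6f0a": [
        ("100644", "blob", "09e16f36b3c4993ba924b1074629283a49869be9", "about.html"),
        ("040000", "tree", "02ff2e2946f969bc640886861ff8c7039e1a2339", "css"),
        ("100644", "blob", "9015a7a32ca0681be64471d3ac2f8c1f24c1040d", "index.html"),
        ("100644", "blob", "b92b8b70267846c8b21b5ad412666cb99f9c9211", "list.html"),
    ],
    "02ff2e2946f969bc640886861ff8c7039e1a2339": [
        ("100644", "blob", "dac138d9e013a2e9a10e67d793bd4703c1b86bd1", "simple.css"),
    ],
}

ROOT = "5d0d7785b65180c195f7a1bf3cf02218b56f6f0a"


def list_files(tree_hash, prefix=""):
    """Return (path, blob_hash) pairs by walking a tree recursively."""
    files = []
    for mode, kind, obj_hash, name in TREES[tree_hash]:
        path = prefix + name
        if kind == "tree":
            # A tree entry is a directory: descend into it.
            files.extend(list_files(obj_hash, path + "/"))
        else:
            files.append((path, obj_hash))
    return files


for path, blob_hash in list_files(ROOT):
    print(blob_hash[:7], path)
```

In a real repository, git ls-tree -r produces a comparable recursive listing directly.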

Now, let's make a small change to the index.html file and do a second commit. After this, this is what the log looks like:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git log
commit ec8b103771588498923711c036ff3280c863f713
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 17:36:33 2014 +0530

second commit

commit 300f5c42a5aed68268547a95db4f40b6b122fb5b
Author: nikhil <nikhil@nikhil-Inspiron-3537.(none)>
Date: Sun Mar 16 16:52:54 2014 +0530

initial commit

There are two commits, each with its own hash value:

First Commit : 300f5c42a5aed68268547a95db4f40b6b122fb5b (Initial Commit)

Second Commit : ec8b103771588498923711c036ff3280c863f713 (Second Commit)

Let us follow the second commit to its tree, as we did before:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p ec8b103771588498923711c036ff3280c863f713
tree 455e56fbce4fe35ee64d9e7af572e5b0adef14f6
parent 300f5c42a5aed68268547a95db4f40b6b122fb5b
author nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394971593 +0530
committer nikhil <nikhil@nikhil-Inspiron-3537.(none)> 1394971593 +0530

second commit

The second commit has a parent line pointing at the first commit, which is interesting, as it is exactly the relationship we see in tools like gitk. A commit contains references to its parent commits. The very first commit has none, which is why no parent line appeared for it earlier. While there is usually just a single parent (for a linear history), a commit can have more than one, in which case it is usually called a merge commit. Most workflows will only ever make you do merges with two parents, but Git allows more than two as well.
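
Because each commit names its parents, history can be recovered by following those links. Here is a minimal sketch of that idea in Python, using the two commits from this post held in a plain dictionary. The COMMITS layout and the walk_history name are invented for illustration; Git's own traversal is more involved.

```python
# Commit data reduced to the fields needed for walking history,
# copied from the two cat-file outputs above.
COMMITS = {
    "ec8b103771588498923711c036ff3280c863f713": {
        "tree": "455e56fbce4fe35ee64d9e7af572e5b0adef14f6",
        "parents": ["300f5c42a5aed68268547a95db4f40b6b122fb5b"],
        "message": "second commit",
    },
    "300f5c42a5aed68268547a95db4f40b6b122fb5b": {
        "tree": "5d0d7785b65180c195f7a1bf3cf02218b56f6f0a",
        "parents": [],
        "message": "initial commit",
    },
}


def walk_history(start):
    """Return every commit reachable from `start` by following parent links."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        commit_hash = stack.pop()
        if commit_hash in seen:
            # After a merge, two branches can lead back to the same ancestor.
            continue
        seen.add(commit_hash)
        order.append(commit_hash)
        stack.extend(COMMITS[commit_hash]["parents"])
    return order


for commit_hash in walk_history("ec8b103771588498923711c036ff3280c863f713"):
    print(commit_hash[:7], COMMITS[commit_hash]["message"])
```

For the linear history here, the walk visits the second commit and then the initial one, which matches the order git log printed.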

Going deeper into the tree:


nikhil@nikhil-Inspiron-3537:~/dev/blog/git/tree$ git cat-file -p 455e56fbce4fe35ee64d9e7af572e5b0adef14f6
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob b110c44fd08f191062636f18cfeeaeccd5be1b73 index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html

The most interesting thing to note here is that all the hash values, except the one for index.html, remain the same. Even the css tree keeps its hash, because nothing inside that directory changed.
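
The comparison can also be done mechanically. The following Python sketch parses the two tree listings shown above and reports which names point at a different object. The helper names are hypothetical and the parsing assumes the pretty-printed cat-file -p layout of mode, type, hash, and name on each line.

```python
def parse_tree_listing(text):
    """Map each entry name to (mode, type, hash) from `cat-file -p` tree output."""
    entries = {}
    for line in text.strip().splitlines():
        mode, kind, rest = line.split(" ", 2)
        # The hash and the name are separated by whitespace (a tab in real output).
        obj_hash, name = rest.split(None, 1)
        entries[name] = (mode, kind, obj_hash)
    return entries


def changed_entries(old, new):
    """Names whose entry differs between two trees, or exists on one side only."""
    names = sorted(set(old) | set(new))
    return [name for name in names if old.get(name) != new.get(name)]


first_tree = """
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob 9015a7a32ca0681be64471d3ac2f8c1f24c1040d index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html
"""

second_tree = """
100644 blob 09e16f36b3c4993ba924b1074629283a49869be9 about.html
040000 tree 02ff2e2946f969bc640886861ff8c7039e1a2339 css
100644 blob b110c44fd08f191062636f18cfeeaeccd5be1b73 index.html
100644 blob b92b8b70267846c8b21b5ad412666cb99f9c9211 list.html
"""

old = parse_tree_listing(first_tree)
new = parse_tree_listing(second_tree)
print(changed_entries(old, new))  # prints ['index.html']
```

Since the css entry is unchanged, a tool comparing these trees would not need to look inside that directory at all.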

The two commit trees look something like this.

[Figure: commit two]

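In the second commit tree, the hash values for the unchanged files are the same as in the previous commit tree. Now, just think of the hash values as pointers to stored objects. The second tree points to the same objects as the first for everything that did not change, so Git does not store those files again. Only index.html gets a new blob, along with a new root tree and a new commit object.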
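
The hashes stay stable because they are derived from content alone. As a sketch, assuming a SHA-1 repository (which the 40-character hashes in this post suggest), an object id is the SHA-1 of a small header followed by the object body. The function names below are made up for this example. Note that in the raw tree body a directory's mode is written as 40000, while cat-file -p displays it as 040000.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over '<kind> <size>', a NUL byte, then the raw body bytes."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    """Id of a blob holding exactly these bytes."""
    return object_id("blob", content)


def tree_id(entries):
    """Id of a tree; entries are (mode, name, hex_hash) for direct children."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders subtrees as if their name ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        # Each child is "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)
    return object_id("tree", body)


# Identical content always yields the identical id, whatever the file name.
print(blob_id(b"hello world\n"))
print(blob_id(b"hello world\n") == blob_id(b"hello world\n"))
```

Passing tree_id the single simple.css entry from the css listing is one way to check this sketch against the css tree hash shown earlier.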

Well, that’s it.. commit trees…