Chapter 19. Advanced Manipulations

Using git filter-branch

The command git filter-branch is a generic branch processing command that allows you to arbitrarily rewrite the commits of a branch using custom commands that operate on different objects within the repository. Some filters work on commits, some on tree objects and directory structures, and others let you manipulate the environment.

Does that sound useful and yet dangerous?

Good.

As you might suspect, with great power comes great responsibility.^[40] The power and purpose of filter-branch is also the source of my warning: it has the potential to rewrite the entire repository’s commit history. Executing this command on a repository that has already been published for others to clone and use will likely cause them endless grief later. As with all rebasing operations, commit history will change. After this command, you should consider any repositories cloned from it earlier as obsolete.

With that warning about rewriting repository history behind us, let’s find out what the command can do, when and why it might be useful, and how to use it responsibly.

The filter-branch command runs a series of filters on one or more branches within your repository. Each filter can have its own custom filtering command. You don’t have to run them all, or even more than one. But they are designed and sequenced so that earlier filters can affect the behavior of later filters. The subdirectory-filter runs as a precommit-processing selection filter, and the tag-name-filter runs as a postcommit-processing step.

To help you get a clearer picture of what is happening during the filtering process, it might help to know that as of version 1.7.9, git filter-branch is a shell script.^[41] Except for the commit-filter, each command is evaluated in a shell (sh) context using eval.

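Here is a brief description of each filter. They are listed in the order in which they run, except for the subdirectory-filter at the end of the list, which runs before all the others: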

env-filter command: The env-filter can be used to create or alter the shell environment settings prior to running the subsequent filters and committing the newly rewritten objects. Of note, changing variables such as GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME, and GIT_COMMITTER_EMAIL may be useful. The command should likely both set and export environment variables.
tree-filter command: The tree-filter allows you to modify the contents of a directory that will be captured by a tree object. You can use this filter to remove files from or add files to the repository retroactively. This filter checks out the branch at each commit during the filtering. Be aware that the .gitignore file is not effective during this filter.
index-filter command: The index-filter is used to alter the contents of the index prior to making a commit. Throughout the filtering process, the index of each commit is made available without checking out the corresponding files into a working directory. Thus, this filter is similar to the tree-filter but faster if you don’t actually need the file contents during the filter. You should study the low-level git update-index command.
parent-filter command: The parent-filter allows you to restructure the parent relationship of every commit. For a given commit, you specify its new parent or parents. To use this properly, you should study the low-level git commit-tree command.
msg-filter command: Just prior to actually making a newly filtered commit, the msg-filter allows you to edit the commit message. The command should accept the old message on stdin and write the new message on stdout.
commit-filter command: Normally during the filtering pipeline, git commit-tree will be used to perform the commit. However, this filter gives you control over this step yourself. Your command will be called with the new (possibly rewritten) tree-obj and a list of (possibly rewritten) -p parent-obj parameters. The (possibly rewritten) commit message will be on stdin. You should likely still use git commit-tree, but there are also a few convenience functions provided environmentally as well: map, skip_commit, git_commit_non_empty_tree, and die. The git filter-branch manual page has details for each of these functions.
tag-name-filter command: If your repository has any tags, you should probably use tag-name-filter to rewrite existing tags to reference the newly created corresponding commits. By default, the old tags will remain, but you can use cat as the filter to obtain direct new-for-old mappings of your tags. Although simply mapping tags to reference the new, corresponding commits is certainly possible, maintaining a signed tag is not. Remember that the whole point of signing a tag was to maintain a cryptographically secure indicator of the repository at a certain point in its history. That just went out the window here, right? So all those signatures on signed tags will be removed from the corresponding new tags.
subdirectory-filter command: The subdirectory-filter can be used to limit the rewriting of history to only those commits that affect a specific directory. That is, after filtering, the new repository will contain only the named directory at its root.
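As a hedged illustration of the env-filter described in the list above, the following sketch builds a throwaway repository and rewrites one author address across its history. The identities, addresses, and file name are made up for the example, and FILTER_BRANCH_SQUELCH_WARNING is set only on the assumption that you are running a recent Git release, which otherwise prints a warning and pauses before filtering.

```shell
#!/bin/sh
# Sketch only: rewrite a hypothetical old author address with --env-filter.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: a newer Git that warns about filter-branch

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the unborn branch is named master
git config user.name 'Example User'      # made-up identity for the scratch repository
git config user.email 'user@example.com'

# Two throwaway commits with different (made-up) author addresses.
echo one > notes
git add notes
GIT_AUTHOR_EMAIL='old@example.com' git commit -q -m 'First note'
echo two >> notes
GIT_AUTHOR_EMAIL='other@example.com' git commit -q -am 'Second note'

# The filter both sets and exports the variable, as the description advises.
git filter-branch --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
        GIT_AUTHOR_EMAIL="new@example.com"
        export GIT_AUTHOR_EMAIL
    fi
' master >/dev/null

# List author addresses, oldest commit first.
authors=$(git log --reverse --format='%ae' master)
printf '%s\n' "$authors"
```

Only the commit whose author address matched is given the new address; the other commit keeps its author but still receives a new SHA1 because its parent changed.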

After a git filter-branch completes, the original references comprising the entire old commit history are available as new refs in refs/original. Naturally, this implies that the refs/original directory must be empty at the start of the filtering operation. After verifying that you obtained the filtered history you desired, and the original commit history is no longer needed, carefully remove the .git/refs/original refs. (Or, if you want to be fully Git compliant and Git friendly, you can even use the command git update-ref -d refs/original/branch for each branch you filtered.) If you do not remove this directory, you will continue to have the entirety of both the old and new content within your repository. The old refs will linger and prevent garbage collection (see Garbage Collection) from trimming out the otherwise obsolete commits.^[42] If you don’t want to explicitly remove this directory, you can also clone away from it. That is, make a clone of the repository, leaving these original refs behind and not cloning them into a new repository. Think of it as a natural checkpoint backup.
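The cleanup just described might be sketched as follows, under some assumptions: the scratch repository, its file names, and its identity are invented for the demonstration, and the reflog expiry and gc steps are one common way to let Git discard the old objects immediately rather than something this chapter prescribes.

```shell
#!/bin/sh
# Sketch only: drop the refs/original backups after verifying a filtered history.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # assumption: a newer Git that warns about filter-branch

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example User'      # made-up identity
git config user.email 'user@example.com'

echo keep > Readme
echo secret > 1984
git add Readme 1984
git commit -q -m 'Add notes'
old=$(git rev-parse HEAD)                # remember the pre-filter commit

git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch 1984' master >/dev/null

# The old history is still reachable through the backup refs.
git for-each-ref --format='%(refname)' refs/original/

# Delete each backup ref with update-ref rather than removing files by hand.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# Expire reflog entries that still mention the old commits, then prune.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$old" 2>/dev/null; then
    old_still_present=yes
else
    old_still_present=no
fi
echo "old commit still present: $old_still_present"
```

Run this only on a clone you are prepared to lose: once the backup refs and reflog entries are gone, the original history cannot be recovered from that repository.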

There are several reasons that best practices with git filter-branch suggest you should always operate on a newly cloned repository. For starters, git filter-branch flat-out requires that the operation begin with a clean working directory. Because git filter-branch modifies your original repository in place, it is often described as being a “destructive” operation. Because the command has many steps, options, and subtleties, running the command can be quite tricky and often difficult to get right on the first attempt. Saving the original repository is just prudent computing.

Examples Using git filter-branch

Now that we know what git filter-branch can do, let’s look at a few cases where it can be used productively. One of the most useful situations occurs when you have just created a repository full of commit history and want to clean it up or do a large-scale alteration on it prior to making it available for cloning and general use by others.

Using git filter-branch to expunge a file

A common use for git filter-branch is to completely remove a file from the entire history of a repository. Remember, Git maintains the complete history of every file within the repository. Thus, simply deleting a file with git rm will not remove it from older history. One can always go back to earlier commits and retrieve the file.

However, by using git filter-branch, a file can be removed from any and every commit in the repository, making it appear as if it was never there in the first place.

Let’s work on an example repository that contains personal notes after reading various books. The notes are stored in files named after the works.

    $ cd BookNotes

    $ ls
    1984  Animal_Farm  Nightfall  Readme  Snow_Crash

    $ git log --pretty=oneline --abbrev-commit
    ffd358c Read Asimov's 'Nightfall'.
    4df8f74 Read a few classics.
    8d3f5a9 Read 'Snow Crash'
    3ed7354 Collect some notes about books.

And the classics from the third commit 4df8f74 are:

    $ git show 4df8f74
    commit 4df8f74b786b31b6043c44df59d7d13ee2b4b298
    Author: Jon Loeliger <[email protected]>
    Date:   Sat Jan 14 12:57:35 2012 -0600

    Read a few classics.

        - Animal Farm by George Orwell
        - 1984 by George Orwell

    diff --git a/1984 b/1984
    new file mode 100644
    index 0000000..84a2da2
    --- /dev/null
    +++ b/1984
    @@ -0,0 +1 @@
    +George Orwell is disturbed.
    diff --git a/Animal_Farm b/Animal_Farm
    new file mode 100644
    index 0000000..e1fcda1
    --- /dev/null
    +++ b/Animal_Farm
    @@ -0,0 +1 @@
    +Animal Farm was interesting.

Suppose for some history-revising reason we have decided to remove any record of George Orwell’s 1984 from the repository. If you don’t care about the old commit history, simply issuing a git rm 1984 would suffice. But to be thoroughly Orwellian, it must be removed from the complete history of the repository. It must never have existed.

Of all the filters listed previously, the likeliest candidates for this operation are the tree-filter and index-filter. Because this is a small repository and the operation we want to do, namely, remove one file, is pretty simple and direct, we’ll use the tree-filter.

As advised earlier, start with a clean clone, just in case.

    $ cd ..
    $ git clone BookNotes BookNotes.revised
    Cloning into 'BookNotes.revised'...
    done.
    $ cd BookNotes.revised

    $ git filter-branch --tree-filter 'rm 1984' master
    Rewrite 3ed7354c2c8ae2678122512b26d591a9ed61663e (1/4)
        rm: cannot remove `1984': No such file or directory
    tree filter failed: rm 1984

    $ ls
    1984  Animal_Farm  Nightfall  Readme  Snow_Crash

Clearly that didn’t go well and something failed. The file is still in the repository.

Let’s think a little about what Git is doing here. Git will iterate over each commit in the master branch, starting with the very first commit, establish the context (index, files, directories, etc.) of that commit, and then try to remove the file 1984.

Git tells you which commit it was modifying when the command failed. Commit 3ed7354 is the first of 4 commits.

    Rewrite 3ed7354c2c8ae2678122512b26d591a9ed61663e (1/4)

But recall that the file 1984 was introduced in the third commit, 4df8f74, and not the first. That means that for the first two commits, 3ed7354 and 8d3f5a9, the 1984 file was not yet in the repository or any of its managed files. That in turn means that when establishing the filtering context of those first two commits, a simple rm 1984 shell command within the top-level directory will fail for lack of a file to remove. It’s exactly as if you had typed rm snizzle-frotz in a directory with no snizzle-frotz file in it.

    $ cd /tmp
    $ rm snizzle-frotz
    rm: cannot remove `snizzle-frotz': No such file or directory

The trick is to realize that when removing a file, you don’t care whether the file is actually present or not. So just force the removal and ignore nonexistent files using the -f or --force option:

    $ cd /tmp
    $ rm -f snizzle-frotz
    $

OK, back to the BookNotes.revised repository:

    $ cd BookNotes.revised
    $ git filter-branch --tree-filter 'rm -f 1984' master
    Rewrite ffd358c675a1c6d36114e10a92d93fdc1ee84629 (4/4)
    Ref 'refs/heads/master' was rewritten

As a side note, Git really scrolls through all the commits, stating which one it is presently rewriting, but only the last one shows up on your screen, as just shown. If you are a bit more clever, perhaps by piping that output through less, you can see that it actually prints each commit processed:

    Rewrite 3ed7354c2c8ae2678122512b26d591a9ed61663e (1/4)
    Rewrite 8d3f5a96b18f9795a1bb41295e5a9d2d4eb414b4 (2/4)
    Rewrite 4df8f74b786b31b6043c44df59d7d13ee2b4b298 (3/4)
    Rewrite ffd358c675a1c6d36114e10a92d93fdc1ee84629 (4/4)

But it worked this time:

    $ ls
    Animal_Farm  Nightfall  Readme  Snow_Crash

The 1984 file is now gone!

Tip

For the terminally curious, the corresponding command using index-filter would be something like this:

    $ git filter-branch --index-filter \
        'git rm --cached --ignore-unmatch 1984' master

Let’s look at the new commit log:

    $ git log --pretty=oneline --abbrev-commit
    ad1000b Read Asimov's 'Nightfall'.
    7298fc5 Read a few classics.
    8d3f5a9 Read 'Snow Crash'
    3ed7354 Collect some notes about books.

Notice how each commit starting with the original third commit (4df8f74 and ffd358c) now has different SHA1 values (7298fc5 and ad1000b), whereas the earlier commits (3ed7354 and 8d3f5a9) remain unchanged.

During the filtering and rewriting process, Git creates and maintains this mapping between old and new commit values and makes it available to you as the map convenience function. If for some reason you need to convert from an old commit SHA1 to the corresponding new SHA1, you can do so using this mapping from within your filter command.

Let’s investigate a bit more, though.

    $ git show 7298fc5
    commit 7298fc55d1496c7e70909f3ebce238d447d07951
    Author: Jon Loeliger <[email protected]>
    Date:   Sat Jan 14 12:57:35 2012 -0600

    Read a few classics.

        - Animal Farm by George Orwell
        - 1984 by George Orwell

    diff --git a/Animal_Farm b/Animal_Farm
    new file mode 100644
    index 0000000..e1fcda1
    --- /dev/null
    +++ b/Animal_Farm
    @@ -0,0 +1 @@
    +Animal Farm was interesting.

Indeed the commit that first introduced 1984 no longer does so! That means the file was never introduced in the first place. It is not just gone from the top commit; it is not just gone from any commit reachable from the master branch; it never existed on this branch.

But doesn’t it bother you that the commit message itself still mentions the 1984 book? Let’s fix that in the next section!

Using filter-branch to edit a commit message

Here’s the problem we’re solving: some commit message needs to be revised. In the previous section, we saw how to remove a file from the complete history of a repository. However, the commit message that used to introduce it still alludes to it:

    $ git log -1 7298fc55
    commit 7298fc55d1496c7e70909f3ebce238d447d07951
    Author: Jon Loeliger <[email protected]>
    Date:   Sat Jan 14 12:57:35 2012 -0600

    Read a few classics.

        - Animal Farm by George Orwell
        - 1984 by George Orwell

That last line has to go!

This is the perfect use case for the --msg-filter filter. Your filter command should accept the old text of a commit message on stdin and write its revised text on stdout. That is, your filter should be a classic stdin-to-stdout edit filter. Typically, it will be something like sed, although it can be as complex as needed.

In our case, we’ll want to delete that last 1984 line. We’ll also want to touch up the previous sentence to talk about just one book rather than “a few.” A sed command to do these edits looks like this:

    sed -e "/1984/d" -e "s/few classics/classic/"

Put that together with the --msg-filter option. Be careful with your line breaks on input here. It should be all one line, or use the single quote as a command input continuation technique.

    $ git filter-branch --msg-filter '
        sed -e "/1984/d" -e "s/few classics/classic/"' master
    Rewrite ad1000b936acf7dbe4a29da6706cb759efded1ae (4/4)
    Ref 'refs/heads/master' was rewritten

Let’s check:

    $ git log --pretty=oneline --abbrev-commit
    bf7351c Read Asimov's 'Nightfall'.
    f28e55d Read a classic.
    8d3f5a9 Read 'Snow Crash'
    3ed7354 Collect some notes about books.

We can already see that the log message from commit f28e55d has been singularized by our sed script. Good. Looking again at the whole message:

    $ git log -1 f28e55d
    commit f28e55dc8bbdee555a3f7778ba8355db9ab4c4a1
    Author: Jon Loeliger <[email protected]>
    Date:   Sat Jan 14 12:57:35 2012 -0600

    Read a classic.

        - Animal Farm by George Orwell

Now it is truly as if it never existed in this repository! And we’ve always been at war with Eastasia.

One cautionary note about the filtering process: make sure that you are both operating on the items you want to change, and that you are operating on only those items!

For example, the sed command from the previous --msg-filter example appears to change precisely the one commit message we wanted to adjust. However, be aware that the same sed script is applied to every commit message in the history. If there were other, perhaps incidental, occurrences of the string 1984 in other commit messages, those lines would also have been deleted because our script was not very discriminating. Consequently, you may have to write a more detailed sed command or a more clever script.
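One way to make the filter more discriminating is to anchor the patterns to the exact lines being changed. The sketch below runs such a filter body outside of Git on two sample messages. The fix_message helper and the second message are hypothetical, invented only to show an incidental mention of 1984 surviving.

```shell
#!/bin/sh
# Sketch only: a narrower version of the sed script, tested outside of Git.
# fix_message is a hypothetical helper holding the body you would pass to --msg-filter.
fix_message() {
    sed -e '/^[[:space:]]*- 1984 by George Orwell$/d' \
        -e 's/^Read a few classics\.$/Read a classic./'
}

# The message we intend to change.
classics=$(printf '%s\n' \
    'Read a few classics.' \
    '' \
    '    - Animal Farm by George Orwell' \
    '    - 1984 by George Orwell' | fix_message)

# A made-up message that merely mentions 1984 and should pass through untouched.
other=$(printf '%s\n' 'Fix typo introduced in 1984 notes' | fix_message)

printf '%s\n---\n%s\n' "$classics" "$other"
```

Another option is to test the GIT_COMMIT environment variable, which git filter-branch sets to the ID of the commit being rewritten, and to run sed only for the one commit you mean to edit while passing every other message through cat.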

filter-branch Pitfalls

It is important to understand a brutal consequence of the name of this Git command: it is filter-branch. At its core, the git filter-branch command is designed to operate on just one branch or ref. However, it can operate on many branches or refs.

In many cases, you want to have it operate on all branches so as to obtain a repository-wide coverage. In these cases, you will need the -- --all tacked onto the end of the command.

    $ git filter-branch --index-filter \
        "git rm --cached -f --ignore-unmatch '*.jpeg'" \
        -- --all

Similarly, you almost certainly want to translate any tag refs from a prefiltered state into the new postfiltered repository. That means adding --tag-name-filter cat is also quite common:

    $ git filter-branch --index-filter \
        "git rm --cached -f --ignore-unmatch '*.jpeg'" \
        --tag-name-filter cat \
        -- --all

Tip

How about this one? You used --tree-filter or --index-filter to remove a file from a repository, but did that file get moved or have its name changed at some point in its history? You can use a command like this to find out:

    $ git log --name-only --follow --all -- file

If other names for that file exist, you might want to delete those versions as well.

How I Learned to Love git rev-list

One day, I received this piece of email:

Jon,
I’m trying to figure out how to do a date-based check out from a Git repository into an empty working directory. Unfortunately, winding my way through the Git manual pages makes me feel like I’m playing “Adventure.”
Eric

Indeed. Let’s see if we can navigate some of those twisty passages.

Date-Based Checkout

It might seem that a command like git checkout master@{Jan 1, 2011} should work. However, that command is really using the reflog (see The Stash) to resolve the date-based reference for the master ref. There are lots of ways this innocent-looking construct might fail: your repository may not have the reflog enabled, you may not have manipulated the master ref during that time period, or the reflog may have already expired entries from that time period. Even more subtly, that construct may not give you your expected answer. It asks the reflog to resolve where your master was at the given time as you manipulated the branch, and not according to the branch’s commit timeline. They may be related, especially if you developed and committed that history in this repository, but they don’t have to be.

Ultimately, this approach can be a misleading dead-end. Using the reflog might get what you want. But it can also fail, and it isn’t a reliable method.

Instead, you should use the git rev-list command. It is the general purpose workhorse whose job is to combine a multitude of options, sort through a complex commit history of many branches, intuit potentially vague user specifications, limit search spaces, and ultimately locate selected commits from within the repository history. It then emits one or more SHA1 IDs for use by other tools. Think of git rev-list and its myriad options as a commit database front-end query tool for your repository.

In this case, the goal is fairly simple: find the one commit in a repository that existed immediately before a given date on a given branch and then check it out.

Let’s use the actual Git source repository because it has a fairly extensive and explorable history. First, we’ll use rev-list to find that SHA1. The -n 1 option limits the output from the command to just one commit ID.

Here, we try to locate just the last master commit of 2011 from the Git source repository:

    $ git clone git://github.com/gitster/git.git
    Cloning into 'git'...
    remote: Counting objects: 126850, done.
    remote: Compressing objects: 100% (41033/41033), done.
    remote: Total 126850 (delta 93115), reused 117003 (delta 84141)
    Receiving objects: 100% (126850/126850), 27.56 MiB | 1.03 MiB/s, done.
    Resolving deltas: 100% (93115/93115), done.

    $ cd git
    $ git rev-list -n 1 --before="Jan 1, 2012 00:00:00" master
    0eddcbf1612ed044de586777b233caf8016c6e70

Having identified the commit, you may use it, tag it, reference it, or even check it out. But as the checkout note reminds you, you are on a detached HEAD.

    $ git checkout 0eddcb
    Note: checking out '0eddcb'.

    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.

    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:

      git checkout -b new_branch_name

    HEAD is now at 0eddcbf... Add MYMETA.json to perl/.gitignore

But is that really the right commit?

    $ git log -1 --pretty=fuller
    commit 0eddcbf1612ed044de586777b233caf8016c6e70
    Author:     Jack Nagel <[email protected]>
    AuthorDate: Wed Dec 28 22:42:05 2011 -0600
    Commit:     Junio C Hamano <[email protected]>
    CommitDate: Thu Dec 29 13:08:47 2011 -0800

    Add MYMETA.json to perl/.gitignore
    ...

The rev-list date selection uses the CommitDate field, not the AuthorDate field. So it looks like the last commit of 2011 in the Git repository happened on December 29, 2011.
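To see the CommitDate behavior in isolation, here is a small sketch that uses a scratch repository instead of the Git sources. The dates, file name, and identity are invented for the demonstration. The second commit is authored in 2011 but committed in 2012, so a --before query for the start of 2012 should skip it.

```shell
#!/bin/sh
# Sketch only: --before compares against the committer date, not the author date.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'      # made-up identity
git config user.email 'user@example.com'

echo a > file
git add file
GIT_AUTHOR_DATE='2011-12-20 12:00:00 +0000' \
GIT_COMMITTER_DATE='2011-12-29 12:00:00 +0000' \
    git commit -q -m 'Authored and committed in 2011'
first=$(git rev-parse HEAD)

echo b >> file
GIT_AUTHOR_DATE='2011-12-31 12:00:00 +0000' \
GIT_COMMITTER_DATE='2012-01-01 01:00:00 +0000' \
    git commit -q -am 'Authored in 2011, committed in 2012'

# Spell out the time of day and the zone so nothing is left for Git to guess.
found=$(git rev-list -n 1 --before='2012-01-01 00:00:00 +0000' HEAD)

git log -1 --pretty=fuller "$found"
```

The commit reported is the first one, even though the second commit's AuthorDate also falls in 2011.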

Date-based checkout cautions

A few words of caution are in order, though. Git’s date handling is implemented using a function called approxidate(). It is not that dates are inherently approximate, but rather that Git’s interpretation of what you meant is approximated, usually due to insufficient detail or precision.

    $ git rev-list -n 1 --before="Jan 1, 2012 00:00:00" master
    0eddcbf1612ed044de586777b233caf8016c6e70

    $ git rev-list -n 1 --before="Jan 1, 2012" master
    5c951ef47bf2e34dbde58bda88d430937657d2aa

I typed those two commands at 11:05 A.M. local time. For lack of a specified time in the second command, Git assumed I meant “at this time on Jan 1, 2012.” Consequently, 11 more hours of leeway were available in which to match commits.

    $ git log -1 --pretty=fuller 5c951ef
    commit 5c951ef47bf2e34dbde58bda88d430937657d2aa
    Author:     Clemens Buchacher <[email protected]>
    AuthorDate: Sat Dec 31 12:50:56 2011 +0100
    Commit:     Junio C Hamano <[email protected]>
    CommitDate: Sun Jan 1 01:18:53 2012 -0800

    Documentation: read-tree --prefix works with existing subtrees
    ...

This commit happened an hour and 18 minutes into the new year, well within the 11 hours past midnight that I accidentally specified in my second command.

So does Git’s date parsing behavior even make sense? Probably.

Git is trying to intuit the intended meaning behind vaguely specified time requests. For example, how should yesterday be interpreted? As the previous 24-hour period? As the absolute time period midnight-to-midnight of the previous calendar date? As some vague notion of yesterday’s business working hours? Git happens to use the first interpretation: the 24 hours prior to the current time. Generalizing, whenever a date is used as a starting or ending point and is specified without a time of day, Git uses the current time of day as the demarcation. If you wanted to be more precise about just exactly when yesterday, you could have said something like yesterday noon, or 5pm yesterday.

One more caution about date-based checkout. Although you may get a valid answer to your query for a specific commit, that same question at some later date may yield a different answer. For example, consider a repository with several lines of development happening on different branches. As previously, when you request the commit --before date on a given branch, you get an answer for the branch as it exists just then. At some later point in time, however, new commits from other branches might be merged into your branch, altering the notion of which commit might satisfy your search conditions. In the previous January 1, 2012 example, someone might merge in a commit from another branch that is closer to midnight December 31, 2011 than December 29, 2011 at 13:08:47.

Retrieve Old Version of a File

Sometimes in the course of software archeology, you simply want to retrieve an old version of a file from the repository history. It seems overkill to use the techniques of a date-based checkout as described in Date-Based Checkout because that causes a complete change in your working directory state for every directory and file just to get one file. In fact, it is even likely that you want to keep your current working directory state but replace the current version of just one file by reverting it to an earlier version.

The first step is to identify a commit that contains the desired version of the file. The direct approach is to use an explicit branch, tag, or ref already known to have the correct version. In the absence of that information, some searching has to be done. And when searching the commit history, you should think about using some rev-list techniques to identify commits that have the desired file. As previously seen, dates can be used to select interesting commits. Git also allows the search to be restricted to a particular file or set of files. Git calls this approach “path limiting.” It provides the ultimate guide to possible previous commits that might contain different versions of a file, or as Git calls them, paths.

Again, let’s explore Git’s source repository itself to see what previous versions of, say, date.c are available.

    $ git clone git://github.com/gitster/git.git
    Cloning into 'git'...
    remote: Counting objects: 126850, done.
    remote: Compressing objects: 100% (41033/41033), done.
    remote: Total 126850 (delta 93115), reused 117003 (delta 84141)
    Receiving objects: 100% (126850/126850), 27.56 MiB | 1.03 MiB/s, done.
    Resolving deltas: 100% (93115/93115), done.

    $ git rev-list master -- date.c
    ee646eb48f9a7fc6c225facf2b7449a8a65ef8f2
    f1e9c548ce45005521892af0299696204ece286b
    ...
    89967023da94c0d874713284869e1924797d30bb
    ecee9d9e793c7573cf3730fb9746527a0a7e94e7

Uh, yeah, something like 60-odd lines of SHA1 commit IDs. Fun! But what does it all mean? And how do you use it?

Because I didn’t specify the -n 1 option, all matching commit IDs have been generated and printed. The default is to emit them in reverse chronological order. So this means commit ee646e contains the most recent version of the file date.c, and ecee9d9 contains the oldest version. In fact, looking at commit ecee9d9 shows the file being introduced into the repository for the first time.

    $ git show --stat ecee9d9 --pretty=short
    commit ecee9d9e793c7573cf3730fb9746527a0a7e94e7
    Author: Edgar Toernig <[email protected]>

    [PATCH] Do date parsing by hand...

     Makefile      |    4 +-
     cache.h       |    3 +
     commit-tree.c |   27 +--------
     date.c        |  184 +++++++++++++++++++++++++++++++++++++++++++++
     4 files changed, 191 insertions(+), 27 deletions(-)

Where you go from here to find your desired commit is kind of sketchy. You could do git log operations on randomly selected SHA1 values from that rev-list list output. Or you could binary search the time stamps on commits from that list. Earlier we used the -n 1 to select the most recent. It’s really hard to say what trick might work in your selection process to identify the precise commit that contains the version of a file that is interesting to you.

But once you have identified one of those commits, how do you use it? What does that version of date.c look like? What if we wanted to retrieve it in place?

There are three slightly different approaches you can use to get that version of a file. The first form directly checks out the named version and overwrites the existing version in your working directory.

    $ git checkout ecee9d9 date.c

Tip

If you want to get the version of a file from a commit and you don’t know its SHA1, but you do happen to know some text from its commit log message, you can use this searching technique to obtain it:

    $ git checkout :/"Fix PR-1705" main.c

The youngest commit found is used.

In two other very similar commands, Git accepts the form commit:path to name the desired file (i.e., path) as it existed at the time the commit happened, and writes the specified version of the file to stdout. What you do with that output is up to you, though. You could pipe the output to other commands or create files:

    $ git show ecee9d9:date.c > date.c-oldest

Or:

    $ git cat-file -p 89967:date.c > date.c-first-change

The difference between these two forms is a bit esoteric. The former filters the output file through any applicable text conversion filters, whereas the latter is a more basic, plumbing command and does not. Differences might show up between these two commands when manipulating binaries, when textconv filters are set up, or possibly during some newline handling transformations. If you want the raw data, use the cat-file -p form. If you want the transformed version as it would be when checked out or added to the repository, use the show form.

These are exactly the same mechanisms you would use to obtain versions of a file as it appears in another branch:

    $ git checkout dev date.c 

    $ git show dev:date.c > date.c-dev

Or even earlier on the same branch:

    $ git checkout HEAD~2 date.c
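The retrieval forms from this section can be tried side by side in a scratch repository. In this sketch the file contents and identity are placeholders rather than anything from the real Git sources, and the checkout form is written with an explicit -- separator between the commit and the path.

```shell
#!/bin/sh
# Sketch only: three ways to get an old version of a file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'      # made-up identity
git config user.email 'user@example.com'

echo 'version 1' > date.c
git add date.c
git commit -q -m 'Add date.c'
oldest=$(git rev-parse HEAD)

echo 'version 2' > date.c
git commit -q -am 'Change date.c'

# Forms 2 and 3: write the old contents to stdout and capture them in new files.
git show "$oldest:date.c" > date.c-oldest
git cat-file -p "$oldest:date.c" > date.c-raw
current=$(cat date.c)                    # the working copy is still the newest version

# Form 1: overwrite the working copy (and the index entry) with the old version.
git checkout "$oldest" -- date.c
restored=$(cat date.c)

echo "before checkout: $current, after checkout: $restored"
```

Note that the checkout form also stages the old version, so git status reports date.c as modified relative to HEAD afterward.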

Interactive Hunk Staging

Although a bit of an ominous moniker, interactive hunk staging is nevertheless an incredibly powerful tool that can be used to simplify and organize your development into concise and easily understood commits. If anyone has ever asked you to split your patch up or make single-concept patches, chances are good that this section is for you!

Unless you are a super coder, and both think and develop in concise patches, your day-to-day development probably resembles mine: a little scattered, perhaps over-extended, and likely containing several intertwined ideas all mixed up as they occurred to you. One coding thought leads to another and pretty soon you fixed the original bug, stumbled onto another (but fixed it!), and then added a new easy feature while you were there. Oh, and, you fixed those two typos as well.

And, if you, like I do, appreciate having someone review your changes to important code before you ask for it to be accepted upstream, chances are good that having all of those different, unrelated changes will not make for a logical presentation of a single patch. Indeed, some open source projects insist that submitted patches contain separate self-contained fixes. That is, a patch shouldn’t serve multiple purposes in one shot. Instead, each idea should stand alone and should be presentable as a well-defined, simple patch that is just large enough to do the job and nothing more. If more than one idea needs to be upstreamed, more than one patch, perhaps in a sequence, will be needed. Common wisdom suggests that these sorts of patches and patch sequences lead to very solid reviews, quick turnaround, and easy acceptance into the mainline upstream development.

So how do these perfect patch sequences come about? Although I strive for a development style that facilitates simple patches, I’m not always successful. Nevertheless, Git provides some tools to help formulate good patches. One of those tools is the ability to interactively select and commit pieces, or “hunks,” of a patch, leaving the rest to be committed in a later patch. Ultimately, you will want to create a new sequence of smaller commits that still sum up to your original work.

What Git won’t do for you is decide which conceptual pieces of a patch belong together and which do not. You have to be able to discern the meaning and grouping of hunks that make logical sense together. Sometimes those hunks are all in one file, but sometimes they are in multiple files. Collectively, all the conceptually related hunks must be selected and staged together as part of one commit.

Furthermore, you must ensure that your selection of hunks still meets any external requirements. For example, if you are writing source code that must be compiled, you will likely want to ensure that the code base continues to be compilable after each commit. Thus, you must ensure that your patch breakup, when reassembled in smaller parts, still compiles at each commit within the new sequence. Git can’t do that for you; that’s the part where you have to think. Sorry.
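Git won’t do the thinking, but it can help with the checking. Here is a rough sketch of walking a commit sequence and running a check at each step. The three-commit repository is made up for the demonstration, and the grep command is only a hypothetical stand-in for whatever your real build step (make, for instance) would be.

```shell
# Throwaway repository whose middle commit "breaks the build".
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for state in pass fail pass; do
    echo "$state" > status
    git add status
    git commit -q -m "state $state"
done

# Hypothetical stand-in for a real build command such as make.
check='grep -q pass status'

start=$(git rev-parse --abbrev-ref HEAD)
broken=""
for c in $(git rev-list --reverse HEAD); do
    git checkout -q --detach "$c"        # visit each commit, oldest first
    if ! sh -c "$check"; then
        broken="$broken$(git log -1 --format=%s);"
    fi
done
git checkout -q "$start"                 # return to the original branch
echo "commits failing the check: $broken"
```

If you are rewriting the sequence anyway, git rebase --exec can run a command after each replayed commit and may be a better fit than a hand-rolled loop.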

Staging hunks interactively is as easy as adding the -p option to the git add command!

    $ git add -p file.c

Interactive hunk staging looks pretty easy, and it is. But we should probably still have a mental model in mind of what Git is doing with our patches. Remember way back in Chapter 5, I explained how Git maintains the index as a staging area that accumulates your changes prior to committing them. That’s still happening. But instead of gathering the changes an entire file at a time, Git is picking apart the changes you have made in your working copy of a file, and allowing you to select which individual part or parts to stage in the index, waiting to be committed.

Let’s suppose we’re developing a program to print out a histogram of white-space–separated words found in a file. The very first version of this program is the “Hello, World!” program that proves things are starting out on the right compilation track. Here’s main.c:

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
         * Words are listed in no particular order.
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");
    }

Add a Makefile and .gitignore, and put it all in a new repository:

    $ mkdir /tmp/histogram
    $ cd /tmp/histogram
    $ git init
    Initialized empty Git repository in /tmp/histogram/.git/
    $ git add main.c Makefile .gitignore

    $ git commit -m "Initial histogram program."
    [master (root-commit) 3ba81f7] Initial histogram program.
     3 files changed, 18 insertions(+), 0 deletions(-)
     create mode 100644 .gitignore
     create mode 100644 Makefile
     create mode 100644 main.c

Let’s do some miscellaneous development until main.c looks like this:

    #include <stdio.h>
    #include <stdlib.h>

    struct htentry {
        char *item;
        int count;
        struct htentry *next;
    };

    struct htentry ht_table[256];

    void ht_init(void)
    {
        /* FIXME: details */
    }

    int main(int argc, char **argv)
    {
        FILE *f;

        f = fopen(argv[1], "r");
        if (f == 0)
            exit(-1);

        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
         * Words are listed in no particular order.
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");

        ht_init();
    }

Notice that this development effort has introduced two conceptually different changes: the hash table structure and storage, and the beginnings of the file reading operation. In a perfect world, these two concepts would be introduced into the program with two separate patches. It will take us a couple of steps to get there, but Git will help us split these changes properly.

Git, along with most of the Free World, considers a hunk to be any series of lines from a diff command that are delineated by a line that looks something like this:

    @@ -1,7 +1,27 @@

or this:

    @@ -9,4 +29,6 @@ int main(int argc, char **argv)

In this case, git diff shows two hunks:

    $ git diff
    diff --git a/main.c b/main.c
    index 9243ccf..b07f5dd 100644
    --- a/main.c
    +++ b/main.c
    @@ -1,7 +1,27 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +       char *item;
    +       int count;
    +       struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +       /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    +       FILE *f;
    +
    +       f = fopen(argv[1], "r");
    +       if (f == 0)
    +               exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
    @@ -9,4 +29,6 @@ int main(int argc, char **argv)
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");
    +
    +       ht_init();
     }

The first hunk starts with the line @@ -1,7 +1,27 @@ and finishes at the start of the second hunk: @@ -9,4 +29,6 @@ int main(int argc, char **argv).
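The numbers in a hunk header are worth decoding. In @@ -9,4 +29,6 @@, the -9,4 says the hunk covers 4 lines starting at line 9 of the old file, and the +29,6 says it covers 6 lines starting at line 29 of the new file. The text after the closing @@ is just a hint naming the enclosing function. As a small sketch, the following pulls those four numbers apart with sed. It assumes both counts are present; a header may omit a count of 1, which this pattern does not handle.

```shell
header='@@ -9,4 +29,6 @@ int main(int argc, char **argv)'

# Capture old start, old count, new start, and new count from the header.
fields=$(printf '%s\n' "$header" |
    sed -E 's/^@@ -([0-9]+),([0-9]+) \+([0-9]+),([0-9]+) @@.*/old_start=\1 old_count=\2 new_start=\3 new_count=\4/')

echo "$fields"
# prints: old_start=9 old_count=4 new_start=29 new_count=6
```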

When interactively staging hunks with git add -p, Git offers a choice for each hunk in turn: do you want to stage it?

But let’s look at our patch a bit more closely and consider the need to break up the pieces so that conceptually related parts are all gathered up and staged at the same time. That means we’d like to stage all the hash table parts together in one patch, and then stage all the file operations in a second patch. Unfortunately, it looks like the first hunk has both hash table and file operation pieces in one hunk! That means, for the purposes of the first commit (i.e., the hash table pieces), we want to both stage it and not stage it. Or more precisely, we want to stage part of the hunk. If Git only asks us about the first and second hunks, we are in trouble.

But, not to worry! The hunk staging will allow us to split a hunk. Any place where a contiguous sequence of added and deleted lines identified by a plus or minus in the first column is broken up by original context text, a split operation may be performed.

Let’s see how this works by starting with a git add -p command:

    $ git add -p
    diff --git a/main.c b/main.c
    index 4809266..c60b800 100644
    --- a/main.c
    +++ b/main.c
    @@ -1,7 +1,27 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +    char *item;
    +    int count;
    +    struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +    /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    +    FILE *f;
    +
    +    f = fopen(argv[1], "r");
    +    if (f == 0)
    +        exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
    Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]?

After reviewing this hunk and seeing both the hash table and file operation related changes there, you realize you need to both stage and not stage this hunk. That is your clue to answer s, for split, to the question.

    Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
    Split into 2 hunks.
    @@ -1,4 +1,18 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +    char *item;
    +    int count;
    +    struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +    /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]?

Excellent. We want this hunk staged.

    Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y

And immediately next up:

    @@ -2,6 +16,12 @@

     int main(int argc, char **argv)
     {
    +    FILE *f;
    +
    +    f = fopen(argv[1], "r");
    +    if (f == 0)
    +        exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]?

But not that one.

    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n

And finally, Git offers to stage the last hunk. We want it, too.

    @@ -9,4 +29,6 @@ int main(int argc, char **argv)
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");
    +
    +    ht_init();
     }
    Stage this hunk [y,n,q,a,d,/,K,g,e,?]? y

Let’s review. Originally, there were two hunks. But we wanted only part of the first hunk and all of the second. So when Git offered us the first hunk we had to split it into two subhunks. We then staged the first subhunk, and not the second subhunk. We then staged the entire original second hunk.

Verifying that the staged pieces look correct is easy:

    $ git diff --staged
    diff --git a/main.c b/main.c
    index 4809266..8a95bb0 100644
    --- a/main.c
    +++ b/main.c
    @@ -1,4 +1,18 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +       char *item;
    +       int count;
    +       struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +       /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    @@ -9,4 +23,6 @@ int main(int argc, char **argv)
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");
    +
    +       ht_init();
     }

That looks good, so you can go ahead and commit it. Don’t worry that there are lingering differences remaining in the file main.c. That’s by design because it is the next patch! Oh, and don’t use the filename with this next git commit command because that would use the entire file and not just the staged parts.

    $ git commit -m "Introduce a Hash Table."
    [master 66a212c] Introduce a Hash Table.
     1 files changed, 16 insertions(+), 0 deletions(-)

    $ git diff
    diff --git a/main.c b/main.c
    index 8a95bb0..c60b800 100644
    --- a/main.c
    +++ b/main.c
    @@ -16,6 +16,12 @@ void ht_init(void)

     int main(int argc, char **argv)
     {
    +       FILE *f;
    +
    +       f = fopen(argv[1], "r");
    +       if (f == 0)
    +               exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.

And with that, just add and commit the remaining change because it is the total material for the file operations patch.

    $ git add main.c
    $ git commit -m "Open the word source file."
    [master e649d27] Open the word source file.
     1 files changed, 6 insertions(+), 0 deletions(-)

A glance at the commit history shows two new commits:

    $ git log --graph --oneline
    * e649d27 Open the word source file.
    * 66a212c Introduce a Hash Table.
    * 3ba81f7 Initial histogram program.

And that is a happy patch sequence!

As usual, there are a few caveats and extenuating circumstances. For instance, what about that sneaky line:

    #include <stdlib.h>

Doesn’t it really belong with the file operation patch and not the hash table patch? Yep. You got me. It does.

That’s a bit trickier to handle. But let’s do it anyway. We’ll have to use the e option. First, reset to the first commit and leave all those changes in your working tree so we can do it all over again.

    $ git reset 3ba81f7
    Unstaged changes after reset:
    M    main.c

Do the git add -p again, and split the first patch just like before. But this time, instead of answering y to the first subhunk staging request, answer e and request to edit the patch:

    $ git add -p
    diff --git a/main.c b/main.c
    index 4809266..c60b800 100644
    --- a/main.c
    +++ b/main.c
    @@ -1,7 +1,27 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +    char *item;
    +    int count;
    +    struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +    /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    +    FILE *f;
    +
    +    f = fopen(argv[1], "r");
    +    if (f == 0)
    +        exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
    Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
    Split into 2 hunks.
    @@ -1,4 +1,18 @@
     #include <stdio.h>
    +#include <stdlib.h>
    +
    +struct htentry {
    +    char *item;
    +    int count;
    +    struct htentry *next;
    +};
    +
    +struct htentry ht_table[256];
    +
    +void ht_init(void)
    +{
    +    /* FIXME: details */
    +}

     int main(int argc, char **argv)
     {
    Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? e

You will be placed in your favorite editor^[43] and allowed the chance to manually edit the patch. Read the comment at the bottom of the editor buffer. Carefully delete that one #include <stdlib.h> line. Don’t disturb the context lines, and don’t mess with the line counts. Git, and most any patch program, will lose its mind if you mess with the context lines. However, my editor updates the line counts automatically.

In this case, because the #include line was removed, it will be swept up in the remainder of the patches that get formed. This effectively introduces it at the correct time in the patch with the other file operation changes.

It is kind of tricky here, but Git now assumes that when you exit your editor, the patch that is left in your editor should be applied and its effects staged. So it offers you the following hunk and lets you choose its disposition. Be careful.

Because Git has moved on to the file operation changes, don’t stage those changes yet, but do pick up the last hash table change:

    @@ -2,6 +16,12 @@

     int main(int argc, char **argv)
     {
    +    FILE *f;
    +
    +    f = fopen(argv[1], "r");
    +    if (f == 0)
    +        exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.
    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
    @@ -9,4 +29,6 @@ int main(int argc, char **argv)
         * FIXME: Implementation needed still!
         */
        printf("Histogram of words\n");
    +
    +    ht_init();
     }
    Stage this hunk [y,n,q,a,d,/,K,g,e,?]? y

The separation can be verified, noting that the #include <stdlib.h> line has been correctly associated with the file operations now:

    $ git diff
    diff --git a/main.c b/main.c
    index 3e77315..c60b800 100644
    --- a/main.c
    +++ b/main.c
    @@ -1,4 +1,5 @@
     #include <stdio.h>
    +#include <stdlib.h>

     struct htentry {
        char *item;
    @@ -15,6 +16,12 @@ void ht_init(void)

     int main(int argc, char **argv)
     {
    +       FILE *f;
    +
    +       f = fopen(argv[1], "r");
    +       if (f == 0)
    +               exit(-1);
    +
        /*
         * Print a histogram of words found in a file.
         * "Words" are any whitespace separated characters.

As before, wrap up with a git commit for the hash table patch, then stage and commit the remaining file operation pieces.

I’ve only touched on the essential responses to the “Stage this hunk?” question. In fact, even more options than those listed in its prompt (i.e., [y,n,q,a,d,/,K,g,e,?]) are available. There are options to delay the fate of a hunk and then revisit it when prompted again later.

Furthermore, although this example only had two hunks in one file, the staging operation generalizes to many hunks, possibly split, in many files. Pulling together changes across multiple files can be a simple process of applying git add -p to each file that has a hunk needing to be staged.

However, there is another, outer level to the whole interactive hunk staging process that can be invoked using the git add -i command. It can be a bit cryptic, but its purpose is to allow you to select which paths (i.e., files) to stage in the index. As a sub-option, you may then select the patch option for your chosen paths. This enters the previously described per-file staging mechanism.

Recovering a Lost Commit

Occasionally, an ill-timed git reset command or an accidental branch deletion leaves you wishing you hadn’t lost the development it represented, and wishing you could recover it somehow. The usual approach to recovering such work is to inspect your reflog as shown in Chapter 11. Sometimes the reflog isn’t available, perhaps because it has been turned off (e.g., core.logAllRefUpdates = false), because you are manipulating a bare repository directly, or perhaps because the reflog has simply expired. For whatever reason, sometimes the reflog cannot help recover a lost commit.

The git fsck Command

Although not foolproof, Git provides the command git fsck to help locate lost data. The word “fsck” is an old abbreviation for “file system check.” Although this command does not check your filesystem, it does have many characteristics and algorithms that are quite similar to a traditional filesystem check, and results in some of the same output data as well.

Understanding how git fsck works is predicated on a good understanding of Git’s data structures as described in Chapter 4. Normally, every object in the Git repository, whether it is a blob, tree, commit, or tag, is connected to another object and anchored to a branch name, tag name, or some other symbolic ref such as a reflog name.

However, various commands and manipulations can leave objects in the object store that are not linked into the complete data structure somehow. These objects are called “unreachable” or “dangling.” They are unreachable because a traversal of the full data structure that starts from every named ref and follows every tag, commit, commit parent, and tree object reference will never encounter the lost object. In a sense, it is out there dangling on its own.

But traversing the ref-based commit graph isn’t the only way to walk every object in the database! Consider simply listing the objects in your object store using ls directly:

    $ cd path/to/some/repo
    $ ls -R .git/objects/
    .git/objects/:
    25  3b  73  82  info  pack

    .git/objects/25:
    7cc5642cb1a054f08cc83f2d943e56fd3ebe99

    .git/objects/3b:
    d1f0e29744a1f32b08d5650e62e2e62afb177c

    .git/objects/73:
    8d05ac5663972e2dcf4b473e04b3d1f19ba674

    .git/objects/82:
    b5fee28277349b6d46beff5fdf6a7152347ba0

    .git/objects/info:

    .git/objects/pack:

In this simple example, the set of objects in the repository has been listed without doing a traversal of the refs and commits.

By carefully comparing the total set of objects with those reachable via a traversal of the ref-based commit graph, you can determine all of the unreferenced objects. From the previous example, the second object listed turns out to be an unreferenced blob (i.e., file):

    $ git fsck
    Checking object directories: 100% (256/256), done.
    dangling blob 3bd1f0e29744a1f32b08d5650e62e2e62afb177c
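That comparison can be sketched by hand. The following is a deliberately simplified approximation of the idea, not what git fsck actually does: it assumes every object is stored loose (no packfiles), and it treats only refs as starting points, ignoring the index and the reflog. The repository and the "never committed" content are made up for the demonstration.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "foo" > file
git add file
git commit -q -m "Add some foo"

# Write a blob straight into the object store without staging or committing it.
orphan=$(echo "never committed" | git hash-object -w --stdin)

# Every loose object: directory name plus filename forms the object ID.
find .git/objects/?? -type f |
    sed -e 's|^\.git/objects/||' -e 's|/||' | sort > all-objects

# Every object reachable by traversing from the refs.
git rev-list --objects --all | cut -d' ' -f1 | sort > reachable-objects

# Objects in the store that the traversal never encountered.
unreferenced=$(comm -23 all-objects reachable-objects)
echo "$unreferenced"
```

In this tiny repository the one ID printed is the blob written with git hash-object, and git fsck reports the same object as a dangling blob.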

Let’s follow an example that shows how a lost commit can occur, and see how git fsck can recover it. First, construct a simple, new repository with a single simple file in it.

    $ mkdir /tmp/lost
    $ cd /tmp/lost
    $ git init
    Initialized empty Git repository in /tmp/lost/.git/
    $ echo "foo" >> file
    $ git add file
    $ git commit -m "Add some foo"
    [master (root-commit) f85b097] Add some foo
     1 files changed, 1 insertions(+), 0 deletions(-)
     create mode 100644 file

    $ git fsck
    Checking object directories: 100% (256/256), done.

    $ ls -R .git/objects/
    .git/objects/:
    25  4a  f8  info  pack

    .git/objects/25:
    7cc5642cb1a054f08cc83f2d943e56fd3ebe99

    .git/objects/4a:
    1c03029e7407c0afe9fc0320b3258e188b115e

    .git/objects/f8:
    5b097ee0f77c5f4dc1868037acbffe59b0e93e

    .git/objects/info:

    .git/objects/pack:

Notice that there are only three objects and none of them are dangling. In fact, starting from the master ref, which is the f85b097ee commit object, the traversal points to the tree object 4a1c0302 and then the blob 257cc564.

Tip

The command git cat-file -t object-id can be used to determine an object’s type.
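Here is a brief sketch of that traversal from commit to tree to blob, asking git cat-file -t for the type at each step. The repository is a throwaway built for the example, and the variable names are my own.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "foo" > file
git add file
git commit -q -m "Add some foo"

# Resolve the three objects the traversal visits.
commit_id=$(git rev-parse HEAD)
tree_id=$(git rev-parse 'HEAD^{tree}')
blob_id=$(git rev-parse HEAD:file)

git cat-file -t "$commit_id"   # commit
git cat-file -t "$tree_id"     # tree
git cat-file -t "$blob_id"     # blob
```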

Now let’s make a second commit, and then hard reset back to the first commit:

    $ echo bar >> file
    $ git commit -m "Add some bar" file
    [master 11e0dc9] Add some bar
     1 files changed, 1 insertions(+), 0 deletions(-)

And now the “accident” that causes us to lose a commit:

    $ git reset --hard HEAD^
    HEAD is now at f85b097 Add some foo

    $ git fsck
    Checking object directories: 100% (256/256), done.

But wait! git fsck doesn’t report any dangling object. It doesn’t seem to be lost after all. This is exactly what the reflog is designed to do: prevent you from accidentally losing commits. (See The Reflog.)

So let’s try again after brutally eliminating the reflog:

    # Not recommended; this is for purposes of exposition only!
    $ rm -rf .git/logs
    $ git fsck
    Checking object directories: 100% (256/256), done.
    dangling commit 11e0dc9c11d8f650711b48c4a5707edf5c8a02fe

    $ ls -R .git/objects/
    .git/objects/:
    11  25  3b  41  4a  f8  info  pack

    .git/objects/11:
    e0dc9c11d8f650711b48c4a5707edf5c8a02fe

    .git/objects/25:
    7cc5642cb1a054f08cc83f2d943e56fd3ebe99

    .git/objects/3b:
    d1f0e29744a1f32b08d5650e62e2e62afb177c

    .git/objects/41:
    31fe4d33cd85da805ac9a6697c2251c913881c

    .git/objects/4a:
    1c03029e7407c0afe9fc0320b3258e188b115e

    .git/objects/f8:
    5b097ee0f77c5f4dc1868037acbffe59b0e93e

    .git/objects/info:

    .git/objects/pack:

Tip

You can use the git fsck --no-reflogs command to find dangling objects as if the reflog were not available to reference commits. That is, objects that are only reachable from the reflog will be considered unreachable.

Now we can see that only the reflog was referencing the second commit 11e0dc9c in which the “bar” content was added.

But how would we even know what that dangling commit is?

    $ git show 11e0dc9c
    commit 11e0dc9c11d8f650711b48c4a5707edf5c8a02fe
    Author: Jon Loeliger <[email protected]>
    Date:   Sun Feb 10 11:59:59 2012 -0600

    Add some bar

    diff --git a/file b/file
    index 257cc56..3bd1f0e 100644
    --- a/file
    +++ b/file
    @@ -1 +1,2 @@
     foo
    +bar

    # The "index" line above named blob 3bd1f0e

    $ git show 3bd1f0e
    foo
    bar

Note that the blob 3bd1f0e is not considered dangling because it is actually referenced by the commit 11e0dc9c, even though the commit itself is unreferenced.

Sometimes, though, git fsck will find blobs that are unreferenced. Remember, every time you git add a file to the index, its blob is added to the object store. If you subsequently change that content and re-add it, no commit will have captured the intermediate blob that was added to the object store. Thus, it will be unreferenced.

    $ echo baz >> file
    $ git add file
    $ git fsck
    Checking object directories: 100% (256/256), done.
    dangling commit 11e0dc9c11d8f650711b48c4a5707edf5c8a02fe

    $ echo quux >> file
    $ git add file
    $ git fsck
    Checking object directories: 100% (256/256), done.
    dangling blob 0c071e1d07528f124e31f1b6c71348ec13f21a7a
    dangling commit 11e0dc9c11d8f650711b48c4a5707edf5c8a02fe

The first git fsck didn’t show a dangling blob because that blob was still referenced directly by the index. Only after the content associated with the pathname file was changed again and re-added did that blob become dangling.

    $ git show 0c071e1d
    foo
    baz

If you find you have a very cluttered git fsck report consisting entirely of unnecessary blobs and commits and want to clean it up, consider running garbage collection as described in Garbage Collection.

Reconnecting a Lost Commit

Although using git fsck is a handy way to discover the SHA1 of lost commits and blobs, I mentioned the reflog earlier as another mechanism. You could even cut and paste the SHA1 from some lingering line of output found by scrolling back through your terminal. Ultimately, it doesn’t matter how you discover the SHA1 of a lost blob or commit. The question remains, once you know it, how do you reconnect it or otherwise incorporate it into your project?

By definition, blobs are nameless file content. All you really have to do to reestablish a blob is place that content into a file and git add it again. As I showed in the previous section, git show can be used on the blob SHA1 to obtain the full object content. Just redirect that to your desired file:

    $ git show 0c071e1d > file2
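Finding the dangling blob and writing it back out can be scripted in one go. This sketch rebuilds the situation in a fresh throwaway repository, then picks the ID out of the git fsck report with awk. It assumes exactly one dangling blob is reported; with several, you would need to inspect each one.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo foo > file
git add file
git commit -q -m "Add some foo"

echo baz >> file
git add file     # this blob is written and referenced by the index
echo quux >> file
git add file     # the index moves on; the "foo, baz" blob is now unreferenced

# Pull the object ID out of the fsck report and restore its content.
dangling=$(git fsck 2>/dev/null | awk '/^dangling blob/ { print $3 }')
git show "$dangling" > file2
cat file2
```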

On the other hand, reconnecting a commit might depend on what you want to do with it. The simple example from the previous section is only one commit. But it could equally well have been the first commit in an entire sequence of commits that was lost. Maybe even an entire branch was accidentally lost! Consequently, a usual practice would reintroduce a lost commit as a branch.

Here, the previously lost commit that introduced the bar content, 11e0dc9c, is re-introduced on the new branch called recovered:

    $ git branch recovered 11e0dc9c
    $ git show-branch
    * [master] Add some foo
     ! [recovered] Add some bar
    --
     + [recovered] Add some bar
    *+ [master] Add some foo

From there it can be manipulated (kept as is, merged, etc.) as you wish.
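For completeness, here is a sketch of the full lose-and-recover cycle that leaves the reflog files alone. It simulates the accident with a branch deletion rather than a hard reset, and uses git fsck --no-reflogs so that the commit shows up as dangling even though the reflog still remembers it. The branch name topic is made up, and the sketch assumes only one dangling commit is reported.

```shell
# Throwaway repository with one commit on the initial branch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo foo > file
git add file
git commit -q -m "Add some foo"

# Commit on a side branch, then "accidentally" delete that branch.
git checkout -q -b topic
echo bar >> file
git commit -q -m "Add some bar" file
git checkout -q -
git branch -q -D topic

# Find the lost commit, ignoring reflog entries, and anchor it to a new branch.
lost=$(git fsck --no-reflogs 2>/dev/null | awk '/^dangling commit/ { print $3 }')
git branch recovered "$lost"
git log --oneline recovered
```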

^[40]François-Marie Arouet, of course!

^[41]Due to the scripting context for each filter, it’s likely to stay that way, too.

^[42]But also see the section called “Checklist for Shrinking a Repository” from the git-filter-branch manual page.

^[43]emacs, right?
