Finding Commits

Part of a good revision control system is the support it provides for “archaeology” and investigating a repository. Git provides several mechanisms to help you locate commits that meet certain criteria within your repository.

Using git bisect

The git bisect command is a powerful tool for isolating a particular, faulty commit based on essentially arbitrary search criteria. git bisect is well suited to those times when you discover that something wrong or bad is affecting your repository and you know the code used to be fine. For example, let’s say you are working on the Linux kernel and a test boot fails, but you’re positive the boot worked sometime earlier—perhaps last week or at a previous release tag. In this case, your repository has transitioned from a known good state to a known bad state.

But when? Which commit caused it to break? That is precisely the question git bisect is designed to help you answer.

The only real requirement is that, given a checked-out state of your repository, you can determine whether or not it meets your search criterion. In this case, you have to be able to answer the question, Does the checked-out version of the kernel build and boot? You also have to know one good and one bad version or commit before starting so that the search is bounded.

git bisect is often used to isolate a particular commit that introduced some regression or bug into the repository. For example, if you were working on the Linux kernel, git bisect could help you track down problems such as a failure to compile, a failure to boot, a kernel that boots but can’t perform some task, or the loss of a desired performance characteristic. In all of these cases, git bisect can help you isolate the exact commit that caused the problem.

The git bisect command systematically chooses a new commit in an ever-decreasing range bounded by good behavior at one end and bad behavior at the other. Eventually, the narrowing range will pinpoint the one commit that introduced the faulty behavior.

There is no need for you to do anything more than provide an initial good and bad commit and then repeatedly answer the question, Does this version work?
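The narrowing strategy can be sketched as an ordinary binary search. The following is only an illustration under simplifying assumptions: history is treated as a straight line of numbered revisions (real Git history is a graph, and Git picks its midpoints accordingly), and is_bad, FIRST_BAD, and find_first_bad are hypothetical names standing in for "build it and boot it".

```shell
#!/bin/sh
# Hypothetical stand-in for a real test: revisions are numbered 0..N and
# every revision at or after FIRST_BAD is considered broken.
FIRST_BAD=${FIRST_BAD:-37}

is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# find_first_bad GOOD BAD
# Precondition: revision GOOD works and revision BAD is broken.
# Prints the number of the first broken revision.
find_first_bad() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad=$mid      # the breakage happened at mid or earlier
        else
            good=$mid     # the breakage happened after mid
        fi
    done
    echo "$bad"
}

find_first_bad 0 100    # prints 37
```

Each pass through the loop plays the role of one good or bad answer: the range between the two bounds shrinks until they are adjacent, and the bad bound is then the first faulty revision.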

To start, you first need to identify a good commit and a bad commit. In practice, the bad version is often your current HEAD, as that is where you are working when you suddenly notice something wrong or are assigned a bug to fix.

Finding an initial good version can be a bit difficult, since it’s usually buried in your history somewhere. You can probably name or guess some version back in the history of the repository that you know works correctly. This may be a tagged release like v2.6.25 or some commit 100 revisions ago, master~100, on your master branch. Ideally it is close to your bad commit (master~25 is better than master~100) and not buried too far in the past. In any event, you need to know or be able to verify that it is, in fact, a good commit.

It is essential that you start the git bisect process from a clean working directory. The process necessarily adjusts your working directory to contain various different versions of your repository. Starting with a dirty work space is asking for trouble; your working directory edits could easily be lost.
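One way to guard against starting from a dirty state is to check git status before bisecting. This is a minimal sketch, not a built-in Git feature; require_clean_tree is a hypothetical helper name, and it treats untracked files as "dirty" too, which may be stricter than you need.

```shell
#!/bin/sh
# require_clean_tree: succeed only if nothing is modified, staged, or untracked.
# (Hypothetical helper; the name is not part of Git.)
require_clean_tree() {
    # --porcelain prints one line per changed or untracked path, nothing if clean
    changes=$(git status --porcelain) || return 1
    if [ -n "$changes" ]; then
        echo "working directory is not clean; commit or stash your changes first" >&2
        return 1
    fi
}

# Typical use:
#   require_clean_tree && git bisect start
```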

Using a clone of the Linux kernel in our example, let’s tell Git to begin a search:

$ cd linux-2.6
$ git bisect start

After initiating a bisection search, Git enters a bisect mode, setting up some state information for itself. Git employs a detached HEAD to manage the current checked-out version of the repository. This detached HEAD is essentially an anonymous branch that can be used to bounce around within the repository and point to different revisions as needed.

Once started, tell Git which commit is bad. Again, since this is typically your current version, you can simply omit the revision and let it default to your current HEAD:[15]

# Tell git the HEAD version is broken
$ git bisect bad

Similarly, tell Git which version works:

$ git bisect good v2.6.27
Bisecting: 3857 revisions left to test after this
[cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus'
    of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

Identifying a good and a bad version delineates a range of commits over which a good-to-bad transition occurs. At each step along the way, Git will tell you how many revisions are in that range. Git also modifies your working directory by checking out a revision that is roughly midway between the good and bad endpoints. It is now up to you to answer the question, Is this version good or bad? Each time you answer the question, Git cuts the search space roughly in half, identifies a new revision, checks it out, and repeats the Good or bad? question.

Suppose this version is good:

$ git bisect good
Bisecting: 1939 revisions left to test after this
[2be508d847392e431759e370d21cea9412848758] Merge git://git.infradead.org/mtd-2.6

Notice that 3857 revisions have been narrowed down to 1939. Let’s do a few more:

$ git bisect good
Bisecting: 939 revisions left to test after this
[b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6] 8250: Add more OxSemi devices

$ git bisect bad
Bisecting: 508 revisions left to test after this
[9301975ec251bab1ad7cfcb84a688b26187e4e4a] Merge branch 'genirq-v28-for-linus'
    of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

In a perfect bisection run, it takes about log2 N steps to narrow a range of N original revisions down to just one commit.
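As a rough check of that estimate, the number of halvings can be counted directly. This is a small sketch in plain shell arithmetic; bisect_steps is a hypothetical helper name, and real runs may differ by a step or so because history is not a straight line.

```shell
#!/bin/sh
# bisect_steps N: print the smallest k such that 2^k >= N,
# i.e. log2(N) rounded up.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 3857    # prints 12
```

For the 3857 revisions reported at the start of this search, the estimate is 12 steps, because 2^11 = 2048 is too small and 2^12 = 4096 is enough.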

After another good and bad answer:

$ git bisect good
Bisecting: 220 revisions left to test after this
[7cf5244ce4a0ab3f043f2e9593e07516b0df5715] mfd: check for
    platform_get_irq() return value in sm501

$ git bisect bad
Bisecting: 104 revisions left to test after this
[e4c2ce82ca2710e17cb4df8eb2b249fa2eb5af30] ring_buffer: allocate
    buffer page pointer

Throughout the bisection process, Git maintains a log of your answers along with their commit IDs:

$ git bisect log
git bisect start
# bad: [49fdf6785fd660e18a1eb4588928f47e9fa29a9a] Merge branch
    'for-linus' of git://git.kernel.dk/linux-2.6-block
git bisect bad 49fdf6785fd660e18a1eb4588928f47e9fa29a9a
# good: [3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27
git bisect good 3fa8749e584b55f1180411ab1b51117190bac1e5
# good: [cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus'
    of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
git bisect good cf2fa66055d718ae13e62451bb546505f63906a2
# good: [2be508d847392e431759e370d21cea9412848758] Merge
    git://git.infradead.org/mtd-2.6
git bisect good 2be508d847392e431759e370d21cea9412848758
# bad: [b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6] 8250: Add more
    OxSemi devices
git bisect bad b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6
# good: [9301975ec251bab1ad7cfcb84a688b26187e4e4a] Merge branch
    'genirq-v28-for-linus' of
    git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 9301975ec251bab1ad7cfcb84a688b26187e4e4a
# bad: [7cf5244ce4a0ab3f043f2e9593e07516b0df5715] mfd: check for
    platform_get_irq() return value in sm501
git bisect bad 7cf5244ce4a0ab3f043f2e9593e07516b0df5715

If you get lost during the process or just want to start over for any reason, save this log to a file and run the git bisect replay command with that file as input. This is also an excellent mechanism for backing up one step in the process and exploring a different path: remove the last answer from the file before replaying it.
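Backing up one step could be scripted along the following lines. Treat it as a sketch under assumptions: bisect_undo_last is a hypothetical helper, it only recognizes the default good, bad, and skip answers, and it makes no attempt to handle a log whose last answer is one of the initial endpoints.

```shell
#!/bin/sh
# bisect_undo_last: drop the most recent answer from the bisect log and
# replay the rest. (Hypothetical helper; the name is not part of Git.)
bisect_undo_last() {
    log_file=$(mktemp) || return 1
    git bisect log > "$log_file" || return 1

    # Line number of the last recorded answer; "#" comment lines are ignored.
    last=$(grep -n -E '^git bisect (good|bad|skip) ' "$log_file" |
           tail -n 1 | cut -d: -f1)
    if [ -z "$last" ]; then
        rm -f "$log_file"
        return 1
    fi

    # Write a copy of the log without that answer, then replay it.
    sed "${last}d" "$log_file" > "$log_file.edited"
    git bisect reset > /dev/null 2>&1
    git bisect replay "$log_file.edited"
    result=$?
    rm -f "$log_file" "$log_file.edited"
    return "$result"
}
```

After the replay, Git checks out the same revision it offered before the discarded answer, so you can answer it differently.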

Let’s narrow down the defect with five more bad answers:

$ git bisect bad
Bisecting: 51 revisions left to test after this
[d3ee6d992821f471193a7ee7a00af9ebb4bf5d01] ftrace: make it
    depend on DEBUG_KERNEL

$ git bisect bad
Bisecting: 25 revisions left to test after this
[3f5a54e371ca20b119b73704f6c01b71295c1714] ftrace: dump out
    ftrace buffers to console on panic

$ git bisect bad
Bisecting: 12 revisions left to test after this
[8da3821ba5634497da63d58a69e24a97697c4a2b] ftrace: create
    _mcount_loc section

$ git bisect bad
Bisecting: 6 revisions left to test after this
[fa340d9c050e78fb21a142b617304214ae5e0c2d] tracing: disable
    tracepoints by default

$ git bisect bad
Bisecting: 2 revisions left to test after this
[4a0897526bbc5c6ac0df80b16b8c60339e717ae2] tracing: tracepoints, samples

You may use the git bisect visualize command to inspect the set of commits still within the range of consideration. Git uses the graphical tool gitk if the DISPLAY environment variable is set. If not, Git will use git log instead. In that case, --pretty=oneline might be useful, too.

$ git bisect visualize --pretty=oneline

fa340d9c050e78fb21a142b617304214ae5e0c2d tracing: disable tracepoints by default
b07c3f193a8074aa4afe43cfa8ae38ec4c7ccfa9 ftrace: port to tracepoints
0a16b6075843325dc402edf80c1662838b929aff tracing, sched: LTTng
    instrumentation - scheduler
4a0897526bbc5c6ac0df80b16b8c60339e717ae2 tracing: tracepoints, samples
24b8d831d56aac7907752d22d2aba5d8127db6f6 tracing: tracepoints, documentation
97e1c18e8d17bd87e1e383b2e9d9fc740332c8e2 tracing: Kernel Tracepoints

The current revision under consideration is roughly in the middle of the range:

$ git bisect good
Bisecting: 1 revisions left to test after this
[b07c3f193a8074aa4afe43cfa8ae38ec4c7ccfa9] ftrace: port to tracepoints

When you finally test the last revision and Git has isolated the one revision that introduced the problem,[16] it’s displayed:

$ git bisect good
fa340d9c050e78fb21a142b617304214ae5e0c2d is first bad commit
commit fa340d9c050e78fb21a142b617304214ae5e0c2d
Author: Ingo Molnar <[email protected]>
Date:   Wed Jul 23 13:38:00 2008 +0200

    tracing: disable tracepoints by default

    while it's arguably low overhead, we don't enable new features by default.

    Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 4bf5c05869a67e184670315c181d76605c973931
    fd15e1c4adbd37b819299a9f0d4a6ff589721f6c M  init

Finally, when your bisection run is complete and you are finished with the bisection log and the saved state, it is vital that you tell Git that you have finished. As you may recall, the whole bisection process is performed on a detached HEAD:

$ git branch
* (no branch)
  master

$ git bisect reset
Switched to branch "master"

$ git branch
* master

Running git bisect reset places you back on your original branch.

Using git blame

Another tool you can use to help identify a particular commit is git blame. This command tells you who last modified each line of a file and which commit made the change. In the following example, -L 35, limits the output to line 35 through the end of the file:

$ git blame -L 35, init/version.c

4865ecf1 (Serge E. Hallyn 2006-10-02 02:18:14 -0700 35)         },
^1da177e (Linus Torvalds  2005-04-16 15:20:36 -0700 36) };
4865ecf1 (Serge E. Hallyn 2006-10-02 02:18:14 -0700 37) EXPORT_SYMBOL_GPL(init_uts_ns);
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 38)
c71551ad (Linus Torvalds  2007-01-11 18:18:04 -0800 39) /* FIXED STRINGS! Don't touch! */
c71551ad (Linus Torvalds  2007-01-11 18:18:04 -0800 40) const char linux_banner[] =
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 41)         "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 42)         LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 43)
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 44) const char linux_proc_banner[] =
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 45)         "%s version %s"
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 46)         " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 47)         " (" LINUX_COMPILER ") %s\n";
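Once blame has pointed at a line, you usually want the full commit behind it. The sketch below uses the --porcelain output of git blame, whose first line begins with the full commit ID; blame_commit is a hypothetical helper name chosen for illustration.

```shell
#!/bin/sh
# blame_commit FILE LINE: print the full ID of the commit that last
# changed line LINE of FILE. (Hypothetical helper; the name is not part of Git.)
blame_commit() {
    # The first line of porcelain output is "<commit-id> <orig-line> <final-line> <count>".
    git blame --porcelain -L "$2,$2" -- "$1" | head -n 1 | cut -d ' ' -f 1
}

# Typical use: show the commit that last touched line 40 of a file.
#   git show "$(blame_commit init/version.c 40)"
```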

Using Pickaxe

Whereas git blame tells you about the current state of a file, git log -Sstring searches back through the history of a file’s diffs for the given string. Because it searches the actual diffs between revisions, this command finds commits that changed the string through either additions or deletions.

$ git log -Sinclude --pretty=oneline --abbrev-commit init/version.c
cd354f1... [PATCH] remove many unneeded #includes of sched.h
4865ecf... [PATCH] namespaces: utsname: implement utsname namespaces
63104ee... kbuild: introduce utsrelease.h
1da177e... Linux-2.6.12-rc2

Each of the commits listed on the left (cd354f1 et al.) either adds or deletes lines that contain the word include. Be careful, though. The pickaxe compares how many times the string occurs before and after each commit, so a commit that adds and removes exactly the same number of occurrences of your key phrase won’t be shown. A commit is listed only if it changes the total number of occurrences.
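If that counting rule hides a commit you care about, git log also accepts -G, which takes a regular expression and lists every commit whose diff adds or removes a matching line, whether or not the count changes. The two wrappers below are a sketch for comparing the options side by side; log_pickaxe and log_regex are hypothetical names.

```shell
#!/bin/sh
# log_pickaxe STRING PATH: commits that change how many times STRING occurs in PATH.
log_pickaxe() {
    git log -S"$1" --pretty=oneline --abbrev-commit -- "$2"
}

# log_regex REGEX PATH: commits whose diff of PATH adds or removes a line matching REGEX.
log_regex() {
    git log -G"$1" --pretty=oneline --abbrev-commit -- "$2"
}

# Typical use:
#   log_pickaxe include init/version.c
#   log_regex   include init/version.c
```

A commit that replaces one #include line with another leaves the count of include unchanged, so only the -G form would report it.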

The -S option to git log is called pickaxe. That’s brute force archaeology for you.



[15] For the curious reader who would like to duplicate this example, here HEAD is commit 49fdf6785fd660e18a1eb4588928f47e9fa29a9a.

[16] No, this commit did not necessarily introduce a problem. The good and bad answers were fabricated and landed here.
