Git bisect is a git command that makes it easier to track down where a problem was introduced to a codebase.
In large projects you may find that a change was added to the code that causes a problem and you then need to track down where that problem occurred. Knowing where the problem was introduced makes debugging the issue a lot easier.
You could just check out commits one at a time until you find the culprit, but git comes with the bisect tool, which assists in this process and can even be automated to find the problem quickly.
Before jumping into git bisect, let's look at the bisect algorithm, which is how git approaches the task of finding the relevant commit.
The Bisect Algorithm
The bisect, or binary search, algorithm is a way of searching a sorted list of items to find a particular one. It is much more efficient than going through the list one item at a time until the relevant item turns up.
Let's take a list of 10 items, numbered 1 to 10.
1 2 3 4 5 6 7 8 9 10
If we want to find number 8 using a linear search we would need to go through the numbers one at a time, starting from 1, until we find the right one. This means 8 comparisons to find the correct one.
With a binary search we instead pick a midpoint in the list and see if our intended number is higher or lower.
1 2 3 4 5 6 7 8 9 10
        ^
The selected item, 5, is lower than the item we are looking for, so we next select the midpoint between it and the maximum value, which lands us on the number 8. We have reached our target in just 2 steps, although it could have been 3 if we had picked 7 instead (it depends on how you elect to round the midpoint).
1 2 3 4 5 6 7 8 9 10
              ^
This is essentially how git bisect works, but instead of higher or lower we designate each commit as good or bad.
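The search for number 8 described above can be sketched in a few lines of shell. This is only an illustration of the algorithm, not anything git runs; the function name binary_search and the fixed range of 1 to 10 are assumptions made for the example.

```shell
#!/bin/sh
# Search the sorted list 1..10 for a target and report how many guesses it took.
binary_search() {
    target=$1
    low=1
    high=10
    steps=0
    while [ "$low" -le "$high" ]; do
        # Pick the midpoint of the remaining range (integer division rounds down).
        mid=$(( (low + high) / 2 ))
        steps=$(( steps + 1 ))
        if [ "$mid" -eq "$target" ]; then
            echo "$steps"
            return 0
        elif [ "$mid" -lt "$target" ]; then
            low=$(( mid + 1 ))     # target is higher: discard the lower half
        else
            high=$(( mid - 1 ))    # target is lower: discard the upper half
        fi
    done
    return 1    # target is not in the list
}

binary_search 8    # prints 2 (the guesses were 5, then 8)
```

Git bisect follows the same idea, with "is this commit good or bad?" taking the place of the higher or lower comparison.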
Let's look at how to run git bisect and how the bisect algorithm applies.
Running Git Bisect
The first thing we need to do is run the "git bisect start" command; this tells git that we want to run a bisect and puts the repository into bisect mode.
git bisect start
Git is now waiting for us to mark a commit as good or bad. We could use either, but since we know that the current state of the code repository is "bad" the easiest thing to do is start out with the bad commit.
So, let's mark the current commit as "bad".
git bisect bad
You could also run this command with the bad commit nominated, either through a git reference (e.g. HEAD), a sha of a commit, or even a tag. If you are currently looking at the head of the repository, and you know that this is the bad commit, then the above command is functionally equivalent to running the following.
git bisect bad HEAD
Next, we need to tell git what our last known "good" commit was; again, this can be a git reference, a sha of a commit, or a tag. A release tag is often a good starting point since it should point to a working version.
git bisect good <sha|tag>
Tracking down the last known "good" commit can be a little bit tricky, but the point of git bisect is that you don't know when the problem was introduced, just when you were sure the problem didn't exist. Whilst in git bisect mode you can still navigate around the repository using the usual "git log" and "git checkout" commands, but it's probably a good idea to find a good commit before you start the bisect.
Once you have set the good and bad commits, git bisect gets to work. It checks out a commit at the midpoint between the good and bad commits. This is where git starts using the bisect algorithm to allow you to find out where the problem was introduced.
What we have essentially done is give git bisect the upper and lower bounds of the search, which it then uses to search through the available commits to find where the problem was introduced.
After giving these bounds, you'll see a message that looks a little like this.
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[57abd582330996e929df7a7b4be2e17e3e6b2705] Some commit message.
This shows us the currently checked out commit sha and commit message. It is telling us that there are about 2 steps left in the bisection algorithm, which means that we only need to label a commit as good or bad a couple more times to get our result.
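The step count stays small because every answer roughly halves the range that is left. The sketch below shows that relationship for a simple linear history; the function name estimate_steps is made up for this example, and it is not the calculation git itself performs, so git's "roughly N steps" figure may differ slightly.

```shell
#!/bin/sh
# Count how many times a range of candidate commits has to be halved
# before only one commit is left. Illustration only: git uses its own
# internal estimate for the "roughly N steps" message.
estimate_steps() {
    candidates=$1
    steps=0
    span=1
    while [ "$span" -lt "$candidates" ]; do
        span=$(( span * 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

estimate_steps 3       # prints 2
estimate_steps 1000    # prints 10
```

Even a range of a thousand commits needs only around ten answers, which is why bisecting scales so well.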
If this commit does not contain the problem then you can say it's good.
git bisect good
Git will then run the bisection algorithm and pick a point that lies between this good point and the bad point we stipulated at the start.
If the new commit does contain the problem then you label this as a bad commit.
git bisect bad
If we do this, git will run the bisection algorithm and pick a point that lies between this bad point and the good point we stipulated at the start.
Git will then check out a new commit and you repeat the process until the bisect algorithm narrows down the search and finds the first instance where the issue was introduced.
Eventually, git will produce a message that looks a little like this, showing you the first time that the problem was introduced to the codebase.
f88f003da43d691c12c548290cf2d0e03e8e2ad3 is the first bad commit
commit f88f003da43d691c12c548290cf2d0e03e8e2ad3
Author: Phil Norton <[email protected]>
Date: Fri Mar 29 22:15:39 2024 +0000
Adding this amazing feature that totally won't break anything.
index.php | 1 +
1 file changed, 1 insertion(+)
This shows clearly that the index.php file was changed to introduce a feature that also caused a problem.
Once you have your information about where the bad change was introduced you can then reset everything back to normal by running reset.
git bisect reset
With the commit sha in hand you can then figure out why the change was introduced and go about fixing it, or assign the issue to the developer who broke it in the first place.
Automating The Test
If manually inspecting lots of different commits seems like a laborious process then the good news is that you can easily automate things.
Git bisect comes with the ability to run a command that tests the code in some way to automatically detect "good" or "bad" commits. You can either use a standalone command or run something within your project, and the outcome should be an exit code of 0 for a good commit, or between 1 and 127 (except 125) for a bad commit. An exit code of 125 tells git that the commit cannot be tested and should be skipped. A good unit testing system should return a non-zero exit code if a test failed, but you can also just rely on an error being produced by the script.
Let's look at a simple example that uses a script to detect if a variable called "variableName" was added to the index.php file.
Here's the script we are going to use. It's pretty simple and just uses grep to look at the contents of the index.php file for the variable and will return an exit code of 1 if this is found.
#!/bin/sh
if grep -q variableName index.php; then
exit 1
fi
exit 0
Whilst the script doesn't return any output, we can look at the exit code by using "echo $?". Note, you need to make the script executable before it can be run like this.
$ ./test.sh
$ echo $?
1
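If you want to check the script's behaviour before handing it to git, one option is to try it in a scratch directory first. The following is a minimal sketch; the temporary directory and the sample index.php contents are invented for the example.

```shell
#!/bin/sh
# Recreate the test script in a scratch directory and check its exit codes.
# The directory and the sample file contents are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > test.sh <<'EOF'
#!/bin/sh
if grep -q variableName index.php; then
    exit 1
fi
exit 0
EOF

# Without this the shell refuses to run ./test.sh ("Permission denied").
chmod +x test.sh

# A file containing the variable should be reported as "bad" (exit code 1).
echo '<?php $variableName = 1;' > index.php
bad_status=0
./test.sh || bad_status=$?

# A file without the variable should be reported as "good" (exit code 0).
echo '<?php echo "hello";' > index.php
good_status=0
./test.sh || good_status=$?

echo "with variable: $bad_status, without variable: $good_status"
```

Checking both outcomes like this is worthwhile because a script that always exits with the same code would send the bisect in the wrong direction without any warning.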
We then start git bisect in the normal way, stipulating the starting good and bad commits. In this case we are selecting HEAD as the bad commit and the tag "1.0" as our last known good commit.
git bisect start
git bisect bad
git bisect good 1.0
Then, we can use "git bisect run" to run the script and automatically decide on the good or bad commit.
git bisect run ./test.sh
This will then check out a commit and run the test to decide if the commit is good or bad based on the outcome of the test, repeating until the first bad commit is found. As this is a simple test the bisect command runs very quickly and a result is returned almost instantly.
The challenge with this approach is that the test or build process you want to run may live within your codebase. Because git bisect checks out a different commit at each step, a test file that is tracked in the repository will change (or disappear) along with the rest of the code, and any uncommitted edits you make to it can stop the checkout with an overwrite error. Also, if you are running tests within your project and a test fails for an unrelated reason, this creates a false "bad" commit that throws off your automation. This process tends to work better if you create a small script (like I have above) to test the code from an external perspective.
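One way to reduce false "bad" results is to make the script exit with code 125 when a commit cannot be tested at all, since git bisect run treats that code as "skip this commit". The sketch below maps a build step and a test step to the right code; the function name bisect_code is hypothetical, and "true" and "false" stand in for real build and test commands.

```shell
#!/bin/sh
# Work out which exit code to hand back to "git bisect run", given a build
# command and a test command. Both commands are placeholders for whatever
# your project actually uses.
bisect_code() {
    build_cmd=$1
    test_cmd=$2
    if ! sh -c "$build_cmd" >/dev/null 2>&1; then
        echo 125    # cannot build: ask git to skip this commit
    elif ! sh -c "$test_cmd" >/dev/null 2>&1; then
        echo 1      # built, but the check failed: bad commit
    else
        echo 0      # built and the check passed: good commit
    fi
}

# Stand-in commands: "true" always succeeds and "false" always fails.
echo "builds and passes: $(bisect_code true true)"
echo "builds but fails:  $(bisect_code true false)"
echo "cannot be built:   $(bisect_code false true)"
```

In a real bisect script the last line would instead exit with the value the function prints, using your own build and test commands in place of the stand-ins.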
An Example
Let's look at a concrete example of git bisect running on a real git repo.
The following script will set up a git repository and add some text to a file, committing the change each time. The first commit is tagged so that we can easily reference it without having to use a sha value.
#!/bin/sh
# Initialise git repo
git init
# Add a file with some content and commit it to the repo.
echo "Text" > file.txt
git add file.txt
git commit -m "Initial commit."
# Tag this commit.
git tag -a "1.0" -m "1.0"
# Run a couple of commits.
echo "More text" >> file.txt
git add file.txt
git commit -m "Commit 2"
echo "More text" >> file.txt
git add file.txt
git commit -m "Commit 3"
echo "More text" >> file.txt
git add file.txt
git commit -m "Commit 4"
# Create our "bad" commit.
echo "A bad commit oooooooooooooooooo" >> file.txt
git add file.txt
git commit -m "Commit 5"
# Continue with some commits.
echo "Even more text" >> file.txt
git add file.txt
git commit -m "Commit 6"
echo "Some text" >> file.txt
git add file.txt
git commit -m "Commit 7"
echo "Some text" >> file.txt
git add file.txt
git commit -m "Commit 8"
One of these commits (commit number 5) has added the string "A bad commit oooooooooooooooooo" to the file, but more commits have been added on top of it, so we now have a file with a problem in it and a git history to inspect.
Here is a representation of the history of this repository, with the tag "1.0" being present on the left hand side (in commit 1) and the HEAD being present on the right hand side. The issue is somewhere between these two points.
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> HEAD
To find the problem, we set up git bisect with HEAD being the known bad commit and the "1.0" tag being the last known good commit.
git bisect start
git bisect bad
git bisect good 1.0
Once we run the last command here we see the following on the command line.
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[e2d8209b5ea2e7d1ac1728a1a44ecc12d0b7c3fb] Commit 4
In the git history the pointer is now located at commit 4.
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> HEAD
               ^
Looking at the file shows us that the offending string isn't part of the file at this point, so we label this as a good commit.
git bisect good
Git then checks out commit 6 and presents the following on the command line. Commit 6 is selected because it lies midway between the most recent known bad commit (commit 8) and the last commit we labelled as good (commit 4).
Bisecting: 1 revision left to test after this (roughly 1 step)
[8b393238a6f6f4b1dd473d32ad79a285450f980a] Commit 6
In the git history the pointer is now located at commit 6.
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> HEAD
                         ^
Inspecting the file shows us that the offending string is here, so we label this as a bad commit.
git bisect bad
Git will check out the commit that lies between this commit and the last known good commit, which means our git history pointer is now pointing at commit 5.
1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> HEAD
                    ^
We already know that this is the commit that introduced the change, but let's complete the git bisect process.
We are shown the following on the command line.
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[bc2d487be1e93f9b5ff0f82df72e6e12a85789c0] Commit 5
Git is saying here that there are no more commits to test after this one. Looking at the file shows that the offending text is present, and since commit 4 was already marked as good, this must be our problem commit.
We can then complete the git bisect command by labeling this commit as bad.
git bisect bad
Git will then show us a report confirming that commit 5 is the first bad commit, also showing the commit sha and the diff of that commit.
bc2d487be1e93f9b5ff0f82df72e6e12a85789c0 is the first bad commit
commit bc2d487be1e93f9b5ff0f82df72e6e12a85789c0
Author: Phil Norton <[email protected]>
Date: Sat Mar 30 10:42:17 2024 +0000
Commit 5
file.txt | 1 +
1 file changed, 1 insertion(+)
We can now run "git bisect reset" to return our codebase to its original state.
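The same walkthrough can be handed over to "git bisect run". The sketch below rebuilds a history like the one above in a scratch repository and lets git find the offending commit by itself. The temporary directory, the placeholder identity values and the use of "Commit 1" as the first commit message are assumptions made so that the example is self-contained.

```shell
#!/bin/sh
# Rebuild the example history in a scratch repository and let
# "git bisect run" find the commit that added the offending line.
# The identity values are placeholders so that commits work without any git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

i=1
for line in "Text" "More text" "More text" "More text" \
            "A bad commit oooooooooooooooooo" \
            "Even more text" "Some text" "Some text"; do
    echo "$line" >> file.txt
    git add file.txt
    git commit -q -m "Commit $i"
    # Tag the first commit so it can be used as the last known good point.
    if [ "$i" -eq 1 ]; then
        git tag -a "1.0" -m "1.0"
    fi
    i=$(( i + 1 ))
done

# HEAD is the known bad commit and the tag 1.0 is the last known good one.
git bisect start HEAD 1.0 >/dev/null

# grep exits with 0 when the line is found, so invert it:
# line found means bad (1), line missing means good (0).
git bisect run sh -c '! grep -q "A bad commit" file.txt' >/dev/null

# While the bisect session is open, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "First bad commit: $first_bad"

git bisect reset >/dev/null 2>&1
```

Passing the bad and good commits directly to "git bisect start" is a shorthand for the three separate commands used earlier.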
Conclusion
Git bisect is a highly useful part of git that makes tracking down where an issue was introduced to the codebase much easier.
The key strength of this command is that you don't necessarily need to know what the cause of the problem was, just that there was a problem and it was introduced somewhere in the history of the project. By running git bisect you can look at snapshots of your project in the past and narrow down when the problem was introduced to the project until you have a single commit. It is then usually easier to see what changed within a single commit that caused a problem instead of being faced with a large codebase with thousands of commits.
The good thing about git bisect is that it works well even if your codebase has many thousands of commits. The run sub-command makes this process very quick, although creating an automated test that will work with a large codebase can be a bit of a challenge. If it's not possible to use the built-in project build and testing tools to automate the bisect run then it is a good idea to set up some kind of external test that won't be affected by the changes within the project. If you do this then it's normally best to have an indication of what might have caused the problem, although that doesn't have to be the case.
What you do with the information that the bisect command gives you is up to you, but if your project is properly disciplined the commit message should contain details about what was changed, including a ticket reference that you can then use to see more history about the change. The most important thing you know now is when the breaking change was introduced.
I have only scratched the surface of the git bisect command here (this is a getting started guide) and there are a number of other options that you can pass to git bisect that change the way it is run. You can read more about the git bisect command on the git documentation pages.