Chapter 3

What Is A Commit?

The most important concept in Git is the commit. Git manages versions of software, and each version is stored as a commit in a repository. A commit always spans the entire project. With a commit, a copy of each file in the project is stored in the repository.

Figure 3.1 shows a summary of the most important information about a commit, produced with git log --stat -1.

commit 9acc5d5efec1d2d62f7e98bcc3880cda762cb831 
Author: Bjørn Stachmann <[email protected]>
Date:	 Sat Dec 18 18:20:45 2010 +0100 

    Section about the commit.

book/commits/commits.tex | 28 +++++++++++++++++++++++++---
1 files changed, 25 insertions(+), 3 deletions(-) 

Figure 3.1: Information about a commit

The first line shows the commit hash 9acc5d5e ... cb831, followed by the author, the time the commit was made and a comment. The output ends with a summary of the files that have changed since the previous version. What the summary does not show is that this commit contains not only the modified file commits.tex, but all the files in the project. For each commit, Git calculates a 40-character code that is unique for all practical purposes, called the commit hash. If you know this hash, you can restore the files in the project from the repository exactly as they were at the time of the commit. In Git, restoring a version is referred to as a checkout.

Access Permissions and Timestamps

Of the POSIX file permissions (read, write, execute), Git records only whether a file is executable. It does not store the modification time either. At checkout, every file that Git writes gets the current time as its modification time.

Why isn’t the modification timestamp saved? Many build tools use the modification time as the trigger for rebuilding files: if a source file was changed after the last build result was produced, the result is built again. Because the files that Git writes at checkout carry the current time as their modification time, such tools reliably notice them and rebuild whatever depends on them.
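The timestamp rule that such build tools apply can be sketched in a few lines of Python. This is only an illustration of the idea, not the implementation of any particular build tool; the function name needs_rebuild and the file names are made up for the example.

```python
import os
import tempfile


def needs_rebuild(source_path, target_path):
    """Make-style rule: rebuild if the target is missing or older than the source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "commits.tex")   # hypothetical source file
    target = os.path.join(workdir, "commits.pdf")   # hypothetical build result
    for path in (source, target):
        with open(path, "w") as handle:
            handle.write("placeholder")

    # Simulate a build result that was produced before the source was written,
    # which is the situation right after a checkout rewrites the source file.
    os.utime(target, (1000, 1000))
    os.utime(source, (2000, 2000))
    print(needs_rebuild(source, target))
```

If a checkout restored old timestamps instead, a freshly written source file could look older than an existing build result, and a rule like this one would skip the rebuild.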

The add and commit Commands

A commit can take in all changes, including newly added files and deleted files. The only exceptions are files that match an entry in the .gitignore file (.gitignore is discussed in Chapter 4).

Revisiting the Commit Hash

At first glance, the 40-character commit hash seems rather long. Other version control systems use simple sequential numbers (as in Subversion) or version numbers such as 1.17 (as in CVS).

However, there are good reasons why the developers of Git have opted for the hash.

  • A commit hash can be generated locally. Communication with other computers or a central server is not required. You can create a new commit anytime, anywhere. The commit hash is calculated from the contents of the files and the metadata (author, commit time). The probability that two different changes get the same commit hash is extremely low. After all, there are 2^160 different values available.

  • Even more important is this: the commit hash is more than just a name for a software version. It is also its checksum. With the git fsck command you can check the integrity of the repository. If the content does not match the hash, an error like the following is reported.
> git fsck 
error: sha1 mismatch 2b6c746e5e20a64032bac627f2729f72a9cba4ee 
error: 2b6c746e5e20a64032bac627f2729f72a9cba4ee: object corrupt or missing
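How such a hash arises, and why it doubles as a checksum, can be illustrated with a short Python sketch. It assumes Git's default SHA-1 object format, in which a header (object type, a space, the content length, a NUL byte) is hashed together with the content. The function names are invented for this example, and for brevity the sketch hashes the content of a single file (a "blob"). A commit object is hashed in the same way, with the author, the commit time, the comment and references to the file contents as its text.

```python
import hashlib


def git_object_hash(object_type, content):
    """Hash content the way Git's default SHA-1 object format does."""
    header = f"{object_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(expected_hash, object_type, content):
    """Checksum test: does the content still match the hash it is stored under?"""
    return git_object_hash(object_type, content) == expected_hash


original = b"hello world\n"
stored_hash = git_object_hash("blob", original)
print(stored_hash, len(stored_hash))   # a hash of 40 hexadecimal characters

print(is_intact(stored_hash, "blob", original))          # unchanged content
print(is_intact(stored_hash, "blob", b"hello w0rld\n"))  # one byte altered
```

The first check succeeds and the second fails, which is the kind of mismatch that git fsck reports. Because the hash depends on nothing but the bytes being hashed, it can be computed on any machine without asking a server.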

You can also specify a shortened commit hash. Usually a few leading characters are enough to identify a commit. If you specify too few characters for the hash to be unambiguous, Git displays an error message.

> git checkout 9acc5d5efec1d2d62f7e98bcc3880cda762cb831 
> git checkout 9acc
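The lookup behind a shortened hash amounts to a prefix search that must yield exactly one result. The following Python sketch illustrates this under simplifying assumptions: the function name is hypothetical, the repository is reduced to a plain list of hashes, and the second hash is invented to provoke an ambiguity. It is not how Git implements the lookup internally.

```python
def resolve_abbreviation(prefix, known_hashes):
    """Return the single full hash that starts with prefix, or raise ValueError."""
    matches = [full for full in known_hashes if full.startswith(prefix.lower())]
    if not matches:
        raise ValueError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise ValueError(f"short hash {prefix} is ambiguous")
    return matches[0]


# Hypothetical repository content: the first hash is the one from Figure 3.1,
# the second is made up so that both share the prefix "9a".
hashes = [
    "9acc5d5efec1d2d62f7e98bcc3880cda762cb831",
    "9a01b2c3d4e5f60718293a4b5c6d7e8f90a1b2c3",
]

print(resolve_abbreviation("9acc", hashes))   # unique, resolves to the full hash

try:
    resolve_abbreviation("9a", hashes)        # matches both hashes
except ValueError as error:
    print(error)
```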

It is also possible to give a commit a meaningful name (such as release-1.2.3). Such a name is called a tag.

> git checkout release-1.2.3 

The Commit History

Not only does the repository contain individual commits, it also stores the relationships between them. Every time you change the software and confirm this with a commit, Git records which commit or commits the new one is based on. A graph of commits can be drawn to show the development of the project (see Figure 3.2).

Figure 3.2: Commit history

It gets interesting when several developers work on the same software simultaneously. The commit graph then often branches, as at node C, and the branches are later merged again, as at node G.
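One way to picture this structure is a table in which every commit points to its parent commits. The Python sketch below uses a hypothetical history that branches after C and is merged again in G. The other node names and their arrangement are assumptions made for the example, not a transcription of the figure.

```python
# Each commit maps to the list of its parent commits.
# Hypothetical history: it branches after C and is merged again in G.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["C"],
    "F": ["E"],
    "G": ["D", "F"],   # merge commit: two parents
}


def ancestors(commit, parent_table):
    """Collect every commit reachable from commit by following parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in parent_table[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("G", parents)))   # the merge commit sees both branches
print(sorted(ancestors("D", parents)))   # D sees only its own line of development
```

A commit with two parents, such as G here, is what ties two lines of development back together.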

A Slightly Different Way of Looking at Commits

You can view a commit as a frozen state of the project, but you can also regard it as the set of changes introduced since the previous commit. Such a set of changes is also called a diff or a change set. Seen this way, the repository is also a history of changes.
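The two views are interchangeable because a set of changes can be derived from two stored states. The Python sketch below does this for two hypothetical versions of one file, using the difflib module of the standard library. The file content and the function name change_set are made up, and the sketch illustrates the principle rather than Git's own diff machinery.

```python
import difflib

# Two hypothetical states of the same file, as stored by two successive commits.
previous = ["Chapter 3\n", "What is a commit?\n"]
current = ["Chapter 3\n", "What is a commit?\n", "A commit spans the entire project.\n"]


def change_set(old_lines, new_lines, name):
    """Derive the line-by-line difference between two stored states of a file."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=f"a/{name}", tofile=f"b/{name}"))


diff = change_set(previous, current, "commits.tex")
print("".join(diff))
```

Lines that start with + were added and lines that start with - were removed. Two identical states yield an empty change set.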

Many Different Histories of the Same Project

Initially, the distributed architecture of Git takes some getting used to. In a central version control system (such as CVS or Subversion) there is a central server that holds the history of the project. In Git, however, each developer has his or her own clone of the repository (sometimes more than one). When a developer creates a commit, this is done locally. That developer's repository then has a different history from the repositories of the other developers who have cloned the same project.

Each repository can tell its own story. Commits can be exchanged between repositories using the fetch, pull and push commands. In addition, the merge command can bring different histories back together.
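The exchange of commits can be pictured as copying whatever the other repository has and the local one lacks. The following Python sketch models two clones as tables that map each commit to its parents. The clone names, the commit names and the function fetch are hypothetical, and the real fetch command does considerably more than this.

```python
def fetch(local, remote):
    """Copy every commit the remote has and the local clone lacks; return their names."""
    missing = {commit: parent_list for commit, parent_list in remote.items()
               if commit not in local}
    local.update(missing)
    return sorted(missing)


# Two hypothetical clones that share A and B and have diverged since.
alice = {"A": [], "B": ["A"], "C": ["B"]}
bob = {"A": [], "B": ["A"], "D": ["B"]}

print(fetch(alice, bob))   # the commits that are new to Alice
print(sorted(alice))       # Alice now holds both lines of development
```

After the transfer, Alice's repository contains C and D side by side, both based on B. Joining them requires a merge, that is, a new commit with both as parents.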

In many projects there is a repository (usually on the project server) that contains the official history of the project. Such a repository is called the blessed repository. However, this is just a convention. From a technical perspective, all clones are equal. For example, if the main repository is damaged, another clone can do its job.

A very large project can be distributed across multiple repositories. In this case, there is a main repository, which in turn contains the repositories for the subprojects. Such repositories are called submodules.

The log command has several options that allow you to determine which commits are displayed in what format. Some of the more frequently used options are shown below.

Output Limit: -n

It is often useful to limit the output. The following example shows only the last 3 commits:

> git log -n 3 

Formatting Output: --format, --oneline

The format of the log output can be controlled using --format. For example, --format=fuller provides many details. The --oneline option gives a compact overview with one line per commit.

> git log --oneline
2753f19 TODO indented for illustration.
e0ffbdb Note.
4200ba2 Section on different histories of the same project.
... 

Change statistics: --stat, --shortstat

The statistics options are also useful: --stat shows which files have been changed, --dirstat shows the directories that contain changed files, and --shortstat shows a one-line summary of how many files were changed and how many lines were inserted and deleted.

> git log --shortstat --oneline 
2753f19 TODO indented for illustration.
 1 files changed, 2 insertions(+), 2 deletions(-)
e0ffbdb Note.
 1 files changed, 27 insertions(+), 4 deletions(-)
4200ba2 Section on different histories of the same project.
 1 files changed, 15 insertions(+), 6 deletions(-)
...

Graph Output: --graph

You can use the --graph option to view the relationships between commits.

> git log --graph --oneline 
*   6d7f278 Merge branch 'master' into editorial
|\
| * 419b389 merge: built-in formatting.
| |\
| | * 8f5b053 Quick Start: Formatting installed.
| | * 5f22c8d New Macros for formatting.
| |/
* | ab36269 TODOs
* | c2cae84 intro to the first steps.
|/
*   63788eb merge: Section 'Examples and notation' added.

Summary

  • Repository: The project repository resides in the .git directory. It contains the history of the project in the form of commits. Because Git is distributed, a project often has many repositories with different histories. Git is designed so that it can merge these histories again if necessary.
  • Commit (also called version, revision, or changeset): The commit command creates a commit. A commit stores a defined state of the project. It includes the state of all the files in the project. Each commit contains metadata about the author and the commit date. In particular, Git stores the predecessor/successor relationship. The relations form a version graph of the project. The log command displays the commits from the repository.
  • Commit hash: A commit hash identifies a commit. At the same time it serves as a checksum to verify the integrity of the stored software object. A commit hash is 40 characters long.