Chapter 6. Commits

In Git, a commit is used to record changes to a repository.

At face value, a Git commit seems no different from a commit or check-in found in other version control systems. However, under the hood, a Git commit operates in a unique way.

When a commit occurs, Git records a snapshot of the index and places that snapshot in the object store. (Preparing the index for a commit is covered in Chapter 5.) This snapshot does not contain a copy of every file and directory in the index, because such a strategy would require enormous and prohibitive amounts of storage. Instead, Git compares the current state of the index to the previous snapshot and so derives a list of affected files and directories. Git creates new blobs for any file that has changed and new trees for any directory that has changed, and it reuses any blob or tree object that has not changed.

Commit snapshots are chained together, with each new snapshot pointing to its predecessor. Over time, a sequence of changes is represented as a series of commits.
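The chaining can be pictured as a linked list in which each commit records the identifier of its parent. The following sketch assumes a much-reduced commit record (a tree identifier, a parent, and a message); real commits also carry author, committer, and timestamp data. The names `make_commit` and `history` and the placeholder tree identifiers are hypothetical.

```python
import hashlib

def make_commit(store, tree_id, parent_id, message):
    """Record a simplified commit that points at its tree and at its predecessor."""
    body = f"tree {tree_id}\nparent {parent_id}\n\n{message}\n".encode()
    commit_id = hashlib.sha1(body).hexdigest()
    store[commit_id] = {"tree": tree_id, "parent": parent_id, "message": message}
    return commit_id

def history(store, commit_id):
    """Walk from a commit back to the first one by following parent links."""
    while commit_id is not None:
        yield store[commit_id]["message"]
        commit_id = store[commit_id]["parent"]

store = {}
c1 = make_commit(store, "tree-1", None, "first")   # placeholder tree ids
c2 = make_commit(store, "tree-2", c1, "second")
c3 = make_commit(store, "tree-3", c2, "third")

log = list(history(store, c3))
print(log)
```

Starting from the newest commit and following the parent links yields the messages from newest to oldest, which is how a series of snapshots comes to represent a sequence of changes.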

It may seem expensive to compare the entire index to some prior state, yet the whole process is remarkably fast because every Git object has a SHA1 hash. If two objects, even two subtrees, have the same SHA1 hash, the objects are identical. Git can therefore avoid large swaths of recursive comparison: whenever two subtrees have the same hash, it skips them without descending into them.
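The pruning idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not Git's diff machinery: `snapshot` and `changed_paths` are hypothetical names, the tree encoding is simplified, and edge cases such as a file being replaced by a directory are ignored.

```python
import hashlib

def snapshot(node):
    """Return (object id, children); children is None for a file and a dict for a directory."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest(), None
    children = {name: snapshot(node[name]) for name in sorted(node)}
    listing = "".join(f"{oid} {name}\n" for name, (oid, _) in children.items())
    return hashlib.sha1(b"tree\0" + listing.encode()).hexdigest(), children

def changed_paths(old, new, path="", visited=None):
    """Return the paths that differ between two snapshots, pruning equal subtrees."""
    if visited is not None:
        visited.append(path or ".")
    if old is not None and new is not None and old[0] == new[0]:
        return []  # same hash means same content, so there is no need to descend
    old_kids = (old[1] if old else None) or {}
    new_kids = (new[1] if new else None) or {}
    if not old_kids and not new_kids:
        return [path]  # a file (or empty directory) was added, removed, or edited
    changed = []
    for name in sorted(old_kids.keys() | new_kids.keys()):
        child = f"{path}/{name}" if path else name
        changed += changed_paths(old_kids.get(name), new_kids.get(name), child, visited)
    return changed

old_index = {
    "README": b"hi\n",
    "src": {"a.c": b"a\n", "b.c": b"b\n"},
    "docs": {"guide": b"g\n", "api": {"x": b"x\n"}},
}
new_index = {**old_index, "src": {**old_index["src"], "a.c": b"A\n"}}

visited = []
changed = changed_paths(snapshot(old_index), snapshot(new_index), visited=visited)
print(changed)
print(visited)
```

Only `src/a.c` is reported as changed. The `visited` list shows that the comparison looked at `docs` once, found matching hashes, and never examined anything beneath it.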

There is a one-to-one correspondence between a set of changes in the repository and a commit: a commit is the only method of introducing changes to a repository, and any change in the repository must be introduced by a commit. This mandate provides accountability. Under no circumstance should repository data change without a record of the change! Just imagine the chaos if, somehow, content in the master repository changed and there was no record of how it happened, who did it, or why.

While commits are most often introduced explicitly by a developer, Git itself can introduce commits. As you’ll see in Chapter 9, a merge operation causes a commit in the repository in addition to any commits made by users before the merge.

How you decide when to commit is pretty much up to you and your preferences or development style. In general, you should perform a commit at well-defined points in time when your development is at a quiescent stage, such as when a test suite passes, when everyone goes home for the day, or at any number of other natural stopping points.

However, don’t hesitate to introduce commits! Git is well suited to frequent commits and provides a rich set of commands for manipulating them. Later, you’ll see how several commits—each with small, well-defined changes—can also lead to better organization of changes and easier manipulation of patch sets.

Atomic Changesets

Every Git commit represents a single, atomic changeset with respect to the previous state. Regardless of the number of directories, files, lines, or bytes that change with a commit,[12] either all changes apply or none do.

In terms of the underlying object model, atomicity just makes sense: a commit snapshot represents the total set of modified files and directories. It must represent one tree state or the other, and a changeset between two state snapshots represents a complete tree-to-tree transformation. (You can read about derived differences between commits in Chapter 8.)

Consider the workflow of moving a function from one file to another. If you perform the removal with one commit and then follow with a second commit to add it back, there remains a small semantic gap in the history of your repository during which time the function is gone. Making the two commits in the other order is problematic, too, because for a time the function exists in both files. In either case, before the first commit and after the second, your code is semantically consistent, but after the first commit, the code is faulty.

However, with an atomic commit that simultaneously deletes and adds the function, no such semantic gap appears in the history. You can learn how best to construct and organize your commits in Chapter 10.
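As a rough illustration of all-or-nothing behavior, the sketch below models a repository as a list of snapshots and builds each new snapshot on a private copy before publishing it. It is a toy under stated assumptions, not a description of how Git writes objects and updates references; the `commit` function, the `repo` dictionary, and the file names are all hypothetical.

```python
def commit(repo, changes, message):
    """Apply every change to a copy of the current snapshot, then publish it in one step.

    `changes` maps a path to its new content, or to None to delete that path.
    """
    snapshot = dict(repo["snapshots"][-1][1]) if repo["snapshots"] else {}
    for path, content in changes.items():
        if content is None:
            del snapshot[path]  # a KeyError here aborts before anything is published
        else:
            snapshot[path] = content
    repo["snapshots"].append((message, snapshot))  # the only change made to the repository

repo = {"snapshots": []}
commit(repo, {"util.py": "def helper(): ...\n", "main.py": "import util\n"}, "initial")

# Move helper() from util.py to lib.py in a single commit: the removal and
# the addition land together, so no snapshot lacks the function.
commit(
    repo,
    {"util.py": "", "lib.py": "def helper(): ...\n", "main.py": "import lib\n"},
    "move helper to lib.py",
)

# A commit that fails partway leaves the repository exactly as it was.
try:
    commit(repo, {"lib.py": "broken\n", "missing.py": None}, "partly invalid")
except KeyError:
    pass

print(len(repo["snapshots"]))
```

Every snapshot in this model contains `helper` in exactly one file, and the failed third commit leaves only the two earlier snapshots in the history.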

Git doesn’t care why files are changing. That is, the content of the changes doesn’t matter. As the developer, you might move a function from here to there and expect it to be handled as one unitary move. But you could, alternatively, commit the removal and then later commit the addition. Git doesn’t care. Atomicity has nothing to do with the semantics of what is in the files.

But this does bring up one of the key reasons why Git implements atomicity: it allows you to structure your commits more appropriately by following some best practice advice.

Ultimately you can rest assured that Git has not left your repository in some transitory state between one commit snapshot and the next.



[12] Git also records a mode flag indicating the executability of each file. Changes in this flag are also part of a changeset.
