Living with Distributed Development

Changing Public History

Once you have published a repository from which others might make a clone, you should consider it static and refrain from rewriting the history of any branch. Although this is not an absolute guideline, avoiding “rewinds” and alteration of published history simplifies the life of anyone who clones your repository.

Let’s say you publish a repository that has a branch with commits A, B, C, and D. Anyone who clones your repository gets those commits. Suppose Alice clones your repository and heads off to do some development based on your branch.

In the meantime, you decide, for whatever reason, to fix something in commit C. Commits A and B remain the same, but starting with commit C, the branch’s notion of commit history changes. You could slightly alter C or make some totally new commit, X. In either case, republishing the repository leaves commits A and B as they were but now offers, say, X and then Y instead of C and D.

Alice’s work is now greatly affected. Alice cannot easily send you patches, make a pull request, or push her changes to your repository, because her development is based on commit D, which is no longer part of your branch.

Patches won’t apply cleanly because they’re based on commit D. Suppose Alice issues a pull request and you attempt to pull her changes; you may be able to fetch them into your repository (depending on your tracking branches for Alice’s remote repository), but the merges will almost certainly have conflicts. If Alice instead tries to push, the push fails because it is not a fast-forward: your branch now ends at Y, and Y is not an ancestor of her work.
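To make the fast-forward failure concrete, here is a minimal Python sketch on a toy commit graph. The model is an assumption for illustration only: each commit is a string ID mapped to its list of parents, and the helper names is_ancestor and is_fast_forward are hypothetical, not Git APIs. An update of a branch from an old tip to a new tip counts as a fast-forward only when the old tip is already in the new tip’s history, which is exactly what rewriting C and D breaks.

```python
from collections import deque


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` by following
    parent links (a commit counts as its own ancestor)."""
    seen = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """Moving a branch from old_tip to new_tip is a fast-forward only if
    old_tip is already part of new_tip's history."""
    return is_ancestor(parents, old_tip, new_tip)


# Toy model: published history A-B-C-D, later rewritten as A-B-X-Y.
# Alice built commit E on top of the original D.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "X": ["B"],
    "Y": ["X"],
    "E": ["D"],
}

print(is_fast_forward(parents, "D", "Y"))  # False: republishing rewound D away
print(is_fast_forward(parents, "Y", "E"))  # False: Alice's push is non-fast-forward
print(is_fast_forward(parents, "D", "E"))  # True: E only extends the old history
```

Git applies essentially the same ancestry test when it refuses a push that is not a fast-forward, although it works on real commit objects rather than a dictionary.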

In short, the basis for Alice’s development has been altered. You have pulled the commit rug out from underneath her development feet.

The situation is not irrecoverable, though. Git can help Alice, especially if she uses the git rebase --onto command to relocate her changes onto your new branch after fetching the new branch into her repository.
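The rebase idea can be sketched on the same kind of toy graph. This is an assumption-laden model, not Git’s implementation: the branch being moved is linear, “replaying” a commit only records a new parent rather than applying a diff, and replayed commits get hypothetical primed IDs such as E'. The call rebase_onto(parents, "Y", "D", "F") corresponds roughly to Alice running git rebase --onto with your new branch tip as the new base and the old commit D as the upstream, so that only her own commits are moved.

```python
def commits_since(parents, upstream, tip):
    """Walk first parents from tip back to upstream and return the commits
    in oldest-to-newest order, excluding upstream itself."""
    chain = []
    commit = tip
    while commit != upstream:
        chain.append(commit)
        if not parents[commit]:
            raise ValueError(f"{upstream} is not an ancestor of {tip}")
        commit = parents[commit][0]
    chain.reverse()
    return chain


def rebase_onto(parents, new_base, upstream, tip):
    """Replay the commits between upstream and tip on top of new_base and
    return the new tip. Replayed commits get new IDs (original ID plus a
    prime) because a commit's identity includes its parent."""
    base = new_base
    for commit in commits_since(parents, upstream, tip):
        replayed = commit + "'"
        parents[replayed] = [base]
        base = replayed
    return base


def first_parent_history(parents, tip):
    """List commits from tip back to the root along first parents."""
    history = [tip]
    while parents[history[-1]]:
        history.append(parents[history[-1]][0])
    return history


# Your rewritten branch is A-B-X-Y; Alice's work E-F sits on the old D.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "X": ["B"], "Y": ["X"],
    "E": ["D"], "F": ["E"],
}

new_tip = rebase_onto(parents, "Y", "D", "F")
print(" -> ".join(first_parent_history(parents, new_tip)))
# F' -> E' -> Y -> X -> B -> A
```

The replayed commits are new commits. Alice’s original E and F still exist, but her branch now points at F', whose history builds on your republished Y.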

Also, there are times when it is appropriate to have a branch with a dynamic history. For example, within the Git repository itself, there is a so-called proposed updates branch, pu, which is specifically labeled and advertised as being “rewound,” “rebased,” or “rewritten” frequently. You, as a cloner, are welcome to use that branch as the basis for your development, but you must remain conscious of the branch’s purpose and make a special effort to use it effectively.

So why would anyone publish a branch with a dynamic commit history? One common reason is specifically to alert other developers about possible and fast-changing directions some other branch might take. You can also create such a branch for the sole purpose of making available, even temporarily, a published changeset that other developers can use.

Separate Commit and Publish Steps

One of the clear advantages of a distributed version control system is the separation of commit and publish. A commit just saves a state in your private repository; publishing through patches or push/pull makes the change public, which effectively freezes the repository history. Other version control systems, such as CVS or SVN, have no such conceptual separation. To make a commit, you must publish it simultaneously.

By making commit and publish separate steps, a developer is much more likely to make precise, mindful, small, logical steps with patches. Indeed, any number of small changes can be made without affecting any other repository or developer. The commit operation is offline in the sense that it requires no network access to record positive, forward steps within your own repository.
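The separation can be modeled with a toy repository that keeps its commits and a separate “published” marker. This Python sketch is illustrative only; the ToyRepo class and its methods are hypothetical. The unpublished method plays the role of asking which commits are on the local branch but not yet on the published one, similar in spirit to running git log origin/master..master in a real clone.

```python
def reachable(parents, tip):
    """Return the set of commits reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


class ToyRepo:
    """Hypothetical model: local commits plus a separate 'published' ref."""

    def __init__(self):
        self.parents = {"A": []}  # commit ID -> list of parent IDs
        self.head = "A"           # tip of the local branch
        self.published = "A"      # tip that others can currently see

    def commit(self, commit_id):
        # Recording a commit changes only local state: no network access,
        # and no other repository or developer is affected.
        self.parents[commit_id] = [self.head]
        self.head = commit_id

    def publish(self):
        # Publishing is a separate, explicit step (a push or a set of patches).
        self.published = self.head

    def unpublished(self):
        """Commits on the local branch that have not been published yet."""
        return reachable(self.parents, self.head) - reachable(self.parents, self.published)


repo = ToyRepo()
repo.commit("B")
repo.commit("C")
print(sorted(repo.unpublished()))  # ['B', 'C']
repo.publish()
print(sorted(repo.unpublished()))  # []
```

Until publish is called, B and C exist only locally, which is what leaves room to refine them before anyone else depends on them.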

Git also provides mechanisms for refining and improving commits into nice, clean sequences prior to making them public. Once you are ready, the commits can be made public in a separate operation.

No One True History

Development projects within a distributed environment have a few quirks that might not be obvious at first. And while these quirks might initially be confusing and their treatment often differs from that of other, nondistributed version control systems, Git handles them in a clear and logical manner.

As development takes place in parallel among different developers on a project, each has created what he believes to be the correct history of commits. As a result, there is my repository and my commit history, your repository and your commit history, and possibly several others being developed simultaneously or otherwise.

Each developer has a unique notion of history, and each history is correct. There is no one “true” history. You cannot point to one and say, “This is the real history.”

Presumably, the different development histories have formed for a reason, and ultimately the various repositories and different commit histories will be merged into one common repository. After all, the developers are presumably working toward a common goal.

When the various branches from the different repositories are merged, all of the variations are present. The merged result states, effectively, “The merged history is better than any one independently.”

Git expresses this “history ambivalence” toward branch variations when it traverses the commit DAG. So if Git, when trying to linearize the commit sequence, reaches a merge commit, it must select one branch or the other first. What criteria would it use to favor or select one branch over another? The spelling of the author’s last name? Perhaps the timestamp of a commit? That might be useful.

Even if you decide to use timestamps and agree to use UTC and extremely precise values, it doesn’t help. Even that recipe turns out to be completely unreliable! (The clocks on a developer’s computer can be wrong either intentionally or accidentally.)

Fundamentally, Git doesn’t care what came first. The only real, reliable relationship that can be established between commits is the direct parent relationship recorded in the commit objects. At best, timestamps offer a secondary clue, usually accompanied by various heuristics to allow for errors such as unset clocks.
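The following Python sketch illustrates the point on a toy graph in which one developer’s clock is wrong. It is an assumption-based model, not Git’s code: the hypothetical linearize function emits a commit only after all of its children (reading newest to oldest) and uses timestamps solely to break ties among commits that are ready, roughly in the spirit of git log --date-order. A plain sort by timestamp, by contrast, can list a parent before its own child.

```python
import heapq


def linearize(parents, timestamps, tip):
    """List commits reachable from tip, newest first, never showing a commit
    before all of its children. Timestamps only break ties among ready commits."""
    reachable = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(parents[commit])

    # Count, for each commit, the children that have not been emitted yet.
    pending_children = {commit: 0 for commit in reachable}
    for commit in reachable:
        for parent in parents[commit]:
            pending_children[parent] += 1

    # Max-heap (via negated timestamps) of commits whose children are all emitted.
    ready = [(-timestamps[c], c) for c in reachable if pending_children[c] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, commit = heapq.heappop(ready)
        order.append(commit)
        for parent in parents[commit]:
            pending_children[parent] -= 1
            if pending_children[parent] == 0:
                heapq.heappush(ready, (-timestamps[parent], parent))
    return order


# Merge commit M joins two branches off A. Commit C was made on a machine
# whose clock was wrong, so its timestamp is earlier than its parent's.
parents = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}
timestamps = {"A": 100, "B": 200, "C": 50, "M": 300}

print(linearize(parents, timestamps, "M"))
# ['M', 'B', 'C', 'A']: parent A still comes after both of its children
print(sorted(timestamps, key=timestamps.get, reverse=True))
# ['M', 'B', 'A', 'C']: timestamp order alone shows A before its child C
```

The parent links decide the order wherever they apply; the timestamps only matter where the graph leaves a genuine choice, such as which side of the merge to show first.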

In short, neither time nor space operates in well-defined ways, so Git must allow for the effects of quantum physics.

Git as Peer-to-Peer Backup

Linus Torvalds once said, “Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it.” The process of uploading files to the Internet and letting individuals make copies was how the source code for the Linux kernel was “backed up” for years. And it worked!

In some ways, Git is just an extension of the same concept. Nowadays, when you download the source code to the Linux kernel using Git, you’re downloading not just the latest version but the entire history leading up to that version, making Linus’s backups better than ever.

This concept has been leveraged by projects that allow system administrators to manage their /etc configuration directories with Git and even allow users to manage and back up their home directories. Remember, just because you use Git doesn’t mean you are required to share your repositories; it does, however, make it easy to “version control” your repositories right onto your Network-Attached Storage (NAS) box for a backup copy.
