Knowing Your Place

When participating in a distributed development project, it is important to know how you, your repository, and your development efforts fit into the larger picture. Besides the obvious potential for development efforts in different directions and the requirement for basic coordination, the mechanics of how you use Git and its features can greatly affect how smoothly your efforts align with other developers working on the project.

These issues can be especially problematic in a large-scale distributed development effort, as is often found in open source projects. By identifying your role in the overall effort and understanding who the consumers and producers of changes are, many of the issues can be easily managed.

Upstream and Downstream Flows

There isn’t a strict relationship between two repositories that have been cloned one from the other. However, it’s common to refer to the parent repository as being upstream from the new, cloned repository. Reflexively, the new, cloned repository is often described as being downstream from the original parent repository.

Furthermore, the upstream relationship extends up from the parent repository to any repository from which it might have been cloned. It also extends down past your repository to any that might be cloned from yours.

However, it is important to recognize that this notion of upstream and downstream is not directly related to the clone operation. Git supports a fully arbitrary network between repositories. New remote connections can be added, and your original clone remote can be removed, to create arbitrary new relationships between repositories.

If there is any established hierarchy, it is purely one of convention. Bob agrees to send his changes to you; in turn, you agree to send your changes on to someone further upstream; and so forth.

The important aspect of the repository relationship is how data is exchanged between them. That is, any repository to which you send changes is usually considered upstream of you. Similarly, any repository that relies upon you for its basis is usually considered downstream of yours.

It’s purely subjective but conventional. Git itself doesn’t care and doesn’t track the “stream” notion in any way. Upstream and downstream simply help us visualize where patches are going.

Of course, it’s possible for repositories to be true peers. If two developers exchange patches or push and fetch from each other’s repositories, neither is really upstream or downstream.

The Maintainer and Developer Roles

Two common roles are the maintainer and the developer. The maintainer serves primarily as an integrator or moderator, and the developer primarily generates changes. The maintainer gathers and coordinates the changes from multiple developers and ensures that all are acceptable with respect to some standard. In turn, the maintainer makes the whole set of updates available again. That is, the maintainer is also the publisher.

The maintainer’s goal should be to collect, moderate, accept or reject changes, and then ultimately publish branches that project developers can use. To ensure a smooth development model, maintainers should not alter a branch once it has been published. In turn, a maintainer expects to receive changes from developers that are relevant and that apply to published branches.

A developer’s goal, beyond improving the project, is to get her changes accepted by the maintainer. After all, changes kept in a private repository do no one else any good. The changes need to be accepted by the maintainer and made available for others to use and exploit. Developers need to base their work on the published branches in the repositories that the maintainer offers.

In the context of a derived clone repository, the maintainer is usually considered to be upstream from developers.

Since Git is fully symmetric, there is nothing to prevent a developer from considering herself a maintainer for other developers further downstream. But she must now understand that she is in the middle of both an upstream and a downstream dataflow and must adhere to the maintainer and developer contract (see the next section) in this dual role.

Because the dual or mixed-mode role is possible, upstream and downstream is not strictly correlated to being a producer or consumer. You can produce changes with the intent of them going either upstream or downstream.

Maintainer-Developer Interaction

The relationship between a maintainer and a developer is often loose and ill-defined, but there is an implied contract between them. The maintainer publishes branches for the developer to use as her basis. Once published, though, the maintainer has an unspoken obligation not to change the published branches since this would disturb the basis upon which development takes place.

In the opposite direction, the developer (by using the published branches as her basis) ensures that when her changes are sent to the maintainer for integration, they cleanly apply without problems, issues, or conflicts.

It may seem as this makes for an exclusive, lock-step process. Once published, the maintainer can’t do anything until the developer sends in changes. And then, after the maintainer applies updates from one developer, the branch will necessarily have changed and thus will have violated the won’t change the branch contract for some other developers. If this were true, then truly distributed, parallel, and independent work could never really take place.

Thankfully, it is not that grim at all! Instead, Git is able to look back through the commit history on the affected branches, determine the merge basis that was used as the starting point for a developer’s changes, and apply them even though other changes from other developers may have been incorporated by the maintainer in the meantime.

With multiple developers making independent changes and with all of them being brought together and merged into a common repository, conflicts are still possible. It is up to the maintainer to decide and resolve such problems. The maintainer can either resolve these conflicts directly or reject changes from a developer if they would create conflicts.

Role Duality

There are two basic mechanisms for transferring commits between an upstream and a downstream repository.

The first uses git push or git pull to directly transfer commits, while the second uses git format-patch and git am to send and receive representations of commits. The method that you use is primarily dictated by agreement within your development team and, to some extent, direct access rights as discussed in Chapter 11.

Using git format-patch and git am to apply patches achieves the exact same blob and tree object content as if the changes had been delivered via a git push or incorporated with a git pull. However, the actual commit object will be different because the metadata information for the commit will be different between a push or pull and a corresponding application of a patch.

In other words, using push or pull to propagate a change from one repository to another copies that commit exactly, whereas patching copies only the file and directory data exactly. Furthermore, push and pull can propagate merge commits between repositories. Merge commits cannot be sent as patches.

Because Git compares and operates on the tree and blob objects, Git is able to understand that two different commits for the same underlying change in two different repositories, or even on different branches within the same repository, really represent the same change. Thus, it is no problem for two different developers to apply the same patch sent via email to two different repositories. As long as the resulting content is the same, Git treats the repositories as having the same content.

Let’s see how these roles and dataflows combine to form a duality between upstream and downstream producers and consumers:

Upstream Consumer

An upstream consumer is a developer upstream from you who accepts your changes either as patch sets or as pull requests. Your patches should be rebased to the consumer’s current branch HEAD. Your pull requests should either be directly mergeable or already merged by you in your repository. Merging prior to the pull ensures that conflicts are resolved correctly by you, relieving the upstream consumer of that burden. This upstream consumer role could be a maintainer who turns around and publishes what he has just consumed.

Downstream Consumer

A downstream consumer is a developer downstream from you who relies on your repository as the basis for work. A downstream consumer wants solid, published topic branches. You shouldn’t rebase, modify, or rewrite the history of any published branch.

Upstream Producer/Publisher

An upstream publisher is a person upstream from you who publishes repositories that are the basis for your work. This is likely to be a maintainer with the tacit expectation that he will accept your changes. The upstream publisher’s role is to collect changes and publish branches. Again, those published branches should not have their histories altered, given that they are the basis for further downstream development. A maintainer in this role expects developer patches to apply and expects pull requests to merge cleanly.

Downstream Producer/Publisher

A downstream producer is a developer downstream from you who has published changes either as a patch set or as a pull request. The goal of a downstream producer is to have changes accepted into your repository. A downstream producer consumes topic branches from you and wants those branches to remain stable, with no history rewrites or rebases. Downstream producers should regularly fetch updates from upstream and should also regularly merge or rebase development topic branches to ensure they apply to the local upstream branch HEADs. A downstream producer can rebase her own local topic branches at any time, because it doesn’t matter to an upstream consumer that it took several iterations for this developer to make a good patch set that has a clean, uncomplicated history.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.73.147