Chapter 12. Repository Management

This chapter presents two approaches to managing and publishing repositories for cooperative development. One approach centralizes the repository; the other distributes the repository. Each solution has its place, and which is right for you and your project depends on your requirements and philosophy.

However, no matter which approach you adopt, Git implements a distributed development model. For example, even if your team centralizes the repository, each developer has a complete, private copy of the repository and can work independently. The work is distributed, albeit coordinated through a central, shared repository. The repository model and the development model are orthogonal characteristics.

Repository Structure

The Shared Repository Structure

Some version control systems use a centralized server to maintain a repository. In this model, every developer is a client of the server, which maintains the authoritative version of the repository. Given the server’s jurisdiction, almost every versioning operation must contact the server to obtain or update repository information. Thus, for two developers to share data, all information must pass through the centralized server; no direct sharing of data between developers is possible.

With Git, in contrast, a shared, authoritative, centralized repository is merely a convention. Each developer still has a clone of the depot’s repository, so there’s no need for every request or query to go to a centralized server. For instance, simple log history queries can be made privately and offline by each developer.

One reason that some operations can be performed locally is that a clone retrieves not just the particular version you ask for (the way a checkout works in most centralized version control systems) but the entire history. Hence, you can reconstruct any version of a file from the local repository.
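The two points above, that history queries work offline and that any version of a file can be rebuilt from the local repository, can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the directory names (depot, dev), the file notes.txt, and the commit messages are all hypothetical, and the depot is deleted partway through only to show that nothing contacts it afterward.

```shell
#!/bin/sh
# Sketch: history queries and file reconstruction need only the local clone.
# All paths, file names, and commit messages here are hypothetical.
set -e
work=$(mktemp -d)

# Identity for the throwaway commits, so the script does not rely on global config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Stand-in for the shared repository, holding two versions of one file.
git init -q "$work/depot"
echo "version 1" > "$work/depot/notes.txt"
git -C "$work/depot" add notes.txt
git -C "$work/depot" commit -q -m "First version"
echo "version 2" > "$work/depot/notes.txt"
git -C "$work/depot" commit -q -am "Second version"

# A developer's clone receives the full history, not just the latest snapshot.
git clone -q "$work/depot" "$work/dev"

# Remove the depot to make it explicit that nothing below contacts it.
rm -rf "$work/depot"

# Log query, answered entirely from the local object store.
commit_count=$(git -C "$work/dev" rev-list --count HEAD)
git -C "$work/dev" log --oneline

# Reconstruct the file as it was one commit ago.
old_text=$(git -C "$work/dev" show HEAD~1:notes.txt)
echo "$old_text"
```

Operations that exchange data with another repository, such as fetch and push, are the ones that still need a connection.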

Furthermore, nothing prevents a developer from either establishing an alternate repository and making it available, on a peer-to-peer basis with other developers, or from sharing content in the form of patches and branches.
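Both forms of direct sharing mentioned above can be tried out locally. The sketch below assumes three hypothetical developers (alice, bob, carol) whose repositories live in one temporary directory; in practice the remote would be a URL on another host, and the patch would typically travel by email. The first option fetches from a peer's repository, the second exchanges a patch file.

```shell
#!/bin/sh
# Sketch: sharing a commit between peers without going through the shared depot.
# Repository names, file names, and commit messages are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Common starting point that everyone clones.
git init -q "$work/depot"
git -C "$work/depot" symbolic-ref HEAD refs/heads/main
echo "base" > "$work/depot/file.txt"
git -C "$work/depot" add file.txt
git -C "$work/depot" commit -q -m "Base"
git clone -q "$work/depot" "$work/alice"
git clone -q "$work/depot" "$work/bob"
git clone -q "$work/depot" "$work/carol"

# Alice commits a change that never reaches the depot.
echo "fix" >> "$work/alice/file.txt"
git -C "$work/alice" commit -q -am "Fix from Alice"

# Option 1: Bob adds Alice's repository as a second remote and fetches from it.
git -C "$work/bob" remote add alice "$work/alice"
git -C "$work/bob" fetch -q alice
fetched_subject=$(git -C "$work/bob" log -1 --format=%s alice/main)

# Option 2: Alice exports the commit as a patch file and Carol applies it.
git -C "$work/alice" format-patch -q -1 -o "$work/patches" HEAD
git -C "$work/carol" am -q "$work/patches"/*.patch
carol_subject=$(git -C "$work/carol" log -1 --format=%s)

echo "Bob fetched:   $fetched_subject"
echo "Carol applied: $carol_subject"
```

In neither case does the depot learn about the change; whether it ever should is a matter of team agreement rather than of Git.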

In summary, Git’s notion of a shared, centralized repository model is purely one of social convention and agreement.

Distributed Repository Structure

Large projects often have a highly distributed development model consisting of a central, single, yet logically segmented repository. Although the repository still exists as one physical unit, logical portions are relegated to different people or teams that work largely or wholly independently.

Note

When it’s said that Git supports a distributed repository model, that doesn’t mean that a single repository is broken up into separate pieces and spread around many hosts. Instead, the distributed repository is just a consequence of Git’s distributed development model. Each developer has her own repository that is complete and self-contained. Each developer and her respective repository might be spread out and distributed around the network.

How the repository is partitioned or allocated to different maintainers is largely immaterial to Git. The repositories might be deeply organized or more broadly structured. For example, different development teams might be responsible for certain portions of a code base along submodule, library, or functional lines. Each team might raise a champion to be the maintainer, or steward, of its portion of the code base, and agree as a team to route all changes through this appointed maintainer.

The structure may even evolve or change over time as different people or groups become involved in the project. Furthermore, a team could likely form intermediate repositories that contain combinations of other repositories, with or without further development. There may be specific stable or release repositories, for instance, each with an attendant development team and maintainer.

It may be a good idea to allow the large-scale repository iteration and data-flow to grow naturally and according to peer review and suggestion rather than impose a possibly artificial layout in advance. Git is flexible, so if development in one layout or flow doesn’t seem to work, it is quite easy to change it to a better one.

How the repositories of a large project are organized, or how they coalesce and combine, is again largely immaterial to the workings of Git; Git supports any number of organizational models. Remember that repository structure is not absolute. Moreover, the connection between any two repositories is not prescribed. Git repositories are peers.

So how is a repository structure maintained over time if no technical measures enforce the structure? In effect, the structure is a web of trust for the acceptance of changes. Repository organization and data-flow between repositories are guided by social or political agreements.

The question is, will the maintainer of a target repository allow your changes to be accepted? Conversely, do you have enough trust in the source repository’s data to fetch it into your own repository?
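On the command line, that trust decision usually shows up as a gap between fetching and merging. The sketch below is one possible way a maintainer might look at incoming work before accepting it; the repository names (target, source), the remote name contributor, and the file contents are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: fetch from a source repository, review, and only then accept.
# Repository names, the remote name, and commit messages are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# The maintainer's target repository.
git init -q "$work/target"
git -C "$work/target" symbolic-ref HEAD refs/heads/main
echo "stable" > "$work/target/code.txt"
git -C "$work/target" add code.txt
git -C "$work/target" commit -q -m "Stable baseline"

# A contributor's source repository with one proposed change.
git clone -q "$work/target" "$work/source"
echo "proposed change" >> "$work/source/code.txt"
git -C "$work/source" commit -q -am "Proposed change"

# Fetching copies the objects but leaves the maintainer's own branch untouched.
git -C "$work/target" remote add contributor "$work/source"
git -C "$work/target" fetch -q contributor

# Review what is on offer before accepting any of it.
pending=$(git -C "$work/target" rev-list --count main..contributor/main)
git -C "$work/target" log --oneline main..contributor/main
git -C "$work/target" diff main contributor/main

# Accepting the change is a separate, deliberate step.
git -C "$work/target" merge -q contributor/main
merged_tip=$(git -C "$work/target" log -1 --format=%s main)
echo "Accepted $pending commit(s); main is now at: $merged_tip"
```

Nothing in Git forces the final merge. A maintainer who does not trust the fetched commits can simply leave them unmerged.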

Repository Structure Examples

The Linux kernel project is the canonical example of a highly distributed repository and development process. In each Linux kernel release, there are roughly 800 to 1,100 individual contributors from roughly 100 to 200 different companies. Over the last few kernel releases (2.6.24 to 2.6.26), the corps of developers made roughly 10,000 to 13,500 commits per release. That’s between four and six commits per hour, every development hour, somewhere on the planet.[24]

While Linus Torvalds does maintain an official repository at the top of the heap that most people consider authoritative, there are still many, many derived second-tier repositories in use. For example, many of the Linux distribution vendors take Linus’s official, tagged release, test it, apply bug fixes, tweak it for their distribution, and publish it as their official release. (With any luck, bug fixes are sent back and applied to Linus’s Linux repository so that all may benefit.)

During a kernel development cycle, hundreds of repositories are published and moderated by hundreds of maintainers, and they are used by thousands of developers to gather changes for the release. The main kernel website, http://www.kernel.org/, alone publishes about 500 Linux kernel-related repositories with roughly 150 individual maintainers.

There are certainly thousands, perhaps tens of thousands, of clones of these repositories around the world that form the basis of individual contributor patches or uses.

Short of some fancy snapshot technology and some statistical analysis, there isn’t really a good way to tell how all these repositories interconnect. It is safe to say it is a mesh, or network, that is not strictly hierarchical at all.

Curiously, though, there is a sociological drive to get patches and changes into Linus’s repository, thus effectively treating it as if it were the top of the heap! If Linus himself had to accept each and every patch or change one at a time into his repository, there is simply no way he could keep up. Remember, changes are collectively going into his tree at a rate of about one every 10 to 15 minutes throughout a release’s entire development cycle.

It is only through the maintainers—who moderate, collect, and apply patches on subrepositories—that Linus can keep up at all. It is as if the maintainers create a pyramid-like structure of repositories that funnel patches toward Linus’s conventional master repository.

In fact, below the maintainers but still near the top of the Linux repository structure are many sub-maintainers and individual developers who act in the role of maintainer and developer peer as well. The Linux kernel effort is a large, multilayered mesh of cooperating people and repositories.

The point isn’t that this is a phenomenally large code base that exceeds the grasp of a few individuals or teams. The point is that those many teams are scattered around the world and yet manage to coordinate, develop, and merge a common code base toward a fairly consistent long-term goal, all using Git’s facilities for distributed development.

At the other end of the spectrum, Freedesktop.org development is done entirely using a shared, centralized repository model powered by Git. In this development model, each developer is trusted to push changes straight into a repository, as found on http://cgit.freedesktop.org/.

The X.org project itself has roughly 350 X-related repositories available on http://cgit.freedesktop.org/, with hundreds more for individual users. The majority of the X-related repositories are various submodules from the entire X project, representing a functional breakdown of applications, X servers, different fonts, and so on.

Individual developers are also encouraged to create branches for features that are not ready for a general release. These branches allow the changes (or proposed changes) to be made available for other developers to use, test, and improve. Eventually, when the new-feature branches are ready for general use, they are merged into their respective mainline development branches.
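The feature-branch flow described above might look roughly like this in a shared repository that every developer can push to. It is a sketch under assumptions: the bare repository shared.git, the branch name new-feature, and the two developers are invented for the example, and a real project would add its own review and testing steps before the final merge.

```shell
#!/bin/sh
# Sketch: publishing a feature branch in a shared repository, then merging it.
# Repository names, the branch name, and commit messages are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Build the shared, bare repository from a seed with one mainline commit.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
echo "mainline" > "$work/seed/app.txt"
git -C "$work/seed" add app.txt
git -C "$work/seed" commit -q -m "Mainline baseline"
git clone -q --bare "$work/seed" "$work/shared.git"

# Developer 1 publishes unfinished work on a feature branch.
git clone -q "$work/shared.git" "$work/dev1"
git -C "$work/dev1" checkout -q -b new-feature
echo "feature, first draft" >> "$work/dev1/app.txt"
git -C "$work/dev1" commit -q -am "Start new feature"
git -C "$work/dev1" push -q origin new-feature

# Developer 2 picks the branch up to test and improve it.
git clone -q "$work/shared.git" "$work/dev2"
git -C "$work/dev2" checkout -q -b new-feature origin/new-feature
echo "feature, reviewed" >> "$work/dev2/app.txt"
git -C "$work/dev2" commit -q -am "Improve new feature"
git -C "$work/dev2" push -q origin new-feature

# Once the feature is ready, it is merged into the mainline and pushed.
git -C "$work/dev2" checkout -q main
git -C "$work/dev2" merge -q new-feature
git -C "$work/dev2" push -q origin main
mainline_tip=$(git -C "$work/shared.git" log -1 --format=%s main)
echo "Shared mainline is now at: $mainline_tip"
```

Until the last three commands run, the mainline in shared.git is unaffected by the unfinished feature.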

A development model that allows individual developers to directly push changes into a repository runs some risk, though. Without any formal review process prior to a push, it is possible for bad changes to be quietly introduced into a repository and to go unnoticed for quite some time.

Mind you, there is no real fear of losing data or of being unable to recover a good state again, because the complete repository history is still available. The issue is that it will take time to discover the problem and correct it.
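To make the recovery concrete, here is a small sketch of backing a bad change out once it has been noticed. It uses git revert, which is only one of several possible ways to do this; the repository, the file state.txt, and the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: a bad change is backed out; the earlier good state was never lost.
# Repository name, file name, and commit messages are hypothetical.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$work/repo"
echo "good" > "$work/repo/state.txt"
git -C "$work/repo" add state.txt
git -C "$work/repo" commit -q -m "Known good state"
echo "bad" > "$work/repo/state.txt"
git -C "$work/repo" commit -q -am "Bad change, pushed without review"

# The earlier state is still in the history and can be inspected at any time.
git -C "$work/repo" show HEAD~1:state.txt

# Back the bad change out with a new commit; the history itself is preserved.
git -C "$work/repo" revert --no-edit HEAD >/dev/null
current=$(cat "$work/repo/state.txt")
history_len=$(git -C "$work/repo" rev-list --count HEAD)
echo "state.txt now reads '$current' after $history_len commits"
```

The revert adds a third commit rather than erasing the second, so the record of what went wrong stays available for later review.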

As Keith Packard wrote:

We are slowly teaching people to post patches to the xorg mailing list for review, which happens sometimes. And, sometimes we just back stuff out. Git is robust enough that we never fear losing data, but the state of the top of the tree isn’t always ideal.

It’s worked far better than using CVS in the same way….[25]
