Chapter 26

Git’s Shortcomings

In the previous chapters we discussed the advantages of Git and how efficient it is to work with a distributed version control system. This chapter deals with the problem areas of Git.

High Complexity

Working with a centralized version control system is now standard knowledge for every developer. However, this knowledge is often limited to the basic functions, such as fetching new versions and uploading changes. Branching and repository administration are often carried out by build managers with specialist knowledge.

In Git, however, branching is a fundamental concept that must be understood with every commit, pull and push. Also, every developer is the administrator of his or her own repository. Every member of the team must also be able to deal with remotes and to exchange commits between repositories.

In addition, compared to the centralized version control system, there is an extra push step after a commit in the normal flow in Git. While a commit is sufficient in a centralized version control system to make the changes visible, in Git the commit must still be transmitted with the push command to the central repository.
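
The extra step can be made visible with a small experiment. The following is only a sketch: the directory names, the file name and the demo identity are made up, and a bare repository in a temporary directory stands in for the central repository.

```shell
#!/bin/sh
# Sketch: a commit stays local until it is pushed.
# All names below (central.git, dev, file.txt, Demo) are assumptions for this demo.
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/central.git"         # stands in for the central repository
git clone -q "$work/central.git" "$work/dev"   # the developer's clone
cd "$work/dev"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"                # step 1: recorded locally only

# Count the branches and tags the central repository knows about.
before=$(git --git-dir="$work/central.git" for-each-ref | wc -l | tr -d ' ')
git push -q origin HEAD                        # step 2: transmit the commit
after=$(git --git-dir="$work/central.git" for-each-ref | wc -l | tr -d ' ')

echo "refs in central repository before push: $before"
echo "refs in central repository after push:  $after"
```

Until the push has run, the central repository contains no branch at all, so other team members cannot see the commit.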

These difficulties stem from the inherent complexity of distributed version control and are also found in other distributed tools, such as Mercurial. In all probability, these concepts will soon be standard knowledge for software developers.

In addition, Git brings with it a few quirks of its own. Git originated in Linux kernel development, where people are used to working a lot with the command line. Git is powerful, and there is a plethora of commands and parameters. If you look at the help pages of Git commands, you will feel almost overwhelmed by all the possibilities. The verbosity of the help pages is good for understanding all the details, but it helps little when it comes to distinguishing between the important and the unimportant.

Finally, a command name often highlights the technical aspect and not the application aspect. For example, the following command is used to discard local changes in Git:

> git checkout -- FILE

Got it?

Some Git command names also have a different meaning in other known version control systems. For example, in Subversion the command to discard local changes is as follows:

> svn revert FILE

A revert command is also available in Git, but it serves a different purpose: it undoes the changes of a commit that already exists by creating a new commit.

The point is that Git is highly complex and therefore has a steep learning curve. It is important that developers prepare well for the introduction of Git, and that clear procedures are defined for standard workflows.
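
The difference between the two commands can be tried out in a scratch repository. This is a minimal sketch; the repository location, the file contents and the demo identity are assumptions.

```shell
#!/bin/sh
# Sketch: "git checkout -- FILE" discards an uncommitted edit,
# while "git revert" undoes an existing commit with a new commit.
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"

echo "version 1" > FILE
git add FILE
git commit -q -m "Add FILE"
echo "version 2" > FILE
git commit -q -am "Change FILE"

# Discard a local, uncommitted change (what "svn revert FILE" does in Subversion).
echo "scratch edit" > FILE
git checkout -- FILE        # Git 2.23 and later also offer: git restore FILE
discarded=$(cat FILE)       # content of the last commit again

# Undo the last commit: the history grows by one inverse commit.
git revert --no-edit HEAD > /dev/null
reverted=$(cat FILE)
count=$(git rev-list --count HEAD)

echo "after checkout: $discarded"
echo "after revert:   $reverted ($count commits in history)"
```

Note that the revert does not delete the original commit. Both the change and its reversal remain in the history.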

For your effort, however, you will be rewarded with a very powerful tool that does not restrict you in your own way of working.

Complicated Submodules

The submodule concept was described in Chapter 11, “Dependencies between Repositories.” Submodules are separate repositories that are linked from another repository (the main repository).

Cloning a repository with submodules is complicated and requires additional steps (the submodule init and submodule update commands). It is clearly noticeable that the submodule concept was retrofitted.
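
The additional steps look roughly as follows. This is a sketch under assumptions: the repository names lib, main and copy are made up, and the option -c protocol.file.allow=always is only needed because the demo links a submodule from the local file system, which recent Git versions refuse by default.

```shell
#!/bin/sh
# Sketch: a plain clone leaves the submodule directory empty
# until "submodule init" and "submodule update" have run.
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A small library repository that will be linked as a submodule.
git init -q "$work/lib"
cd "$work/lib"
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "Add lib.txt"

# The main repository links the library under the path "lib".
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "Add lib as submodule"

# Another developer clones the main repository.
git clone -q "$work/main" "$work/copy"
cd "$work/copy"
empty_before=$(ls -A lib | wc -l | tr -d ' ')   # 0: directory exists but is empty

git submodule --quiet init                      # register the submodule URL locally
git -c protocol.file.allow=always submodule --quiet update   # fetch and check out
content=$(cat lib/lib.txt)

echo "entries in lib before init/update: $empty_before"
echo "lib/lib.txt after init/update:     $content"
```

Git also offers git clone --recurse-submodules, which combines these steps into a single command.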

With Git you can always restore a reproducible version of your project that includes submodules. Unfortunately, this also makes the work complicated. Changes to a submodule must first be completed with a separate commit. After that, the new commit of the submodule must be selected and subsequently persisted in the main repository with a second commit.
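
The two commits can be sketched as follows. The repository names and contents are again assumptions for illustration, and -c protocol.file.allow=always is only there because the demo uses a submodule from the local file system.

```shell
#!/bin/sh
# Sketch: a change in a submodule needs one commit in the submodule
# and a second commit in the main repository.
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/lib"
cd "$work/lib"
echo "lib v1" > lib.txt
git add lib.txt
git commit -q -m "Add lib.txt"

git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "Add lib as submodule"

# First commit: made inside the submodule.
cd "$work/main/lib"
echo "lib v2" > lib.txt
git commit -q -am "Update lib.txt"

# The main repository only notices that "lib" points to a different commit.
cd "$work/main"
status=$(git status --porcelain lib)

# Second commit: select the new submodule commit and persist it.
git add lib
git commit -q -m "Use new lib commit"
recorded=$(git ls-tree HEAD lib | awk '{print $3}')   # commit ID stored in main
actual=$(git -C lib rev-parse HEAD)                   # commit checked out in lib

echo "status before second commit: $status"
echo "recorded: $recorded"
echo "actual:   $actual"
```

In a real project the submodule commit would also have to be pushed to the submodule's own repository, otherwise other developers cannot fetch the commit that the main repository refers to.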

In many development projects, you always want the current versions of submodules integrated in the main project during the development phase. Git submodules do not support this approach. You must always select a commit explicitly.

Because submodules are stand-alone repositories, it is not possible to move files together with their history between them.

For all these reasons, people often avoid submodules in Git. If technical modules serve only as a structuring unit within a project, then you work best with one large repository in which all modules are included. That way you always have the latest versions of all modules, and files can be moved together with their history. However, separate release cycles, branches and tags for individual modules are not possible with this solution.

Alternatively, if the modules are not tightly coupled and require their own release cycles, then you can use an external component repository that supports dependency management (e.g. Maven or Ivy in Java) and use Git to version only the definitions of the dependencies of the modules (in Maven with the pom.xml file).

Resource Consumption for Large Binary Files

Git has very efficient storage management. The content of a file is stored only once, even if there are multiple copies of the file. This also works across commit boundaries. That is, as long as the content of a file does not change, there is only one object for it across all Git commits.
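
You can observe this by comparing object IDs. The following sketch uses made-up file names. It asks Git for the ID of the object that holds each file's content.

```shell
#!/bin/sh
# Sketch: identical content is stored as one object,
# across files and across commits.
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"

printf 'same content\n' > a.txt
cp a.txt b.txt                       # a second copy of the same content
git add a.txt b.txt
git commit -q -m "Two files, one content"
blob_a=$(git rev-parse HEAD:a.txt)   # object ID of the content of a.txt
blob_b=$(git rev-parse HEAD:b.txt)   # object ID of the content of b.txt

# A later commit that leaves a.txt untouched refers to the same object.
echo "other" > c.txt
git add c.txt
git commit -q -m "Add c.txt"
blob_a_later=$(git rev-parse HEAD:a.txt)

echo "a.txt in first commit:  $blob_a"
echo "b.txt in first commit:  $blob_b"
echo "a.txt in second commit: $blob_a_later"
```

All three IDs are identical, because the object ID is derived from the file content alone.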

In addition, Git objects are combined into pack files, which Git compresses. All this leads to very resource-efficient storage of files.

However, all versions of a file are kept in the local repository. As soon as you store large binary files in Git, such as movies, photos or virtual machine images, resource consumption grows noticeably. When a new version of a large binary file is created, both the old and the new version are in the local repository.

In this case, centralized version control systems have the advantage that only the latest version of a binary file is available locally at the developer’s machine. Older versions are only on the server.

As a consequence, you should try to minimize the number of large binary files in the actual Git development repository. “Small” binary files, such as Java libraries, are no problem for today’s hard drives and network bandwidth.

If a repository becomes very large, you can delete the old versions of files using the workflow “Outsourcing Long Histories.”

Repositories Can Only Be Dealt with in Their Entirety

In a commit, Git always versions the entire project directory. By contrast, most centralized version control systems manage files individually. Therefore, they also support partial checkouts, i.e., you can get individual subdirectories of a version separately.

In Git partial checkouts are not supported, since all the files are already available locally. The need for partial checkouts often indicates a lack of modularization in the project, i.e., you should create multiple repositories.

In centralized version control systems, partial checkouts are often used to offset the slowness of the system, a problem Git does not have.

If you really want to look at individual files only, then you can set up a GitWeb server (see the instaweb command). This allows direct access to certain files and versions.

Alternatively, you can use the archive command to export only parts of the repository.
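
A minimal sketch of such an export is shown below. The directory names docs and src and the file names are assumptions. The archive command is restricted to the docs directory of the latest commit.

```shell
#!/bin/sh
# Sketch: export only one subdirectory of a commit with "git archive".
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
mkdir docs src
echo "manual" > docs/readme.txt
echo "code" > src/main.txt
git add docs src
git commit -q -m "Add docs and src"

# Write the docs directory of HEAD into a tar archive.
git archive --format=tar HEAD docs > "$work/docs.tar"

# List the archive: it contains docs/ but nothing from src/.
listing=$(tar -tf "$work/docs.tar")
echo "$listing"
```

The archive contains only the files, not the history. To export an older state, pass another commit, branch or tag in place of HEAD.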

Authorization Only on the Entire Repository

In the previous section we mentioned that a Git repository can only be dealt with in its entirety. This also applies to authorization.

With Git it is not possible to set up permissions for individual folders of a project. Either a user has full access to a repository, or he or she cannot access it. It is only possible to distinguish between read and write access, again only for the entire repository.

In open source projects the problem of different access permissions is often solved through the “Network of Trust” concept.

In the “Network of Trust,” nobody is given push access to the repository. Instead, a pure pull workflow is used: developers create local commits and send pull requests to integrators.

The integrators only accept pull requests from well-known and trusted people. For all other pull requests, the changes must first be verified by a trusted person. Git supports the distinction between author and committer and the concept of “signed commits.” With a signed commit, a trusted committer confirms that he or she has verified the changes. To this end, a sign-off line is appended to the commit message:

Signed-off-by: Rene Preissel <[email protected]>

With this, a new commit is created with the verified changes. The signing developer is recorded as the person who carried out the inspection.
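
On the command line, a sign-off line can be added with the -s option of the commit command. The following sketch uses two invented people, Alice Author and Ira Integrator, to show the distinction between author and committer. Note that this sign-off is a plain text line in the commit message and not a cryptographic signature.

```shell
#!/bin/sh
# Sketch: an integrator commits a verified change of another author
# and adds a Signed-off-by line with "git commit -s".
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
# Identity of the person who creates the commit (hypothetical integrator).
export GIT_COMMITTER_NAME="Ira Integrator" GIT_COMMITTER_EMAIL=ira@example.com

git init -q "$work/repo"
cd "$work/repo"
echo "verified change" > change.txt
git add change.txt

# --author names the original developer; -s appends the committer's sign-off.
git commit -q -s -m "Verified change" --author="Alice Author <alice@example.com>"

author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
message=$(git log -1 --format=%B)

echo "author:    $author"
echo "committer: $committer"
echo "$message"
```

The commit keeps the original developer as author, while the integrator appears both as committer and in the Signed-off-by line.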

Thus, in the “Network of Trust” the rigid assignment of rights for directories is replaced by a review process. In large open source projects (such as the Linux kernel), there are several levels of integrators. Only after several steps will a change end up in the official repository. The top-level integrators do not have to check every commit, since the commits have already been signed by trusted developers.

A modification of the “Network of Trust” workflow is supported by the tool Gerrit. All code changes need to go through a review process before the changes are accepted in the official branch.

For in-house projects, often neither fine-grained permissions for directories nor complex formal review processes are necessary. All team members are allowed to see and modify all the files. At most, the release of a project or its transition to a predefined test level should be restricted. This can easily be achieved by creating separate repositories with limited write access. When a transition is due, authorized users transfer the commits into the other repository.

Moderate Graphical Tools for History Analysis

When merge conflicts occur in a project, or problems show up after a faulty merge, the commit history can be used to find the causes. The question is often why a particular change was incorporated. With active development, and thus many commits and merges, answering it is not trivial.

Git offers very powerful command line tools (the log, blame and annotate commands) for analyzing the commit history. However, the graphical tool gitk that ships with Git and the plug-ins for development environments (such as EGit) are not great. It is tedious to trace the paths. In this respect, commercial version control tools offer clearer display options.
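
As a small illustration of the command line tools, the following sketch builds a two-commit history and then asks which commit last touched a line and which commit introduced a given word. The file name, its contents and the search word are assumptions.

```shell
#!/bin/sh
# Sketch: analyzing history with "git log" and "git blame".
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
printf 'line one\nline two\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
printf 'line one\nline two changed\n' > notes.txt
git commit -q -am "Change second line"

# Text rendering of the commit graph, one line per commit.
git log --graph --oneline --all

# Which commit last changed line 2? (-l prints the full commit ID)
blame_commit=$(git blame -l -L 2,2 notes.txt | cut -d' ' -f1)
head_commit=$(git rev-parse HEAD)

# Which commit introduced the word "changed"? (-S searches the changes)
why=$(git log -1 --format=%s -S"changed" -- notes.txt)

echo "line 2 last changed in: $blame_commit"
echo "commit message:         $why"
```

In a history with many branches and merges, the --graph output quickly becomes wide, which is where a good graphical display would help.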
