Chapter 1

Basic Concepts

This chapter introduces you to the idea of a distributed version control system and shows you how it differs from a centralized one. In this chapter you will also learn how distributed repositories work and why branching and merging are not a big deal in Git.

Distributed Version Control, How Different?

Before looking into the concepts of distributed version control, let’s take a quick look at the classic architecture of centralized version control.

Figure 1.1: Centralized version control

Figure 1.1 shows the typical layout of a centralized version control system, such as CVS or Subversion. Every developer has a working directory (workspace) with all the project files on his or her computer. After making changes locally, the developer regularly sends them to a central server with a commit. With an update, the developer retrieves changes made by other developers. The central server stores the current and historical versions of the files (the repository). Parallel development branches and named versions (tags) are also managed centrally.

Figure 1.2: Distributed version control

In a distributed version control system (see Figure 1.2) there is no separation between the developer environment and the server environment. Every developer has both a workspace with the files being worked on and their own local repository (called a clone) with all versions, branches and tags. Changes are also recorded here with a commit, but initially only in the local repository. Other developers do not see the new versions immediately. Push and pull commands then transmit changes from one repository to another. Technically, all repositories are equivalent in the distributed architecture. In theory, no server is needed: you could transfer all changes directly from one development computer to another. In practice, server repositories play an important role in Git, for example in the form of the following specialized repositories:

  • Blessed repository: In this repository, “official” releases are created.
  • Shared repository: This repository is used to exchange files between developers in the team. In a small project, the blessed repository can be used for this purpose. In multi-site development, there may be several of these repositories.
  • Workflow repository: A workflow repository is filled only with changes that have achieved a certain status in the workflow, such as after a successful review.
  • Fork repository: This repository is used to decouple from the development main line (for example, for large conversions that do not fit in the normal release cycle) or for experimental developments that may never be included in the main line.

Here are the advantages of a distributed system over a centralized one.

  • High performance: Almost all operations are performed locally without network access.
  • Efficient ways of working: Developers can use local branches to quickly switch between different tasks.
  • Offline capability: Developers can perform commits, create branches, tag versions, etc. without a server connection. They can upload them later.
  • Flexible development processes: In teams and companies specialized repositories can be created in order to communicate with other departments, such as the testers. Changes are easily released with a push into this repository.
  • Backup: Every developer has a copy of the repository with a full history. Thus, the probability of losing data due to server failure is slim.
  • Maintainability: Tricky restructuring can first be tried on a copy of a repository before being transmitted to the original repository.

The Repository, the Basis of Distributed Work

The repository is basically an efficient data store. In a nutshell, it contains:

  • Files (blobs): The content of files, stored independently of the file name.
  • Directories (trees): Directories associate file names with content. Directories can in turn contain other directories.
  • Versions (commits): A version defines a recoverable state of a directory. When creating a new version, the author, the time, a comment and the previous version will be stored.

Figure 1.3: Storage of objects in the repository

For all data a hexadecimal hash is calculated, e.g., 1632acb65b01c6b621d6e1105205773931bb1a41. This hash is used as a reference between the objects and as a key to recover the data later (see Figure 1.3).
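
As a minimal sketch of how such a hash comes about, the following Python function reproduces the object ID that Git computes for the content of a file, assuming the classic SHA-1 object format (the 40-character hash shown above has that length). The function name git_blob_hash is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID of a file's content (a blob).

    Assumption: the classic SHA-1 object format. Git hashes a short
    header ("blob <size in bytes>" plus a NUL byte) followed by the raw
    content, not the content alone.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The file name plays no role: only the content is hashed.
print(git_blob_hash(b""))               # ID of an empty file
print(git_blob_hash(b"Hello, Git!\n"))  # ID of a one-line file
```

For an empty file this yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same ID that Git itself reports for empty files.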

The hash of a commit is its “version number.” If you have a commit hash, you can check if this version is included in a repository and you can restore the associated directory in the workspace. If the version is not available, you can import (pull) that commit along with all the referenced objects from another repository.
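
The interplay of blobs, trees and commits can be sketched with a toy repository in Python. This is only an illustration under simplifying assumptions: objects are serialized as JSON instead of Git's real formats, directories are flat, the commit time is left out, and all function names (put, get, commit_files, checkout) are hypothetical.

```python
import hashlib
import json

store = {}  # the "repository": hash -> serialized object


def put(kind, payload):
    """Store an object under the hash of its serialized form and return the hash."""
    data = json.dumps({"kind": kind, "payload": payload}, sort_keys=True).encode()
    key = hashlib.sha1(data).hexdigest()
    store[key] = data
    return key


def get(key):
    """Load the payload of the object stored under the given hash."""
    return json.loads(store[key])["payload"]


def commit_files(files, author, message, parent=None):
    """Store blobs, one tree and one commit; return the commit hash."""
    tree = {name: put("blob", text) for name, text in files.items()}
    tree_key = put("tree", tree)
    return put("commit", {"tree": tree_key, "author": author,
                          "message": message, "parent": parent})


def checkout(commit_key):
    """Restore the directory (file name -> content) recorded by a commit."""
    tree = get(get(commit_key)["tree"])
    return {name: get(blob_key) for name, blob_key in tree.items()}


first = commit_files({"a.txt": "one"}, "alice", "first version")
second = commit_files({"a.txt": "two", "b.txt": "one"}, "alice",
                      "second version", parent=first)

print(first in store)    # is this version included in the repository?
print(checkout(first))   # the directory as it was in the first version
```

Because every object is found through its hash, the commit hash alone is enough to walk to the tree and from there to the file contents.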

The following are the advantages of using the hash and the given repository structure:

  • High Performance: Access to data via the hash is very fast.
  • Redundancy-free storage: Identical file content needs to be stored only once.
  • Distributed version numbers: Because the hash is calculated from the files, the author and the date, versions can also be generated “offline” without causing conflicts in the future.
  • Efficient synchronization between repositories: When a commit is transferred from one repository to another, only objects that do not yet exist there will be copied. Figuring out whether an object already exists is very fast thanks to the hash.
  • Data integrity: The hash is calculated from the content of the data. You can check with Git any time if a hash matches the data. Unintentional changes or malicious manipulation of data can be detected.
  • Automatic rename detection: Renamed files are automatically detected since the hash of the content does not change. Therefore, no special commands for renaming and moving are necessary.
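
Several of these advantages follow directly from using the hash as the storage key. The sketch below models this with a hypothetical ObjectStore class in Python. It is a simplification, not Git's implementation: it hashes the raw data with SHA-1 and keeps everything in a dictionary.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key of an object is the hash of its data."""

    def __init__(self):
        self.objects = {}  # hash (hex string) -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data  # identical content lands under the same key
        return key

    def verify(self) -> bool:
        """Data integrity: every stored object must still match its key."""
        return all(hashlib.sha1(data).hexdigest() == key
                   for key, data in self.objects.items())

    def fetch_from(self, other: "ObjectStore") -> int:
        """Synchronization: copy only objects that do not exist here yet."""
        missing = other.objects.keys() - self.objects.keys()
        for key in missing:
            self.objects[key] = other.objects[key]
        return len(missing)


alice, bob = ObjectStore(), ObjectStore()
readme = alice.put(b"Hello\n")
renamed = alice.put(b"Hello\n")  # same content under a different file name
alice.put(b"print('hi')\n")
bob.put(b"Hello\n")

copied = bob.fetch_from(alice)   # only the object bob lacks is transferred
print(readme == renamed, len(alice.objects), copied, bob.verify())
```

The two files with identical content share one object, which is also why a renamed file can be recognized: its content hash is unchanged.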

Branching and Merging, Easy!

For the majority of version control systems, branching and merging are exceptional circumstances that are considered advanced topics. Git, on the other hand, was originally created for Linux kernel developers who were scattered all over the world. Merging of many individual results had been one of the biggest challenges, so one of the design objectives of Git was to make branching and merging as easy and safe as possible.

Figure 1.4 shows how developers working in parallel cause branches to be created. Each point represents a version (commit) of the project. In Git you can only version the entire project, and thus each point represents files that belong to the same version.

Both developers start with the same version. After both of them make changes, they commit their changes. As each of the developers has his or her own repository, there are now two different versions of the project: two branches have been created. If one of the developers imports the changes from the other developer, he or she can make Git merge the versions. If the merge is successful, Git will create a merge commit, which includes the changes from both developers. If the other developer pulls this commit, both developers will again have the same version of the project.

Figure 1.4: Branches are created by developers working in parallel
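
The situation in Figure 1.4 can be sketched as a small graph in which every commit points to its parent commits. The commit names and the helper functions history and common_history are hypothetical. The sketch only models how versions are linked, not how Git merges file contents.

```python
# Each commit maps to the list of its parent commits.
parents = {
    "base": [],                      # the version both developers start with
    "alice-1": ["base"],             # first developer's commit
    "bob-1": ["base"],               # second developer's commit
    "merge": ["alice-1", "bob-1"],   # a merge commit has two parents
}


def history(commit, graph):
    """Return the set of all commits reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def common_history(a, b, graph):
    """Return the commits that both histories share."""
    return history(a, graph) & history(b, graph)


# Before the merge, the two branches share only the starting version.
print(common_history("alice-1", "bob-1", parents))
# The merge commit contains the work of both developers.
print(sorted(history("merge", parents)))
```

Once both developers have the merge commit, their histories are identical again, because everything is reachable from that single commit.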

In the previous example, a branch was created without any planning, simply because two developers were working in parallel on the same software. Of course, you can also branch deliberately in Git and create a branch explicitly (see Figure 1.5). Explicit branching is often done to coordinate the parallel development of features.

Figure 1.5: Explicit branches for different tasks

With pulls and pushes you can specify explicitly which branches are transferred. In addition to simple branching and merging, you can also do the following with branches:

  • Transplant a branch: Commits in a branch can be moved to another repository.
  • Transfer certain changes only: Individual commits of a branch can be copied to another branch. This is called cherry-picking.
  • Clean up history: A branch’s history can be transformed, sorted and deleted. This would make the history better documentation for the project. This is called interactive rebasing.

Summary

After reading this chapter, you should be familiar with the basic concepts of Git. Even if you put the book down now (which we hope you won't!), you can take part in a discussion of the fundamentals of distributed version control systems, the necessity and usefulness of hashes, and frequent branching and merging in Git.

You may be asking yourself the following questions, though.

  • How do I use these general concepts to manage my project?
  • How do I coordinate the many repositories?
  • How many branches do I need?
  • How do I integrate my build server?

For the first question, read the next chapter right away. There you will find the specifics of the commands for creating a repository, versioning and exchanging commits between repositories. For the other questions, there are chapters with detailed workflows.

If you are a busy executive still trying to decide whether or not to use Git, then take a look at the discussion of the limits of Git in Chapter 26, “Git's Shortcomings.”
