Chapter 1
Basic Concepts
This chapter introduces you to the idea of a distributed version control system and shows you how it differs from a centralized one. In this chapter you will also learn how distributed repositories work and why branching and merging are not a big deal in Git.
Distributed Version Control, How Different?
Before looking into the concepts of distributed version control, let’s take a quick look at the classic architecture of centralized version control.
Figure 1.1: Centralized version control
Figure 1.1 shows the typical layout of a centralized version control system, such as CVS or Subversion. Every developer has a working directory (workspace) with all the project files on his or her computer. After making changes locally, the developer regularly sends them to the central server by committing. With an update, the developer retrieves changes made by other developers. The central server stores the current and historical versions of the files (the repository). Parallel development branches and named versions (tags) are also managed centrally.
Figure 1.2: Distributed version control
In a distributed version control system (see Figure 1.2), there is no separation between the developer environment and the server environment. Every developer has both a workspace with the files being worked on and a local repository of his or her own (called a clone) containing all versions, branches, and tags. Changes are still recorded with a commit, but initially only in the local repository. Other developers do not see the new versions right away. Push and pull commands then transfer changes from one repository to another. Technically, all repositories in the distributed architecture are equivalent. In theory, no server is needed: you could transfer all changes directly from one development computer to another. In practice, however, server repositories play an important role in Git, for example in the form of the following specific repositories:
Here are the advantages of a distributed system over a centralized one.
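To make the clone, commit, and push/pull cycle described above more concrete, here is a minimal Python sketch. It is a toy model, not Git's implementation: the `Repo` class, its methods, and the way commit ids are derived are hypothetical, and `pull` assumes the receiving side can simply fast-forward to the other repository's latest commit.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Toy model (hypothetical, not Git's storage format): commit id -> (message, parent id).
Commit = Tuple[str, Optional[str]]


@dataclass
class Repo:
    """Stand-in for a clone: every copy holds the complete history."""

    commits: Dict[str, Commit] = field(default_factory=dict)
    head: Optional[str] = None

    def commit(self, message: str) -> str:
        # The new commit is recorded only in this local repository.
        commit_id = hashlib.sha1(f"{self.head}:{message}".encode()).hexdigest()
        self.commits[commit_id] = (message, self.head)
        self.head = commit_id
        return commit_id

    def clone(self) -> "Repo":
        # A clone copies all versions, not just the latest snapshot.
        return Repo(dict(self.commits), self.head)

    def pull(self, other: "Repo") -> int:
        """Copy every commit missing here from other; return how many were new."""
        missing = {cid: c for cid, c in other.commits.items() if cid not in self.commits}
        self.commits.update(missing)
        self.head = other.head  # simplification: assumes a fast-forward
        return len(missing)

    def push(self, other: "Repo") -> int:
        # Pushing transfers in the opposite direction: other pulls from us.
        return other.pull(self)


alice = Repo()
alice.commit("initial version")
bob = alice.clone()
change = alice.commit("Alice's change")
print(change in bob.commits)  # False: so far only in Alice's repository
bob.pull(alice)
print(change in bob.commits)  # True
```

Because every `Repo` has the same structure, nothing distinguishes a "server" copy from a developer's copy here; a server repository would just be another instance that everyone pushes to and pulls from.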
The Repository, the Basis of Distributed Work
The repository is basically an efficient data store. In a nutshell, it contains:
Figure 1.3: Storage of objects in the repository
A hexadecimal hash is calculated for all data, e.g., 1632acb65b01c6b621d6e1105205773931bb1a41. This hash serves as a reference between objects and as a key for retrieving the data later (see Figure 1.3).
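The following Python sketch shows how such hashes can be computed. It follows Git's object format, in which an object id is the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the content. The function names are hypothetical, and `hash_flat_tree` assumes a directory that holds only regular, non-executable files, so subdirectories and other file modes are left out.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Object id in Git's format: SHA-1 of "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def hash_blob(content: bytes) -> str:
    # File contents are stored as "blob" objects.
    return hash_object("blob", content)


def hash_flat_tree(files: dict[str, bytes]) -> str:
    """Hash a directory of regular files only (no subdirectories).

    Each entry is "<mode> <name>\\0" followed by the 20 raw bytes of the
    file's blob id, so the tree refers to its files by their hashes.
    """
    body = b""
    # Git orders entries by name; for plain files, Python's string order matches.
    for name in sorted(files):
        blob_id = hash_blob(files[name])
        body += f"100644 {name}".encode() + b"\0" + bytes.fromhex(blob_id)
    return hash_object("tree", body)


print(hash_blob(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(hash_flat_tree({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
```

Since the tree embeds the hashes of its blobs, changing a single byte in any file changes the tree's hash as well, which is how the hash can serve both as a key and as a reference between objects.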
The hash of a commit is its “version number.” If you have a commit hash, you can check whether this version is included in a repository, and you can restore the associated directory in the workspace. If the version is not available, you can import (pull) that commit along with all the objects it references from another repository.
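Because every object refers to others by hash, importing a commit amounts to walking the object graph from that commit and copying whatever the target repository lacks. The sketch below assumes a hypothetical in-memory store that maps short placeholder ids (standing in for real hashes) to an object type and the ids it references. Real Git negotiates with the other repository and transfers a compressed pack instead; this version simply walks the whole graph.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical object store: object id -> (object type, ids it references).
# A commit refers to its tree and parent commits, a tree to its blobs and subtrees.
Store = Dict[str, Tuple[str, List[str]]]


def reachable(store: Store, start: str) -> Set[str]:
    """Collect every object id reachable from start (iterative depth-first walk)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        _obj_type, refs = store[oid]
        stack.extend(refs)
    return seen


def fetch(source: Store, target: Store, commit_id: str) -> int:
    """Copy a commit and all objects it references into target.

    Objects already present in target are skipped; returns the number copied.
    """
    copied = 0
    for oid in reachable(source, commit_id):
        if oid not in target:
            target[oid] = source[oid]
            copied += 1
    return copied


source: Store = {
    "blob1": ("blob", []),
    "tree1": ("tree", ["blob1"]),
    "commit1": ("commit", ["tree1"]),
    "blob2": ("blob", []),
    "tree2": ("tree", ["blob1", "blob2"]),
    "commit2": ("commit", ["tree2", "commit1"]),
}
target: Store = {k: source[k] for k in ("blob1", "tree1", "commit1")}
print("commit2" in target)  # False: this version is not available yet
print(fetch(source, target, "commit2"))  # 3 (commit2, tree2 and blob2)
print("commit2" in target)  # True
```

Note that `blob1` is shared by both trees and is stored only once; identical content always has the same hash, so it never needs to be copied twice.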
The following are the advantages of using the hash and the given repository structure:
Branching and Merging, Easy!
For most version control systems, branching and merging are exceptional cases that are considered advanced topics. Git, on the other hand, was originally created for Linux kernel developers scattered all over the world. Merging their many individual contributions had been one of the biggest challenges, so one of Git's design objectives was to make branching and merging as easy and safe as possible.
Figure 1.4 shows how developers working in parallel cause branches to be created. Each point represents a version (commit) of the project. In Git you can only version the entire project, and thus each point represents files that belong to the same version.
Both developers start with the same version. Each of them then makes changes and commits them. Because each developer has his or her own repository, there are now two different versions of the project: two branches have been created. If one of the developers imports the changes from the other, he or she can have Git merge the two versions. If the merge is successful, Git creates a merge commit that includes the changes from both developers. Once the other developer imports this commit as well, both developers again have the same version of the project.
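The scenario above can be modeled as a graph of commits in which each commit lists its parents. The Python sketch below is a simplified illustration with hypothetical commit ids: it checks whether two versions have diverged, finds their best common ancestors (the versions a merge starts from), and records a merge commit with two parents. It does not merge file contents, which real Git handles as a separate step.

```python
from typing import Dict, List, Set

# Hypothetical commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Every commit reachable from commit via parent links, including itself."""
    result: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(graph[current])
    return result


def diverged(graph: Graph, a: str, b: str) -> bool:
    # Two versions lie on separate branches if neither contains the other.
    return a not in ancestors(graph, b) and b not in ancestors(graph, a)


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Best common ancestors: shared commits that are not
    ancestors of another shared commit."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {c for c in common if not any(c != o and c in ancestors(graph, o) for o in common)}


def merge(graph: Graph, ours: str, theirs: str, merge_id: str) -> str:
    # A merge commit is an ordinary commit with two parents.
    graph[merge_id] = [ours, theirs]
    return merge_id


graph: Graph = {"start": [], "alice-1": ["start"], "bob-1": ["start"]}
print(diverged(graph, "alice-1", "bob-1"))  # True: two branches
print(merge_bases(graph, "alice-1", "bob-1"))  # {'start'}
merge(graph, "alice-1", "bob-1", "merged")
print(diverged(graph, "merged", "bob-1"))  # False: the merge contains Bob's work
```

When Bob later imports the commit `merged`, his latest version and Alice's are the same node in the graph again.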
Figure 1.4: Branches are created by developers working in parallel
In the previous example, a branch arose unplanned, simply because two developers were working on the same software in parallel. Of course, you can also branch deliberately in Git by creating a branch explicitly (see Figure 1.5). Explicit branches are often used to coordinate the parallel development of different features.
Figure 1.5: Explicit branches for different tasks
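An explicit branch in Git is essentially a named pointer to a commit, which is one reason creating a branch is cheap. The following Python sketch models that idea under simplifying assumptions: the `BranchingRepo` class, the starting branch name `main`, and the commit ids are all hypothetical, and each commit stores only its parent ids.

```python
from typing import Dict, List, Optional


class BranchingRepo:
    """Toy model: a branch is just a name that points at a commit id."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}  # commit id -> parent ids
        self.branches: Dict[str, Optional[str]] = {"main": None}  # name -> tip
        self.current = "main"

    def commit(self, commit_id: str) -> str:
        tip = self.branches[self.current]
        self.parents[commit_id] = [tip] if tip is not None else []
        self.branches[self.current] = commit_id  # only the current branch moves
        return commit_id

    def create_branch(self, name: str) -> None:
        # Creating a branch copies no files: it only records the current tip.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name


repo = BranchingRepo()
repo.commit("c1")
repo.create_branch("feature-a")
repo.switch("feature-a")
repo.commit("a1")  # feature work, only on feature-a
repo.switch("main")
repo.commit("c2")  # main moves on independently
print(repo.branches)  # {'main': 'c2', 'feature-a': 'a1'}
```

Both `a1` and `c2` have `c1` as their parent, so the graph now contains two branches, exactly as in the unplanned case; the only difference is that each line of development has a name.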
When pulling and pushing, you can explicitly specify which branches are transferred. In addition to simple branching and merging, you can also do the following with branches:
Summary
After reading this chapter, you should be familiar with the basic concepts of Git. Even if you put the book down now (which we hope you won't!), you can take part in a discussion of distributed version control systems, the necessity and usefulness of hashes, and the constant branching and merging in Git.
You may be asking yourself the following questions, though.
For the first question, read the next chapter right away. There you will find the specific commands for creating a repository, versioning files, and exchanging commits between repositories. For the other questions, there are chapters with detailed workflows.
If you are a busy executive still trying to decide whether or not to use Git, then take a look at the discussion of the limits of Git in Chapter 26, “Git's Shortcomings.”