Chapter 14
Project Setup
Once you have decided to use Git, the first step is to assign files and directories to a Git repository. It is important to decide whether the project should be versioned in one repository or in multiple repositories. Since Git can only create a branch or a tag for the whole repository, the decision very much depends on the release units of your project.
After the project division is done, you must create a repository for each module and fill it. Empty directories and files that are not to be versioned will have special treatment.
When working in a team, for each module you must define a repository as the central repository. All developers will take advantage of this central repository to fetch the current status and record their changes.
You have to decide how all developers in the team can access the central repository. Git supports access via a shared network drive, through a web server, through its own network protocol (the Git daemon), and via Secure Shell (SSH).
Which protocol you choose depends on the existing infrastructure, the geographical distribution of the team, and the requirements relating to the administration of access rights.
This workflow describes the following.
Overview
This workflow is made up of two parts. In the first step, a repository for a project directory is created. In the second step, a central repository is made available to all developers.
Figure 14.1 shows how a project named projecta is transferred to a repository. Pay special attention to the empty directory named EmptyDir, because empty directories are not usually versioned by Git. You can force Git to version an empty directory by creating a file in it, such as .gitignore.
Likewise, during the initial commit you should make sure you do not version files that are not supposed to be in the repository, such as build results or temporary files. In the example you can see this in the TempDir directory. Backup files are stored in it even though this directory should not be versioned. To exclude this directory in future commits, you should create a .gitignore file in the root directory of the project and specify directories to be ignored in this file.
In the second step the new repository will be made available to other developers. For this purpose, Git supports different protocol variants: file access over a shared network drive, the Git daemon, HTTP through a web server, and SSH.
In Git, multiple access points for the same repository can be deployed in parallel. For instance, it is common to configure HTTP access for anonymous reading and SSH access for writing.
Requirements
Compact Workflow: Setting Up A Project
A project directory is imported into a new repository. This repository is provided as a central repository for the development team.
Figure 14.1: Workflow overview
Process and Implementation
The following procedures use the simple sample project projecta in Figure 14.1.
Create A New Repository from the Project Directory
This section shows how you can create a bare repository for an existing project. A bare repository is a precondition to share the repository with the team later.
The starting point is a directory in the file system that is gradually transformed into the finished bare repository.
Step 1: Prepare Empty Directories
Git is basically a content tracker that can efficiently manage versions of files of different types. By contrast, directories are only considered structuring units and are only versioned in conjunction with files.
Empty directories are not relevant in Git and cannot be added to a commit using the add command.
As long as the development environment is not dependent on the empty directories, you can just ignore them. Sometimes you cannot delete these empty directories because some development environments and tools assume the existence of these directories and they will complain if the empty directories are missing.
By adding a file to an empty directory, you can force Git to version the empty directory. Theoretically, you can add any file as long as it does not mean anything to the development environment. Adding a .gitignore or .gitkeep file should be okay.
As an example, you should do this to the EmptyDir directory in Figure 14.1. Using the Unix touch command, you can create an empty file in EmptyDir.
> cd projecta/EmptyDir
> touch .gitignore
Files whose names start with a dot are hidden files in Unix and ignored by many development environments.
Temporary files (such as build results) are often added to an empty directory. To prevent these files from being included by mistake in a commit, you can create a .gitignore file in the directory and insert a line with an asterisk (“*”) to indicate to Git that all files in the directory must be ignored and should not show up as “untracked” when the status command is called.
The following Unix echo command creates a new .gitignore file and writes “*” into it.
> echo "*" > .gitignore
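When a project contains many empty directories, the placeholder files can be created in one pass. The following is a minimal sketch, assuming a find implementation that supports the -empty test (GNU and BSD find do, but it is not part of POSIX) and using .gitkeep as the placeholder name. The temporary directory and the docs/drafts and src paths are invented for the demonstration.

```shell
# Build a throwaway copy of the sample layout (paths are illustrative).
demo=$(mktemp -d)
mkdir -p "$demo/projecta/EmptyDir" "$demo/projecta/docs/drafts" "$demo/projecta/src"
touch "$demo/projecta/src/Main.java"

# Put a .gitkeep placeholder into every empty directory, skipping .git.
find "$demo/projecta" -name .git -prune -o -type d -empty \
    -exec sh -c 'touch "$1/.gitkeep"' _ {} \;

# List the placeholders that were created.
find "$demo/projecta" -name .gitkeep
```

Only directories without any content receive a placeholder, so docs itself is left alone because it already contains the drafts subdirectory.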
Step 2: Ignore unnecessary files and directories
Development and build tools often create temporary files, such as class files in Java. These files should not be versioned. To prevent temporary files from being versioned, create a file named .gitignore and list all unwanted files and directories in it.
The .gitignore file can be created in each directory. The entries are always applied from this level and all its subdirectories.
The following is the content of a .gitignore file in the example in Figure 14.1. Each line specifies a pattern for a file name that should be ignored. In this case, the TempDir directory and all files with .bak extension should be ignored.
# Content of .gitignore
/TempDir
*.bak
To easily keep track of what files are ignored, create a .gitignore file only in the root of the project. Even deeper subdirectories can be excluded from that file. The only exceptions are the .gitignore files in empty directories where the files are only there to force Git to version the directories.
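Whether the patterns behave as intended can be checked before the first commit. The sketch below assumes a Git version that provides the check-ignore command (1.8.2 or later) and recreates the sample patterns in a throwaway repository. The file names under TempDir and src are invented for the demonstration.

```shell
demo=$(mktemp -d)
git init -q "$demo/projecta"
cd "$demo/projecta"
mkdir TempDir src
touch TempDir/old.txt src/Main.java src/Main.java.bak

# Same patterns as in the sample .gitignore.
printf '%s\n' '/TempDir' '*.bak' > .gitignore

# -v prints the .gitignore line that matched each ignored path.
git check-ignore -v TempDir/old.txt src/Main.java.bak

# Exit status 1 means the path is not ignored.
git check-ignore -q src/Main.java || echo "src/Main.java will be versioned"
```

Running such a check from the project root also confirms that a single .gitignore file there reaches files in deeper subdirectories.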
Step 3: Creating a repository
After the project files to be imported have been prepared in the previous steps, you create the repository in this step.
> cd projecta
> git init
Step 4: Define treatment of line endings
Prior to actually importing files, you need to decide how to deal with line endings in text files.
Problems with line endings always occur when you develop simultaneously on different operating systems or when you use text files in different operating systems.
Windows uses CRLF (Carriage Return and Line Feed) to encode line breaks. Unix systems and Mac computers use LF (Line Feed) for line breaks. Text editors on different platforms deal with the line breaks of the other platforms, and, as such, this problem is largely solved.
However, a text editor may adjust the line breaks for the respective platform, with or without the user’s knowledge. This in turn means that Git recognizes a line as changed even though its content did not change. Many unnecessary merge conflicts can arise from this.
Git provides a solution to the problem by standardizing line breaks in the repository as LF. When the standardization is enabled, Git converts all line endings to LF with every commit and, if desired, converts them back to the platform-dependent default on checkout.
There are three different ways of dealing with line endings, selected through the core.autocrlf setting:
core.autocrlf false: Git does not convert line endings at all.
core.autocrlf true: Git converts line endings to LF on commit and to CRLF on checkout.
core.autocrlf input: Git converts line endings to LF on commit and leaves them unchanged on checkout.
Since you usually cannot prevent a repository from being used in the future on other platforms, it makes sense to work from the outset with standardized line breaks.
Therefore, before the first import, core.autocrlf is set to true on Windows systems and to input on Unix systems. Note that setting core.autocrlf to true or input can be problematic if Git identifies a file as a text file when in fact it is a binary file. Use the .gitattributes file to override auto-detection.
Here is how to set core.autocrlf to input.
> git config --global core.autocrlf input
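Overriding the auto-detection for individual file types is done with attributes recorded in a .gitattributes file in the project root. A minimal sketch follows. The *.png and *.sh patterns and the file names are only assumptions for illustration and are not part of the sample project. The check-attr command then shows which text setting Git applies to a given path.

```shell
demo=$(mktemp -d)
git init -q "$demo/projecta"
cd "$demo/projecta"

# Example attributes (the patterns are assumptions for this sketch):
#   text=auto  let Git detect text files and normalize them to LF
#   binary     never convert line endings, never diff as text
#   eol=lf     always check out with LF, regardless of platform
printf '%s\n' '* text=auto' '*.png binary' '*.sh text eol=lf' > .gitattributes

# Show the effective "text" attribute for two hypothetical files.
git check-attr text -- logo.png build.sh
```

Because the .gitattributes file is versioned together with the project, these rules apply to every clone, whereas core.autocrlf is a per-user setting.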
Step 5: Import files
Next, add all files for the first commit using the add command. All existing files, including the added .gitignore files, will then be committed and the ignored files left off.
Before you issue the add command, it is recommended to use the status command once again to check which files are reported as “untracked”. It is easy to overlook a temporary file or directory and add it to the repository unintentionally.
> git status
> git add .
A commit is concluded with the commit command.
> git commit -m "init"
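A dry run of the add command offers a further preview before the import: it lists exactly the files that would be staged without changing anything. The sketch below uses a throwaway copy of the sample layout. The files src/Main.java and TempDir/notes.bak are invented for illustration.

```shell
demo=$(mktemp -d)
git init -q "$demo/projecta"
cd "$demo/projecta"
mkdir TempDir EmptyDir src
touch TempDir/notes.bak EmptyDir/.gitignore src/Main.java
printf '%s\n' '/TempDir' '*.bak' > .gitignore

# --dry-run only reports what "git add ." would stage.
preview=$(git add --dry-run .)
printf '%s\n' "$preview"
```

If a file shows up in the preview that does not belong in the repository, the .gitignore file can still be extended before the real add and commit.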
Step 6: Create a bare repository
So far, we have created a normal repository with a workspace for the new project. To work in a team with a central repository using pull and push commands, the repository needs to be converted to a bare repository without a workspace. A bare repository consists only of the contents of the .git directory.
The conversion is done using the clone command and the --bare parameter. Bare repositories typically have the ending .git, to distinguish them from normal repositories.
> git clone --bare projecta projecta.git
The --bare option causes the clone to have no workspace and include only objects in the repository. The projecta parameter is the name of the repository to be prepared. The projecta.git parameter is the name of the bare repository to be created.
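Taken together, Steps 3 to 6 can be sketched as one script. It runs in a temporary directory, uses a single invented source file as a stand-in for the project, and passes a placeholder commit identity with -c so that the sketch does not depend on a global Git configuration. The last two commands check that the clone is bare and contains the initial commit.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/projecta/src"
cd "$demo/projecta"
echo "class Main {}" > src/Main.java   # stand-in for the real project files

git init -q
git add .
# Placeholder identity; a real setup would configure user.name and user.email.
git -c user.name="Demo User" -c user.email="demo@example.com" commit -q -m "init"

cd "$demo"
git clone -q --bare projecta projecta.git

git -C projecta.git rev-parse --is-bare-repository   # prints "true"
git -C projecta.git log --oneline
```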
Sharing A Repository via File Access
This section describes how a bare repository can be shared using a shared network drive.
Step 1: Copy the bare repository
After a bare repository with the project files has been created, it can be easily stored on a network drive that is accessible to all.
> cp -R projecta.git /shared/gitrepos/.
In this example, we assume that the /shared/gitrepos directory is a network drive.
Step 2: Clone the central repository
When cloning a repository that has been shared over a network drive, simply specify the path to the central bare repository.
> git clone /shared/gitrepos/projecta.git
The path can be specified using the file:// prefix.
> git clone file:///shared/gitrepos/projecta.git
Step 3: Manage the read and write access
Read and write access to the repository is managed through the file system. Each team member who should have read access to the repository needs read permission on the bare repository directory. The same applies to write permission.
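On a Unix file system, write access for a whole team is often granted through a common group. The following is a sketch under that assumption, not a prescribed setup: it uses a temporary directory in place of /shared/gitrepos and the current user's primary group in place of a dedicated developer group. The core.sharedRepository setting asks Git to keep files that it creates later group-writable.

```shell
shared=$(mktemp -d)                 # stands in for /shared/gitrepos
git init -q --bare "$shared/projecta.git"
team=$(id -gn)                      # stands in for a developer group

# Give the group read and write access to everything in the repository.
chgrp -R "$team" "$shared/projecta.git"
chmod -R g+rwX "$shared/projecta.git"

# Keep files created by later pushes group-writable as well.
git -C "$shared/projecta.git" config core.sharedRepository group

# Show the resulting setting.
git -C "$shared/projecta.git" config core.sharedRepository
```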
Advantage and disadvantage
The advantage is that many corporate environments already have shared network drives as part of a shared file system, which makes this the easiest option for a central repository.
The disadvantage is that this option is difficult to set up if you work in a different location than the central repository. Also, data access is not the most efficient, because the remote Git commands (push, fetch and pull) must do all their work on remote data from the local machine. In the following three server variants, however, Git can run the remote commands on the server and only needs to transmit the result to the local machine.
Sharing A Repository Using the Git Daemon
The standard Git installation includes a built-in server service that provides access to the repository via a simple network protocol.
Note that on Windows the Git daemon is only available as of Git version 1.7.4.
Step 1: Enable the bare repository for the Git daemon
For the Git daemon to export a repository, a git-daemon-export-ok file must be created in the root directory of the bare repository. The file can be empty and is only there to tell Git that it is okay to serve the project without authentication.
> cd projecta.git
> touch git-daemon-export-ok
Step 2: Start the git daemon
You start the Git daemon by using the daemon command.
> git daemon
Afterward, you can access all the repositories on this computer that are approved for export. For this purpose, the full path to the repository must be specified in the Git URL.
Here is an example URL:
git://server-42/shared/gitrepos/projecta.git
The prefix git: indicates that the Git daemon must be used as the protocol. This is followed by the computer name (server-42) and the path to the directory (/shared/gitrepos/projecta.git) that is the location of the repository.
In order to make the URL not so dependent on a specific directory, it is often useful to specify a base path. This can be done using the --base-path parameter.
> git daemon --base-path=/shared/gitrepos
Now you can access the repository through git://server-42/projecta.git.
By default, the daemon command only exports a repository for reading. To enable write access to a repository, use the --enable=receive-pack parameter.
> git daemon --base-path=/shared/gitrepos --enable=receive-pack
The Git daemon can also be configured as a service in the operating system. For more details, see the documentation for the daemon command.
Step 3: Clone the central repository
When you clone a repository which is released via the daemon, just type in the URL to the central bare repository.
> git clone git://server-42/projecta.git
Step 4: Manage read and write access rights
The read and write access rights cannot be defined separately for individual developers in this variant. That is, each repository that was released for export can be read by anyone who has access to the computer.
If the Git daemon started with write access enabled, anyone can also change all the exported repositories.
Advantage and disadvantage
Advantage: The Git daemon provides the most efficient and fastest data transfer to and from the central repository.
Disadvantage: The Git daemon cannot authenticate users. In environments where read and write access to repositories must be restricted, the Git daemon cannot be used.
Another disadvantage: In distributed teams, the firewall can be a problem, since the Git daemon requires its own open port (9418 by default).
Sharing A Repository via HTTP
The standard Git installation provides a CGI script that allows access to repositories through a web server. The CGI script is only available as of Git version 1.6.6. Before that, it was possible to access a repository via HTTP, but the “old” protocol was very inefficient and slow.
As an example, the following describes the integration of the CGI script in an Apache2 infrastructure.
Apache2 is typically configured through a file called httpd.conf. The following describes what changes need to be made to the Apache2 configuration file. For details and background information, please read the Apache2 documentation.
Step 1: Enable Apache2 modules
CGI scripts can only be integrated with Apache2 if the mod_cgi module is enabled. In addition, for Git integration, you also need the mod_alias and the mod_env modules. You must enable these modules if they are not yet enabled.
Note that the exact paths in the following example depend on the Apache2 installation and the operating system.
LoadModule cgi_module libexec/apache2/mod_cgi.so
LoadModule alias_module libexec/apache2/mod_alias.so
LoadModule env_module libexec/apache2/mod_env.so
Step 2: Allow access to the CGI script
A typical Apache2 installation restricts access to the web server to certain directories in the file system. If you want to use the CGI script directly from the installation directory of Git, this directory needs to be enabled for access.
In this example, the CGI script is located in /usr/local/git/libexec/git-core directory. The following snippet will allow Apache2 to call the CGI script from there:
<Directory "/usr/local/git/libexec/git-core">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
Attention! It is important to ensure that the user under which Apache2 is running has read and execute permissions for the CGI script.
Step 3: Allow access to the repository via HTTP
In order for the CGI script to export a repository, a file named git-daemon-export-ok must be created in the root directory of the bare repository. The file can be empty and is only there to tell Git that it is okay to serve the project without authentication.
> cd /shared/gitrepos/projecta.git
> touch git-daemon-export-ok
Attention! It is important to ensure that the Apache2 server has read and write access to the repository directory and all its files and subdirectories.
Now, you have to specify the root directory in the httpd.conf file, which contains the repositories to be exported. In this example it is the /shared/gitrepos/ directory.
SetEnv GIT_PROJECT_ROOT /shared/gitrepos
Finally, you have to set up an alias for the CGI script. In this case it is /git.
ScriptAlias /git/ /usr/local/git/libexec/git-core/git-http-backend/
After you restart Apache2, access to all repositories under /shared/gitrepos/ will be allowed.
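Independently of Apache2, the CGI script can be exercised directly from a shell by setting the CGI environment variables by hand. This is only a sketch of such a smoke test: it uses a temporary directory instead of /shared/gitrepos, assumes that git http-backend is installed along with Git, and requests the reference list that a clone would ask for first.

```shell
root=$(mktemp -d)                   # stands in for /shared/gitrepos
out=$(mktemp)                       # receives the raw CGI response
git init -q --bare "$root/projecta.git"
touch "$root/projecta.git/git-daemon-export-ok"

# Simulate GET /git/projecta.git/info/refs?service=git-upload-pack
GIT_PROJECT_ROOT="$root" \
REQUEST_METHOD=GET \
PATH_INFO=/projecta.git/info/refs \
QUERY_STRING=service=git-upload-pack \
git http-backend > "$out"

# The response starts with the HTTP headers produced by the CGI script.
head -n 4 "$out"
```

If the script answers with a "not found" status instead, the usual suspects are a missing git-daemon-export-ok file or a wrong GIT_PROJECT_ROOT.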
Step 4: Clone the central repository
When you clone a repository, simply point the URL to the central repository. In this case, the URL consists of the machine name, the script alias for the CGI script, and the directory name of the repository.
> git clone http://server-42/git/projecta.git
In this example, the repository projecta.git is located on the computer server-42 under the script alias git.
Step 5: Manage read and write access rights
In this variant, the read and write permissions can be defined using the normal web server access rights.
For instance, to require a password when writing to the repositories (with the push command), add this entry in the Apache2 configuration file:
<LocationMatch "^/git/.*/git-receive-pack$">
    AuthType Basic
    AuthName "Git Access"
    AuthUserFile /shared/gitrepos/git-auth-file
    Require valid-user
</LocationMatch>
With this entry, all requests for git-receive-pack, which is required by every push command, are intercepted and only allowed if the user is authenticated. Read access, on the other hand, is still possible without a password.
To protect both read and write access to a repository with a password, use this entry in the Apache2 configuration file.
<LocationMatch /git/projecta.git>
    AuthType Basic
    AuthName "Git Access"
    AuthUserFile /shared/gitrepos/git-auth-file
    Require valid-user
</LocationMatch>
More examples of the web server configuration can be found in the documentation for the http-backend command.
Advantage and disadvantage
Advantage: The HTTP variant allows easy access to repositories in a web environment. Typical problems with firewalls are not expected from the use of the HTTP protocol. Authentication can be done via the web server. If you need it to be more secure, you can use the HTTPS protocol.
Disadvantage: You need a web server, which must be operated and administered.
Sharing A Repository via SSH
In order to share a repository via Secure Shell (SSH), the necessary infrastructure must be in place. That is, you must at least have a computer with SSH daemon and all participants must have an SSH account on the server.
Step 1: Copy the bare repository
Simply copy the bare repository with the project files to an SSH host to which all developers have access. The scp command can be used to copy a file or files over SSH.
> scp -r projecta.git server-42:/shared/gitrepos/projecta.git
In this example, we assume that the computer (server-42) allows SSH access and that the /shared/gitrepos directory is allocated on this computer for storing the repository.
Step 2: Clone the central repository
When you clone a repository which is shared via SSH, you need a normal SSH path to the central repository.
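> git clone ssh://server-42/shared/gitrepos/projecta.git
Alternatively, the ssh:// prefix can be omitted by using the shorter scp-like syntax, in which a colon separates the host name from the path.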
> git clone server-42:/shared/gitrepos/projecta.git
Step 3: Manage read and write access rights
In this variant, read and write access to the repository is managed by administering the SSH and file system rights. That is, each team member who should have read access to the repository needs SSH access and read permission on the repository directory. The same applies to write permission.
Advantages and disadvantage
Advantages: Access to a repository over SSH is very easy to set up with an existing SSH infrastructure. Network access is very efficient, since most of the Git commands take place on the SSH server and only the results are transmitted over the network. Furthermore, the access is encrypted.
Disadvantage: If there is no existing SSH infrastructure, setting up this infrastructure can be costly. Even with the existing infrastructure, the management of user accounts can be complex, because each user needs a separate account, even for read access.
Note that you can use Gitolite (https://github.com/sitaramc/gitolite) or Gitosis (https://github.com/tv42/gitosis) to simplify the administration of an SSH infrastructure for Git. Gitolite can even manage read and write access rights at branch level. There is also Gerrit (http://code.google.com/p/gerrit/), which can act as an SSH server in addition to providing review functionality.
Why Not the Alternatives?
Why Not Give up push?
The workflow described assumes that each developer has write access to the central repository and thus can publish their commits with the push command.
Typically, in an open source project, a pure pull sequence is used. In this case, all developers only work on their local repository and only the integration managers (integrators) have the permission to update the central software version.
Figure 14.2 shows this pure pull workflow.
The developers clone the central repository and generate new local commits. They then send the integrators a pull request, which is a request to import a branch or a commit and to merge it with the integration branch in the central repository.
The integrator is now responsible for merging all changes from all the developers to the central repository with the pull command. The integrator also takes the role of quality assurance. Once the integrator has all the changes in the central repository, the developers can import the official version from the central repository again with the pull command.
Figure 14.2: Working with pull only
In normal project work and product development, this process can quickly become an unnecessary brake. There are always team members who need to see the changes from the other parties quickly, such as when many files are changed in a refactoring. In addition, release cycles are shorter in agile projects. In such a scenario, the integrator can become a bottleneck, and changes do not reach the central repository fast enough.
In most projects, the advantage of having more control over changes in the official version does not outweigh the higher cost.
Another problem is the backup of changes. Data is stored in the central repository only after the pull request has been processed. In enterprises, usually only the central repository is backed up by a backup system. If the data on the developer’s computer is destroyed before the pull, work will be lost.
Note: It is of course also possible to back up the developer repository. In the open-source environment GitHub (https://github.com/) is often used for this purpose. This also ensures that the integrator can access the developer’s repository.