Chapter 20

Splitting A Large Project

Often a software project begins as a small monolithic system. As development proceeds, both the project and the team grow, and modularization becomes increasingly important. At first, only the internal structure of the project is modularized. Eventually, you will want to develop individual modules separately and give each of them its own release cycle.

Since Git repositories are always versioned as a whole, a new Git repository must be created for each module that is to be released separately.

The challenge in modularizing a Git repository is to carry over as much of the history of the old file versions as possible into the new repository. At the same time, the new repository should not contain any files that are not used within the module. Commits that changed no file in the module are not required either.

In the main repository, the history of the module is not removed so that older project versions can be reproduced. As a consequence, the historical data of the separated module is located in both repositories.

Most of the separated modules will still be needed by the main project and should be integrated as external modules. For this type of integration, Git offers the submodule concept.

This workflow shows how you can extract a module with Git so that


  • only files required by the module are transferred to a new repository,
  • the history of the module files will remain in the new repository, and
  • the module can be integrated again as an external submodule.

Overview

For the following procedure, we use a project structure as shown in the top section of Figure 20.1.

The example for this workflow is based on a Java directory structure. There are three modules in the project as a whole. The files of each module are located in the src and test subdirectories. In other words, each module consists of two separate parts. The third module, module3, is extracted into a separate Git repository.

As a first step, you remove all unnecessary files and commits from a clone of the original repository using the filter-branch command. Next, the directory structure of the new module repository is adapted. Finally, module3 is removed from the main project, and the new module repository is incorporated again as a submodule in the extern directory. The result will look like the bottom section of Figure 20.1.

In the new module repository, it will be possible to reconstruct the historical changes to files, that is, to track who changed what and when. However, it is usually not possible to reproduce old versions completely. The reason is that a module often emerges from files of other modules. If you tried to restore an old version of the project from the module repository, you would get a patchwork of files in different directories. In addition, the module may in the past have depended on other files that are no longer available.

In the main repository, old versions of the entire project, including the module files, are still recoverable.

Requirements

Internal modularization: The project has been modularized internally, i.e., there is a module that can be developed and versioned separately.

Module files are located in a few directories: When extracting the old versions of the module, the files in each directory must be treated separately. If the files are widely scattered, the effort becomes considerable.
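To judge how scattered the module files are, it can help to list every path that ever appeared in the history. The following is a minimal sketch on a throwaway repository; the repository name, the file names, and the old/ directory are hypothetical and only illustrate a file that used to live outside the module directory.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project

# A helper class starts outside the module and is moved into it later
mkdir -p old src/module3
echo 'class Helper {}' > old/Helper.java
git add .
git commit -q -m "Add helper in old location"
git mv old/Helper.java src/module3/Helper.java
git commit -q -m "Move helper into module3"

# Every path touched by any commit on any branch, one per line
paths=$(git log --all --pretty=format: --name-only | sed '/^$/d' | sort -u)
echo "$paths"
```

In this sketch the list contains both the old and the new location of the helper class, which shows that old/ has to be considered when the module history is extracted.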

Workflow “Splitting A Large Project”

A module is removed from a project and migrated to a separate repository. The commit history is preserved, while unnecessary files and commits are removed. The separated module is then brought back as an external submodule.

Figure 20.1: Workflow overview

Process and Implementation

Warning! Some of the following commands change the repository very fundamentally. Although in Git it is often possible to undo changes, you should make sure to back up your repository before you start with the next steps.

> git clone --no-hardlinks --bare projekt.git projekt.backup.git 

The --no-hardlinks option guarantees that the cloned repository and the original repository do not share any files.

Separating the Module Repository

Step 1: Clone the main repository

As a starting point for the module repository, create a copy of the main repository.

> git clone --no-hardlinks --bare projekt.git module3-work.git

Step 2: Remove unnecessary files and commits

Next, unnecessary files and commits must be removed. This is the most complex step and vital to the preservation of history.

To remove content from a repository, use the filter-branch command. It creates a new commit for each existing commit. By choosing different filters, you control how the new commits differ from the originals.

The following filter-branch command removes the src/module1 directory from every commit.

> cd module3-work.git

> git filter-branch --force --index-filter \
    'git rm -r --cached --ignore-unmatch src/module1' \
    --tag-name-filter cat \
    --prune-empty -- --all

The parameters used are as follows.

--index-filter 'git rm -r --cached --ignore-unmatch ...': With this option, files can be removed from a commit. The rm command is called for each commit. In our example, we remove the src/module1 directory. If your project does not have a very modular structure, you may need to delete more files and directories.

--tag-name-filter cat: This option keeps the tag names unchanged and updates each tag so that it points to the rewritten commit.

--prune-empty: This option removes all commits that no longer change any file once the filter has been applied.

-- --all: The -- separates the filter-branch options from the revision options; --all applies the filter to all branches and tags in the project.

For the example project, we also have to invoke the command for the test/module1, src/module2, and test/module2 directories.

For the exact descriptions of all the options of the filter-branch command, please refer to the Git help.
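Because git rm accepts several paths, the repeated invocations can also be collapsed into a single pass. The following is a sketch on a throwaway repository that mimics the example layout; the file names are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the advisory pause that newer Git versions show before filter-branch starts.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project

# One commit per module, each with a src and a test part
for m in module1 module2 module3; do
  mkdir -p "src/$m" "test/$m"
  echo "class Main {}" > "src/$m/Main.java"
  echo "class MainTest {}" > "test/$m/MainTest.java"
  git add .
  git commit -q -m "Add $m"
done

# Remove all four unwanted directories from every commit in one run
git filter-branch --force --index-filter \
    'git rm -r -q --cached --ignore-unmatch src/module1 test/module1 src/module2 test/module2' \
    --tag-name-filter cat \
    --prune-empty -- --all

remaining=$(git ls-tree -r --name-only HEAD)
commits=$(git rev-list --count HEAD)
echo "$remaining"
echo "commits left: $commits"
```

After the run, only the module3 files remain, and the commits that touched nothing but module1 or module2 have been pruned.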

Step 3: Remove unnecessary branches and tags

Not all tags and branches in the module repository are useful. For instance, the tags and branches that have no relation to the extracted module are useless. The unnecessary branches and tags should be removed.

> git tag -d v1.0.1

> git branch -D v2.0_bf
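When many tags have to go, deleting them one by one becomes tedious. The sketch below lists the existing tags and branches first and then deletes all tags that match a pattern. The repository, the tag names, and the v1.* pattern are hypothetical and stand in for whatever is irrelevant to your module.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q module3-work
cd module3-work
echo 'class Main {}' > Main.java
git add .
git commit -q -m "Initial commit"
git tag v1.0.0
git tag v1.0.1
git tag v2.0.0
git branch v2.0_bf

# Inspect what exists before deleting anything
git tag -l
git branch --list

# Delete every tag that matches the pattern, then the unwanted branch
git tag -l 'v1.*' | while read -r tag; do
  git tag -d "$tag"
done
git branch -D v2.0_bf

tags_left=$(git tag -l)
echo "tags left: $tags_left"
```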

Step 4: Scale down the module repository

For Git to remove all unnecessary files from its internal management data, the repository has to be cloned once more.

> git clone --no-hardlinks --bare module3-work.git module3.git

The previous module repository module3-work.git is no longer needed and can be deleted.

> rm -rf module3-work.git
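Whether the new clone is really smaller is worth checking. Depending on the Git version, a clone from a plain local path may copy the object store as it is, including objects that are no longer referenced; the git filter-branch documentation therefore suggests cloning through a file:// URL when the goal is to shrink a repository. The following sketch, on a throwaway repository with hypothetical names, clones that way and verifies that a removed file is no longer stored in the result.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q work
cd work
mkdir -p src/module1 src/module3
echo 'class Big {}' > src/module1/Big.java
echo 'class Main {}' > src/module3/Main.java
git add .
git commit -q -m "Initial commit"

# Remember the object ID of the file that is about to be filtered out
removed_blob=$(git rev-parse HEAD:src/module1/Big.java)

git filter-branch --force --index-filter \
    'git rm -r -q --cached --ignore-unmatch src/module1' \
    --prune-empty -- --all
cd ..

# A file:// URL makes Git transfer only the objects that are still reachable
git clone -q --bare "file://$demo/work" module3.git

cd module3.git
git count-objects -v
if git cat-file -e "$removed_blob" 2>/dev/null; then
  blob_present=yes
else
  blob_present=no
fi
echo "removed file still stored: $blob_present"
```

The git count-objects -v output gives a quick impression of how many objects and how much disk space the repository uses, so you can compare it with the work repository.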

Step 5: Customize the directory structure of the module repository

So far, the directory structure of the new module repository looks the same as that of the main project. Only the unnecessary modules are missing. The adaptation of the directory structure can now be done through normal file operations. For this purpose, a clone with a workspace is necessary.

> git clone module3.git module3 

The src/module3 directory is renamed to src, and the test/module3 directory is renamed to test.

> cd module3

> mv src/module3 module3

> rmdir src

> mv module3 src

> mv test/module3 module3

> rmdir test 

> mv module3 test

Subsequently, the changes are committed as usual with the commit command and transferred to the bare repository with the push command.

> git add --all 

> git commit -m "Directory structure adapted"

> git push 

If there are other branches in the module repository, the file operations must be performed on all branches.

It often makes little sense to carry over the branches of the main project. The module starts a new release cycle, and the old branches are no longer of interest.
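After the restructuring commit, a plain log for a file only reaches back to the move. The history before the move is still available through rename detection. The following sketch repeats the Step 5 operations for the src directory in a throwaway repository (the file name is hypothetical) and compares a plain log with git log --follow.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q module3
cd module3
mkdir -p src/module3
echo 'class Main {}' > src/module3/Main.java
git add .
git commit -q -m "Add Main"
echo 'class Main { int x; }' > src/module3/Main.java
git commit -q -am "Extend Main"

# Restructure as in Step 5: src/module3 becomes src
mv src/module3 module3
rmdir src
mv module3 src
git add --all
git commit -q -m "Directory structure adapted"

# Without --follow the log stops at the restructuring commit
plain=$(git log --format=%s -- src/Main.java | wc -l | tr -d ' ')
# With --follow Git traces the file across the rename
followed=$(git log --follow --format=%s -- src/Main.java | wc -l | tr -d ' ')
echo "plain: $plain, followed: $followed"
git log --follow --oneline -- src/Main.java
```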

Step 6: Remove the module directory from the main repository

Now that the separated module has been migrated to its own repository, the main repository is adapted in the next steps. The directories of the separated module that are no longer needed, src/module3 and test/module3, must be removed. This is done with ordinary file operations in a clone of the main repository, followed by a commit.

If there are other branches in the project, which should be adapted to the new structure, the changes need to be made there as well. The cherry-pick command can be used to transfer the changes automatically into several branches.
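The removal and its transfer to a second branch could look like the following sketch. It runs on a throwaway repository; the file names and the release-1.x branch are hypothetical.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project
mkdir -p src/module1 src/module3 test/module3
echo 'class App {}' > src/module1/App.java
echo 'class Main {}' > src/module3/Main.java
echo 'class MainTest {}' > test/module3/MainTest.java
git add .
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)
git branch release-1.x   # hypothetical maintenance branch

# Remove the module directories on the main branch
git rm -r -q src/module3 test/module3
git commit -q -m "Remove module3, now maintained in its own repository"
removal=$(git rev-parse HEAD)

# Transfer the same change to the other branch
git checkout -q release-1.x
git cherry-pick "$removal"

left_on_main=$(git ls-tree -r --name-only "$main_branch")
left_on_release=$(git ls-tree -r --name-only release-1.x)
echo "$left_on_release"
```

The cherry-pick applies cleanly here because both branches contain identical module files. If a branch has diverged, expect conflicts that have to be resolved by hand.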

Integrating A Module Repository as An External Repository

After the previous sequence, there are now two separate repositories. Normally, the main project will continue to require the separated module, so an integration is necessary.

The integration options depend very strongly on the development platform used. For example, in a Java Maven project, you would build the separated module individually and store the resulting artifacts in a Maven repository. The main project would declare the artifacts as dependencies and fetch them from the Maven repository during the build.

If you want to perform the integration with Git, the concept of submodules is available. With submodules, directories in a Git repository can be linked with other Git repositories.

Step 1: Integrate an external module into the main repository

In our example, we want to integrate the module3 repository into the extern/module3 directory of the main project. The starting point is a clone of the main repository. We change to the root directory of the project and add the module repository using the submodule add command. The first parameter is the path or URL of the module repository, and the second parameter is the directory it will occupy in the main repository:

> git submodule add /global-path-to/module3.git extern/module3

The submodule add command creates a clone of the external repository in the specified directory. In addition, it is recorded in the current repository that the directory references an external repository.

The extern/module3 directory now points to the latest commit (HEAD) of the external repository.

So far, the submodule exists only in the workspace. The change is recorded in the repository only when you commit it:

> git add --all

> git commit -m "Modul3 added"

You can use the push command to transfer the submodule reference to the central repository.
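Other developers then receive the submodule along with the main project. The following sketch builds a throwaway module repository and main project (all names are hypothetical), adds the submodule, and clones the result with --recurse-submodules. Recent Git releases restrict the local file transport for submodules by default, so the sketch passes -c protocol.file.allow=always; this setting is an assumption made for this local demonstration and should not be needed with URL-based remotes.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Module repository with one commit, published as a bare repository
git init -q module3-src
cd module3-src
echo 'class Main {}' > Main.java
git add .
git commit -q -m "Initial module commit"
cd ..
git clone -q --bare module3-src module3.git

# Main project that integrates the module as a submodule
git init -q project
cd project
echo 'Main project' > README
git add .
git commit -q -m "Initial project commit"
git -c protocol.file.allow=always submodule add "$demo/module3.git" extern/module3
git commit -q -m "module3 added as submodule"
cd ..

# What a colleague does: clone the project together with its submodules
git -c protocol.file.allow=always clone -q --recurse-submodules project project-clone

recorded_path=$(git -C project-clone config -f .gitmodules submodule.extern/module3.path)
echo "submodule path: $recorded_path"
ls project-clone/extern/module3
```

In an existing clone that was made without --recurse-submodules, git submodule update --init fetches the submodule content afterwards.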

Why Not the Alternatives?

Why Not Use A New Repository?

An alternative to this workflow would be to simply create a new, empty repository for the module. In that case, the new repository would contain none of the module's history. The old versions could then be found only in the main repository.

As long as this limitation does not bother you, this solution is very easy to implement.

Why Not Use --subdirectory-filter?

This workflow uses the filter-branch command with --index-filter. This allows you to remove files from commits.

The --subdirectory-filter option of the filter-branch command removes all files except those in the specified directory. In addition, it makes the selected directory the new root of the repository.

As long as the module to be separated resides in exactly one directory, this option is easy to use. In this example, however, the module is divided into two directories, so this filter could not be used.

Also, if individual files of the module were formerly located in other directories, or if the module directory has been renamed, --subdirectory-filter would import the history only incompletely.
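For comparison, the single-directory case could look like the following sketch. It runs on a throwaway repository with hypothetical file names; FILTER_BRANCH_SQUELCH_WARNING is set only to skip the advisory pause of newer Git versions.

```shell
set -e
# Identity for the throwaway commits (hypothetical values)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q project
cd project
mkdir -p src/module1 src/module3
echo 'class App {}' > src/module1/App.java
git add .
git commit -q -m "Add module1"
echo 'class Main {}' > src/module3/Main.java
git add .
git commit -q -m "Add module3"

# Keep only src/module3 and make it the new repository root
git filter-branch --force --subdirectory-filter src/module3 -- --all

root_files=$(git ls-tree -r --name-only HEAD)
commits=$(git rev-list --count HEAD)
echo "$root_files"
echo "commits left: $commits"
```

Main.java now sits directly in the repository root, and only the commit that touched src/module3 is left.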
