Chapter 11
Dependencies between Repositories
In Git the repository is the release unit, in the sense that a version, branch or tag can only be created for a whole repository. If a project contains subprojects that each have their own release cycle and thus their own versions, there must also be a separate repository for each subproject.
The relationship between the main project and the subprojects can be implemented with the Git submodule or subtree command.
Note
The subtree command was first officially included in Git in version 1.7.11. However, it is only an optional component located under the contrib directory. Some Git installation packages include the subtree command automatically, while others require manual installation.
The main difference between the submodule and subtree concepts is that with submodules the main repository only references the module repositories, whereas with subtree the contents of the module repositories are imported into the main repository.
Dependencies with Submodules
With submodules, a module repository can be embedded into the main repository. For this purpose, a directory in the main repository is linked to a specific commit in the module repository.
Figure 11.1 shows the basic structure. There are two repositories: main and sub. In the main repository, the sub directory will be linked with the module repository. In the workspace of the main repository there is a complete module repository under the sub directory. The actual main repository will only reference the module repository. For this purpose, there is a file named .gitmodules, in which the absolute path to each module repository is defined.
[submodule "sub"]
    path = sub
    url = /project/sub
In addition to the .gitmodules file, the references to submodules are also stored in the .git/config file. This is done when you call the submodule init command, which reads the .gitmodules file and writes the information in the .git/config file. This indirect configuration allows the paths to the module repositories to be adjusted locally in the .git/config file.
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
[submodule "sub"]
    url = /project/sub
Figure 11.1: The fundamentals of submodules
The information so far is not enough to reproduce, for each commit in the main repository, the corresponding version of the module repository. The commit in the module repository is needed as well. It is stored in the object tree of the main repository. The following is such an object tree. The third entry, sub, is a submodule, which can be recognized by its commit type. The hash that follows references the commit in the module repository.
100644 blob 1e2b1d1d51392717a479eaaaa79c82df1c35d442 .gitmodules
040000 tree 19102815663d23f8b75a47e7a01965dcdc96468c src
160000 commit 7fa7e1c1bd6c920ba71bd791f35969425d28b91b sub
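A listing like the one above can be produced with the ls-tree command. The following is a minimal sketch, assuming a reasonably recent Git and a POSIX shell. The temporary directory, the file names and the demo identity are made up for illustration. It builds a throwaway main and module repository and then extracts the mode, type and hash of the sub entry.

```shell
#!/bin/sh
# Sketch: create two throwaway repositories and inspect the submodule entry.
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Module repository with a single commit.
git init -q "$work/sub"
echo "module" > "$work/sub/foo.txt"
git -C "$work/sub" add foo.txt
git -C "$work/sub" commit -q -m "Initial module commit"

# Main repository with a src directory.
git init -q "$work/main"
cd "$work/main"
mkdir src
echo "main" > src/bar.txt
git add src
git commit -q -m "Initial main commit"

# Embed the module under the directory "sub". Recent Git versions refuse
# local-path submodule clones unless the file protocol is explicitly allowed.
git -c protocol.file.allow=always submodule add "$work/sub" sub
git commit -q -m "Submodule added"

# Print the top-level tree of the new commit.
git ls-tree HEAD

# Extract mode, type and hash of the "sub" entry.
entry=$(git ls-tree HEAD sub)
mode=$(echo "$entry" | awk '{print $1}')
type=$(echo "$entry" | awk '{print $2}')
recorded=$(echo "$entry" | awk '{print $3}')

# The commit the module repository currently points to.
module_head=$(git -C "$work/sub" rev-parse HEAD)
```

The hash recorded for sub should be the same as the HEAD commit of the module repository, because submodule add checks out the default branch of the module.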
Step by Step
Embedding a submodule
An existing Git project is embedded as a submodule in a different project.
1. Link the directory
To include a submodule, you have to use the submodule add command and pass the absolute path to the module repository and the directory name:
> git submodule add /global-path-to/sub sub
As a result, the module repository is completely cloned into the specified directory (it creates its own .git directory). In addition, the .gitmodules file in the main repository is created or updated.
2. Register the submodule in config
In addition, the new submodule has to be registered in the .git/config file. This is done by using the submodule init command.
> git submodule init
3. Select the submodule version
The workspace of the module repository is initially set to the HEAD of the default branch. To use another commit in the submodule, use the checkout command and select the appropriate version:
> cd sub
> git checkout v1.0
4. Add the .gitmodules file and subdirectories to the commit
When you add a submodule, the .gitmodules file will be created or updated and must be added to the commit. In addition, the new directory of the submodule must also be added:
> cd ..
> git add .gitmodules
> git add sub
5. Do a commit
Finally, you need to do a commit in the main repository.
> git commit -m "Submodule added"
If a repository that contains submodules is cloned, you must call the submodule init command. This command will transfer the URLs of the submodules into the .git/config file. Subsequently, calling the submodule update command will clone the module repositories into their directories.
Step by Step
Cloning a project with submodules
When you clone a repository that contains a submodule, initially only the main repository will be created in the workspace. The submodule must be explicitly initialized and updated.
1. Initialize the submodule
First, you must register the submodule using the submodule init command:
> git submodule init
2. Update the submodule
After the submodule is initialized in the Git configuration, you can use the submodule update command to completely download the submodule:
> git submodule update
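The two steps can also be combined. As a minimal sketch, assuming a reasonably recent Git: submodule update accepts an --init option, and clone accepts --recurse-submodules. The script below sets up throwaway repositories (the paths and the demo identity are made up for illustration) and clones the main repository in both ways.

```shell
#!/bin/sh
# Sketch: two ways to obtain a clone with its submodule checked out.
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Setup: a module repository and a main repository that embeds it.
git init -q "$work/sub"
echo "module" > "$work/sub/foo.txt"
git -C "$work/sub" add foo.txt
git -C "$work/sub" commit -q -m "Initial module commit"

git init -q "$work/main"
# The file protocol must be allowed explicitly for local submodule clones
# in recent Git versions.
git -C "$work/main" -c protocol.file.allow=always submodule add "$work/sub" sub
git -C "$work/main" commit -q -m "Submodule added"

# Variant 1: plain clone, then init and update in a single call.
git clone -q "$work/main" "$work/clone1"
git -C "$work/clone1" -c protocol.file.allow=always submodule update --init

# Variant 2: let clone initialize and update the submodules itself.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/main" "$work/clone2"
```

The protocol.file.allow setting is only needed here because the module repository is addressed by a local path. With a network URL it can usually be omitted.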
You can view the hash of a referenced commit in a submodule using the submodule status command. Any tag will be shown in brackets at the end of the output.
> git submodule status
091559ec65c0ded42556714c3e6936c3b1a90422 sub (v1.0)
Git always refers to exactly one commit in the module repository, and the hash of that commit is part of every commit in the main repository. It follows that new commits in the module repository are not automatically recorded in the main repository. This behavior is intentional: when you restore a project version in the main repository, you obtain the matching version of the module repository.
Step by Step
Using a new version in the submodule
There is a new version of the submodule that should be used. What do you do?
1. Update the submodule
First, bring the local workspace of the submodule to the desired state. Typically, you would start with a fetch command to get the latest commit in the module repository.
> cd sub
> git fetch
Next, specify the desired commit with the checkout command.
> git checkout v2.0
2. Use the new version
Finally, return to the main repository, add the submodule directory to the next commit and do a commit.
> cd ..
> git add sub
> git commit -m "New version of the submodule"
A new version of the module repository is used in the main repository only after you have explicitly recorded it in this way.
If you are working in the main repository and at the same time working in the module repository, then you have to commit changes to both repositories. If you have a central repository, both repositories must be transmitted separately using the push command.
Step by Step
Working with submodules
In a workspace, files in the main repository and files in the module repository have been changed. The main repository should then point to the new commit in the module repository.
1. Commit and push changes in the module repository
First, the changes in the module repository are completed with a commit and possibly transmitted with the push command to the central repository:
> cd sub
> git add foo.txt
> git commit -m "Changed submodule"
> git push
2. Commit and push changes in the main repository
Next, changes in the main repository, including the reference to the new commit in the module repository, are committed and, if necessary, transferred:
> cd ..
> git add bar.txt
> git add sub
> git commit -m "New version of submodule"
> git push
After every update to a workspace that contains submodules, you should call the submodule update command to get the correct versions of the submodules.
If an entirely new submodule has been added, then before calling the submodule update command you should call submodule init.
As a developer, it is sufficient to run the init-update sequence after each update to the workspace (checkout, merge, rebase, reset, pull).
Step by Step
Updating a submodule
If a new version of a submodule was recorded by another developer, then you should update your own local clone and workspace.
> git submodule init
> git submodule update
From /project/sub
   091559e..4722848  master     -> origin/master
 * [new tag]         v1.0       -> v1.0
 * [new tag]         v2.0       -> v2.0
Submodule path 'sub': checked out '472284843ce4c0b0bb503bc4921ab7...1e51'
The submodule init command transfers information from the .gitmodules file to the .git/config file only if there is no corresponding entry for the module yet. As such, the paths to the module repository can be adjusted locally. However, if another developer changes the official path in the .gitmodules file, the change will not be picked up by submodule init. The submodule sync command handles exactly this case. It updates the paths in the .git/config file and overwrites any local changes.
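The difference between init and sync can be observed directly. The following is a minimal sketch, assuming a reasonably recent Git. The repositories, the demo identity and the new location sub-moved are hypothetical and exist only for this illustration. A changed path in .gitmodules is simulated by editing the file in the clone.

```shell
#!/bin/sh
# Sketch: submodule init keeps an existing URL, submodule sync overwrites it.
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Setup: a module repository and a main repository that embeds it.
git init -q "$work/sub"
echo "module" > "$work/sub/foo.txt"
git -C "$work/sub" add foo.txt
git -C "$work/sub" commit -q -m "Initial module commit"

git init -q "$work/main"
git -C "$work/main" -c protocol.file.allow=always submodule add "$work/sub" sub
git -C "$work/main" commit -q -m "Submodule added"

# A developer clones the main repository and registers the submodule.
git clone -q "$work/main" "$work/clone"
cd "$work/clone"
git submodule init
before=$(git config --get submodule.sub.url)

# Simulate a changed official path in .gitmodules (hypothetical location).
git config -f .gitmodules submodule.sub.url "$work/sub-moved"

# init does not touch the existing entry in .git/config ...
git submodule init
after_init=$(git config --get submodule.sub.url)

# ... whereas sync copies the new URL from .gitmodules.
git submodule sync
after_sync=$(git config --get submodule.sub.url)

echo "before:     $before"
echo "after init: $after_init"
echo "after sync: $after_sync"
```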
Dependencies with Subtrees
With subtrees, module repositories can be embedded into a Git repository. For this purpose, a directory in the repository is associated with a commit, a tag or a branch in a module repository. Unlike submodules, however, the entire content of an embedded module repository is imported and not just referenced in the main repository. This makes the main repository self-sufficient.
Figure 11.2: The fundamentals of subtree
Figure 11.2 shows the basic structure. There are two repositories: main and sub. The sub directory in the main repository is linked with the module repository (using the subtree add command). In the main repository, under the sub directory, there are files from a version of the module repository.
Technically, the subtree add command imports all commits from the module repository to the main repository (See commits S1 and S2). Then, the current branch of the main repository is linked with the specified commit in the module repository (See merge commit G3). Internally, the subtree merge strategy (--strategy=subtree) is used. This results in a merge in the specified directory, so that the content of the module repository lands under the sub directory.
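The idea of importing a module's history and placing its content under a directory can be illustrated with core Git commands alone. The following is a rough sketch that follows the pattern of Git's subtree merge how-to. It is not necessarily what subtree add runs internally. It assumes a Git version that knows --allow-unrelated-histories, and the repositories, file names and demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: import a module repository under sub/ with a merge commit.
set -eu

# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Module repository with a single commit.
git init -q "$work/sub"
echo "module" > "$work/sub/foo.txt"
git -C "$work/sub" add foo.txt
git -C "$work/sub" commit -q -m "Initial module commit"

# Main repository with a single commit.
git init -q "$work/main"
cd "$work/main"
echo "main" > bar.txt
git add bar.txt
git commit -q -m "Initial main commit"

# Bring the module commits into the main repository's object store.
git fetch -q "$work/sub" HEAD

# Start a merge that records the module commit as second parent
# but leaves the main content untouched for now.
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD

# Place the module's files under the directory sub/ and finish the merge.
git read-tree --prefix=sub/ -u FETCH_HEAD
git commit -q -m "Import module under sub/"

# Number of parents of the resulting commit.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
echo "parents of merge commit: $parent_count"
```

After this, the module's files are ordinary files in the main repository, and the module's commits are part of the main repository's history.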
Step by Step
Embedding a subtree
To embed a module repository, you have to add it to the main repository using the subtree add command. (You only have to call subtree add once.) In this case, the subdirectory is specified as --prefix. In addition, the URL of the module repository and the desired tag or branch must also be specified:
> git subtree add --prefix=sub /global-path-to/sub v2.0
If the module repository’s history is not relevant in the context of the main repository, you can use the --squash option to only fetch the contents of the specified commit.
> git subtree add --squash --prefix=sub /global-path-to/sub master
As a result, a new merge commit is created. The hash of the imported module commit is recorded in the commit message, so that the correct module commit can be fetched at the next update.
Unlike submodules, when cloning a repository that has subtrees, there is nothing special to be observed. The normal clone command will pick up the main repository and all the module repositories it contains.
> git clone /path-to/main
Step by Step
Using a new version of subtree
There is another version of an embedded subtree to be used.
The subtree pull command updates an embedded subtree. The same parameters used for subtree add can be used for subtree pull. If a tag has been used with an add, a new tag must be used. If a branch was used, the same branch or a different branch can be specified. If there are no changes to the branch, subtree pull will do nothing.
> git subtree pull --prefix=sub /global-path-to/sub v2.1
By using the --squash option with a pull, you can also skip the module repository's history. In this case, no intermediate commits will be imported, only the one specified. You can also use the --squash option to return to an older version of the module repository, e.g., from v2.0 to v1.5.
> git subtree pull --squash --prefix=sub /global-path-to/sub master
Also with subtrees, it is possible to make changes in the embedded module directories. Here, nothing special needs to be done. You simply use the normal commit command. You can version changes in the main repository and one or more module directories in one commit.
Only when transferring the module changes back to the respective module repository do you need to take special precautions.
Step by Step
Propagating changes in a module repository
Changes in module directories are to be transferred to a module repository.
1. Separate changes in the module directory
First, use the subtree split command to separate changes in the module directory from other changes. This command will generate, based on the last known module repository commit, a new commit for each commit in which a module file has been changed. The result is a local branch which points to the new commits (for example, sub/master). If you do not use squash with the subtree add and subtree pull commands, use the --rejoin option. This will simplify the repeated invocation of split:
> git subtree split --rejoin --prefix sub --branch sub/master
2. Merge changes with the module repository
The local changes must be merged with the remote changes in the module repository. Therefore, first activate the newly created branch and then retrieve the latest version of the target branch. Afterward, both branches must be merged.
> git checkout sub/master
> git fetch /global-path-to/sub master
> git merge FETCH_HEAD
Note that a fetch with a URL creates the temporary reference FETCH_HEAD, which points to the most recent commit of the fetched branch. If you have defined a remote for the module repository, you can of course use the remote name instead of the URL. The target branch will then be available directly as a remote-tracking branch, not only as FETCH_HEAD.
3. Transfer changes to the module repository and delete the temporary branch
The local changes in the temporary branch must be pushed to the remote module repository. You can then switch back to the branch in the main repository and delete the temporary branch:
> git push /global-path-to/sub HEAD:master
> git checkout master
> git branch -d sub/master
As can be clearly seen from the above, most subtree operations are simpler than those of submodules. Only extracting the changes is similarly complex.
In many scenarios, however, no extraction is required, as you would work in the main repository and not in the module directory.
Summary