The Native Solution: gitlinks and git submodule

Git contains a command designed to work with submodules, called git submodule. I saved it for last for two reasons:

  • It is much more complicated than simply importing the history of subprojects into your main project’s repository.

  • It is fundamentally the same as the script-based solution just discussed, but more restrictive.

Even though it sounds like Git submodules should be the natural option, you should consider carefully before using them.

Git’s submodule support is evolving fast. The first mention of submodules in Git development history was by Linus Torvalds in April 2007, and there have been numerous changes since then. That makes it something of a moving target, so you should check git help submodule in your version of Git to find out if anything has changed since this book was written.

Unfortunately, the git submodule command is not very transparent; you won’t be able to use it effectively unless you understand exactly how it works. It’s a combination of two separate features: so-called gitlinks and the actual git submodule command.

gitlinks

A gitlink is a link from a tree object to a commit object.

Recall from Chapter 4 that each commit object points to a tree object and that each tree object points to a set of blobs and trees, which correspond (respectively) to files and subdirectories. A commit’s tree object uniquely identifies the exact set of files, filenames, and permissions attached to that commit. Also recall from Commit Graphs that the commits themselves are connected to each other in a directed acyclic graph, or DAG. Each commit object points to zero or more parent commits, and together they describe the history of your project.

But we haven’t yet seen a tree object pointing to a commit object. The gitlink is Git’s mechanism to indicate a direct reference to another Git repository.

Let’s try it out. As in Importing Subprojects with git pull -s subtree, we’ll create a myapp repository and import the Git source code into it:

$ cd /tmp
$ mkdir myapp
$ cd myapp

# Start the new super-project
$ git init
Initialized empty Git repository in /tmp/myapp/.git/

$ echo hello >hello.txt

$ git add hello.txt

$ git commit -m 'first commit'
[master (root-commit)]: created c3d9856: "first commit"
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 hello.txt

But this time, when we import the git project, we’ll do so directly; we won’t use git archive as we did last time:

$ ls
hello.txt

# Copy in a repository clone 
$ git clone ~/git.git git
Initialized empty Git repository in /tmp/myapp/git/.git/

$ cd git

# Establish the desired submodule version
$ git checkout v1.6.0
Note: moving to "v1.6.0" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at ea02eef... GIT 1.6.0

# Back to the super-project
$ cd ..

$ ls
git/  hello.txt

$ git add git

$ git commit -m 'imported git v1.6.0'
[master]: created b0814ac: "imported git v1.6.0"
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 160000 git

Because there already exists a directory called git/.git (created during the git clone), git add git knows to create a gitlink to it.

Warning

Normally, git add git and git add git/ (with the POSIX-compatible trailing slash indicating that git must be a directory) would be equivalent. But that’s not true if you want to create a gitlink! In the sequence we just showed, adding a slash to make the command git add git/ won’t create a gitlink at all; it will just add all the files in the git directory, which is probably not what you want.

Observe how the outcome of the preceding sequence differs from that of the related steps in Importing Subprojects with git pull -s subtree. In that section, the commit changed all the files in the repository. This time, the commit message shows that only one file changed. The resulting tree looks like this:

$ git ls-tree HEAD
160000 commit ea02eef096d4bfcbb83e76cfab0fcb42dbcad35e    git
100644 blob ce013625030ba8dba906f756967f9e9ca394464a      hello.txt

The git subdirectory is of type commit and has mode 160000. That makes it a gitlink.
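Since a gitlink is recognizable purely by its mode, you can pick gitlinks out of any tree yourself. The following is a minimal sketch, not part of Git: the list_gitlinks helper, the scratch directory, and the sub directory name are all made up for illustration, and the awk parsing assumes paths without whitespace.

```shell
# Scratch setup (hypothetical paths and names, for illustration only).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/super"
git init -q "$work/super/sub"
git -C "$work/super/sub" commit -q --allow-empty -m 'subproject commit'

cd "$work/super"
echo hello >hello.txt
git add hello.txt
git add sub 2>/dev/null   # no trailing slash; newer Git versions print an advisory here
git commit -q -m 'file plus gitlink'

# list_gitlinks REV: print "<commit> <path>" for every mode-160000 entry in REV.
# Hypothetical helper; assumes paths contain no whitespace.
list_gitlinks() {
    git ls-tree -r "$1" | awk '$1 == "160000" { print $3, $4 }'
}

gitlinks=$(list_gitlinks HEAD)
echo "$gitlinks"
```

The ordinary file hello.txt is skipped because its mode is 100644; only the entry of type commit is reported.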

Git usually treats gitlinks as simple pointer values or references to other repositories. Most Git operations, such as clone, do not dereference gitlinks and act on the submodule repository.

For example, if you push your project into another repository, it won’t push the submodule’s commit, tree, and blob objects. If you clone your super-project repository, the subproject repository directories will be empty.

In the following example, the git subproject directory remains empty after the clone command:

$ cd /tmp

$ git clone myapp app2
Initialized empty Git repository in /tmp/app2/.git/

$ cd app2

$ ls
git/  hello.txt

$ ls git

$ du git
4       git

Gitlinks have the important feature that they link to objects that are allowed to be missing from your repository. After all, they’re supposed to be part of some other repository.

It is exactly because the gitlinks are allowed to be missing that this technique even achieves one of the original goals: partial checkouts. You don’t have to check out every subproject; you can check out just the ones you need.
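To convince yourself that a gitlink really may point at an absent object, you can record one by hand. This is only a sketch under stated assumptions: it works in a throwaway directory and reuses the commit ID from the listing above purely as a well-formed 40-digit hex value; nothing is cloned.

```shell
# Throwaway repository (hypothetical location).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Any well-formed commit ID will do; the object does not exist in this repository.
link_target=ea02eef096d4bfcbb83e76cfab0fcb42dbcad35e

# Record a mode-160000 entry directly in the index, then commit it.
git update-index --add --cacheinfo 160000,"$link_target",git
git commit -q -m 'gitlink to a commit we do not have'

# The tree records the entry even though the object is missing.
gitlink_entry=$(git ls-tree HEAD git)
echo "$gitlink_entry"

# Check whether the target object is present in the local object store.
if git cat-file -e "$link_target" 2>/dev/null; then
    object_present=yes
else
    object_present=no
fi
echo "object present: $object_present"
```

Git accepts the commit without complaint, which is the property that makes partial checkouts possible.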

So now you know how to create a gitlink and that it’s allowed to be missing. But missing objects aren’t very useful by themselves. How do you get them back? That’s what the git submodule command is for.

The git submodule Command

At the time of this writing, the git submodule command is actually just a 700-line Unix shell script called git-submodule.sh. And if you’ve read this book all the way through, you now know enough to write that script yourself. Its job is simple: to follow gitlinks and check out the corresponding repositories for you.

First of all, you should be aware that there’s no particular magic involved in checking out a submodule’s files. In the app2 directory we just cloned, you could do it yourself:

$ cd /tmp/app2

$ git ls-files --stage -- git
160000 ea02eef096d4bfcbb83e76cfab0fcb42dbcad35e 0    git

$ rmdir git

$ git clone ~/git.git git
Initialized empty Git repository in /tmp/app2/git/.git/

$ cd git

$ git checkout ea02eef
Note: moving to "ea02eef" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at ea02eef... GIT 1.6.0

The commands you just ran are exactly equivalent to git submodule update. The only difference is that git submodule will do the tedious work for you, such as determining the correct commit ID to check out. Unfortunately, it doesn’t know how to do this without a bit of help:

$ git submodule update
No submodule mapping found in .gitmodules for path 'git'

The git submodule command needs to know one important bit of information before it can do anything: where can it find the repository for your submodule? It retrieves that information from a file called .gitmodules, which looks like this:

[submodule "git"]
        path = git
        url = /home/bob/git.git

Using the file is a two-step process. First, create the .gitmodules file, either by hand or with git submodule add. Because we created the gitlink using git add earlier, it’s too late now for git submodule add, so just create the file by hand:

$ cat >.gitmodules <<EOF
[submodule "git"]
        path = git
        url = /home/bob/git.git
EOF

Tip

The git submodule add command that performs the same operations is:

$ git submodule add /home/bob/git.git git

The git submodule add command will add an entry to the .gitmodules and populate a new Git repository with a clone of the added repository.
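The .gitmodules file uses the same syntax as Git’s other configuration files, so you can query it with git config and its --file option. The sketch below writes the example file into a scratch directory (an assumption for illustration) and reads the values back.

```shell
# Scratch directory (hypothetical); no repository is needed to read the file.
scratch=$(mktemp -d)
cd "$scratch"

cat >.gitmodules <<'EOF'
[submodule "git"]
        path = git
        url = /home/bob/git.git
EOF

# Read individual settings for the submodule named "git".
sub_path=$(git config --file .gitmodules --get submodule.git.path)
sub_url=$(git config --file .gitmodules --get submodule.git.url)
echo "$sub_path -> $sub_url"

# List the path of every submodule the file declares.
git config --file .gitmodules --get-regexp '^submodule\..*\.path$'
```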

Next, run git submodule init to copy the settings from the .gitmodules file into your .git/config file:

$ git submodule init
Submodule 'git' (/home/bob/git.git) registered for path 'git'

$ cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = /tmp/myapp
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master
[submodule "git"]
        url = /home/bob/git.git

The git submodule init command added only the last two lines.

The reason for this step is that you can reconfigure your local submodules to point at a different repository from the one in the official .gitmodules. If you make a clone of someone’s project that uses submodules, you might want to keep your own copy of the submodules and point your local clone at that. In that case, you wouldn’t want to change the module’s official location in .gitmodules, but you would want git submodule to look at your preferred location. So git submodule init copies any missing submodule information from .gitmodules into .git/config, where you can safely edit it. Just find the [submodule] section referring to the submodule you’re changing, and edit the URL.
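Rather than editing .git/config in a text editor, you can make the same change with git config. The following sketch assumes a throwaway repository with a placeholder gitlink, and /srv/mirrors/git.git is an invented mirror location; neither URL has to exist for this step.

```shell
# Throwaway super-project (hypothetical) with a placeholder gitlink named "git".
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir git   # mimic the empty directory a fresh clone would contain
git update-index --add --cacheinfo 160000,ea02eef096d4bfcbb83e76cfab0fcb42dbcad35e,git

cat >.gitmodules <<'EOF'
[submodule "git"]
        path = git
        url = /home/bob/git.git
EOF
git add .gitmodules
git commit -q -m 'declare the git submodule'

# Copy the official URL from .gitmodules into .git/config.
git submodule --quiet init
official_url=$(git config --get submodule.git.url)

# Point this clone at a private mirror (made-up path); .gitmodules is untouched.
git config submodule.git.url /srv/mirrors/git.git
local_url=$(git config --get submodule.git.url)
tracked_url=$(git config --file .gitmodules --get submodule.git.url)

echo "official: $official_url"
echo "local:    $local_url"
echo "tracked:  $tracked_url"
```

Because the override lives only in .git/config, it is never committed and never affects anyone else’s clone.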

Finally, run git submodule update to actually update the files or, if needed, to clone the initial subproject repository:

# Force a complete new clone by removing what's there
$ rm -rf git

$ git submodule update
Initialized empty Git repository in /tmp/app2/git/.git/
Submodule path 'git': checked out 'ea02eef096d4bfcbb83e76cfab0fcb42dbcad35e'

Here, git submodule update goes to the repository pointed to in your .git/config, fetches the commit ID found in git ls-tree HEAD -- git, and checks out that revision in the directory specified by the path setting in .gitmodules.
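Those steps are simple enough to sketch by hand. The checkout_gitlink function below is a hypothetical helper, not how git submodule is actually implemented: it assumes the submodule’s name equals its path, handles only the initial clone, and does no error recovery. The upstream, super, and app2 repositories and the lib directory are scratch names invented for the demonstration.

```shell
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# checkout_gitlink PATH: clone the URL recorded in .git/config for PATH,
# then check out the commit that the gitlink at PATH records.
checkout_gitlink() {
    path=$1
    url=$(git config --get "submodule.$path.url") || return 1
    commit=$(git ls-tree HEAD -- "$path" | awk '$2 == "commit" { print $3 }')
    [ -n "$commit" ] || return 1
    git clone -q --no-checkout "$url" "$path" &&
        git -C "$path" checkout -q "$commit"
}

# Demonstration in scratch repositories (all names hypothetical).
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m 'upstream commit'

git init -q "$work/super"
cd "$work/super"
git clone -q "$work/upstream" lib
git add lib 2>/dev/null   # creates the gitlink; newer Git prints an advisory
git commit -q -m 'gitlink to lib'

# A fresh clone of the super-project has an empty lib directory.
git clone -q "$work/super" "$work/app2"
cd "$work/app2"
git config submodule.lib.url "$work/upstream"

checkout_gitlink lib
checked_out=$(git -C lib rev-parse HEAD)
echo "lib is at $checked_out"
```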

There are a few other things you need to know:

  • When you switch branches or git pull someone else’s branch, you always need to run git submodule update to obtain a matching set of submodules. This isn’t automatic, since it could cause you to lose work in the submodule by mistake.

  • If you switch to another branch and don’t issue git submodule update, Git will think you have deliberately changed your submodule directory to point at a new commit (when really it was the old commit you were using before). If you then git commit -a, you will accidentally change the gitlink. Be careful!

  • You can update an existing gitlink by simply checking out the right version of a submodule, executing git add on the submodule directory, and then running git commit. You don’t use the git submodule command for that.

  • If you have updated and committed a gitlink on your branch and if you git pull or git merge another branch that updates the same gitlink differently, Git doesn’t know how to represent this as a conflict and will just pick one or the other. You must remember to resolve conflicted gitlinks by yourself.
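The third point, updating a gitlink without git submodule, can be sketched as follows. The repositories live in a scratch directory and the sub name is made up for illustration; empty commits stand in for real work in the subproject.

```shell
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch super-project with a nested repository (hypothetical names).
work=$(mktemp -d)
git init -q "$work/super"
git init -q "$work/super/sub"
cd "$work/super"
git -C sub commit -q --allow-empty -m 'sub: first'
git add sub 2>/dev/null
git commit -q -m 'link sub at its first commit'
old_link=$(git rev-parse HEAD:sub)

# Move the submodule to a different commit.
git -C sub commit -q --allow-empty -m 'sub: second'

# The super-project now sees the gitlink as modified. This is the same state
# you end up in after switching branches without running git submodule update.
git diff --quiet -- sub || echo "gitlink differs from the checked-out submodule"

# Record the new commit in the gitlink with plain git add and git commit.
git add sub
git commit -q -m 'advance the sub gitlink'
new_link=$(git rev-parse HEAD:sub)

echo "gitlink moved from $old_link to $new_link"
```

Running the git diff check before any git commit -a is a cheap way to notice a gitlink change you did not intend.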

As you can see, the use of gitlinks and git submodule is quite complex. Fundamentally, the gitlink concept can represent perfectly how your submodules relate to your main project, but actually making use of that information is a lot harder than it sounds.

When considering how you want to use submodules in your own project, you need to consider carefully: is it worth the complexity? Note that git submodule is a standalone command like any other, and it doesn’t make the process of maintaining submodules any simpler than, say, writing your own submodule scripts or using the ext package described at the end of the previous section. Unless you have a real need for the flexibility that git submodule provides, you might consider using one of the simpler methods.

Even so, I fully expect that the Git development community will address the shortfalls and issues with the git submodule command, ultimately leading to a technically correct and very usable solution.
