Most version control systems, such as CVS or Subversion, track a series of revisions and store just the changes between successive revisions of each file. This technique is meant to save storage space and overhead.
Internally, such systems spend a lot of time thinking about things like the series of changes between A and B. When you update your files from the central repository, for example, Subversion remembers that the last time you updated the file you were at revision r1095, but now the repository is at revision r1123. Thus, the server must send you the diff between r1095 and r1123. Once your Subversion client has these diffs, it can incorporate them into your working copy and produce r1123. (That’s how Subversion avoids sending you all the contents of all files every time you update.)
To save disk space, Subversion also stores its own repository as a series of diffs on the server. When you ask for the diffs between r1095 and r1123, it looks up all the individual diffs for each version between those two versions, merges them together into one large diff, and sends you the result. But Git doesn’t work like that.
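The delta-chain approach described above can be sketched roughly as follows. This is a toy model, not Subversion's actual storage format: the functions `apply_delta` and `merge_deltas`, the dictionary representation of a delta, and the sample history are all assumptions made for illustration.

```python
# Toy model of delta-based storage (NOT Subversion's real format).
# A delta maps a path to its new content, or to None if the file was deleted.

def apply_delta(snapshot, delta):
    """Return a new snapshot (path -> content) with one delta applied."""
    result = dict(snapshot)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)   # deletion
        else:
            result[path] = content   # addition or modification
    return result

def merge_deltas(deltas):
    """Fold a sequence of per-revision deltas into one combined delta."""
    combined = {}
    for delta in deltas:             # must visit every intermediate revision
        combined.update(delta)       # later changes override earlier ones
    return combined

# Hypothetical history: deltas[n] turns revision n into revision n + 1.
base = {"README": "v1", "main.c": "int main() {}"}
deltas = [
    {"README": "v2"},
    {"main.c": None, "app.c": "int main() { return 0; }"},
    {"README": "v3"},
]

combined = merge_deltas(deltas)      # the "one large diff" sent to the client
updated = apply_delta(base, combined)
```

Note that the cost of `merge_deltas` grows with the number of revisions between the two endpoints, which is the property the text contrasts with Git.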
In Git, as you’ve seen, each commit contains a tree, which is a list of files contained by that commit. Each tree is independent of all other trees. Git users still talk about diffs and patches, of course, since these are still extremely useful. Yet, in Git a diff and a patch are derived data, not the fundamental data they are in CVS or Subversion. If you look in the .git directory, you won’t find a single diff; if you look in a Subversion repository, it consists mostly of diffs.
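To make the idea of independent trees concrete, here is a minimal sketch of a content-addressed store in which every commit points at a complete tree. It is a simplification and does not reproduce Git's real object format: the `store`, `commit_snapshot`, and `tree_of` helpers and the way ids are hashed are assumptions for illustration only.

```python
import hashlib

objects = {}  # toy object store: id -> (kind, payload)

def store(kind, payload):
    """Save payload under a hash of its kind and content; return the id."""
    oid = hashlib.sha1(f"{kind}:{payload!r}".encode()).hexdigest()
    objects[oid] = (kind, payload)
    return oid

def commit_snapshot(files, parent=None):
    """Record a full tree for files (path -> text) and a commit pointing at it."""
    tree = tuple(sorted((path, store("blob", text)) for path, text in files.items()))
    tree_id = store("tree", tree)
    return store("commit", (tree_id, parent))

def tree_of(commit_id):
    """Return the complete path -> blob id listing for a commit."""
    tree_id, _parent = objects[commit_id][1]
    return dict(objects[tree_id][1])

first = commit_snapshot({"README": "v1", "main.c": "int main() {}"})
second = commit_snapshot({"README": "v2", "main.c": "int main() {}"}, parent=first)
```

Each commit's tree lists every file, so `tree_of(second)` can be read without consulting `first`. Unchanged content hashes to the same id, so the `main.c` entry is stored only once even though both trees mention it.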
Just as Subversion is able to derive the complete set of differences between r1095 and r1123, Git can retrieve and derive the differences between any two arbitrary states. But while Subversion must look at each version between r1095 and r1123, Git doesn’t care about the intermediate steps.
Each revision between the two has its own tree, but Git doesn’t need those intermediate trees to generate the diff; it can operate directly on snapshots of the complete state at each of the two versions.
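As a rough illustration of diffing two snapshots directly, the sketch below compares two complete states without looking at anything in between. The function name `diff_snapshots`, the dictionary representation, and the sample file contents are hypothetical; Git's real diff machinery works on its own tree objects and is considerably more involved.

```python
def diff_snapshots(old, new):
    """Compare two complete snapshots (path -> content); no history needed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}

# Hypothetical complete states at two arbitrary points in history.
snapshot_a = {"README": "v1", "main.c": "int main() {}", "Makefile": "all:"}
snapshot_b = {"README": "v3", "app.c": "int main() { return 0; }", "Makefile": "all:"}

changes = diff_snapshots(snapshot_a, snapshot_b)
```

The work done here depends only on the size of the two snapshots, not on how many revisions separate them.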
This simple difference in storage systems is one of the most important reasons that Git is so much faster than other revision control systems.