Comparing How Subversion and Git Derive Diffs

Most revision control systems, such as CVS or Subversion, track a series of revisions and store just the changes between each pair of consecutive revisions. This technique is meant to save storage space and overhead.

Internally, such systems spend a lot of time thinking about things like the series of changes between A and B. When you update your files from the central repository, for example, Subversion remembers that the last time you updated the file you were at revision r1095, but now the repository is at revision r1123. Thus, the server must send you the diff between r1095 and r1123. Once your Subversion client has these diffs, it can incorporate them into your working copy and produce r1123. (That’s how Subversion avoids sending you all the contents of all files every time you update.)

To save disk space, Subversion also stores its own repository as a series of diffs on the server. When you ask for the diffs between r1095 and r1123, it looks up all the individual diffs for each version between those two versions, merges them together into one large diff, and sends you the result. But Git doesn’t work like that.
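The lookup-and-merge step described above can be sketched with a toy model. This is only an illustration, not Subversion's actual storage format or wire protocol: the per-revision `deltas` table, the small revision numbers, and the helpers `compose` and `apply_delta` are all hypothetical, and each delta simply maps a path to its new content (or to `None` if the path was deleted).

```python
# Toy model (assumption): a delta maps a path to its new content,
# or to None when that revision deleted the path.
def compose(deltas):
    """Merge consecutive per-revision deltas into one combined delta."""
    combined = {}
    for delta in deltas:
        combined.update(delta)  # a later revision overrides an earlier one
    return combined


def apply_delta(snapshot, delta):
    """Return a new working copy with the combined delta applied."""
    result = dict(snapshot)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)  # the path was deleted
        else:
            result[path] = content
    return result


# Hypothetical repository history: one delta per revision after r1.
deltas = {
    2: {"README": "v2"},
    3: {"main.c": "int main(void) { return 0; }"},
    4: {"README": None},
}

working_copy = {"README": "v1"}  # the client is at r1
# The server must visit every revision between r1 and r4.
update = compose(deltas[rev] for rev in range(2, 5))
updated = apply_delta(working_copy, update)
```

Note that the cost of `compose` grows with the number of revisions between the two endpoints, which is the property the text contrasts with Git.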

In Git, as you’ve seen, each commit contains a tree, which is a list of files contained by that commit. Each tree is independent of all other trees. Git users still talk about diffs and patches, of course, since these are still extremely useful. Yet, in Git a diff and a patch are derived data, not the fundamental data they are in CVS or Subversion. If you look in the .git directory, you won’t find a single diff; if you look in a Subversion repository, it consists mostly of diffs.

Just as Subversion is able to derive the complete set of differences between r1095 and r1123, Git can retrieve and derive the differences between any two arbitrary states. But while Subversion must look at each version between r1095 and r1123, Git doesn’t care about the intermediate steps.

Each intermediate revision has its own tree, but Git doesn’t need any of those trees to generate the diff; Git can operate directly on snapshots of the complete state at each of the two versions.
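A minimal sketch of comparing two snapshots directly, with no reference to anything in between, might look like the following. It is a simplification under stated assumptions: the helpers `snapshot` and `diff_snapshots` are hypothetical, each tree is flattened into a single path-to-ID mapping (real Git trees are nested, one per directory), and only the fact of a change is reported, not a line-by-line patch. The `blob_id` function follows the way Git hashes blob contents with SHA-1.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash file contents the way Git names a blob: SHA-1 over a header plus the data."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def snapshot(files):
    """Flattened stand-in for a tree (assumption): map each path to its blob ID."""
    return {path: blob_id(data) for path, data in files.items()}


def diff_snapshots(old, new):
    """Compare two snapshots directly; no intermediate revisions are consulted."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes[path] = "deleted"
        elif path not in old:
            changes[path] = "added"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


# Two arbitrary states; however many commits lie between them is irrelevant.
old = snapshot({"README": b"v1\n", "util.c": b"/* helpers */\n"})
new = snapshot({
    "util.c": b"/* helpers */\n",
    "main.c": b"int main(void) { return 0; }\n",
})
changes = diff_snapshots(old, new)
```

Because unchanged files have identical IDs in both snapshots, they are skipped by a simple equality check, and the work depends only on the two endpoints.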

This simple difference in storage models is one of the most important reasons that Git is so much faster than other revision control systems at comparing arbitrary revisions.
