The backup package

We are first going to write the backup package, of which we will become the first customer when we write the associated tools. The package will be responsible for deciding whether directories have changed and need backing up or not as well as actually performing the backup procedure.

Considering obvious interfaces first

One of the early things to think about when embarking on a new Go program is whether any interfaces stand out to you. We don't want to over-abstract or waste too much time upfront designing something that we know will change as we start to code, but that doesn't mean we shouldn't look for obvious concepts that are worth pulling out. If you're not sure, that is perfectly acceptable; you should write your code using concrete types and revisit potential abstractions after you have actually solved the problems.

However, since our code will archive files, the Archiver interface pops out as a candidate.

Create a new folder inside your GOPATH/src folder called backup, and add the following archiver.go code:

package backup 
type Archiver interface { 
  Archive(src, dest string) error 
} 

An Archiver interface will specify a method called Archive, which takes source and destination paths and returns an error. Implementations of this interface will be responsible for archiving the source folder and storing it in the destination path.

Note

Defining an interface up front is a nice way to get some concepts out of our heads and into the code; it doesn't mean that this interface can't change as we evolve our solution as long as we remember the power of simple interfaces. Also, remember that most of the I/O interfaces in the io package expose only a single method.

From the very beginning, we have made the case that while we are going to implement ZIP files as our archive format, we could easily swap this out later with another kind of Archiver format.
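To make the idea of swapping implementations concrete, here is a minimal sketch. The dryRun type and the backupAll helper are hypothetical names invented for this illustration and are not part of the backup package; the sketch only assumes the single-method Archiver interface shown earlier.

```go
package main

import "fmt"

// Archiver mirrors the interface defined in archiver.go.
type Archiver interface {
	Archive(src, dest string) error
}

// dryRun is a hypothetical Archiver that only records what it was asked
// to do; it never touches the filesystem.
type dryRun struct {
	calls []string
}

// Archive notes the request and reports success.
func (d *dryRun) Archive(src, dest string) error {
	d.calls = append(d.calls, src+" -> "+dest)
	return nil
}

// backupAll archives every source using whichever Archiver it is given.
// It depends only on the interface, never on a concrete type.
func backupAll(a Archiver, dest string, srcs ...string) error {
	for _, src := range srcs {
		if err := a.Archive(src, dest); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	d := &dryRun{}
	if err := backupAll(d, "out", "a", "b"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.calls) // [a -> out b -> out]
}
```

Because backupAll accepts any Archiver, a ZIP-based implementation could be passed in its place without changing the calling code.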

Testing interfaces by implementing them

Now that we have the interface for our Archiver types, we are going to implement one that uses the ZIP file format.

Add the following struct definition to archiver.go:

type zipper struct{} 

We are not going to export this type, which might make you jump to the conclusion that users outside of the package won't be able to make use of it. In fact, we are going to provide them with an instance of the type to use, which saves them from having to worry about creating and managing their own instances.

Add the following exported implementation:

// ZIP is an Archiver that zips and unzips files. 
var ZIP Archiver = (*zipper)(nil) 

This curious snippet of Go voodoo is actually a very interesting way of exposing our intent to the compiler without allocating a zipper value. We are defining a variable called ZIP of type Archiver, so from outside the package it's clear that we can use that variable wherever an Archiver is needed in order to zip things. We then assign it nil converted to the type *zipper. A nil pointer has nothing to point at, so no zipper is ever created, and because our zipper struct has no state, its methods work perfectly well with a nil receiver. This hides the complexity of the code (and indeed the actual implementation) from outside users. There is no reason anybody outside of the package needs to know about our zipper type at all, which frees us up to change the internals at any time without touching the externals: the true power of interfaces.

Another handy side benefit to this trick is that the compiler will now be checking whether our zipper type properly implements the Archiver interface or not, so if you try to build this code, you'll get a compiler error:

./archiver.go:10: cannot use (*zipper)(nil) (type *zipper) as type 
    Archiver in assignment:
  *zipper does not implement Archiver (missing Archive method)

We see that our zipper type does not implement the Archive method as mandated in the interface.

Tip

You can also use this technique in test code to ensure that your types implement the interfaces they should. If you don't need to use the variable, you can always throw it away using an underscore and you'll still get the compiler help:

var _ Interface = (*Implementation)(nil)
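As a small runnable sketch of both forms, the following program uses a hypothetical Greeter interface and english type (neither is part of the backup package) to show the throwaway assertion alongside an exported nil-pointer value:

```go
package main

import "fmt"

// Greeter is a hypothetical interface used only for this illustration.
type Greeter interface {
	Greet() string
}

// english is a stateless implementation, much like zipper.
type english struct{}

// Greet uses a pointer receiver but never dereferences it, so it is
// safe to call on a nil *english.
func (e *english) Greet() string { return "hello" }

// Compile-time assertion: the build fails if *english ever stops
// satisfying Greeter. The blank identifier discards the value.
var _ Greeter = (*english)(nil)

// English exposes the implementation without exporting the type.
var English Greeter = (*english)(nil)

func main() {
	fmt.Println(English.Greet()) // hello
}
```

Removing the Greet method from english would make both var declarations fail to compile, which is exactly the early warning the tip describes.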

To make the compiler happy, we are going to add the implementation of the Archive method for our zipper type.

Add the following code to archiver.go:

func (z *zipper) Archive(src, dest string) error { 
  // Make sure the destination directory exists. 
  if err := os.MkdirAll(filepath.Dir(dest), 0777); err != nil { 
    return err 
  } 
  out, err := os.Create(dest) 
  if err != nil { 
    return err 
  } 
  defer out.Close() 
  w := zip.NewWriter(out) 
  defer w.Close() 
  return filepath.Walk(src, func(path string, info os.FileInfo, err error) error { 
    // Check err first: info may be nil when err is non-nil. 
    if err != nil { 
      return err 
    } 
    if info.IsDir() { 
      return nil // skip 
    } 
    in, err := os.Open(path) 
    if err != nil { 
      return err 
    } 
    defer in.Close() 
    f, err := w.Create(path) 
    if err != nil { 
      return err 
    } 
    _, err = io.Copy(f, in) 
    if err != nil { 
      return err 
    } 
    return nil 
  }) 
} 

You will also have to import the archive/zip, io, os, and path/filepath packages from the Go standard library. In our Archive method, we take the following steps to prepare for writing to a ZIP file:

  • Use os.MkdirAll to ensure that the destination directory exists. The 0777 value is the file permission mode (before the process umask is applied) with which any missing directories will be created
  • Use os.Create to create a new file as specified by the dest path
  • If the file is created without an error, defer the closing of the file with defer out.Close()
  • Use zip.NewWriter to create a new zip.Writer type that will write to the file we just created and defer the closing of the writer

Once we have a zip.Writer type ready to go, we use the filepath.Walk function to iterate over the source directory, src.

The filepath.Walk function takes two arguments: the root path and a callback function to be called for every item (files and folders) it encounters while iterating over the filesystem.

Tip

Functions are first-class values in Go, which means you can pass them as arguments and store them in variables, as well as declare them as package-level functions and methods. The filepath.Walk function specifies the second argument type as filepath.WalkFunc, which is a function type with a specific signature. As long as we adhere to the signature (correct input and return arguments), we can write inline functions rather than worrying about the filepath.WalkFunc type at all.

Taking a quick look at the Go source code tells us that the signature for filepath.WalkFunc matches the function we are passing in: func(path string, info os.FileInfo, err error) error.

The filepath.Walk function is recursive, so it will travel deep into subfolders too. The callback function itself takes three arguments: the full path of the file, the os.FileInfo object that describes the file or folder itself, and an error (it also returns an error in case something goes wrong). If any calls to the callback function result in an error (other than the special SkipDir error value) being returned, the operation will be aborted and filepath.Walk returns that error. We simply pass this up to the caller of Archive and let them worry about it, since there's nothing more we can do.

For each item in the tree, our code takes the following steps:

  • If an error is passed in (via the third argument), it means something went wrong when trying to access information about the file. This is uncommon, so we just return the error, which will eventually be passed out to the caller of Archive. It is safest to make this check before touching info, because info may be nil when an error is set. As the author of the callback, you aren't forced to abort the operation here; you are free to do whatever makes sense in your individual case.
  • If the info.IsDir method tells us that the item is a folder, we just return nil, effectively skipping it. There is no reason to add folders to ZIP archives because the path of the files will encode that information for us.
  • Use os.Open to open the source file for reading, and if successful, defer its closing.
  • Call Create on the zip.Writer object to indicate that we want to create a new compressed file and give it the full path of the file, which includes the directories it is nested inside.
  • Use io.Copy to read all of the bytes from the source file and write them through the zip.Writer object to the ZIP file we opened earlier.
  • Return nil to indicate no errors.
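The Create and io.Copy steps above can be tried in isolation, without touching the filesystem. The following is a sketch under that assumption: zipStrings and readEntry are hypothetical helpers, the archive lives in a bytes.Buffer, and the source "files" are plain strings.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// zipStrings writes each name/content pair into a ZIP archive in memory.
func zipStrings(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, content := range files {
		f, err := w.Create(name) // start a new entry in the archive
		if err != nil {
			return nil, err
		}
		if _, err := io.Copy(f, strings.NewReader(content)); err != nil {
			return nil, err
		}
	}
	// Close writes the central directory; the archive is not complete
	// until this succeeds.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readEntry returns the content of the named entry in a ZIP archive.
func readEntry(data []byte, name string) (string, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	for _, f := range r.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		b, err := io.ReadAll(rc)
		return string(b), err
	}
	return "", fmt.Errorf("entry %q not found", name)
}

func main() {
	data, err := zipStrings(map[string]string{"docs/readme.txt": "hello"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	content, err := readEntry(data, "docs/readme.txt")
	fmt.Println(content, err) // hello <nil>
}
```

Note that the entry name carries the folder structure (docs/readme.txt), which is why folders themselves do not need separate entries.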

This chapter will not cover unit testing or Test-driven Development (TDD) practices, but feel free to write a test to ensure that our implementation does what it is meant to do.

Tip

Since we are writing a package, spend some time commenting on the exported pieces so far. You can use golint to help you find anything you may have missed.

Has the filesystem changed?

One of the biggest problems our backup system has is deciding whether a folder has changed or not in a cross-platform, predictable, and reliable way. After all, there's no point in creating a backup if nothing is different from the previous backup. A few things spring to mind when we think about this problem: should we just check the last modified date on the top-level folder? Should we use system notifications to be informed whenever a file we care about changes? There are problems with both of these approaches, and it turns out it's not a simple problem to solve.

Tip

Check out the fsnotify project at https://fsnotify.org (project source: https://github.com/fsnotify). The authors are attempting to build a cross-platform package for subscription to filesystem events. At the time of writing this, the project is still in its infancy and is not a viable option for this chapter, but in the future, it could well become the standard solution for filesystem events.

We are, instead, going to generate an MD5 hash made up of all of the information that we care about when considering whether something has changed or not.

Looking at the os.FileInfo type, we can see that we can find out a lot of information about a file or folder:

type FileInfo interface { 
  Name() string       // base name of the file 
  Size() int64        // length in bytes for regular files; system-dependent for others 
  Mode() FileMode     // file mode bits 
  ModTime() time.Time // modification time 
  IsDir() bool        // abbreviation for Mode().IsDir() 
  Sys() interface{}   // underlying data source (can return nil) 
} 

To ensure we are aware of a variety of changes to any file in a folder, the hash will be made up of the filename and path (so if they rename a file, the hash will be different), size (if a file changes size, it's obviously different), the last modified date, whether the item is a file or folder, and the file mode bits. Even though we won't be archiving the folders, we still care about their names and the tree structure of the folder.

Create a new file called dirhash.go and add the following function:

package backup 
import ( 
  "crypto/md5" 
  "fmt" 
  "io" 
  "os" 
  "path/filepath" 
) 
// DirHash returns an MD5 hash of the metadata of every file and folder 
// found under path. 
func DirHash(path string) (string, error) { 
  hash := md5.New() 
  err := filepath.Walk(path, func(path string, info os.FileInfo, err error) error { 
    if err != nil { 
      return err 
    } 
    io.WriteString(hash, path) 
    fmt.Fprintf(hash, "%v", info.IsDir()) 
    fmt.Fprintf(hash, "%v", info.ModTime()) 
    fmt.Fprintf(hash, "%v", info.Mode()) 
    fmt.Fprintf(hash, "%v", info.Name()) 
    fmt.Fprintf(hash, "%v", info.Size()) 
    return nil 
  }) 
  if err != nil { 
    return "", err 
  } 
  return fmt.Sprintf("%x", hash.Sum(nil)), nil 
} 

We first create a new hash.Hash value that knows how to calculate MD5s, before using filepath.Walk again to iterate over all of the files and folders inside the specified path directory. For each item, assuming there are no errors, we write the differential information to the hash generator. We use io.WriteString, which lets us write a string to an io.Writer, and fmt.Fprintf, which does the same but exposes formatting capabilities at the same time, allowing us to generate the default value format for each item using the %v format verb.

Once each file has been processed, and assuming no errors occurred, we then use fmt.Sprintf to generate the result string. The Sum method in hash.Hash appends the current hash to the byte slice it is given and returns the resulting slice. In our case, we have no prefix that we want the hash appended to, so we just pass nil and get back the hash on its own. The %x format verb indicates that we want the value to be represented in hex (base 16) with lowercase letters. This is the usual way of representing an MD5 hash.
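A short standalone sketch makes the hashing steps easier to experiment with. The hexMD5 helper is a hypothetical name used only here; it writes several strings into one running hash and formats the result exactly as DirHash does.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"io"
)

// hexMD5 feeds each part into a single running hash and returns the
// digest as lowercase hex.
func hexMD5(parts ...string) string {
	h := md5.New()
	for _, p := range parts {
		io.WriteString(h, p) // writes to a hash.Hash never return an error
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(hexMD5("hello"))      // 5d41402abc4b2a76b9719d911017c592
	fmt.Println(hexMD5("hel", "lo")) // same digest: only the byte stream matters

	// Sum appends the digest to the slice it is given.
	h := md5.New()
	io.WriteString(h, "hello")
	prefixed := h.Sum([]byte("md5:"))
	fmt.Printf("%s%x\n", prefixed[:4], prefixed[4:]) // md5:5d41402abc4b2a76b9719d911017c592
}
```

The second call shows that the hash sees one continuous stream of bytes, so the boundaries between the individual writes are not recorded.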

Checking for changes and initiating a backup

Now that we have the ability to hash a folder and perform a backup, we are going to put the two together in a new type called Monitor. The Monitor type will have a map of paths with their associated hashes, a reference to any Archiver type (of course, we'll use backup.ZIP for now), and a destination string representing where to put the archives.

Create a new file called monitor.go and add the following definition:

package backup 

// Monitor checks a set of paths for changes and archives any that differ. 
type Monitor struct { 
  Paths       map[string]string 
  Archiver    Archiver 
  Destination string 
} 

In order to trigger a check for changes, we are going to add the following Now method:

func (m *Monitor) Now() (int, error) { 
  var counter int 
  for path, lastHash := range m.Paths { 
    newHash, err := DirHash(path) 
    if err != nil { 
      return counter, err 
    } 
    if newHash != lastHash { 
      err := m.act(path) 
      if err != nil { 
        return counter, err 
      } 
      m.Paths[path] = newHash // update the hash 
      counter++ 
    } 
  } 
  return counter, nil 
} 

The Now method iterates over every path in the map and generates the latest hash of that folder. If the hash does not match the hash from the map (generated the last time it checked), then it is considered to have changed and needs backing up again. We do this with a call to the as-yet-unwritten act method before then updating the hash in the map with this new hash.
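The compare-and-update loop can be sketched on its own, away from the filesystem. In the following simplified variant, detect and hashFunc are hypothetical names: the hash function is passed in as a stand-in for DirHash, and the list of changed paths is returned instead of triggering an archive.

```go
package main

import (
	"fmt"
	"sort"
)

// hashFunc stands in for DirHash so the sketch runs without touching disk.
type hashFunc func(path string) (string, error)

// detect compares each stored hash with a fresh one, updates the map for
// every path that changed, and returns those paths in sorted order.
func detect(paths map[string]string, hash hashFunc) ([]string, error) {
	var changed []string
	for path, last := range paths {
		now, err := hash(path)
		if err != nil {
			return changed, err
		}
		if now != last {
			paths[path] = now // assigning to an existing key while ranging is safe
			changed = append(changed, path)
		}
	}
	sort.Strings(changed) // map iteration order is random
	return changed, nil
}

func main() {
	// current plays the role of the filesystem's present state.
	current := map[string]string{"/a": "h1", "/b": "h2"}
	hash := func(p string) (string, error) { return current[p], nil }

	// An empty stored hash means "never backed up".
	paths := map[string]string{"/a": "", "/b": "h2"}

	first, _ := detect(paths, hash)
	second, _ := detect(paths, hash)
	fmt.Println(first, second) // [/a] []
}
```

The second call reports nothing because the first call stored the new hash, which is the same reason Now only backs up a folder once per change.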

To give our users a high-level indication of what happened when they called Now, we are also maintaining a counter, which we increment every time we back up a folder. We will use this later to keep our end users up to date on what the system is doing without bombarding them with information. If you try to build the package at this point, you will see the following error:

m.act undefined (type *Monitor has no field or method act) 

The compiler is helping us again and reminding us that we have yet to add the act method. Add it to monitor.go, importing the fmt, path/filepath, and time packages that it relies on:

func (m *Monitor) act(path string) error { 
  dirname := filepath.Base(path) 
  filename := fmt.Sprintf("%d.zip", time.Now().UnixNano()) 
  return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, filename)) 
} 

Because we have done the heavy lifting in our ZIP Archiver type, all we have to do here is generate a filename, decide where the archive will go, and call the Archive method.

Tip

If the Archive method returns an error, the act method and then the Now method will each return it. This mechanism of passing errors up the chain is very common in Go and allows you to either handle cases where you can do something useful to recover or else defer the problem to somebody else.

The act method in the preceding code uses time.Now().UnixNano() to generate a timestamp filename and hardcodes the .zip extension.

Hardcoding is OK for a short while

Hardcoding the file extension like we have is OK in the beginning, but if you think about it, we have blended concerns a little here. If we change the Archiver implementation to use RAR or a compression format of our making, the .zip extension would no longer be appropriate.

Tip

Before reading on, think about what steps you might take to avoid this hardcoding. Where does the filename extension decision live? What changes would you need to make in order to avoid hardcoding?

The right place for the filename extension decision is probably the Archiver interface, since it knows the kind of archiving it will be doing. So we could add an Ext() string method and access that from our act method. But we can add a little extra power with not much extra work by allowing Archiver authors to specify the entire filename format rather than just the extension.

Back in archiver.go, update the Archiver interface definition:

type Archiver interface { 
  DestFmt() string 
  Archive(src, dest string) error 
} 

Our zipper type needs to now implement this:

func (z *zipper) DestFmt() string { 
  return "%d.zip" 
} 

Now that the Archiver interface can supply the whole format string, update the act method to use it:

func (m *Monitor) act(path string) error { 
  dirname := filepath.Base(path) 
  filename := fmt.Sprintf(m.Archiver.DestFmt(), time.Now().UnixNano()) 
  return m.Archiver.Archive(path, filepath.Join(m.Destination, dirname, 
  filename)) 
} 