Visualizing the filesystem tree using a polar bar

This recipe tackles a "real-world" task: using matplotlib to visualize how much disk space each of our directories occupies.

In this recipe, you will learn how to visualize a filesystem tree with relative sizes.

Getting ready

Large hard drives tend to fill up with data that we forget about. It would be nice to see what is inside a given directory and what takes up the most space in it.

Although there are many more sophisticated and elaborate software products for this job, we want to demonstrate how this is achievable using Python and matplotlib.

How to do it...

Let's perform the following steps:

  1. Implement a few helper functions to deal with folder discovery and internal data structures.
  2. Implement the main function, draw(), that does the plotting.
  3. Implement the main program body that verifies the user input arguments. The helper functions and draw() come first:
    import os
    import sys
    
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import numpy as np
    
    def build_folders(start_path):
        folders = []
    
        for each in get_directories(start_path):
            size = get_size(each)
            # skip folders smaller than 25 MB
            if size >= 25 * 1024 * 1024:
                folders.append({'size' : size, 'path' : each})
    
        for each in folders:
            print("Path: " + os.path.basename(each['path']))
            print("Size: " + str(each['size'] / 1024 / 1024) + " MB")
        return folders
    
    def get_size(path):
        assert path is not None
    
        total_size = 0
        for dirpath, dirnames, filenames in os.walk(path):
            for f in filenames:
                fp = os.path.join(dirpath, f)
                try:
                    size = os.path.getsize(fp)
                    total_size += size
                    #print("Size of '{0}' is {1}".format(fp, size))
                except OSError as err:
                    # report files whose size cannot be read and keep going
                    print(str(err))
        return total_size
    
    def get_directories(path):
        dirs = set()
        for dirpath, dirnames, filenames in os.walk(path):
            dirs = set([os.path.join(dirpath, x) for x in dirnames])
            break # we just want the first one
        return dirs
    
    def draw(folders):
        """ Draw folder size for given folder"""
        figsize = (8, 8)  # keep the figure square
        ldo, rup = 0.1, 0.8  # leftdown and right up normalized
        fig = plt.figure(figsize=figsize)
        ax = fig.add_axes([ldo, ldo, rup, rup], polar=True)
    
        # transform data
        x = [os.path.basename(x['path']) for x in folders]
        y = [y['size'] / 1024 / 1024 for y in folders]
        theta = np.arange(0.0, 2 * np.pi, 2 * np.pi / len(x))
        radii = y
    
        bars = ax.bar(theta, radii)
        middle = 90/len(x)
        theta_ticks = [t*(180/np.pi)+middle for t in theta]
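        # frac=0.5 places the labels at half the axes radius; newer
        # matplotlib releases no longer accept frac, so drop it there
        lines, labels = plt.thetagrids(theta_ticks, labels=x, frac=0.5)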
        for step, each in enumerate(labels):
            each.set_rotation(theta[step]*(180/np.pi)+ middle)
            each.set_fontsize(8)
    
        # configure bars
        colormap = lambda r:cm.Set2(r / len(x))
        for r, each in zip(radii, bars):
            each.set_facecolor(colormap(r))
            each.set_alpha(0.5)
    
        plt.show()
  4. Next, we will implement the main program body where we verify the input arguments given by the user when the program is called from the command line:
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            print("ERROR: Please supply path to folder.")
            sys.exit(-1)
    
        start_path = sys.argv[1]
    
        if not os.path.exists(start_path):
            print("ERROR: Path must exist.")
            sys.exit(-1)
    
        folders = build_folders(start_path)
    
        if len(folders) < 1:
            print("ERROR: Path does not contain any folders.")
            sys.exit(-1)
    
        draw(folders)
  5. You need to run the following from the command line:
    $ python ch04_rec11_filesystem.py /usr/
    
  6. It will produce a plot similar to this one:
    [Figure: polar bar chart of folder sizes]

How it works...

We start from the bottom of the code, after if __name__ == '__main__', because this is where our program starts executing.

Using the sys module, we pick up the command-line argument, which is the path to the directory we want to visualize. If the argument is missing or the path does not exist, the program prints an error and exits.
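As a minimal sketch, the same argument checks can be wrapped in a small function so that they can be exercised without running the whole script. The name parse_start_path is hypothetical and not part of the recipe, and raising ValueError instead of calling sys.exit is an assumption made here for illustration.

```python
import os


def parse_start_path(argv):
    """Return the folder path from argv, or raise ValueError.

    Hypothetical helper (not in the recipe). argv is expected to look
    like sys.argv, that is [script_name, start_path].
    """
    if len(argv) != 2:
        raise ValueError("Please supply path to folder.")
    start_path = argv[1]
    if not os.path.exists(start_path):
        raise ValueError("Path must exist.")
    return start_path
```

In the main program body, this could be called as parse_start_path(sys.argv) inside a try block that prints the error message and exits on ValueError.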

The function build_folders builds a list of dictionaries, each containing the size and path of a folder found directly inside the given start_path. It calls get_directories, which returns a set of the immediate subdirectories of start_path. Then, for each directory found, it calculates the size in bytes using the get_size function and keeps only the directories of 25 MB or more.
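To see the discovery and sizing logic in isolation, here is a small sketch that runs the same os.walk approach on a throwaway directory tree. The names folder_size, first_level_dirs, and demo are hypothetical stand-ins for illustration; the folder names and file sizes are made up, and the 25 MB filter is left out so that the tiny test files show up.

```python
import os
import shutil
import tempfile


def folder_size(path):
    """Sum the sizes, in bytes, of all files below path (same idea as get_size)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def first_level_dirs(path):
    """Return the immediate subdirectories of path (same idea as get_directories)."""
    for dirpath, dirnames, filenames in os.walk(path):
        # the first tuple yielded by os.walk describes path itself
        return set(os.path.join(dirpath, d) for d in dirnames)
    return set()


def demo():
    """Build a temporary tree and return {folder name: size in bytes}."""
    root = tempfile.mkdtemp()
    try:
        for name, nbytes in (("small", 10), ("large", 3000)):
            os.makedirs(os.path.join(root, name, "nested"))
            with open(os.path.join(root, name, "nested", "data.bin"), "wb") as f:
                f.write(b"x" * nbytes)
        return dict((os.path.basename(d), folder_size(d))
                    for d in first_level_dirs(root))
    finally:
        shutil.rmtree(root)  # always clean up the temporary tree


sizes = demo()
print(sizes)
```

Files in nested subfolders count toward their top-level folder, which is why each size is computed with a full walk while the list of folders comes only from the first level.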

For debugging purposes, we print the name and size of each folder so that we are able to compare the figure against what our data looks like.

After we have built the folders as a list of dictionaries, we pass them to a function, draw, that performs all the work of transforming the data to the right dimensions (here, we are using the polar coordinate system), constructing the polar figure, and drawing all the bars, ticks, and labels.
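The data transformation inside draw can be sketched without any plotting. The following is an illustration using only the math module rather than NumPy; the function names bar_angles and label_angles and the sample sizes are assumptions made here, and the label offset uses float division (90.0 / count), whereas the recipe's 90/len(x) truncates to an integer under Python 2.

```python
import math


def bar_angles(count):
    """Start angle, in radians, of each bar: count equal steps around 2*pi."""
    step = 2 * math.pi / count
    return [i * step for i in range(count)]


def label_angles(thetas):
    """Label positions in degrees, shifted by the recipe's 90 / count offset."""
    middle = 90.0 / len(thetas)
    return [math.degrees(t) + middle for t in thetas]


# made-up folder sizes in bytes, for illustration only
sizes_in_bytes = [50 * 1024 * 1024, 200 * 1024 * 1024,
                  75 * 1024 * 1024, 400 * 1024 * 1024]

radii = [size // (1024 * 1024) for size in sizes_in_bytes]  # bar heights in MB
thetas = bar_angles(len(radii))

print(radii)
print(label_angles(thetas))
```

Each folder gets one bar: its angle depends only on its position in the list, while its radius is its size in megabytes.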

Strictly speaking, we should divide this job into smaller functions, especially if this code is to be further developed.
