Managing objects

We've been focused on objects and their attributes and methods. Now, we'll take a look at designing higher-level objects; the kinds of objects that manage other objects. The objects that tie everything together.

The difference between these objects and most of the examples we've seen so far is that our examples tend to represent concrete ideas. Management objects are more like office managers; they don't do the actual "visible" work out on the floor, but without them, there would be no communication between departments and nobody would know what they are supposed to do. Analogously, the attributes on a management class tend to refer to other objects that do the "visible" work; the behaviors on such a class delegate to those other classes at the right time, and pass messages between them.

As an example, we'll write a program that does a find and replace action for text files stored in a compressed ZIP file. We'll need objects to represent the ZIP file and each individual text file (luckily, we don't have to write these classes, they're available in the Python Standard Library). The manager object will be responsible for ensuring three steps occur in order:

  1. Unzipping the compressed file.
  2. Performing the find and replace action.
  3. Zipping up the new files.

The class is initialized with the .zip filename and search and replace strings. We create a temporary directory to store the unzipped files in, so that the folder stays clean. We also add a useful helper method for internal use that helps identify an individual filename inside that directory:

	import sys
	import os
	import shutil
	import zipfile
	
	class ZipReplace:
		def __init__(self, filename, search_string,
				replace_string):
			self.filename = filename
			self.search_string = search_string
			self.replace_string = replace_string
			self.temp_directory = "unzipped-{}".format(
				filename)
			
		def _full_filename(self, filename):
			return os.path.join(self.temp_directory, filename)

Then we create an overall "manager" method for each of the three steps. This method delegates responsibility to other methods. Obviously, we could do all three steps in one method, or indeed, in one script without ever creating an object. There are several advantages to separating the three steps:

  • Readability: The code for each step is in a self-contained unit that is easy to read and understand. The method names describe what the method does, and no additional documentation is required to understand what is going on.
  • Extensibility: If a subclass wanted to use compressed TAR files instead of ZIP files, it could override the zip and unzip methods without having to duplicate the find_replace method.
  • Partitioning: An external class could create an instance of this class and call the find and replace method directly on some folder without having to zip the content.

The delegation method is the first in the code below; the rest of the methods are included for completeness:


		def zip_find_replace(self):
			self.unzip_files()
			self.find_replace()
			self.zip_files()

		def unzip_files(self):
			os.mkdir(self.temp_directory)
			zip = zipfile.ZipFile(self.filename)
			try:
				zip.extractall(self.temp_directory)
			finally:
				zip.close()

		def find_replace(self):
			for filename in os.listdir(self.temp_directory):
				with open(self._full_filename(filename)) as file:
					contents = file.read()
				contents = contents.replace(
						self.search_string, self.replace_string)
				with open(
					self._full_filename(filename), "w") as file:
					file.write(contents)

		def zip_files(self):
			file = zipfile.ZipFile(self.filename, 'w')
			for filename in os.listdir(self.temp_directory):
				file.write(
					self._full_filename(filename), filename)
			shutil.rmtree(self.temp_directory)
	
	if __name__ == "__main__":
		ZipReplace(*sys.argv[1:4]).zip_find_replace()

For brevity, the code for zipping and unzipping files is sparsely documented. Our current focus is on object-oriented design; if you are interested in the inner details of the zipfile module, refer to the documentation in the standard library, either online at http://docs.python.org/library/zipfile.html or by typing import zipfile ; help(zipfile) into your interactive interpreter. Note that this example only searches the top-level files in a ZIP file; if there are any folders in the unzipped content, they will not be scanned, nor will any files inside those folders.

The last two lines in the code allow us to run the example from the command line by passing the zip filename, search string, and replace string as arguments:


python zipsearch.py hello.zip hello hi

Of course, this object does not have to be created from the command line; it could be imported from another module (to perform batch ZIP file processing) or accessed as part of a GUI interface or even a higher-level management object that knows what to do with ZIP files (for example to retrieve them from an FTP server or back them up to an external disk).

As programs become more and more complex, the objects being modeled become less and less like physical objects. Properties are other abstract objects and methods are actions that change the state of those abstract objects. But at the heart of every object, no matter how complex, is a set of concrete properties and well-defined behaviors.

Removing duplicate code

Often the code in management style classes such as ZipReplace is quite generic and can be applied in many different ways. It is possible to use either composition or inheritance to help keep this code in one place, thus eliminating duplicate code. Before we look at any examples of this, let's discuss a tiny bit of theory. Specifically: why is duplicate code a bad thing?

There are several reasons, but they all boil down to readability and maintainability. When we're writing a new piece of code that is similar to an earlier piece, the easiest thing to do is copy the old code and change whatever needs to change (variable names, logic, comments) to make it work in the new location. Alternatively, if we're writing new code that seems similar, but not identical to code elsewhere in the project, the easiest thing to do is write fresh code with similar behavior, rather than figure out how to extract the overlapping functionality.

But as soon as someone has to read and understand the code and they come across duplicate blocks, they are faced with a dilemma. Code that might have made sense suddenly has to be understood. How is one section different from the other? How are they the same? Under what conditions is one section called? When do we call the other? You might argue that you're the only one reading your code, but if you don't touch that code for eight months it will be as incomprehensible to you as to a fresh coder. When we're trying to read two similar pieces of code, we have to understand why they're different, as well as how they're different. This wastes the reader's time; code should always be written to be readable first.

Note

I once had to try to understand someone's code that had three identical copies of the same three hundred lines of very poorly written code. I had been working with the code for a month before I realized that the three "identical" versions were actually performing slightly different tax calculations. Some of the subtle differences were intentional, but there were also obvious areas where someone had updated a calculation in one function without updating the other two. The number of subtle, incomprehensible bugs in the code could not be counted.

Reading such duplicate code can be tiresome, but code maintenance is an even greater torment. As the preceding story suggests, keeping two similar pieces of code up to date can be a nightmare. We have to remember to update both sections whenever we update one of them, and we have to remember how the multiple sections differ so we can modify our changes when we are editing each of them. If we forget to update both sections, we will end up with extremely annoying bugs that usually manifest themselves as, "but I fixed that already, why is it still happening?"

The result is that people who are reading or maintaining our code have to spend astronomical amounts of time understanding and testing it compared to if we had written the code in a non-repetitive manner in the first place. It's even more frustrating when we are the ones doing the maintenance. The time we save by copy-pasting existing code is lost the very first time we have to maintain it. Code is both read and maintained many more times and much more often than it is written. Comprehensible code should always be paramount.

This is why programmers, especially Python programmers (who tend to value elegant code more than average), follow what is known as the Don't Repeat Yourself, or DRY principle. DRY code is maintainable code. My advice to beginning programmers is to never use the copy and paste feature of their editor. To intermediate programmers, I suggest they think thrice before they hit Ctrl + C.

But what should we do instead of code duplication? The simplest solution is often to move the code into a function that accepts parameters to account for whatever sections are different. This isn't a terribly object-oriented solution, but it is frequently sufficient. For example, if we have two pieces of code that unzip a ZIP file into two different directories, we can easily write a function that accepts a parameter for the directory to which it should be unzipped instead. This may make the function itself slightly more difficult to read, but a good function name and docstring can easily make up for that, and any code that invokes the function will be easier to read.

That's certainly enough theory! The moral of the story is: always make the effort to refactor your code to be easier to read instead of writing bad code that is only easier to write.

In practice

Let's explore two ways we can reuse existing code. After writing our code to replace strings in a ZIP file full of text files, we are later contracted to scale all the images in a ZIP file to 640x480. Looks like we could use a very similar paradigm to what we used in ZipReplace. The first impulse, obviously, would be to save a copy of that file and change the find_replace method to scale_image or something similar. But, that's just not cool. What if someday we want to change the unzip and zip methods to also open TAR files? Or maybe we want to use a guaranteed unique directory name for temporary files. In either case, we'd have to change it in two different places!

We'll start by demonstrating an inheritance-based solution to this problem. First we'll modify our original ZipReplace class into a superclass for processing generic ZIP files:

	import os
	import shutil
	import zipfile
	
	class ZipProcessor:
		def __init__(self, zipname):
			self.zipname = zipname
			self.temp_directory = "unzipped-{}".format(
				zipname[:-4])
	
		def _full_filename(self, filename):
			return os.path.join(self.temp_directory, filename)
	
		def process_zip(self):
			self.unzip_files()
			self.process_files()
			self.zip_files()
	
		def unzip_files(self):
			os.mkdir(self.temp_directory)
			zip = zipfile.ZipFile(self.zipname)
			try:
				zip.extractall(self.temp_directory)
			finally:
				zip.close()
	
		def zip_files(self):
			file = zipfile.ZipFile(self.zipname, 'w')
			for filename in os.listdir(self.temp_directory):
				file.write(self._full_filename(
					filename), filename)
			shutil.rmtree(self.temp_directory)

We changed the filename property to zipfile to avoid confusion with the filename local variables inside the various methods. This helps make the code more readable even though it isn't actually a change in design. We also dropped the two parameters to __init__ (search_string and replace_string) that were specific to ZipReplace. Then we renamed the zip_find_replace method to process_zip and made it call an (as yet undefined) process_files method instead of find_replace; these name changes help demonstrate the more generalized nature of our new class. Notice that we have removed the find_replace method altogether; that code is specific to ZipReplace and has no business here.

This new ZipProcessor class doesn't actually define a process_files method; so if we ran it directly, it would raise an exception. Since it actually isn't meant to be run directly, we also removed the main call at the bottom of the original script.

Now, before we move on to our image processing app, let's fix up our original zipsearch to make use of this parent class:


	from zip_processor import ZipProcessor
	import sys
	import os
	
	class ZipReplace(ZipProcessor):
		def __init__(self, filename, search_string,
				replace_string):
			super().__init__(filename)
			self.search_string = search_string
			self.replace_string = replace_string

		def process_files(self):
			'''perform a search and replace on all files
			in the temporary directory'''
			for filename in os.listdir(self.temp_directory):
				with open(self._full_filename(filename)) as file:
					contents = file.read()
				contents = contents.replace(
					self.search_string, self.replace_string)
				with open(
					self._full_filename(filename), "w") as file:
					file.write(contents)
	if __name__ == "__main__":
		ZipReplace(*sys.argv[1:4]).process_zip()

This code is a bit shorter than the original version, since it inherits its ZIP processing abilities from the parent class. We first import the base class we just wrote and make ZipReplace extend that class. Then we use super() to initialize the parent class. The find_replace method is still here, but we renamed it to process_files so the parent class can call it. Because this name isn't as descriptive as the old one, we added a docstring to describe what it is doing.

Now, that was quite a bit of work, considering that all we have now is a program that is functionally no different from the one we started with! But having done that work, it is now much easier for us to write other classes that operate on files in a ZIP archive, such as our photo scaler. Further, if we ever want to improve the zip functionality, we can do it for all classes by changing only the one ZipProcessor base class. Maintenance will be much more effective.

See how simple it is, now to create a photo scaling class that takes advantage of the ZipProcessor functionality. (Note: this class requires the third-party pygame library to be installed. You can download it from http://www.pygame.org/.)

	from zip_processor import ZipProcessor
	import os
	import sys
	from pygame import image
	from pygame.transform import scale
	
	class ScaleZip(ZipProcessor):
	
		def process_files(self):
			'''Scale each image in the directory to 640x480'''
			for filename in os.listdir(self.temp_directory):
				im = image.load(self._full_filename(filename))
				scaled = scale(im, (640,480))
				image.save(scaled, self._full_filename(filename))

	if __name__ == "__main__":
		ScaleZip(*sys.argv[1:4]).process_zip()

All that work we did earlier paid off! Look how simple this class is! All we do is open each file (assuming that it is an image; it will unceremoniously crash if the file cannot be opened), scale it, and save it back. The ZipProcessor takes care of the zipping and unzipping without any extra work on our part.

Or we can use composition

Now, let's try solving the same problem using a composition-based solution. Even though we're completely changing paradigms, from inheritance to composition, we only have to make a minor modification to our ZipProcessor class:

	import os
	import shutil
	import zipfile
	
	class ZipProcessor:
		def __init__(self, zipname, processor):
			self.zipname = zipname
			self.temp_directory = "unzipped-{}".format(
				zipname[:-4])
			self.processor = processor
		
		def _full_filename(self, filename):
			return os.path.join(self.temp_directory, filename)

		def process_zip(self):
			self.unzip_files()
			self.processor.process(self)
			self.zip_files()

		def unzip_files(self):
			os.mkdir(self.temp_directory)
			zip = zipfile.ZipFile(self.zipname)
			try:
				zip.extractall(self.temp_directory)
			finally:
				zip.close()
		
		def zip_files(self):
			file = zipfile.ZipFile(self.zipname, 'w')
				for filename in os.listdir(self.temp_directory):
			file.write(self._full_filename(filename), filename)
			shutil.rmtree(self.temp_directory)

All we did was change the initializer to accept a processor object. The process_zip function now calls a method on that processor object; the method called accepts a reference to the ZipProcessor itself. Now we can change our ZipReplace class to be a suitable processor object that no longer uses inheritance:

	from zip_processor import ZipProcessor
	import sys
	import os

	class ZipReplace:
		def __init__(self, search_string,
				replace_string):
		self.search_string = search_string
		self.replace_string = replace_string

		def process(self, zipprocessor):
			'''perform a search and replace on all files in the
			temporary directory'''
			for filename in os.listdir(
					zipprocessor.temp_directory):
				with open(
					zipprocessor._full_filename(filename)) as file:
					contents = file.read()
				contents = contents.replace(
					self.search_string, self.replace_string)
				with open(zipprocessor._full_filename(
						filename), "w") as file:
					file.write(contents)

	if __name__ == "__main__":
		zipreplace = ZipReplace(*sys.argv[2:4])
		ZipProcessor(sys.argv[1], zipreplace).process_zip()

We didn't actually change much here; the class no longer inherits from ZipProcessor, and when we process the files, we accept a zipprocessor object that gives us the function to calculate __full_filename. In the bottom two lines, when we run from the command line, we first construct a ZipReplace object. This is then passed into the ZipProcessor constructor so the two objects can communicate.

This design is a terrific separation of interests. Now we have a ZipProcessor that can accept any object that has a process method to do the actual processing. Further, we have a ZipReplace that can be passed to any method, function, or object that wants to call its process function; it is no longer tied to the zip processing code through an inheritance relationship; it could now be applied with equal ease to a local or network filesystem, for example, or to a different kind of compressed file such as a RAR archive.

Any inheritance relationship can be modeled as a composition relationship (change the "is a" to a "has a parent") instead, but that does not mean it always should be. And the reverse is not true, most composition relationships cannot be (properly) modeled as inheritance.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.80.34