Case study

For this case study, we'll try to delve further into the question, "when should I choose an object versus a built-in type?" We'll be modeling a Document class that might be used in a text editor or word processor. What objects, functions, or properties should it have?

We might start with a str for the Document contents, but strings aren't mutable. A mutable object is one that can be changed; a str is immutable, so we can't insert a character into it or remove one without creating a brand new string object. That would leave a lot of str objects for Python's garbage collector to clean up behind us. So, instead of a string, we'll use a list of characters, which we can modify at will. In addition, a Document would need to know the current cursor position within the list, and should also store a filename for the document.
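The difference between the two types is easy to check. The following is a minimal sketch with illustrative variable names (none of them are part of the Document design); it shows that "editing" a str builds a new object, while a list is changed in place:

```python
text = "helo"
try:
    text[3] = "l"  # strings do not support item assignment
except TypeError as error:
    print("str is immutable:", error)

# "Editing" a string really means building a brand new string object.
fixed = text[:3] + "l" + text[3:]
print(fixed)          # hello
print(fixed is text)  # False

# A list of characters can be modified in place.
characters = list("helo")
before = id(characters)
characters.insert(3, "l")
print("".join(characters))       # hello
print(id(characters) == before)  # True
```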

Now, what methods should it have? There are a lot of things we might want to do to a text document, including inserting and deleting characters, cut, copy, paste, and saving or closing the document. It looks like there are copious amounts of both data and behavior, so it makes sense to put all this stuff into its own Document class.

The question is, should this class be composed of a bunch of basic Python objects such as str filenames, int cursor positions, and a list of characters? Or should some or all of those things be specially defined objects in their own right? What about individual lines and characters, do they need to have classes of their own?

We'll answer these questions as we go, but let's just design the simplest possible Document class first and see what it can do:

	
	class Document:
		def __init__(self):
			self.characters = []	
			self.cursor = 0	
			self.filename = ''	
		
		def insert(self, character):
			self.characters.insert(self.cursor, character)
			self.cursor += 1
	
		def delete(self):
			del self.characters[self.cursor]
			
		def save(self):
			# The with statement closes the file even if write() fails
			with open(self.filename, 'w') as f:
				f.write(''.join(self.characters))

		def forward(self):
			self.cursor += 1

		def back(self):
			self.cursor -= 1

This simple class allows us full control over editing a basic document. Have a look at it in action:


>>> doc = Document()
>>> doc.filename = "test_document"
>>> doc.insert('h')
>>> doc.insert('e')
>>> doc.insert('l')
>>> doc.insert('l')
>>> doc.insert('o')
>>> "".join(doc.characters)
'hello'
>>> doc.back()
>>> doc.delete()
>>> doc.insert('p')
>>> "".join(doc.characters)
'hellp'

Looks like it's working. We could connect a keyboard's letter and arrow keys to these methods and the document would track everything just fine.
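As a rough idea of what "connecting" keys could look like, here is a hedged sketch. The handle_key function and the key names "LEFT", "RIGHT", and "DELETE" are invented for illustration; a real toolkit would deliver its own key events. A compact copy of the Document class is included only so the sketch runs on its own:

```python
class Document:
    """Compact copy of the simple Document, so this sketch runs alone."""
    def __init__(self):
        self.characters = []
        self.cursor = 0

    def insert(self, character):
        self.characters.insert(self.cursor, character)
        self.cursor += 1

    def delete(self):
        del self.characters[self.cursor]

    def forward(self):
        self.cursor += 1

    def back(self):
        self.cursor -= 1


def handle_key(doc, key):
    """Hypothetical dispatcher; the key names are assumptions."""
    actions = {"LEFT": doc.back, "RIGHT": doc.forward, "DELETE": doc.delete}
    if key in actions:
        actions[key]()
    else:
        doc.insert(key)  # any other key is treated as a printable character


doc = Document()
for key in ["c", "a", "t", "LEFT", "DELETE", "r"]:
    handle_key(doc, key)
print("".join(doc.characters))  # car
```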

But what if we want to connect more than just arrow keys? What if we want to connect the Home and End keys as well? We could add more methods to the Document class that search forward or backwards for newline characters in the string and jump to them (in Python, a newline character, written '\n', represents the end of one line and the beginning of a new one). But if we did that for every possible movement action (move by words, move by sentences, Page Up, Page Down, end of line, beginning of whitespace, and more), the class would be huge. Maybe it would be better to put those methods on a separate object. What we can do is turn the cursor attribute into an object that is aware of its position and can manipulate that position. We can move the forward and back methods to that class, and add a couple more for the Home and End keys:

	class Cursor:
		def __init__(self, document):
			self.document = document
			self.position = 0
		
		def forward(self):
			self.position += 1

		def back(self):
			self.position -= 1
			
		def home(self):
			while self.document.characters[
					self.position-1] != '\n':
				self.position -= 1
				if self.position == 0:
					# Got to beginning of file before newline
					break	

		def end(self):
			while self.position < len(self.document.characters
					) and self.document.characters[
						self.position] != '\n':
				self.position += 1

This class takes the document as an initialization parameter so the methods have access to the contents of the document's character list. It then provides simple methods for moving backwards and forwards, as before, and for moving to the home and end positions.

Tip

This code is not very safe. You can very easily move past the ending position, and if you try to go home on an empty file it will crash. These examples are kept short to make them readable; that doesn't mean they are defensive! You can improve the error checking of this code as an exercise; it might be a great opportunity to expand your exception handling skills.
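One possible direction for that exercise is sketched below. SafeCursor and StubDocument are hypothetical names, not part of the chapter's code, and clamping the position silently is only one design choice; raising an exception would be another.

```python
class SafeCursor:
    """Hypothetical bounds-checked variant of Cursor."""
    def __init__(self, document):
        self.document = document
        self.position = 0

    def forward(self):
        # Never move past the end of the character list.
        if self.position < len(self.document.characters):
            self.position += 1

    def back(self):
        # Never move before the start of the document.
        if self.position > 0:
            self.position -= 1

    def home(self):
        # Checking position > 0 first makes an empty document safe.
        while (self.position > 0
               and self.document.characters[self.position - 1] != "\n"):
            self.position -= 1

    def end(self):
        while (self.position < len(self.document.characters)
               and self.document.characters[self.position] != "\n"):
            self.position += 1


class StubDocument:
    """Minimal stand-in that exposes only a characters list."""
    def __init__(self, text=""):
        self.characters = list(text)


empty = SafeCursor(StubDocument())
empty.home()
empty.back()
empty.forward()
print(empty.position)  # 0

cursor = SafeCursor(StubDocument("ab\ncd"))
cursor.end()
print(cursor.position)  # 2
```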

The Document class itself is hardly changed, except for removing the two methods that were moved to the Cursor class:

	class Document:
		def __init__(self):
			self.characters = []
			self.cursor = Cursor(self)
			self.filename = ''
		
		def insert(self, character):
			self.characters.insert(self.cursor.position,
					character)
			self.cursor.forward()

		def delete(self):
			del self.characters[self.cursor.position]

		def save(self):
			# The with statement closes the file even if write() fails
			with open(self.filename, 'w') as f:
				f.write(''.join(self.characters))

We simply updated anything that accessed the old cursor integer to use the new object instead. We can test that the home method really moves the cursor back to the position just after the newline character:


>>> d = Document()
>>> d.insert('h')
>>> d.insert('e')
>>> d.insert('l')
>>> d.insert('l')
>>> d.insert('o')
>>> d.insert('\n')
>>> d.insert('w')
>>> d.insert('o')
>>> d.insert('r')
>>> d.insert('l')
>>> d.insert('d')
>>> d.cursor.home()
>>> d.insert("*")
>>> print("".join(d.characters))
hello
*world

Now, since we've been using the string join method a lot (to concatenate the characters so we can see the actual document contents), we can add a property to the Document class to give us the complete string:


		@property 
		def string(self):
			return "".join(self.characters)

This makes our testing a little simpler:


>>> print(d.string)
hello
*world

This framework is easy enough to extend to create a complete text editor document. Now, let's make it work for rich text: text that can have bold, underlined, or italic characters. There are two ways we could process this. The first is to insert "fake" characters into our character list that act like instructions, such as "bold characters until you find a stop bold character". The second is to add information to each character indicating what formatting it should have. While the former method is probably more common, we'll implement the latter solution. To do that, we're obviously going to need a class for characters. This class will have an attribute representing the character, as well as three boolean attributes representing whether it is bold, italic, or underlined.

Hmm, wait! Is this character class going to have any methods? If not, maybe we should use one of the many Python data structures instead; a tuple or named tuple would probably be sufficient. Are there any actions that we would want to do to, or invoke on, a character?
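For comparison, a named tuple version might look like the sketch below. PlainCharacter and its field names are assumptions chosen to mirror the attributes just described; this is the alternative being weighed, not the design the chapter settles on.

```python
from collections import namedtuple

# Hypothetical field names, mirroring the attributes described above.
PlainCharacter = namedtuple(
    "PlainCharacter", ["character", "bold", "italic", "underline"])

c = PlainCharacter("a", bold=True, italic=False, underline=False)
print(c.character, c.bold)  # a True

# Named tuples are immutable, so changing formatting means building a copy.
underlined = c._replace(underline=True)
print(underlined.underline)  # True
print(c.underline)           # False
```

One trade-off worth noting: because a named tuple cannot be modified in place, any change to a character's formatting would have to replace the item in the document's list.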

Well, clearly, we might want to do things with characters, such as delete or copy them, but those are things that need to be handled at the Document level, since they are really modifying the list of characters. Are there things that need to be done to individual characters?

Actually, now that we're thinking about what a Character actually is... what is it? Would it be safe to say that a Character is a string? Maybe we should use an inheritance relationship here? Then we can take advantage of the numerous methods that str instances come with.

What sorts of methods are we talking about? There's startswith, strip, find, lower, and many more. Most of these methods expect to be working on strings that contain more than one character. In contrast, if Character were to subclass str, we'd probably be wise to override __init__ to raise an exception if a multi-character string were supplied. Since all those methods we'd get for free wouldn't really apply to our Character class, it turns out we shouldn't use inheritance, after all.
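To make that concrete, here is a hedged sketch of the rejected inheritance approach. CharStr is a hypothetical name, and the choice of ValueError is an assumption; the point is only that the inherited methods either add little for a single character or hand back a plain str.

```python
class CharStr(str):
    """Hypothetical str subclass that accepts only single characters."""
    def __init__(self, value):
        if len(value) != 1:
            raise ValueError("CharStr must be exactly one character")


c = CharStr("x")
print(c.upper())                 # X
print(type(c.upper()).__name__)  # str (the result is no longer a CharStr)
print(c.startswith("x"))         # True, though hardly useful for one character

try:
    CharStr("xy")
except ValueError as error:
    print(error)  # CharStr must be exactly one character
```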

This brings us back to our first question: should Character even be a class? There is a very important special method on the object class that we can take advantage of to represent our characters. This method, called __str__ (two underscores, like __init__), is used by string manipulation functions such as print and the str constructor to convert any object to a string. The default implementation does some boring stuff like printing the name of the module and class and the object's address in memory. But if we override it, we can make it print whatever we like. For our implementation, we can make it prefix characters with special characters to represent whether they are bold, italic, or underlined. So we will create a class to represent a character, and here it is:

	class Character:
		def __init__(self, character,
				bold=False, italic=False, underline=False):
			assert len(character) == 1
			self.character = character
			self.bold = bold
			self.italic = italic
			self.underline = underline

		def __str__(self):
			bold = "*" if self.bold else ''
			italic = "/" if self.italic else ''
			underline = "_" if self.underline else ''
			return bold + italic + underline + self.character

This class allows us to create characters and prefix them with a special character when the str() function is applied to them. Nothing too exciting there. We only have to make a few minor modifications to the Document and Cursor classes to work with this class. In the Document class, we add these two lines at the beginning of the insert method:

		def insert(self, character):
			if not hasattr(character, 'character'):
				character = Character(character)

This is a rather strange bit of code. Its basic purpose is to check if the character being passed in is a Character or a str. If it is a string, it is wrapped in a Character object so all objects in the list are Character objects. However, it is entirely possible that someone using our code would want to use a class that is neither Character nor string, using duck typing. If the object has a character attribute, we assume it is a "Character-like" object. But if it does not, we assume it is a "str-like" object and wrap it in a Character. This helps the program take advantage of duck typing as well as polymorphism; as long as an object has a character attribute, it can be used in the Document. This could be very useful, for example, if we wanted to make a programmer's editor with syntax highlighting: we'd need extra data on the character, such as what type of token the character belongs to.
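The syntax highlighting idea can be sketched as follows. TokenCharacter and its token_type attribute are hypothetical, and the Document here is a stripped-down version with a plain integer cursor so that the example runs on its own; only the hasattr check is taken from the code above.

```python
class Character:
    """Reduced copy of Character, without the formatting flags."""
    def __init__(self, character):
        assert len(character) == 1
        self.character = character

    def __str__(self):
        return self.character


class TokenCharacter:
    """Hypothetical Character-like class for a syntax-highlighting editor."""
    def __init__(self, character, token_type):
        self.character = character
        self.token_type = token_type  # for example "keyword" or "name"

    def __str__(self):
        return self.character


class Document:
    """Stripped-down Document: only insert() and an integer cursor."""
    def __init__(self):
        self.characters = []
        self.cursor = 0

    def insert(self, character):
        # Anything without a character attribute is assumed to be str-like.
        if not hasattr(character, 'character'):
            character = Character(character)
        self.characters.insert(self.cursor, character)
        self.cursor += 1


doc = Document()
doc.insert(TokenCharacter("i", "keyword"))
doc.insert(TokenCharacter("f", "keyword"))
doc.insert(" ")
print([type(c).__name__ for c in doc.characters])
# ['TokenCharacter', 'TokenCharacter', 'Character']
print(repr("".join(str(c) for c in doc.characters)))  # 'if '
```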

In addition, we need to modify the string property on Document to accept the new Character values. All we need to do is call str() on each character before we join it:

		@property
		def string(self):
			return "".join((str(c) for c in self.characters))

This code uses a generator expression, which we'll discuss in the next chapter. It's simply a shortcut to perform a specific action on all the objects in a sequence.
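As a small preview, the sketch below (with illustrative names only) compares a generator expression with the roughly equivalent explicit loop. When the generator expression is the only argument to a function call, the extra pair of parentheses around it is optional.

```python
letters = ["a", "b", "c"]

# The generator expression applies upper() to each item in turn.
shouted = "".join(letter.upper() for letter in letters)
print(shouted)  # ABC

# Roughly equivalent explicit loop.
pieces = []
for letter in letters:
    pieces.append(letter.upper())
print("".join(pieces))  # ABC
```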

Finally, in the home and end methods, we also need to check Character.character, instead of just the string character we were storing before, when we're looking to see if it matches a newline:

		def home(self):
			while self.document.characters[
					self.position-1].character != '\n':
				self.position -= 1
				if self.position == 0:
					# Got to beginning of file before newline
					break

		def end(self):
			# Parentheses let the condition span several lines
			while (self.position < len(self.document.characters)
					and self.document.characters[
						self.position].character != '\n'):
				self.position += 1		

This completes the formatting of characters. We can test it to see that it works:


>>> d = Document()
>>> d.insert('h')
>>> d.insert('e')
>>> d.insert(Character('l', bold=True))
>>> d.insert(Character('l', bold=True))
>>> d.insert('o')
>>> d.insert('\n')
>>> d.insert(Character('w', italic=True))
>>> d.insert(Character('o', italic=True))
>>> d.insert(Character('r', underline=True))
>>> d.insert('l')
>>> d.insert('d')
>>> print(d.string)
he*l*lo
/w/o_rld
>>> d.cursor.home()
>>> d.delete()
>>> d.insert('W')
>>> print(d.string)
he*l*lo
W/o_rld
>>> d.characters[0].underline = True
>>> print(d.string)
_he*l*lo
W/o_rld
>>>

As expected, whenever we print the string, each bold character is preceded by a *, each italic character by a /, and each underlined character by a _. All our functions seem to work, and we can modify characters in the list after the fact. We have a working rich text document object that could be plugged into a user interface and hooked up with a keyboard for input and a screen for output. Naturally, we'd want to display real bold, italic, and underlined characters on the screen, instead of using our __str__ method, but it was sufficient for the basic testing we demanded of it.
