Zen and the Art of Docbase Maintenance

One of the themes winding its way through this section is a text-file-oriented approach to managing semistructured data. I’m not wedded to this approach, and in later chapters we’ll see applications that use both object-oriented and relational data stores. But I’ve found text-oriented methods to be appropriate for many practical purposes. These are, after all, the methods used by Internet applications—mail, news, and the Web—that have connected more people to more information than anyone a few years ago dreamed possible.

Let’s explore this notion of appropriate technology. Later in this chapter, we’ll see an example of a Perl function called getSeqInfo( ) (part of the Docbase::Navigate module in Example 7.16) that looks up a piece of an index by reading in a small text file. Shouldn’t that be done, instead, as an SQL query against a “real” database? Certainly it could be done that way. Whether it should, though, is another matter. Often we’re too easily swayed by a technological imperative that urges us to use the biggest available hammer to drive every nail. When a different tool is appropriate for a job, it makes sense to use it. Perl’s unparalleled strengths in two key realms—text processing and data structure wrangling—make it eminently appropriate for many of the challenges that confront a groupware developer.

Groupware applications rarely fail because their developers pick the wrong database engines. They fail, instead, because they don’t solve problems that really matter to people, in ways that people find convenient. To prove the worth of an application, you have to get it into people’s hands quickly, improve it continuously, and watch the outcome closely. Requirements that you couldn’t have anticipated—among them, storage requirements—will emerge. But groupware isn’t online transaction processing. The make-or-break issues aren’t likely to be the number of transactions per second that you can pump through the system or the nature of its concurrency controls. What matters most is fluid integration of structured and semistructured data. Achieving that requires rapid prototyping not only of code, but also of data structures.

For the same reason that it makes sense to prototype code using a scripting language, it makes sense to prototype data structures by externalizing the in-memory objects of a scripting language. If you later need to upgrade a primordial text-file-based and Perl-managed data store, you can. Nothing fundamental to your application should need to change. The getSeqInfo( ) function, for example, could continue to present the same API to the modules that use it. Only its implementation would need to change: from a file system lookup to a DBM (database manager) lookup, or an SQL lookup, or perhaps an object-database lookup.
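To make the idea of a stable API over a replaceable store concrete, here is a minimal sketch. It is not the getSeqInfo( ) code from Docbase::Navigate. The function names (seq_info, set_backend, make_file_backend, make_hash_backend), the one-file-per-index layout, and the tab-separated line format are all assumptions made for illustration, and the in-memory hash backend merely stands in for a DBM or SQL lookup.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# File backend. Assumes a hypothetical layout: one small text file per
# index, each line holding "key<TAB>prev<TAB>next".
sub make_file_backend {
    my ($dir) = @_;
    return sub {
        my ($index, $key) = @_;
        open(my $fh, '<', "$dir/$index.txt") or return;
        my $found;
        while (my $line = <$fh>) {
            chomp $line;
            my ($k, $prev, $next) = split /\t/, $line, -1;
            next unless defined $k && $k eq $key;
            $found = { prev => $prev, next => $next };
            last;
        }
        close($fh);
        return $found;    # undef when the key is not in the file
    };
}

# Hash backend. A stand-in for a DBM, SQL, or object-database lookup.
sub make_hash_backend {
    my ($data) = @_;
    return sub {
        my ($index, $key) = @_;
        my $rec = exists $data->{$index} ? $data->{$index}{$key} : undef;
        return $rec ? { %$rec } : undef;
    };
}

# The API that calling modules see. It never changes; only the backend does.
my $backend;
sub set_backend { ($backend) = @_; return }
sub seq_info {
    my ($index, $key) = @_;
    return $backend->($index, $key);
}

# Usage: build a tiny text-file index, then ask the same question of
# both backends.
my $dir = tempdir(CLEANUP => 1);
open(my $out, '>', "$dir/company.txt") or die "cannot write index: $!";
print $out join("\t", @$_), "\n"
    for (['000126', '', '000127'], ['000127', '000126', '000128']);
close($out) or die "cannot close index: $!";

set_backend(make_file_backend($dir));
my $from_file = seq_info('company', '000127');

set_backend(make_hash_backend(
    { company => { '000127' => { prev => '000126', next => '000128' } } }
));
my $from_hash = seq_info('company', '000127');

print "file: $from_file->{prev} / $from_file->{next}\n";
print "hash: $from_hash->{prev} / $from_hash->{next}\n";
```

Callers only ever invoke seq_info, so moving from text files to another store means writing one new backend constructor and changing the single set_backend call.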

None of these alternatives will turn out to have mattered, though, if you end up ditching the application for reasons unrelated to the performance or capabilities of its storage engine. The application might fail because it addresses the wrong group, or the right group in the wrong way, or because it sets the data-entry threshold too high, or because people just don’t see a reason to use it, or for any number of other reasons.

If the application does succeed, you may want to upgrade its data store. But only if you have to! Nowadays we often find ourselves shipping the prototype—that is, delivering a Perl or Visual Basic solution—because there turns out to be no need to recast a solution in a compiled language. The same principle can apply in the data realm. The simplest possible data store—for example, Perl hashtables externalized to text files—will help you get a solution up and running quickly. And who knows? You might find yourself shipping the prototype because the simplest solution turns out to be effective and appropriate.
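One way to externalize a Perl hashtable to a text file is with the core Data::Dumper module, sketched below. The record IDs, field names, and the save_hash and load_hash helpers are hypothetical, chosen only for illustration. Because loading relies on eval, this approach suits only files your own application wrote.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);

# Hypothetical index: record ID => metadata fields.
my %index = (
    '000126' => { company => 'Microsoft', analyst => 'Pat' },
    '000127' => { company => 'Netscape',  analyst => 'Jon' },
);

# Write the structure out as readable, diffable Perl source text.
sub save_hash {
    my ($file, $hashref) = @_;
    local $Data::Dumper::Terse    = 1;    # bare structure, no "$VAR1 ="
    local $Data::Dumper::Indent   = 1;    # compact, fixed indentation
    local $Data::Dumper::Sortkeys = 1;    # stable key order across runs
    open(my $fh, '>', $file) or die "cannot write $file: $!";
    print $fh Dumper($hashref);
    close($fh) or die "cannot close $file: $!";
    return;
}

# Read the text back and rebuild the in-memory structure.
sub load_hash {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "cannot read $file: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    # The parentheses force Perl to parse the leading "{" as an
    # anonymous hash rather than as a code block.
    my $hashref = eval "($text)";
    die "cannot parse $file: $@" if $@;
    return $hashref;
}

# Usage: round-trip the index through a text file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/index.txt";
save_hash($file, \%index);
my $restored = load_hash($file);
print "000127 is about $restored->{'000127'}{company}\n";
```

The resulting file is ordinary text, so it can be inspected, edited by hand, or searched with standard tools while the application is still taking shape.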

In this chapter, we’ll build a pair of Perl modules, Docbase::Indexer and Docbase::Navigate, which together implement the tabbed-index and sequential controls we saw earlier. These modules expect a set of meta-tagged records like those created by the Docbase::Input module we explored in Chapter 6. Docbase::Indexer reads these records and writes a collection of tabbed-index pages plus some extra index information to support sequential navigation.

Is this an appropriate data-management solution? Reasonable people may differ. On the one hand, Perl arguably ought not to be doing so much of the indexing work that might otherwise be handled by a database engine. On the other hand, the simple indexing required for this application is well within Perl’s comfort zone. Either way, the real point is to give you a practical feel for how to express a semistructured data store as a richly interconnected web docbase. The principles that govern the expression of a data store, which I’ll illustrate here using an XML repository and Perl modules, can apply more broadly to any data store and any programming language.

We’ll proceed in phases. Starting with the dynamic version, we’ll build the tabbed-index pages, then tackle sequential navigation. Finally, we’ll reprise these themes for the static version. Along the way, we’ll explore ways you can use Perl to process text and wrangle data structures. These are basic skills that, once mastered, prepare a groupware developer to meet a wide range of challenges.
