APPENDIX A
Information Organization and Classification: Taxonomies and Metadata

Barb Blackburn, CRM, with Robert Smallwood; edited by Seth Earley

The volume of electronic documents and records is growing at an accelerating rate, and sifting through all this information wastes a great deal of expensive knowledge-worker time. This has real costs to the enterprise. According to the IDC report “The High Cost of Not Finding Information,” “knowledge workers spend at least 15 to 25 percent of the workday searching for information. Only half the searches are successful.”1 Experts point to poor taxonomy design as being at the root of these failed searches and lost productivity.

Taxonomies are at the heart of the solution to harnessing and governing information. Taxonomies are hierarchical classification structures used to standardize the naming and organization of information, and their role and use in managing electronic records cannot be overestimated.

Although the topic of taxonomies can get complex, in electronic records management (ERM), they are a sort of online card catalog that is cross-referenced with hyperlinks and is used to organize and manage records and documents.2

According to Forrester Research, taxonomies “represent agreed-upon terms and relationships between ideas or things and serve as a glossary or knowledge map helping to define how the business thinks about itself and represents itself, its products and services to the outside world.”3

Gartner Group researchers warn that “to get value from the vast quantities of information and knowledge, enterprises must establish discipline and a system of governance over the creation, capture, organization, access, and utilization of information.”4

Over time, organizations have implemented taxonomies to attempt to gain control over their mounting masses of information, creating an orderly structure to harness unstructured information (such as e-documents, e-mail messages, scanned records, and other digital assets), and to improve searchability and access.5

Taxonomies for electronic records management (ERM) standardize the vocabulary used to describe records, making it easier and faster for searches and retrievals to be made.

Search engines can deliver faster and more accurate results from good taxonomy design by limiting and standardizing terms. A robust and efficient taxonomy design is the underpinning that indexes collections of documents uniformly and helps knowledge workers find the proper files to complete their work. The way a taxonomy is organized and implemented is critical to the long-term success of any enterprise, as it directly impacts the quality and productivity of knowledge workers who need organized, trusted information to make business decisions.

It does not sound so complicated, simply categorizing and cataloguing information, yet most enterprises have had disappointing or inconsistent results from the taxonomies they use to organize information. Designing taxonomies is hard work. Developing an efficient and consistent taxonomy is a detailed, tedious, labor-intensive team effort on the front end, and its maintenance must be consistent and regular and follow established information governance (IG) guidelines, to maintain its effectiveness.

Once a taxonomy is in place, it requires systematic updates and reviews, to ensure that guidelines are being followed and new document and record types are included in the taxonomy structure. Technology tools like text mining, social tagging, and autoclassification can help uncover trends and suggest candidate terms (more on these technologies later in this chapter).

When done correctly, the business benefits of good taxonomy design go much further than speeding search and retrieval; an efficient, operational taxonomy also is a part of IG efforts that help the organization to manage and control information so that it may efficiently respond to litigation requests, comply with governmental regulations, and meet customer needs (both external and internal).

Taxonomies are crucial to finding information and optimizing knowledge worker productivity, yet some surveys estimate that nearly half of organizations do not have a standardized taxonomy in place.6

According to the Montague Institute, “The way your company organizes information (i.e. its taxonomy) is critical to its future. A taxonomy not only frames the way people make decisions, but also helps them find the information to weigh all the alternatives. A good taxonomy helps decision makers see all the perspectives, and ‘drill down’ to get details from each and explore lateral relationships among them” (italics added).7 Without it, your company will find it difficult to leverage intellectual capital, engage in electronic commerce, keep up with employee training, and get the most out of strategic partnerships.

With the explosion in growth of electronic documents and records, the standardized classification structure that a taxonomy imposes optimizes records retrieval for daily business operations and for legal and regulatory demands.8

Since end-users can choose from topic areas, subject categories, or groups of documents, rather than blindly typing word searches, taxonomies narrow searches and speed search time and retrieval.9

“The link between taxonomies and usability is a strong one. The best taxonomies efficiently guide users to exactly the content they need. Usability is judged in part by how easily content can be found,” according to the Montague Institute.10

Importance of Navigation and Classification

Taxonomies need to be considered from two main perspectives: navigation and classification. Most people consider the former, but not the latter. The navigational construct that is represented by a taxonomy is evident in most file structures and file shares—the nesting of folders within folders—and in many Web applications where users are navigating hierarchical arrangements of pages or links. However, classification is frequently behind the scenes. A document can “live” in a folder that the user can navigate to. But within that folder, the document can be classified in different ways through the application of metadata. In these cases, the records indicate what business function created them. Metadata are descriptive fields that delineate a (document or) record's characteristics, such as author, title, department of origin, date created, length, number of pages or file size, and so forth. The metadata is also part of the taxonomy or related to the taxonomy. In this way, usability can be impacted by giving the user multiple ways to retrieve their information,11 while still maintaining the authenticity and evidence trail of the business function.

When Is a New Taxonomy Needed?

In some cases, organizations have existing taxonomy structures, but they have gone out of date or have not been maintained. They may not have been developed with best practices in mind or with correct representation of user groups, tasks, or applications. There are many reasons why taxonomies no longer provide the full value that they can provide. There are certain situations that clearly indicate that the organization needs a refactored or new taxonomy.12

If knowledge workers in your organization regularly conduct searches and receive hundreds of pages of results, then you need a new taxonomy. If you have developed a vast knowledge base of documents and records, and designated subject matter experts (SMEs), yet employees struggle to find answers, you need a new taxonomy. If there is no standardization of the way content is classified and catalogued, or there is conflict between how different groups or business units classify content, you need a new taxonomy. And if your organization has experienced delays, fines, or undue costs in producing documentation to meet compliance requests or legal demands, your organization needs to work on a new taxonomy.13

Taxonomies Improve Search Results

Taxonomies can improve a search engine's ability to deliver results to user queries in finding documents and records in an enterprise. The way the digital content is indexed (e.g. spidering, crawling, rule sets, algorithms) is a separate issue, and a good taxonomy improves search results regardless of the indexing method.14

Search engines struggle to deliver accurate and refined results since the wording in queries may vary and words can have multiple meanings. A taxonomy addresses these problems since the terms are set and defined in a controlled vocabulary.
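As a minimal sketch of how a controlled vocabulary resolves varied query wording, the Python fragment below maps variant terms to a single preferred term before a search is run. The vocabulary entries and the function name are hypothetical and chosen only for illustration; a real ERM system would hold these terms in its taxonomy or thesaurus tool.

```python
from typing import Optional

# Hypothetical controlled vocabulary: every variant maps to one preferred term.
PREFERRED_TERMS = {
    "invoice": "invoice",
    "bill": "invoice",
    "vendor invoice": "invoice",
    "contract": "contract",
    "agreement": "contract",
}


def normalize_term(term: str) -> Optional[str]:
    """Return the preferred term for a query word, or None if it is not in the vocabulary."""
    return PREFERRED_TERMS.get(term.strip().lower())


print(normalize_term("Bill"))       # invoice
print(normalize_term("Agreement"))  # contract
print(normalize_term("memo"))       # None
```

Because every variant resolves to the same preferred term, two users who type different words for the same concept are routed to the same set of records.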

Metadata, which, as stated earlier, are data fields that describe content, such as document type, creator, date of creation, and so forth, must be leveraged in the taxonomy design effort.

A formal definition of metadata is “standardized administrative or descriptive data about a document [or record] that is common for all documents [or records] in a given repository.” Standardized metadata elements of e-documents should be utilized and supported by including them in controlled vocabularies when possible.15

The goal of a taxonomy development effort is to help users find the information they need, in a logical and familiar way, even if they are not sure what the correct search terminology is. Good taxonomy design makes it easier and more comfortable for users to browse topics and drill down into more narrow searches to find the documents and records they need. Where it really becomes useful and helps contribute to productivity is when complex or compound searches are conducted.

Metadata and Taxonomy

One potential limitation of a purely hierarchical taxonomy is the lack of association between tiers (or nodes). There are often one-to-many or many-to-many associations between records. For example, an employee travels to a certification course. The resultant “expense report” is classified in the Finance/Accounts Payable/Travel Expense node of the taxonomy. The “course completion certificate” that is generated from the same travel (and is included as backup documentation for the expense report) is appropriately classified in the Human Resources/Training and Certification/Continuing Education node. For ERM systems that do not provide the functionality for a multifaceted taxonomy, metadata is used to provide the link between the nodes in the taxonomy (see Figure A.1).
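The expense report and certificate example can be sketched in a few lines of Python. This is an illustrative model only: the Record class, the record identifiers, and the related_ids field are assumptions, not the data model of any particular ERM product. Each record has a single home in the hierarchy, and a metadata field carries the link across nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    record_id: str
    title: str
    taxonomy_node: str  # the record's single location in the hierarchy
    related_ids: List[str] = field(default_factory=list)  # metadata link to other records


expense = Record(
    "R-001", "Expense report",
    "Finance/Accounts Payable/Travel Expense", ["R-002"],
)
certificate = Record(
    "R-002", "Course completion certificate",
    "Human Resources/Training and Certification/Continuing Education", ["R-001"],
)
records: Dict[str, Record] = {r.record_id: r for r in (expense, certificate)}


def related_records(record_id: str, store: Dict[str, Record]) -> List[Record]:
    """Follow the metadata links from one record to records filed in other nodes."""
    return [store[rid] for rid in store[record_id].related_ids if rid in store]


for linked in related_records("R-001", records):
    print(linked.title, "->", linked.taxonomy_node)
```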

Metadata schema must be structured to provide the appropriate associations as well as meet the users’ keyword search needs. It is important to limit the number of metadata fields that a user must manually apply to records. Most recordkeeping systems provide the functionality to automatically assign certain metadata to records based on rules that are established in advance and set up by a system administrator (referred to in this book as inherited metadata). The record's classification or location in the taxonomy is appropriate for inherited metadata.
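One way to picture inherited metadata is as a rule table keyed by taxonomy node. The sketch below is hypothetical: the rule contents, field names, and the choice to let system-assigned values take precedence over user-entered ones are all assumptions made for illustration, and actual products configure this behavior in their own ways.

```python
from typing import Dict

# Hypothetical administrator-defined rules: taxonomy node prefix -> inherited fields.
INHERITANCE_RULES: Dict[str, Dict[str, str]] = {
    "Finance/Accounts Payable": {"department": "Finance", "security": "Confidential"},
    "Human Resources": {"department": "Human Resources", "security": "Restricted"},
}


def inherited_metadata(taxonomy_node: str) -> Dict[str, str]:
    """Return the fields from the most specific rule that covers the node."""
    matches = [
        prefix for prefix in INHERITANCE_RULES
        if taxonomy_node == prefix or taxonomy_node.startswith(prefix + "/")
    ]
    if not matches:
        return {}
    return dict(INHERITANCE_RULES[max(matches, key=len)])


def file_record(user_metadata: Dict[str, str], taxonomy_node: str) -> Dict[str, str]:
    """Combine user-entered fields with inherited ones; inherited values win (an assumption)."""
    return {**user_metadata, **inherited_metadata(taxonomy_node), "taxonomy_node": taxonomy_node}


print(file_record({"title": "Conference travel"}, "Finance/Accounts Payable/Travel Expense"))
```

The user supplies only the title; the department and security fields follow from where the record is filed.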


Figure A.1 Metadata Link to Taxonomy Example

Source: Blackburn Consulting.


Figure A.2 Application of Metadata to Taxonomy Structure

Metadata can also be applied by autocategorization software. This can reduce the level of burden placed on the user and increase the quality and consistency of metadata. These approaches need to be tested and fine-tuned to ensure that they meet the needs of the organization.16

The File Plan will provide the necessary data to link the taxonomy to the document via inherited metadata. In most systems, this metadata is applied by the system and is transparent to the users. Additional metadata will need to be applied by the user. To maintain consistency, a thesaurus, which contains all synonyms and definitions, is used to enforce naming conventions (see Figure A.2).

Metadata Governance, Standards, and Strategies

Metadata can be a scary term to a lot of people. It just sounds complicated. And it can get complicated. It is often defined as “data about data,” which is true but somewhat confusing, and this does not provide enough information for most people to understand.

“Meta” derives from the Greek word that means “alongside, with, after, next.” Metadata can be defined as “structured data about other data.”17

In electronic records management (ERM), metadata identifies a record and its contents. ERM metadata describes a record's characteristics so that it may be classified more easily and completely. Metadata fields, or terms, for e-records can be as basic as identifying the name of the document, the creator or originating department, the subject, the date it was created, the document type, the length of the document, its security classification, and its file type.

Creating standardized metadata terms is part of an information governance (IG) effort that enables faster, more complete, and more accurate searches and retrieval of records. This is important not only in everyday business operations, but also, for example, when searching through potentially millions of records during the discovery phase of litigation.

Good metadata management also assists in the maintenance of corporate memory and improving accountability in business operations.18

Using a standardized format and controlled vocabulary provides a “precise and comprehensible description of content, location, and value.”19 Using a controlled vocabulary means your organization has standardized a set of terms used for metadata elements describing records. This “ensures consistency across a collection” and helps with optimizing search and retrieval functions and records research, as well as meeting e-discovery requests, compliance demands, and other legal and regulatory requirements. Your organization may, for instance, decide to use the standardized Library of Congress Subject Headings as standard terms for the “subject” metadata field.20

Metadata also describes a record's relationships with other documents and records, and what actions may have been taken on the record over time. This helps to track its history and development, and aids in any future e-discovery requests.

The role of metadata in managing records is multifaceted; it helps to:

  • Identify the records, record creators and users, and the areas within which they are utilized.
  • Determine the relationships between records and the knowledge workers who use them, and the relationships between the records and the business processes they are supporting.
  • Assist in managing and preserving the content and structure of the record.
  • Support IG efforts that outline who has access to records, and the context (when and where) in which access to the records is granted.
  • Provide an audit trail to document changes to or actions upon the record and its metadata.
  • Support the finding and understanding of records and their relationships.21

In addition, good metadata management provides additional business benefits including increased management control over records, improved records authenticity and security, and reusability of metadata.22

Often, organizations will establish mandatory metadata terms that must accompany a record, and some optional ones that may help in identifying and finding it. A record is more complete with more metadata terms included, which also facilitates search and retrieval of records.23 This is particularly the case when knowledge workers are not quite sure which records they are searching for, and therefore enter some vague or conceptual search terms. So, the more detail that is in the metadata fields, the more likely the end user is to find the records they need to complete their work. This provides a real productivity benefit to the organization, although it is difficult to quantify. Search times should decrease once a standardized metadata program is implemented, and improved work output and decisions should follow.
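A check for mandatory and optional terms might look like the following sketch. The field names are hypothetical examples drawn from the kinds of terms mentioned in this chapter; each organization would define its own lists through IG policy.

```python
from typing import Dict, List

# Hypothetical field lists; an organization would set these through IG policy.
MANDATORY_FIELDS = {"title", "creator", "date_created", "document_type"}
OPTIONAL_FIELDS = {"subject", "department", "security_classification"}


def validate_metadata(metadata: Dict[str, object]) -> Dict[str, List[str]]:
    """Report mandatory fields that are absent or empty, and fields outside the schema."""
    present = {name for name, value in metadata.items() if value not in (None, "")}
    missing = sorted(MANDATORY_FIELDS - present)
    unknown = sorted(set(metadata) - MANDATORY_FIELDS - OPTIONAL_FIELDS)
    return {"missing": missing, "unknown": unknown}


result = validate_metadata({"title": "Q3 report", "creator": "", "colour": "blue"})
print(result["missing"])  # ['creator', 'date_created', 'document_type']
print(result["unknown"])  # ['colour']
```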

Standardizing the metadata terms, definitions, and classifications for documents and records is done by developing and enforcing IG policy. This standardization effort gives users confidence that the records they are looking for are, in fact, the complete and current set they need to work with. And it provides the basis for a legally defensible records management program that will hold up in court.

A metadata governance program must be an ongoing effort that keeps metadata up-to-date and accurate. Often, once a metadata project is complete, attention to it wanes, maintenance tasks are not executed, and soon the accuracy and completeness of searches for documents and records deteriorate. Metadata maintenance is therefore an ongoing process, and it must be formalized into a program that is periodically checked, tested, and audited.

Types of Metadata

There are several types or categories of metadata, including:

  • Descriptive metadata. Metadata that describes the intellectual content of a resource and is used for the indexing, discovery, and identification of a digital resource.
  • Administrative metadata. Metadata that includes management information about the digital resource, such as ownership and rights management.
  • Structural metadata. Metadata that is used to display and navigate digital resources and describes relationships between multiple digital files, such as page order in a digitized book.
  • Technical metadata. Metadata that describes the features of the digital file, such as resolution, pixel dimension, and hardware. The information is critical for migration and long-term sustainability of the digital resource.
  • Preservation metadata. Metadata that specifically captures information that helps facilitate management and access to digital files over time. This inherently includes descriptive, administrative, structural, and technical metadata elements that focus on the provenance, authenticity, preservation activity, technical environment, and rights management of an object.24

Core Metadata Issues

Some key considerations and questions that need to be answered for effective implementation of a metadata governance program are:

  • Who is the audience? Which users will be using the metadata in their daily operations? What is their skill level? Which metadata terms/fields are most important to them? What has been their approach to working with documents and records in the past and how can it be streamlined or improved? What terms are important to management? How can the metadata schema be designed to accommodate the primary audience and other secondary audiences? Answers to these questions will come only with close consultation with these key stakeholders.25
  • Who else can help? That is, which other stakeholders can help build a consensus on the best metadata strategy and approach? What other records creators, users, custodians, auditors, and legal counsel personnel can be added to the team to design a metadata approach that maximizes its value to the organization? Are there subject matter experts (SMEs)? What standards and best practices can be applied across functional boundaries to improve the ability of various groups to collaborate and leverage the metadata?
  • How can metadata governance be implemented and maintained? Creating IG guidelines and rules for metadata assignment, input, and upkeep is a critical step, but how will the program continue to be updated to maintain its value to the organization? What business processes and audit checks should be in place? How will the quality of the metadata be monitored and controlled? Who is accountable?
  • What will the user training program look like? How will users be trained initially, and how will continued education and reinforcement be communicated? Will there be periodic meetings of the IG or metadata team to discuss issues and concerns? What is the process for adding or amending metadata terms as the business progresses and changes? These questions must be answered, and a documented plan must be in place.
  • What will the communications plan be? Management time and resources are also needed to continue the practice of informing and updating users and encouraging compliance with internal metadata standards and policies. Users need to know on a consistent basis why metadata is important and the value that good metadata management can bring to the organization.26

International Metadata Standards and Guidance

Metadata is what gives an e-record its record status, or, in other words, electronic records metadata is what makes an electronic file a record. There are several established international standards for metadata structure, and additional guidance on strategy and implementation has been provided by standards groups such as ISO and ANSI/NISO, and other bodies, such as the Dublin Core Metadata Initiative (DCMI).

ISO 15489 Records Management Definitions and Relevance

The international records management standard ISO 15489 states that “a record should correctly reflect what was communicated or decided or what action was taken. It should be able to support the needs of the business to which it relates and be used for accountability purposes” and its metadata definition is “data describing context, content, and structure of records and their management through time.”27

A key difference between a document and a record is that a record is fixed, whereas a document can continue to be edited. (This line has been blurred with the advent of blockchain technology, which keeps records in a sequence, and creates a new record each time a change or update is made.) Preventing records from being edited can be partly accomplished by indicating their formal record status in a metadata field, among other controls.

Proving that a record is, in fact, authentic and reliable necessarily includes proving that its metadata has remained intact and unaltered through the entire chain of custody of the record.
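One common technical building block for showing that metadata has not changed is a cryptographic fingerprint taken when the record is declared and rechecked later. The sketch below uses Python's standard hashlib and json modules; the field names are hypothetical, and a fingerprint on its own is only one control, not a complete chain-of-custody solution.

```python
import hashlib
import json
from typing import Dict


def metadata_fingerprint(metadata: Dict[str, str]) -> str:
    """Hash a canonical (key-sorted) JSON form so field order does not affect the result."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unaltered(metadata: Dict[str, str], recorded_fingerprint: str) -> bool:
    """Compare the current metadata against the fingerprint stored at declaration time."""
    return metadata_fingerprint(metadata) == recorded_fingerprint


# Hypothetical record metadata captured when the record was declared.
declared = {"title": "Board minutes", "creator": "Corporate Secretary", "status": "record"}
stored = metadata_fingerprint(declared)

tampered = dict(declared, creator="Unknown")
print(is_unaltered(declared, stored))  # True
print(is_unaltered(tampered, stored))  # False
```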

ISO Technical Specification 23081–1: 2006 Information and Documentation—Records Management Processes—Metadata for Records—Part 1: Principles

ISO 23081–1 “covers the principles that underpin and govern records management metadata. These principles apply through time to:

  • Records and their metadata;
  • All processes that affect them;
  • Any system in which they reside;
  • Any organization that is responsible for their management.”28

The ISO 23081–1 standard provides guidance for metadata management within the “framework” of ISO 15489, and addresses the relevance and roles that metadata plays in records-management-intensive business processes. There are no mandatory metadata terms set, as these will differ by organization and by location and governing national and state/provincial laws.29 The standard lists 10 purposes or benefits of using metadata in records management, which can help build the argument for convincing users and managers of the importance of good metadata governance and its resultant benefits.

Dublin Core Metadata Initiative

The DCMI produced a basic or core set of metadata terms that have served as the basis for many public and private sector metadata governance initiatives. Initial work in workshops filled with experts from around the world took place in 1995 in Dublin, Ohio (not Ireland). From these working groups the idea of a set of “core metadata” or essential metadata elements with generic descriptions arose.30 “The fifteen-element ‘Dublin Core’ achieved wide dissemination as part of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and has been ratified as IETF RFC 5013, ANSI/NISO Standard Z39.85–2007, and ISO Standard 15836:2009.”

“Dublin Core has as its goals:31

Simplicity of creation and maintenance

The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the networked environment.

Commonly understood semantics

Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next. The Dublin Core can help the ‘digital tourist’—a non-specialist searcher—find his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For example, scientists concerned with locating articles by an author, and art scholars interested in works by a particular artist, can agree on the importance of a ‘creator’ element. Such convergence on a common, if slightly more generic, element set increases the visibility and accessibility of all resources, both within a given discipline and beyond.

International scope

The Dublin Core Element Set was originally developed in English, but versions are being created in many other languages, including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The DCMI Localization and Internationalization Special Interest Group is coordinating efforts to link these versions in a distributed registry.

Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin Core development community, the involvement of representatives from virtually every continent has ensured that the development of the standard considers the multilingual and multicultural nature of the electronic information universe.

Extensibility

While balancing the needs for simplicity in describing digital resources with the need for precise retrieval, Dublin Core developers have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery needs. It is expected that other communities of metadata experts will create and administer additional metadata sets, specialized to the needs of their communities. Metadata elements from these sets could be used in conjunction with Dublin Core metadata to meet the need for interoperability. The DCMI Usage Board is presently working on a model for accomplishing this in the context of ‘application profiles.’”

“The fifteen element ‘Dublin Core’ described in this standard is part of a larger set of metadata vocabularies and technical specifications maintained by the Dublin Core Metadata Initiative (DCMI). The full set of vocabularies, DCMI Metadata Terms [DCMI-TERMS], also includes sets of resource classes (including the DCMI Type Vocabulary [DCMI-TYPE]), vocabulary encoding schemes, and syntax encoding schemes. The terms in DCMI vocabularies are intended to be used in combination with terms from other, compatible vocabularies in the context of application profiles and on the basis of the DCMI Abstract Model [DCAM].”32
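To make the fifteen-element set concrete, the sketch below lists the element names and serializes a simple description as XML using Python's standard xml.etree.ElementTree module. The namespace URI is the published Dublin Core elements namespace; the wrapper element name "metadata", the function name, and the sample values are assumptions for illustration. For brevity the sketch allows one value per element, although Dublin Core elements are optional and repeatable.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core_xml(description: Dict[str, str]) -> str:
    """Serialize a simple description, rejecting names outside the fifteen elements."""
    unknown = set(description) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NAMESPACE)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        if name in description:
            child = ET.SubElement(root, f"{{{DC_NAMESPACE}}}{name}")
            child.text = description[name]
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core_xml({"title": "Travel policy", "creator": "Finance", "date": "2024-01-15"}))
```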

Global Information Locator Service

Global Information Locator Service (GILS) builds on ISO 23950, the international standard for information searching over networked (client/server) computers. ISO 23950 is a federated search protocol that equates to the US standard ANSI/NISO Z39.50. The US Library of Congress is the official maintenance agency for both standards, “which are technically identical (though with minor editorial differences).”33

ISO 23950 (also known as ANSI/NISO standard Z39.50) grew out of the library science community, although it is widely used, particularly in the public sector.34 The use of GILS has tapered off as other metadata standards at the international, national, industry, and agency levels have been established.35

“It [GILS] specifies procedures and formats for a client to search a database provided by a server, retrieve database records, and perform related information retrieval functions.” While it does not specify a format, information retrieval can be accomplished through full-text search, although it “also supports large, complex information collections.” The standard specifies how searches are made and how results are returned.

GILS helps people find information, especially in large, complex environments, such as across multiple government agencies. It is used in over 40 US states and several countries, including Argentina, Australia, Brazil, Canada, France, Germany, Hong Kong, India, Spain, Sweden, Switzerland, United Kingdom, and many others.36

Text Mining

On a continuing basis, text mining can be conducted on documents to learn of emerging potential taxonomy terms. Text mining is simply performing detailed full-text searches on the content of documents. And with more sophisticated tools like neural computing and artificial intelligence (AI), concepts, not just key words, can be discovered and leveraged for improving search quality for users.

Another tool is the use of faceted search (sometimes referred to as faceted navigation or faceted browsing) where, for instance, document collections are classified in multiple ways, rather than in a single, rigid taxonomy. Knowledge workers may apply multiple filters to search across documents and records and find better and more complete results. And when they are not quite sure what they are looking for, or if it exists, then a good taxonomy can help suggest terms, related terms, and associated content, truly contributing to enterprise knowledge management (KM) efforts, adding to corporate memory and increasing the organizational knowledge base.37 Good KM helps to provide valuable training content for new employees, and helps to reduce the impact of turnover and retiring employees.
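Faceted search can be illustrated with a short sketch in which each document carries several independent metadata facets and the user combines filters freely. The documents, facet names, and function names below are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical document collection; each key other than "id" is a facet.
DOCUMENTS: List[Dict[str, object]] = [
    {"id": "D1", "document_type": "policy", "department": "Finance", "year": 2023},
    {"id": "D2", "document_type": "report", "department": "Finance", "year": 2024},
    {"id": "D3", "document_type": "policy", "department": "Human Resources", "year": 2024},
]


def faceted_search(documents: List[Dict[str, object]], **filters: object) -> List[Dict[str, object]]:
    """Keep documents that match every selected facet value."""
    return [d for d in documents if all(d.get(f) == v for f, v in filters.items())]


def facet_counts(documents: List[Dict[str, object]], facet: str) -> Dict[object, int]:
    """Count documents per facet value, as a search interface might show beside each filter."""
    return dict(Counter(d[facet] for d in documents if facet in d))


print([d["id"] for d in faceted_search(DOCUMENTS, document_type="policy", year=2024)])  # ['D3']
print(facet_counts(DOCUMENTS, "department"))
```

The same collection can be narrowed by document type, by department, or by year in any order, which is the contrast with a single, rigid hierarchy.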

Search is ultimately about metadata—whether your content has explicit metadata or not. The search engine creates a forward index and determines what words are contained in the documents being searched. It then inverts that index to provide the documents that words are contained in. This is effectively metadata about the content. A taxonomy can be used to enrich that search index in various ways. This does require configuration and integration with search engines, but the result is the ability to increase both precision and recall of search results. Search results can also be grouped and clustered using a taxonomy. This allows large numbers of results to be more easily scanned and understood by the user. Many of these functions are determined by the capabilities of search tools and document and records management systems. As search functionality is developed, don't miss this opportunity to leverage the taxonomy.
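The forward and inverted indexes described above, and one simple way a taxonomy can enrich them, can be sketched as follows. The sample documents and the taxonomy fragment are hypothetical, and real search engines add ranking, stemming, and many other steps that are omitted here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical document texts.
DOCS = {
    "doc1": "Travel expense report for the certification course",
    "doc2": "Invoice from the training vendor",
    "doc3": "Annual report on employee training",
}

# Hypothetical taxonomy fragment: a term and its narrower or related terms.
TAXONOMY = {"training": {"course", "certification"}}


def build_forward_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Forward index: document -> the set of words it contains."""
    return {doc_id: set(re.findall(r"[a-z0-9]+", text.lower())) for doc_id, text in docs.items()}


def invert(forward_index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Inverted index: word -> the set of documents that contain it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    return dict(inverted)


def search(term: str, inverted: Dict[str, Set[str]],
           taxonomy: Optional[Dict[str, Set[str]]] = None) -> List[str]:
    """Look up a term, optionally expanding it with related taxonomy terms."""
    terms = {term.lower()}
    if taxonomy:
        terms |= taxonomy.get(term.lower(), set())
    hits: Set[str] = set()
    for t in terms:
        hits |= inverted.get(t, set())
    return sorted(hits)


index = invert(build_forward_index(DOCS))
print(search("training", index))            # ['doc2', 'doc3']
print(search("training", index, TAXONOMY))  # ['doc1', 'doc2', 'doc3']
```

In this sketch the taxonomy improves recall: the expense report never uses the word "training," yet it is found because the taxonomy relates that term to "course" and "certification."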

Records Grouping Rationale

The primary reasons that records are grouped together are:

  • To tie together documents with like content, purpose, or theme.
  • To improve search and retrieval capabilities.
  • To identify content creators, owners, and managers.
  • To provide an understandable context.
  • To support retention and disposition scheduling.38

Taxonomies group records with common attributes. The groupings are constructed not only for records management classification and functions, but also to support end users in their search and retrieval activities. Associating documents of a similar theme enables users to find documents when they do not know the exact document name. Choosing the theme or topic enables the users to narrow their search to find the relevant information.

The theme or grouping also places the document name into context. Words have many meanings and adding a theme to them further defines them. For example, the word “article” could pertain to a newspaper article, an item or object, or a section of a legal document. If it were grouped with publications, periodicals, and so on, the meaning would be clear. The challenge here is deciding whether to have a separate category for “article” or to group “article” with other similar publications. Some people tend to develop finer levels of granularity in classification structures. These people can be called the “splitters.” Those who group things together are “lumpers.” But there can be clear rules for when to lump versus split. Experts recommend splitting into another category when business needs demand that we treat the content differently or users need to segment the content for some purpose. This rule can be applied to many situations when trying to determine whether a new category is needed.39

Management, security, and access requirements are usually based on a user's role in a process. Grouping documents based on processes makes the job of assigning responsibilities and access easier. For example, documents used in financial processes can be sensitive, so access must be restricted to users whose role in the business gives them a need to know.

Records retention periods are developed to be applied to a series (or group) of documents. When similar documents are grouped, it is easier to apply retention rules. However, when the grouping for retention is not the same as the grouping for other user views, a cross-mapping (file plan) scheme must be developed and incorporated into the taxonomy effort.

Business Classification Scheme, File Plans, and Taxonomy

In its simplest definition a business classification scheme (BCS) is a hierarchical conceptual representation of the business activity performed by an organization.40 The highest level of a BCS is called an Information Series, which signifies “high-level business functions” of a business or governmental agency, and the next level is Themes, which represent the specific activities that feed into the high-level functions at the information series level. These two top levels are rarely changed in an organization.41

A BCS is often viewed as synonymous with the term file plan, which is the shared file structure in an Electronic Records Management (ERM) System, but it is not a direct file plan.

Yet, a file plan can be developed and mapped back to the BCS and automated through an electronic document and records management system (EDRMS) or electronic records management (ERM) system.42

A BCS is required by ISO 15489, the international records management standard, and, together with the folders and records it contains, comprises what in the paper environment was called simply a “File plan.” A BCS is therefore a full representation of the business of an organization.

Classification and Taxonomy

Classification of records extends beyond the categorization of records in the taxonomy. It also must include the application of retention requirements. These are legal and business requirements that specify the length of time a record must be maintained. A Records Retention Schedule (RRS) is a document that identifies records with regulatory relevance and specifies the periods for which an organization should retain these records to meet its operational needs. The RRS also is a guide that indicates legal and other statutory requirements. The Records Retention Schedule groups documents into records series that relate to specific business functions. This grouping is performed because laws and regulations are mainly based on the business functions that create the documents. These business functions are not necessarily the same as the activities described in the hierarchy of the taxonomy. Therefore, there must be a method to map the RRS to the Taxonomy. This is accomplished with a File Plan. The File Plan facilitates the application of retention rules during document categorization without requiring a user to know or understand the Records Retention Schedule (see Figure A.3).
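The role of the File Plan as a cross-map can be sketched in a few lines. The example below is only an illustration under stated assumptions: the series codes, taxonomy paths, and retention periods are hypothetical and do not come from any actual schedule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecordSeries:
    code: str
    title: str
    retention_years: int

# Hypothetical retention schedule entries; the periods are illustrative only.
RRS = {
    "FIN-01": RecordSeries("FIN-01", "Accounts Payable", 7),
    "HR-02": RecordSeries("HR-02", "Personnel Files", 30),
}

# The file plan: taxonomy node (function, activity, document type) -> series code.
FILE_PLAN = {
    ("Finance", "Payments", "Invoice"): "FIN-01",
    ("Finance", "Travel", "Travel Request Form"): "FIN-01",
    ("Human Resources", "Classification", "Position Description"): "HR-02",
}

def retention_for(taxonomy_path):
    """Look up the record series for the taxonomy node a user selected."""
    code = FILE_PLAN.get(tuple(taxonomy_path))
    if code is None:
        raise KeyError(f"No file plan entry for {taxonomy_path!r}")
    return RRS[code]

series = retention_for(["Finance", "Payments", "Invoice"])
```

The user only chooses a taxonomy node when categorizing a document; the retention rule follows from the lookup, so the user never has to consult the schedule directly.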

Figure A.3 Mapping the Records Retention Schedule to the Taxonomy

Source: Blackburn Consulting.

Prebuilt Versus Custom Taxonomies

Taxonomy templates for specific vertical industries (e.g. law, pharmaceuticals, aerospace) are provided by content and knowledge management software, enterprise search vendors, and trade associations. These prebuilt taxonomies use consistent terminology, have been tried and tested, and incorporate industry best practices, where possible. They can provide a jump-start and faster implementation at a lower cost than developing a custom taxonomy in-house or with external consulting assistance.

There are advantages and disadvantages to each approach. A prebuilt taxonomy will typically have some parameters that are able to be configured to better meet the business needs of an organization, yet compromises and trade-offs will have to be made. It may also introduce unfamiliar terminology that knowledge workers will be forced to adapt to, increasing training time and costs, and reducing its overall effectiveness. These considerations must be factored into the “build or buy” decision. Using the custom-developed approach, a taxonomy can be tailored to meet the precise business needs of an organization or business unit and can include nuances such as company-specific nomenclature and terminology.43

Frequently, the longer and more costly customized approach must be used, since there are no prebuilt taxonomies that fit well. This is especially the case with niche enterprises or those operating in developing or esoteric markets. For mature industries, more prebuilt taxonomies and template choices exist. Attempting to tailor a prebuilt taxonomy can end up taking longer than building one from scratch if it is not a good fit in the first place, so best practices dictate that organizations use prebuilt taxonomies where practical, and custom design taxonomies where needed.

There really is no “one size fits all” when it comes to taxonomy. And even when two organizations do the exact same thing in the exact same industry, there will be differences in their culture, process, and content that will require customization and tuning of the taxonomy. Standards are useful for improving efficiency of a process, and taxonomy projects really are internal standards projects. However, competitive advantage is attained through differentiation. A taxonomy specifically tuned to meet the needs of a particular enterprise is actually a competitive advantage.44

There is one other alternative, which is to “autogenerate” a taxonomy from the metadata in a collection of e-documents and records, using statistical techniques such as term frequency and entity extraction. This may seem to be the “best of both worlds” in that it offers instant customization at a low cost. These tools can provide useful insights into the data on the front end of a taxonomy project, including valuable statistical renderings. However, the only way to focus on user needs is to interview and work with users to gain insights into their business process needs and requirements, while considering the business objectives of the taxonomy project. This cannot be done with mathematical computations; the human factor is key.

In essence, these autogenerated taxonomy tools can determine which terms and documents are used frequently, but they cannot assess the real value of information being used by knowledge workers and how they use the information. That takes consultation with stakeholders, studied observation, and business analysis. Machine-generated taxonomies look like they were generated by machines—which is to say that they are not very usable by humans.45

Thesaurus Use in Taxonomies

In taxonomy work, a thesaurus contains the agreed-on synonyms and similar names for terms used in a controlled vocabulary. So, “invoice” may be listed as the equivalent term for “bill” when categorizing records. The thesaurus goes further and lists “information about each term and their relationships to other terms within the same thesaurus.”

A thesaurus is like a hierarchical taxonomy but also includes “associative relationships.”46 An associative relationship is a conceptual relationship. It is the “see also” that we may come across in the back-of-the-book index. But the question is: Why do we want to see it? Associative relationships can provide a linkage to specific classes of information of interest to users and for processes. Use of associative relationships can provide a great deal of functionality in content and document management systems and needs to be considered in records management applications.47
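As a rough sketch of these relationships, a thesaurus entry can be modeled with equivalent terms, a broader term, and associative (“see also”) links. The class names, the choice of “Invoice” as the preferred term, and the sample entries are assumptions made for illustration, not a structure prescribed by any standard.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Term:
    preferred: str
    synonyms: set = field(default_factory=set)   # equivalent terms
    broader: Optional[str] = None                # hierarchical parent
    related: set = field(default_factory=set)    # associative "see also" links

class Thesaurus:
    def __init__(self):
        self._terms = {}
        self._lookup = {}  # any known label (lowercased) -> preferred label

    def add(self, term):
        self._terms[term.preferred] = term
        for label in {term.preferred, *term.synonyms}:
            self._lookup[label.lower()] = term.preferred

    def preferred(self, label):
        """Resolve any known label to its preferred term, or None if unknown."""
        return self._lookup.get(label.lower())

    def see_also(self, label):
        """Return the associatively related terms for a label."""
        pref = self.preferred(label)
        return sorted(self._terms[pref].related) if pref else []

thesaurus = Thesaurus()
thesaurus.add(Term("Invoice", synonyms={"Bill"},
                   broader="Financial Records", related={"Purchase Order"}))
thesaurus.add(Term("Purchase Order",
                   broader="Financial Records", related={"Invoice"}))
```

A user who searches for “bill” is resolved to “Invoice” and can also be offered “Purchase Order” through the associative link.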

There are international standards for thesauri creation from the International Organization for Standardization (ISO), the American National Standards Institute (ANSI), and the British Standards Institution (BSI).48

ISO 25964-1:2011, “Information and Documentation—Thesauri and Interoperability with Other Vocabularies,” draws “on [the British standard, BS 8723] but reorganize[d] the content to fit into two parts.” Part 1, “Thesauri for Information Retrieval,” of the standard ISO 25964 was published in August 2011. Part 2, “Interoperability with Other Vocabularies,” was published in 2013.49

Taxonomy Types

Taxonomies used in ERM systems are usually hierarchical, with categories (nodes) in the hierarchy progressing from general to specific. Each subsequent node is a subset of the higher-level function. There are three basic types of hierarchical taxonomies: subject, business-unit, and functional.50

A subject taxonomy uses controlled terms for subjects. The subject headings are arranged in alphabetical order by the broadest subjects, with more precise subjects listed under them. An example is the Library of Congress subject headings (LCSH) used to categorize holdings in a library collection (see Figure A.4). Even the Yellow Pages could be considered a subject taxonomy.

Figure A.4 Library of Congress Subject Headings

It is difficult to establish a universally recognized set of terms in a subject taxonomy. If users are unfamiliar with the topic, they may not know the appropriate term heading with which to begin their search. For example, say a person is searching through the Yellow Pages for a place to purchase eyeglasses. They begin their search alphabetically by turning to the E's and scanning for the term eyeglasses. Since there are no topics titled “eyeglasses,” the person consults the Yellow Pages index, finds the term eyeglasses, and this provides a list of preferred terms or “see alsos” that direct the person to “Optical—Retail” for a list of eyeglass businesses (see Figure A.5).51

In both examples (LCSH and Yellow Pages), the subject taxonomy is supported by a thesaurus. Again, a thesaurus is a controlled vocabulary that includes synonyms, related terms, and preferred terms. In the case of the Yellow Pages, the index functions as a basic thesaurus.

In a business-unit-based taxonomy, the hierarchy reflects the organizational charts (e.g. Department/Division/Unit). Records are categorized based on the business unit that manages them. Figure A.6 shows the partial detail of one node of a business-unit-based taxonomy that was developed for a county government.52

One advantage of a business-unit-based taxonomy is that it mimics most existing paper-filing system schemas. Therefore, users are not required to learn a “new” system. However, conflicts arise when documents are managed or shared among multiple business units. As an example, for the county government referenced earlier, a property transfer document called the “TD1000” is submitted to the Recording Office for recording and then forwarded to the Assessor for property tax evaluation processing. This poses a dilemma as to where to categorize the TD1000 in the taxonomy.53

Another issue arises with organizational changes. When the organizational structure changes, so must the business-unit-based taxonomy.

In a functional taxonomy, records are categorized based on the functions and activities that produce them (function/activity/transaction). The organization's business processes are used to establish the taxonomy. The highest or broadest level represents the business functions. The next level down the hierarchy constitutes the activities performed for the function. The lowest level in the hierarchy consists of the records that are created as a result of the activity (a.k.a. the transactions).

Figure A.5 Yellow Pages Example

Figure A.7 shows partial detail of one node of a functional taxonomy developed for a state government regulatory agency. The agency organizational structure is based on regulatory programs. Within the program areas are similar (repeated) functions and activities (e.g. permitting, compliance, and enforcement). When the repeated functions and activities are universalized, the result is a “flatter” taxonomy. This type of taxonomy is better suited to endure organizational shifts and changes. In addition, the process of universalizing the functions and activities inherently results in broader and more generic naming conventions. This provides flexibility when adding new record types (transactions) because there will be fewer changes to the hierarchy structure.54

Figure A.6 Community Government Business-Unit Taxonomy

One disadvantage of a functional taxonomy is its inability to address case files (or project files). A case file is a collection of records that relate to an entity, person, or project. The records in the case file can be generated by multiple activities. For example, at the regulatory agency, enforcement files are maintained that contain records generated by enforcement activities (Notice of Violation, Consent Decree, etc.) and other ancillary but related activities such as Contracting, Inspections, and Permitting.55

To address the case file issue at the regulatory agency, metadata cross-referencing was used to provide a virtual case-file view of the records collection (see Figure A.8).
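The idea of a virtual case file can be sketched as a query over a shared metadata field. This is only an illustration, not the agency's actual implementation: the `case_id` field, the record identifiers, and the sample document types other than Notice of Violation and Consent Decree are hypothetical.

```python
# Each record stays in its functional taxonomy node; a shared metadata
# field (here a hypothetical "case_id") ties related records together.
records = [
    {"id": "R1", "function": "Enforcement", "doc_type": "Notice of Violation", "case_id": "C-100"},
    {"id": "R2", "function": "Inspections", "doc_type": "Inspection Report", "case_id": "C-100"},
    {"id": "R3", "function": "Permitting", "doc_type": "Permit", "case_id": "C-200"},
    {"id": "R4", "function": "Enforcement", "doc_type": "Consent Decree", "case_id": "C-100"},
]

def virtual_case_file(records, case_id):
    """Collect records sharing a case identifier, regardless of taxonomy node."""
    return [r for r in records if r.get("case_id") == case_id]

case_view = virtual_case_file(records, "C-100")
functions_in_case = sorted({r["function"] for r in case_view})
```

The records are never moved or duplicated; the case-file view is assembled on demand from metadata that cuts across the functional hierarchy.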

A hybrid [taxonomy] is usually the best approach. There are certain business units that usually don't change over time. For example, accounting and human resources activities are fairly constant. Those portions of the taxonomy could be constructed in a business-unit manner even when other areas within the organization use a functional structure (see Figure A.9).56

Figure A.7 State Government Regulatory Agency Functional Taxonomy

Faceted taxonomies allow for multiple organizing principles to be applied to information along various dimensions. Facets can contain subjects, departments, business units, processes, tasks, interests, security levels, and other attributes used to describe information. There is never really one single taxonomy, but rather collections of taxonomies that describe different aspects of information. In the e-commerce world, facets are used to describe brand, size, color, price, and other context-specific attributes. Records management systems can also be developed with knowledge and process attributes related to the enterprise.57
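A minimal sketch of faceted filtering follows, assuming each document carries one value per facet. The facet names, sample documents, and function names are hypothetical and chosen only to illustrate the idea.

```python
def facet_filter(items, **selected):
    """Keep items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]

def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    counts = {}
    for item in items:
        value = item.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts

# Hypothetical documents described along three facets.
docs = [
    {"title": "Q1 Budget", "department": "Accounting",
     "process": "Budgeting", "security_level": "Restricted"},
    {"title": "Travel Policy", "department": "Human Resources",
     "process": "Travel", "security_level": "Internal"},
    {"title": "Travel Request Form", "department": "Accounting",
     "process": "Travel", "security_level": "Internal"},
]

travel_in_accounting = facet_filter(docs, process="Travel", department="Accounting")
by_department = facet_counts(docs, "department")
```

Because each facet is an independent dimension, users can combine them in any order, and the counts can drive the kind of narrowing menus familiar from e-commerce sites.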

Business Process Analysis

To establish the taxonomy, business processes must be documented and analyzed. There are two basic process analysis methods: top-down and bottom-up. In the top-down method, a high-level analysis of business functions is performed to establish the higher tiers. Detailed analyses are performed on each business process to “fill in” the lower tiers. The detailed analyses are usually conducted in a phased approach and the taxonomy is incrementally updated.

Figure A.8 Metadata Cross-Referencing within a Taxonomy

Figure A.9 Basic Accounting Business-Unit Taxonomy

In order to use the bottom-up method, detailed analyses must be performed for all processes in one effort. Using this method ensures that there will be fewer modifications to the taxonomy. However, this is sometimes not feasible for organizations with limited resources. A phased or incremental approach is usually more budget-friendly and places fewer burdens on the organization's resources.

There are many diagramming formats and tools that will provide the details needed for the analysis. The most basic diagramming can be accomplished with a standard tool such as Visio® from Microsoft. More advanced modeling tools can also produce the diagrams; in addition, they can statistically analyze process changes through simulation and provide information for architecture planning and other process initiatives within the organization.

Any diagramming format will suffice as long as it depicts the flow of data through the processes showing process steps, inputs, and outputs (documents), decision steps, organizational boundaries, and interaction with information systems. The diagrams should depict document movement within as well as between the subject department and other departments or outside entities.

Figure A.10 uses a swim-lane type diagram. Each horizontal “lane” represents a participant or role. The flow of data and sequence of process steps is shown with lines (the arrows note the direction). Process steps are shown as boxes (for example, “Complete Travel Request Form”). Decision steps are shown as diamonds (for example, “Approve Request”). Documents are depicted as a rectangle with a curved bottom line (for example, “Travel Request Form”).

The first step is to review any existing business process documentation (e.g. business plans, procedures manuals, employee training manuals, etc.) to gain a better understanding of the functions and processes. This is done in advance of interviews to provide a base-level understanding to reduce the amount of time required of the interviewees.

Two different types of interviews (high-level and detailed business process) are conducted with key personnel from each department. The initial (high-level) interviews are conducted with a representative who will provide an overall high-level view of the department, including its mission, responsibilities, and identification of the functional areas. This person will identify those staff who will provide details of the specific processes in each of the functional areas identified. For instance, if the department is Human Resources, functional areas of the department might include: Applicant Processing, Classification, Training, and Personnel File Management. It is expected that this first interview/meeting will last approximately one hour.

Figure A.10 Business Process Example—Travel Expense Process

Source: Blackburn Consulting.

The second interviews will be detailed interviews that focus on daily processes performed in each functional area. For example, if the function is Human Resources Classification, the process may be the creation/management of position descriptions. It is only necessary to interview one person who represents a process; there is no need to interview multiple staff performing the same function. These second interviews will likely last one to two hours each, depending on the complexity of the process.

When there are processes that “connect” (e.g. the output from one process is the input to another), it is useful to conduct group interviews with representatives for each process. This often results in “aha” moments when an employee from one process finally understands why they are sending certain records to another process. It also brings to light business process improvement (BPI) opportunities. When employees understand the big picture process, they can identify unnecessary process steps and redundant or obsolete documents that can be eliminated.

One purpose of process analysis is to develop taxonomy facets that can be used to surface information for steps in the process. In some cases, process steps can directly inform the types of artifacts that are needed at a part of the process and therefore be used to develop content types in knowledge management use cases. This is related to records management in that knowledge management applications are simply another lens under which content can be viewed. Process analysis can also help determine the scope of metadata for content. For example, when developing an application to view invoices, if the process includes understanding line item detail, this will dictate a different metadata model than if the process sought only to determine whether invoices over a certain threshold were unpaid. Different processes, different use cases, different metadata.58

Taxonomy Testing: A Necessary Step

Once a new taxonomy is developed, it must be tested and piloted to see if it meets user needs and expectations. To attempt the rollout of a new taxonomy without testing it first is imprudent and will end up costing more time and resources in the long run. So budget the time and money for it.59 Taxonomy testing is where the rubber meets the road; it provides real data to see if the taxonomy design has met user expectations and actually helps them in their work.

User testing provides valuable feedback and allows the taxonomist or taxonomy team to fine-tune the work they have done to more closely align the taxonomy with user needs and business objectives. What may have seemed an obvious term or category may, in fact, be way off. This may result from the sheer focus and myopia of the taxonomy team. So getting user feedback is essential.

There are many taxonomy testing tools that can assist in the design effort. Once an initial design is drafted, a “low-tech” approach is to handwrite classification categories and document types on post-it notes or index cards. Then bring in a sampling of users and ask them to place the notes or cards in the proper category. The results are tracked and calculated.60
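How the results of a closed card sort might be tracked and calculated can be sketched as follows. The scoring approach (share of participants who chose the intended category, plus the most frequently chosen category) is one common-sense option assumed here for illustration; the cards, categories, and responses are hypothetical.

```python
from collections import Counter, defaultdict

def score_closed_card_sort(expected, responses):
    """For each card, return the share of participants who chose the
    expected category and the category chosen most often."""
    placements = defaultdict(Counter)
    for response in responses:          # one dict per participant: card -> category
        for card, category in response.items():
            placements[card][category] += 1
    results = {}
    for card, target in expected.items():
        counts = placements[card]
        total = sum(counts.values())
        results[card] = {
            "agreement": counts[target] / total if total else 0.0,
            "most_common": counts.most_common(1)[0][0] if total else None,
        }
    return results

# Hypothetical draft taxonomy placements and four participants' sorts.
expected = {
    "Travel Request Form": "Finance",
    "Position Description": "Human Resources",
}
responses = [
    {"Travel Request Form": "Finance", "Position Description": "Human Resources"},
    {"Travel Request Form": "Human Resources", "Position Description": "Human Resources"},
    {"Travel Request Form": "Human Resources", "Position Description": "Human Resources"},
    {"Travel Request Form": "Human Resources", "Position Description": "Human Resources"},
]
results = score_closed_card_sort(expected, responses)
```

A low agreement score paired with a consistent alternative category suggests that users expect the document type somewhere other than where the draft taxonomy placed it.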

Software is available to conduct this card sorting in a more high-tech way, and more sophisticated software can assist in the development and testing effort and help to update and maintain the taxonomy.

Regardless of the method used, the taxonomy team or even IG team or task force needs to be the designated arbiter when conflicting opinions arise.

Taxonomy testing is not a one-shot task; with feedback and changes, you progress in iterations closer and closer to meeting user requirements, which may take several rounds of testing and changes.61

Taxonomies can be tested in multiple ways. User acceptance can be assessed throughout the derivation process through simple conference room pilots or validation sessions, formal usability testing based on use cases, card sorting (open and closed), and tagging exercises. Autotagging of content with target taxonomies is also an area that requires testing.62

Taxonomy Maintenance

After a taxonomy has been implemented, it will need to be updated over time to reflect changes in document management processes as well as to increase usability. Therefore, users should have the opportunity to suggest changes, additions, deletions, and so on. There should be a formal process in place to manage requests for changes. A person or committee should be assigned the responsibility to determine whether and how each request will be accommodated.

There must be guidelines to follow in making changes to the taxonomy. One U.S. state agency uses the following guidelines in determining taxonomy changes:

  • The new term must have a definition, preferably provided by the proposer of the new term.
  • It should be a term someone would recognize even if they have no background within our agency's workings; use of industry standard terminology is preferred.
  • Terms should be mutually exclusive from other terms.
  • Terms that can be derived using a combination of other terms or facilitated with metadata will not be added.
  • The value should not be a “temporary” term—it should have some expectation to have a long lifespan.
  • We should expect that there would be a significant volume of content that could be assigned the value—otherwise, use of a more general document type and clarification through the metadata on items is preferred: if enough items are titled with the new term over time to warrant reconsideration, it will be reconsidered.
  • For higher-level values in the hierarchy, the relationship between parents and children (functions and activities) is always “is a kind of …” Other relationships are not supported.
  • Document type values should not reflect the underlying technology used to capture the content and should not reflect the format of the content directly.

Social Tagging and Folksonomies

Social tagging is a method that allows users to manage content with metadata they apply themselves using keywords or metadata tags. Unlike traditional classification, which uses a controlled vocabulary, social tagging keywords are freely chosen by each individual.

Folksonomy is the term used for this free-form, social approach to metadata assignment.

Folksonomies are not an ordered classification system; rather, they are a list of keywords input by users that are ranked by popularity.63

Taxonomies and folksonomies both have their place. Folksonomies can be used in concert with taxonomies to nominate key terms for use in the taxonomy, which contributes toward the updating and maintenance of the taxonomy while improving the user experience by incorporating users' own preferred terms.

A combined taxonomy and folksonomy approach may provide for an optional “free-text metadata field” for social tags that might be titled “Subject” or “Comment.” Then users could search that free-form, uncontrolled field to narrow document searches. The folksonomy fields will be of most use to a user or departmental area, but if the terms are used frequently enough, they may need to be added to the formal taxonomy's controlled vocabulary to benefit the entire organization.
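One simple way to spot frequently used social tags that may deserve promotion is to count them and filter out terms already in the controlled vocabulary. The sketch below assumes a hypothetical usage threshold and sample tags; in practice the taxonomy team would still review each nominated term.

```python
from collections import Counter

def nominate_terms(tag_lists, controlled_vocabulary, min_uses=3):
    """Return (tag, count) pairs for free-text tags used at least min_uses
    times that are not already controlled terms, most frequent first."""
    known = {term.lower() for term in controlled_vocabulary}
    counts = Counter(
        tag.strip().lower()
        for tags in tag_lists
        for tag in tags
        if tag.strip()
    )
    return [
        (tag, n) for tag, n in counts.most_common()
        if n >= min_uses and tag not in known
    ]

# Hypothetical free-text "Subject" tags entered on five documents.
tag_lists = [
    ["per diem", "travel"],
    ["Per Diem"],
    ["per diem", "mileage"],
    ["mileage"],
    ["travel"],
]
controlled_vocabulary = {"Travel"}

candidates = nominate_terms(tag_lists, controlled_vocabulary)
looser = nominate_terms(tag_lists, controlled_vocabulary, min_uses=2)
```

Tags are normalized to lowercase before counting so that “Per Diem” and “per diem” are treated as the same candidate term.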

In sum, taxonomy development, testing, and maintenance are hard work, but they can yield significant and sustained benefits to the organization over the long haul by providing more complete and accurate information when knowledge workers make searches, better IG and control over the organization's documents, records, and information, and a more agile compliance and litigation readiness posture.

Endnotes

  1. Cadence Group, “Taxonomies: The Backbone of Enterprise Content Management,” August 18, 2006, www.cadence-group.com/articles/taxonomy/backbone.htm.
  2. Delphi Group, “Taxonomy and Content Classification: Market Milestone Report,” 2002, https://whitepapers.us.com/taxonomy-content-classification-market-milestone-report-white-paper-uga-edu.html (accessed September 14, 2018).
  3. Ibid.
  4. Cadence Group, “Taxonomies: The Backbone of Enterprise Content Management.”
  5. Daniela Barbosa, “The Taxonomy Folksonomy Cookbook,” www.slideshare.net/HeuvelMarketing/taxonomy-folksonomy-cookbook (accessed September 14, 2018).
  6. Ibid.
  7. Montague Institute Review, “Your Taxonomy Is Your Future,” February 2000, http://www.montague.com/review/articles/future.pdf.
  8. The Free Library, “Creating Order out of Chaos with Taxonomies,” 2005, www.thefreelibrary.com/Creating+order+out+of+chaos+with+taxonomies%3A+the+increasing+volume+of…-a0132679071 (accessed September 14, 2018).
  9. Susan Cisco and Wanda Jackson, “Creating Order out of Chaos with Taxonomies,” Information Management Journal, May/June 2005, www.arma.org/bookstore/files/Cisco.pdf.
  10. Marcia Morante, “Usability Guidelines for Taxonomy Development,” April 2003, www.montague.com/abstracts/usability.html.
  11. Seth Earley, e-mail to author, September 10, 2012.
  12. Ibid.
  13. Cadence Group, “Taxonomies,” 3.
  14. Dam Coalition, “8 Things You Need to Know about How Taxonomy Can Improve Search,” May 16, 2010, www.tech-speed.co.uk/dam/2010/05/17/8-things-you-need-to-know-about-how-taxonomy-can-improve-search.html (accessed September 14, 2018).
  15. Ibid.
  16. Seth Earley, e-mail to author, September 10, 2012.
  17. National Archives of Australia, “AGLS Metadata Standard, Part 2—Usage Guide,” Version 2.0, July 2010, www.agls.gov.au/.
  18. Kate Cumming, “Metadata Matters,” in Managing Electronic Records, eds. Julie McLeod and Catherine Hare (London: Facet Publishing, 2005), 34.
  19. Minnesota State Archives, “Electronic Records Management Guidelines,” March 12, 2012, www.mnhs.org/preserve/records/electronicrecords/ermetadata.html.
  20. Ibid.
  21. Kate Cumming, “Metadata Matters,” 35.
  22. Ibid.
  23. “Understanding Metadata,” NISO, https://groups.niso.org/apps/group_public/download.php/17443/understanding-metadata (accessed September 14, 2018).
  24. Minnesota State Archives, “Electronic Records Management Guidelines.”
  25. Ibid.
  26. Ibid.
  27. The National Archives, “Requirements for Electronic Records Management Systems,” 2002, http://webarchive.nationalarchives.gov.uk/+/http://www.nationalarchives.gov.uk/documents/metadatafinal.pdf (accessed September 21, 2018).
  28. “ISO 23081-1:2006, Information and Documentation—Records Management Processes—Metadata for Records—Part 1: Principles,” www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=40832 (accessed September 21, 2018).
  29. Carl Weise, “ISO 23081-1: 2006, Metadata for Records, Part 1: Principles,” January 27, 2012, www.aiim.org/community/blogs/expert/ISO-23081-1-2006-Metadata-for-records-Part-1-principles.
  30. Dublin Core Metadata Initiative, http://dublincore.org/metadata-basics/ (accessed September 21, 2018).
  31. Diane Hillman, Dublin Core Metadata Initiative, User Guide, November 7, 2005, http://dublincore.org/documents/usageguide/.
  32. Dublin Core Metadata Initiative, “Dublin Core Metadata Element Set,” Version 1.1, June 14, 2012, http://dublincore.org/documents/dces/.
  33. Library of Congress, International Standard Maintenance Agency, www.loc.gov/z3950/agency/ (accessed September 14, 2018).
  34. National Information Standards Organization (NISO), “ANSI/NISO Z39.50 2003 (R2009) Information Retrieval: Application Service Definition & Protocol Specification,” https://www.niso.org/publications/ansiniso-z3950-2003-s2014-information-retrieval-application-service-definition (accessed September 24, 2018).
  35. Jenn Riley, “Glossary of Metadata Standards,” 2009–2010, http://jennriley.com/metadatamap/seeingstandards_glossary_pamphlet.pdf (accessed September 14, 2018).
  36. Global Information Locator Service (GILS), “Initiatives,” www.gils.net/initiatives.html (accessed September 14, 2018).
  37. Ibid.
  38. Adventures in Records Management, “The Business Classification Scheme,” October 15, 2006, http://adventuresinrecordsmanagement.blogspot.com/2006/10/business-classification-scheme.html.
  39. Seth Earley, e-mail to author, September 10, 2012.
  40. National Archives of Australia, www.naa.gov.au (accessed September 14, 2018).
  41. Adventures in Records Management, “The Business Classification Scheme.”
  42. Ibid.
  43. Cisco and Jackson, “Creating Order out of Chaos with Taxonomies.”
  44. Ibid.
  45. Seth Earley, e-mail to author, September 10, 2012.
  46. Hedden, “The Accidental Taxonomist,” 10.
  47. Seth Earley, e-mail to author, September 10, 2012.
  48. Hedden, “The Accidental Taxonomist,” 8.
  49. NISO, Project ISO 25964, www.niso.org/workrooms/iso25964 (accessed September 14, 2018).
  50. This section is reprinted with permission from Barb Blackburn, “Taxonomy Design Types,” www.imergeconsult.com/img/114BB.pdf (accessed October 12, 2012); e-Doc Magazine, AIIM International (May/June 2006), 14 and 16.
  51. Ibid.
  52. Ibid.
  53. Ibid.
  54. Ibid.
  55. Ibid.
  56. Ibid.
  57. Seth Earley, e-mail to author, September 10, 2012.
  58. Ibid.
  59. Stephanie Lemieux, “The Pain and Gain of Taxonomy User Testing,” July 8, 2008, https://sethearley.wordpress.com/2008/07/08/the-pain-and-gain-of-taxonomy-user-testing.
  60. Ibid.
  61. Ibid.
  62. Seth Earley, e-mail to author, September 10, 2012.
  63. Tom Reamy, “Folksonomy Folktales,” KM World, September 29, 2009, www.kmworld.com/Articles/Editorial/Feature/Folksonomy-folktales-56210.aspx.