CHAPTER 17
Long-Term Digital Preservation*

By Charles M. Dollar and Lori J. Ashley

Every organization—public, private, or not-for-profit—now has electronic records and digital content that it wants to access and retain for periods in excess of 10 years. This may be due to regulatory or legal reasons, a desire to preserve organizational memory and history, or entirely for operational reasons. But long-term continuity of digital information does not happen by accident—it takes information governance (IG), planning, sustainable resources, and a keen awareness of the information technology (IT) and file formats in use by the organization, as well as evolving standards and computing trends.

Defining Long-Term Digital Preservation

Information is universally recognized as a key asset that is essential to organizational success. Digital information, which relies on complex computing platforms and networks, is created, received, and used daily to deliver services to citizens, consumers and customers, businesses, and government agencies. Organizations face tremendous challenges in the twenty-first century to manage, preserve, and provide access to electronic records for as long as they are needed.

Digital preservation is defined as long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required to be retained. Digital preservation applies to content that is born digital as well as content that is converted to digital form.

Some digital information assets must be preserved permanently as part of an organization's documentary heritage. Dedicated repositories for historical and cultural memory, such as libraries, archives, and museums, need to move forward to put in place trustworthy digital repositories that can match the security, environmental controls, and wealth of descriptive metadata that these institutions have created for analog assets (such as books and paper records). Digital challenges associated with records management affect all sectors of society—academic, government, private, and not-for-profit enterprises—and ultimately all citizens of all developed nations.

The term “preservation” implies permanence, but electronic records, data, and information retained for only 5 to 10 years are likely to face challenges related to storage media failure and computer hardware/software obsolescence. A useful point of reference for the definition of “long term” comes from the International Organization for Standardization (ISO) standard 14721, which defines long term as “long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely.”1

Long-term records are common in many different sectors, including government, health care, energy, utilities, engineering and architecture, construction, and manufacturing. During the course of routine business, thousands or millions of electronic records are generated in a wide variety of information systems. Most records are useful for only a short period of time (up to seven years), but some may need to be retained for long periods or permanently. For those records, organizations must plan for and allocate resources for preservation efforts to ensure that the data remains accessible, usable, understandable, and trustworthy over time.

In addition, there may be the requirement to retain the metadata associated with records even longer than the records themselves.2 A record may have been destroyed according to its scheduled disposition at the end of its life cycle, but the organization still may need its metadata to identify the record, its life cycle dates, and the authority or person who authorized its destruction.
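
The idea of keeping metadata after a record has been destroyed can be sketched as a small data structure. This is a minimal illustration only: the `DispositionStub` class, its field names, and the `dispose` function are hypothetical and are not drawn from any standard or product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DispositionStub:
    """Hypothetical metadata kept after a record's content has been destroyed."""
    record_id: str
    title: str
    created: date
    destroyed: date
    authorized_by: str


def dispose(record: dict, destroyed: date, authorized_by: str) -> DispositionStub:
    """Return the metadata stub to retain; the record's content is not carried over."""
    if destroyed < record["created"]:
        raise ValueError("destruction date precedes creation date")
    return DispositionStub(record["id"], record["title"], record["created"],
                           destroyed, authorized_by)
```

The stub identifies the record, its life cycle dates, and who authorized its destruction, while deliberately holding no field for the content itself.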

Key Factors in Long-Term Digital Preservation

Some electronic records must be preserved, protected, and monitored over long periods of time to ensure they remain authentic, complete, and unaltered and available into the future. Planning for the proper care of these records is a component of an overall records management program and should be integrated into the organization's information governance (IG) policies and technology portfolio as well as its privacy and security protocols.

Enterprise strategies for sustainable and trustworthy digital preservation repositories have to take into account several prevailing and compound conditions: the complexity of electronic records, decentralization of the computing environment, obsolescence and aging of storage media, massive volumes of electronic records, and software and hardware dependencies.

The challenges of managing electronic records significantly increased with the trend of decentralization of the computing environment. In the centralized environment of a mainframe computer, prevalent from the 1960s to 1980s but also in use today, it is relatively easy to identify, assess, and manage electronic records. This is not the case in the decentralized environment of specialized business applications and office automation systems, where each user creates electronic objects that may constitute a formal record and thus will have to be preserved under IG policies that address record retention and disposition rules, processes, and accountability.

Electronic records have evolved from simple text-based word processing files or reports to include complex mixed media digital objects that may contain embedded images (still and animated), drawings, sounds, hyperlinks, or spreadsheets with computational formulas. Some portions of electronic records, such as the content of dynamic Web pages, are created on demand from databases and exist only for the duration of the viewing session. Other digital objects, such as electronic mail, may contain multiple attachments, and they may be threaded (i.e. related e-mail messages linked in send-reply chains). These records cannot be converted to paper or text formats for preservation without the loss of context, functionality, and metadata.

Electronic records are being created at rates that pose significant threats to our ability to organize, control, and make them accessible for as long as they are needed. This accumulating volume of digital content includes documents that are digitally scanned or imaged from a variety of formats to be stored as electronic records.

Electronic records are stored as representations of bits—1s and 0s—and therefore depend on software applications and hardware networks for the entire period of retention, whether it is 3 days, 3 years, or 30 years or longer. As information technologies become obsolete and are replaced by new generations, the capability of a specific software application to read the representations of 1s and 0s and render them into human-understandable form will degrade to the point that the records are neither readable nor understandable. As a practical matter, this means that the readability and understandability of the records can never be recovered, and there can be serious legal consequences.

Storage media are affected by the dual problems of obsolescence and decay. They are fragile, have limited shelf life, and become obsolete in a matter of a few years. Mitigating media obsolescence is critical to long-term digital preservation (LTDP) because the bitstreams of 1s and 0s that comprise electronic records must be kept “alive” through periodic transfer to new storage media.

In addition to these current conditions associated with technology and records management, organizations face tremendous internal change management challenges with regard to reallocation of resources, business process improvements, collaboration and coordination between business areas, accountability, and the dynamic integration of evolving recordkeeping requirements. Building and sustaining the capability to manage digital information over long periods of time is a shared responsibility of all stakeholders.

Threats to Preserving Records

A number of known threats may degrade or destroy electronic records and data:

  • Failure of storage media. Storage media are inherently vulnerable to errors and malfunction, including disk crashes. Solid-state drives (SSDs) reduce the risk of mechanical failure because they have no moving parts, and they retain data without electrical power.
  • Failure of computer systems. Computer hardware has moving parts and circuits that deteriorate and fail over time; the expected interval between such failures is expressed as the mean time between failures (MTBF). Some failures are complete and irrecoverable, and some are minor and can be fixed with no loss of data. Computer software is prone to bugs and malware that can compromise the safekeeping of data.
  • Systems and network communications failures. A small number of network communications are likely to contain errors or misreads, especially undetected checksum errors, which may impact the authenticity of a record. Network errors can also result from changed or redirected URLs, and any communication over a network is subject to intrusions, errors, and hackers.
  • Component obsolescence. As hardware, software, and media age, they become obsolete because of continued innovation and advances by the computer industry. Sometimes obsolescence is due to outdated component parts, changes in software routines, or changes in the hardware needed to read removable media.
  • Human error. People make mistakes, and they can make mistakes in selecting, classifying, storing, or handling archived records. Some of these errors may be detected and can be remedied; some go unnoticed or cannot be fixed.
  • Natural disaster. Hurricane Katrina is the clearest US example of how a natural disaster can interrupt business operations and destroy business records, although in some instances damaged records were recovered. Floods, fires, earthquakes, and other natural disasters can destroy storage media and computer hardware outright or cause them to fail.
  • Attacks. Archived electronic records are subject to external attacks from malware, such as viruses and worms, so preserved records must be scanned for malware and kept separate from external threats. Preserved records also can be subject to theft or damage from insiders, such as the theft of historical radio recordings by a National Archives and Records Administration employee, which was reported in 2012. Proper monitoring and auditing procedures must be in place to detect and avoid these types of attacks.
  • Financial shortfall. It is expensive to preserve and maintain digital records. Power, cooling and heating systems, personnel costs, and other preservation-associated costs must be budgeted and funded.
  • Business viability. If an organization has financial or legal difficulties or suffers a catastrophic disaster, it may not survive, placing the preserved records at risk. Part of the planning process is to include consideration of successor organization alternatives, should the originating organization go out of business.3

The impact on the preserved records can be gauged by determining what percentage of the data has been lost and cannot be recovered or, for the data that can be recovered, what the impact or delay to users may be.

It should be noted that threats can be interrelated and more than one type of threat may impact records at a time. For instance, in the event of a natural disaster, operators are more likely to make mistakes, and computer hardware failures can create new software failures.

Digital Preservation Standards

The digital preservation community recognizes that open, technology-neutral standards play a key role in ensuring that digital records are usable, understandable, and reliable for as far into the future as may be required.

There are two broad categories of digital preservation standards. The first category involves systems infrastructure capabilities and services that support a trustworthy repository. The second category relates to open standard technology-neutral file formats.

Digital preservation infrastructure capabilities and services that support trustworthy digital repositories are addressed by the international standard ISO 14721 (published in 2003 and revised in 2012), Space Data and Information Transfer Systems—Open Archival Information System (OAIS)—Reference Model, which is a key standard applicable to LTDP.4

The fragility of digital storage media in concert with ongoing and sometimes rapid changes in computer software and hardware poses a fundamental challenge to ensuring access to trustworthy and reliable digital content over time. Eventually, every digital repository committed to LTDP must have a strategy to mitigate computer technology obsolescence. Toward this end, the Consultative Committee for Space Data Systems developed an Open Archival Information System (OAIS) reference model to support formal standards for the long-term preservation of space science data and information assets. OAIS was not designed as an implementation model.

The OAIS Reference Model defines an archival information system as an archive, consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available and understandable for a designated community (i.e. potential users or consumers), who should be able to understand the information. Thus, the context of an OAIS-compliant digital repository includes producers who originate the information to be preserved in the repository, consumers who retrieve the information, and a management/organization that hosts and administers the digital assets being preserved.

OAIS encapsulates digital objects into information packages. Each information package includes the digital object content (a sequence of bits) and representation information that enables rendering of an object into human usable information along with preservation description information (PDI) such as provenance, context, and fixity.

The OAIS Information Model employs three types of information packages: a submission information package (SIP), an archival information package (AIP), and a dissemination information package (DIP). An OAIS-compliant digital repository preserves AIPs and any PDI associated with them. A SIP encompasses digital content that a producer has organized for submission to the OAIS. After the completion of quality assurance and transformation procedures, an AIP is created, which is the focus of preservation activity. Subsequently, a DIP is created that consists of an AIP or information extracted from an AIP customized to the requirements of the designated community of users and consumers.
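
The flow from SIP to AIP to DIP described above can be sketched in a few lines of Python. OAIS is a reference model rather than an implementation model, so everything here is an assumption made for illustration: the class fields, the `ingest` and `disseminate` function names, the single quality assurance check, and the choice of SHA-256 for the fixity value.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class SIP:
    """Submission information package: content as organized by the producer."""
    producer: str
    content: bytes
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class AIP:
    """Archival information package: the unit the repository preserves."""
    content: bytes             # the digital object (a sequence of bits)
    representation_info: str   # what is needed to render the content
    pdi: dict                  # provenance, context, fixity


@dataclass(frozen=True)
class DIP:
    """Dissemination information package: what a consumer receives."""
    content: bytes
    description: dict


def ingest(sip: SIP, representation_info: str) -> AIP:
    """Build an AIP from a SIP, recording provenance and a fixity digest in PDI."""
    if not sip.content:
        # Stand-in for real quality assurance, which would check far more.
        raise ValueError("SIP failed quality assurance: empty content")
    pdi = {
        "provenance": sip.producer,
        "context": dict(sip.metadata),
        "fixity": {"algorithm": "sha256",
                   "digest": hashlib.sha256(sip.content).hexdigest()},
    }
    return AIP(sip.content, representation_info, pdi)


def disseminate(aip: AIP) -> DIP:
    """Derive a DIP from an AIP; here the content is passed through unchanged."""
    return DIP(aip.content, {"format": aip.representation_info,
                             "provenance": aip.pdi["provenance"]})
```

A real repository would tailor the DIP to the designated community, for example by delivering an extract or an access copy rather than the full archival content.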

The core of OAIS is a functional model that consists of six entities:

  1. Ingest processes the formal incorporation (in archival terms, accession) of submitted information (i.e., a SIP) into the digital repository. It acknowledges the transfer, conducts quality assurance, extracts metadata from the SIP, generates the appropriate AIP, and populates PDI and extracted metadata into the AIP.
  2. Archival storage encompasses all of the activities associated with storage of AIPs. They include receipt of AIPs, transferring AIPs to the appropriate storage location, replacing media as necessary, transforming AIPs to new file formats as necessary, conducting quality assurance tests, supporting backups and business continuity procedures, and providing copies of AIPs to the access entity.
  3. Data management manages the storage of description and system information, generates reports, and tracks use of storage media.
  4. Administration encompasses a host of technical and human processes that include audit, policy making, strategy, and provider and customer service, among other management and business functions. OAIS administration connects with all of the other OAIS functions.
  5. Preservation planning does not execute any preservation activities. Rather, it supports a technology watch program for sustainable standards, file formats, and software for digital preservation, monitoring changes in the access needs of the designated community, and recommending updated digital preservation strategies and activities.
  6. Access receives queries from the designated community, passes them to archival storage, and makes the results available as DIPs to the designated community.

Figure 17.1 displays the relationships between these six functional entities.5

Figure 17.1 Open Archival Information System Reference Model

In archival storage, the OAIS reference model articulates a migration strategy based on four primary types of AIP migration that are ordered by an increasing risk of potential information loss: refreshment, replication, repackage, and transformation.6

  1. Migration refreshment occurs when one or more AIPs are copied exactly to the same type of storage media with no alterations occurring in the packaging information, the content information, the preservation description information (PDI), or the AIP location and access archival storage mapping infrastructure.
  2. Migration replication occurs when one or more AIPs are copied exactly to the same or new storage media with no alterations occurring in the packaging information, the content information, and the PDI. However, there is a change in the AIP location and access archival storage mapping infrastructure.
  3. Migration repackage occurs when one or more AIPs are copied exactly to new storage media with no alterations in the content information and the PDI. However, there are changes in the packaging information and in the AIP location and access archival storage mapping infrastructure.
  4. Migration transformation occurs when bitstreams change because a new content encoding procedure replaces the current encoding procedure (e.g. Unicode representation of A through Z replaces the ASCII representation of A through Z), a new file format replaces an existing one, or a new software application is required to access and render the AIP content.
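
The four migration types differ only in which parts of an AIP change, so the distinction can be expressed as a short decision rule. The sketch below is an illustration, not part of the OAIS model: the `Migration` enum, the `classify_migration` function, and its three boolean parameters are hypothetical names, and changes to PDI are left out for brevity.

```python
from enum import Enum


class Migration(Enum):
    """Migration types, numbered by increasing risk of information loss."""
    REFRESHMENT = 1
    REPLICATION = 2
    REPACKAGE = 3
    TRANSFORMATION = 4


def classify_migration(*, content_changed: bool, packaging_changed: bool,
                       mapping_changed: bool) -> Migration:
    """Name the migration type from what was altered during the copy."""
    if content_changed:       # bitstream altered: new encoding or file format
        return Migration.TRANSFORMATION
    if packaging_changed:     # packaging information altered, content intact
        return Migration.REPACKAGE
    if mapping_changed:       # only the location/access mapping altered
        return Migration.REPLICATION
    return Migration.REFRESHMENT  # exact copy, nothing altered
```

The checks run from the riskiest change downward, so a migration that alters both the packaging and the content is reported as a transformation.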

OAIS is the lingua franca of digital preservation. The international digital preservation community has embraced it as the framework for viable and technologically sustainable digital preservation repositories. An LTDP strategy that is OAIS-conforming offers the best means available today for preserving the digital heritage of all organizations, private and public.

ISO TR 18492 (2005), Long-Term Preservation of Electronic Document-Based Information

ISO 18492 provides practical methodological guidance for the long-term preservation and retrieval of authentic electronic document-based information, when the retention period exceeds the expected life of the technology (hardware and software) used to create and maintain the information assets. It emphasizes both the role of open standard technology–neutral standards in supporting long-term access and the engagement of IT specialists, document managers, records managers, and archivists in a collaborative environment to promote and sustain a viable digital preservation program.

ISO 18492 takes note of the role of ISO 15489 but does not cover processes for the capture, classification, and disposition of authentic electronic document-based information. Ensuring the usability and trustworthiness of electronic document-based information for as long as necessary in the face of limited media durability and technology obsolescence requires a robust and comprehensive digital preservation strategy. ISO 18492 describes such a strategy, which includes media renewal, software dependence, migration, open standard technology-neutral formats, authenticity protection, and security:

  • Media renewal. ISO 18492 defines media renewal as a baseline requirement for digital preservation because it is the only known way to keep bitstreams of information based on electronic documents alive. It specifies the conditions under which copying and reformatting of storage media and storage devices should occur.
  • Open standard technology-neutral formats. The fundamental premise of ISO 18492 is that open standard technology-neutral formats are at the core of a viable and technologically sustainable digital preservation strategy because they help mitigate software obsolescence. ISO 18492 recommends the use of several standard formats, including: eXtensible Markup Language (XML), Portable Document Format/Archival (PDF/A), tagged image file format (TIFF), and Joint Photographic Experts Group (JPEG).
  • Migrating electronic content. ISO 18492 recommends two ways of migrating electronic content to new technologies. The first relies on backwardly compatible new open standard technology-neutral formats that are displacing existing ones. Generally, this is a straightforward process that typically can be executed with minimal human intervention. The second involves writing computer code that exports the electronic content to a new target application or open standard technology-neutral format. This can be a very labor-intensive activity and requires rigorous quality control.
  • Authenticity. ISO 18492 recommends the use of hash digest algorithms to validate the integrity of electronic content after execution of media renewal activities that do not alter the underlying bitstreams of electronic content. In instances where bitstreams change as a result of format conversion, comprehensive preservation metadata should be captured that documents the process.
  • Security. ISO 18492 recommends protecting the security of electronic records by creating a firewall between electronic content in a repository and external users. In addition, procedures should be in place to maintain backup/disaster recovery capability, including at least one off-site storage location.
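
The use of hash digests to validate media renewal can be illustrated with a short Python sketch. It computes a digest before and after copying a file to new media and refuses to accept the copy if the two differ. The function names `sha256_of` and `renew_media` are hypothetical, and SHA-256 is an assumption made here; the text above does not name a particular algorithm.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def renew_media(source: Path, target: Path) -> str:
    """Copy a file to new media and confirm the bitstream is unchanged."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"digest mismatch after copying {source} to {target}")
    return after  # digest to record in the preservation metadata
```

A matching digest shows only that the copy is bit-for-bit identical. After a format conversion the digests are expected to differ, which is why the process itself must be documented in preservation metadata.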

ISO 16363 (2012)—Space Data and Information Transfer Systems—Audit and Certification of Trustworthy Digital Repositories

ISO 14721 (OAIS) acknowledged that an audit and certification standard was needed that incorporated the functional specifications for records producers, records users, ingest of digital content into a trusted repository, archival storage of this content, and digital preservation planning and administration. ISO 16363 is this audit and certification standard. Its use enables independent audits and certification of trustworthy digital repositories and thereby promotes public trust in digital repositories that claim they are trustworthy. To date only a handful of ISO 16363 test audits have been undertaken; additional time is required to determine how widely adopted the standard becomes.

ISO 16363 is organized into three broad categories: organizational infrastructure, digital object management, and technical infrastructure and security risk management. Each category is decomposed into a series of primary elements or components, some of which may be more appropriate for digital libraries than for public records digital repositories. In some instances there are secondary elements or components. An explanatory discussion of each element accompanies “empirical metrics” relevant to that element. The empirical metrics typically include high-level examples of how conformance can be demonstrated. Hence, they are subjective high-level conformance metrics rather than explicit performance metrics.

Organizational infrastructure7 consists of these primary elements:

  • Mission statement that reflects a commitment to the preservation of, long-term retention of, management of, and access to digital information
  • Preservation strategic plan that defines the approach the repository will take in the long-term support of its mission
  • Collection policy or other document that specifies the types of information it will preserve, retain, manage, and provide access to
  • Identification and establishment of the duties and roles that the repository is required to perform, along with a staff with adequate skills and experience to fulfill these duties
  • Dissemination of the definitions of its designated community and associated knowledge base(s)
  • Preservation policies that ensure that the preservation strategic plan will be met
  • Documentation of the history of changes to operations, procedures, software, and hardware
  • Commitment to transparency and accountability in all actions supporting the operation and management of the repository that affect the preservation of digital content over time
  • Dissemination as appropriate of the definition, collection, and tracking of information integrity measurements
  • Commitment to a regular schedule of self-assessment and external certification
  • Short- and long-term business planning processes in place to sustain the repository over time
  • Deposit agreements for digital materials transferred to the custody of the organization
  • Written policies that specify when the preservation responsibility for contents of each set of submitted data objects occurs
  • Intellectual property ownership rights policies and procedures

Digital object management,8 which is the core of the standard, comprises these primary elements:

  • Methods and factors used to determine the different types of information for which an organization accepts preservation responsibility
  • An understanding of digital collections sufficient to carry out the preservation necessary for as long as required
  • Specifications that enable recognition and parsing of SIPs
  • An ingest procedure that verifies each SIP for completeness and correctness
  • An ingest procedure that validates successful ingest of each SIP
  • Definitions for each AIP or class of AIPs used that are adequate for parsing and suitable for long-term preservation requirements
  • Descriptions of how AIPs are constructed from SIPs, including extraction of metadata
  • Documentation of the final disposition of SIPs, including those not ingested
  • A convention that generates unique, persistent identifiers of all AIPs
  • Reliable linking services that support the location of each uniquely identified object, regardless of its physical location
  • Tools and resources that support authoritative representation information for all of the digital objects in the repository, including file type
  • Documented processes for acquiring and creating PDI
  • Understandable content information for the designated community at the time of creation of the AIPs
  • Verification of the completeness and correctness of AIPs at the point of their creation
  • Contemporaneous capture of documentation of actions and administration processes that are relevant to AIP creation
  • Documented digital preservation strategies
  • Mechanisms for monitoring the digital preservation environment
  • Documented evidence of the effectiveness of digital preservation activities
  • Specifications for storage of AIPs down to the bit level
  • Preservation of the content information of AIPs
  • Monitoring the integrity of AIPs
  • Documentation that preservation actions associated with AIPs complied with the specifications for those actions
  • Specification of minimum information requirements that enable the designated community to discover and identify material of interest
  • Bidirectional linkage between each AIP and its associated descriptive information
  • Compliance with access policies
  • Policies and procedures that enable the dissemination of digital objects that are traceable to the “originals,” with evidence supporting their authenticity
  • Procedures that require documentation of actions taken in response to reports about errors in data or responses from users

Technical infrastructure and security risk management9 includes these primary elements:

  • Technology watches or other monitoring systems that track when hardware and software are expected to become obsolete
  • Procedures, commitment, and funding when it is necessary to replace hardware
  • Procedures, commitment, and funding when it is necessary to replace software
  • Adequate hardware and software support for backup functionality sufficient for preserving the repository content and tracking repository functions
  • Effective mechanisms that identify bit corruption or loss
  • Documentation of all incidents of data corruption or loss, and of the steps taken to repair/replace corrupt or lost data
  • Defined processes for storage media and/or hardware change (e.g. refreshing, migration)
  • Management of the number and location of copies of all digital objects
  • Systematic analysis of security risk factors associated with data, systems, personnel, and physical plant
  • Suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s)
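
Several of the elements above (detecting bit corruption, documenting incidents, and managing the number and location of copies) come together in a periodic fixity audit. The following is a minimal sketch under stated assumptions: copies are held in memory as bytes for simplicity, the storage location names are invented, and `audit_copies` and `repair_plan` are hypothetical helper names rather than anything specified by ISO 16363.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a copy's bitstream (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(expected_digest: str, copies: dict[str, bytes]) -> dict[str, bool]:
    """Report, per storage location, whether the copy matches the recorded digest."""
    return {location: digest(data) == expected_digest
            for location, data in copies.items()}


def repair_plan(report: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Split locations into intact copies and copies that need replacement."""
    good = sorted(loc for loc, ok in report.items() if ok)
    bad = sorted(loc for loc, ok in report.items() if not ok)
    if bad and not good:
        raise RuntimeError("no intact copy remains; restore from off-site backup")
    return good, bad
```

In practice each audit result, and any repair made from an intact copy, would be logged so that the history of corruption incidents is itself documented.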

ISO 16363 represents the gold standard of audit and certification for trustworthy digital repositories. In some instances the resources available to a trusted repository may not support full implementation of the audit and certification specifications. Decisions about where full and partial implementation is appropriate should be based on a risk assessment analysis.

PREMIS Preservation Metadata Standard

ISO 14721 specifies that preservation metadata associated with all archival storage activities (e.g. generation of hash digests, transformation, and media renewal) should be captured and stored in PDI. This high-level guidance requirement demands greater specificity in an operational environment.

Toward this end, the US Library of Congress and the Research Libraries Group supported a new international working group called PREservation Metadata: Implementation Strategies (PREMIS)10 to define a core set of preservation metadata elements with a supporting data dictionary that would be applicable to a broad range of digital preservation activities and to identify and evaluate alternative strategies for encoding, managing, and exchanging preservation metadata. Version 2.2 was released in June 2012.11

PREMIS enables designers and managers of digital repositories to have a clear understanding of the information required to support the “functions of viability, renderability, understandability, authenticity, and identity in a preservation context.” PREMIS accomplishes this through a data model that consists of five “semantic units” (think of them as high-level metadata elements, each of which is decomposed into subelements) and a data dictionary that decomposes these “semantic units” into a structured hierarchy. The five semantic units and their relationships are displayed in Figure 17.2.

Note the arrows that define relationships between these entities:

  • Intellectual entities are sets of content that are considered a single intellectual unit, such as a book, map, photograph, database, or records (e.g. an AIP).
  • Objects are discrete units of information in digital form that may exist as a bitstream, a file, or a representation.
  • Events denote actions that involve at least one digital object and/or agent known to the repository. Events may include the type of event (e.g. media renewal), a description of the event, and the agents involved in the event. Events support the chain of custody of digital objects.
  • Agents are actors in digital preservation that have roles. An agent can be an individual, an organization, or a software application.
  • Rights involve the assertion of access rights and access privileges that relate to intellectual property, privacy, or other related rights.

Figure 17.2 PREMIS Data Model

Source: Library of Congress, PREMIS Data Dictionary Version 2.2: Hierarchical Listing of Semantic Units, September 13, 2012, www.loc.gov/standards/premis/v2/premis-dd-Hierarchical-Listing-2-2.html.

The PREMIS Data Dictionary decomposes objects, events, agents, and rights into a structured hierarchical schema. In addition, it contains semantic units that support documentation of relationships between objects. An important feature of PREMIS is an XML schema for the PREMIS Data Dictionary. The primary rationale for the XML schema is to support the exchange of metadata information, which is crucial in ingest and archival storage. The XML schema enables automated extraction of preservation-related metadata in SIPs and population of this preservation metadata into AIPs. In addition, the XML schema can enable automatic capture of preservation events that are foundational for maintaining a chain of custody in archival storage.
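
To make the idea of automatically captured preservation events concrete, the sketch below builds a small XML fragment for a single event using Python's standard library. It is a simplified illustration and not a schema-valid PREMIS document: the element names follow the event semantic units of the PREMIS 2.x data dictionary as best understood here, the XML namespace is omitted, and the `premis_event` function and the "local" identifier type are assumptions. Any real output should be validated against the official schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def premis_event(event_id: str, event_type: str, detail: str,
                 object_id: str, agent_id: str, when: datetime) -> ET.Element:
    """Build a simplified, namespace-free event element for one preservation action."""
    event = ET.Element("event")

    identifier = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(identifier, "eventIdentifierType").text = "local"  # assumed type
    ET.SubElement(identifier, "eventIdentifierValue").text = event_id

    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when.isoformat()
    ET.SubElement(event, "eventDetail").text = detail

    # Link the event to the agent that performed it and the object it acted on.
    agent = ET.SubElement(event, "linkingAgentIdentifier")
    ET.SubElement(agent, "linkingAgentIdentifierType").text = "local"
    ET.SubElement(agent, "linkingAgentIdentifierValue").text = agent_id

    obj = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(obj, "linkingObjectIdentifierType").text = "local"
    ET.SubElement(obj, "linkingObjectIdentifierValue").text = object_id
    return event
```

Calling `ET.tostring(event, encoding="unicode")` on the result yields text that could be stored with an AIP. A sequence of such events, each tied to an object and an agent, is what supports the chain of custody.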

Recommended Open Standard Technology-Neutral Formats

A digital file format specifies the internal logical structure of digital objects (i.e. binary bits of 1s and 0s) and signal encoding (e.g. text, image, sound). File formats are crucial to long-term preservation because a computer can open, process, and render only file formats that it recognizes. Many file formats are proprietary (also known as native), meaning that digital content can be opened and rendered only by the software application used to create, use, and store it. However, as IT has changed, some software vendors have introduced new products that no longer support earlier versions of a file format. In such instances these formats become “legacy” formats, and digital content embedded in them can be opened only with computer code written expressly for this purpose. Other vendors, such as Microsoft, support backward compatibility across multiple generations of technology, so Microsoft Word 2010 can open and render documents created in Microsoft Word 95. Nonetheless, it is unrealistic to expect any software vendor to support backward compatibility for its proprietary file formats for digital content that will be preserved for multiple decades.
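
To make the idea of a computer "recognizing" a format concrete, the sketch below identifies a few formats by the signature bytes ("magic numbers") at the start of a file. It is only an illustration: the signature list is a small hand-picked sample, the function name `identify` is hypothetical, and production format-identification tools rely on far richer signature registries.

```python
# Leading byte signatures for a handful of formats discussed in this chapter.
# This list is deliberately tiny and is an illustrative assumption.
SIGNATURES = [
    ("PDF", b"%PDF-"),
    ("PNG", b"\x89PNG\r\n\x1a\n"),
    ("TIFF", b"II*\x00"),   # little-endian TIFF
    ("TIFF", b"MM\x00*"),   # big-endian TIFF
    ("WARC", b"WARC/"),
    ("XML", b"<?xml"),
]


def identify(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


print(identify(b"%PDF-1.4\n"))              # PDF
print(identify(b"II*\x00\x08\x00\x00\x00"))  # TIFF
print(identify(b"plain words"))             # unknown
```

A file that returns "unknown" here is the situation the paragraph above describes: without software that recognizes the format, the bits cannot be rendered.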

In the late 1980s, an alternative to vendor-supported backward compatibility emerged to mitigate dependence on proprietary file formats through open system interoperable file formats. Essentially, this meant that digital content could be exported from one proprietary file format and imported to one or more other proprietary file formats. Over time, interoperable file formats evolved into open standard technology-neutral formats that today have these characteristics:

  • Open means that the process is transparent and that participants in the process reach a consensus on the properties of the standard.
  • Standard means that a recognized regional or international organization (e.g. the ISO) published the standard.
  • Technology neutral means that the standard is interoperable on almost any technology platform that asserts conformance to the standard.

Because even open standard technology-neutral formats are not immune to technology obsolescence, their selection must take into account their technical sustainability and implementation in digital repositories. The PRONOM program of the National Archives of the United Kingdom and the Sustainability of Digital Formats program of the US Library of Congress assess the sustainability of open standard technology-neutral formats.

The recommended open standard technology-neutral formats for nine content types listed in Table 17.1 are based on this ongoing work along with preferred file formats supported by Library and Archives Canada and other national archives. Unlike PDF/A, several of these file formats (e.g. XML, JPEG 2000, and Scalable Vector Graphics [SVG]) were not explicitly designed for digital preservation. It cannot be emphasized too strongly that this list of recommended open standard technology-neutral formats (or any other comparable list) is not static and will change over time as technology changes.

Table 17.1 Recommended Open Standard Technology-Neutral Formats

Content Type | Recommended Formats
Text | PDF/A, XML
Spreadsheets | PDF/A
Images (raster) | PDF/A, TIFF, PNG
Photographs (digital) | JPEG 2000
Vector graphics | SVG
Moving images | MPEG-2
Audio | BWF
Web | WARC
Databases | XML

ISO 19005 (PDF/A)—Document Management—Electronic Document File Format for Long-Term Preservation (2005, 2011, and 2012)

PDF/A is an open standard technology-neutral format that enables the accurate representation of the visual appearance of digital content without regard for the proprietary format or application in which it was created or used. PDF/A is widely used in digital repositories as a preservation format for static textual and image content. Note that PDF/A is agnostic with regard to digital imaging processes or storage media. PDF/A supports conversion of TIFF and PNG images to PDF/A. There are two levels of conformance to PDF/A specifications. PDF/A-1a references the use of a “well-formed” hierarchical structure with XML tags that enable searching for a specific tag in a very large digital document. PDF/A-1b does not require this conformance, and as a practical matter, it does not affect the accurate representation of visual appearance.

Since its publication in 2005, there have been two revisions of PDF/A. The first revision, PDF/A-2, was aligned with the Adobe Portable Document Format 1.7 specification, which was published as the international standard ISO 32000-1 in 2008. The second revision, PDF/A-3, supports embedding documents in other formats, such as the original source document, in a PDF document.

Extensible Markup Language (XML)—World Wide Web Consortium (W3C) Internet Engineering Group (1998)

XML is a markup language derived from Standard Generalized Markup Language (SGML) that logically separates the rendering of a digital document from its content to enable interoperability across multiple technology platforms. Essentially, XML defines rules for marking up both the structure of a document and its content in plain text. Any conforming XML parser can recover the original structure and content. XML-encoded text is human-readable because any text editor can display the markup and the content. XML is ubiquitous in IT environments because many communities of users have developed document type definitions unique to their purposes, including genealogy, math, and relational databases. Structured data elements map onto relational database tables, which enables relational database portability.

Tagged Image File Format: 1992

Tagged image file format (TIFF) was initially developed by the Aldus Corporation in 1986 for storing black-and-white images created by scanners and desktop publishing applications. Over the next six years, several new features were added, including a wide range of color images and compression techniques, including lossless compression. The most recent version, TIFF 6.0, was released by Aldus in 1992. Subsequently, Adobe purchased Aldus and chose not to support any further significant revisions and updates. Nonetheless, TIFF is widely used in desktop scanners for creating digital images for preservation. With such a large base of users, it is likely to persist for some time, but Adobe's decision to discontinue further development of TIFF means that it will lack features of other current and future image file formats. Fortunately, there are tools available to convert TIFF images to PDF and PNG images.

ISO/IEC 15948: 2004—Information Technology-Computer Graphics and Image Processing-Portable Network Graphics (PNG): Functional Specifications

The W3C Internet Engineering Task Force supported the development of PNG as a replacement for Graphics Interchange Format (GIF) because the GIF compression algorithm was protected by patent rights rather than being in the public domain, as many believed. In 2004, PNG became an international standard that supports lossless compression, grayscale, and true-color images with bit depths that range from 1 to 16 bits per pixel, file integrity checking, and streaming capability.
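
The file integrity checking mentioned above rests on a checksum stored with every chunk of a PNG file. The sketch below, which assumes only the standard chunk layout (length, type, data, CRC-32 over type plus data), builds a tiny one-pixel grayscale PNG in memory and then verifies its chunks. The helper names `make_chunk` and `chunks_are_intact` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Assemble one chunk: 4-byte length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def chunks_are_intact(png: bytes) -> bool:
    """Walk every chunk and compare the stored CRC with a recomputed one."""
    if not png.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 8 > len(png):
            return False  # truncated chunk header
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        crc_bytes = png[pos + 8 + length:pos + 12 + length]
        if len(data) != length or len(crc_bytes) != 4:
            return False  # truncated chunk body
        (stored_crc,) = struct.unpack(">I", crc_bytes)
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != stored_crc:
            return False
        pos += 12 + length
    return True


# One-pixel, 8-bit grayscale image: width, height, bit depth, color type,
# compression method, filter method, interlace method.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x80")  # filter byte 0 followed by one gray sample
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

damaged = bytearray(png)
damaged[20] ^= 0xFF  # flip bits inside the IHDR data

print(chunks_are_intact(png))             # True
print(chunks_are_intact(bytes(damaged)))  # False
```

A CRC of this kind detects accidental corruption; it is not a substitute for the cryptographic hash digests a repository records for fixity.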

Scalable Vector Graphics (SVG): 2003—W3C Internet Engineering Task Force

Vector graphics images consist of two-dimensional lines, colors, curves, or other geometrical shapes and attributes that are stored as mathematical expressions, such as where a line begins, its shape, where it ends, and its color. Changes in these mathematical expressions will result in changes in the image. Unlike raster images, there is no loss of clarity of a vector graphics image when it is made larger. SVG images and their behavior properties are defined in XML text files, which means any named element in an SVG image can be indexed and searched. SVG images also can be accessed by any text editor, which minimizes dependence on a specific software application to render and edit the images.
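
Because an SVG image is an XML text file, a general-purpose XML parser is enough to search it. The following sketch parses a small, made-up SVG document and looks up a named element by its id attribute; the drawing and the helper `find_by_id` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up SVG drawing with two named elements.
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect id="frame" x="0" y="0" width="100" height="100" fill="white"/>
  <circle id="seal" cx="50" cy="50" r="20" fill="navy"/>
</svg>"""

root = ET.fromstring(svg_text)


def find_by_id(tree_root, element_id):
    """Return the first element whose id attribute matches, or None."""
    for element in tree_root.iter():
        if element.get("id") == element_id:
            return element
    return None


seal = find_by_id(root, "seal")
shape = seal.tag.split("}")[1]  # drop the "{namespace}" prefix
print(shape, seal.get("r"), seal.get("fill"))

# The shape is stored as numbers, so enlarging it is a change of value,
# not a resampling of pixels.
seal.set("r", "40")
```

The same lookup could be run across thousands of files to build an index of named elements, which is the search capability the paragraph above refers to.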

ISO/IEC 15444:2000—Joint Photographic Experts Group (JPEG 2000)

JPEG 2000 is an international standard for compressing full-color and grayscale digital images and rendering them as full-size images and thumbnail images. Unlike JPEG, its predecessor, which supported only lossy compression, JPEG 2000 supports both lossy and lossless compression. Lossy compression means that during compression, bits that are considered technically redundant are permanently deleted. Lossless compression means no bits are lost or deleted. The latter is very important for LTDP because lossy compression is irreversible. JPEG 2000 is widely used in producing digital images in digital cameras and is an optional format in many digital scanners.
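
The difference between the two kinds of compression can be shown with a few lines of Python. This sketch does not use JPEG 2000 itself: zlib stands in for a lossless codec, and a crude reduction to 16 gray levels stands in for a lossy one. The sample values are hypothetical.

```python
import zlib

# Hypothetical 8-bit grayscale samples.
pixels = bytes([12, 13, 13, 14, 200, 201, 199, 200] * 32)

# Lossless: decompression returns exactly the original bits.
lossless_restored = zlib.decompress(zlib.compress(pixels))

# Lossy stand-in: discard the low-order detail of each sample before
# compressing. The discarded detail cannot be recovered afterward.
quantized = bytes((value // 16) * 16 for value in pixels)
lossy_restored = zlib.decompress(zlib.compress(quantized))

print(lossless_restored == pixels)  # True
print(lossy_restored == pixels)     # False
```

Once the lossy version is the only copy held, the original sample values are gone, which is why lossless compression is the safer choice for preservation masters.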

ISO/IEC 13818–3: 2000—Moving Picture Experts Group (MPEG-2)

MPEG-2 is an international broadcast standard for lossy compression of moving images and associated audio. The major competitor for MPEG-2 appears to be Motion JPEG 2000, which is used in small devices, such as cell phones.

European Broadcasting Union Tech 3285: 2011—Broadcast Wave Format (BWF)

First issued by the European Broadcasting Union in 1997 and revised in 2001 (v1) and 2011 (v2), BWF is a file format for audio data that is an extension of the Microsoft Wave audio format. Its support of metadata ensures that it can be used for the seamless exchange of audio material between different broadcast environments and between equipment based on different computer platforms.

ISO 28500: 2009—WebARChive (WARC)

WebARChive (WARC) is an extension of the Internet Archive's ARC format to store digital content harvested through “Web crawls.” WARC was developed to support the storage, management, and exchange of large volumes of “constituent data objects” in a single file. Currently, WARC is used to store and manage digital content collected through Web crawls and data collected by environmental sensing equipment, among others.

Digital Preservation Requirements

Implementing a sustainable LTDP program is not an effort that should be undertaken lightly. Digital preservation is complex and costly and requires collaboration with all of the stakeholders who are accountable for or have an interest in ensuring access to usable, understandable, and trustworthy electronic records for as far into the future as may be required.

As noted earlier, ISO 14721 and ISO 16363 establish the baseline functions and specifications for ensuring access to usable, understandable, and trustworthy electronic records, whether this involves regulatory and legal compliance for a business entity, vital records, accountability for a government unit, or cultural memory for a public or private institution. Most first-time readers who review the functions and specifications of ISO 14721 and ISO 16363 are likely to be overwhelmed by the detail and complexity of almost 150 specifications.

Long-Term Digital Preservation Capability Maturity Model®

A useful approach that both simplifies these specifications and provides explicit criteria regarding conformance to ISO 14721 and ISO 16363 is the Long-Term Digital Preservation Capability Maturity Model® (DPCMM).12 The DPCMM, which is described in some detail in this section, draws on functions and preservation services identified in ISO 14721 (OAIS) as well as attributes specified in ISO 16363, Audit and Certification of Trustworthy Repositories. It is important to note that the DPCMM is not a one-size-fits-all approach to ensuring long-term access to authentic electronic records. Rather, it is a flexible approach that can be adapted to an organization's specific requirements and resources.

DPCMM can be used to identify the current state capabilities of digital preservation that form the basis for debate and dialogue regarding the desired future state of digital preservation capabilities, and the level of risk that the organization is willing to assume. In many instances, this is likely to come down to the question of what constitutes digital preservation that is good enough to fulfill the organization's mission and meet the expectations of its stakeholders. The DPCMM has five incremental stages, which are depicted in Figure 17.3. In Stage 1, a systematic digital preservation program has not been undertaken or the digital preservation program exists only on paper, whereas Stage 5 represents the highest level of sustainable digital preservation capability and repository trustworthiness that an organization can achieve.

The DPCMM is based on the functional specifications of ISO 14721 and ISO 16363 and accepted best practices in operational digital repositories. It is a systems-based tool for charting an evolutionary path from disorganized and undisciplined management of electronic records, or the lack of a systematic electronic records management program, into increasingly mature stages of digital preservation capability.

The goal of the DPCMM is threefold: to identify at a high level where an electronic records management program stands in relation to optimal digital preservation capabilities; to report gaps, capability levels, and preservation performance metrics to resource allocators and other stakeholders; and to establish priorities for achieving enhanced capabilities to preserve and ensure access to long-term electronic records.

Figure 17.3 Five Levels of Digital Preservation Capabilities

Stage 5: Optimal Digital Preservation Capability

Stage 5 is the highest level of digital preservation readiness capability that an organization can achieve. It includes a strategic focus on digital preservation outcomes by continuously improving the manner in which electronic records life cycle management is executed. Stage 5 digital preservation capability also involves benchmarking the digital preservation infrastructure and processes relative to other best-in-class digital preservation programs and conducting proactive monitoring for breakthrough technologies that can enable the program to significantly change and improve its digital preservation performance. In Stage 5, few if any electronic records that merit long-term preservation are at risk.

Stage 4: Advanced Digital Preservation Capability

Stage 4 capability is characterized by an organization with a robust infrastructure and digital preservation processes that are based on ISO 14721 specifications and ISO 16363 audit and certification criteria. At this stage, the preservation of electronic records is framed entirely within a collaborative environment in which there are multiple participating stakeholders. Lessons learned from this collaborative framework serve as the basis for adapting and improving capabilities to identify and proactively bring long-term electronic records under life cycle control and management. Some electronic records that merit long-term preservation still may be at risk.

Stage 3: Intermediate Digital Preservation Capability

Stage 3 describes an environment that embraces the ISO 14721 specifications and other best practice standards and schemas and thereby establishes the foundation for sustaining an enhanced digital preservation capability over time. This foundation includes successfully completing repeatable projects and outcomes that support the enterprise digital preservation capability and enables collaboration, including shared resources, between record-producing units and entities responsible for managing and maintaining trustworthy digital repositories. In this environment, many electronic records that merit long-term preservation are likely to remain at risk.

Stage 2: Minimal Digital Preservation Capability

Stage 2 describes an environment where an ISO 14721–based digital repository is not yet in place. Instead, a surrogate repository for electronic records is available to some records producers that satisfies some but not all of the ISO 14721 specifications. Typically, the digital preservation infrastructure and processes of the surrogate repository are not systematically integrated into business processes or universally available, so the state of digital preservation is somewhat rudimentary and life cycle management of the organization's electronic records is incomplete. There is some understanding of digital preservation issues, but it is limited to relatively few individuals. There may be virtually no relationship between the success or failure of one digital preservation initiative and the success or failure of another one. Success is largely the result of exceptional (perhaps even heroic) actions of an individual or a project team. Knowledge about such success is not widely shared or institutionalized. Most electronic records that merit long-term preservation are at risk.

Stage 1: Nominal Digital Preservation Capability

Stage 1 describes an environment in which the specifications of ISO 14721 and other standards may be known, accepted in principle, or under consideration, but they have not been formally adopted or implemented by the record-producing organization. Generally, there may be some understanding of digital preservation issues and concerns, but this understanding is likely to consist of ad hoc electronic records management and digital preservation infrastructure, processes, and initiatives. Although there may be some isolated instances of individuals attempting to preserve electronic records on a workstation or removable storage media (e.g. DVD or hard drive), practically all electronic records that merit long-term preservation are at risk.

Scope of the Capability Maturity Model

This capability maturity model consists of 15 components, or key process areas, that are necessary and required for the long-term preservation of usable, understandable, accessible, and trustworthy electronic records. Each component is identified and is accompanied by explicit performance metrics for each of the five levels of digital preservation capability.

The objective of the model is to provide a process and performance framework (or benchmark) against best practice standards and foundational principles of digital preservation, records management, information governance, and archival science. Figure 17.4 displays the components of the DPCMM.

Figure 17.4 Digital Preservation Capability Maturity Model

Scope notes for each of the graphic elements in Figure 17.4 are provided next for additional clarity. Numbered components in the model are associated with performance metrics and capability levels described in the next section.

Producers and Users

  • Producers. Records creators and owners are stakeholders who have either the obligation or the option to transfer permanent and long-term (10-plus-years’ retention) electronic records to one or more specified digital repositories for safekeeping and access.
  • Users. Individuals or groups that have an interest in and/or right to access records held in the digital repository. These stakeholders represent a variety of interests and access requirements that may change over time.
  • Digital preservation infrastructure. Seven key organizational process areas required to ensure sustained commitment and adequate resources for the long-term preservation of electronic records are:
    1. Digital preservation policy. The organization charged with ensuring preservation and access to long-term and permanent legal, fiscal, operational, and historical records should issue its digital preservation policy in writing, including the purpose, scope, accountability, and approach to the operational management and sustainability of trustworthy repositories.
    2. Digital preservation strategy. The organization charged with the preservation of long-term and permanent business, government, or historical electronic records must proactively address the risks associated with technology obsolescence, including plans related to periodic renewal of storage devices, storage media, and adoption of preferred preservation file formats.
    3. Governance. The organization has a formal decision-making framework that assigns accountability and authority for the preservation of electronic records with long-term and permanent historical, fiscal, operational, or legal value, and articulates approaches and practices for trustworthy digital repositories sufficient to meet stakeholder needs. Governance is exercised in conjunction with information management and technology functions and with other custodians and digital preservation stakeholders, such as records-producing units and records consumers, and enables compliance with applicable laws, regulations, record retention schedules, and disposition authorities.
    4. Collaboration. Digital preservation is a shared responsibility. The organization with a mandate to preserve long-term and permanent electronic business, government, or historical records in accordance with accepted digital preservation standards and best practices is well served by maintaining and promoting collaboration among its internal and external stakeholders. Interdependencies between and among the operations of records producing units, legal and statutory requirements, IT policies and governance, and historical accountability should be addressed systematically.
    5. Technical expertise. A critical component in a sustainable digital preservation program is access to professional technical expertise that can proactively address business requirements and respond to impacts of evolving technologies. The technical infrastructure and key processes of an ISO 14721/ISO 16363–conforming archival repository require professional expertise in archival storage, digital preservation solutions, and life cycle electronic records management processes and controls. This technical expertise may exist within the organization or be provided by a centralized function or service bureau or by external service providers, and should include an in-depth understanding of critical digital preservation actions and their associated recommended practices.
    6. Open standard technology-neutral formats. A fundamental requisite for a sustainable digital preservation program that ensures long-term access to usable and understandable electronic records is mitigation of obsolescence of file formats. Open standard technology-neutral file formats are developed in an open public setting, issued by a certified standards organization, and have few or no technology dependencies. Current preferred open standard technology-neutral file format examples include:
      • XML and PDF/A for text
      • PDF/A for spreadsheets
      • JPEG 2000 for photographs
      • PDF/A, PNG, and TIFF for scanned images
      • SVG for vector graphics
      • BWF for audio
      • MPEG-4 for video
      • WARC for Web pages

        Over time, new digital preservation tools and solutions will emerge that will require new open standard technology-neutral file formats. Open standard technology-neutral formats are backward compatible so they can support interoperability across technology platforms over an extended period of time.

    7. Designated community. The organization that has responsibility for preservation and access to long-term and permanent legal, operational, fiscal, or historical government records is well served through proactive outreach and engagement with its designated community. There are written procedures and formal agreements with records-producing units that document the content, rights, and conditions under which the digital repository will ingest, preserve, and provide access to electronic records. Written procedures are in place regarding the ingest of electronic records and access to its digital collections. Records producers will submit fully conforming ISO 14721/ISO 16363 SIPs while DIPs are developed and updated in conjunction with its user communities.
  • Trustworthy digital repository. This includes the integrated people, processes, and technologies committed to ensuring the continuous and reliable design, operation, and management of digital repositories entrusted with long-term and permanent electronic records. A trustworthy digital repository may range from a simple system that involves a low-cost file server and software that provide nonintegrated preservation services, to complex systems comprising data centers and server farms, computer hardware and software, and communication networks that interoperate.

    The most complete trustworthy digital repository is based on models and standards that include ISO 14721, ISO 16363, and generally accepted best digital preservation practices. The repository may be managed by the organization that owns the electronic records or may be provided as a service by an external third party. It is likely that many organizations initially will rely on surrogate digital preservation capabilities and services that approximate some but not all of the capabilities and services of a conforming ISO 14721/ISO 16363 trustworthy digital repository.

  • Digital preservation processes and services. Eight key business process areas are needed for continuous monitoring of the external and internal environments in order to plan and take actions to sustain the integrity, security, usability, and accessibility of electronic records stored in trustworthy digital repositories:
    1. Electronic records survey. A trustworthy repository cannot fully execute its mission or engage in realistic digital preservation planning without a projected volume and scope of electronic records that will come into its custody. It is likely that some information already exists in approved retention schedules, but it may require further elaboration as well as periodic updates, especially with regard to preservation ready, near preservation ready, and legacy electronic records held by records-producing units.
    2. Ingest. A digital repository that conforms to ISO 14721/ISO 16363 has the capability to systematically ingest (receive and accept) electronic records from records-producing units in the form of SIPs, move them to a staging area where virus checks and content and format validations are performed, transform electronic records into designated preservation formats as appropriate, extract metadata from SIPs and write it to PDI, create AIPs, and transfer the AIPs to the repository's storage function. This process is considered the minimal work flow for transferring records into a digital repository for long-term preservation and access.
    3. Archival storage. ISO 14721 delineates systematic automated storage services that support receipt and validation of successful transfer of AIPs from ingest, creation of PDI for each AIP that confirms its “fixity”13 during any preservation actions through the generation of hash digests, capture and maintenance of error logs, updates to PDI including transformation of electronic records to new formats, production of DIPs from access, and collection of operational statistics.
    4. Device and media renewal. No known digital device or storage medium is invulnerable to decay and obsolescence. A foundational digital preservation capability is ensuring the readability of the bitstreams underlying the electronic records. ISO 14721/ISO 16363 specify that a trustworthy digital repository's storage devices and storage media should be monitored and renewed (“refreshed”) periodically to ensure that the bitstreams remain readable over time. A projected life expectancy of removable storage media does not necessarily apply in a specific instance of storage media. Hence, it is important that a trustworthy digital repository have a protocol for continuously monitoring removable storage media (e.g. magnetic tape, external tape drive, or other media) to identify any that face imminent catastrophic loss. Ideally, this renewal protocol would execute renewal automatically after review by the repository.
    5. Integrity. A key capability in conforming ISO 14721/ISO 16363 digital repositories is ensuring the integrity of the records in its custody, which involves two related preservation actions. The first action generates a hash digest (a fixed-length value computed from the bitstream, comparable in purpose to a cyclic redundancy check) to address a vulnerability to accidental or intentional alterations to electronic records that can occur during device/media renewal and internal data transfers. The second action involves integrity documentation that supports an unbroken electronic chain of custody captured in the PDI in AIPs.
    6. Security. Contemporary enterprise information systems typically execute a number of shared or common services that may include communication, name services, temporary storage allocation, exception handling, role-based access rights, security, backup and business continuity, and directory services, among others. A conforming ISO 14721/ISO 16363 digital repository is likely to be part of an information system that may routinely provide some or perhaps all of the core security, backup, and business continuity services, including firewalls, role-based access rights, data-transfer-integrity validations, and logs for all preservation activities, including failures and anomalies, to demonstrate an unbroken chain of custody.
    7. Preservation metadata. A digital repository collects and maintains metadata that describes actions associated with custody of long-term and permanent records, including an audit trail that documents preservation actions carried out, why and when they were performed, how they were carried out, and with what results. A current best practice is the use of a PREMIS-based data dictionary to support an electronic chain of custody that documents authenticity over time as preservation actions are executed. Capture of all related metadata, transfer of the metadata to any new formats/systems, and secure storage of metadata are critical. All metadata is stored in the PDI component of conforming AIPs.
    8. Access. Organizations with a mandate to support access to permanent business, government, or historical records are subject to authorized restrictions. A conforming ISO 14721/ISO 16363 digital repository will provide consumers with trustworthy records in “disclosure-free” DIPs redacted to protect privacy, confidentiality, and other rights, where appropriate, and searchable metadata that users can query to identify and retrieve records of interest to them. Production of DIPs is tracked, especially when they involve extractions, to verify their trustworthiness and to identify query trends that are used to update electronic accessibility tools to support these trends.
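
The ingest, integrity, and preservation metadata services described above can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not a conforming implementation: SIPs and AIPs are modeled as plain dictionaries, the field names are hypothetical, SHA-256 is chosen as the hash algorithm, and virus checks and format transformations are omitted.

```python
import hashlib
from datetime import datetime, timezone


def sha256_digest(data: bytes) -> str:
    """Compute the hash digest used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def record_event(aip: dict, event_type: str, outcome: str) -> None:
    """Append a preservation event to the PDI as chain-of-custody evidence."""
    aip["pdi"]["events"].append({
        "type": event_type,
        "outcome": outcome,
        "when": datetime.now(timezone.utc).isoformat(),
    })


def ingest(sip: dict) -> dict:
    """Create a hypothetical AIP from a hypothetical SIP."""
    pdi = {
        "provenance": sip["producer"],
        "fixity": {"algorithm": "SHA-256",
                   "digest": sha256_digest(sip["content"])},
        "events": [],
    }
    aip = {"identifier": sip["identifier"], "content": sip["content"], "pdi": pdi}
    record_event(aip, "ingest", "success")
    return aip


def verify_fixity(aip: dict) -> bool:
    """Recompute the digest, compare it with the PDI, and log the result."""
    matches = sha256_digest(aip["content"]) == aip["pdi"]["fixity"]["digest"]
    record_event(aip, "fixity check", "pass" if matches else "fail")
    return matches


sip = {"identifier": "rec-0001", "producer": "Records Unit A",
       "content": b"Minutes of the board meeting"}
aip = ingest(sip)
intact = verify_fixity(aip)           # content unchanged since ingest

aip["content"] += b" (edited)"        # simulate an alteration in storage
still_intact = verify_fixity(aip)     # digest no longer matches

print(intact, still_intact, len(aip["pdi"]["events"]))
```

In practice the fixity check would run after every device or media renewal and every internal transfer, and each result would be retained in the PDI rather than overwritten.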

Digital Preservation Capability Performance Metrics

Digital preservation performance metrics for each of the five levels of the model have been mapped to each of the 15 numbered components described in the previous section. The performance metrics are explicit empirical indicators that reflect an incremental level of digital preservation capability. The digital preservation capability performance metrics for digital preservation strategy listed in Table 17.2 illustrate the results of this mapping exercise.14

Conducting a gap analysis of its digital preservation capabilities using these performance metrics enables the organization to identify both its current state and desired future state of digital preservation capabilities. In all likelihood, this desired future state will depend on available resources, the organization's mission, and stakeholder expectations. “Good-enough” digital preservation capabilities will vary by organization; what is good enough for one organization is unlikely to coincide with what is good enough for another.

Table 17.2 Digital Preservation Performance Metrics

Level | Capability Description
----- | ----------------------
0 | A formal strategy to address technology obsolescence does not exist.
1 | A strategy to mitigate technology obsolescence consists of accepting electronic records in their native format with the expectation that new software will become available to support these formats. During this interim period, viewer technologies will be relied on to render usable and understandable electronic records.
2 | Electronic records in interoperable “preservation-ready”* file formats and transformation of one native file format to an open standard technology-neutral file format are supported. Changes in information technologies that may impact electronic records collections and the digital repository are monitored proactively and systematically.
3 | The organization supports transformation of selected native file formats to preferred/supported preservation file formats in the trustworthy digital repository. Records-producing units are advised to use preservation-ready file formats for permanent or indefinite long-term (e.g., case files, infrastructure files) electronic records in their custody.
4 | Electronic records in all native formats are transformed to available open standard technology-neutral file formats.

* The term “preservation-ready file formats” refers to open standard technology-neutral formats that the organization has identified as preferred for long-term digital preservation.
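The gap analysis described above can be sketched in a few lines of Python. This is only an illustration, not part of the model itself: it assumes each component has been scored on the 0 to 4 scale used in Table 17.2, and the function name, class name, and sample scores are hypothetical. Only the component names "Digital preservation strategy" and "Access" come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_LEVEL, MAX_LEVEL = 0, 4  # capability levels as listed in Table 17.2


@dataclass(frozen=True)
class ComponentGap:
    """Current and desired capability level for one component (hypothetical structure)."""
    component: str
    current: int
    target: int

    @property
    def gap(self) -> int:
        # A component already at or above its target has no gap.
        return max(self.target - self.current, 0)


def gap_analysis(current: dict[str, int], target: dict[str, int]) -> list[ComponentGap]:
    """Compare current and desired levels; return components with the largest gap first."""
    gaps = []
    for component, target_level in target.items():
        # Assumption: a component that has not been assessed is treated as level 0.
        current_level = current.get(component, MIN_LEVEL)
        for level in (current_level, target_level):
            if not MIN_LEVEL <= level <= MAX_LEVEL:
                raise ValueError(f"{component}: level {level} is outside 0-4")
        gaps.append(ComponentGap(component, current_level, target_level))
    # Sort by descending gap, then by name so the order is stable.
    return sorted(gaps, key=lambda g: (-g.gap, g.component))


if __name__ == "__main__":
    # Invented scores, for illustration only.
    current_state = {"Digital preservation strategy": 1, "Access": 2}
    desired_state = {"Digital preservation strategy": 3, "Access": 2}
    for item in gap_analysis(current_state, desired_state):
        print(f"{item.component}: level {item.current} -> {item.target} (gap {item.gap})")
```

The desired level is supplied per component rather than fixed at the maximum because, as noted above, a "good-enough" future state depends on each organization's resources, mission, and stakeholder expectations.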

Digital Preservation Strategies and Techniques

Any organization with long-term or permanent electronic records in its custody must ensure that the electronic records can be read and correctly interpreted by a computer application, rendered in an understandable form to humans, and trusted as accurate representations of their logical and physical structure, substantive content, and context. To achieve these goals, a digital repository should operate under the mandate of a digital preservation strategy that addresses 10 digital preservation processes and activities:

  1. Adopt preferred open standard technology-neutral formats. Earlier, nine open standard technology-neutral file formats that covered text, images, photographs, vector graphics, moving images, audio, and Web pages were discussed. Adoption of these file formats means that the digital repository will support their use in its internal digital preservation activities and notify the producers of records of the preferred formats for preservation-ready electronic records to be transferred to the repository's custody.
  2. Acquire electronic records in preservation-ready formats. Many born-digital electronic records, along with scanned images, will likely be created or captured in a preservation-ready format. Acquisition or ingest of electronic records already in preservation-ready formats can significantly reduce the workload of the repository because it will not be necessary to transform records to open standard technology-neutral formats.
  3. Acquire and transform electronic records in near-preservation-ready formats. Near-preservation-ready formats are native proprietary file formats that can be easily transformed to preservation-ready file formats through widely available software plug-ins. Ideally, over time, the volume of near-preservation-ready records will diminish as records producers increasingly convert records scheduled for long-term retention into preservation-ready formats before they are transferred to the repository.
  4. Acquire legacy electronic records. Legacy electronic records initially were created in a proprietary file format that is obsolete and no longer supported by a vendor. In most instances, electronic records embedded in legacy file formats can be recovered and saved in a preservation-ready format only if special computer code is written to extract the records from their legacy format. Once extracted from the legacy format, they can be written to a contemporary format. Niche vendors provide this kind of service, but it is relatively expensive and perhaps beyond the resources of many repositories.

    An alternative is to forgo this costly process in the hope that a future technology, such as emulation, will be widely available and relatively inexpensive. Meanwhile, the repository would rely on a file viewer technology, such as Inside Out, to render legacy electronic records into a format understandable to humans with the exact logical and physical structure and representation at the time they were created and used.

  5. Maintain bitstream readability through device/media renewal. No known digital storage device or media is exempt from degradation and technology obsolescence. Consequently, the bitstreams of 1s and 0s that underlie electronic records are stored on media that are vulnerable to degradation and technology obsolescence. Technology obsolescence may occur when a vendor introduces a new form factor for storage device/media, such as the transition from 5.25-inch disk drives and disks to 3.5-inch disk drives and media to thumb drives. With today's technology, periodic device/media renewal is the only known way to keep bitstreams available. A rule of thumb is to renew storage device/media at least every 10 years. Failure to maintain the readability of bitstreams over time is an absolute guarantee the electronic records cannot be recovered and that the records will be permanently lost for all practical purposes.
  6. Migrate to new open standard technology-neutral formats. These formats are not immune to technology obsolescence. The inevitable changes in IT mean that new open standard technology-neutral formats will be created that displace current ones. The solution to this issue is migration from an older or current open standard technology-neutral format to newer ones. Seamless migration from old to new open standard technology-neutral formats is made possible through backward compatibility. “Backward compatibility” means that a new standard can interpret digital content in an old standard and then save it in the new format standard. Migration is the most widely used tool to mitigate file format obsolescence.
  7. Protect the integrity and security of electronic records. Imperfect information technologies inevitably have glitches that, along with accidental human error and intentional human actions, can corrupt or otherwise compromise the trustworthiness of electronic records through some alteration in the underlying bitstream. Accidental alteration can occur when preservation actions are initiated for electronic records. These actions may occur during transformation, migration, media renewal, accessions to digital records, and relocation of electronic records from one part of the repository to another. The most effective tool for validating that no unauthorized changes to electronic records occur is to compute a hash digest before a preservation action occurs and after the action is completed. If there is a change of only one bit, a comparison of the two will identify it. Capturing these pre- and posthash digests and saving them as preservation description information can contribute to an electronic chain of custody.

    A robust firewall that blocks unauthorized access with tightly controlled role-based permission rights will help protect the security of records in the custody of the repository.

    A further enhancement to protect against a cataclysmic natural or man-made disaster is maintaining a backup copy of the repository's holdings at an off-site facility.

  8. Capture and save preservation metadata. Preservation metadata, which consists of tracking, capturing, and maintaining documentation of all preservation actions associated with electronic records, involves identifying these events, the agents that executed the actions, and the results of the actions, including any corrective action taken. Saving this metadata along with the hash digest integrity validations just discussed enables robust electronic chain of custody and establishes a strong basis for the trustworthiness of electronic records in the custody of the digital repository.
  9. Provide access. Access to usable and trustworthy records is the ultimate justification for digital preservation. In some respects, this may be the most challenging aspect of digital preservation because user expectations for customized retrieval tools, access speed, and delivery formats of electronic records may exceed the current resources of a trusted digital repository. Nonetheless, some form of user access, through replication of records in a single open standard technology-neutral format such as PDF/A for text and scanned images and JPEG 2000 for digital photographs, would be a major accomplishment and form the basis for a more aggressive access program over time.
  10. Engage proactively with records producers and other stakeholders. The traditional notion of an archive being in a reactive mode with regard to records producers and other stakeholders in LTDP simply will not work in today's world. Proactive engagement with records producers about how capturing electronic records in open standard technology-neutral formats can support both current business operation requirements and long-term requirements for usable, understandable, and trustworthy records can be a win-win for the digital repository and the records producers. Equally important is the notion of proactive engagement with all of the stakeholders in ensuring long-term access to usable, understandable, and trustworthy electronic records. Support of other stakeholders can be leveraged to gain broad organizational support for the digital repository.
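The hash digest comparison in step 7 and the event documentation in step 8 can be illustrated together. The following Python sketch is a minimal example under stated assumptions, not a prescribed implementation: it covers a relocation, where the bitstream is expected to be identical before and after the action. The function names and the fields of the event record are hypothetical; they loosely follow the idea of recording the event, the agent, and the outcome but do not implement any particular metadata schema. SHA-256 is chosen here only as one widely available digest algorithm.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def relocate_with_fixity(source: Path, destination: Path, agent: str) -> dict:
    """Copy a record to a new location and document the action.

    The digest is computed before and after the copy; a difference of even
    one bit produces a different digest and is reported as a failure.
    """
    digest_before = sha256_digest(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    digest_after = sha256_digest(destination)

    # Hypothetical event record: what happened, who did it, and the result.
    return {
        "event_type": "relocation",
        "event_time": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "source": str(source),
        "destination": str(destination),
        "digest_algorithm": "SHA-256",
        "digest_before": digest_before,
        "digest_after": digest_after,
        "outcome": "success" if digest_before == digest_after else "failure",
    }
```

Saving each returned record alongside the electronic record it describes would give the repository a running log of preservation actions that can support an electronic chain of custody. For a transformation or migration the bitstream changes by design, so the two digests would instead document the state of the record before and after the action rather than being expected to match.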

Evolving Marketplace

The design and implementation of a digital repository that operates under this digital preservation strategy can be carried out in several different ways. One way is to use internal expertise to build a stand-alone repository that conforms to these digital preservation strategy requirements. Typically, an internally built repository is costly, takes considerable time to implement, and may not meet all expectations because of technical inexperience. An alternative is to use the services and/or solutions offered by an external institution or supplier. A third-party solution is offered by Archivematica, a Vancouver, British Columbia, company that specializes in the use of open-source software and conformance to the specifications of ISO 14721. “Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects.”15 Another company, Preservica,16 has an ISO 14721–conforming digital preservation SaaS and on-premise solution that has been implemented in national and pan-national archives as well as 19 US state archives. It is likely that other repository solutions and preservation services will emerge over the next few years as demand for digital archiving increases. The digital preservation strategy discussed earlier can be used to assess the capabilities of these solutions. Spain-based Libnova also offers a cloud-based digital preservation solution, especially for handling large collections at national libraries or archives.

In November 2017 The National Cultural AudioVisual Archives (NCAA), hosted by the Indira Gandhi National Centre for the Arts Audio/Visual Repository, became the first digital repository in the world to be awarded ISO 16363 certification. The audit was conducted by PTAB, the Primary Trustworthy Digital Repository Authorisation Body. Its members are the group of global digital preservation experts who developed the ISO standards 14721, 16363, and 16919. PTAB was the first organization to be accredited to perform repository audits. A number of public sector repositories have announced plans to undergo audits and seek certification. It remains to be seen whether commercial enterprises and/or those who fund repositories for valuable digitally encoded information will seek certification to ensure repositories are worthy of trust and sustained funding.

Looking Forward

It wasn't long ago—5 to 10 years—that LTDP required a relatively expensive and complicated set of internal processes to store digital records needed for 10 years or more. Migrating digital images from older, proprietary file formats and maintaining records in industry-standard, technology-neutral file formats while ensuring readability presented major challenges.

But today, there are new outsourced options that make digital preservation much easier and more cost effective for organizations needing to preserve digital documents. The approach that digital preservation suppliers take is to manage the entire digital conversion process (from paper or microfilm to digital) and to store five to six copies of each image with a major cloud supplier like Microsoft Azure or Amazon AWS on servers dispersed geographically around the world. Some approaches use more than one cloud supplier to reduce the risk of loss even further.

Error-detecting software uses checksum algorithms to scan digital records periodically for any degradation or loss of bits due to hardware failures, hacking attacks, or other anomalies. Then the damaged copy is either restored or replaced, ensuring that five to six viable copies are still available in various parts of the world.
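The periodic scan-and-repair cycle can be sketched as follows. This is a simplified illustration under several assumptions: replicas are modeled as in-memory byte strings keyed by hypothetical location names, a trusted digest recorded at ingest is available, and SHA-256 stands in for whatever checksum algorithm a supplier actually uses. A real service would read from and write to cloud storage instead.

```python
from __future__ import annotations

import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a replica's content."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict[str, bytes], expected_digest: str) -> list[str]:
    """Check every replica against the expected digest and replace damaged copies.

    Returns the names of the replicas that were replaced. Raises RuntimeError
    if no intact copy remains to repair from.
    """
    # Find one copy that still matches the digest recorded at ingest.
    intact = next(
        (data for data in replicas.values() if digest(data) == expected_digest),
        None,
    )
    if intact is None:
        raise RuntimeError("no intact replica available for repair")

    repaired = []
    for name, data in list(replicas.items()):
        if digest(data) != expected_digest:
            replicas[name] = intact  # overwrite the damaged copy with a good one
            repaired.append(name)
    return repaired
```

Run on a schedule, a check of this kind keeps the number of viable copies constant: a copy that fails verification is overwritten from one that passes, which is why keeping several geographically dispersed copies reduces the risk of permanent loss.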

This newer cloud-based approach has made digital preservation more accessible and viable for major organizations with the need to preserve digital information far into the future, especially movie studios, national libraries, universities, research organizations, and government entities.

Conclusion

Organizations, especially those whose primary mission is to preserve and provide access to permanent records, face significant challenges in meeting their LTDP needs. They must collaborate with internal and external stakeholders, develop governance policies and strategies to govern and control information assets over long periods of time, inventory records in the custody of records producers, monitor technology changes and evolving standards, and sustain trustworthy digital repositories. The most important consideration is to determine what level of LTDP maturity is appropriate, achievable, and affordable for the organization and to begin working methodically toward that goal for the good of the organization and its stakeholders over the long term. In addition, organizations should focus on what is doable over the next 10 to 20 years rather than the next 50 or 100 years.

Notes

  1. Consultative Committee for Space Data Systems, Reference Model for an Open Archival Information System (OAIS) (Washington, DC: CCSDS Secretariat, 2002), p. 1.
  2. Kate Cumming, “Metadata Matters,” in Julie McLeod and Catherine Hare, eds., Managing Electronic Records (London: Facet, 2005), 48.
  3. David Rosenthal et al., “Requirements for Digital Preservation Systems,” D-Lib Magazine 11, no. 11 (November 2005), www.dlib.org/dlib/november05/rosenthal/11rosenthal.html.
  4. “ISO 14721:2003, 2012, Space Data and Information Transfer Systems—Open Archival Information System—Reference Model,” www.iso.org/iso/catalogue_detail.htm?csnumber=24683 (accessed May 21, 2012).
  5. ISO 14721:2003(E), section 4.1.
  6. ISO 14721:2003(E), section 5.4.
  7. See ISO 16363:2012(E), sections 3.1–3.5.2.
  8. See ibid., sections 4.1–4.6.2.1.
  9. See ibid., sections 5.1–5.2.3.
  10. For a useful overview of PREMIS, see Priscilla Caplan, “Understanding PREMIS,” Library of Congress, February 1, 2009, www.loc.gov/standards/premis/understanding-premis.pdf.
  11. Library of Congress, “PREMIS Data Dictionary Version 2.2: Hierarchical Listing of Semantic Units,” September 13, 2012, www.loc.gov/standards/premis/v2/premis-dd-Hierarchical-Listing-2-2.html.
  12. Charles Dollar and Lori Ashley are codevelopers of this model. Since 2007 they have used it successfully in both the public and private sectors. The most recent instance is a digital preservation capability assessment for the U.S. Council of State Archivists (CoSA). For more information about the model, see “Digital Preservation Capability Maturity Model” at www.savingthedigitalworld.com (accessed December 12, 2013).
  13. ISO 14721 uses “fixity” to express the notion that there have been no unauthorized changes to electronic records and associated Preservation Description Information in the custody of the repository. See ISO 14721:2003(E), section 1.6.
  14. For information about digital preservation capability performance metrics, visit “Digital Preservation Capability Maturity Model,” https://www.statearchivists.org/resource-center/resource-library/digital-preservation-capability-maturity-model-dpcmm/
  15. Archivematica, “What Is Archivematica?” October 15, 2012, www.archivematica.org/wiki/Main_Page.
  16. Preservica, www.Preservica.com (accessed June 10, 2019).

  * Portions of this chapter are adapted from Chapter 17, Robert F. Smallwood, Managing Electronic Records: Methods, Best Practices, and Technologies, © John Wiley & Sons, Inc., 2013. Reproduced with permission of John Wiley & Sons, Inc.