Every organization—public, private, or not-for-profit—now has electronic records and digital content that it wants to access and retain for periods in excess of 10 years. This may be due to regulatory or legal reasons, a desire to preserve organizational memory and history, or entirely for operational reasons. But long-term continuity of digital information does not happen by accident—it takes information governance (IG), planning, sustainable resources, and a keen awareness of the information technology (IT) and file formats in use by the organization, as well as evolving standards and computing trends.
Information is universally recognized as a key asset that is essential to organizational success. Digital information, which relies on complex computing platforms and networks, is created, received, and used daily to deliver services to citizens, consumers and customers, businesses, and government agencies. Organizations face tremendous challenges in the twenty-first century to manage, preserve, and provide access to electronic records for as long as they are needed.
Digital preservation is defined as long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required to be retained. Digital preservation applies to content that is born digital as well as content that is converted to digital form.
Some digital information assets must be preserved permanently as part of an organization's documentary heritage. Dedicated repositories for historical and cultural memory, such as libraries, archives, and museums, need to move forward to put in place trustworthy digital repositories that can match the security, environmental controls, and wealth of descriptive metadata that these institutions have created for analog assets (such as books and paper records). Digital challenges associated with records management affect all sectors of society—academic, government, private, and not-for-profit enterprises—and ultimately all citizens of all developed nations.
The term “preservation” implies permanence, but electronic records, data, and information retained for even 5 to 10 years are likely to face challenges from storage media failure and computer hardware/software obsolescence. A useful point of reference for the definition of “long term” comes from the International Organization for Standardization (ISO) standard 14721, which defines long-term as “long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely.”1
Long-term records are common in many different sectors, including government, health care, energy, utilities, engineering and architecture, construction, and manufacturing. During the course of routine business, thousands or millions of electronic records are generated in a wide variety of information systems. Most records are useful for only a short period of time (up to seven years), but some may need to be retained for long periods or permanently. For those records, organizations must plan for and allocate resources for preservation efforts to ensure that the data remains accessible, usable, understandable, and trustworthy over time.
In addition, there may be the requirement to retain the metadata associated with records even longer than the records themselves.2 A record may have been destroyed according to its scheduled disposition at the end of its life cycle, but the organization still may need its metadata to identify the record, its life cycle dates, and the authority or person who authorized its destruction.
Some electronic records must be preserved, protected, and monitored over long periods of time to ensure they remain authentic, complete, and unaltered and available into the future. Planning for the proper care of these records is a component of an overall records management program and should be integrated into the organization's information governance (IG) policies and technology portfolio as well as its privacy and security protocols.
Enterprise strategies for sustainable and trustworthy digital preservation repositories have to take into account several prevailing and compound conditions: the complexity of electronic records, decentralization of the computing environment, obsolescence and aging of storage media, massive volumes of electronic records, and software and hardware dependencies.
The challenges of managing electronic records significantly increased with the trend of decentralization of the computing environment. In the centralized environment of a mainframe computer, prevalent from the 1960s to 1980s but also in use today, it is relatively easy to identify, assess, and manage electronic records. This is not the case in the decentralized environment of specialized business applications and office automation systems, where each user creates electronic objects that may constitute a formal record and thus will have to be preserved under IG policies that address record retention and disposition rules, processes, and accountability.
Electronic records have evolved from simple text-based word processing files or reports to include complex mixed media digital objects that may contain embedded images (still and animated), drawings, sounds, hyperlinks, or spreadsheets with computational formulas. Some portions of electronic records, such as the content of dynamic Web pages, are created on demand from databases and exist only for the duration of the viewing session. Other digital objects, such as electronic mail, may contain multiple attachments, and they may be threaded (i.e. related e-mail messages linked in send-reply chains). These records cannot be converted to paper or text formats for preservation without the loss of context, functionality, and metadata.
Electronic records are being created at rates that pose significant threats to our ability to organize, control, and make them accessible for as long as they are needed. This accumulating volume of digital content includes documents that are digitally scanned or imaged from a variety of formats to be stored as electronic records.
Electronic records are stored as representations of bits—1s and 0s—and therefore depend on software applications and computing hardware for the entire period of retention, whether it is 3 days, 3 years, or 30 years or longer. As information technologies become obsolete and are replaced by new generations, the capability of a specific software application to read the representations of 1s and 0s and render them into human-understandable form degrades until the records are neither readable nor understandable. As a practical matter, this means the readability and understandability of the records may never be recovered, and there can be serious legal consequences.
Storage media are affected by the dual problems of obsolescence and decay. They are fragile, have limited shelf life, and become obsolete in a matter of a few years. Mitigating media obsolescence is critical to long-term digital preservation (LTDP) because the bitstreams of 1s and 0s that comprise electronic records must be kept “alive” through periodic transfer to new storage media.
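This periodic transfer can be sketched in a few lines of Python (the function names are illustrative): the bitstream is copied to new media, and a SHA-256 fixity digest confirms that not a single bit changed in transit.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 fixity digest of a file's bitstream."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def refresh(source: Path, target: Path) -> str:
    """Transfer a bitstream to new storage media and confirm that it
    arrived bit-for-bit identical (a media 'refresh')."""
    expected = sha256_of(source)
    shutil.copy2(source, target)  # copies content and timestamps
    actual = sha256_of(target)
    if actual != expected:
        raise IOError(f"fixity check failed for {target}")
    return actual
```

In a production repository the digest would also be recorded as preservation metadata so that later audits can detect silent corruption on the new media.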
In addition to these current conditions associated with technology and records management, organizations face tremendous internal change management challenges with regard to reallocation of resources, business process improvements, collaboration and coordination between business areas, accountability, and the dynamic integration of evolving recordkeeping requirements. Building and sustaining the capability to manage digital information over long periods of time is a shared responsibility of all stakeholders.
A number of known threats may degrade or destroy electronic records and data:
The impact on the preserved records can be gauged by determining what percentage of the data has been lost and cannot be recovered or, for the data that can be recovered, what the impact or delay to users may be.
It should be noted that threats can be interrelated and more than one type of threat may impact records at a time. For instance, in the event of a natural disaster, operators are more likely to make mistakes, and computer hardware failures can create new software failures.
The digital preservation community recognizes that open, technology-neutral standards play a key role in ensuring that digital records are usable, understandable, and reliable for as far into the future as may be required.
There are two broad categories of digital preservation standards. The first category involves systems infrastructure capabilities and services that support a trustworthy repository. The second category relates to open standard technology-neutral file formats.
Digital preservation infrastructure capabilities and services that support trustworthy digital repositories are addressed by the international standard ISO 14721 (first published in 2003 and revised in 2012), Space Data and Information Transfer Systems—Open Archival Information System (OAIS)—Reference Model, a key standard for LTDP.4
The fragility of digital storage media in concert with ongoing and sometimes rapid changes in computer software and hardware poses a fundamental challenge to ensuring access to trustworthy and reliable digital content over time. Eventually, every digital repository committed to LTDP must have a strategy to mitigate computer technology obsolescence. Toward this end, the Consultative Committee for Space Data Systems developed an Open Archival Information System (OAIS) reference model to support formal standards for the long-term preservation of space science data and information assets. OAIS was not designed as an implementation model.
The OAIS Reference Model defines an archival information system as an archive, consisting of an organization of people and systems that has accepted the responsibility to preserve information and make it available and understandable for a designated community (i.e. the potential users or consumers of the information). Thus, the context of an OAIS-compliant digital repository includes producers who originate the information to be preserved in the repository, consumers who retrieve the information, and a management/organization that hosts and administers the digital assets being preserved.
OAIS encapsulates digital objects into information packages. Each information package includes the digital object content (a sequence of bits) and representation information that enables rendering of an object into human usable information along with preservation description information (PDI) such as provenance, context, and fixity.
The OAIS Information Model employs three types of information packages: a submission information package (SIP), an archival information package (AIP), and a dissemination information package (DIP). An OAIS-compliant digital repository preserves AIPs and any PDI associated with them. A SIP encompasses digital content that a producer has organized for submission to the OAIS. After the completion of quality assurance and transformation procedures, an AIP is created, which is the focus of preservation activity. Subsequently, a DIP is created that consists of an AIP or information extracted from an AIP customized to the requirements of the designated community of users and consumers.
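Under greatly simplified assumptions about package contents, the SIP-to-AIP-to-DIP flow can be sketched in Python; the class and field names below are illustrative, not part of the OAIS standard itself.

```python
from dataclasses import dataclass, field
import hashlib

@dataclass
class InformationPackage:
    """An OAIS-style information package: the content bits plus the
    information needed to render and trust them."""
    content: bytes                           # the digital object (a sequence of bits)
    representation_info: str                 # e.g. "text/plain; UTF-8"
    pdi: dict = field(default_factory=dict)  # provenance, context, fixity, ...

def ingest(sip: InformationPackage, producer: str) -> InformationPackage:
    """Turn a submitted SIP into the AIP that becomes the focus of
    preservation, recording provenance and a fixity digest in the PDI."""
    aip = InformationPackage(sip.content, sip.representation_info, dict(sip.pdi))
    aip.pdi["provenance"] = f"submitted by {producer}"
    aip.pdi["fixity-sha256"] = hashlib.sha256(sip.content).hexdigest()
    return aip

def disseminate(aip: InformationPackage) -> InformationPackage:
    """Derive a DIP tailored to the designated community; here it simply
    carries the content and a pointer back to the AIP's fixity value."""
    return InformationPackage(aip.content, aip.representation_info,
                              {"source-fixity": aip.pdi["fixity-sha256"]})
```

A real repository would add quality-assurance checks at ingest and customize the DIP to consumer requirements; the sketch shows only the division of roles between the three package types.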
The core of OAIS is a functional model that consists of six entities:
Figure 17.1 displays the relationships between these six functional entities.5
In archival storage, the OAIS reference model articulates a migration strategy based on four primary types of AIP migration that are ordered by an increasing risk of potential information loss: refreshment, replication, repackage, and transformation.6
OAIS is the lingua franca of digital preservation. The international digital preservation community has embraced it as the framework for viable and technologically sustainable digital preservation repositories. An LTDP strategy that is OAIS-conforming offers the best means available today for preserving the digital heritage of all organizations, private and public.
ISO 18492 provides practical methodological guidance for the long-term preservation and retrieval of authentic electronic document-based information, when the retention period exceeds the expected life of the technology (hardware and software) used to create and maintain the information assets. It emphasizes both the role of open standard technology–neutral standards in supporting long-term access and the engagement of IT specialists, document managers, records managers, and archivists in a collaborative environment to promote and sustain a viable digital preservation program.
ISO 18492 takes note of the role of ISO 15489 but does not cover processes for the capture, classification, and disposition of authentic electronic document-based information. Ensuring the usability and trustworthiness of electronic document-based information for as long as necessary in the face of limited media durability and technology obsolescence requires a robust and comprehensive digital preservation strategy. ISO 18492 describes such a strategy, which includes media renewal, software dependence, migration, open standard technology-neutral formats, authenticity protection, and security:
ISO 14721 (OAIS) acknowledged that an audit and certification standard was needed that incorporated the functional specifications for records producers, records users, ingest of digital content into a trusted repository, archival storage of this content, and digital preservation planning and administration. ISO 16363 is this audit and certification standard. Its use enables independent audits and certification of trustworthy digital repositories and thereby promotes public trust in digital repositories that claim to be trustworthy. To date only a handful of ISO 16363 test audits have been undertaken; additional time is required to determine how widely the standard becomes adopted.
ISO 16363 is organized into three broad categories: organization infrastructure, digital object management, and technical infrastructure and security risk management. Each category is decomposed into a series of primary elements or components, some of which may be more appropriate for digital libraries than for public records digital repositories. In some instances there are secondary elements or components. An explanatory discussion of each element accompanies “empirical metrics” relevant to that element. The empirical metrics typically include high-level examples of how conformance can be demonstrated. Hence, they are subjective high-level conformance metrics rather than explicit performance metrics.
Organizational infrastructure7 consists of these primary elements:
Digital object management,8 which is the core of the standard, comprises these primary elements:
Technical infrastructure and security risk management primary elements9 include these elements:
ISO 16363 represents the gold standard of audit and certification for trustworthy digital repositories. In some instances the resources available to a trusted repository may not support full implementation of the audit and certification specifications. Decisions about where full or partial implementation is appropriate should be based on a risk assessment.
ISO 14721 specifies that preservation metadata associated with all archival storage activities (e.g. generation of hash digests, transformation, and media renewal) should be captured and stored in PDI. This high-level guidance requirement demands greater specificity in an operational environment.
Toward this end, the US Library of Congress and the Research Library Group supported a new international working group called PREservation Metadata Information Strategies (PREMIS)10 to define a core set of preservation metadata elements with a supporting data dictionary that would be applicable to a broad range of digital preservation activities and to identify and evaluate alternative strategies for encoding, managing, and exchanging preservation metadata. Version 2.2 was released in June 2012.11
PREMIS enables designers and managers of digital repositories to have a clear understanding of the information required to support the “functions of viability, renderability, understandability, authenticity, and identity in a preservation context.” PREMIS accomplishes this through a data model that consists of five “semantic units” (think of them as high-level metadata elements, each of which is decomposed into subelements) and a data dictionary that decomposes these “semantic units” into a structure hierarchy. The five semantic units and their relationships are displayed in Figure 17.2.
Note the arrows that define relationships between these entities:
The PREMIS Data Dictionary decomposes objects, events, agents, and rights into a structured hierarchical schema. In addition, it contains semantic units that support documentation of relationships between objects. An important feature of PREMIS is an XML schema for the PREMIS Data Dictionary. The primary rationale for the XML schema is to support the exchange of metadata, which is crucial in ingest and archival storage. The XML schema enables automated extraction of preservation-related metadata from SIPs and population of this metadata into AIPs. In addition, it can enable automatic capture of preservation events that are foundational for maintaining a chain of custody in archival storage.
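As a rough illustration of the kind of preservation metadata such a schema carries, the following Python sketch emits a simplified PREMIS-style Object description with a fixity digest. The element names are illustrative stand-ins, not the official PREMIS schema.

```python
import hashlib
import xml.etree.ElementTree as ET

def premis_object_xml(object_id: str, content: bytes) -> str:
    """Emit a simplified PREMIS-style Object description in XML.
    Element names are illustrative, not the official PREMIS schema."""
    obj = ET.Element("object")
    ET.SubElement(obj, "objectIdentifier").text = object_id
    fixity = ET.SubElement(obj, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha256(content).hexdigest()
    ET.SubElement(obj, "size").text = str(len(content))
    return ET.tostring(obj, encoding="unicode")
```

Because the description is plain XML, an ingest workflow can generate it automatically for each object in a SIP and carry it forward into the AIP, which is exactly the exchange role the PREMIS XML schema is designed to play.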
A digital file format specifies the internal logical structure of digital objects (i.e. binary bits of 1s and 0s) and signal encoding (e.g. text, image, sound). File formats are crucial to long-term preservation because a computer can open, process, and render only the file formats that it recognizes. Many file formats are proprietary (also known as native), meaning that digital content can be opened and rendered only by the software application used to create, use, and store it. However, as IT changes, some software vendors introduce new products that no longer support earlier versions of a file format. In such instances these formats become “legacy” formats, and digital content embedded in them can be opened only with computer code written expressly for that purpose. Other vendors, such as Microsoft, support backward compatibility across multiple generations of technology, so Microsoft Word 2010 can open and render documents created in Microsoft Word 95. Nonetheless, it is unrealistic to expect any software vendor to support backward compatibility for its proprietary file formats for digital content that will be preserved for multiple decades.
In the late 1980s, an alternative to vendor-supported backward compatibility emerged to mitigate dependence on proprietary file formats through open system interoperable file formats. Essentially, this meant that digital content could be exported from one proprietary file format and imported to one or more other proprietary file formats. Over time, interoperable file formats evolved into open standard technology-neutral formats that today have these characteristics:
Because even open standard technology-neutral formats are not immune to technology obsolescence, their selection must take into account their technical sustainability and implementation in digital repositories. The PRONOM program of the National Archives of the United Kingdom and the Sustainability of Digital Formats initiative of the US Library of Congress assess the sustainability of open standard technology-neutral formats.
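Format identification underpins this kind of sustainability assessment. A minimal sketch of signature-based identification, the technique that PRONOM-based tools such as DROID apply at much larger scale, can be built from a handful of well-known “magic numbers”:

```python
# A few well-known file signatures ("magic numbers"). PRONOM maintains
# an authoritative registry of such signatures for thousands of formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF": "PDF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
}

def identify(leading_bytes: bytes) -> str:
    """Match a file's leading bytes against known format signatures."""
    for signature, name in SIGNATURES.items():
        if leading_bytes.startswith(signature):
            return name
    return "unknown"
```

Knowing precisely which format (and version) each object is in is the prerequisite for deciding whether it needs migration to a more sustainable format.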
The recommended open standard technology-neutral formats for nine content types listed in Table 17.1 are based on this ongoing work along with preferred file formats supported by Library and Archives Canada and other national archives. Unlike PDF/A, several of these file formats (e.g. XML, JPEG 2000, and Scalable Vector Graphics [SVG]) were not explicitly designed for digital preservation. It cannot be emphasized too strongly that this list of recommended open standard technology–neutral formats (or any other comparable list) is not static and will change over time as technology changes.
Table 17.1 Recommended Open Standard Technology-Neutral Formats
Content Type | PDF/A | XML | TIFF | PNG | JPEG 2000 | SVG | MPEG-2 | BWF | WARC
Text | √ | √ |  |  |  |  |  |  |
Spreadsheets | √ |  |  |  |  |  |  |  |
Images (raster) | √ |  | √ | √ |  |  |  |  |
Photographs (digital) |  |  |  |  | √ |  |  |  |
Vector graphics |  |  |  |  |  | √ |  |  |
Moving images |  |  |  |  |  |  | √ |  |
Audio |  |  |  |  |  |  |  | √ |
Web |  |  |  |  |  |  |  |  | √
Databases |  | √ |  |  |  |  |  |  |
PDF/A is an open standard technology-neutral format that enables the accurate representation of the visual appearance of digital content without regard for the proprietary format or application in which it was created or used. PDF/A is widely used in digital repositories as a preservation format for static textual and image content. Note that PDF/A is agnostic with regard to digital imaging processes and storage media, and it supports conversion of TIFF and PNG images to PDF/A. There are two levels of conformance to PDF/A specifications. PDF/A-1a requires a “well-formed” hierarchical structure with XML tags that enable searching for a specific tag in a very large digital document. PDF/A-1b does not require this structure, and as a practical matter, its absence does not affect the accurate representation of visual appearance.
Since its publication in 2005, there have been two revisions of PDF/A. The first revision, PDF/A-2 (published in 2011), was aligned with the published specifications of Adobe Portable Document Format 1.7, which was standardized as ISO 32000-1. The second revision, PDF/A-3, supports embedding documents in other formats, such as the original source document, within a PDF/A document.
XML is a markup language derived from the Standard Generalized Markup Language (SGML) that logically separates the rendering of a digital document from its content to enable interoperability across multiple technology platforms. Essentially, XML defines rules for marking up the structure and content of a document in plain text. Any conforming XML parser can recover the original structure and content, and XML-encoded text is human-readable because any text editor can display the marked-up text. XML is ubiquitous in IT environments because many communities of users have developed document type definitions unique to their purposes, including genealogy, math, and relational databases. Structured data elements map naturally to relational database tables, which enables relational database portability.
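The separation of structure from rendering can be seen in a toy example (the memo content below is invented for illustration): any conforming parser recovers the same structure and content from the plain marked-up text.

```python
import xml.etree.ElementTree as ET

# A hypothetical record marked up in XML: the tags carry structure,
# the text carries content, and nothing depends on a particular
# vendor's software to read it back.
RECORD = """<memo>
  <heading>Retention Schedule Update</heading>
  <body>Series 42 moves from a 7-year to a permanent retention period.</body>
</memo>"""

def parse_memo(xml_text: str) -> dict:
    """Recover the memo's structure and content with any conforming parser."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```

Because the markup is plain text, the record remains inspectable in any text editor even if no XML tooling survives, which is precisely the preservation property the text describes.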
Tagged image file format (TIFF) was initially developed by the Aldus Corporation in 1986 for storing black-and-white images created by scanners and desktop publishing applications. Over the next six years, several new features were added, including a wide range of color images and compression techniques, including lossless compression. The most recent version, TIFF 6.0, was released by Aldus in 1992. Subsequently, Adobe purchased Aldus and chose not to support any further significant revisions and updates. Nonetheless, TIFF is widely used in desktop scanners for creating digital images for preservation. With such a large base of users, it is likely to persist for some time, but Adobe's decision to discontinue further development of TIFF means that it will lack features of other current and future image file formats. Fortunately, tools are available to convert TIFF images to PDF and PNG images.
The W3C and the Internet Engineering Task Force supported the development of PNG as a replacement for the Graphics Interchange Format (GIF) because the GIF compression algorithm was protected by patent rights rather than being in the public domain, as many had believed. In 2004, PNG became an international standard (ISO/IEC 15948) that supports lossless compression, grayscale and true-color images with bit depths ranging from 1 to 16 bits per pixel, file integrity checking, and streaming capability.
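PNG's file integrity checking works at the chunk level: every chunk in a PNG file carries a CRC-32 computed over its type and data fields. A minimal sketch of building and verifying one chunk (function names are illustrative):

```python
import struct
import zlib

def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: 4-byte big-endian length, 4-byte type,
    data, and a CRC-32 computed over the type and data fields."""
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", zlib.crc32(chunk_type + data)))

def verify_chunk(chunk: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the stored value,
    detecting any corruption of the chunk's type or data."""
    (length,) = struct.unpack(">I", chunk[:4])
    chunk_type = chunk[4:8]
    data = chunk[8:8 + length]
    (stored,) = struct.unpack(">I", chunk[8 + length:12 + length])
    return stored == zlib.crc32(chunk_type + data)
```

This built-in fixity mechanism is one reason PNG is attractive for preservation: corruption on aging media can be detected without any external checksum records.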
Vector graphics images consist of two-dimensional lines, colors, curves, and other geometrical shapes and attributes that are stored as mathematical expressions, such as where a line begins, its shape, where it ends, and its color. Changes in these mathematical expressions result in changes in the image. Unlike raster images, a vector graphics image loses no clarity when it is enlarged. SVG images and their behavior properties are defined in XML text files, which means any named element in an SVG image can be indexed and searched. SVG images also can be accessed by any text editor, which minimizes dependence on a specific software application to render and edit the images.
JPEG 2000 is an international standard for compressing full-color and grayscale digital images and rendering them as full-size images and thumbnail images. Unlike JPEG, its predecessor, which supported only lossy compression, JPEG 2000 supports both lossy and lossless compression. Lossy compression means that during compression, bits that are considered technically redundant are permanently deleted. Lossless compression means no bits are lost or deleted. The latter is very important for LTDP because lossy compression is irreversible. JPEG 2000 is widely used in producing digital images in digital cameras and is an optional format in many digital scanners.
MPEG-2 is an international broadcast standard for lossy compression of moving images and associated audio. The major competitor for MPEG-2 appears to be Motion JPEG 2000, which is used in small devices, such as cell phones.
First issued by the European Broadcasting Union in 1997 and revised in 2001 (v1) and 2011 (v2), BWF is a file format for audio data that is an extension of the Microsoft Wave audio format. Its support of metadata ensures that it can be used for the seamless exchange of audio material between different broadcast environments and between equipment based on different computer platforms.
WebARChive (WARC) is an extension of the Internet Archive's ARC format to store digital content harvested through “Web crawls.” WARC was developed to support the storage, management, and exchange of large volumes of “constituent data objects” in a single file. Currently, WARC is used to store and manage digital content collected through Web crawls and data collected by environmental sensing equipment, among others.
Implementing a sustainable LTDP program is not an effort that should be undertaken lightly. Digital preservation is complex and costly and requires collaboration with all of the stakeholders who are accountable for or have an interest in ensuring access to usable, understandable, and trustworthy electronic records for as far into the future as may be required.
As noted earlier, ISO 14721 and ISO 16363 establish the baseline functions and specifications for ensuring access to usable, understandable, and trustworthy electronic records, whether this involves regulatory and legal compliance for a business entity, vital records, accountability for a government unit, or cultural memory for a public or private institution. Most first-time readers who review the functions and specifications of ISO 14721 and ISO 16363 are likely to be overwhelmed by the detail and complexity of almost 150 specifications.
A useful approach that both simplifies these specifications and provides explicit criteria regarding conformance to ISO 14721 and ISO 16363 is the Long-Term Digital Preservation Capability Maturity Model® (DPCMM).12 The DPCMM, which is described in some detail in this section, draws on functions and preservation services identified in ISO 14721 (OAIS) as well as attributes specified in ISO 16363, Audit and Certification of Trustworthy Digital Repositories. It is important to note that the DPCMM is not a one-size-fits-all approach to ensuring long-term access to authentic electronic records. Rather, it is a flexible approach that can be adapted to an organization's specific requirements and resources.
DPCMM can be used to identify the current state capabilities of digital preservation that form the basis for debate and dialogue regarding the desired future state of digital preservation capabilities, and the level of risk that the organization is willing to assume. In many instances, this is likely to come down to the question of what constitutes digital preservation that is good enough to fulfill the organization's mission and meet the expectations of its stakeholders. The DPCMM has five incremental stages, which are depicted in Figure 17.3. In Stage 1, a systematic digital preservation program has not been undertaken or the digital preservation program exists only on paper, whereas Stage 5 represents the highest level of sustainable digital preservation capability and repository trustworthiness that an organization can achieve.
The DPCMM is based on the functional specifications of ISO 14721 and ISO 16363 and accepted best practices in operational digital repositories. It is a systems-based tool for charting an evolutionary path from disorganized and undisciplined management of electronic records, or the lack of a systematic electronic records management program, into increasingly mature stages of digital preservation capability.
The goal of the DPCMM is to identify at a high level where an electronic records management program is in relation to optimal digital preservation capabilities, report gaps, capability levels, and preservation performance metrics to resource allocators and other stakeholders to establish priorities for achieving enhanced capabilities to preserve and ensure access to long-term electronic records.
Stage 5 is the highest level of digital preservation readiness capability that an organization can achieve. It includes a strategic focus on digital preservation outcomes by continuously improving the manner in which electronic records life cycle management is executed. Stage 5 digital preservation capability also involves benchmarking the digital preservation infrastructure and processes relative to other best-in-class digital preservation programs and conducting proactive monitoring for breakthrough technologies that can enable the program to significantly change and improve its digital preservation performance. In Stage 5, few if any electronic records that merit long-term preservation are at risk.
Stage 4 capability is characterized by an organization with a robust infrastructure and digital preservation processes that are based on ISO 14721 specifications and ISO 16363 audit and certification criteria. At this stage, the preservation of electronic records is framed entirely within a collaborative environment in which there are multiple participating stakeholders. Lessons learned from this collaborative framework serve as the basis for adapting and improving capabilities to identify and proactively bring long-term electronic records under life cycle control and management. Some electronic records that merit long-term preservation still may be at risk.
Stage 3 describes an environment that embraces the ISO 14721 specifications and other best practice standards and schemas and thereby establishes the foundation for sustaining an enhanced digital preservation capability over time. This foundation includes successfully completing repeatable projects and outcomes that support the enterprise digital preservation capability and enables collaboration, including shared resources, between record-producing units and entities responsible for managing and maintaining trustworthy digital repositories. In this environment, many electronic records that merit long-term preservation are likely to remain at risk.
Stage 2 describes an environment where an ISO 14721–based digital repository is not yet in place. Instead, a surrogate repository for electronic records is available to some records producers that satisfies some but not all of the ISO 14721 specifications. Typically, the digital preservation infrastructure and processes of the surrogate repository are not systematically integrated into business processes or universally available so the state of digital preservation is somewhat rudimentary and life cycle management of the organization's electronic records is incomplete. There is some understanding of digital preservation issues, but it is limited to a relatively few individuals. There may be virtually no relationship between the success or failure of one digital preservation initiative and the success or failure of another one. Success is largely the result of exceptional (perhaps even heroic) actions of an individual or a project team. Knowledge about such success is not widely shared or institutionalized. Most electronic records that merit long-term preservation are at risk.
Stage 1 describes an environment in which the specifications of ISO 14721 and other standards may be known, accepted in principle, or under consideration, but they have not been formally adopted or implemented by the record-producing organization. Generally, there may be some understanding of digital preservation issues and concerns, but this understanding is likely to consist of ad hoc electronic records management and digital preservation infrastructure, processes, and initiatives. Although there may be some isolated instances of individuals attempting to preserve electronic records on a workstation or removable storage media (e.g. DVD or hard drive), practically all electronic records that merit long-term preservation are at risk.
This capability maturity model consists of 15 components, or key process areas, that are necessary for the long-term preservation of usable, understandable, accessible, and trustworthy electronic records. Each component is identified and accompanied by explicit performance metrics for each of the five levels of digital preservation capability.
The objective of the model is to provide a process and performance framework (or benchmark) against best practice standards and foundational principles of digital preservation, records management, information governance, and archival science. Figure 17.4 displays the components of the DPCMM.
Scope notes for each of the graphic elements in the Figure 17.4 diagram are provided next for additional clarity. Numbered components in the model are associated with performance metrics and capability levels described in the next section.
Producers and Users
Over time, new digital preservation tools and solutions will emerge that will require new open standard, technology-neutral file formats. Open standard, technology-neutral formats are backward compatible, so they can support interoperability across technology platforms over an extended period of time.
The most complete trustworthy digital repository is based on models and standards that include ISO 14721, ISO 16363, and generally accepted best digital preservation practices. The repository may be managed by the organization that owns the electronic records or may be provided as a service by an external third party. It is likely that many organizations initially will rely on surrogate digital preservation capabilities and services that approximate some but not all of the capabilities and services of a conforming ISO 14721/ISO 16363 trustworthy digital repository.
Digital preservation performance metrics for each of the model's five levels have been mapped to each of the 15 numbered components described in the previous section. The performance metrics are explicit empirical indicators that reflect an incremental level of digital preservation capability. The digital preservation capability performance metrics for digital preservation strategy listed in Table 17.2 illustrate the results of this mapping exercise.14
Conducting a gap analysis of its digital preservation capabilities using these performance metrics enables the organization to identify both its current state and desired future state of digital preservation capabilities. In all likelihood, this desired future state will depend on available resources, the organization's mission, and stakeholder expectations. “Good-enough” digital preservation capabilities will vary by organization; what is good enough for one organization is unlikely to coincide with what is good enough for another.
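A gap analysis of this kind can be sketched as a simple comparison of current versus target capability levels per component. The component names and level numbers below are illustrative assumptions, not the model's official list:

```python
# Sketch of a DPCMM-style gap analysis: compare the current vs. the target
# capability level (0-4) for each key process area. Component names and
# level values here are hypothetical examples, not the official DPCMM list.

components = {
    "Digital Preservation Strategy": {"current": 1, "target": 4},
    "Open Standard Technology-Neutral Formats": {"current": 0, "target": 2},
    "Integrity (Fixity) Checking": {"current": 2, "target": 3},
}

def gap_report(components):
    """Return (component, gap) pairs sorted by largest capability gap first."""
    gaps = [
        (name, levels["target"] - levels["current"])
        for name, levels in components.items()
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

for name, gap in gap_report(components):
    print(f"{name}: gap of {gap} level(s)")
```

Ranking components by gap size gives a rough, resource-aware ordering of where to invest first; in practice the target levels would be set by mission and stakeholder expectations, as the text notes.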
Table 17.2 Digital Preservation Performance Metrics
Level | Capability Description |
0 | A formal strategy to address technology obsolescence does not exist. |
1 | A strategy to mitigate technology obsolescence consists of accepting electronic records in their native format with the expectation that new software will become available to support these formats. During this interim period, viewer technologies will be relied on to render usable and understandable electronic records. |
2 | Electronic records in interoperable “preservation-ready”* file formats and transformation of one native file format to an open standard technology-neutral file format are supported. Changes in information technologies that may impact electronic records collections and the digital repository are monitored proactively and systematically. |
3 | The organization supports transformation of selected native file formats to preferred/supported preservation file formats in the trustworthy digital repository. Records-producing units are advised to use preservation-ready file formats for permanent or indefinitely long-term electronic records (e.g. case files, infrastructure files) in their custody. |
4 | Electronic records in all native formats are transformed to available open standard technology-neutral file formats. |
* The term “preservation-ready file formats” refers to open standard technology-neutral formats that the organization has identified as preferred for long-term digital preservation.
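The monitoring and transformation activities described at levels 2 and 3 can be sketched as a triage step that flags files not yet in a preservation-ready format and suggests an open standard target. The format lists below are illustrative assumptions; production repositories identify formats by signature (e.g. against the PRONOM registry) rather than by file extension alone:

```python
# Minimal sketch of a format-monitoring step. The sets of "preservation-ready"
# formats and transformation targets are hypothetical examples an organization
# would define for itself; extension-based identification is a simplification.
from pathlib import Path

PRESERVATION_READY = {".pdf", ".xml", ".txt", ".tiff", ".csv"}
SUGGESTED_TARGET = {".doc": ".pdf", ".docx": ".pdf", ".bmp": ".tiff"}

def triage(paths):
    """Split files into preservation-ready and at-risk, with a suggested target."""
    ready, at_risk = [], []
    for p in map(Path, paths):
        ext = p.suffix.lower()
        if ext in PRESERVATION_READY:
            ready.append(p.name)
        else:
            at_risk.append((p.name, SUGGESTED_TARGET.get(ext, "review manually")))
    return ready, at_risk

ready, at_risk = triage(["report.pdf", "memo.docx", "scan.bmp"])
```

At level 4, every entry on the at-risk list would be transformed to its open standard, technology-neutral target before ingest.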
Any organization with long-term or permanent electronic records in its custody must ensure that the electronic records can be read and correctly interpreted by a computer application, rendered in an understandable form to humans, and trusted as accurate representations of their logical and physical structure, substantive content, and context. To achieve these goals, a digital repository should operate under the mandate of a digital preservation strategy that addresses 10 digital preservation processes and activities:
An alternative is to forgo this costly process in the hope that a future technology, such as emulation, will be widely available and relatively inexpensive. Meanwhile, the repository would rely on a file viewer technology, such as Oracle's Outside In, to render legacy electronic records into a format understandable to humans, with the exact logical and physical structure and representation they had at the time they were created and used.
A robust firewall that blocks unauthorized access with tightly controlled role-based permission rights will help protect the security of records in the custody of the repository.
A further enhancement to protect against a cataclysmic natural or man-made disaster is maintaining a backup copy of the repository's holdings at an off-site facility.
The design and implementation of a digital repository that operates under this digital preservation strategy can be carried out in several different ways. One way is to use internal expertise to build a stand-alone repository that conforms to these digital preservation strategy requirements. Typically, an internally built repository is costly, takes considerable time to implement, and may not meet all expectations because of technical inexperience. An alternative is to use the services and/or solutions offered by an external institution or supplier. One third-party solution is Archivematica, maintained by a Vancouver, British Columbia, company that specializes in the use of open-source software and conformance to the specifications of ISO 14721. “Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects.”15 Another company, Preservica,16 offers an ISO 14721–conforming digital preservation SaaS and on-premise solution that has been implemented in national and pan-national archives as well as 19 US state archives. It is likely that other repository solutions and preservation services will emerge over the next few years as demand for digital archiving increases. The digital preservation strategy discussed earlier can be used to assess the capabilities of these solutions. Spain-based Libnova also offers a cloud-based digital preservation solution, especially for handling large collections at national libraries or archives.
In November 2017, the National Cultural Audiovisual Archives (NCAA), hosted by the Indira Gandhi National Centre for the Arts, became the first digital repository in the world to be awarded ISO 16363 certification. The audit was conducted by PTAB, the Primary Trustworthy Digital Repository Authorisation Body, whose members are the group of global digital preservation experts who developed ISO standards 14721, 16363, and 16919. PTAB was the first organization to be accredited to perform repository audits. A number of public sector repositories have announced plans to undergo audits and seek certification. It remains to be seen whether commercial enterprises and/or those who fund repositories for valuable digitally encoded information will seek certification to ensure repositories are worthy of trust and sustained funding.
It wasn't long ago—5 to 10 years—that LTDP required a relatively expensive and complicated set of internal processes to store digital records needed for 10 years or more. Migrating digital images from older, proprietary file formats and maintaining records in industry-standard, technology-neutral file formats while ensuring readability presented major challenges.
But today, there are new outsourced options that make digital preservation much easier and more cost-effective for organizations needing to preserve digital documents. The approach that digital preservation suppliers take is to manage the entire digital conversion process (from paper or microfilm to digital) and to store five to six copies of each image with a major cloud supplier like Microsoft Azure or Amazon AWS on servers dispersed geographically around the world. Some approaches use more than one cloud supplier to reduce the risk of loss even further.
Error-detecting software uses checksum algorithms to scan digital records periodically for any degradation or loss of bits due to hardware failures, hacking attacks, or other anomalies. Then the damaged copy is either restored or replaced, ensuring that five to six viable copies are still available in various parts of the world.
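The periodic integrity scan described above can be sketched as a fixity check: recompute each stored object's checksum and compare it against the value recorded at ingest. The manifest structure and file names below are illustrative:

```python
# Sketch of a periodic fixity check using SHA-256 checksums. Any object
# whose current digest no longer matches the digest recorded at ingest
# is reported as damaged, so it can be restored from a healthy replica.
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_fixity(manifest, read_file):
    """Return the names of objects whose current checksum does not match."""
    damaged = []
    for name, recorded in manifest.items():
        if sha256_of(read_file(name)) != recorded:
            damaged.append(name)
    return damaged

# Simulated object store in which one copy has silently degraded.
store = {"a.tiff": b"original bytes", "b.pdf": b"corrupted bytes"}
manifest = {
    "a.tiff": sha256_of(b"original bytes"),
    "b.pdf": sha256_of(b"original pdf bytes"),
}

damaged = verify_fixity(manifest, lambda name: store[name])
# damaged -> ["b.pdf"]: restore or replace this copy from a good replica
```

Running this check on each geographically dispersed copy, and repairing any mismatch from a healthy replica, is what keeps five to six viable copies available over time.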
This newer cloud-based approach has made digital preservation more accessible and viable for major organizations with the need to preserve digital information far into the future, especially movie studios, national libraries, universities, research organizations, and government entities.
Organizations, especially those whose primary mission is to preserve and provide access to permanent records, face significant challenges in meeting their LTDP needs. They must collaborate with internal and external stakeholders, develop policies and strategies to govern and control information assets over long periods of time, inventory records in the custody of records producers, monitor technology changes and evolving standards, and sustain trustworthy digital repositories. The most important consideration is to determine what level of LTDP maturity is appropriate, achievable, and affordable for the organization and to begin working methodically toward that goal for the good of the organization and its stakeholders over the long term. In addition, organizations should focus on what is doable over the next 10 to 20 years rather than the next 50 or 100 years.