2

Archives and special collections in the digital humanities

Abstract

Digital humanities presents an emerging interdisciplinary framework for integrating digital technologies and engaging archives in humanities research and teaching. However, recent advances in technology and various debates in the digital humanities have placed increasing emphasis on computing, digital technologies, and social media, bringing into question the scholarly and theoretical scope of digital humanities practices as well as the relationship between archives and the humanities. This chapter presents definitions for digital humanities, discusses some of the debates, and overviews some projects in research and teaching, which together present a new framework for archival contributions. The debates address the theoretical scope of digital humanities projects, many of which have benefited from involving archivists.

Keywords

Archives; Debates; Digital humanities; Scholarship in the digital humanities; Text encoding initiative (TEI) projects; Thematic research collections
Digital Humanities presents an emerging interdisciplinary framework for integrating digital technologies and engaging archives in humanities research and teaching. For several centuries now, archives have housed and preserved institutional records, special manuscript collections, and rare books of cultural, historical, and literary significance, which have been the foundation of a historic partnership between humanities scholars and archivists. Recent advances in technology and various debates in the digital humanities have, however, placed increasing emphasis on computing, digital technologies, and social media, bringing into question the scholarly and theoretical scope of digital humanities practices as well as the relationship between archives and the humanities. Digital Humanities projects, such as thematic research collections, continue to demonstrate the value of archives through this continuing relationship because archivists have expanded their expertise to cover digital curation in order to preserve, promote, and provide long-term access to digital (that is, digitized and born-digital) collections as well as quantitative data for nonnarrative humanities projects. The debates around classifying Digital Humanities projects, including collection building, writing code, and creating new digital artifacts, necessarily bring the theoretical and scholarly scope of archival work into the discussion.
Chapter 2 investigates the relationship between archives and the digital humanities. It begins by establishing a conceptual framework through a review of definitions and discussions on Digital Humanities. A discussion on the various debates in the field also provides a theoretical framework for the role of archives as well as the scholarly orientation of Digital Humanities projects and work contributed by archivists and digital curators. The chapter then addresses what constitutes critical discourse in the field and the various positions stated in the Digital Humanities Manifesto (2008), followed by concerns from the archivist perspective. The range of Digital Humanities projects, from the early CD-ROM initiatives to current interests in data visualizations, geospatial representations, and social curation, underscores the continuing role of archives as leading heritage institutions. The chapter examines specific Digital Humanities projects that have involved archives and used digital content, data, or both.

Defining the digital humanities

In his presentation at Columbia University’s Center for Digital Research and Scholarship, Cohen (see Cohen, Frabetti, Buzzetti, & Rodriguez-Velasco, 2011) defined Digital Humanities as “the use of digital media and technology to advance the full range of thought and practice in the humanities, from the creation of scholarly resources, to research on those resources, to the communication of results to colleagues and students” (“Defining the Digital Humanities”). Cohen’s reference to the role of archives, libraries, digital collections, and finding aids lays the foundation of a continuing long-term relationship between humanities scholars, archivists, and subject librarians. Digital Humanities represents the partnership of humanistic scholarship and computing as referenced in the much-cited definition from Wikipedia (2013):

The Digital Humanities are an area of research, teaching, and creation concerned with the intersection of computing and the disciplines of the humanities. Developing from the field of humanities computing, digital humanities embrace a variety of topics, from curating online collections to data mining large cultural data sets. Digital humanities (often abbreviated DH) currently incorporate both digitized and born-digital materials and combine the methodologies from traditional humanities disciplines (such as history, philosophy, linguistics, literature, art, archaeology, music, and cultural studies) and social sciences with tools provided by computing (such as data visualisation, information retrieval, data mining, statistics, text mining) and digital publishing.

“Digital Humanities”

In their introduction to the Companion to Digital Humanities, Schreibman, Siemens, and Unsworth (2004) address the history of close collaboration of disciplinary experts, technologists, librarians, and other information specialists such as archivists. They write,

[Humanities] remains deeply interested in text, but as advances in technology have made it first possible, then trivial to capture, manipulate, and process other media, the field has redefined itself to embrace the full range of multimedia. Especially since the 1990s, with the advent of the World Wide Web, digital humanities has broadened its reach, yet it has remained in touch with the goals that have animated it from the outset: using information technology to illuminate the human record, and bringing an understanding of the human record to bear on the development and use of information technology.

“History,” para. 1

At the aforementioned Columbia University event, Frabetti (see Cohen, Frabetti, Buzzetti, & Rodriguez-Velasco, 2011) described the digital humanities simply as “Humanities in dialogue with digital technologies.” While addressing the relationship between the humanities and technology, however, she points out that the purpose of this relationship is not solely to use technology for the sake of technology but also to understand what implications digital technology has for the humanities. This rationale may also apply to the technorealist approach taken by Cohen and Rosenzweig (2006), which advocates a middle-of-the-road position between the extremes of technoskepticism and cyberenthusiasm. In fact, digital curators will share an interest in applying technology toward investigating critical humanistic and historiographical questions that could not be pursued without digital technology. Hall (2011) addresses a two-way relationship between the humanities and technology when she writes, “just as interesting as what computer science has to offer the humanities is the question of what the humanities…have to offer computer science; and, beyond that, what the humanities themselves can bring to the understanding of computing and the shaping of the digital” (p. 2). A frequently asked question has been whether Digital Humanities scholars can present a humanistic (that is, nonquantifiable, nonmechanistic, and qualitative) perspective on digital technology and computing. Drucker (2012) takes this concern a step further by addressing the incompatibility of current interfaces used to create timelines, maps, and other data visualizations with the qualitative methodologies of the humanities: all retrieved data appear in abstraction and out of context, and so do not support the analysis and interpretation of texts. She writes,

The challenge is to shift humanistic study from attention to the effects of technology (from readings of social media, games, narrative, personae, digital texts, images, environments) to a humanistically informed theory of the making of technology (a humanistic computing at the level of design, modeling of information architecture, data types, interface, and protocols).

p. 87

Using quantitative data in visualizations, timelines, and maps may be more effective when used with specific addresses, dates, and names, but such technologies must—as Drucker notes—accommodate qualitative data that humanities scholars also use for analysis, interpretation, and writing. Drucker points to the incompatibility or incommensurability of mechanized spatial and temporal representations for works, ideas, and events that simply cannot be plotted along rigidly parameterized lines. According to Schmidt (2011), sufficient support from available data is necessary for theory and practice in Digital Humanities research, but the field can also engage experts from other fields to contribute. He writes,

At their core, the digital humanities are the practice of using technology to create new objects for humanistic interrogation…This has rightly led much of digital humanities’ focus to lie in public humanities; there is enormous excitement about the potential of visualizations, exhibits, and tools to encourage non-humanists to think humanistically.

para. 5

In this spirit, Davidson’s concerns (2012) should not be too surprising. She alludes to the growing gap between the sciences and humanities, which had flourished contemporaneously since the industrial revolution: wherever science made progress, humanistic scholarship provided the interpretive, analytic, and narrative perspectives that shaped culture. Davidson’s vision for the digital humanities departs from diametrically opposed sciences and humanities in favor of an integrated intellectual landscape. She writes,

Perhaps we need to see technology and the humanities not as a binary but as two sides of a necessarily interdependent, conjoined, and mutually constitutive set of intellectual, educational, social, political, and economic practices. More to the point, we need to acknowledge how much the massive computational abilities that have transformed the sciences have also changed our field in ways large and small and hold possibilities for far greater transformation in the three areas—research, writing, and teaching—that matter most.

p. 477

Characteristics of Digital Humanities

While these and similar definitions have evolved around the relationship between the humanities and computing, characterizing the field involves identifying the deeper aspects of the discipline including its theoretical foundations and critical discourse. The Digital Humanities Manifesto, published in two versions in 2008 and as Digital Humanities Manifesto 2.0 in 2009, is an aggregation of over 70 statements from the broader digital humanities community of practitioners, and it serves as an important framework for an evolving discourse with various definitions, descriptions, questions, and debates in the field. These statements represent the most commonly voiced positions in the Manifesto but they also characterize a new discipline that integrates technology, collaboration, multivocality, quantitative methodology, interdisciplinarity, and the decentralization of knowledge, all of which distinguish this field from the earlier humanities and other legacy disciplines of earlier scientific paradigms. The first version of the Manifesto (2008) describes this field as “not a unified field but an array of convergent practices that explore a universe in which print is no longer the exclusive or the normative medium in which knowledge is produced and/or disseminated” (para. 2). A review of statements in the Manifesto yields the following characterizations for Digital Humanities:
1. it aggregates practices from multiple fields with a focus on disseminated knowledge;
2. it integrates quantitative aspects of computing with qualitative and interpretive aspects of the humanities;
3. it emphasizes multivocality in humanistic knowledge; and
4. it emphasizes teamwork and the collaborative production and reproduction of humanistic knowledge, using teams committed to risk-taking, collaboration, and experimentation.
These statements represent a marked departure from practices of traditional Humanities scholarship, which has varied from monodisciplinary to various degrees of interdisciplinary approaches, focused on qualitative and narrative-driven scholarship, and solo production of knowledge with little or no space for intertextuality. The emerging model of humanities scholarship has been increasingly inter- and transdisciplinary, collaborative, culturally diverse, and intersubjective. Svensson’s (2010) review of an emerging digital humanities landscape points to a multifaceted paradigm for relationships between scholarship and technology. Tara McPherson’s typological approach identifies “computing humanities, blogging humanities, and multimodal humanities…[which correspond to] building tools, infrastructure, standards, and collections…the production of networked media and peer-to-peer writing…[and the aggregation of] scholarly tools, databases, networked writing and peer-to-peer commentary while also leveraging the potential of the visual and aural media that are part of contemporary life” (as cited in Svensson, 2010, Typologies of the Digital Humanities section, para. 14). Another paradigm shift occurred in the transition from Humanities 1.0 to 2.0 as discussed in Davidson’s work (see Svensson, 2010), which addresses the transition from data-based projects to greater interactivity, openness, and interdisciplinarity. The decentered authorship parallels the shift from computing to multimodal humanities.

Discursive concerns in the digital humanities

Outside the scope of the two versions of the Manifesto, there are several other concerns related to critical discourse (or lack thereof) in the digital humanities. These concerns address the overuse of technology to the potential detriment of scholarly pursuits, the gap between the humanities and sciences, and the scholarly nature of digital humanities research and practice. Frederick Gibbs (2011) warns about the absence of critical discourse in the digital humanities, which in his view (1) “must be concerned with both interpretation and evaluation”; and (2) “is central to establishing the importance of the kind of scholarly and even cultural work that it does” (“II. The value of digital humanities criticism,” para. 1). As for the current status of critical discourse, Gibbs presents three issues: (1) there is no effective critical discourse around digital humanities work; most energy is focused on peer review; (2) Digital Humanities work requires more practical and theoretical rubrics as evaluative criteria; and (3) criticism of Digital Humanities work needs to be more collaborative and multivocal. Gibbs’s concerns reflect the emerging discursive differences between Humanities and Digital Humanities in that the latter has yet to establish a greater degree of autonomy from the humanities while integrating into a more interdisciplinary discourse marked by a greater degree of critical discourse, intersubjective readings, and collaboration. The historically solo-oriented work in the humanities is in contrast with the collaborative orientation of the digital humanities.
A broader and equally important question in the debates stirring in the digital humanities (Gold, 2012a) is whether building collections is sufficiently theoretical in nature to merit recognition as a scholarly activity for tenure and promotion of humanities professors. In fact, the same may apply to archivists and librarians in tenure-track positions. Gold (2012b) alludes to a presentation at the 2011 Modern Language Association’s (MLA) “History and Future of Digital Humanities” panel in which Stephen Ramsay identified the mandate to build (including coding) things in order to be considered a digital humanist. According to Ramsay (see Gold, 2012b), IT skills to build repositories per se will not make one a digital humanist; however, building collections and metadata that support searching, information retrieval, discovery, and writing in digital humanities scholarship constitutes scholarly work in the digital humanities. The role of archivists engaging in Digital Humanities research is vital, as they operate in a similar theoretical framework while developing the necessary metadata framework and the data.
Ramsay and Rockwell (2012) address the difficulty in interpreting and determining “what ‘digital work’ means in the humanities, and the context in which that term is being applied [which] can differ between scholarly and non-professorial positions, and the normative concerns of tenure and promotion” (75). According to Ruecker (see Ramsay & Rockwell, 2012), digital artifacts are theoretical in nature because they reify and communicate knowledge. Lev Manovich (see Ramsay & Rockwell, 2012) classifies digital artifacts such as software, code, and platforms as prototypes that validate theories in digital humanities instead of predicting as theories do in the sciences. Likewise, digital artifacts such as research data, digital objects, metadata, and the digital repositories used for digital curation are also prototypes that explain observations through metadata. Whether they are experimental or developed for the purpose of modeling, digital artifacts remain theoretical in nature since they must integrate into the ontology of the discipline via the various subject fields in the metadata record. Hence, this level of relationship brings archivists into direct connection with digital humanists. If building things in Digital Humanities is equivalent to communicating scholarship, then we must accept the argument that building artifacts is also an act of scholarship.
Ramsay and Rockwell (2012) approach digital artifacts as theory in that they view digital artifacts as hermeneutical instruments used to interpret events or phenomena, or to answer humanistic questions. Digital artifacts are suitable for hermeneutic activity since they present contextual data for intersubjective reading, analysis, discussion, and interpretation; thus, they become “theory frameworks” for interpreting. Visualization tools, they argue, may also be hermeneutical tools, and while digitized artifacts per se are not, the presentation of analytical and interpretive information in the metadata is scholarly activity. A significant part of metadata (such as provenance information) requires research on the curator’s part, which is publishable content. Considering the digital artifact as a theoretical model, Ramsay and Rockwell (2012) turn to the basics of computation wherein computers are used to “transform information from one state to another” (81), which is identical to reformatting in digital terms and to presenting descriptive information in the form of metadata, itself a strictly electronic (and structured) form of data. Bauer (2012) recognizes the theoretical significance of databases that present a prototype of data deeply theoretical in scope and methodology. “When we create these systems we bring our theoretical understandings to bear on our digital projects including (but not limited to) decisions about: controlled vocabulary (or the lack thereof), search algorithms, interface design, color palettes, and data structure” (para. 7).
The debates and other communications about discursive concerns in the digital humanities lead to the growing argument that the digital projects undertaken in the digital humanities merit recognition as scholarly activity with theoretical foundations. A reasonable extension to these arguments is the inclusion of work by archivists, curators, and librarians who collaborate in such projects.

The role of archives in the digital humanities

The definitions and characterization of the digital humanities and the discussion of discursive concerns in the field can help in identifying the roles archives play in the digital humanities. This is because archives have systematically appraised and accessioned collections with relevance to institutional (including curricular) needs in mind, and carefully appraised and processed collections only benefit the field’s recognition in scholarly communities. Integrating archival collections into the conceptual framework of digital humanities takes into consideration the theory and practice in archiving and digital curation. What are the roles of archives and digital curation in the digital humanities context, and to what extent does digital humanities discourse elevate or clarify these roles? The growing need for raw data in digital humanities scholarship communicates new expectations for archives and libraries, which already produce enough data as part of the metadata record to facilitate visualization. For instance, geospatial data on donors from the deed of gift or other donor records would not only support provenance research on donation history in specific geographical areas during a specific period but would also allow mapping the donors in a timeline, and allow researchers to establish relationships among several donors by specific regions and periods.
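The donor example above can be made concrete with a minimal sketch. It assumes that donor name, region, and year of gift have already been transcribed from deeds of gift into simple records; the records, field names, and function names below are invented placeholders for illustration, not data or tools from any actual repository.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical donor records, as they might be transcribed from deeds of gift.
DONORS = [
    {"donor": "Donor A", "region": "Region 1", "year": 1923},
    {"donor": "Donor B", "region": "Region 1", "year": 1927},
    {"donor": "Donor C", "region": "Region 2", "year": 1931},
    {"donor": "Donor D", "region": "Region 1", "year": 1934},
]


def decade(year):
    """Return the first year of the decade that contains `year`."""
    return year - year % 10


def group_donors(records):
    """Group donor names by (region, decade) for a timeline or map layer."""
    groups = defaultdict(list)
    for record in records:
        key = (record["region"], decade(record["year"]))
        groups[key].append(record["donor"])
    return {key: sorted(names) for key, names in sorted(groups.items())}


def related_donors(groups):
    """List pairs of donors who gave in the same region and decade."""
    pairs = []
    for names in groups.values():
        pairs.extend(combinations(names, 2))
    return pairs


if __name__ == "__main__":
    grouped = group_donors(DONORS)
    for (region, start), names in grouped.items():
        print(f"{region}, {start}s: {', '.join(names)}")
    print(related_donors(grouped))
```

Grouping by decade is only one possible choice; a project might equally group by accession year or by a collection-specific period, depending on the research question.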
This is where archives and special collection areas can make tremendous contributions to digital humanities scholarship, but at the same time retain nondigital research methods. As Schmidt (2011) writes, “The unreconstructed texts of the past make us think in old ways. Archives, libraries, censuses, atlases: all of these force us to read juxtapositions far more aligned with historical ways of thinking than the reconfigurations possible with digital texts” (para. 10). Archives have been, still are, and will be collecting materials from the past when historical records were produced in print, but at the same time, they can also participate in the transition to a digitally oriented and reconfigured landscape.
The theoretical scope of digital projects in the digital humanities may scaffold a similar recognition of archival and librarian work associated with such projects. In their introduction to the Companion to Digital Humanities—which clearly speaks to this issue—Schreibman et al. (2004) write,

Widely spread through the digital humanities community is the notion that there is a clear and direct relationship between the interpretative strategies that humanists employ and the tools that facilitate exploration of original artifacts based on those interpretative strategies; or, more simply put, those working in the digital humanities have long held the view that application is as important as theory. Thus, exemplary tasks traditionally associated with humanities computing hold the digital representation of archival materials on a par with analysis or critical inquiry, as well as theories of analysis or critical inquiry originating in the study of those materials.

“Principles, Applications, and Dissemination,” para. 1

The various debates in the digital humanities regarding the theoretical scope of digital projects include Bauer’s work (2012), which brings into this discourse the archival and information science theory related to the ontological, epistemological, and methodological aspects of archival and digital curation practice. Bauer recognizes research databases and digital libraries as prototypes having theoretical significance because they support data modeling and interpretation, which may be equally theoretical in scope. Owens (2011) addresses the variety of data across the traditional and digital humanities practices. He writes,

We can choose to treat data as different kinds of things. First, as constructed things, data are a species of artifact. Second, as authored objects created for particular audiences, data can be interpreted as texts. Third, as computer-processable information, data can be computed in a whole host of ways to generate novel artifacts and texts which are then open to subsequent interpretation and analysis.

“What is data,” para. 1

Owens recognizes that data in the digital repositories represent three dimensions of humanistic knowledge: First, the data itself represents a digital object that is either a digital reproduction of the physical original—one that humanists may or may not necessarily be interested in if the digital copy is reliable and certified. Second, the data itself supports a new narrative for some humanists; for instance, the provenance field may support meta-research or a completely new line of inquiry into the history of donations. Finally, the data itself may be useful in data visualizations for additional interpretations with applicable technologies.
Herein rests the role of archivists and librarians with description and metadata skills to reinforce the semantic connections between object and context. If the building of collections, data analysis, and interpretation fall into the category of scholarship, will the work undertaken by archivists and digital curators do so as well? If digital preservation, metadata (and ontology) construction, information architecture, and digitization involve computing, then one must also ask if some of the work by digital curators also qualifies as modeling. Charles Isbell (see Ramsay & Rockwell, 2012) regards computing as modeling activity through which the modeler establishes correspondence between a phenomenon such as an event, place, or community, and the computer that contains the data describing such phenomena. The language associated with digital artifacts (that is, metadata) is a symbolic representation of the world captured in digital form. In digital preservation terms, therefore, it is equivalent to presenting an abstract (historical or imagined) world in the form of a digital object comprised of a combination of media, text, and data.
Owens and Bailey (2012) argue that using digital interfaces such as Viewshare to visualize data constitutes a mode of inquiry that is more than merely providing access to information. “Visualization can be thought of as part of a hermeneutic research process: ‘generative and iterative, capable of producing new knowledge through the aesthetic provocation.’ In short, the development of an interface to a collection is itself an interpretive act, which brings to light particular vectors for further exploration and interpretation” (para. 2). This understanding echoes a similar position by Ramsay and Rockwell (2012), who argue that digital artifacts are theoretical in scope. Digital humanities data curation brings actual data from archival materials into a new level of curation that focuses on annotations, visualizations, and other hermeneutic research activity. In a relationship between archivists and digital humanists, both are involved in scholarly activity even though the project workflow assigns researchers to specific activities. Furthermore, digital curation is a scholarly pursuit if the collections that archives build, preserve, reformat, and promote meet established criteria for theory.
Digital collections, virtual exhibitions, virtual museums, and other digital artifacts must project beyond esthetic appreciation, frame the learning process (with a well-planned information architecture), and present theoretical problems to the researcher or to the curator or to both. Theoretical questions must include the following:
1. Does the collection contain information on organizational, community, or personal history to support critical historical writing?
2. If there is no such material (possibly because the donated material did not contain any), does the arrangement of historical records represent the internal structure of the organization?
3. Can that arrangement lead to or support new theories?
4. Does the metadata record present critical and verifiable information on provenance?
5. Does the metadata record contain subject terms that delineate and represent the domain within which the records were created?
Thus, if the information in the metadata record can drive new humanistic and historiographical questions for further analysis, such digital collections provide the intellectual framework for scholarly and theoretical work and therefore merit recognition on similar grounds. Therefore, defining the digital humanities in terms of its relationship with archives and digital curation should take into consideration archivists’ ability to produce knowledge of scholarly and theoretical value.

Archives and the linguistic turn

Along with the increased use of archives came a linguistic turn leading to a variety of new meanings associated with the term “archives,” to such an extent that these references no longer associate the term with an archival facility, specialization, process, service, or materials kept for preservation. Cox (2005) underscores the importance of archives in the information age amidst the growing criticism of archives as “bureaucratic obstacles” (p. 209). With the vision of paperless offices come questions of accountability not backed up with commitments to preserve the digital records, which have evidently proliferated often beyond institutions’ and corporations’ control. Cox also addresses the trustworthiness of records with respect to authenticity and reliability. For the purposes of this book, these two aspects of digital records should be of concern to humanities scholars and historians regardless of the extent to which they are using (or even relying on) digital information in repositories and social media.
Theimer (2012) addresses the widespread misunderstanding, misuse of, or challenges to using “archives” in its historic context, and the lack of awareness about the fundamental differences between archives and manuscript collections, both of which remain vital primary sources for digital humanists. The divergence in the usage of the term “archive” is nevertheless important because many digital humanities projects—including digital historical representations (Sternfeld, 2011)—continue to apply this term in reference to materials from traditional, digital, and hybrid archives.
Theimer supports this point by citing the formal definition, according to which archives are

Materials created or received by a person, family, or organization, public or private, in the conduct of their affairs and preserved because of the enduring value contained in the information they contain or as evidence of the functions and responsibilities of their creator, especially those materials maintained using the principles of provenance, original order, and collective control.

Pearce-Moses, Ed. 2005, “archives”

The information on provenance, for instance, provides important historical context to scholars researching the history of objects and that may apply to born-digital content, assuming the increasing scarcity of such contents due to obsolete technologies (hardware, software, and file format). Although collective control per se may not be obvious in virtual exhibitions, the information architecture of virtual exhibitions can represent the physical–logical organization of the original collections. It is important to assume some level of organization—in the form of an organizational chart for instance—which may have been in place prior to transferring and accessioning the records, and such a visualization may frame the arrangement of records at the time of archival processing. Finally, the concept of original order may mean keeping together records that otherwise have little to do with each other but are related through the historical context, such as letters of Civil War soldiers to their families, or a collector’s subsequent activity. Thus, a broader thematic framework must be evident for such records to appear together.
Price (2009) questions the meanings of such terms as archives, databases, editions, projects, and thematic research collections. Do these have any different and significant implications for digital humanities scholarship, or are they merely the legacies of predigital institutions of humanities, archives, and scholarly communication in general? The legal and ethical framework within which archivists work may never exclude predigital records even as the momentum for digitization is still growing. What digital humanists should realize is that there is much more to archives than the instant access (and gratification) coming with digital collections; there are buried treasures in those archival boxes that are worth the travel and digging through old papers.
As for the use of “archives” in the digital humanities context, Price cites the definition by Peter Schillingsburg (see Price, 2009), which is noteworthy since it diverges dramatically from the historic mission and nature of archives. If, according to Schillingsburg, an archive is merely a “library of electronic texts, linked to explanations and parallels and histories” (as cited in Price, 2009, “Archives and Digital Thematic Research Collections,” para. 3), the term implies a product evolving outside of archival practice, since archival processing does not involve interlinking selected individual records. Theimer (2012) points out that archivists prepare finding aids that help researchers navigate processed (that is, formally accessioned, described, arranged, and catalogued) collections. Internal hyperlinks in HTML, PDF, and EAD (Encoded Archival Description) finding aids enable researchers to move through very large finding aids and between their key sections. External links may take researchers to related collection finding aids or selections of materials digitized from the collections. Digital humanists can collaborate with archivists and digital curators to develop highly specialized digital archives that focus on a particular humanistic or historiographical question while taking advantage of high-quality digital collections and finding aids to navigate knowledge in that field.

Digital humanities projects involving archives and libraries

Since the mid-1980s, there have been copious efforts to innovate scholarship and teaching through CD-ROM and early Web-based projects. Although archivists and librarians contributed to these projects, there were no digital curation standards to ensure long-term preservation and access. Madden (2008) addresses lessons learned in the American Memory project, which started in 1998 with the disadvantage of not having any standards for the creation and preservation of digital content. Archives and libraries played a visible role in developing digital curation standards from the earliest conversations on digital curation in the United Kingdom and United States. The development of metadata schemata (Dublin Core, Visual Resources Association, Categories for the Description of Works of Art, and others) and professional standards involving archives and libraries has significantly improved access to digitized collections from archives, libraries, and museums. Archivists were active in the development of archival description and encoding standards such as DACS (Describing Archives: A Content Standard) and EAD, respectively.
The Companion to Digital Humanities (Schreibman et al., 2004) ascribes practical and theoretical significance to text encoding. Although textual analysis is not synonymous with or otherwise equivalent to archival practices, the Text Encoding Initiative (TEI), with early beginnings in 1987, has interested archivists and librarians from the beginning. Mylonas and Renear write,

TEI is now itself a research community, connecting many professions, disciplines, and institutions in many countries and defining itself with its shared interests, concepts, tools, and techniques. Its subject matter is textual communication, with the principal goal of improving our general theoretical understanding of textual representation, and the auxiliary practical goal of using that improved understanding to develop methods, tools, and techniques that will be valuable to other fields and will support practical applications in publishing, archives, and libraries.

As cited in Renear, 2004, Chapter 17, “Larger Significance of TEI,” para. 4

Not only does the TEI standard highlight structure within a text, but it also presents an ontology that, via proper mapping, corresponds to specific fields in Dublin Core, MARC (MAchine-Readable Cataloging), or other metadata schemata. While mapping at a low (item) level of granularity may be laborious and cost-prohibitive, the strategy opens new research avenues for digital humanities scholars to linked data and, through that, to new knowledge hitherto unassociated with known texts. For instance, mapping between geospatially tagged oral histories and other digital objects (images and hypertext) elsewhere is possible. If they all reference the same or nearby locations, a digital map may expose the spatial and/or chronological proximity of hitherto unassociated people, events, and places for historians and other digital humanities researchers to consider for further study. Metadata for oral history transcripts with geospatial data can help researchers establish thematic and logical connections to existing bodies of text, which may prove some texts incorrect, falsified, or fabricated, or may simply underscore the veracity of the information in others.
TEI projects were among the earliest digital humanities projects involving archives and libraries sharing and curating metadata. The rationale behind text encoding was simple: to present text using computer technology and in machine-readable form using XML, which evolved out of the earlier SGML. This approach has presented not only the (digitally transcribed or born-digital) text but also the local structure of the material in a way that supports analysis (and itself requires textual analysis), interpretation, and organization.
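The mapping described above can be pictured as a small crosswalk table. The following Python sketch is illustrative only: the crosswalk, the sample TEI document, and the function name tei_to_dublin_core are assumptions made for this example, not a published TEI-to-Dublin Core mapping, and a real project would follow the crosswalk agreed on by its archivists and encoders.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical crosswalk: Dublin Core element -> path into the TEI document.
CROSSWALK = {
    "dc:title": ".//tei:titleStmt/tei:title",
    "dc:creator": ".//tei:titleStmt/tei:author",
    "dc:publisher": ".//tei:publicationStmt/tei:publisher",
    "dc:date": ".//tei:publicationStmt/tei:date",
    "dc:coverage": ".//tei:placeName",  # place names tagged in the text
}

# Invented sample document, used only to exercise the mapping.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Letter from a soldier to his family</title>
        <author>Unknown soldier</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example University Archives</publisher>
        <date>1863</date>
      </publicationStmt>
      <sourceDesc><p>Transcribed from manuscript.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>We marched from <placeName>Staunton</placeName> toward
       <placeName>Chambersburg</placeName>.</p>
  </body></text>
</TEI>"""


def tei_to_dublin_core(tei_xml):
    """Collect the text of each mapped TEI element under its Dublin Core field."""
    root = ET.fromstring(tei_xml)
    record = {}
    for dc_field, path in CROSSWALK.items():
        values = [
            " ".join(el.text.split())
            for el in root.findall(path, TEI_NS)
            if el.text and el.text.strip()
        ]
        if values:
            record[dc_field] = values
    return record


record = tei_to_dublin_core(SAMPLE_TEI)
for field, values in record.items():
    print(field, values)
```

Every field is kept as a list because Dublin Core elements are repeatable; the two tagged place names, for example, both land under dc:coverage, where a mapping tool could later pick them up.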
Archives may also play an important role in verifying the transcription for accuracy and reliability for researchers to use as primary sources. While TEI texts are digital—and hence, suspect as inauthentic in some communities—PDF files may eventually allow access to layers of text produced by Optical Character Recognition. If the text in such layers were open to reading and editing (instead of the excruciatingly slow word-by-word inspection of suspects in currently available software), it would allow archivists and historians to verify and tag such texts for analysis. The PDF/A file format enables archivists to add a digital signature attesting to the authenticity and reliability of the TEI representation of the original text, and a save action will prevent others from altering the document. This approach may not only support TEI tagging within the PDF file but allow digital humanists to view the original page image. This proposal has been submitted to the PDF Association’s LinkedIn group for further discussion (see Sabharwal, 2012b), and the emerging PDF standards may one day support this application of the TEI schema. In addition, such data in TEI tags are stored in databases that can be queried and retrieved in archival repositories and library databases; thus, TEI offers continued support for digital curation in a linked data environment.
Among the various digital humanities projects involving archives and libraries are the thematic research collections, which Palmer (2004) calls “digital aggregations of primary sources” (Introduction, para. 1) that digital humanities scholars can develop in collaboration with archivists and librarians. Thematic research collections are in hypertext format, containing digitized and heterogeneous primary sources. Therefore, archives may see an advantage in working with diverse technologies, data types, and file formats, a strategy that relies on format-neutral, universal, and open metadata schemata. The Dublin Core metadata schema, for instance, has been in use for digital collections because it supports interoperability across several platforms. Researchers’ ability to search a wide array of digital resources and catalogs, retrieve the digital content, and discover related library holdings is a goal of archivists and librarians interested in digital humanities projects.
The challenges to libraries in this environment, according to Rydberg-Cox (2006), went beyond introducing digital resources as an extension of the print material on which the traditional system of scholarly communication has evolved for centuries. If the digital projects replace print resources, then there will be questions as to whether these new digital corpora will compete with other commercially available products with much more robust citation analysis tools such as Thomson Reuters Web of Science. Besides, the preservation and curation of digital resources will add to the list of new challenges, which libraries with a solo curator or solo digital initiatives librarian will understand to be formidable. Rydberg-Cox also points to the implications these developments have for libraries, which have thus far focused much of their professional energies on measuring their impact on scholarship, readership, and teaching, as well as improving such areas as interlibrary loans, reference, instruction, and outreach. Working with digital humanities scholars will also mean collaborating on new and original archival and library resources that focus on specific humanistic and historiographical questions.

Digital humanities project descriptions

The projects described in this section illustrate multiple scenarios for using archives and manuscript collections with varying degrees of collaboration with archivists and librarians. Where the projects had direct access to primary source collections, the collaboration must have benefited from the direct relationship with archivists and librarians who provided finding aids, metadata, and other bibliographical information. In cases where metadata was available via a subscription-based service, the project used such data for analysis, visualization, and interpretation. Elsewhere it is evident that first edition books or books out of print were the source of the information. Finally, there are such projects as Viewshare, which allow archivists and librarians to submit specific data for digital humanities curation so that researchers can analyze and interpret the data with the help of maps and timelines generated by the project. This chapter does not investigate the legalities of using those publications for digital reproduction; the discussions herein proceed on the assumption that the project leaders have obtained the necessary permissions or used works in the public domain. In most cases, archives and libraries transfer the responsibility for identifying copyright holders to the researchers, who must make all efforts to locate the copyright holder, heir, or agent for permission. An analysis of bibliographic sources may also help researchers identify orphaned works.

The Perseus Project (1987)

The Perseus Project (“Perseus Digital Library,” n.d.; see also Wiltshire, Pearcy, Hamilton, Eiteljorg, & O’Donnell, 1992) is an early digital humanities project to which archives, special collections, libraries, and museums have contributed photographs and texts since its launch at Tufts in 1987. The project planning had begun in 1985, and the CD-ROM of Perseus 1.0 by Yale University Press came out in 1992, followed by version 2.0 separately for Macintosh users in 1996 (“Perseus Version History,” n.d.). Because it predates the Web’s wide public adoption around 1994, Perseus 1.0 is of particular historical significance: it demonstrates the presence of hypertext structures and information architecture before the Web. Equally important are the sources of information, since the Perseus Project focused on Greco-Roman art and architecture in digitized images and hypertext. Perseus 1.0 was developed in HyperCard, which preceded the Web’s hypertext environment. It featured guided tours using paths to the contents and the other CDs. The tours included guided, novice, expert, philological, art, and archaeology tours designated for beginning and advanced users. Maps were available to allow the kind of visually enhanced navigation commonly available on Google Maps today. Additionally, search tools, direct and indirect links, and lookup tools were available as part of the information architecture.
The primary texts in Ancient Greek with translation into modern English featured 31 authors (dramatists, philosophers, poets, and historians) including Aeschylus, Aristotle, Euripides, Herodotus, Homer, Pindar, Plato, and Sophocles. The Perseus site (Tufts, n.d.) lists sources in excess of 2,500 works and over 3,500 encyclopedic entries from Loeb Classical Library. Images of vases come from Boston, Mississippi, and such institutions as Harvard University. A significant part of the collections comes from Tufts University Library and various digitized collections, which illustrates the role of archives and libraries during the early years of the digital humanities.

CD-ROM projects

Who built America? (1995)

Who built America? was an experiment (Thomas II, 2004) of the American Social History Project based at City University of New York, bringing social and labor history into the context of America’s national history between 1876 and 1914. It was a project involving Roy Rosenzweig, Steve Brier, and Joshua Brown to introduce a new multimedia form of scholarship and teaching, but its stand-alone architecture already presented a risk in an age that moved steadily and quickly toward hyperlinked and networked texts on the World Wide Web, which came into wide public use around 1994. It contained 450 pages accompanied by a CD-ROM containing multimedia histories, songs, recorded speeches, oral histories, and primary source materials in print form. The availability of archival materials in edited (i.e., typed) format has elicited criticism (see Darien, 1998) and praise (Saillant, 1994), depending on the audience: historians may have preferred to see the original texts, perhaps side by side with the transcription, while a general audience may appreciate the easy-to-read format and visual accessibility. The textual materials from archives appear in edited form, which demonstrates the role of archives in selecting source materials for the project.

Valley of the Shadow

Among the earliest digital humanities (and historical scholarship) projects to appear on the Web was the Valley of the Shadow project undertaken by Edward L. Ayers at the University of Virginia. The project focuses on Civil War era records, which typically come from two sources: institutional manuscript collections and private collectors. The project was available on CD-ROM as well as in a Web version. Ayers (1999) referred to his project as a “digital archive” and an “interactive ‘album’ that organizes the story in thematic and chronological spaces, providing launching points into the archive and using multimedia as effectively as we can” (para. 6). Such a digital archive is also a critical resource for analysis and interpretation. This “digital archive” has relied extensively on manuscript collections at the University of Virginia, U.S. Army Military History Institute, Carlisle Barracks, Pennsylvania State Archives, National Archives and Records Administration, Virginia Military Institute Archives, Library of Virginia, and several other collections of regional and national significance. The current Web site provides access to transcribed texts, census records, maps, and images via an image map in the form of a floor plan that gives an overview of a physical archive. Each level corresponds to large categories of records such as “The Eve of War,” “The War Years,” and “The Aftermath.” While the arrangement of the archival material here fits the information architecture of the project site, the ontological structure of the archive is present through the indexing and topical arrangement of the material on the site.

Text Encoding Initiative projects

Aside from the continuous misuse of “archive” in the digital humanities context, Text Encoding Initiative (TEI) projects have involved archives and special collections making their collections available for textual analysis and interpretation. Willett (2004) refers to such projects as “textual archives,” as envisioned for the works of Yeats, the Canterbury Tales Project, Project Gutenberg, Women Writers Online, the works of Henrik Ibsen in Oslo, and Isaac Newton’s manuscripts in London.

Binder’s book

McCarthy, Welsh, and Wheale (2012) report on the basic application of TEI tagging in the case of Bodleian Library’s Binder’s Book (BB) at Oxford University, a 150-page record of seventeenth-century book binding orders. The authors considered the potential audience for such a resource to include rare book librarians, collectors, bibliographers, digital humanities scholars, archivists interested in such aspects as provenance, and graduate students interested in TEI encoding. From a digital humanities data curation perspective, this TEI project offers value to the library wishing to curate and share such data with researchers. The authors address the methodological conflict between the requirements of the EAD schema used for finding aids and the TEI used for textual analysis:

Text encoders should act with the awareness that they impose meaning upon texts, rather than merely presenting them for others (McGann, 2001). This can be problematic when the document expressed is technically archival in nature, as the BB. The archive world has developed Encoded Archival Description (EAD) for finding aids; it has its own system for allowing interpretation by the encoder, but the world of textual encoding has no similarly evidential language.

McCarthy, Welsh, & Wheale, 2012, p. 564

Aside from such methodological differences, there have been attempts to establish crosswalks between standards and schemas. The Optimizing Resources for Repositories and Archives working group (METS, 2009) has addressed integrating such metadata standards as TEI, EAD, DDI (Data Documentation Initiative), and Metadata Encoding and Transmission Standard (METS). Reconciling the domain-specific encoding and description standards of the archivist and digital humanities communities of practice will remove barriers and foster new projects.

Thematic research collections

The William Blake Archive (Eaves, Essick, & Viscomi, 2014) has been a freely available site since 1996 with a focus on William Blake’s prints and poems. The illustrations are accessible on separate pages so as to minimize obstructions to linear reading of the text. The intuitive arrangement allows viewers to access the illustrations in a separate frame, although this approach may not fully support assistive equipment and the users relying on it. The digital archive draws its materials from several contributing institutions such as Auckland Art Museum, Bodleian Library, Glasgow University Library Special Collections Department, Library of Congress Rare Book and Special Collections Division, New York Public Library Rare Books Division, and many others. The metadata records accompanying the contributed material contain technical metadata, provenance, and other administrative metadata. The images have been digitized at a resolution of 300 ppi using the TIFF file format for “dark archive” preservation and detailed viewing. Additional storage options include DVD copies as well as raw images kept in storage provided for the archive by the Carolina Digital Library and Archives. This project illustrates that archives no longer just provide data and content for viewers’ and researchers’ sake; they are also involved in the digital preservation of the digital archive’s contents.
The Walt Whitman Archive (Folsom & Price, 2011) began in 1995 with a focus on the life works of Walt Whitman who left behind, in addition to his poetry, notebooks, manuscript fragments, prose essays, correspondence, and journal articles, all of which present important contexts for analysis of his works. The original site also offered a biographical essay by Ed Folsom and Kenneth M. Price and supplementary biographical materials. The current site also provides a timeline. More importantly, researchers can find hypertext and facsimile versions of his works. The archive makes extensive use of TEI methods to reveal editing work in his manuscripts, which allows researchers to follow the changes to his work.
The presence of archival resources is evident: not only does this site offer its own library of TEI-encoded files, audio and image files, notebooks, manuscripts, and translations; it also presents alphabetically arranged finding aids to manuscripts at various repositories holding Walt Whitman’s works. The listing of finding aids identifies various source repositories such as the American Antiquarian Society, British Library, Harvard University (Manuscripts Department, Houghton Library), Huntington Public Library, Library of Congress, Liverpool Central Library, Musée de la Coopération Franco-Américaine, University of Pennsylvania’s Walt Whitman Collection, Walt Whitman House in Camden, and many more. The structure of these integrated finding aids represents the arrangement of the original archives in series with links to digital images, an added benefit of EAD finding aids. Each series-level description identifies the content in the home repository, which enables researchers to search for originals there. Because archival finding aids go through subsequent updates as items and background biographical information are added, one may see the benefit of links to the finding aids at the holding institutions. The Whitman Archive, however, provides no such links, just the institutions’ contact information. These internal finding aids nevertheless lend a solid information architecture to the entire site. In some cases, the Web site of the individual repository may not even have an online-accessible finding aid, so these integrated finding aids are very helpful. A quick comparison of the Whitman Archive finding aid for the Trent Collection of Whitmaniana with the finding aid at Duke University, for instance, reveals differences in detail (as in the Abstract), but the structural correspondence is reliable.
The argument for the integrated finding aid approach is logical, however, given the migration of content from one infrastructure to another: despite some minor differences, the Whitman Archive provides a stability of information that researchers can highly appreciate. The archive demonstrates a close integration of the archival role into this project.

Recent digital humanities projects

Digital Literary Atlas of Ireland

Recent innovative developments in the digital humanities involve the transdisciplinary use of geospatial data to emphasize the spatial, not just the temporal, dimensions of digital humanities, since narratives migrate when people do, which directly contributes to the diffusion of artistic, musical, literary, and other creative genres over extensive geographical spaces. The Digital Literary Atlas of Ireland (Travis, 2010) project at Trinity College Dublin combines biographical data with an interactive timeline and geospatial technology to present Irish literary history and biographical knowledge in a temporal–spatial context. The project focuses on the literary, historical, and cartographic perspectives on Ireland between 1922 and 1949 through the eyes of 14 Irish writers. The user interface presents three access points to biographical information:
1. featured authors’ life paths with biographical narratives and references to sources;
2. a Google Earth timeline outlining the lives of these authors as they moved around, data that has significant bearing on their development as writers during these periods; and
3. a Vimeo presentation of maps, which begins with an overview of the terrain and gradually zooms in toward street level. The view is not quite as close as Google Street View, but a Google Earth plug-in allows a significant level of interactivity for viewers interested in the geographical aspects of humanities.
The significance of these Irish writers in the context of Irish national history and cultural identity can be established through the places they lived in and moved through, which have not only shaped the lives of these authors but may also have had transformative effects on their literary works.

Mapping the Republic of Letters

The Mapping the Republic of Letters (2013) project seeks to visualize the international network of scientific academies through the networks of correspondences and patterns of travels followed in this project. The time frame for this project was from Erasmus to Benjamin Franklin, which spans the development of modern sciences and humanities from the Renaissance to Modernity. Given the geographic and date ranges, the project presents case studies as individual frameworks for data presentation. The results were manageable sets of visualized data for individual writers like Voltaire, Galileo, Locke, and many more. Based on the data that could be generated from the material located in archives and manuscript collections, the next step was to visualize such information as the number of letters sent by Galileo in a given year between 1588 and 1616, and their recipients. The geospatial visualization of Voltaire’s and Locke’s correspondences shows geographical overlap in the two individuals’ scholarly networks although their lives have only had a 10-year overlap. The digital curation and visualization of such data may present surprising facts such as mutual acquaintances of historical personages who did not interact with each other. The network map reveals geographical proximities of famous people whose relationships otherwise might not have been known due to the lack of literary or historical study. This type of visualization provides new avenues for humanities scholars to follow.
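The kind of tabulation described above, counting letters per year and finding correspondents shared by two writers, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records, the names “Writer A” and “Correspondent X,” and the function names are invented placeholders, not data or code from the Mapping the Republic of Letters project or the Electronic Enlightenment database.

```python
from collections import Counter

# Invented placeholder records: (sender, recipient, year).
LETTERS = [
    ("Writer A", "Correspondent X", 1610),
    ("Writer A", "Correspondent Y", 1610),
    ("Writer A", "Correspondent X", 1611),
    ("Writer B", "Correspondent X", 1700),
    ("Writer B", "Correspondent Z", 1701),
]


def letters_per_year(letters, sender, start, end):
    """Count letters sent by `sender` in each year from start to end, inclusive."""
    counts = Counter(
        year for s, _, year in letters if s == sender and start <= year <= end
    )
    # Years without letters are reported as zero so gaps stay visible in a chart.
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def recipients(letters, sender):
    """Return the set of people who received a letter from `sender`."""
    return {r for s, r, _ in letters if s == sender}


def shared_correspondents(letters, first, second):
    """Return recipients common to two senders who may never have met."""
    return recipients(letters, first) & recipients(letters, second)


per_year = letters_per_year(LETTERS, "Writer A", 1609, 1611)
shared = shared_correspondents(LETTERS, "Writer A", "Writer B")
print(per_year)
print(shared)
```

The yearly counts are what a bar chart or timeline would plot, and the set intersection is the simplest form of the “mutual acquaintance” finding; a geospatial view would additionally need a place of origin and destination for each record.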
Although the background information on the site does not present any direct involvement of archivists and librarians, the material for this project included 55,000 records and 6,400 correspondents (“Mapping,” 2013) in the Electronic Enlightenment Project (EEP) database, which is available via subscription and contains nearly 64,000 historical records as of 2013 and a network of over 8,000 historical figures. The EEP is a product of the Bodleian Library at Oxford University with its own archives and manuscript collections, but the project sources include information on almost 60,000 manuscripts and over 100,000 early edition sources.

Archives, ViewShare, and digital humanities data curation

Of all the digital humanities projects, perhaps this model illustrates the fullest involvement of archives in the digital humanities landscape. Archivists and digital curators collaborating on the preservation of content and metadata can extend access to the data in the metadata record in ways that enable researchers to use such data. For instance, visualization tools can use the values in the date fields to generate timelines, and recently there have been efforts to include geospatial data in designated fields to allow such services as ViewShare (n.d.) to generate maps. Viewshare requires either a metadata worksheet or an Internet address (URI) from which it can harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). As the discussion of the Republic of Letters has demonstrated, researchers can use these tools to research hitherto unrelated events and people in nearby places.
Windhager and Mayr (2011) have introduced two visualization models for the navigation of information in space, time, and ontology: geographic and topographical space-time-cubes. These methods facilitate the visual representation of time, space, and ontology in a single three-dimensional space, in which human and other historical relationships unfold over a period of time in specific geographical locations and in relation to various topics. Although these visualization models have been presented in the context of museum exhibitions to orient visitors, their applications may extend to representing information visually in digital repositories, virtual museums, and such visualization tools as ViewShare or EEP. This is an opportunity for archivists and librarians to play a more visible role in humanities data curation since they already have all or most of the required metadata: date, temporal and spatial coverage, subject, names, and provenance.
Many archives have submitted data to Viewshare for data visualization. The History of Fairfax County in Postcards (Trow, 2012) illustrates an ideal use of Viewshare, as it plots the locations of the postcard sites on a map with pins. Clicking on a pin opens a small pop-up window with the thumbnail and description. Viewshare can also generate pie charts, timelines, and galleries, with the images residing on the collection’s host server; Viewshare simply harvests the metadata and makes the visualizations available through code that archivists and technologists can embed on their institutional Web site.
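To make the harvesting step concrete, the sketch below builds an OAI-PMH ListRecords request and pulls the date and coordinates out of a Dublin Core response. It is a hedged illustration rather than a description of how Viewshare works internally: the base address https://example.org/oai, the sample response, and the convention of storing “latitude, longitude” in dc:coverage are all assumptions made for this example. The sample is parsed from a string so that the code runs without a network connection.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Invented response; a real harvester would fetch this from the repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:postcard-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Postcard of a courthouse</dc:title>
          <dc:date>1912</dc:date>
          <dc:coverage>38.85, -77.30</dc:coverage>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:postcard-2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Postcard of a railway station</dc:title>
          <dc:date>1908</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_point(coverage):
    """Read 'lat, lon' from dc:coverage (an assumed local convention)."""
    if not coverage:
        return None
    try:
        lat, lon = (float(part) for part in coverage.split(","))
    except ValueError:
        return None  # free-text coverage such as a place name
    return (lat, lon)


def rows_for_visualization(response_xml):
    """Return one row per record with the fields a timeline or map needs."""
    root = ET.fromstring(response_xml)
    rows = []
    for rec in root.findall(".//oai:record", NS):
        rows.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "date": rec.findtext(".//dc:date", namespaces=NS),
            "point": parse_point(rec.findtext(".//dc:coverage", namespaces=NS)),
        })
    return rows


url = list_records_url("https://example.org/oai")
rows = rows_for_visualization(SAMPLE_RESPONSE)
print(url)
for row in rows:
    print(row)
```

Records with a date can feed a timeline and records with a point can feed a map; the second sample record has no coordinates and would simply be left off the map.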

Digital humanities curation in the classroom

Thus far, this chapter has focused on defining, characterizing, and discursive aspects of the digital humanities and the role of archives, but has not addressed pedagogy in the field. While this book does not focus on pedagogy and learning theories (e.g., constructivism), it places archivists in traditional classrooms and e-learning environments with an important role to fill as teaching partners or even as instructors. The conceptual model in this chapter demonstrates this role as well as the relationship of interpretive layering and metadata enhancement, both of which comprise collaborative curation in the classroom. Panofsky’s approach to iconographical study serves as a framework even though its original context for the methodology was art history and the interpretation of art. Applying some of the methodology to classroom curation of works of cultural, literary, and historical significance may present learning opportunities in current learning environments as well. More importantly, any advanced course may potentially turn into a thematic research collection or some digital archive described earlier in this chapter. Another chapter (on information architecture) will address the critical hypertextual and navigational relationships between the classroom and digital resources.
Recent professional literature notes the lack of coverage on pedagogy in Digital Humanities. Brier (2012) notes that the Digital Humanities Quarterly (at or near the publication of his work) contained 19 (out of the 90 plus) articles related to research and only two to pedagogy. Full-text searches on “research” received 81 hits whereas the figure for “teaching” and “learning” was at least 40 and nine for “pedagogy.” In his calculations, Hirsch (2012) notes that while the Companion to Digital Humanities (published in 2004) contains 504 instances for “research,” instances of “pedagogy” only appear eight times; “teaching” 60 times, “education” 30 times, “teach” 7 times, and so on. These numbers indicate the visibility of such topics in the Digital Humanities research community, and may be of concern to archivists and librarians collaborating educators in the field.
With advanced classroom technologies and learning management systems (such as Blackboard, Moodle, Sakai, or others), educators can integrate digital collections and digital humanities sites into their curricula and specific coursework. The student learning objectives in such courses may include learning about ancient civilizations and Western traditions (as covered in most humanities courses in the United States) to developing digital humanities curricula (for education majors), developing digital humanities projects (for humanities majors), and developing digital archives (for library and archives students). In most cases, online courses include lectures with materials from digital archives, digital libraries, and virtual museums. The role with instructors is to relate that information to the course material while students will analyze and interpret what they select for their assignments. Kapelos and Patrick (2012) report on an architectural course engaging special collections and subject librarians in the process at Ryerson University. The archival materials included architectural photographs, rare texts, and other materials in a physical classroom setting. A curator’s role may range anywhere from selecting collections to support specific coursework to teaching an entire course for humanities and history students in using primary sources (in print, analog, and digital forms). The curation process may involve writing annotations to enhancing the metadata through interpretive layering (Flanders & Muñoz, 2011).
This last section focuses on the relationship among interpretive layering, metadata enhancement, and the application of Erwin Panofsky’s iconological framework (Panofsky, 1962). Although designed specifically for the interpretation of art, Panofsky’s three-level iconological approach is well suited to the study of the humanities, whereby the interpretive layering process begins with the identification of the artifact or document discovered and located through research. Reliable descriptive metadata is vital because it can considerably improve the identification process, which in turn is essential to the selection process and lends a project much-needed focus. This approach enables students to enhance the existing metadata record (supplied by the curator or subject librarian) with missing data and information discovered through research.
At the initial stage, called preiconographical description, “the objects and events whose representation by lines, colors, and volumes constitutes the world of motifs can be identified…on the basis of our practical experience” (Panofsky, 1962, p. 9). In curation terms, every metadata record contains basic data and information that aid the identification of author or artist, title, style, geographic location, language, provenance, time period, cultural context, and other attributes. This metadata must support historiographical or hermeneutical analysis at the subsequent stages.
The next stage (called iconographic analysis) focuses on contexts such as themes, concepts, stories, and allegories, which require greater familiarity with such foundational elements as objects and events. Students begin working with information specific to the artifact and beyond the general information that appears in the initial metadata record, and this stage requires some research of related sources. They can enhance fields like dc.description.abstract or dc.description.notes, which provide adequate space for analytic annotations. These repeatable fields also present space for critical notes by curators and faculty for future reference. Panofsky (1962) ascribes great importance to iconographic analysis, which requires correct preiconographical description.
The last stage (called iconographic interpretation) focuses on the interpretation of the deeper, intrinsic meanings of objects, events, and content, and of their symbolical values. After the three-tiered process, the effects of interpretive layering on the curation process are clearly visible in the metadata records. Curators will notice that annotations from history students differ from those of students in other areas of the humanities, owing to the significant differences in methodologies across the humanities. Historiographical studies will produce different kinds of interpretive annotations from those produced by students in courses that focus on arts and letters and invite more subjective interpretations. As a result, the metadata record will contain progressively deeper and more introspective keywords available for analysis and interpretation in future courses. Figure 2.1 below demonstrates this layering effect in the curation process.
An archivist or curator who selects the course material and provides the metadata can significantly improve the learning experiences (and outcomes) of students, who benefit from reliable metadata and high-quality digital material from a well-curated digital collection. The enhanced information developed and added to a virtual museum or digital collection can support future interpretive and analytic studies in digital humanities coursework and professional research.
Figure 2.1 Interpretive layering on the curation process.
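The layering described in this section can also be sketched in code. The following is a minimal illustration, not a prescribed implementation: it assumes a simplified metadata record in which every field is repeatable, and it appends one annotation per stage to dc.description.notes (the field named in the text). The names MetadataRecord and annotate, the bracketed label format, and the sample record with its placeholder notes are all hypothetical and are not drawn from any particular repository system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The three stages, named as in the text above.
STAGES = (
    "preiconographical description",
    "iconographic analysis",
    "iconographic interpretation",
)


@dataclass
class MetadataRecord:
    """Simplified record: each field name maps to a list of values (repeatable fields)."""
    fields: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, name: str, value: str) -> None:
        # Append rather than overwrite, so earlier layers are preserved.
        self.fields.setdefault(name, []).append(value)


def annotate(record: MetadataRecord, stage: str, contributor: str, note: str,
             target: str = "dc.description.notes") -> None:
    """Add one interpretive layer, labeled with its stage and contributor."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    record.add(target, f"[{stage}; {contributor}] {note}")


# Hypothetical record and placeholder notes, for illustration only.
record = MetadataRecord()
record.add("dc.title", "Untitled photograph (hypothetical)")
annotate(record, STAGES[0], "curator", "placeholder: what is depicted")
annotate(record, STAGES[1], "student", "placeholder: themes and stories")
annotate(record, STAGES[2], "student", "placeholder: intrinsic meaning")

for name, values in record.fields.items():
    for value in values:
        print(f"{name}: {value}")
```

Because each note is appended to a repeatable field and labeled with its stage and contributor, later courses can read the earlier layers alongside their own, which is the cumulative effect the section describes.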

Conclusion

Archives and special collections play a vital role in the digital humanities through participation in digital humanities projects and teaching. As the digital humanities field expands, questions about scholarship and the theoretical nature of projects will emerge and require discussion. The debates in the digital humanities provide a framework for arguing that digital curation activities may be just as theoretical in scope and depth, and the various projects lend validity to such arguments. As digital humanities pedagogy gains ground in higher education, so will digital curation, which provides the critical information students need in their coursework.