Author: Julie L. Holcomb
Title: Preserving Digital Archives, Preserving Cultural Memory
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
November 2000
Availability:

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

Source: Preserving Digital Archives, Preserving Cultural Memory
Julie L. Holcomb


vol. 3, no. 3, November 2000
Article Type: Book Review
URL: http://hdl.handle.net/2027/spo.3310410.0003.320

Preserving Digital Archives, Preserving Cultural Memory

Julie L. Holcomb

  • Gilliland-Swetland, Anne J. Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment. (Council on Library and Information Resources, 2000).
  • Price, Laura and Abby Smith. Managing Cultural Assets from a Business Perspective. (Council on Library and Information Resources, 2000).
  • Smith, Abby. Why Digitize? (Council on Library and Information Resources, 1999).

The Council on Library and Information Resources (CLIR) focuses upon maintaining access to information for the current generation, as well as future generations, of scholars, students, and the general public. CLIR serves as a forum for change, particularly as the organization "brings together experts from around the country and the world and asks them to turn their intelligence to the problems that libraries, archives, and information organizations are facing as they integrate digital resources and services into their well-established print-based environments." Additionally, CLIR serves as the administrative home of the Digital Library Federation, a collaborative group of research libraries and archives. CLIR publishes newsletters, technical reports, and research briefs; the full text of many of its publications is available on the CLIR website (http://www.clir.org) within a few weeks of the print publication date. Leaders in the field of digital libraries and archives author CLIR technical reports. Abby Smith, Laura Price, and Anne Gilliland-Swetland have published extensively about library and archival issues. In Why Digitize? Smith takes an objective look at the advantages and, particularly, the disadvantages of converting traditional analog materials into digital format. Gilliland-Swetland analyzes traditional archival practice and suggests how archivists can make a major contribution to a new paradigm for the design, management, preservation, and use of digital resources. Finally, Smith and Price team up to present a new paradigm of archival and library management that views collections as core institutional assets. Smith and Price's model of management provides a viable solution to funding and protecting the new digital assets librarians and archivists create in their digitization projects.

In her monograph Why Digitize? Smith presents a hypothetical student research project on the presidential election of 1860. The student conducts her research by "looking at digital images of daguerreotypes of the candidates, political campaign posters (a recent innovation of the time), cartoons from contemporary newspapers, abolitionist broadsides and notices of slave auctions, and the manuscript of Lincoln's inaugural address in draft form reflecting several different stages of composition." (8) While the originals are housed in various collections in geographically dispersed repositories, the materials are all available to the student at home because of digital technology. As Smith notes, "Digital technology can . . . make available powerful teaching materials for students who would not otherwise have access to them." (8) Smith's report, however, is not a call for the mass digitization of primary source materials; instead, Smith analyzes the costs and the benefits of converting analog materials into digital format.

The costs and benefits of digital primary source materials can be bifurcated into issues related to creation (including maintenance) and use. Creation issues include the fragility of the storage media, unreliable data, and discarded analog originals. Use issues include divergent digital collection development plans between and among institutions, and overly accessible, decontextualized materials.

Smith begins her report appropriately enough with a definition of digital information and how its inherent nature complicates digital imaging projects. It is important to note, as Smith does, that digitally encoded data is not a faithful representation of its analog counterpart. For example, "when a photograph is digitized for viewing on a computer screen, the original continuous tone image is divided into dots with assigned values that are mapped upon command." (2) Furthermore, digital information is not human-readable; therefore, it requires a machine and human intervention to preserve the digital data. Because of the fragility of the medium on which digital information is stored and the need to keep data "fresh" and encoded in readable file formats, digital information has a life span that is dramatically shorter than that of the poorest-quality paper.
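The point about dots with assigned values can be made concrete with a small sketch. The Python below is only an illustration and is not drawn from Smith's report: `continuous_tone` is a hypothetical stand-in for an analog photograph, and `digitize` is a hypothetical helper that samples it on a grid and rounds each sample to one of a fixed number of levels.

```python
import math
from typing import List


def continuous_tone(x: float, y: float) -> float:
    """Hypothetical analog image: a brightness in [0, 1] defined at every point."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)


def digitize(width: int, height: int, levels: int = 256) -> List[List[int]]:
    """Sample the tone at the center of each dot and round it to an integer level."""
    grid = []
    for row in range(height):
        y = (row + 0.5) / height
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            value = continuous_tone(x, y)
            # Clamp so that a brightness of exactly 1.0 maps to the top level.
            level = max(0, min(levels - 1, int(value * levels)))
            line.append(level)
        grid.append(line)
    return grid


# A 4 x 4 grid with only 4 brightness levels: everything between the
# sample points, and every shade between the levels, is discarded.
pixels = digitize(4, 4, levels=4)
for line in pixels:
    print(line)
```

Whatever the resolution chosen, the result is a finite table of numbers rather than the continuous image, which is why the digital file is a surrogate and not a faithful copy.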

Another problem with digital images is the very plasticity of the technology. Images can be manipulated in infinite ways. For example, a recent television commercial featured a young Fred Astaire dancing with a modern Hoover vacuum cleaner. As Smith notes, it is difficult to ascertain the authenticity of the image or the text unless the user has an original to compare with the digital image. If users are accessing materials at a distance, this is not possible, so they must rely upon the integrity of the institution. This in turn forces institutions, or should force institutions, to look at how best to protect their digitized materials against unethical manipulation in order to retain their users' trust.

Failure to retain originals after imaging is not a new problem. There is a significant difference, however, in destroying originals after microfilming versus doing the same thing after digitizing. Microfilm is an established preservation medium with a known shelf-life and can be read without machine intervention; conversely, digitized images do not have an established shelf-life and require extensive machine and human intervention for continued accessibility. Cultural memory, therefore, is at higher risk if originals are discarded after digitizing because the images (and cultural memory) may be permanently lost if the technology fails.

Coordinated digital collection development is not yet widespread, so the hypothetical student researching the 1860 presidential campaign would encounter numerous obstacles if she conducted her research using only online resources. Not every institution that holds relevant material has the resources to initiate digital imaging projects. Copyright issues also impact what archival materials are digitized. Smith notes a "disproportionate amount of public domain material [available on the Internet], which distorts the nature of the source base for research restricted to the Web." (11) "This skewed representation of created works on the Web will continue for quite some time into the future, and the complications that surround moving image and recorded sound rights means, ironically, that these will be the least accessible resources on the most dynamic information source around." Technology also hampers what is digitized for Internet access. For example, until Optical Character Recognition (OCR), "the post-processing technology that makes scanned text searchable, works well for scripts using non-Latin characters as for those using Latin ones, resources from around the world in vernacular languages will not take their proper place in the scanning queue." (11)

Digitization is access–lots of it, Smith notes. And that may not be as good as librarians and archivists initially thought. First, the digital experience of a document or an image "flattens and decontextualizes the images." (9) Digital images can only be viewed serially, a much different experience than spreading out the documents or the images on the table in the reading room. Secondly, materials are further decontextualized because they are removed from the larger collection of the holding repository. Other materials that may provide insight into the collection being viewed may not be available digitally. Finally, placing sensitive materials on the Internet for the average user is problematic. As Smith points out:

No one has to travel to a library, nor do they have to present proof of their serious research interest in order to gain access to complex, disturbing, and uninterpreted material. On the other hand, if one makes the difficult decision to edit out materials that are readily served in a reading room, but are too powerful to broadcast on the Internet, what does that do to the integrity of a research collection? (10)

Given these problems with the creation and use of digital images along with the high cost and labor-intensive nature of projects to create digital surrogates of analog originals, Smith advises caution when considering these projects. "To convert everything to digital form would be wrong-headed, even if we could do it. The real challenge is how to make those analog materials more accessible using the powerful tool of digital technology, not only through conversion, but also through digital finding aids, and linked databases of search tools." (13)

Digital technology is not only raising new problems for archivists, librarians, scholars, and students; it is also erasing many of the traditional "distinctions between custodians of information and custodians of artifacts," as well as among the various professions within these groups, according to Anne Gilliland-Swetland. (iv) The "rapid development and widespread implementation of networked digital information technology" threatens to overwhelm even the bibliographic practices of the library and information science profession–"the most extensively articulated and widely implemented in information systems"–unless a new paradigm is developed. (v) This new paradigm must, according to Gilliland-Swetland, "adopt, adapt, develop, and shed principles and practices of the constituent information communities as necessary." Further, "such a paradigm must recognize and address the distinct societal roles and missions of different information professions even as boundaries between their practices and collections begin to blur in the digital environment." (v) The archival science perspective, "as a profession that is interested in information as evidence and the ways in which the context, form, and interrelationships among materials help users to identify, trust, interpret, and make relevant decisions about those materials," can make a major contribution to a new paradigm for the design, management, preservation, and use of digital resources. (10)

Gilliland-Swetland's report recounts the history of archival science and the archival perspective. As she notes in her introduction, "this report seeks to explicate the societal role and resulting principles and practices that together form the archival perspective and to identify their historical origins and evolution. It also discusses what the archival perspective offers in addressing issues that arise in the digital information environment." (3)

Beginning with a basic definition of what archives are, Gilliland-Swetland then discusses the societal roles of archives: legal, cultural memory, and the organization, dissemination, and use of recorded information. It is important to understand the societal roles of archives because it is in the fulfillment of these roles that archivists provide the necessary skills and knowledge to contribute to the paradigm Gilliland-Swetland is calling for. For example, in fulfilling their legal function, archivists and their repositories are "generally legally constituted entities responsible for identifying, managing, and preserving the integrity of an institution's official records of long-term value." (5-6) Basic archival principles such as respect des fonds, provenance, and original order "ensure that the intellectual integrity of aggregations of records is maintained and that individual records are always contextualized." (13) This contextual, evidence-based approach of archivists provides an understanding of the ways in which information is created and how records individually and collectively reflect the process of creation, as well as how the creation of the records reflects upon the creator of the records.

After examining the history of archival enterprise, Gilliland-Swetland then persuasively argues the utility of the archival paradigm in the new digital environment. As she reasons, the principles and practices of the archival community demonstrate how that community "constructs information and why this construction needs to be understood and addressed in the digital environment. These principles and practices, independent of the archival construction of information, can also contribute to the management of digital information." (21) Archivists are "making significant contributions to research and development in the digital information environment by using integrity, metadata, knowledge management, risk management, and knowledge preservation." (21) Gilliland-Swetland then discusses each of these latter points in detail, citing particular projects as further evidence for her argument. Integrity, for example, has two primary aspects–"checking and certifying data integrity" and "identifying the intellectual qualities of information that make it authentic." (21) We have "intellectual mechanisms by which we come to trust traditional forms of published information [such as] a consideration of provenance, citation practices, peer review, editorial practices, and an assessment of the intellectual form of the information"; however, in the digital environment, information "may not conform to predictable forms or may not have been through traditional publication processes." (22) Therefore, the new digital environment means "a more complex understanding of information characteristics is required for the intellectual integrity of the information to be understood." (22) Gilliland-Swetland cites two studies–Project Prism at Cornell University and the International Research on Permanent Authentic Records in Electronic Systems (InterPARES) project–that are working on issues of data integrity in the digital environment. InterPARES, in particular, examines the characteristics inherent in "digital information objects" "in order to establish their authenticity and how that authenticity might be maintained over time." (23) The InterPARES project findings may well be relevant to understanding how to establish and maintain the integrity of digital objects in the larger digital information community.
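The first of the two aspects of integrity, "checking and certifying data integrity," is the more mechanical one, and a brief sketch may help readers unfamiliar with it. The Python below is an assumption-laden illustration, not a method described by Gilliland-Swetland, Project Prism, or InterPARES: it assumes a repository records a SHA-256 digest when a file is ingested and recomputes it later. The function names and the sample bytes are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded: str) -> bool:
    """Compare a freshly computed digest with the one recorded at ingest."""
    return fingerprint(data) == recorded


# Hypothetical digital object and the digest stored alongside it.
original = b"Draft of Lincoln's inaugural address, scanned page 1"
recorded = fingerprint(original)

# Any later alteration, however small, produces a different digest.
altered = original.replace(b"page 1", b"page 2")

print(is_unchanged(original, recorded))
print(is_unchanged(altered, recorded))
```

A check of this kind can show only that the bits have not changed since the digest was recorded. It says nothing about the second aspect, the intellectual qualities that make information authentic, which is where the archival perspective the report describes comes in.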

Archivists will always be distinct from librarians, but the digital environment will blur those distinctions significantly. Gilliland-Swetland constructs a persuasive argument for the value of the archival perspective in the digital environment. The only certainty about digital libraries, archives, and materials is uncertainty.

Digitization strains information professionals and their institutions financially, as Smith noted in her monograph. As libraries and archives face increased competition for fewer financial resources, managers of these institutions must learn to apply business standards and practices to their organizations in order to survive. According to Laura Price and Abby Smith, authors of Managing Cultural Assets from a Business Perspective, "the new environment in which all cultural institutions find themselves [is] one in which business increasingly sets standards for operations and accountability." (2) Price and Smith's report is a case study of the application of the business risk model (as derived from accounting practice) to the collections of the Library of Congress. Under the traditional model of collection management, librarians and archivists have focused on the cost of acquiring and maintaining materials for their collections; however, this report presents a new model of management that views collections as core institutional assets which are managed for maximum productivity while controlling risk to their integrity. As assets, the collections are integral to fulfilling the mission (i.e. achieving the business objectives) of the institution. Price and Smith believe the advantage of this model is that it allows librarians and archivists to "express and justify their needs in terms familiar to financial officers and funding organizations–in terms of business risk." (2)

The typical flow of business-risk activities follows four general steps: risk identification, risk analysis, response to risks, and control assessment. Central to each of these steps is the institution's mission statement. The first three steps are dealt with in depth in Price and Smith's case study.

The management of the Library of Congress, as presented in Price and Smith's case study, identified four "salient types of risk" to its collections: an undetected loss of collection items, an inability to serve patrons through accessibility to collection assets, an inability to acquire materials essential to the continued development of the Library, and a failure by management to acquire adequate information to determine whether the Library's objectives are being met (7-8). Once the Library had identified the risks, it derived four corresponding types of "safeguarding controls" to mitigate the aforementioned risks to the collections; as Price and Smith note, controls need to be determined before risk to the collections can be assessed. The four controls identified by Library management were bibliographic ("What do we have?"), inventory ("Where are the items located?"), preservation, and physical security (the latter two both addressing "How can collection items be protected from physical loss or damage due to improper storage and handling?"). (10-11) Control activities and potential risks to collections from weak controls were then identified for each of the four controls.

Once risks and controls were noted, library management then divided the collections by format. Price and Smith note two advantages to segregation by format type: 1) only twenty-five percent of the Library's collections are books and serials; and 2) the degree of risk varies by format type. Therefore, a more efficient, and ultimately more accurate, risk assessment is possible if collections are separated by "major format types that tend to share similar risk." (9) Library management established a common language for this "segregation by risk" by using the names of five precious metals to "describe groups of items in the collections by degree of tolerance for risk." (9) The five categories create a scale of "value." For example, the platinum category includes the Library's most priceless items such as the Gutenberg Bible while, at the opposite end of the spectrum, the copper category includes items the Library does not intend to keep but wants to retain while a decision is being made (this would include items for possible sale and exchange programs).

Library management then conducted a risk assessment of its collections using the four risk types and the four controls. They began in 1997 with the Geography and Map Division and since then have assessed most of the special collections, general collections, and areas of the Library that perform essential activities.

Price worked closely with the Library of Congress in developing the business risk model she describes in her report; and Smith, who is currently director of programs at the Council on Library and Information Resources (CLIR), was associate librarian for library services for the Library of Congress and in that capacity worked on the development of this business risk model. Thus both authors had first-hand knowledge of the case under study in this well-written report. The authors present their case study in a clear, concise, and readable style. They often use tables and illustrations to reinforce key points of their argument for the application of the business risk model in libraries and archives.

The model, as presented in this report, is particularly relevant because of the "new environment"–increased competition for limited funding–libraries and archives find themselves in today. And with the advent of numerous digitization projects by these institutions, the business risk model will provide a viable solution to fund and protect these "new" core institutional assets as well as the "traditional" assets. Furthermore, Price and Smith's model calls for collaboration between and among institutions and staff, partnerships which are necessary for survival in the "new environment." In this model all staff play an important role in assessing and mitigating risks to core institutional assets. As noted by the authors: "the control environment is improved when the organizational culture places a premium on the integrity and competencies of its people and makes each person's responsibilities explicit and a factor in his or her overall performance evaluation." (19) Similarly, IT managers and collections managers "should cultivate relationships that support their complementary tasks" in order to make collections accessible for the long-term. (19) In summary, Price and Smith present a library-business model that could potentially redefine how library and archival managers administer their collections.

The three reports reviewed here demonstrate CLIR's commitment to maintaining access to information for the current generation, as well as future generations, of scholars, students, and the general public. Solutions to the problems caused by the creation of digital information objects are being developed and CLIR is making every effort to be at the vanguard of disseminating that knowledge to information professionals and others interested in the long-term preservation of digital resources, thus preserving our cultural memory.

Review by Julie L. Holcomb