Digital Libraries: A Vision for the 21st Century: A Festschrift in Honor of Wendy Lougee on the Occasion of her Departure from the University of Michigan

The Five Organizational Stages of Digital Preservation


Anne R. Kenney & Nancy Y. McGovern

Introduction

For over a decade, the national media has periodically treated the American public to horror stories about the fate of important documents available only in digital form. Oft-repeated tales chronicle the loss of valuable scientific, government, and business data stored on punch cards or 8-inch floppy disks or written in long-dead computer languages. At the same time, the world is becoming increasingly dependent on digital information. A much-publicized study conducted at the University of California at Berkeley in 2000 estimated that 93% of the world's yearly intellectual output is produced in digital form. [1]

Despite the increasing evidence documenting the fragility and ubiquity of digital content, cultural repositories have been slow to respond to the need to safeguard digital heritage materials. Survey after survey conducted over the past five years provides a bleak picture of institutional readiness and responsiveness. In 1998, Margaret Hedstrom and Sheon Montgomery conducted a survey on behalf of the Research Libraries Group (RLG) on "Digital Preservation Needs and Requirements in RLG Member Institutions." Of the 54 institutions that completed the survey, most reported holding digital collections, and two-thirds of the 30 research libraries included in the survey claimed institutional responsibility for preserving digital materials. Yet 36 of the 54 respondents did not have written policies for digital preservation, and only 8 institutions considered their staff "expert" in digital library activities; 44 ranked their knowledge at the "novice" or "intermediate" level. [2]

These findings are consistent with a second survey, conducted by the Digital Library Federation (DLF) in early 2001. Fourteen of twenty-one institutions reported having no formal preservation policy, although over half reported digital preservation responsibilities for commercial data under lease, and sixteen of the twenty-one reported such responsibilities for library digital materials. Where preservation was concerned, only digitized holdings commanded any real attention, even though other kinds of digital content (e.g., e-journals, e-prints, university records) were considered most at risk. [3]

Cultural repositories below the research university level reveal even less institutional readiness. A 2002 preservation survey of libraries at leading liberal arts colleges, land grant institutions, and mid-sized universities indicated that only 4 of the 68 respondents (6%) had developed a preservation plan for digital resources. [4]

Of all the preservation challenges facing us, none is more pressing than developing workable solutions to digital preservation. In assessing the 1998 RLG survey, Margaret Hedstrom spoke of the gap between current guidance on digital preservation and institutions' capacity to follow through. Certainly, private and government funders have poured money into digital preservation projects. As early as 1991, for example, the National Historical Publications and Records Commission (NHPRC) had adopted a national electronic records research agenda and begun funding significant electronic records projects. [5] An invitational conference to assess the program after five years concluded that while it was "technically possible to create and maintain reliable and authentic records in electronic form," implementing effective programs would require "significant changes in organizational and individual behavior." [6] Even after a decade of considerable investment on the part of NHPRC and the archival community, there has been no significant increase in sustained programs. As a consequence of this record and of limited funding, NHPRC is reevaluating its priorities in this area. [7] Why this lag in institutional uptake? A key to assessing digital preservation is understanding the organizational impediments to digital preservation practice. This issue is the focus of our paper.

In part the answer lies in the fact that most of the attention given to digital preservation has focused on technology as both the root of the problem and the basis for the solution. Although undeniably important, this emphasis has had its downsides. A great deal of energy has gone into advocating one technology over another, most notably in the migration vs. emulation debate. This emphasis has led to a reductionist view in which technology is equated with the solution, which in turn is deferred until some time in the future when the technology has matured. Even when a technology solution is purportedly at hand (DSpace, for example, is being characterized as a "sustainable solution for institutional digital asset services" [8]), technology is not the solution, only part of it.

The focus on technology has mimicked computational methods that reduce things to an on or off status: either you have a solution or you do not. This either/or assessment gives little consideration to the effort required to reach the on stage, to a phased approach for reaching it, or to differences in institutional settings. Nor does it take into account that a partial program at one institution may represent a fully mature program at another. An organization may only ever have to preserve a limited range of formats, or it may expand its capabilities to all formats as it progresses through the stages. It is not surprising, then, that organizations have been left uncertain about how to proceed. Practitioners who have reached more advanced stages of digital preservation development recognize that there is no cookbook solution, but if an organization cannot imagine how to start, that may explain why so few have done so. The practical consequence for cultural repositories has been to defer development of programmatic efforts. Postponing a program because one cannot create a comprehensive one from whole cloth ensures that vital digital resources will be sacrificed in the interim.

The goal of digital preservation is to maintain the ability to display, retrieve, and use digital material in the face of rapidly changing technological and organizational infrastructures. Unfortunately, there is no single best way to do that, nor is there agreement on long-term solutions. Even in the short term, very few institutional programs can serve as models that offer others practical advice on moving forward, and the programs that do exist are not likely to be transferable to a new context. Organizations cannot acquire an out-of-the-box comprehensive digital preservation program that is suited to the organizational context in which the program is located, to the materials to be preserved, and to the existing technological infrastructure. Librarians and archivists must understand their own institutional requirements and capabilities before they can begin to identify which combination of policies, strategies, and tactics is likely to be most effective in meeting their needs.

Insufficient attention has been paid to the institutional context in which digital preservation programs must be developed. What emphasis exists has been placed on constructing models and frameworks (see below), but these presume a fully mature program and do not address how such programs can be phased in. In this paper, we describe five definable stages that cultural repositories pass through on their way to developing a fully mature digital preservation program. We illustrate these stages with our experiences at Cornell University Library (CUL), but we believe that they hold true for other organizational contexts. CUL has been examining these stages based upon its own progress and in relation to the community at large. Each stage is clearly delineated and characterized by key attributes and organizational responses. Some stages may be shortened, and an institution may be further advanced in one aspect than in another, but all stages must be passed through, and in the same sequence. Our premise is that too few institutions are far enough along in these stages to build sustainable digital preservation programs, and that the greatest inhibitor is not technology but organizational readiness. Understanding the stages and benefiting from the experiences of others may enable organizations to move more quickly through the process once they get started, in much the same way that countries that industrialized after WWII were able to capitalize on the experience of those that preceded them.

A recent study by Daniel Greenstein and Suzanne E. Thorin examines three stages of digital library growth and the attributes of each stage, utilizing case studies at six institutions. [9] This type of exploration is helpful in assessing the organizational milieu in which digital preservation programs will be developed. We believe, however, that developing comprehensive and effective digital preservation programs does not necessarily coincide with the development of digital libraries and that duration is not a key factor in assessing progress. In fact, digital preservation is one of the last things to be considered as part of digital libraries themselves, representing the capstone rather than the cornerstone of such efforts. Integrating digital resources and services into mainstream library functions does not automatically ensure that the library fulfills basic requirements of digital preservation. The stages suggest benchmarks for measuring development. Self-assessment by the organization is the surest way to determine the stage to which an organization has progressed.

The Five Organizational Stages

The five stages of organizational response to digital preservation are:

  1. Acknowledge: Understanding that digital preservation is a local concern;
  2. Act: Initiating digital preservation projects;
  3. Consolidate: Seguing from projects to programs;
  4. Institutionalize: Incorporating the larger environment; and
  5. Externalize: Embracing inter-institutional collaboration and dependency.
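The claim that every stage must be passed through in sequence (some shortened, none skipped) lends itself to a simple check. What follows is a minimal Python sketch, not a tool described in this paper: the `Stage` enum mirrors the five stages listed above, while the function name `is_valid_progression` and the rule that an institution may linger in a stage but never skip or fall back are assumptions made for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five organizational stages, in the order the text presents them."""
    ACKNOWLEDGE = 1
    ACT = 2
    CONSOLIDATE = 3
    INSTITUTIONALIZE = 4
    EXTERNALIZE = 5


def is_valid_progression(history: list[Stage]) -> bool:
    """Hypothetical check: a recorded history must start at stage 1 and
    advance at most one stage per step. Lingering (repeating a stage) is
    allowed; skipping a stage is not. Treating a move backward as invalid
    is an assumption of this sketch, not a claim made in the text."""
    if not history or history[0] != Stage.ACKNOWLEDGE:
        return False
    for prev, curr in zip(history, history[1:]):
        if curr - prev not in (0, 1):  # IntEnum subtraction yields an int
            return False
    return True
```

For example, a history of `[Stage.ACKNOWLEDGE, Stage.ACKNOWLEDGE, Stage.ACT, Stage.CONSOLIDATE]` passes, reflecting a long stay in stage 1 like Cornell's, while `[Stage.ACKNOWLEDGE, Stage.CONSOLIDATE]` fails because it skips stage 2.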

The stages of organizational response can be assessed through such factors as the scope of an organization's digital collections and the resources (personnel, technology, and funding) available to identify, manage, and make them accessible. Key indicators of digital preservation responsiveness characterize each of the five stages and will be addressed below. These are generally grouped in the following three categories: policy and planning, technological infrastructure, and content and use.

Stage 1. Acknowledge: Understanding that digital preservation is a local concern

Although most institutions recognize that digital preservation is a growing concern, too few have concluded that the problem is theirs. Acknowledgment usually comes only after an institution has had several years of experience with digital content and services, and a brush with issues surrounding continuing access.

Academic libraries have dramatically increased their experience with and reliance on digital content. The aforementioned DLF survey revealed that 40% of members' costs for digital libraries in 2000 went to commercial content. [10] The big-ticket items were electronic scholarly journals that libraries license rather than own. Although model contracts require content providers to assume responsibility for digital preservation, few institutions appear ready to forgo access to licensed content just because its long-term accessibility might be in question. Even the false sense of security provided by maintaining paper versions of electronic subscriptions is giving way to the economic reality that libraries may not be able to afford both. Increasingly, when a choice has to be made, the paper is sacrificed, as the access advantages of the digital version outweigh concerns over longevity. A sense prevails that although the digital preservation challenge is real, it is someone else's responsibility. Until the library feels "ownership" of digital assets, there is little motivation to worry about their long-term care.

Ownership by itself, however, is insufficient. Most academic institutions have experimented with some type of digitization, and all face an increasingly digital records environment. Yet there remains an unexplained optimism that the future will take care of itself. Early use of the technology is often characterized by exaggerated claims of its benefits. The impulse to embrace things digital is strong, but too often insufficient thought is given to infrastructure requirements (costs, personnel, systems, and policies), and reality falls short of the promise.

Institutions participating in the 2002 preservation survey ranged along a continuum, from those just beginning to appreciate the dimensions of the digital preservation issue to those taking tentative steps to address it. Concerns about becoming involved ran high. Some argued that they were in the access game rather than the preservation game and relied on larger institutions to shoulder that burden. Some felt disenfranchised because they were not at the table in discussions that affect the long-term care of digital content. Others feared becoming solely responsible for developing solutions, given their very limited resources. [11]

Key Indicators for Stage 1:

Policy and planning: the preservation policy is often non-existent or implicit. When it does exist, it tends to be high-level, for example, a statement that the organization acknowledges the need to address digital preservation.

Technological infrastructure: may be non-existent or, if it exists, is likely to be heterogeneous (the result of disparate and spontaneous digitization activities) and decentralized (as opposed to distributed); may consist of stored files and ad hoc access mechanisms.

Content and use: the focus may be reactive to specific collections rather than encompassing the potential scope of materials that need to be preserved. Conversely, there may be a sense that all types of digital resources must be included, even if the organization has no need to address some categories of digital objects.

In stage one, institutions move from dependency on digital content to recognizing there is a problem, to acknowledging institutional self-interest and the need to act.

An early champion of digitization, Cornell University Library spent nearly a decade as a stage one institution. By the late 1980s, Cornell was involved in planning two pioneer projects. The first was the Chemical Online Retrieval Experiment (CORE Project), a joint effort of the American Chemical Society, Bellcore, OCLC, Chemical Abstracts Service, and Mann Library at Cornell University. CORE was one of the first large-scale electronic journal efforts to use the Internet to deliver articles with graphics and complex typography directly to end-users. [12] The second was a co-development effort between CUL and the Xerox Corporation to design and test a scanning system to digitize library materials. [13] Because the image quality this prototype system achieved with 600 dpi, 1-bit capture was stunning, CUL tested the equipment on endangered brittle books. The paper copies produced from the digital files were considered equal or superior to the film and paper versions produced via microfilming and photocopying. After a year's worth of experimentation, Cornell staff hailed the process as a breakthrough in preservation reformatting. The final report of this project covered digital capture, the production of paper facsimiles, and network access, but was unfortunately titled the Joint Study in Digital Preservation, inadvertently contributing to a continuing tendency on the part of cultural institutions to equate digitization with digital preservation. These were heady days when both enthusiasm and rhetoric flourished.

Although Cornell used digitization in the name of preservation, it did not immediately commit to the long-term maintenance of the digital files themselves. By arguing that the printed facsimiles produced from the digital files were the end product, Cornell staff eased senior administrators' concerns about ongoing obligations after the co-development phase. Further, the Cornell/Xerox Project occurred outside mainstream library activities, in what Greenstein and Thorin characterize as a "skunk works," so its success or failure did not directly affect library production operations. [14] In hindsight, if we had assumed full preservation responsibility from the beginning, the project might not have gone forward. It was only with the advent of the Web and the resulting use of the networked digital files that Cornell came to think of them as institutional assets. It took several more years, and a heroic recovery from a failed optical jukebox system and an obsolete file format that required the support of non-project staff, for Cornell to realize that it had swapped one kind of preservation problem (brittle books) for another, and that the shift had not diminished the preservation issue but increased it. [15]

Stage 2. Act: Initiating digital preservation projects

Acknowledging digital preservation as an organizational concern leads quite naturally to action. The organization need not begin to build a digital archive to demonstrate its commitment. In stage 2, an organization will initiate discrete activities to address the most pressing digital preservation needs.

The motivation to move from stage 1 to stage 2 occurs when an organization perceives the need to take action to preserve its digital assets. Stage 2 activities are project-based, often funded by external or one-time monies, and the work tends to be conducted outside mainstream library functions. Although they specifically address long-term issues, stage 2 efforts tend to be of limited duration and developed as one-time fixes to an ongoing problem. They also tend to be exploratory and educational in nature, are often prompted by a very specific concern or a funding opportunity, and are limited in scope. For example, projects may focus on a specific type of object or set of materials, on a particular lifecycle function (e.g., storage or access), on a particular management technique (e.g., mass migration), or on a set of tools to support those activities. If more than one project is undertaken, they are frequently seen as discrete efforts. This phase is usually the shortest, as the inadequacy of the approach becomes apparent when projects end. Nonetheless, it is a very important phase through which institutions must pass, acquiring necessary skills and experience in applying concepts and bridging the gap between digital preservation research and actual implementations. It is also a sobering phase, during which the inadequacies of previous practices become most apparent.

Key Indicators for Stage 2:

Policy and planning: the preservation policy may remain implicit in stage 2 or may be expressed in general terms, though evidence that the organization is committing to digital preservation accumulates.

Technological infrastructure: the organization may stipulate a set of technical requirements that apply to each project, or, more likely, will devise technical requirements that are project-specific and reactive. Digital content may be dispersed across multiple servers in multiple locations or be co-located using available equipment, depending on the size of the projects, the level of technology support obtained for the project, and the nature of technology support within the organization. Cross-project technology planning is less likely to occur at stage 2 than at later stages.

Content and use: efforts may go deep into addressing the range of requirements for selected types of digital materials or collections, or address some or all collections in basic ways.

By the late 1990s, Cornell University Library had assembled an impressive mass of digitized materials, but it was not managing them in a manner that would ensure their long-term viability. The library entered stage 2 in 1997, after responding in an ad hoc manner to a succession of problems associated with keeping its digital image files reliably accessible. Addressing these problems identified the need for focused projects. Characteristically, staff turned an obstacle into an opportunity by writing a grant proposal to develop a digital preservation strategy for the library's digital image collections, which the Institute of Museum and Library Services (IMLS) funded in 1998.

The project goals were to:

  • Investigate emerging file formats for long-term utility;
  • Develop and apply requisite metadata to support long-term management;
  • Determine functional requirements for storage; and
  • Place the master files at the heart of the CUL digital library.

One of the first tasks was to develop an inventory of all the digital imaging collections that had been created over the past decade. This process revealed inconsistencies in practice from one effort to the next, and underscored the importance of adequate documentation while revealing the problems and expense associated with recreating it after the fact. The project began as one to address Cornell's retrospective digital holdings; by project's end, the staff recommended that guidelines and mainstreamed responsibility be implemented for prospective initiatives as well.

A second major aspect of stage 2 activities at CUL was the shift in research focus from digital imaging to digital preservation. In 1999, CUL partnered with Cornell's Computer Science Department in a four-year effort to investigate and develop the policies and mechanisms needed for information integrity for Web-accessible resources. The National Science Foundation (NSF) funded Project PRISM as one of the Digital Library Initiative 2 (DLI-2) projects. Although the focus was on fundamental research rather than immediate practical application, the project was significant in that it addressed the preservation of digital resources that cultural repositories have come to depend on but which they neither own nor control. [16] The ultimate goal of this research is the development of a risk management tool kit that cultural repositories and other entities can use to monitor and develop retention policies for online resources.

Each of these projects has helped shape CUL's appreciation of the range of issues confronting cultural repositories grappling with digital preservation. Together they have also underscored the inadequacy of responses that are time-bound by project funding and that essentially live outside the organizational infrastructure.

Stage 3. Consolidate: Seguing from projects to programs

After some experience with parallel or sequential digital preservation projects, the organization generally concludes that the innate project lifecycle (most notably the need for an injection of funding to get started and an inevitable end of some kind) is not compatible with long-term planning and does not lead to the establishment of a program. At this stage, digital preservation activities become ongoing and increasingly coordinated, but are not yet truly integrated. Reaching stage 3 demonstrates some level of organizational commitment to developing a digital preservation program, but each organization will eventually need to decide whether to build, buy, or outsource its digital archives. While the organization's focus at this stage is on creating safe places for its digital resources, e.g., well-managed repositories and file storage systems, it may already be looking ahead to more systematic digital archiving solutions.

The motivation for moving from stage 2 to 3 is a realization that project-based funding is inadequate and unstable, and that a reliable, sustainable source of funding is needed to maximize the benefits of the work, to build on developments from projects, and to undertake long-term initiatives. This realization may occur as a result of a real or potential gap between projects. It often coincides with an acknowledgement that maintaining disparate solutions is not sustainable and must give way to a coordinated and integrated approach.

This stage is also characterized by a realization that something can be done now, even as we wait for the big picture to emerge in full detail. The primary focus is the development of realizable and effective short-term digital preservation programs based on workable solutions. The organization moves beyond theoretical constructs to implementation strategies grounded in practical solutions, emphasizing short-term programs for managing risk as research and development continue in assessing long-term solutions.

From this stage on, the funding stream will increasingly be established through the commitment of resources and the reallocation of funding from traditional library operations. The key elements are the recognition that digital preservation requires permanent funding, and that the creation of new digital resources carries an associated preservation surcharge. These costs must be factored into decision making. This shift in thinking may also lead to subsidizing the cost of creating good digital masters. A program mentality understands that investing in well-formed digital objects at creation makes those objects easier to repurpose downstream; projects generally cannot afford the time or cost of that approach. The underlying principle for programs is that digital preservation is not a one-time investment but an ongoing commitment. Organizations cannot "skip a year" in the same way they can when acquiring or digitizing material.
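The point that preservation is an ongoing commitment rather than a one-time investment can be made concrete with simple arithmetic. In the hypothetical sketch below, the function name, the flat per-item annual cost, and the growth figures are invented for illustration and do not come from Cornell's experience; the only idea modeled is that each year's bill covers everything created so far, not just the year's new items.

```python
from itertools import accumulate


def annual_preservation_costs(items_added_per_year: list[int],
                              cost_per_item_per_year: float) -> list[float]:
    """Hypothetical model: every item created so far must be kept under
    preservation each year, so the yearly cost tracks cumulative holdings.
    Assumes a flat cost per item per year, which real programs will not have."""
    holdings = accumulate(items_added_per_year)  # running total of items held
    return [n * cost_per_item_per_year for n in holdings]
```

With `[100, 100, 100]` items added and a cost of 2.0 per item per year, the yearly bills are 200, 400, and 600. Skipping creation in the middle year (`[100, 0, 100]`) yields 200, 200, and 400: pausing acquisition or digitization slows the growth of the obligation, but the cost of holdings already created never drops to zero.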

Key Indicators for Stage 3:

Policy and planning: the organization makes explicit its commitment to digital preservation by developing basic, essential policies and by understanding the value of policies as part of the solution; the need to address access issues may drive the development of policies that enable preservation.

Technological infrastructure: at minimum, there is some assessment of the current technology investment, and there may be more systematic efforts to plan for or address the organization's requisite technological infrastructure. Developments at this stage may depend on the extent to which projects have been centralized or dispersed; the extent to which the targeted digital resources are homogeneous or heterogeneous; and the scope, size, and complexity of the organization's digital assets. There is a greater tendency to plan for growth.

Content and use: as with technological infrastructure, there is a tendency to assess the preservation-readiness of current collections, and to define ongoing requirements for building and maintaining collections and resources.

The theme of consolidation of resources and energy comes across very strongly in the key indicators for this stage. There is a broadening of the digital preservation scope from the project level towards an institutional one, a process that eventually leads to the identification of redundancies, the reduction of inefficiencies, and the establishment of priorities.

At Cornell, some representative examples of this stage of development include a commitment to several ongoing initiatives for which each repository's users and partners have an expectation of continuing access:

  • USDA Economics and Statistics System (http://usda.mannlib.cornell.edu/): The Albert R. Mann Library at Cornell [17] developed a working relationship with the United States Department of Agriculture to provide ongoing access to this important resource. The program's definition of roles and responsibilities focuses explicitly on who will support use of the materials; the commitment to continued access is implicit. The Web site presents a seamless interface to heterogeneous materials with varying preservation needs. The content is a collection of resources from multiple creating sources, stored in a range of formats. Specific files may ultimately be preserved by more than one digital preservation program, but Cornell has assumed responsibility for ensuring ongoing access to this incarnation of the materials.
  • arXiv (http://cul.arxiv.org/): This preprint repository represents an early and innovative technology development that became a sustainable resource. The repository relocated to Cornell in 2001. Ongoing access is enabled by the joint support of the Library and the Computer Science Department. [18] Adapting an existing implementation to meet digital preservation requirements is generally difficult, and more so for one that has such a familiar interface to its users, such a high profile within a protective domain, and such a rapid rate of growth. There will very likely be a dramatic and potentially jarring transition at some point to align this digital resource with CUL's digital preservation program.
  • Project Euclid (http://projecteuclid.org/Dienst/UI/1.0/Home): [19] This is a partnership of independent publishers of journals in theoretical and applied mathematics and statistics, addressing the needs of low-cost independent and society journals, a vital source for this field. The project's site makes its commitment explicit: "Full-text searching, reference linking, interoperability through the Open Archives Initiative, and long-term retention of data are all important components of the project." This repository, like the other examples, will have to be supplemented to comply with digital preservation requirements, a transition that may require extensive resources exceeding the project's cost-recovery mechanisms.

These initiatives demonstrate an explicit commitment to providing long-term access to digital resources on a programmatic level. Initiatives such as these are increasingly being coordinated across Cornell University Library, but they are not integrated and at present have no comprehensive digital preservation program to sustain them. Attempting to maintain disparate depositories can quickly deplete an organization's already limited resources for technical development, a situation with which Cornell has struggled. Packaging the digital preservation management of these and other resources offers potential economies that provide a push towards stage 4.

Stage 4. Institutionalize: Incorporating the larger environment

Bringing all of the pieces together across the institution allows for the best use of inevitably scarce human, technical, and financial resources and is the final internal step for the organization. Institutionalizing policies, procedures, and techniques creates a robust program that can be rationally managed and scaled, as needs demand.

The motivation for moving from stage 3 to 4 is the desire to maximize the effectiveness of resources through organization-wide efforts. The shift may be driven by the need to realize economies of scale through central or common, as opposed to individual, digital depository implementations. Organizations may linger in stage 3 until a critical mass builds and the organization feels the need to move to the next stage, or the assessment that is inherent in consolidation may spur the organization on to develop comprehensive, enterprise-wide digital preservation programs. A driver for moving to stage 4 may be the increasingly heavy burden of managing large, heterogeneous collections that are hosted at multiple points within the institution and that require complex and costly technical infrastructures.

Recently, the cumulative work of the digital preservation community at large has produced a set of foundation documents (Table 1). Taken together, this body of work provides a comprehensive overview of the parts and levels of a digital preservation program that can be invaluable to an organization at this stage of development. An organization will undertake activities at each level of the hierarchy at different times in its development. Associating the appropriate frameworks and standards with each level maximizes use of community developments and clarifies the development process for the organization.

Table 1. Foundation Documents Hierarchy

Organizational need, with foundation document example (latest version)

  • Comprehensive organizational requirements
    Trusted Digital Repositories: Attributes and Responsibilities, RLG-OCLC, 2002, http://www.rlg.org/longterm/repositories.pdf
  • Digital archive requirements and design
    Open Archival Information System (OAIS) Reference Model, Blue Book, CCSDS, 2002, http://www.classic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf
  • Digital archive deposit requirements
    Producer-Archive Interface Methodology Abstract Standard, CCSDS, 2002, http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/CCSDS_651_W2.pdf
  • Object-level digital preservation requirements
    Preservation Metadata and the OAIS Information Model, OCLC/RLG PMWG, 2002, http://www.oclc.org/research/pmwg/
  • File format issues
    Draft Standard: Data Dictionary: Technical Metadata for Digital Still Images, NISO, 2002, http://www.niso.org/standards/resources/Z39_87_trial_use.pdf

In establishing an institution-wide digital preservation program, the organization can assess its efforts by mapping its achievements to each of these documents. The set of documents can be used as an organizational metric for measuring and planning for progress. Cornell is in the process of moving from stage 3 to stage 4, and has been actively using these documents to organize and plan its digital preservation program. The Trusted Digital Repositories (TDR) and OAIS examples below are the result of that work.

Trusted Digital Repositories

Perhaps the most immediately valuable contribution of the Trusted Digital Repository report is the framework of TDR attributes. The six attributes of the TDR framework are: administrative responsibility, organizational viability, financial sustainability, technological and procedural suitability, system security, and procedural accountability. The report defines the characteristics of each attribute; together these characteristics address core legal, economic, technical, and other organizational issues, and they break what is often presented as a monolithic digital preservation problem into manageable parts. A notable feature is that technology is neither the central focus nor the first consideration in the framework. Starting with technology, rather than treating technological requirements as one important consideration within a specific organizational context, has been a major impediment to organizations starting or developing digital preservation programs.

In mapping CUL's list of key digital preservation requirements, which were identified during phase 1 of our depository development, to the Trusted Digital Repositories framework, [20] we discovered that:

  • most of the defining characteristics of the RLG-OCLC framework attributes implicitly or explicitly reference other framework attributes and so identify interdependencies; and
  • the attributes of the RLG-OCLC framework are neither hierarchical (e.g., procedural accountability cuts across and underlies all of the components) nor equal in weight (e.g., system security is no less important than financial sustainability, but financial sustainability is a general requirement for all depositories within an organization and should be in place before system security is developed, whereas system security is addressed during development and is specific to each depository implementation).
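The mapping exercise described above can be approximated in a few lines of code. The sketch below is a hypothetical illustration, not the method Cornell used: it assumes each local requirement is tagged with the set of TDR attributes it touches, then reports which attributes no requirement addresses and which attribute pairs co-occur within a single requirement. Co-occurrence is only a rough proxy for the interdependencies noted above. The requirement names and their tags are invented for illustration; only the six attribute names come from the TDR report.

```python
from collections import Counter
from itertools import combinations

# The six attributes of the RLG-OCLC Trusted Digital Repositories framework.
TDR_ATTRIBUTES = (
    "administrative responsibility",
    "organizational viability",
    "financial sustainability",
    "technological and procedural suitability",
    "system security",
    "procedural accountability",
)


def map_requirements(requirements):
    """Map local requirements onto the TDR attributes.

    `requirements` maps each requirement to the set of attributes it touches.
    Returns the attributes no requirement addresses, plus a Counter of
    attribute pairs that appear together in one requirement (candidate
    interdependencies)."""
    covered = set()
    pairs = Counter()
    for req, attrs in requirements.items():
        unknown = set(attrs) - set(TDR_ATTRIBUTES)
        if unknown:
            raise ValueError(f"{req!r} cites unknown attributes: {sorted(unknown)}")
        covered |= set(attrs)
        # Order each pair by framework position so (a, b) and (b, a) count once.
        ordered = sorted(attrs, key=TDR_ATTRIBUTES.index)
        pairs.update(combinations(ordered, 2))
    uncovered = [a for a in TDR_ATTRIBUTES if a not in covered]
    return uncovered, pairs


# Hypothetical requirements, each tagged with the attributes it touches.
requirements = {
    "core funding for depository operations":
        {"financial sustainability", "organizational viability"},
    "documented deposit procedures":
        {"technological and procedural suitability", "procedural accountability"},
    "access controls and audit logs":
        {"system security", "procedural accountability"},
}

uncovered, pairs = map_requirements(requirements)
print("Unaddressed:", uncovered)
for (a, b), n in sorted(pairs.items()):
    print(f"{a} <-> {b}: {n}")
```

With this invented list, administrative responsibility is the one attribute left unaddressed, which is the kind of gap the mapping is meant to surface before development begins.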

The model (Figure 1) that we developed based upon our mapping demonstrates the interdependencies between the attributes of the framework. Our addition of the digital archives border to the model allows for the possibility that the organization may maintain more than one digital archive, or that multiple organizations may collaboratively maintain one digital archive. The model and the attributes allow the organization to structure its activities and to group its available expertise appropriately to address each requisite attribute.

Figure 1: Trusted Digital Repositories (TDR) Framework Model

The OAIS Reference Model

The TDR report makes clear from its opening line that, first and foremost, the repository must be OAIS-compliant. The Open Archival Information System (OAIS) reference model, the dominant model for designing a digital archive implementation, has become familiar to most digital preservation practitioners. "An OAIS is an archive, consisting of an organization of people and systems, that has accepted responsibility to preserve information and make it available for a Designated Community." [21] The OAIS reference model is an ISO standard that has been widely embraced by organizations working on digital preservation programs and digital archive implementations. The model documents the entities and functions within a digital archive. It does not, and is not intended to, address any external relations associated with a digital archive.

Figure 2: The OAIS Reference Model [from CCSDS-650.0-B-1.pdf]

Each organization can and should at some point map its digital preservation development activities to the OAIS model (Figure 2). A gap analysis enables the organization to prioritize its needs and to develop or incorporate missing pieces as needed. Some organizations may develop a digital archive as one cohesive initiative, but more often development is done in stages based upon funding and priorities. The OAIS model enables modular development. Regardless of whether the organization builds, buys, or outsources its digital archive, fundamental questions must be addressed about the creation (SIP), deposit (Ingest), archival storage (AIP), and use (Access, DIP) of the digital objects the organization will preserve. The OAIS model is a useful tool for doing so.
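A gap analysis of this kind can be kept as a simple inventory. The following is a minimal sketch under stated assumptions, not a description of Cornell's process: the six OAIS functional entities are taken from the reference model, while the three-level status scale, the example activities, and the priority ranking are hypothetical. Each local activity is recorded against the entity it contributes to, the most advanced status per entity is kept, and the entities not yet fully implemented are listed in the organization's own priority order.

```python
from dataclasses import dataclass
from enum import Enum

# The six functional entities of the OAIS reference model (CCSDS 650.0-B-1).
OAIS_ENTITIES = (
    "Ingest",
    "Archival Storage",
    "Data Management",
    "Administration",
    "Preservation Planning",
    "Access",
)


class Status(Enum):
    """Hypothetical three-level scale for recording local progress."""
    MISSING = 0
    PLACEHOLDER = 1   # planned, or a stub exists for later work to build on
    IMPLEMENTED = 2


@dataclass
class Activity:
    name: str      # a local project or service (illustrative)
    entity: str    # the OAIS functional entity it contributes to
    status: Status


def gap_analysis(activities, priorities):
    """Return (entity, best status) for every OAIS entity not yet implemented,
    ordered by the organization's own ranking (lower number = more urgent)."""
    best = {entity: Status.MISSING for entity in OAIS_ENTITIES}
    for act in activities:
        if act.entity not in best:
            raise ValueError(f"unknown OAIS entity: {act.entity}")
        # Keep the most advanced status reported for each entity.
        if act.status.value > best[act.entity].value:
            best[act.entity] = act.status
    gaps = [(e, s) for e, s in best.items() if s is not Status.IMPLEMENTED]
    # Entities without an explicit rank sort last; sorted() is stable, so ties
    # keep the order of OAIS_ENTITIES.
    return sorted(gaps, key=lambda pair: priorities.get(pair[0], len(OAIS_ENTITIES)))


# Hypothetical inventory drawn from separate projects.
inventory = [
    Activity("image capture and deposit workflow", "Ingest", Status.IMPLEMENTED),
    Activity("server-based master file storage", "Archival Storage", Status.IMPLEMENTED),
    Activity("technical metadata database", "Data Management", Status.PLACEHOLDER),
]
priorities = {"Preservation Planning": 0, "Data Management": 1}

for entity, status in gap_analysis(inventory, priorities):
    print(f"{entity}: {status.name}")
```

Run as written, it lists Preservation Planning (missing) first, then Data Management (placeholder only), then Administration and Access. Keeping the status per entity rather than per project reflects the modular development the OAIS model permits: several separate projects can each contribute part of one functional entity.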

TDR mapped to OAIS

The TDR primarily focuses on the administrative context for building and managing a digital repository; the OAIS focuses on functions and processes. In relying on these two foundation documents, it was important to understand how they relate to one another, so we mapped the OAIS model to the TDR framework. The result is a more comprehensive model for digital preservation planning and development (Figure 3) that sets each digital archive within its full organizational context and also allows the organization to consider organizational, legal, economic, technical, and implementation issues individually or in relevant combinations.

Figure 3: Joining the TDR and OAIS Models completes the picture

Institutionalizing the program is characterized by systematic monitoring of developments in the wider world, continual mapping of organizational developments to emerging models and standards, and proactive coordination of efforts across the organization. During this stage there is increasing evidence within the organization that the program is an integral part of operations, not an ephemeral or separate extension of normal operations. Institutionalization encompasses the organization's explicit acceptance of responsibility for, and commitment to, a digital preservation program that is comprehensive for the organization; ongoing, institution-wide planning to establish that program; the development of fundamental policies and guidelines; and the allocation of core funding to the program.

Key Indicators for Stage 4:

Policy and planning: organization-wide entities that coordinate, authorize, and mandate digital preservation programs may be established, or some equivalent mechanism that allows for consistent and systematic management rather than event-based responses; a comprehensive policy framework provides the focus for planning efforts. As outlined and populated, the framework will address, in some way, all six attributes identified in the RLG-OCLC report Trusted Digital Repositories: Attributes and Responsibilities.

Technological infrastructure: true technology planning and management generally begins at this stage, characterized by anticipating and responding to needs rather than reacting to them; the infrastructure may be distributed rather than centralized, but investments in infrastructure are more likely to be based upon requirements that are defined and approved at a high level of management and implemented across the organization.

Content and use: rather than presuming that all digital materials will be preserved as part of the organization's commitment to digital preservation, the organization understands the implications of that commitment more fully, and acceptance criteria are established and used to determine the scope of collections that it will actively preserve. Services to capture, store, maintain, and provide access to digital resources become integral to the organization and subject to relevant monitoring and measurement, and expectations that these services will be reliable and consistent become evident.

The majority of digital preservation activity at Cornell is focused on moving from stage 3 to stage 4. One prominent manifestation of stage 4 is the inclusion of a centralized depository among CUL's ten goals and objectives for the coming year. [22] At this stage, CUL has mainstreamed development of a common depository, building on the foundation laid during the IMLS-funded project to develop a preservation strategy for Cornell's digital image collections (a stage 2 effort). The recent establishment of the Common Depository System (CDS) initiative and the appointment of the first Digital Preservation Officer (DPO), [23] a library-wide position devoted to developing and promulgating the requisite policy framework, reflect the establishment of a focal point for building a comprehensive digital preservation program.

The CDS is a collaborative initiative that brings together the DPO; Digital Libraries and Information Technology (D-LIT), the unit in CUL that has central responsibility for the technology infrastructure; metadata specialists from Central Technical Services (CTS) and elsewhere; and the widely distributed owners and inheritors of the digital collections at Cornell. Shifting from a centralized to a common depository is more than just a change of title; it signals the broadening of the initiative's scope to include not just image collections, but all types of digital objects that CUL may elect to preserve. It also signals the realization that, while depository implementations may not be physically centralized, common practices and requirements are needed to realize economies of scale and to maximize investments in digital preservation. [24] CDS embraces the set of foundation documents itemized in Table 1.

One important step in planning for the CDS was mapping CUL efforts to date to the OAIS model (Figure 4). This led to the realization that, within separate projects and developments, we had already developed the majority of the OAIS components, or at a minimum established placeholders on which further development work could build. By combining efforts, we can make greater progress.

Figure 4: CUL activities mapped to the OAIS model.

To further the development of our program, we are continually scanning the digital preservation community for ongoing developments and relevant practice. Through editorship of RLG DigiNews and our combined professional involvement, we make an effort to be part of the broader community, both contributing to and learning from it.

Stage 5. Externalize: Embracing inter-institutional collaboration and dependencies

Inter-institutional collaboration at stage 5 may take the form of a consortium to build a digital archive, a federation of individual digital archives, or a virtual organization that comes together to manage one or more digital archives. Economies of scale, shared responsibility for infrastructure upkeep, and pooled expertise are all possibilities. At this stage, the organization moves from discrete safe places, as established in stage 3 and integrated at the institutional level in stage 4, to integrated safe places that bring multiple organizations, partners, and digital archive implementations together.

Participation in subject-based, thematic, or domain-oriented depositories that cut across institutional lines may provide an impetus for moving from stage 4 to 5.

Such a union of resources creates the potential and the opportunity for a layer of services on top of the repositories that will be available to all of the members and may realize significant and unexpected benefits for the participants. Ideally, this kind of success would both ensure the retention of existing partners and attract additional participants. The theme at this stage is that the whole can be greater than the sum of its parts. Collaboration at this level will eventually produce a matrix (Figure 5): an interlocking collective of fully implemented digital preservation programs and digital archive implementations. [25]

Figure 5: Digital Archive Repository in Stage 5

One aspect of the matrix is the potential for multiple instances of objects and collections to be managed in one or more archival contexts, e.g., subject-based, publisher-based, author-based, and domain-based repositories. The matrix may not be fully achievable today, but there are steps that will move us in that direction. The World Wide Web is itself a matrix. [26] The Web has had a huge impact on the way in which organizations operate and interact, on the formats of digital materials, and on information services and delivery mechanisms. All of these areas influence the development of digital preservation programs. The Web is the essential element that makes stage 5 collaboration possible, even necessary.

For initiatives at this level to succeed, participating organizations must be prepared for the implications of such entwined collaborations and the resulting interdependencies. It is particularly important that participating organizations have not only considered, but also explicitly and tangibly addressed, all of the attributes of a trusted digital repository. Doing so will make the necessary integration of roles, responsibilities, and component parts across institutions more readily achievable.

There is a growing number of examples of activity at this stage, e.g., the Nordic Archive; the intended Web archiving collaboration between the Internet Archive and national libraries in the U.S. and Europe; the LOCKSS Project; and the California Digital Library. [27] Though promising, these digital preservation collaboratives have yet to fully meet all of the requirements specified by the TDR framework and the OAIS model. Beyond those requirements are considerations associated with the "glue" between such initiatives, envisioned as services for such things as security alerts, repositories of aging software, migration programs for upgrading obsolete formats, and the like.

Organizations at this stage will also realize that creating safe places for digital resources may not always be possible or desirable, or at least may not always be the first line of defense. Web resources, for example, are a new, rapidly evolving, and largely uncharted area of digital preservation. Most Web preservation initiatives presume that the goal is to capture and preserve targeted resources, an all-or-nothing approach. In practice, the array of Web-based digital assets that interest an organization may range from resources that the organization owns or controls to those it has no authority to capture or preserve, and from resources that are essential to new and existing collecting areas to those that are only loosely tied to the organization's collection scope. Cornell is continuing its Prism research effort, begun as a stage 2 initiative, to build a risk management approach for Web resources that will enable an organization to specify the value of targeted resources and the level of control it can or will exert over them. The tools supporting the program will enable the organization to tailor its Web monitoring program to suit its evolving requirements. [28]

Key Indicators for Stage 5:

Policy and planning: inter-institutional planning requires a fundamental level of cooperation that presumes a solid base within individual organizations upon which to build. The group of organizations coming together at this stage forms a virtual organization that must be capable of cohesive management. Responsibilities may be centralized or distributed, replicated in each member organization or provided through specialized assignments and modular development by each member, and managed in a micro or macro style, but the roles and responsibilities of the members must be explicit, accepted, current, feasible, effective, and coherent for such an amalgamation to succeed. Collaborative work may be undertaken at other stages, but at this stage it becomes an inherent feature of the program and is presumed in resource planning as development progresses.

Technological infrastructure: like policy and planning, the requisite technology may reside at one member organization, be replicated at each, or have modules that come together to form the whole; responsibilities, performance, and maintenance may be major factors for the group to address.

Content and use: collection development in a shared environment may be an activity at the group level (e.g., a common set of collections that is vetted and adheres to an agreed-upon set of criteria), at the individual member level (e.g., individually created or selected collections that are presented as a product of the member organization but managed in common), or at some point along the continuum from individual to group. The universal requirement is that the roles, responsibilities, and rules governing the collaborative's commitment to preserving specified digital resources must be known and accepted.

Cornell is making the transition from stage 3 to stage 4, but there are already indicators of the progression to stage 5:

  • Our work on virtual remote control for Web resources will contribute to the development of stage 5 initiatives;
  • The Political Communications Web Archiving project, in which we are participating, is exploring the requirements and best implementation for a collaborative Web archiving consortium for political communications; [29]
  • The range of digital math library projects at Cornell is part of increasingly global efforts to meet the information requirements of the math domain. In particular, EMANI is an example of a potential stage 5 collaboration. [30]

Conclusion

Organizational response and readiness will be key to addressing the digital preservation conundrum. By understanding that organizations must pass through various stages on their way to implementing a fully mature program, and that some effort is better than none, institutions can begin to develop organizational responses that are appropriate to their various stages. Until an institution becomes aware of the value of its digital assets and the risks attending them, there is little incentive to make digital preservation a priority (stage 1). Until an institution experiments with projects designed to address digital preservation, there is insufficient appreciation for the limits of one-time fixes (stage 2). Until an institution amasses a varied array of formats or assumes archival responsibility for several different collections (stage 3), there is no need to consolidate programs (stage 4). And until an institution appreciates that its interest in protecting digital assets extends beyond what it can achieve individually, it will not embrace inter-institutional dependencies (stage 5). The value of such self-assessment is that it enables an institution to judge its response to the digital preservation challenge based on its own particular circumstances and needs, and to understand that the organizational context, not the state of technology, may be the key determinant of success.

This paper presents our preliminary attempt to identify and describe digital preservation stages from an organizational perspective. The Cornell examples are provided as illustrative cases. The acceptance and community-wide use of digital preservation stages such as these would be very helpful for inter-institutional communication as digital preservation programs emerge, as research and practice iteratively devise appropriate preservation strategies, and as the digital preservation domain becomes firmly established with known parameters, terminology, and an extensive base of practitioners. At present in our domain, core terminology is loosely defined and too dependent on the context of its use to support clear communications that might hasten community-wide development. Practitioners generally lack the common understanding, experience, and confidence to discuss, with any specificity or success, the exact nature and significance of digital preservation developments.

Unlike most other digital preservation models, which the model creator superimposes upon the targeted environment or landscape, we expect each organization to determine its development stage through self-assessment. That process may require external review by other practitioners, organizations, or relevant experts. The anticipated and hoped-for result is that an organization could state: "we have a stage 3 digital preservation program, and are undertaking these activities..." Because these statements would be based upon a common understanding of the stages, interested organizations and practitioners could immediately engage in a productive discussion with that organization. Organizations could then share meaningful information about their organizational responses.

Organizational stages for digital preservation have the potential to provide a more effective communication tool, to define a metric for quantifying progress towards a comprehensive digital preservation program, and to establish benchmarks for setting organizational goals. Other models offer useful and informative representations of, and insights into, digital objects, digital archives, and preservation approaches. Stages that address the organizational context for digital preservation programs outline a setting in which these key perspectives can be understood and incorporated. The stages will have to evolve and be refined over time, but, as a starting point, they are less daunting than fruitlessly groping for the on switch. Stages may offer organizations a staircase rather than an unassailable wall, but they cannot diminish the effort currently required of the organization to climb the stairs.

Endnotes

1. Peter Lyman, Hal R. Varian, et al., How Much Information? project, http://www.sims.berkeley.edu/research/projects/how-much-info/.

2. Margaret Hedstrom and Sheon Montgomery, "Digital Preservation Needs and Requirements in RLG Member Institutions," 1998, http://www.rlg.org/preserv/digpres.html.

3. D. Greenstein, S. Thorin, and D. McKinney, "Draft report of a meeting held on 10 April in Washington DC to discuss preliminary results of a survey issued by the DLF to its members, 23 April 2001," http://www.diglib.org/roles/prelim.htm.

4. Anne R. Kenney and Deirdre C. Stam, The State of Preservation Programs in American College and Research Libraries: Building a Common Understanding and Action Agenda, Council on Library and Information Resources, December 2002, p. 33.

5. National Historical Publications and Records Commission, Research Issues in Electronic Records: Report of the Working Meeting, 1991, http://www.archives.gov/grants/electronic_records/research_issues_report.html.

6. National Historical Publications and Records Commission, Electronic Records Research and Development: Final Report of the 1996 Conference held at the University of Michigan, Ann Arbor, June 28-29, 1996, 1997, http://www.si.umich.edu/e-recs/.

7. Minnesota Historical Society was funded by NHPRC to assess progress toward fulfilling the electronic records research agenda, see http://www.mnhs.org/preserve/records/eragenda.html.

8. DSpace Internal Reference Specification: Technology & Architecture, Version 2002-03-01.

9. Daniel Greenstein and Suzanne E. Thorin, The Digital Library: A Biography, Council on Library and Information Resources, 2nd edition, December 2002, http://www.clir.org/pubs/abstract/pub109abst.html.

10. D. Greenstein, S. Thorin, and D. McKinney, "Draft report of a meeting held on 10 April in Washington DC to discuss preliminary results of a survey issued by the DLF to its members, 23 April 2001," http://www.diglib.org/roles/prelim.htm.

11. Kenney and Stam, op. cit., p. 9.

12. Richard Entlich, CORE Project Final Report, http://www.dcs.gla.ac.uk/idom/irlist/new/1997/IR-L_Digest,_Vol.XIV,_No.27,_Issue_365/CORE_Project_Final_Report.html.

13. Anne R. Kenney and Lynne K. Personius, The Cornell / Xerox / Commission on Preservation and Access Joint Study in Digital Preservation Report: Phase I (January 1990-December 1991) Digital Capture, Paper Facsimiles, and Network Access, Commission on Preservation and Access, 1992, http://www.clir.org/pubs/reports/joint/index.html.

14. Greenstein and Thorin, op. cit., p. 6.

15. Anne R. Kenney, Oya Y. Rieger, et al., Preserving Cornell's Digital Image Collections: Implementing an Archival Strategy, http://www.library.cornell.edu/imls/IMLS-CULfinalreport2.pdf.

16. Anne R. Kenney, Nancy McGovern, et al., "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism," D-Lib Magazine, January 2002, http://www.dlib.org/dlib/january02/kenney/01kenney.html.

17. This CUL unit library supports the College of Agriculture and Life Sciences and the College of Human Ecology.

18. Co-sponsors for individual subject areas of arXiv are identified where relevant.

19. The Project Euclid repository is based upon Dienst, a protocol and implementation for distributed digital library servers that was developed by the Computer Science Department at Cornell University. A number of implementations at Cornell use Dienst, though the Digital Library Research Group of Computing and Information Science has moved away from and beyond this protocol in its development of Fedora (http://www.fedora.info/) and other initiatives. This distance between research and practice is not uncommon as organizations develop systems to support digital libraries and digital preservation programs. Research can and should continue to look ahead, while implementation must identify and prepare for feasible steps forward.

20. See the Cornell document: Mapping the RLG/OCLC Attributes of a Trusted Repository to CDI key issues for more information: http://www.library.cornell.edu/iris/dpo/cd2-mapping6-steps12.pdf.

21. Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002, page 1-1.

22. See Goal II at: http://www.library.cornell.edu/Admin/goals/index.html.

23. See the CDS portion of the DPO Web site at: http://www.library.cornell.edu/iris/dpo/cds.html.

24. Cornell has a highly distributed organizational model, and not all depositories are owned and built by CUL; therefore, a centralized depository is not a feasible model, but promulgating common requirements and practices will enable a cohesive program.

25. The Integrated Digital Preservation Matrix model emerged during our development of a subject-based digital archive (SBDA) model. We consider the matrix to be a logical extension of inter-institutional collaboration from an organizational context perspective.

26. See, for example, the discussion of the matrix-effect of the Web at the Matrix site: http://www.mids.org/; and the range of Web-enabled GRID computing initiatives at the National Archives and Records Administration (NARA) and elsewhere.

27. See the Nordic Archive Web site: http://nwa.nb.no/; the project information that will be posted at the Internet Archive Web site: http://www.archive.org; the LOCKSS Project Web site, http://lockss.stanford.edu/, and the California Digital Library at: http://www.cdlib.org/.

28. See the emerging results of the Virtual Remote Control project at: http://www.library.cornell.edu/iris/research/prism/index.html.

29. See the project description at: http://www.library.cornell.edu/iris/research/WebPolCom.pdf.

30. See a list of some examples at: http://www.math.cornell.edu/~library/digitalization.html; and the EMANI site at: http://www.emani.org/.