This paper was refereed by the Journal of Electronic Publishing's peer reviewers.

Archiving has become a major problem as more and more information is presented in electronic form. This paper does not propose a comprehensive solution to archiving in a period of rapid technological change, nor will it cast information technology in a negative light. Rather, we seek to explore a metaphoric conceptualization of the archive as a living ecosystem, where information and its delivery systems are recognized as dynamic, highly changeable, and inhabited by humans. If we want to keep data alive, strategies involving all players in the ecosystem — publishers, librarians, archivists, information consumers, and authors — are vital.

As Diane Kovacs puts it, "How the advent and increasing presence of e-publications will impact the people who will read them may ultimately be of more importance than what we will do with the machines, the storage media or the delivery mechanism."[1]

Documents prepared without carefully considering how people access or read information inevitably end up as digital landfill: either inaccessible or, if inadequately networked, forgotten. If electronic data is to survive, it must perpetually "migrate"[2] — that is, transfer from one system to the next before the first system becomes obsolete.

Imagine a researcher compiling a longitudinal study. The researcher begins her study in 1988, using an IBM 286 computer. The hard drive soon becomes choked with data. The data are transferred to 5.25-inch diskettes, which are put in a box marked "1988." Within a few years, with periodic system upgrades, the researcher begins backing up data on 3.5-inch diskettes. Then, in the late 1990s, the researcher upgrades to an iMac, a computer with only a CD drive installed. While there is still hope for the 3.5-inch diskettes (many computers take both CDs and 3.5-inch diskettes), the researcher is alarmed to realize that she cannot find a drive that will read the archived 5.25-inch diskettes. The outcome: the data from the early years of the longitudinal study are effectively lost, rendered irretrievable by the relative rapidity of computer developments. In the current technological environment, all electronic data must be migrated every few years if it is to survive.

For libraries, the costs and challenges of migration are great. While electronic systems have brought inestimable advantages to the library, including the online catalogue and access to remote databases, the tasks of libraries have expanded enormously. Librarians must now stay on top of technology as well as information, and must maintain archives and provide continuous access when operating platforms and systems change every few years. Liza Chan has likened a library's struggle to select the best delivery and storage system for electronic media to "a blindfolded person shooting at a moving target."[3]

Suddenly, books seem very appealing in their simplicity. As Coyle notes, "print is a marvellous storage medium; it is easily handled and requires no additional equipment."[4] The only delivery system required is the ability to read. But in the electronic world, the two extra tiers of interpretation — software to access data and hardware to deliver it — create a huge number of potential problems before we are able simply to read something.

Changing Roles in the World of Academic Publishing

It is not only libraries and researchers that face the challenges of a rapidly changing information-technology environment. Academic publishers wishing to go electronic must provide a new set of services and perhaps even explore new economic models to succeed. The traditional players in this field — loosely defined as producers (publishers and authors) and consumers (libraries and readers) — are shifting roles rapidly. Now, researchers can be their own publishers, the publisher is an archivist of electronic journals, and the librarian is an information speculator, selecting what to purchase and what to save amongst an increasing array of information types. In this shift of roles, fundamental considerations arise, such as who holds copyright, who finances, and who determines when a document is completely "final." Roles are not only changing rapidly, they are changing unevenly. As early as 1994, 28 percent of American libraries relied totally on publishers for the archiving of electronic journals.[5] When publishers hold the right to withdraw materials from the public domain, charge access fees, and, more disturbingly, easily alter materials at any time, their suitability to act as archivists must be questioned. As Scott asked, "For how long will [we] be content to assume that the publisher will always have a well-maintained backfile or archive from which to serve their customers?"[6]

Nonetheless, the substantive structure of academic publishing remains largely unchanged. Despite the exponential growth of e-journals — from 417 peer-reviewed electronic journals in 1996 to 1,049 in 1999[7] — it is not necessarily any easier to be published, nor is the rate of publication necessarily any swifter. Even in the refereed e-journal, the pace of publication remains stoppered by the peer review process. In addition, electronic journals hold limited appeal for some because their legitimacy, as Tenopir notes delicately, "is questioned by tenure committees in academic institutions."[8] Thus, with all the exciting possibilities of online publishing, the researcher may be in the same position as before, waiting for months or years until the article has, first, finished its peer review, and second, secured a berth in an upcoming edition. In other words, despite the promise of immediacy and currency in the electronic world (and immediacy may be the experience of the Internet user), academics seeking to publish online, while maintaining the credibility of peer review, may not have their work published any faster. As such, readers may not gain any of the "currency benefit" possible, for instance, in online news reporting.

In this collision of needs — that is, the researcher's need to publish, and the publisher's need for profit — the potential benefits of electronic journals are subsumed in a quagmire of financial, reputational and temporal complications. In many ways, little has changed in the staged move to electronic academic journals.

Temporary Archiving Measures or an Ecological Approach?

The archive, however, is in flux. Much of this has to do with the pace of technological change, and the perceived locus of control. Neavill and Sheblé, for instance, note the instability in the relationship between publishers and libraries: "Libraries cannot develop policies on the basis of a stabilized publishing structure. Any plans for access and archiving will have to be temporary."[9]

The concept of "temporary" measures rests on an assumption that one day we will all catch up with technology. This assumption underlies much of our technology purchasing, that if we buy the "right" computer or software at the "right" time, we will finally be state-of-the-art. The reality is that Microsoft and others will continue to change operating platforms, hardware, and software every year. Manufacturers operate within a structure of premeditated obsolescence, of staggered improvement and alteration, ensuring that there will be no "end-state" program upon which we can all settle.

Because electronic media, software applications, and computer hardware will all continue to change at a rapid rate, and because policies must be developed to address that reality, the archive must change. The very notion of a permanent or fixed archive may have to give way to an ecological preservation system that (paradoxically perhaps) is in a state of constant change.

Data grows, lives, and dies, as do delivery systems. As never before, the task of keeping data alive requires frequent adaptations to and perpetual evolution of the archival system. To keep pace with technological flux, an ongoing process of selection — of media platforms, of preservation structures, of migratory patterns — is necessary to avoid data extinction. In these various ways, there is an ecological force at work. In fact, there are several ecological themes already in play in electronic archives. These include the encouragement of symbiotic relationships, the use of anti-extinction preservation and migratory schemes, and the deliberate adoption of cloning/breeding campaigns.

Symbiotic Archiving

In a recent article, Berger described the painstaking process of archiving the journal Canadian Architect and Builder, published from 1880 to 1908. The paper in the journal was deteriorating, yet the journal was sufficiently in demand to prompt several libraries to start a major archiving project. The project was undertaken in two stages. Firstly, the paper editions were preserved by traditional conservation techniques. Secondly, editions were scanned digitally and made available on a Web site.[10] Successful archiving involves assessing appropriate storage and delivery methods for documents based on cost, usage, appropriateness, and longevity. In this case the best solution was symbiosis of electronic and paper media. By combining paper and electronic means, twin goals are achieved: an original record is maintained, and worldwide access to the material is provided. The symbiotic, ecological relationship exists in the fact that the paper version needs the electronic version for the sake of its own preservation against constant usage, and the electronic version needs the paper version as a backup in case of technical problems or obsolescence. Symbiotic double conservation contrasts with some previous "preservation" practices such as the effective destruction of books through the removal of bindings to facilitate the microfilming process.[11] In summary, a careful consideration of the archival ecosystem allows for effective migration while maximizing conservation in the long term. By identifying how a book or journal is most easily accessed with current technology (at present, via the Web), and best preserved (in two formats, paper and electronic), the text stands a better chance of lasting through time.

Of course, HTML, SGML, Java, JPEG, MPEG, WAV, and other formats might go the way of the Commodore 64, and Canadian Architect and Builder and other journals archived on the Web may suffer a similar fate: rendered technologically unreachable or unreadable by Web advancements over the next five to one hundred years, they will be lost to the interested reader. They will be extinct. Further, a Web site with a forgotten URL may, in the long term, be even harder to find than a mis-shelved book. While some search engines can be tremendously efficient at locating relevant sites by keyword now, we do not know if the search engines of fifty years hence will be able to locate the historic, first-generation sites coded late in the twentieth century and in the early years of the twenty-first century.

Anti-extinction Preservation and Migratory Schemes

Some have made substantive plans to maintain the currency, or at least the accessibility, of their electronic archive. For instance, in 1996 the American Geophysical Union created a trust fund "for maintaining and upgrading its electronic journals in perpetuity." The American Astronomical Society has taken a similar course, using subscription revenue to finance the migration of its publication, Astrophysical Journal, to the latest markup standard and/or technology every five years as necessary.[12] These journals, in other words, devote significant effort and resources to an anti-extinction preservation and migratory process. The American Commission on Preservation and Access and the Research Libraries Group defined data migration in their report "Task Force on Archiving of Digital Information":

Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. The Task Force regards migration as an essential function of digital archives.[13]

Crucial to that overall process are the independent certification of digital archives and a critical fail-safe mechanism. The primary function of the critical fail-safe mechanism is to enact "an aggressive rescue function to save culturally significant digital information" that may be at risk. The independent certification is a mechanism through which digital archive standards are reviewed, and whose indicators include the accessibility of stored data and the long-term protection of culturally valuable information. The Task Force was disparaging of "marketplace forces," alleging that the private sector, without the appropriate checks and balances, "may value information for too short a period and without applying broader, public interest criteria."[13] Their set of solutions seeks to encompass the public interest through a long-term approach, and through ongoing consultation with community stakeholder groups.

The search for solutions is ongoing. In 2001, Harvard University Library was awarded a major planning grant to look into repository techniques for electronic journals. Funded by the Andrew W. Mellon Foundation, the project will explore the mechanism through which electronic journals will be selected for preservation, commercial intellectual property negotiations with publishers, delivery format for users, costs, and the overall storage strategies to be employed. The focus of the plan is to be an integrated electronic archive. As Harvard's proposal says, "No institution will archive solely for its own use, nor rely solely on its own archiving program. Therefore no institution can decide on the nature of its archiving operation in isolation."[14]

Stanford University's LOCKSS project (LOCKSS stands for Lots of Copies Keep Stuff Safe) seeks to assure the integrity of electronic publications through the maintenance of copies on multiple sites, and with periodic checks between these copies to verify informational congruency.[15] Through cloning and distributed storage, LOCKSS hopes to minimize in advance the impact a catastrophe at any individual site may have upon (data) populations as a whole.
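
The principle of keeping several copies and periodically checking them against one another can be illustrated with a minimal sketch. This is not the LOCKSS software or its actual protocol: the site names, the function names, and the simple majority rule are all assumptions made for illustration, and the sketch presumes that a clear majority of copies is still intact.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Compare every site's copy and report the sites that disagree.

    `copies` maps a hypothetical site name to the bytes held there.
    Returns the most common digest and a sorted list of the sites
    whose copy does not match it.
    """
    digests = {site: digest(data) for site, data in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority)
    return majority, damaged


def repair(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace each disagreeing copy with one that matches the majority."""
    majority, damaged = audit_copies(copies)
    good = next(data for data in copies.values() if digest(data) == majority)
    return {site: (good if site in damaged else data)
            for site, data in copies.items()}


if __name__ == "__main__":
    # Three hypothetical sites; one copy has silently degraded.
    copies = {
        "site_a": b"Volume 1",
        "site_b": b"Volume 1",
        "site_c": b"Volume l",
    }
    print(audit_copies(copies)[1])          # the sites that disagree
    print(audit_copies(repair(copies))[1])  # empty once repaired
```

The design choice worth noting is that no single site is treated as the authoritative original. Agreement among copies stands in for authority, which is why damage at any one site need not be fatal to the population of copies as a whole.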

The Biosphere

This list of archiving and migration methodologies is by no means exhaustive. Of particular concern are the long-term prospects for the electronic resources and research undertaken in developing countries. While research libraries in the United States and other industrialized nations may have the financial resources and the available workforce to facilitate "perpetual care" activities, in other parts of the world the resources are simply not available. "Digital aid" to the universities and publishers of the developing world may prove imperative in the coming years. The ecology of the archiving process requires that the whole biosphere of electronic data be considered in the push for preservation, not just the material of discrete territories.

Even where the necessary will and resources are in place, for some electronic data the issue of long-term preservation and/or migration is somewhat more difficult. Central examples are the presentations and publications produced only in multimedia format. As one report put it, there is no "safety net" for purely digital data, "no analogous tangible version you can recapture if you lose the digital version."[16] While projects such as LOCKSS may provide some degree of safety, a further complication remains. If all computer platforms and software packages steadily lose their cutting-edge status via the computer industry's (apparent) maxim of premeditated obsolescence, multimedia format documents combining complex interactions of moving images, sound, and text stand the greatest risk of being lost or degraded over time. If there is a high degree of dependence between a given multimedia presentation and specific hardware or software, the process of "refreshing" (copying digital information from one medium to another) may be insufficiently compatible to maintain the exact interplay of image, sound, and text captured in the original.[17] A two-dimensional replica may still be preserved, but the subtleties of the original may be lost, degraded in the transfer.
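
The limits of "refreshing" can be made concrete with a small sketch. The code below copies a file to a new location and verifies that the bytes arrived unchanged. The function names and the choice of SHA-256 are assumptions made for illustration, not a method drawn from the reports cited above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy `source` to `destination` and confirm the bytes are identical.

    Returns the shared digest, or raises IOError if the copy differs.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh failed: {source} and {destination} differ")
    return before
```

A matching digest shows only that the bits survived the move to the new medium. It says nothing about whether software and hardware still exist that can render those bits as the original interplay of image, sound, and text, which is the harder problem for multimedia documents.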

As has always been the case, decisions will have to be made regarding what will be preserved, either in the symbiotic sense, or in the broader ecological migration to next-generation systems. This process, as always, takes time and resources. Priorities will be set and value judgments will be made regarding what should be preserved, and how well. Despite the massive technological advances in recent decades, it is a surety that information and data will still be lost in the passage of time.

Archiving as Metaphor

The fate of data will largely depend on the strategies adopted by libraries, publishers, archivists, and individual researchers. It is our belief that the future of the archive depends on how information storage is conceptualized. Rather than making temporary plans in the hopes of a future permanent solution, it is advantageous to visualize all information preservation as an evolving, ever-changing ecosystem.

We have not presented a single solution to future archiving: indeed, we argue against this. Instead, we have provided a conceptualization, a metaphor that may be of use to archivists and publishers alike, a theorization whose aim is to realign considerations of the archive to current technological realities. Ironically, it is the ecosystem, perhaps the oldest organizational complex on our planet, that provides the metaphoric substance.

As technology continues to develop apace, those seeking to preserve and to update are faced with ever-expanding options of how best to do this. It is an exercise in futility to try to pick the technological winner, for "the latest" will be hopelessly outdated in ten years' time. What is necessary is a permanent strategy for handling perpetual change. This will involve periodic reassessment, the migration of material from one platform to the next, and the allocation of resources. Conceptualizing the archive as an ecosystem may give some succor to those involved with the complexities of the task.



This paper emerged from participation in a forum of publishers, librarians and researchers, whose topic was 'the future of publishing and the archive'.

Julia Martin is a doctoral candidate in the School of English at the University of New South Wales. Her doctoral research is concerned with autobiography and eighteenth-century thought. Her e-mail address is J.Martin@unsw.edu.au

David Coleman is a postdoctoral Research Associate in the Faculty of Education at the University of Sydney. His research interests include the internationalization of education and issues of globalization, particularly in relation to the United Nations. His e-mail address is d.coleman@edfac.usyd.edu.au


Notes


    1. Diane K. Kovacs, "Electronic publishing in libraries: Introduction," Library Hi Tech. 17, 1 (1999): 8.

    2. Preserving Digital Data. (Washington, DC: Association of Research Libraries, 1997).

    3. Liza Chan, "Electronic journals and academic libraries," Library Hi Tech. 17, 1 (1999): 14.

    4. K. Coyle, "Electronic Information: Some Implications for Libraries," http://www.kcoyle.net/carlart.html, cited in Janet R. Cottrell, "Ethics in an age of changing technology: Familiar territory or new frontiers?" Library Hi Tech. 17, 1 (1999): 111.

    5. Elizabeth Parang and Laverna Saunders, Electronic Journals in ARL Libraries: Issues and Trends, A SPEC Kit. (Washington, DC: Association of Research Libraries, 1994).

    6. John T. Scott, "Archiving the On-Line Journals: Why do we need an archive?" Quarterly Newsletter of the International Council for Scientific and Technical Information. 26 (1997): http://www.icsti.org/forum/26/#journals

    7. Chan, 11.

    8. Carol Tenopir, "The complexities of electronic journals," Library Journal. 122, 2 (1997): 37.

    9. B. Neavill and M. A. Sheblé, "Archiving electronic journals," Serials Review, 21 (1995), quoted in Chan, 14.

    10. Marilyn Berger, "Digitization for preservation and access: A case study," Library Hi Tech. 17, 2 (1999): 146.

    11. Nicholson Baker, Double Fold: Libraries and the Assault on Paper (New York: Random House, 2001).

    12. AIP History Newsletter, "Science Publishers Set Aside Funds for Archiving Electronic Journals," XXIX, 1 (1997): http://www.aip.org/history/newsletter/spr97/journals.htm

    13. Donald Waters and John Garrett (co-chairs), Preserving Digital Information: Final Report and Recommendations. Task Force on Archiving of Digital Information (1996): http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/14/88/8f.pdf

    14. Harvard University Library. Proposal for a Study of Electronic Journal Archiving. Submitted to the Andrew W. Mellon Foundation, October 13, (2000): http://www.diglib.org/preserve/harvardprop.htm See also http://www.diglib.org/preserve/ejp.htm for further research into e-journal archiving conducted by major American research libraries.

    15. Stanford University Library. "Permanent Publishing on the Web," http://lockss.stanford.edu/

    16. Preserving Digital Data. (Washington, DC: Association of Research Libraries, 1997), 16.

    17. Waters, Donald and Garrett, John (co-chairs). Preserving Digital Information: Final Report and Recommendations. Task Force on Archiving of Digital Information (1996): http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/14/88/8f.pdf

    References

    Baker, Nicholson. Double Fold: Libraries and the Assault on Paper. New York: Random House, 2001.

    Berger, Marilyn. "Digitization for preservation and access: A case study." Library Hi Tech. 17, 2 (1999): 146-151. [doi: 10.1108/07378839910275623]

    Chan, Liza. "Electronic journals and academic libraries." Library Hi Tech. 17, 1 (1999): 10-16. [doi: 10.1108/07378839910267145]

    Cottrell, Janet R. "Ethics in an age of changing technology: Familiar territory or new frontiers?" Library Hi Tech. 17, 1 (1999): 107-113. [doi: 10.1108/07378839910267271]

    Elsevier Science. Brain Research, Combined Subscription: http://www.sciencedirect.com/science/journal/00068993

    Harvard University Library. Proposal for a Study of Electronic Journal Archiving. Submitted to the Andrew W. Mellon Foundation, October 13, (2000): http://www.diglib.org/preserve/harvardprop.htm

    Kovacs, Diane K. "Electronic publishing in libraries: Introduction." Library Hi Tech. 17, 1 (1999): 8-9. [doi: 10.1108/EUM0000000004556]

    Parang, Elizabeth and Saunders, Laverna. Electronic Journals in ARL Libraries: Issues and Trends, A SPEC Kit. Washington, DC: Association of Research Libraries, 1994.

    "Permanent Publishing on the Web." Stanford University Library. http://lockss.stanford.edu/

    Preserving Digital Data. Washington, DC: Association of Research Libraries, 1997.

    Scott, John T. "Archiving the On-Line Journals: Why do we need an archive?" Quarterly Newsletter of the International Council for Scientific and Technical Information. 26 (1997): http://www.icsti.org/forum/26/

    "Science Publishers Set Aside Funds for Archiving Electronic Journals." AIP History Newsletter. American Institute of Physics Center for History of Physics. XXIX, 1 (1997): http://www.aip.org/history/newsletter/spr97/journals.htm

    Tenopir, Carol. "The complexities of electronic journals." Library Journal. 122, 2 (1997): 37-38.

    Waters, Donald and Garrett, John (co-chairs). Preserving Digital Information: Final Report and Recommendations. Task Force on Archiving of Digital Information (1996): [formerly http://www.rlg.org/ArchTF/] and [formerly http://www.rlg.org/ArchTF/tfadi.index.htm]


    Links from this article:

    American Geophysical Union, http://www.agu.org

    Astrophysical Journal, http://www.journals.uchicago.edu/ApJ

    Canadian Architect and Builder, http://digital.library.mcgill.ca/cab/

    LOCKSS project, http://lockss.stanford.edu

    Research Libraries Group, [formerly http://www.rlg.org]