Author: Daniel Pfeifer
Title: An Archiving Scheme for an On-line Journal
Publication info: Ann Arbor, MI: MPublishing, University of Michigan Library
November 1998
Rights/Permissions:

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

Source: An Archiving Scheme for an On-line Journal
Daniel Pfeifer


vol. 1, no. 2, November 1998
Article Type: Article
URL: http://hdl.handle.net/2027/spo.3310410.0001.203

An Archiving Scheme for an On-line Journal

Daniel Pfeifer

Wake Forest University

The issues of permanence and information availability loom large for a scholarly journal that uses the web as its primary medium for communicating information. Ideally, when a writer quotes a particular passage from an on-line article, the referenced link will remain for future readers to follow. Ideally, when a librarian subscribes to a periodical, he or she does not worry that the information may disappear after some period of time. Unfortunately, the ease with which someone can add, delete, or change information on a computer, which is such a benefit in many cases, makes permanence a struggle for an on-line journal. Unlike a book, which must be physically destroyed, virtual information stored on a web server can disappear with the flick of a switch or the press of a button. Although a book is not timeless by any means, the information contained in its pages is far less temporary than virtually stored data.

Through foresight and long-term planning, the producers of scholarly journals on the Internet can remedy the problem of permanence to a satisfactory degree. The conventions of journal archiving and footnoting lend themselves to the solution. On the computing side of the formula, file-processing planning can ensure that a document begins in a certain place and remains there "for the duration."

The traditional organization of a journal includes volumes and numbers, which generally correspond to the publication year (in the life of the journal) and the timing of the issue within that year. In addition, journals generally include another identifier such as a quarterly or semi-annual notation, e.g., Spring 98 or September 99. Either of these notations will do to produce the primary file-processing structure. In addition to the timing identifiers, a journal generally has a standard internal structure that may include sub-sections like articles and reviews. These sub-sections would be used to build the secondary file-processing structure. Finally, each document, whether an article or a review, has an author and a title. The document itself represents the lowest level of the file-processing structure. Each document would be named using an abbreviation of either the author's name or a key word in the title. Since we are dealing with the web, there is one more consideration: the internet domain name. Fortunately, the highest level of the file-processing structure, the domain name, also corresponds to a characteristic or characteristics of the journal: the title and, in some cases, the publisher. If the journal resides on the web server of a sponsoring organization, whether a university or the organization producing the journal, the sponsor's domain name is generally used, and a sub-directory represents the journal.

For example, in the Spring 98 edition of the Journal of the Association for History and Computing, Deborah L. Anderson wrote an article entitled "Academic Historians, Electronic Information Access Technologies, and the World Wide Web: A Longitudinal Study of Factors Affecting Use and Barriers to That Use." Applying the archiving scheme briefly outlined above, Dr. Anderson's article could be stored as http://mcel.pacificu.edu/jahc/v1/n1/articles/anderson.html. Another way to identify or "save" the document would be http://mcel.pacificu.edu/jahc/spring98/articles/anderson.html.

The following is a breakdown of the document location starting from the uppermost level. The internet domain for the JAHC is http://mcel.pacificu.edu/. Pacific University sponsors the journal by allowing it to reside on its web server. The journal title, Journal of the Association for History and Computing, is abbreviated in the sub-directory /jahc/. The issue itself is represented in the next level by either /v1/n1/ or /spring98/. The sub-section of the journal, in this case /articles/, comes next. Finally, the author's name is used as the document identifier, anderson.html. Altogether, the information contained in the document's URL should look similar to an abbreviated footnote.
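The breakdown above can be sketched as a small Python function that joins the levels of the scheme into one address. This is only an illustration: the function name `article_url` and its parameter names are hypothetical, and the sketch assumes every document ends in `.html`, as in the Anderson example.

```python
def article_url(domain, journal, issue, section, doc_id):
    """Join the levels of the archiving scheme into one document URL.

    The issue may be given in either notation from the text,
    "v1/n1" or "spring98". All names here are illustrative.
    """
    levels = [journal, issue, section, doc_id + ".html"]
    # Strip stray slashes so "/spring98/" and "spring98" behave the same.
    path = "/".join(level.strip("/") for level in levels)
    return domain.rstrip("/") + "/" + path


print(article_url("http://mcel.pacificu.edu/", "jahc", "v1/n1", "articles", "anderson"))
# http://mcel.pacificu.edu/jahc/v1/n1/articles/anderson.html
print(article_url("http://mcel.pacificu.edu/", "jahc", "spring98", "articles", "anderson"))
# http://mcel.pacificu.edu/jahc/spring98/articles/anderson.html
```

Because every level is an explicit argument, the same function serves either issue notation; the editors would only need to choose one notation and keep to it.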

If, in their long-term planning, the journal editors decide that the author's last name is not a good way to denote a document, then they may use the document's title. One reason for doing so is that in many cases multiple authors contribute to an article. Which name does one choose? Should it be left to the authors' discretion? A more serious problem for the archiving scheme arises if two authors with the same last name submit to the same journal issue. In this case one could use the author's last name and first initial. To eliminate both problems, a title keyword or a combination of author and title could be used.
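One way to picture the last-name rule with its first-initial fallback is the following Python sketch. It is an assumption-laden illustration, not the journal's procedure: the function `assign_ids` is hypothetical, it considers only one lead author per document, and the second author in the usage example ("James Anderson") is invented purely to force a collision.

```python
from collections import Counter


def assign_ids(lead_authors):
    """Return one document identifier per (first, last) name pair.

    A last name that is unique within the issue section is used alone;
    a shared last name gets the author's first initial appended.
    """
    counts = Counter(last.lower() for _, last in lead_authors)
    ids = []
    for first, last in lead_authors:
        doc_id = last.lower()
        if counts[doc_id] > 1:
            # Shared last name: fall back to last name plus first initial.
            doc_id += first[0].lower()
        ids.append(doc_id)
    return ids


print(assign_ids([("Deborah", "Anderson"), ("Daniel", "Pfeifer")]))
# ['anderson', 'pfeifer']
print(assign_ids([("Deborah", "Anderson"), ("James", "Anderson")]))
# ['andersond', 'andersonj']
```

The sketch does not handle two authors who share both last name and first initial; that is the case where a title keyword, as suggested above, would be the safer identifier.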

Once the basic file-processing structure is in place for an issue, the opening page of the journal should use the web server's default page name and sit in the journal's highest-level directory. For the Journal of the AHC, a user should be able to access the journal by simply linking to http://mcel.pacificu.edu/jahc/. Of course, the opening page will change with each new journal edition, but one link should be present on every opener: "Back Issues." Each back issue index could then be named according to the archiving scheme. If using /spring98/ as the edition notation, then the opening page would be named spring98.html. If using the /v1/n1/ notation, then the page would be named v1n1.html. As the opening pages move to the "Back Issues" section, the links they contain would not change because the documents (the articles, reviews, etc.) are never relocated.
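The naming rule for back issue index pages is simple enough to state in one line of Python. The helper name `back_issue_page` is hypothetical; the sketch only reproduces the two examples given above.

```python
def back_issue_page(issue):
    """Derive a back issue index file name from the issue directory notation."""
    # "/v1/n1/" collapses to "v1n1"; "/spring98/" is already a single level.
    return issue.strip("/").replace("/", "") + ".html"


print(back_issue_page("/spring98/"))  # spring98.html
print(back_issue_page("/v1/n1/"))     # v1n1.html
```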

However, the warning "never say never" truly applies to permanence on the World Wide Web. Readers of the first issue of the JAHC can observe that the second issue is now in a different location. Rather than residing on the ssd1.pacificu server, the journal is now on the mcel.pacificu machine. The editor and the development team at Pacific University made the decision to move to the new server for several sound reasons. First and foremost is the speed and reliability of the connection. The new server is maintained "in house," rather than on the University's server, which carries a high level of Pacific U traffic. In addition, the journal is now easier to support for the small group of technicians who are needed to administer the server, format HTML, and handle anything else that arises in the production of a large web site.

The move did raise another important issue of permanence. What happens when the journal's upper-level domain changes? Or, put more plainly, what happens when the journal changes publishers or sponsoring organizations and has to move to a new location on the web? Fortunately, there is a solution. On the web it is possible to use the "Refresh" command or a script (JavaScript or CGI) to automatically forward the reader to the new location. From there, the reader browses as normal because the file structure, that is, the sub-location of the documents, remains the same.
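As a rough illustration of the "Refresh" approach, the Python sketch below rewrites an old address onto a new host while keeping the path unchanged, then produces a small forwarding page that uses an HTML `meta` refresh tag. The function names and the old host `old.example.edu` are placeholders invented for this example; only the new host, mcel.pacificu.edu, comes from the text, and the sketch assumes the journal keeps the same directory path on the new server.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def relocate(old_url, new_host):
    """Swap only the host; the directory path of the document is preserved."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


def forwarding_page(new_url):
    """Build a page, left at the old address, that forwards readers at once."""
    target = escape(new_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="0; url={target}">'
        "</head><body>"
        f'<p>This document has moved to <a href="{target}">{target}</a>.</p>'
        "</body></html>"
    )


# old.example.edu is a hypothetical stand-in for the former server.
new_url = relocate("http://old.example.edu/jahc/v1/n1/articles/anderson.html",
                   "mcel.pacificu.edu")
print(new_url)
# http://mcel.pacificu.edu/jahc/v1/n1/articles/anderson.html
print(forwarding_page(new_url))
```

The visible link in the page body is a fallback for browsers that do not act on the refresh tag. The approach works only as long as someone keeps the forwarding pages available at the old location.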

A final consideration to be addressed by an on-line journal is library subscription. No librarian wants to live in fear that some of the library's catalogued items may disappear, but the obvious solutions to this problem break down at some point or another. For example, if the library copies all of the journal files (articles, reviews, etc.) onto a local server, then that action is unacceptable to the journal for obvious reasons. On the flip side, a licensing agreement from the journal does not necessarily remove the librarian's fear that the entire journal may someday "go away." Software licenses can be cancelled. It is not the intention of the JAHC to play marketing or licensing games with libraries or readers, regardless of Microsoft's example. Our intention is to publish scholarly work about history and computing and to make that information as widely available as possible. To this end, the JAHC editors are considering ways to resolve libraries' concerns. One possible solution is to produce a low-cost CD that libraries may purchase and place on their shelves. Whatever the solution, it should be one where the library retains a permanent copy of the subscribed edition.

Hopefully, this brief discussion has brought us closer to the resolution of the permanence issues for a scholarly on-line journal. The key to the archiving scheme and permanence of a publication on the Internet is the directory structure of the site. With foresight and clues from current conventions like footnotes, a document can remain in one location on the WWW. By using storage facilities like recordable CDs, libraries can have a copy of the journal that will not "go away" with the flick of a switch or a lack of funding. Although the web is not considered a medium of permanence, we can add a satisfactory degree of certainty.