Author: Deborah Lines Andersen
Title: Benchmarks: Testing the Persistence of URLs
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
February 2007
Availability:

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

Source: Benchmarks: Testing the Persistence of URLs
Deborah Lines Andersen


vol. 10, no. 1, February 2007
Article Type: Benchmark
URL: http://hdl.handle.net/2027/spo.3310410.0010.103

Benchmarks: Testing the Persistence of URLs

Deborah Lines Andersen

dla@albany.edu

Benchmark: a standard by which something can be measured or judged.  [1]

Passages

Before moving into the body of this column I would like to acknowledge a change in our editorial staff. With this issue Scott Merriman steps down as the editor of the electronic resources column. I would like to thank Scott for his six years of working on the journal as a column editor. Happily, Scott will continue in a different capacity as a member of the peer-review editorial team of the journal.

With Scott's stepping down I am pleased to welcome Jeremy Boggs as the new editor of the electronic resources column. Jeremy is at George Mason University, where he is a PhD student in U.S. history and a web developer for the Center for History and New Media. His column in this issue focuses on weblogs and carnivals in history.

A Research Question for the Journal

The subject of this "Benchmarks" surfaced as I was reviewing the papers and columns for this issue. I was particularly taken with the number of links to web sites that appeared in the papers. Luc Guay's article, "Les TIC transforment les pratiques pédagogiques," contains 9 URLs in both its French and English forms. Jessica Lacher-Feldman's article, "Publishers' Bindings Online, 1815-1930," contains 30. Links are no surprise in an online journal, but a question surfaced for me about how up-to-date and viable the links are throughout the entire run of journal issues starting in 1998. This turns out to be an important topic in both history and computing. Historically, we want our readers to be able to follow the original links that our authors included in their papers. Those links can be followed only if individuals all over the world have taken the time to migrate their materials to new computer platforms while also preserving the original URLs for access. This is a computing issue.

Shakers and the World Wide Web

In order to explore this question I chose a paper I wrote for the journal in August 1999 entitled, "Heuristics for Educational Use and Evaluation of Electronic Information: A Case of Searching for Shaker History on the World Wide Web."  [2] I selected this article first because I know its material, second because it is now seven years old, and finally because it contained 41 links to other materials on the World Wide Web. My research question was two-fold. First, how many of the links would still be functional? Second, if there were now dead links, would it be possible to find the same information through a keyword search of Google?

The larger question is the historical one: one of access. How well is the journal doing in this regard?

The Links

The 41 links can be divided into various application types. Table 1 presents these categories. Most astounding when looking at this table is that nearly 50 percent of the links (19 of 41) are dead on location. This means that they are either no longer formatted as links (black rather than blue, with no hyperlink function) or are blue hyperlinks that point to a "page not found" statement. This does not necessarily mean that the sites no longer exist. A case in point is the University at Albany library site, listed as http://www.albany.edu/library/ in the paper. The Albany library is now located at http://library.albany.edu/.

A second case in point is the http://www.ziplink.net/~pcb/and/nat00121.htm link in the paper, which presents in black. If one copies this URL into a browser, the Priscilla C. Butler paper, Origins of the Shakers: The Heresy of Mother Ann Lee (Vassar College Department of Religion, Senior Thesis, Fall Semester, 1982), is still there, but one has to jump through a few digital hoops to get to it.

A third case is exemplified by http://www.convergemag.com, in which the magazine Converge Online continues to exist but the paper in question (Jamie Murphy, 1999. "Technology Training for Faculty." Converge 2(3): 30-31) is no longer accessible on the website.

Next is the situation in which the link is extant (it points to the same title as the original URL) but the materials have been updated and are not the same as the ones that were originally referenced. An example of this problem exists at http://www.vuw.ac.nz/~agsmith/evaln/evaln.htm, Alastair Smith's "Evaluation of Information Resources" (The World Wide Web Virtual Library, updated 19 October 2006). The material is similar to that referenced in August 1999 but, because of the update, the sources are not exactly the same.

There is also the case of a URL that exists but points to something entirely different than the original link. http://www.namss.org.uk/evaluate.htm used to be an evaluation site, but as of 15 January 2007 it was a site promoting distance education in the United Kingdom.

Finally, there are links that continue to return the content that was there in 1999. The Norwegian site, "Bombs and Babies: Childhood Memories from World War 2" is an example of a site that is extant and has not changed since originally referenced.
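The cases above amount to a small taxonomy of link outcomes. The following Python sketch shows one way that taxonomy might be written down. The survey reported here was carried out by hand, so the names `LinkStatus` and `classify_link`, and the idea of passing in the result of a content comparison, are illustrative assumptions rather than a description of the method actually used.

```python
from enum import Enum
from typing import Optional


class LinkStatus(Enum):
    DEAD = "dead"        # no response, or an error page such as "page not found"
    MOVED = "moved"      # the request ends up at a different address
    CHANGED = "changed"  # same address, but the content is known to differ
    LIVE = "live"        # same address, and no difference was detected


def classify_link(
    cited_url: str,
    http_status: Optional[int],
    final_url: Optional[str],
    content_matches: Optional[bool] = None,
) -> LinkStatus:
    """Classify one cited link (hypothetical helper).

    http_status is None when no response was received at all.
    final_url is the address reached after any redirects.
    content_matches is True or False when the page was compared with
    the material originally cited, or None if no comparison was made.
    """
    if http_status is None or http_status >= 400:
        return LinkStatus.DEAD
    # Ignore a trailing slash when comparing the two addresses.
    if final_url is not None and final_url.rstrip("/") != cited_url.rstrip("/"):
        return LinkStatus.MOVED
    if content_matches is False:
        return LinkStatus.CHANGED
    return LinkStatus.LIVE
```

A site that has moved without leaving a redirect behind, as the Albany library apparently did, would still be reported as dead by a function like this. Finding its new home takes a human reader or a separate search.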

Table 1: Links Presented in "Heuristics..." article, JAHC 2(2): August 1999
Category                     Number  Live Links  Dead Links
Libraries                    6       0           6
Virtual Libraries            1       1           0
U.S. Universities            4       0           4
Non-U.S. Universities        1       1           0
Search engines               3       2           1
Magazines and journals       3       1           2
.Com, .Net and .Org sites    21      16          5
Individual sites             2       1           1
TOTALS                       41      22          19

*Items accessed on 15 January 2007

It is important to take a look at the major categories in the above list. None of the U.S. university library links remain available to our readers. This is surprising and disappointing. Universities do upgrade their servers on a regular basis. Students and faculty move on and take their materials offline. Researchers update their materials and create new URLs for them. Nonetheless, the function of references in a journal article is to make those references available to readers. The six university library links are no longer available in their present configurations.

The other major category, .Com, .Net, and .Org sites, presents a rosier picture for our readers. About 76 percent (16 of the 21 links) remain live seven years after their original citations. Perhaps museums keep their old servers longer, or add to rather than change materials.
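For readers who want to check the arithmetic, the proportions can be recomputed directly from Table 1. The short Python sketch below does so. The counts are transcribed from the table, while the names `TABLE_1` and `live_share` are chosen only for illustration.

```python
# Counts transcribed from Table 1: (number, live links, dead links).
TABLE_1 = {
    "Libraries": (6, 0, 6),
    "Virtual Libraries": (1, 1, 0),
    "U.S. Universities": (4, 0, 4),
    "Non-U.S. Universities": (1, 1, 0),
    "Search engines": (3, 2, 1),
    "Magazines and journals": (3, 1, 2),
    ".Com, .Net and .Org sites": (21, 16, 5),
    "Individual sites": (2, 1, 1),
}


def live_share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


total = sum(t for t, _, _ in TABLE_1.values())
live = sum(l for _, l, _ in TABLE_1.values())
dead = sum(d for _, _, d in TABLE_1.values())

for name, (t, l, d) in TABLE_1.items():
    print(f"{name}: {l} of {t} live ({live_share(l, t)}%)")
print(f"TOTALS: {live} of {total} live, {dead} dead ({live_share(dead, total)}% dead)")
```

The column sums agree with the TOTALS row of the table (41 links, 22 live, 19 dead).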

The Data

For the curious, the following text covers all 41 of the web sites that were referenced in the "Heuristics..." paper and gives a comment about the availability of each group of sites. They are organized by the categories in Table 1.

Libraries

All library sites cited in the online paper presented as dead links. There are two issues here. One is whether the university library still exists; it does in every case. The second is whether or not the cited materials continue to exist on the web site.

Virtual Libraries

The single item here is a live link but has been updated since the 1999 citation in the original paper.

U.S. Universities

All of these items are dead links. It is possible that they exist, but one would have to do a web search in order to check their availability. As in the case of the University at Albany libraries, it is possible that the information exists but that the domain name has changed.

Non-US Universities

This is an extant link. The information is dated 1997.

Search Engines

AltaVista was used, but there is no link in the paper [www.altavista.com exists on the web].

Magazines and Journals

Of the two journals, the JAHC article is available. Converge Online is also available, but the 1999 articles do not appear when using its internal search engine. A Google search for the articles gives their authors and titles but no digital copies of the articles.

.Com, .Net and .Org sites

Perhaps not surprisingly, only seven of the 21 commercial sites are no longer available to our readers. The museums and businesses referenced in 1999 continue to provide links to materials at the same site, under the same URL.

Individual sites

These two sites have URLs that are hard to decipher. Nonetheless, they both are still live and point to the materials that were referenced in 1999. This again underlines the point that large institutions tend to reuse and update their sites while individuals seem more likely to maintain original content.

Thoughts and Lessons for the Journal

There are enormous policy and editorial issues that surface as a result of these findings. First, it is apparent that some URLs in the journal lose their status as live links over time, perhaps because of migrating the journal to new servers. If the editorial board were willing to spend time and effort, it might be possible to re-edit and correct these omissions on a regular basis.

Nonetheless, there are many other links that no longer point at the materials intended by the authors. What does the journal do with these? We could require that only links to online, peer-reviewed articles be included on our site, assuming that these articles will have long lives and be available to our readers. This is one end of the editorial policy spectrum. The other end of the spectrum would acknowledge that the World Wide Web is an extremely fluid environment, with information changing on a regular basis. We might want a disclaimer at the beginning of each issue that states that links will come and go. We might do a "link survey" on a regular basis (and on a grander scale than this one-article example) that would see how we are doing.
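A regular "link survey" of this kind could be partly automated. The sketch below, in Python and using only the standard library, is one possible starting point and not a tool the journal currently uses. The function names `fetch` and `survey`, the ten-second timeout, and the rule that any HTTP status below 400 counts as live are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a URL and returns (HTTP status or None, final URL or None).
Fetcher = Callable[[str], Tuple[Optional[int], Optional[str]]]


def fetch(url: str, timeout: float = 10.0) -> Tuple[Optional[int], Optional[str]]:
    """Request a URL over the network, following redirects."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        return err.code, url   # the server answered with an error such as 404
    except (urllib.error.URLError, OSError, ValueError):
        return None, None      # no usable response at all


def survey(urls: Iterable[str], fetcher: Fetcher = fetch) -> dict:
    """Sort URLs into 'live' and 'dead' lists (hypothetical helper)."""
    report = {"live": [], "dead": []}
    for url in urls:
        status, _final_url = fetcher(url)
        key = "live" if status is not None and status < 400 else "dead"
        report[key].append(url)
    return report
```

Calling `survey` with the list of URLs cited in an issue would return the two lists. A script like this cannot tell whether a live page still carries the content an author originally cited, or whether a page that loads normally is really a "page not found" notice, so a human check of the results would still be needed.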

At present it is business as usual. Authors will use the best sources they can find, print or digital, and we will have to beg the forgiveness of our readers if these materials drift out of our range.

Notes

1. "Benchmark," American Heritage Dictionary, 4th ed., 2000.

2. Deborah Lines Andersen. "Heuristics for Educational Use and Evaluation of Electronic Information: A Case of Searching for Shaker History on the World Wide Web." Journal of the Association for History and Computing 2(2); August 1999. http://www.mcel.pacificu.edu/jahc/jahcii2/articlesii2/anderson2/anderson2.html