Economics and Usage of Digital Libraries: Byting the Bullet
2.8 Electronic journals and other organized databases
Some reports are already available on the dramatic increase in usage of scholarly information that is easily available. Traditionally, theses and dissertations have been practically invisible, used primarily within the institution where they were written, and even there, they were not accessed frequently. Free access to digital versions is now leading to an upsurge in usage, as is described in McMillan et al. (1999).
In the remainder of this section, even though it is not fully justified, I will equate a full article download with a reading as measured by Don King and his collaborators.
The entire American Mathematical Society e-math system was running at about 1.2 million "hits" per month in early 1999. The Ginsparg archive (arXiv) at Los Alamos was getting about 2 million hits per month. The netlib system of Jack Dongarra and Eric Grosse was at about 2.5 million hits per month.
For detailed statistics on usage and growth of JSTOR, see (Guthrie, this volume). By the end of 1999, its usage was several million a month, whether one counts hits or full article downloads, and was growing at over 100% per year.
The Brazilian SciELO (Scientific Electronic Library Online) project, available at http://www.scielo.br/ , started in early 1998. It appears to be still going through the initial period of explosive growth. In January 1999, 4,943 pages were transmitted. A year later, that number had grown to 63,695. In 1999, 67,143 hosts requested pages, so it was not just a small group of users who were involved. It is too early to tell how fast usage will continue to grow, but it seems worth listing this project to show that even the less industrialized countries are participating in making literature freely available.
Paul Ginsparg's arXiv had about 100,000 papers in early 1999, and was running at a rate of about 7 million full article downloads per year. Thus on average each article was downloaded about 70 times per year. These download statistics were just for the main Los Alamos server. If we assume that the more than a dozen mirrors collectively see as much activity as the main server, then we get a download rate of about 140 times per year per article. This is misleading, though, since it mixes old and new papers, which have different utilization patterns.
If we look at download activity for arXiv articles as a function of time, we find that on average an article gets downloaded around 150 times within one year of its submission, and then 20 to 30 times a year in subsequent years. In particular, even articles submitted around 1991 get downloaded that often. Since this again covers just the main server, we probably should again multiply these numbers by two to get total activity. If we do that, we get into the range of readings per article that established journals experience. The pattern of usage differs from that observed by King and others for printed journal articles. Those are read primarily in the six months after publication, and then the frequency with which they are accessed decreases.
The Electronic Journal of Combinatorics had published about 200 articles by early 1999, and had about 30,000 full article downloads from its main site during 1999. That is an average of 150 downloads per article per year. Multiplying that by two to account for the many mirror sites again gets us to about 300 downloads per article per year. Data about distribution of downloads with time is not available.
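The per-article averages quoted above for arXiv and the Electronic Journal of Combinatorics follow from simple division. The short Python sketch below reproduces them. The function name and the `mirror_factor` parameter are illustrative, not from the source, and the factor of two for mirrors is the same rough assumption made in the text, not a measured value.

```python
def downloads_per_article(total_downloads, articles, mirror_factor=1.0):
    """Average yearly downloads per article.

    mirror_factor is an assumed multiplier for mirror-site traffic
    (1.0 = main server only, 2.0 = mirrors match the main server).
    """
    return total_downloads / articles * mirror_factor


# arXiv, early 1999: about 7 million downloads/year over about 100,000 papers.
arxiv_main = downloads_per_article(7_000_000, 100_000)
arxiv_all = downloads_per_article(7_000_000, 100_000, mirror_factor=2.0)

# Electronic Journal of Combinatorics, 1999: about 30,000 downloads, 200 articles.
ejc_main = downloads_per_article(30_000, 200)
ejc_all = downloads_per_article(30_000, 200, mirror_factor=2.0)

print(arxiv_main, arxiv_all)  # 70.0 140.0
print(ejc_main, ejc_all)      # 150.0 300.0
```

As the text cautions, these are averages over collections that mix old and new papers, so they say nothing about how downloads are distributed over an article's lifetime.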
The general impression from the statistics quoted above is that articles in electronic archives and electronic journals may not yet be read as frequently as printed journal articles, but are getting close. On the other hand, some sources appear to be used much more frequently online than they would be in print.
Additional evidence that online access changes scholars' reading patterns is provided by First Monday, "the peer-reviewed journal of the Internet," at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ . Issues are made freely available on the first Monday of each month. First Monday started publication in May 1996. About 3,600 people subscribe to the e-mail notification service.
First Monday has provided me with access to the logs of their U.S. Web server from January 1999 through February 2000. This is not sufficient for a careful statistical study, but some interesting patterns can be discerned in the data.
Over this period, the number of full paper downloads has grown from a range of 50,000 to 60,000 per month in early 1999, to between 110,000 and 120,000 per month in early 2000. Distinct hosts requesting articles have increased from between 12,000 and 15,000 to over 20,000 each month. Thus the growth rate of requests has been close to the 100% per year that has occurred frequently on the Internet. Since there are only 3,600 subscribers, this suggests many others learn of the material through word of mouth, e-mail, or other methods.
In a typical month, the largest number of downloads is to articles from that month's issue. In subsequent months, accesses to an issue drop in a pattern similar to that found by Don King in his studies of print journals. Half a year later, downloads are usually down to a quarter or even a sixth of the first month's rate. At that stage, though, the story changes. Whereas for print journals, usage continues to decrease with time, for First Monday it appears to increase. For example, there were 9,064 full article downloads from all the 1997 issues in February 1999, and 19,378 in February 2000. Thus accesses to the 1997 issues kept pace with the general growth of usage. Of the articles that were most frequently downloaded in 1999, 6 of the top 10 had been published in previous years. This supports the thesis that easy online access leads to much wider usage of older materials.
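The claim that accesses to the 1997 issues kept pace with overall growth can be checked with a quick calculation. The sketch below is only a rough illustration: the overall figures use the midpoints of the quoted monthly ranges (55,000 and 115,000), which is my assumption and not a number from the logs, and `growth_percent` is a hypothetical helper name.

```python
def growth_percent(before, after):
    """Percentage growth from `before` to `after`."""
    return (after / before - 1.0) * 100.0


# Downloads of all 1997 issues: February 1999 vs. February 2000.
old_issues = growth_percent(9_064, 19_378)

# Overall monthly downloads, using assumed midpoints of the quoted ranges
# (50,000-60,000 in early 1999; 110,000-120,000 in early 2000).
overall = growth_percent(55_000, 115_000)

print(f"1997 issues: {old_issues:.0f}%")  # 1997 issues: 114%
print(f"overall:     {overall:.0f}%")     # overall:     109%
```

Both figures land a little above 100% over roughly a year, which is consistent with the observation that older issues shared in the general growth instead of fading as print articles do.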
My personal Web page, which was at AT&T until August 2001, and is now at http://www.dtc.umn.edu/~odlyzko/doc/internet.size.pdf, has also seen rapid growth in usage. However, it is hard to discuss growth rates meaningfully in a short space, since most of the growth came from new papers in new areas. Instead, I will discuss the usage patterns that I have observed.
During January 2000, there were 10,360 hits on my home page from 1,808 hosts, excluding requests for .gif files and hits from obvious crawlers. Most of these 1,808 hosts only looked at various index files. If we exclude those, as well as the ones that downloaded only my CV or only abstracts of papers, we are left with 656 hosts that downloaded 1,198 full copies of articles. Of those 656 hosts, 494 downloaded just a single paper. Many of those 494 requested a specific URL for an article as opposed to looking at the home page for pointers, and then disappeared. Thus on average the people who visited my home page seemed to know what they were looking for, got it, and moved on.
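The filtering steps just described (drop images and crawlers, ignore index pages, the CV, and abstracts, then count hosts and full-paper downloads) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the input is taken to be already-parsed (host, path) pairs, and the path conventions in `is_full_article` (papers under `/doc/`, abstracts with "abstract" in the file name, a CV file starting with "cv") are hypothetical.

```python
from collections import defaultdict


def is_full_article(path):
    # Hypothetical convention: full papers live under /doc/ and are
    # neither abstracts nor the CV. Real sites will differ.
    name = path.rsplit("/", 1)[-1]
    return (path.startswith("/doc/")
            and "abstract" not in name
            and not name.startswith("cv"))


def summarize(requests, crawler_hosts=frozenset()):
    """Return (hosts, downloads, single_paper_hosts) for full articles."""
    papers_by_host = defaultdict(list)
    for host, path in requests:
        if host in crawler_hosts or path.endswith(".gif"):
            continue  # drop known crawlers and image requests
        if is_full_article(path):
            papers_by_host[host].append(path)
    hosts = len(papers_by_host)
    downloads = sum(len(p) for p in papers_by_host.values())
    single = sum(1 for p in papers_by_host.values() if len(set(p)) == 1)
    return hosts, downloads, single


# Tiny made-up sample, for illustration only.
sample = [
    ("a.example", "/index.html"),
    ("a.example", "/doc/paper1.pdf"),
    ("b.example", "/doc/paper1.pdf"),
    ("b.example", "/doc/paper2.ps"),
    ("c.example", "/doc/paper1.abstract.txt"),
    ("d.example", "/doc/cv.pdf"),
    ("bot.example", "/doc/paper1.pdf"),
    ("a.example", "/logo.gif"),
]
result = summarize(sample, crawler_hosts={"bot.example"})
print(result)  # (2, 3, 1)
```

Applied to the January 2000 figures above, the same three numbers are 656 hosts, 1,198 downloads, and 494 single-paper hosts, which works out to about 1.8 papers per host, with roughly three quarters of hosts taking exactly one paper.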
Visitors to my Web page were remarkably quiet in the face of some obvious faults. Many of the papers posted on that page, especially old ones, are incomplete, in that they are early versions, and usually do not have figures that are present in the printed versions. Still, that occasions few complaints. For example, in 1999, a posting to a number theory mailing list resulted in 152 downloads of a paper in the space of less than two weeks. However, only one person complained about the lack of figures in the Web version, even though they are very helpful in visualizing the behavior shown in the paper.
Another anecdotal piece of evidence demonstrates what happens on the Web. Several times people have told me they were glad to meet me, as they had read my papers and benefited from them. Conversation showed that they indeed were familiar with the papers in question. However, they also told me that they had lost the URL, and would I please remind them where my home page was? Even though finding my home page on the Web is easy, since my name is not a particularly common one, they obviously did not find it necessary to bother doing so. This, as well as the lack of complaints regarding incomplete papers, suggests a world of plenty. People are guided to Web pages by a variety of cues, get whatever they can from those pages, and move on to other things. It is not a world of a few precious treasures with no substitutes.
The importance of making material easily available was demonstrated in a very graphic form when I made .pdf versions of my technical papers available in April 1998. There was an immediate jump in the rate of downloads. Prior to that, mathematical papers were available only in .ps and .tex formats, and the ones on electronic publishing and related topics in .ps and straight text. Most PC owners do not have easy access to tools for reading .ps papers, and were apparently bypassing the available material that required extra effort from them. This is similar to observations of Academic Press and the American Institute of Physics (Luther, 2001) that better interfaces lead to higher usage.
The temporal pattern of article usage on my Web page shows the behavior that was already noted for arXiv and for First Monday. After an initial period, frequency of access does not vary with age of article, and stays relatively constant with time, after discounting for general growth in usage.
There is more evidence that easy online access leads to changes in usage patterns. For example, downloads from my home page go to a variety of sources all over the world. Some are leading to email correspondence from places like Pakistan, the Philippines, or Mexico. This is not surprising in itself, since those countries do have technically educated populations that are growing. What is interesting is that this correspondence predominantly refers to papers that have been downloaded electronically, and to copies of older papers that are not available in digital form, and which the requesters had learned about from my home page. This does suggest strongly that easy availability is stimulating interest from a much wider audience. This conclusion is also supported by similar observations concerning correspondence with people in industrialized countries. Much of it comes from people outside the universities and large research institutions that have good libraries, people who would be unlikely to read my papers in print.
In a small fraction of cases the referrer field on requests shows where the requester found the URL. In many cases, such requests come from reading lists in college or graduate courses.
As a final note, spikes in usage often occur when one of my papers is mentioned in some newsletter or discussion group. For example, Bruce Schneier publishes CRYPTO-GRAM, a monthly email newsletter on cryptography and computer security, with a circulation of about 20,000. In early August 1999, CRYPTO-GRAM mentioned a recent preprint of mine which I had not advertised much, and which was about to appear in a regular print journal. Over the next two weeks over a thousand copies were downloaded. I am convinced that this is a higher figure than the number of times the printed version will be read.
The CRYPTO-GRAM example, as well as those of other visits to my home page, suggests that informal versions of peer review are in operation. A recommendation from someone, or a reference in a paper that the reader trusts, can each serve to validate even unpublished preprints. Scholars pursue a variety of cues in selecting what material to access.