Economics and Usage of Digital Libraries: Byting the Bullet
2. The Rapid Evolution of Scholarly Communication[†]
Traditional journals, even those available electronically, are changing slowly. However, scholarly communication is rapidly evolving to electronic formats. In some areas, electronic versions of papers are being read about as often as the printed versions. Although there are serious difficulties in comparing figures from different media, growth in the use of electronic scholarly information is sufficiently high that, if it continues for a few years, print versions will no doubt be eclipsed. Further, much electronic information is accessed outside the formal scholarly publication process. There is vigorous growth in forms of electronic communication that take advantage of the unique capabilities of the Web, and that simply do not fit into the traditional journal publishing format.
This paper presents statistics on the use of print and electronic information. It also discusses preliminary evidence about the changing patterns of usage. This evidence indicates that to stay relevant, scholars, publishers, and librarians will have to make even larger efforts to make their material easily accessible.
Traditional journals and libraries have been vital components of scholarly communication. They are evolving, but slowly. The reasons for this are discussed briefly in Section 2.2 and, in more detail, in Odlyzko (1997b). The danger is that they might be rapidly losing their value, and could become irrelevant.
At first sight, there seems little cause for concern. Print journal subscriptions are declining, but gradually. One often hears of attrition in subscriptions of 3-5% per year. For example, the American Physical Society, with high quality and relatively inexpensive journals, has seen a steady decrease of about 3% per year (Lustig, 1997). At those rates, losing half the circulation takes between 14 and 24 years. On Internet time, that is almost an eternity. Preprints in most areas are still a small fraction of what gets published. Also, library usage is sometimes reported as declining, but again at modest rates.
Yet these are not reasons for complacency. Why should there be any declines at all? Ours is an information age; the number of people getting college and postgraduate education is growing rapidly, and spending on R&D and implementation of new technologies is skyrocketing. Why should established journal subscriptions be dropping, and why should many of the recent specialized journals be regarded as successes if they reach a circulation of 300? Why should many research monographs be printed in runs smaller than the roughly 500 copies of the first edition of Copernicus' De revolutionibus orbium coelestium of 1543?
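As a rough check on the time scale quoted above for subscription attrition, the following sketch computes how long a steady annual decline takes to halve circulation. The function name `halving_time` is illustrative, and the calculation assumes the decline compounds at a constant rate every year. Exact compounding gives somewhat shorter times than the familiar rule-of-72 shortcut, which reproduces the 14 to 24 year range.

```python
import math

def halving_time(annual_decline):
    """Years for circulation to fall by half at a constant annual decline.

    annual_decline is a fraction, e.g. 0.03 for a 3% yearly loss.
    Assumes the same percentage is lost every year (compound decline).
    """
    return math.log(0.5) / math.log(1.0 - annual_decline)

for rate in (0.03, 0.05):
    exact = halving_time(rate)
    # Rule-of-72 shortcut: halving time is roughly 72 / (percent per year).
    rule_of_72 = 72 / (rate * 100)
    print(f"{rate:.0%} decline: {exact:.1f} years exact, "
          f"{rule_of_72:.1f} years by the rule of 72")
```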
My conclusion is that the current scholarly information system is badly flawed, and does not provide required services. This paper presents evidence that the demand for high quality scholarly information is indeed growing, and can only be satisfied through easy availability on the Web.
Some of the early studies of electronic usage, such as Lenares' interesting 1999 paper, concentrated on faculty at leading research institutions. Change might be expected to be slow in such places. Although such scholars usually have the resources to be pioneers, they have little incentive, since they have access to good libraries. The evidence to be presented later shows that the current system neglects the needs of growing ranks of scholars who are not at such institutions. Thus, it is better to concentrate on these scholars and their usage of information that is freely available over the Internet.
Tenopir et al. (2000) does show that, among established scholars, electronic resources play an increasing role, but that current usage is dominated by traditional media. However, it is important to look at growth rates rather than absolute numbers. In an early 1999 discussion in a librarians' mailing list, somebody pointed out that, in 1998, only 20% of the astronomy papers were submitted to Ginsparg's xxx paper archive, now called the arXiv, at http://www.arxiv.org . An immediate rejoinder from another participant was that, while this was true, the corresponding percentage had been around 7% in 1995. It is growth rates that tell us what is in our future.
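The growth rate implied by the two astronomy figures can be worked out with a short calculation. This is only a sketch: it assumes smooth compound growth between 1995 and 1998, which two data points alone cannot confirm, and the function name is illustrative.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two observations."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Share of astronomy papers submitted to the archive: about 7% in 1995, 20% in 1998.
rate = annual_growth_rate(7.0, 20.0, 1998 - 1995)
print(f"Implied growth in submission share: {rate:.0%} per year")
```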
This paper is only a brief attempt at finding patterns in the use of online information. At the moment, we have little data about online usage patterns. This is especially regrettable since these patterns appear to be in the midst of substantial change. What we need are careful studies, such as have been carried out for print media. Although the Web in principle makes it possible to provide extremely detailed information about usage, in practice there is little data collection and analysis, especially in scholarly publishing. Even when data are collected, they are seldom released. Thus one purpose in writing the initial draft of this paper was to stimulate further collection and dissemination of usage data. The main purpose, though, was to look for patterns even with the limited data available to me, to provide a starting point for further research.
Fortunately, many new studies of electronic resources have appeared recently. In general, they do support most of the tentative conclusions of this paper, which are:
- Usage of online scholarly material is growing rapidly, and in some cases already appears to surpass the use of traditional print journals. Much online usage appears to come from new readers and often from places that do not have access to print journals.
- We can expect the growth of online material to accelerate, especially as the information about usage patterns becomes widely known. Until recently, scholars did not have much incentive to put their works on the Web, as this did not create many new readers. While we can expect that snobbery will retard this step ("I can reach the dozen top experts in my field by publishing in Physical Review Letters, or by sending them my preprint directly, why do I care about the great unwashed?"), the attraction of a much greater audience on the Web and the danger that anything not on the Web will be neglected are likely to become major spurs to scholars making their works available online. For example, the recent study by Lawrence (2001) shows that papers in computer science that are freely available online are cited much more frequently than others. Anderson et al. (2001) might appear to suggest the opposite, since in this study free online availability was associated with lower citation frequency. However, that result is likely anomalous, in that the freely available online-only articles in the journal under study were apparently perceived widely, even if incorrectly, as of inferior quality.
- The need for traditional peer review is overrated. Odlyzko (1995) had extensive discussion of the inadequacy of conventional peer review, and how much more useful forms were likely to evolve on the Internet. That paper was written before the ascendancy of the Web. While open review and comments on published papers have been slow to take hold, online references and bibliographies are developing into a new form of peer review. People are coming to my Web page in large numbers looking for specific papers. While in almost all cases I do not know what brings them there, it is pretty clear that they are finding links to the material in a variety of sources, such as bibliographies and references on other home pages. A new form of peer review, it brings many readers even for papers published in obscure and unrefereed places.
- Concerns about information overload and chaos on the Net are exaggerated. While better organization of the material would surely be desirable, people are finding their way to the serious information sources in growing numbers as is.
- Ease of access and ease of use are paramount. Material on the Web is growing, and scholars, like the commercial content producers, are engaged in a war for the eyeballs. Readers will settle for inferior forms of papers if those are the ones that can be reached easily.
- Novel forms of scholarly communication are evolving that are outside the boundaries of traditional journals.
These conclusions and predictions are supported by data in the rest of this paper. It does appear that while journals are not changing fast, scholarly communication as a whole is evolving rapidly.
2.2 Rates of technological change
The conventional notion of "Internet time," in which technological change is accelerated tremendously, is a myth. Rapid change does occur occasionally, and the adoption of Web browsers is frequently cited as an example. Less than 18 months after the release of the first preliminary version of the Mosaic browser, Web transmissions constituted more than half of Internet traffic. However, this was a singular exception. Cell phones, faxes, and ATM machines took much longer to spread. Even on the Internet, new systems are usually adopted much more slowly. How come IPv6 is still basically invisible? Why is HTTP1.1 spreading so slowly? How about TeX and its various dialects (which go back more than two decades)? Even at universities, e-mail took a while to diffuse. The Internet has changed much, but it has not made for a dramatic increase in the pace at which new technologies diffuse. A typical time scale for significant changes is still on the order of a decade. This was noted a long time ago: "A modern maxim says: People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years." (Licklider, 1965, p.17)
Further discussion of rates of change is available in Odlyzko (1997b), which presents many examples (such as music CDs, ATM machines, credit cards, and cell phones) supporting the thesis that consumer adoption of new technologies is slow. Thus we should not be surprised if electronic scholarly communication does not turn on a dime.
The rare rapid adoptions of new technologies (aside from unusual situations such as that of the Web) appear to be associated with the presence of forcing agents that can compel rapid change (Odlyzko, 1997b). On the other hand, sociological changes tend to be very slow, taking a generation or two.
Aside from simply observing that historically, new technologies have been taking on the order of a decade to be widely adopted, one can also build statistical time simulations that explain this time scale. For instance, we know that usage of electronic forms of scholarly information has typically been growing at 50 to 100 percent per year. This is shown in various tables in this paper. On the other hand, print usage has shown little change. Supposing that print usage remains static, from the moment electronic usage breaks the one percent threshold at which it is likely to be noticed, growth rates of 50 to 100% would only yield parity with print usage after approximately a decade.
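The time-to-parity argument can be made concrete with a small calculation. The sketch below assumes, as the text does, that print usage stays fixed; it additionally assumes that the one percent threshold is measured relative to print usage and that electronic usage grows at a constant compound rate. The function name is illustrative.

```python
import math

def years_to_parity(initial_share, annual_growth):
    """Years until electronic usage equals a static level of print usage.

    initial_share: electronic usage as a fraction of print usage (0.01 = 1%).
    annual_growth: yearly growth as a fraction (0.5 = 50%, 1.0 = 100%).
    """
    return math.log(1.0 / initial_share) / math.log(1.0 + annual_growth)

for growth in (0.5, 1.0):
    years = years_to_parity(0.01, growth)
    print(f"{growth:.0%} annual growth: parity after about {years:.1f} years")
```

Under these assumptions the two growth rates bracket the "approximately a decade" time scale, with doubling every year reaching parity in under seven years and 50% growth taking a little over eleven.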
2.3 Disruptive technologies
Clayton Christensen's book (1997) has become a modern classic. It helps explain the failure of successful organizations, such as Encyclopaedia Britannica, to adopt new technologies. The example of the Britannica, cited in Odlyzko (1995, 1999), is very instructive. It was and remains the most scholarly of the English-language encyclopedias. However, it could not cope with the challenges posed first by inexpensive CD-ROM encyclopedias, and more recently by the Web.
What Christensen calls disruptive technologies tend to have three important characteristics:
- they initially underperform established products
- they enable new applications for new customers
- their performance improves rapidly
Electronic publishing has these characteristics. Little material was available initially, screen resolution was poor, printers were not widely available and expensive, and so on. However, online material was easy to locate and access, and could provide novel features, such as the constant updating of the genome database. Moreover, costs, quality, and availability have all been improving rapidly. That is why direct comparisons of traditional journals or libraries with electronic collections are not directly relevant. For example, the 1998 paper by Stevens-Rayburn and Bouton is effective in demonstrating that the Web at that time could not substitute for a regular library. It still can't, even in 2000. However, that is not the relevant question.
The mainframe was not dethroned by the PC directly. The PC could not replace the big machines in areas such as payroll processing. The computing power of the mainframes sold each year is still increasing, and has been increasing all along, even when IBM was going through its traumatic downsizing in the early 1990s. It's just that the PC market has been growing much faster. The mainframe has been consigned to a small niche, and the revenues from that niche have been declining. This is a useful analogy to keep in mind. Traditional journals and libraries are still playing a vital role, but, to quote from Odlyzko (1997b), "... journals are not where the interesting action is." The real issue, to quote Stevens-Rayburn and Bouton (1998), is that "in this new electronic age, if it isn't on-line, for many purposes it might as well not exist." Further, even if it is online, it might not matter if it is not easy to access or is not timely.
2.4 Effects of barriers to use
Even small barriers to access reduce usage significantly. Statistics collected by Don King and his collaborators show that as the physical distance to a library increases, usage decreases dramatically. A recent statistical tidbit of a similar nature is the reaction of the mathematicians at Penn State when all journal issues published before 1973 had to be sent to off-site storage because of space limitations. This move was widely disliked, even though any volume can be obtained within one day. The interesting thing is that the mathematical research community of about 200 faculty, visitors, and graduate students asks for only about 850 items to be recalled from storage per year. That is just over 4 items per person per year. It seems likely (based on extrapolations from circulation figures for bound journals that are immediately available on shelves) that usage of this material was much higher when it was easily accessible in the library in their building.
When subscriptions to journals are canceled, articles from those journals are obtained through interlibrary loans or document delivery services. Some libraries (Louisiana State University's perhaps most prominent among them) have consciously decided to replace journal subscriptions with document delivery, after making a calculation of how much the journals cost per article read. While I do not have comprehensive statistics, my impression is that such moves save more than preliminary computations suggest. The secret behind this phenomenon is that usage of document delivery services is lower than that of journals available right on the spot. Having to fill out a request form and wait a day or a week reduces demand.
Librarians have known for a long time that ease of use is crucial. They experienced this with card catalogs, where materials whose catalog entries were available only in the paper card catalogs were not being used. Thus the current shift towards online usage had been anticipated.
... there's a sense in which the journal articles prior to the inception of that electronic abstracting and indexing database may as well not exist, because they are so difficult to find. Now that we are starting to see, in libraries, full-text showing up online, I think we are very shortly going to cross a sort of critical mass boundary where those publications that are not instantly available in full-text will become kind of second-rate in a sense, not because their quality is low, but just because people will prefer the accessibility of things they can get right away.
Clifford Lynch, 1997, quoted in Stevens-Rayburn and Bouton (1998)
Today, we have evidence that Clifford Lynch was correct. Note that Encyclopaedia Britannica has been a victim of this trend. Being the best did not protect it from declines in revenues, restructuring, and being forced to experiment with several business models.
The shift to online usage is exposing many of the limitations of the traditional system. Research libraries are wonderful institutions. They do provide the best service that was possible with print technology. However, in today's environment, that is not enough. Most printed scholarly papers are available typically in something like 1,000 research libraries. Those libraries are accessible to a decreasing fraction of the growing population of educated people who need them. Further, even for those scholars fortunate enough to be at an institution with a good library, the sizes of the collections are making material harder to access. Hours of availability are limited. Also, studies have shown that even when a book that is searched for is in a given library's collection, in about 40% of the cases it cannot be found when needed.
The basic problem, of course, is that it is impossible in the print world to make everything easily accessible even in the best library in the world. Space constraints mean that some material will be far from the user. In practice, most libraries can store only a tiny fraction of the material that might be of interest to their patrons. While they have been careful about selecting what seemed to be most relevant, experience shows that when easy electronic access is provided to large bodies of material not normally available in the library, there is demand for it (Luther, 2001; Bensman and Wilder, 1998). That is a major factor propelling the move towards bundling of electronic journal offerings and consortium pricing (Odlyzko, 1999).
The easy access to online resources is leading to increasing usage, as will be discussed later, and is also documented in Anderson et al. (2001), Gazzale and MacKie-Mason (this volume), Guthrie (this volume) and Luther (2001). But not all online access is equal. Many scholars use Amazon.com's search page as a first choice in doing bibliographic searches for recent books, since it is more user-friendly than the electronic catalogs of the Library of Congress, say. Luther (2001) notes, "Both Academic Press and the American Institute of Physics (AIP) noted that they experienced surges in usage after they introduced new platforms that simplified navigation and access."
Ease of use has an important bearing on pricing. Odlyzko (1995) predicted that pay-per-view was likely doomed to fail in scholarly publishing, because of its deterrent effect on usage. Publishers have now, after experiments with PEAK and other pricing models, moved to this view as well. For example, Hunter (this volume) states that
[Elsevier's] goal is to give people access to as much information as possible on a flat fee, unlimited use basis. [Elsevier's] experience has been that as soon as the usage is metered on a per-article basis, there is an inhibition on use or a concern about exceeding some budget allocation.
Similarly, Luther (2001) points out that "Philosophically, Academic Press is opposed to a business model in which charges increase with use because it discourages use."
Easy access implies not only greater use, but also changing patterns of use. For example, a recent news story discussed how the Internet is altering the doctor-patient relationship (Kolata, 2000). The example that opens the story is of a lady who is reluctantly told by the doctor she might have lupus, and leaves the clinic terrified of what this might be. She then proceeds to obtain information about this disease from the Internet. When she returns to see a different, more pleasant physician, she is well-informed and prepared to question the diagnosis and possible treatment. What is remarkable about this story is that the basic approach of this patient was feasible before the arrival of the Web. She could have gone to her local library, where the reference librarians would have been delighted to point her to many excellent print sources of medical information. However, few people availed themselves of such opportunities before. Now, with the easy availability of the Web, we see a different story.
The arguments about effects of barriers to access and of lowering such barriers suggest that scholarly communication will undergo substantial changes. We should expect to see greater use of online material. We should also see much greater use of it by people outside the narrow disciplinary areas that produce it. Much of this use will come from outside the traditional academic and research institutions, but a considerable portion is likely to come from other departments within an institution. Further, the increasing volume of material, as well as the decreasing role of traditional peer review, are likely to lead to greater demand for survey and handbook material. With lower barriers to interactions and access to specialized literature, we should also see more interdisciplinary work.
2.5 Scholarly information as a commodity
Authors like to think of their articles as precious resources that are absolutely unique and for which no substitutes can be found. Yet a more accurate picture is that any one article is just one item in a river of knowledge, and that this river is rising. Substitutes exist for almost everything. Some people interested in Fermat's Last Theorem will want, for historical or other reasons, to see Andrew Wiles' original paper (Wiles, 1995). Many others will be happy with a reference to where and when that paper was published, and others will be satisfied with various popular accounts of the proof. Even those interested in the technical details will often be satisfied with, and often be better served by, other presentations, such as that in the Darmon, Diamond, and Taylor account of the proof (Darmon et al., 1997).
Thinking about a river of knowledge instead of a collection of unique and irreplaceable nuggets helps explain why scholars manage to function even with a badly flawed information system. Even though in 40% of the cases a desired book cannot be retrieved from the library's shelves, usually some other book covering the same topic can be found. Spending on libraries by research universities is correlated most strongly with total budgets, and very weakly with quality. Harvard spends about $70 million per year on its libraries, versus $25 million for Princeton. Yet would anyone claim that a Harvard education or scholarly output is almost three times as good as that of Princeton?
The Internet is reducing the costs of production and distribution of information. As a result, there is a flood of material. Much is of low quality, but a substantial fraction is very good. Before looking at whether scholars are using this material, let us consider usage of print material.
2.6 Usage of print journals
We are fortunate to have an excellent recent survey of usage of print journals in the book of Carol Tenopir and Don King (2000). It shows that a typical technical paper is read between 500 and 1500 times, where a reading is defined as going beyond just glancing at the title and abstract, though not necessarily reading the paper carefully. These readings average about one hour in length, and in about half the cases represent the reader's first encounter with an article.
The estimate of 500 to 1500 readings per article is much higher than some earlier studies had come up with. The studies on which Tenopir and King base their estimates do have biases that may raise the reading estimates above the true value. For example, they are based on self-reporting by technical professionals, who may overestimate their readings. Further, those figures include articles in technical journals with large circulations (such as Science, Nature, and IEEE Spectrum) that are not typical of library holdings. If one considers library usage studies, such as those that have been carried out at the University of Wisconsin in Madison, one comes up with somewhat lower estimates for the number of readings per paper. Still, the basic conclusion that a typical technical paper is read several hundred times appears valid.
The studies reported in Tenopir and King (2000) also show that, in the print world, articles are usually read mostly in the first half-year after publication. Afterwards, usage drops off sharply.
2.7 Growth in usage of electronic information
It is hard to measure online activity accurately. The earliest and still widely used measure is that of "hits," or requests for a file. Unfortunately, with the growth of complicated pages, that measure is harder to evaluate. When possible, I prefer to look at full article downloads. Finally, as a conservative measure, one can look at the number of hosts (unique IP addresses) that requested information from a server. Even then, there are considerable uncertainties. The same person may send requests from several hosts. On the other hand, common employment of proxies and caches means that many people may hide behind a single host address, and a single download may lead to multiple users obtaining copies (as happens when papers are forwarded via email as well).
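To illustrate how the three measures differ, here is a minimal sketch that counts hits, full article downloads, and distinct hosts from a server log. Everything specific in it is an assumption made for illustration: the sample lines are invented, they follow the Common Log Format (real logs are often configured differently), and a "full article download" is approximated as a successful GET of a file ending in `.pdf` or `.ps`.

```python
import re

# Hypothetical sample in Common Log Format; addresses and paths are invented.
SAMPLE_LOG = """\
192.0.2.1 - - [10/Jan/1999:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1043
192.0.2.1 - - [10/Jan/1999:10:00:01 -0500] "GET /logo.gif HTTP/1.0" 200 512
192.0.2.1 - - [10/Jan/1999:10:00:09 -0500] "GET /papers/a1.pdf HTTP/1.0" 200 88211
198.51.100.7 - - [10/Jan/1999:11:30:00 -0500] "GET /papers/a1.pdf HTTP/1.0" 200 88211
198.51.100.7 - - [10/Jan/1999:11:31:00 -0500] "GET /papers/a2.ps HTTP/1.0" 200 90417
203.0.113.5 - - [10/Jan/1999:12:00:00 -0500] "GET /papers/missing.pdf HTTP/1.0" 404 210
"""

# Captures: host, method, path, status code.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+$')

# Assumed file types that count as a full article.
ARTICLE_SUFFIXES = (".pdf", ".ps")

def summarize(log_text):
    """Return counts of hits, article downloads, and distinct hosts."""
    hits = 0
    downloads = 0
    hosts = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        host, method, path, status = match.groups()
        hits += 1          # every logged request is a hit
        hosts.add(host)    # distinct requesting addresses
        is_article = path.lower().endswith(ARTICLE_SUFFIXES)
        if method == "GET" and status == "200" and is_article:
            downloads += 1
    return {"hits": hits, "downloads": downloads, "hosts": len(hosts)}

print(summarize(SAMPLE_LOG))
```

Even in this toy sample the three counts diverge, and none of them corrects for proxies, caches, or one reader using several machines.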
In addition to the uncertainties in interpreting the activity seen at a server, it is hard to compare data from different servers. Logs are set to record different things, and some Web pages are much more complicated than others that have the same or equivalent content. Thus comparing different measures of online activity is of necessity like comparing apples, oranges, pears, bananas, and onions. Some of the difficulties of such comparisons can be avoided by concentrating on rates of growth. If online information access is growing much faster than usage of print material, it will eventually dominate.
In spite of problems inherent in measuring online activity, it is obvious by most measures that the Internet is growing rapidly. Typical growth rates, whether of bytes of traffic on backbones or of hosts, are on the order of 100% per year (Odlyzko, 2000; Coffman and Odlyzko, 1998). When one looks at usage of scholarly information online, typical growth rates are in the 50 to 100% range. For example, Table 2.1 shows the utilization of the online resources of the Library of Congress. Growth, in terms of bytes transmitted, was over 100% per year for three years before decreasing to 90% in 1998, and then decreasing further in 1999, to 38%. It then increased to 62% in 2000. Table 2.2 shows downloads from the AT&T Labs - Research Web site, at http://www.research.att.com/, which contains a variety of papers, software, data, and other technical information. The growth rate there in the number of requests has been around 50% per year for several years, but between 2000 and 2001, it jumped to over 120%.
Some measures of electronic information usage are showing signs of stability, or even decreasing growth. For example, Table 2.3 shows utilization of Leslie Lamport's page devoted to material about a logic for specifying and reasoning about concurrent and reactive systems. Usage had been pretty stable in 1996 through 1998. When I corresponded with him about this in 1999, he thought usage had reached a steady state, with the entire community interested in this esoteric technical subject already accessing the page as much as they would ever need to do. However, the final counts for 1999 and 2000 showed substantial increases.
The next few sections discuss data about several online information sources that are freely available on the Internet.
2.8 Electronic journals and other organized databases
Some reports are already available on the dramatic increase in usage of scholarly information that is easily available. Traditionally, theses and dissertations have been practically invisible, used primarily within the institution where they were written, and even there, they were not accessed frequently. Free access to digital versions is now leading to an upsurge in usage, as is described in McMillan et al. (1999).
In the remainder of this section, even though it is not fully justified, I will equate a full article download with a reading as measured by Don King and his collaborators.
The entire American Mathematical Society e-math system was running at about 1.2 million "hits" per month in early 1999. The Ginsparg archive (arXiv) at Los Alamos was getting about 2 million hits per month. The netlib system of Jack Dongarra and Eric Grosse was at about 2.5 million hits per month.
For detailed statistics on usage and growth of JSTOR, see (Guthrie, this volume). By the end of 1999, its usage was several million a month, whether one counts hits or full article downloads, and was growing at over 100% per year.
The Brazilian SciELO (Scientific Electronic Library Online) project, available at http://www.scielo.br/, started out in early 1998. It appears to be still going through the initial period of explosive growth. In January 1999, 4,943 pages were transmitted. A year later, that number had grown to 63,695. In 1999, 67,143 hosts requested pages, so it was not just a small group of users who were involved. It is too early to tell how fast it will continue to grow, but it seems worth listing this project to show that even the less industrialized countries are participating in making literature freely available.
Paul Ginsparg's arXiv had about 100,000 papers in early 1999, and was running at a rate of about 7 million full article downloads per year. Thus on average each article was downloaded about 70 times per year. These download statistics were just for the main Los Alamos server. If we assume that the more than a dozen mirrors collectively see as much activity as the main server, then we get a download rate of about 140 times per year per article. This is misleading, though, since it mixes old and new papers, which have different utilization patterns.
If we look at download activity for arXiv articles as a function of time, we find that on average an article gets downloaded around 150 times within one year of its submission, and then 20 to 30 times a year in subsequent years. In particular, even articles submitted around 1991 get downloaded that often. Since this again covers just the main server, we probably should again multiply these numbers by two to get total activity. If we do that, we get into the range of readings per article that established journals experience. The pattern of usage differs from that observed by King and others for printed journal articles. Those are read primarily in the six months after publication, and then the frequency with which they are accessed decreases.
The Electronic Journal of Combinatorics published about 200 articles by early 1999, and had about 30,000 full article downloads from its main site during 1999. That is an average of 150 downloads per article. Multiplying that by two to account for the many mirror sites again gets us to about 300 downloads per article per year. Data about distribution of downloads with time is not available.
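The per-article estimates quoted for the arXiv and for the Electronic Journal of Combinatorics follow from simple division, as the sketch below shows. The doubling for mirror sites is the text's own rough assumption rather than a measured figure, and the function and parameter names are illustrative.

```python
def downloads_per_article(downloads_per_year, articles, mirror_factor=1.0):
    """Average yearly downloads per article.

    mirror_factor scales main-server counts to estimate total activity;
    2.0 encodes the assumption that mirrors match the main server.
    """
    return downloads_per_year * mirror_factor / articles

# arXiv, early 1999: about 100,000 papers and 7 million downloads per year.
arxiv_main = downloads_per_article(7_000_000, 100_000)
arxiv_total = downloads_per_article(7_000_000, 100_000, mirror_factor=2.0)

# Electronic Journal of Combinatorics, 1999: about 200 articles, 30,000 downloads.
ejc_main = downloads_per_article(30_000, 200)
ejc_total = downloads_per_article(30_000, 200, mirror_factor=2.0)

print(f"arXiv: {arxiv_main:.0f} from the main server, {arxiv_total:.0f} with mirrors")
print(f"EJC:   {ejc_main:.0f} from the main site, {ejc_total:.0f} with mirrors")
```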
The general impression from the statistics quoted above is that articles in electronic archives and electronic journals may not yet be read as frequently as printed journal articles, but are getting close. On the other hand, some sources appear to be used much more frequently online than they would be in print.
Additional evidence that online access changes scholars' reading patterns is provided by First Monday, "the peer-reviewed journal of the Internet," at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ . Issues are made freely available on the first Monday of each month. First Monday started publication in May 1996. About 3,600 people subscribe to the e-mail notification service.
First Monday has provided me with access to the logs of their U.S. Web server from January 1999 through February 2000. This is not sufficient for a careful statistical study, but some interesting patterns can be discerned in the data.
Over this period, the number of full paper downloads has grown from a range of 50,000 to 60,000 per month in early 1999, to between 110,000 and 120,000 per month in early 2000. Distinct hosts requesting articles have increased from between 12,000 and 15,000 to over 20,000 each month. Thus the growth rate of requests has been close to the 100% per year that has occurred frequently on the Internet. Since there are only 3,600 subscribers, this suggests many others learn of the material through word of mouth, e-mail, or other methods.
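As a rough check on the growth figure, the ranges quoted above can be turned into growth ratios. This is only a sketch: it assumes that "early 1999" and "early 2000" are about twelve months apart, and the variable names are illustrative.

```python
# Monthly full-paper downloads quoted in the text, as (low, high) ranges.
early_1999 = (50_000, 60_000)
early_2000 = (110_000, 120_000)

# Pair low with low and high with high.
# Assumption: the two periods are roughly 12 months apart.
ratio_low_ends = early_2000[0] / early_1999[0]
ratio_high_ends = early_2000[1] / early_1999[1]

for label, ratio in (("low ends", ratio_low_ends), ("high ends", ratio_high_ends)):
    print(f"{label}: x{ratio:.2f}, i.e. {100 * (ratio - 1):.0f}% growth")
```

Under that assumption the downloads grew by a factor of between 2.0 and 2.2, which is consistent with a rate close to 100% per year.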
In a typical month, the largest number of downloads is to articles from that month's issue. In subsequent months, accesses to an issue drop in a pattern similar to that found by Don King in his studies of print journals. Half a year later, downloads are usually down to a quarter or even a sixth of the first month's rate. At that stage, though, the story changes. Whereas for print journals, usage continues to decrease with time, for First Monday it appears to increase. For example, there were 9,064 full article downloads from all the 1997 issues in February 1999, and 19,378 in February 2000. Thus accesses to the 1997 issues kept pace with the general growth of usage. Of the articles that were most frequently downloaded in 1999, 6 of the top 10 had been published in previous years. This supports the thesis that easy online access leads to much wider usage of older materials.
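The claim that accesses to the 1997 issues kept pace with overall growth can be checked directly from the two February figures. In this short sketch, the comparison value of 2.0 stands for the rough doubling of total requests over the same period, and the variable names are illustrative.

```python
# Full article downloads from all 1997 issues of First Monday, as quoted in the text.
feb_1999 = 9_064
feb_2000 = 19_378

ratio_1997_issues = feb_2000 / feb_1999
overall_ratio = 2.0  # rough doubling of total requests over the same period (from the text)

print(f"1997 issues: x{ratio_1997_issues:.2f} over twelve months")
print("kept pace with overall growth:", ratio_1997_issues >= overall_ratio)
```

Downloads of the two-year-old issues slightly more than doubled, so older material did not lose ground as total usage grew.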
My personal Web page, which was at AT&T until August 2001, and is now at http://www.dtc.umn.edu/~odlyzko/doc/internet.size.pdf, has also seen rapid growth in usage. However, it is hard to discuss growth rates meaningfully in a short space, since most of the growth came from new papers in new areas. Instead, I will discuss the usage patterns that I have observed.
During January 2000, there were 10,360 hits on my home page from 1,808 hosts, excluding .gif files, and hits from obvious crawlers. Most of these 1,808 hosts only looked at various index files. If we exclude those, as well as the ones that downloaded only my cv or only abstracts of papers, we are left with 656 hosts that downloaded 1,198 full copies of articles. Of those 656 hosts, 494 downloaded just a single paper. Many of those 494 requested a specific URL for an article as opposed to looking at the home page for pointers, and then disappeared. Thus on average the people who visited my home page seemed to know what they were looking for, got it, and moved on.
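The host counts above support a little more arithmetic, sketched here with illustrative variable names. The only inputs are the four figures quoted in the text. The derived quantities (hosts that took more than one paper, and the downloads attributable to them) are simple differences.

```python
# January 2000 figures quoted in the text.
all_hosts = 1_808           # hosts hitting the home page (crawlers and .gif hits excluded)
downloading_hosts = 656     # hosts that fetched at least one full article
full_downloads = 1_198      # full article copies downloaded
single_paper_hosts = 494    # hosts that fetched exactly one paper

# Hosts that fetched more than one paper, and the downloads attributable to them.
multi_paper_hosts = downloading_hosts - single_paper_hosts
multi_paper_downloads = full_downloads - single_paper_hosts

print(f"share of hosts downloading a full article: {downloading_hosts / all_hosts:.0%}")
print(f"share of those taking a single paper:      {single_paper_hosts / downloading_hosts:.0%}")
print(f"papers per remaining host:                 {multi_paper_downloads / multi_paper_hosts:.1f}")
```

On these figures, roughly three quarters of the downloading hosts took exactly one paper, which fits the picture of visitors who knew what they were looking for.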
Visitors to my Web page were remarkably quiet in the face of some obvious faults. Many of the papers posted on that page, especially old ones, are incomplete, in that they are early versions, and usually do not have figures that are present in the printed versions. Still, that occasions few complaints. For example, in 1999, a posting to a number theory mailing list resulted in 152 downloads of a paper in the space of less than two weeks. However, only one person complained about the lack of figures in the Web version, even though they are very helpful in visualizing the behavior shown in the paper.
Another anecdotal piece of evidence demonstrates what happens on the Web. Several times people have told me they were glad to meet me, as they had read my papers and benefited from them. Conversation showed that they indeed were familiar with the papers in question. However, they also told me that they had lost the URL, and would I please remind them where my home page was? Even though finding my home page on the Web is easy, since my name is not a particularly common one, they obviously did not find it necessary to bother doing so. This, as well as the lack of complaints regarding incomplete papers, suggests a world of plenty. People are guided to Web pages by a variety of cues, get whatever they can from those pages, and move on to other things. It is not a world of a few precious treasures with no substitutes.
The importance of making material easily available was demonstrated in a very graphic form when I made .pdf versions of my technical papers available in April 1998. There was an immediate jump in the rate of downloads. Prior to that, mathematical papers were available only in .ps and .tex formats, and the ones on electronic publishing and related topics in .ps and straight text. Most PC owners do not have easy access to tools for reading .ps papers, and were apparently bypassing the available material that required extra effort from them. This is similar to observations of Academic Press and the American Institute of Physics (Luther, 2001) that better interfaces lead to higher usage.
The temporal pattern of article usage on my Web page shows the behavior that was already noted for arXiv and for First Monday. After an initial period, frequency of access does not vary with age of article, and stays relatively constant with time, after discounting for general growth in usage.
There is more evidence that easy online access leads to changes in usage patterns. For example, downloads from my home page go to a variety of destinations all over the world. Some lead to email correspondence from places like Pakistan, the Philippines, or Mexico. This is not surprising in itself, since those countries do have technically educated populations that are growing. What is interesting is that this correspondence predominantly refers to papers that have been downloaded electronically, and to copies of older papers that are not available in digital form, and which the requesters had learned about from my home page. This does suggest strongly that easy availability is stimulating interest from a much wider audience. This conclusion is also supported by similar observations concerning correspondence with people in industrialized countries. Much of it comes from people who are outside the universities and large research institutions that have good libraries, and who would be unlikely to read my papers in print.
In a small fraction of cases the referrer field on requests shows where the requester found the URL. In many cases, such requests come from reading lists in college or graduate courses.
As a final note, spikes in usage often occur when one of my papers is mentioned in some newsletter or discussion group. For example, Bruce Schneier publishes CRYPTO-GRAM, a monthly email newsletter on cryptography and computer security, with a circulation of about 20,000. In early August 1999, CRYPTO-GRAM mentioned a recent preprint of mine which I had not advertised much, and which was about to appear in a regular print journal. Over the next two weeks over a thousand copies were downloaded. I am convinced that this is a higher figure than the number of times the printed version will be read.
The CRYPTO-GRAM example as well as those of other visits to my home page suggest that informal versions of peer review are in operation. A recommendation from someone, or a reference in a paper that the reader trusts, all serve to validate even unpublished preprints. Scholars pursue a variety of cues in selecting what material to access.
2.9 New forms of scholarly communication
A popular destination on the AT&T Labs - Research Web server is my colleague Neil Sloane's On-Line Encyclopedia of Integer Sequences, accessible from his home page, at http://www.research.att.com/~njas/. In January 2000, it attracted more than 6% of all the hits to the AT&T Labs - Research site. This "encyclopedia" is a novel combination of a database, software, and now also a new online journal. The integer sequence project enables people to find out what the next element is in a sequence such as
0, 1, 8, 78, 944, 13800, 237432, ...
This might seem like recreational mathematics, but it is very serious, as many research papers acknowledge the assistance of Sloane's database or, in earlier times, his books on this subject. It serves to tie mathematicians, computer scientists, physicists, chemists, and engineers together, and stimulates further research. It represents a novel form of communication that could not be captured in print form.
Another popular site that is also a locus of mathematical activity is Steve Finch's "Favorite Mathematical Constants" page at http://www.mathcad.com/library/Constants/ . It also shows rapid growth in usage, although the growth is hard to quantify, since the monitoring software was changed less than a year ago. Just as with Sloane's integer sequence page, it is becoming a form of "portal" to mathematics, one that does not fit easily into traditional publication models.
2.10 Conclusions and predictions
Many discussions of the future of scholarly publishing have been dominated by economic considerations. Digitization has often been seen as a solution to the "library crisis," which forces libraries to cut down on subscriptions. So far there has been little effect in this area, as pricing trends have not changed much (Odlyzko, 1999).
In the long run it has been clear that print will eventually become irrelevant, aside from any economic pressures, as it is simply too inflexible. Gutenberg's invention imprisoned scholarly publishing in a straitjacket that will be discarded eventually. However, the inertia of the scholarly publishing system is enormous, and so traditional journals have not changed much. They are in the process of migrating to the Web, but operate just as they did in print. However, we are beginning to see new ventures that will lead to new modes of operations. Still, it will be a while before they become a sizable fraction of the total scholarly publishing enterprise.
The large majority of scholarly publications are likely not to change much for several decades. However, there will be growing pressure to make them easily available. In particular, scholars are likely to press ever harder for free circulation and archiving of preprints. The realization will spread that anything not easily available on the Web will be almost invisible. Whether they like it or not, scholars are engaged in a war for eyeballs, and ease of access will be seen as vital.
Ease of access is likely to promote the natural evolution of scholarly work. There will be more interdisciplinary research, and more survey publications. Some of these trends are beginning to appear in the data discussed in this paper, and we are likely to get more confirmation in the next few years.
† I thank Steve Finch, Paul Ginsparg, Jim Gray, Eric Grosse, Kevin Guthrie, Stevan Harnad, Steve Heller, Patrick Ion, Don King, Kevin Kiyan, Greg Kuperberg, Leslie Lamport, Steve Lawrence, Carol Montgomery, Gary Mullen, Ann Okerson, Kimberly Parker, Robby Robson, Carol Tenopir, Ed Valauskas, Hal Varian, Tom Walker, and Herb Wilf, for providing comments, corrections, and helpful information.
1. For circulation figures for major research libraries in the U.S., see Association of Research Libraries, Statistics and Measurement Program (http://www.arl.org/stats/index.html).
5. For more evidence, see also Klopfenstein (1989) and the references there.
8. See endnote 10 to Chapter 2 of Buckland (1992) for references.
10. A summary is presented in King and Tenopir (this volume).
11. The University of Wisconsin study is available at http://wendt.library.wisc.edu/archive/journals/costben.html
12. This page is available at http://research.microsoft.com/users/lamport/tla/tla.html.
15. For an account of the project, see Sloane (1998).