Economics and Usage of Digital Libraries: Byting the Bullet
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact firstname.lastname@example.org for more information.
For more information, read Michigan Publishing's access and usage policy.
1. Stakeholders and successful digital transformation of the research library
Libraries and library collections are experiencing transformation. Some aspects of this evolution are gradual, but others are jarring. The digital transformation has been a critical catalyst within publishing and scholarly communication. Before the 1990s, library collections were predominantly in print-on-paper formats, with modest investment in proprietary citation databases or other electronic reference works. Network-accessible content of any type was rare. In the relatively short intervening time, the quantity of popular, reference and scholarly information in digital format has exploded, and an increasingly substantial share of library budgets is devoted to the acquisition, processing and management of digital resources.
In addition, libraries now make most of their digital content accessible to patrons over the Internet. In less than a decade nearly every library in the developed world has become "Internet enabled." A first step was often to make the existing catalogue accessible to patrons via the Internet. This extrapolation of existing catalogue functions to electronic databases created worldwide access to information about library holdings, further enhancing resource sharing and creating opportunities for libraries to re-conceive labor-intensive operations. Coupled with the availability of online journal indices, these developments set the stage for change both internal and external to the library. Lynch (2000) notes that during this period libraries played a key role in introducing information systems and the use of technology for information access to a campus audience, particularly in non-scientific disciplines.
The development of network-accessible digital collections generated more dramatic impact. The capabilities for creating and distributing highly functional electronic content have empowered users for new types of inquiry, and libraries for fuller engagement in content creation, management, and use.
The ongoing transformation is pronounced within both for-profit and not-for-profit research libraries. The vast majority of scholarly journal publications and reference tools are now available in electronic form. Although Association of Research Libraries (ARL) data indicate that library acquisitions budgets have lost significant purchasing power, largely due to high price inflation for journals, overall library investment in electronic content has risen dramatically in the last several years (ARL 2001). ARL figures for 2001-02 indicate that an average of 19.6% of library collections budgets is spent on electronic resources, a fivefold increase since 1992-93. Not surprisingly, interlibrary borrowing is rising as budgets are squeezed between inflation and the growing availability of new electronic publications. The increased expenditure for electronic resources is driven in no small part by user demand, and the results are evident in usage. As Guthrie (this volume) notes, when scholarly materials are Internet-accessible and full-text searchable, the number of user accesses is many multiples higher than in traditional print environments.

The rapid transformation of library collections is an example of institutional transformation, with a broad definition of "institution." During a period of profound transformation, the patterns and forms of interaction among an institution's many stakeholders will necessarily change. The path of transformation and its success depend crucially on the way in which these stakeholder interactions develop. We develop this thesis in the present chapter. The authors in the rest of this volume study institutional transformations with an emphasis on specific stakeholder interactions. In particular, they focus on economic interactions (between stakeholders such as publishers and libraries, authors and publishers, and libraries and employees, among others) and on access or usage interactions (between readers and libraries).
1.1 Institutional transformation
We use "institution" in its social construct sense: "a custom, practice, relationship, or behavioral pattern of importance in the life of a community or society." Institutions, such as the institution of marriage, are complex social constructs that arise and are shaped in response to social needs, constraints and balances of power among stakeholders. In this sense, research libraries as a whole are an institution; an individual library is an instance of the social institution. Institutions generally provide stability because they do not change quickly, although a particular instance of an institution, like a particular marriage, may change quickly.
To illustrate by example, some institutions that bear a family resemblance to research libraries are K-12 education, the bookstore industry, and public parks. Each requires a flow of funds; each engages the interests of multiple parties; each is familiar to its participants and more or less uniform across different instances. Each is designed or emerges to serve a mixture of needs, subject to social constraints and power relationships. Each institution has numerous individual instances (e.g., individual bookstores).
When there is significant change in the needs or constraints or power relationships on which an institution is based, the practices and behavioral patterns embodied in the institution may no longer effectively serve the new configuration of needs, constraints and power relationships. In the face of such change in external conditions, institutions adapt and change. Institutions, because they comprise a web of social conventions, practices and structures, adapt slowly. Slow adaptation is usually suitable because the foundational conditions (needs, constraints and power relationships) change slowly. However, during times of rapid change in needs and constraints, slow institutional adaptation may lead to frictions and stakeholder frustrations.
The institution of the public access library emerged slowly as needs and constraints changed. The university-based research library grew out of the monastic scholasticism of the Middle Ages and the Renaissance. This development was an early adaptation to changes in power and user needs: the declining power of the Church and the growing importance of secular study. Arguably the first university research library was the one founded at Oxford by Thomas Cobham, Bishop of Worcester, in 1320. In 1598 Thomas Bodley set about restoring the library and opening it to students; in 1602 the famous Bodleian Library at Oxford officially opened (Bodleian Library, 2007).
The great modern national libraries such as the British Library and the Library of Congress were at least in part a transformation in response to the growth of the bourgeois class and the emergence of populism. These libraries had their origins in mandatory deposit laws of the 17th century, and in the donation of royal collections to the citizenry in France and elsewhere during the 18th and early 19th centuries. The familiar system of multiple, distributed, small-to-medium public libraries with open access to all local citizens developed in the latter half of the 19th century, with substantial help from Andrew Carnegie and other benefactors.
Similarly, the history of the scholarly research journal reflects a gradual evolution from the communication mission of the scientific societies founded in the 17th century, to the 18th century emergence of journal articles as a mechanism for registering ownership of discoveries. In the 19th century, journal publications became more closely associated with professional standing (Schauder, 1994).
This three-century evolution of journals illustrates the gradual shaping of community practices as new vehicles for communication emerged and new practices were adopted.
The practice or behavior pattern that defines the research library as an institution comprises a web of interactions between various stakeholders each with distinct interests in the collections of a library. These stakeholders include authors, publishers, librarians and users. Each has an interest in and participates in the processes of creation, publication, distribution, acquisition, organization, archiving, and usage. This web of interacting participants and processes shapes both the research library's collection and the success of the library in meeting the expressed needs of users over time. It is this interdependent, interactive context—the ecology of libraries—that frames this book.
1.2 The Ecology of Libraries: Transformation in Context
The research library is embedded in the broad social systems of information flow management: the production, distribution and use of information resources by members of society. This iterative process among stakeholders is the foundation of knowledge creation and transmission. As the information technology revolution has proceeded over the past three decades or so, all institutions participating in information flow management have been deeply affected. Consequently, the set of social needs, constraints, and power relationships served by the research library has changed.
Simultaneously, new technology for information access and analysis has enabled the development of new research methodologies and disciplinary interests. The changes in needs and capabilities have, in turn, stimulated pressure for institutional change in values and behavioral norms. For example, new venues and methods for documenting and disseminating research have gradually become incorporated in practice, as the values associated with academic tenure have adapted to these opportunities. These interdependent forces have had an impact on individuals, on disciplines, and on libraries.
The information technology revolution has affected all library types: public, K-12, academic, and organizational or special libraries, including corporate and not-for-profit research libraries. The "information technology revolution" is a vague moniker for a vast array of technological innovations and their rapid application throughout society. Understanding the effects of the revolution depends on understanding two fundamental facts that have driven substantial change: the exponential rate of decline in the costs of digital computation (silicon, i.e., microprocessors) and of digital communication (sand, i.e., fiber optics).
In 1965 Gordon Moore, who later co-founded Intel, predicted that the number of components that could be placed economically on an integrated circuit would double roughly every year (Moore, 1965). In its popular form, Moore's law holds that the computing power available at a given cost doubles about every 18 months; equivalently, the cost of a given amount of computing power is cut in half every 18 months. This trend has held for roughly thirty years. There have been a number of astonishing technological and industrial advances over the past several centuries, but there is probably nothing that matches this rate of improvement. To give an idea of the power of this exponential cost decline, consider a luxurious house that cost $100,000 to build in 1970. If housing costs had followed Moore's law, the 33 years to 2003 would contain 22 halvings, and the house would cost only about two cents ($100,000 divided by 2^22) to build in 2003. Imagine how different the world would be today if mansions cost pennies!
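A short Python sketch makes the compounding in the house illustration explicit. The function name `moores_law_cost` is hypothetical; it simply divides a starting cost by 2 raised to the number of elapsed halving periods. Because the result is very sensitive to the assumed halving period, the loop compares the 18-month figure with a 24-month one.

```python
def moores_law_cost(initial_cost: float, years: float, halving_months: float = 18.0) -> float:
    """Cost after `years` if the cost of a fixed capability halves every `halving_months` months."""
    halvings = years * 12 / halving_months   # number of elapsed halving periods
    return initial_cost / 2 ** halvings


# The house illustration: $100,000 to build in 1970, priced again in 2003.
for months in (18, 24):
    cost = moores_law_cost(100_000, years=2003 - 1970, halving_months=months)
    print(f"halving every {months} months: ${cost:,.2f}")
```

With an 18-month period, the 33 years from 1970 to 2003 contain exactly 22 halvings, so the house falls to roughly two cents; stretching the period to 24 months still leaves only about a dollar.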
Digital communications technology experienced similar cost declines. MacKie-Mason and Varian (1994) documented that digital communications costs had decreased by 30% annually over the preceding thirty years and, if anything, the rate has accelerated in the past decade.
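The 30% annual figure can be translated into the same terms as Moore's law. This sketch assumes a constant rate of decline every year (a simplification); the function names are hypothetical. It computes how much of the original cost remains after thirty years of compounding and how long each halving of cost takes at that rate.

```python
import math

ANNUAL_DECLINE = 0.30  # 30% cost decline per year, as reported by MacKie-Mason and Varian (1994)


def remaining_cost_fraction(years: float, annual_decline: float = ANNUAL_DECLINE) -> float:
    """Fraction of the original cost left after compounding a constant annual decline."""
    return (1 - annual_decline) ** years


def halving_time_years(annual_decline: float = ANNUAL_DECLINE) -> float:
    """Years for cost to fall by half: solve (1 - d) ** t == 0.5 for t."""
    return math.log(0.5) / math.log(1 - annual_decline)


fraction = remaining_cost_fraction(30)
print(f"after 30 years: {fraction:.2e} of the original cost (about {1 / fraction:,.0f} times cheaper)")
print(f"cost halves roughly every {halving_time_years() * 12:.0f} months")
```

Under this assumption, thirty years at 30% per year compounds to a cost reduction of roughly 44,000-fold, equivalent to costs halving about every 23 months: somewhat slower than the 18-month popular form of Moore's law, but of the same order.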
The cost collapse in computation and communication fostered the rapid development and deployment of powerful networked information technologies. The explosion of digital networking, cheap storage, and powerful computation and display devices has radically changed the constraints, stakeholder needs, and power relationships on which the institution of research libraries is built.
In addition to these dramatic cost/performance improvements in computational power and communication, several concurrent trends have shaped the landscape in which libraries interact with other stakeholders. The development of standards for creating digital content, and of protocols for sharing it, has stimulated electronic publishing and associated systems of access. As a result, we have seen the rapid development of large-scale repositories of electronic publications by major publishers, and increasingly sophisticated retrieval systems and tools.
The community of information "creators" for scholarly or research content includes individuals, the academy, and other research investors such as for-profit companies with their own research labs. These stakeholders are crucially interested in the values-based set of social practices defining intellectual property and its ownership. These values have been evolving both as modern societies depend increasingly on intellectual capital for wealth creation, and as digital information technology leads to changes in distribution, collection and information management practices. Institutional adaptation to changes in these social and technological forces has created significant volatility within the world of scholarly communication.
For example, the academy has been revisiting and re-conceiving institutional intellectual property policies, often prompted by explorations of new distance-independent venues for courses (Knight Higher Education Collaborative, 2002). At the same time legislatures have been modifying copyright law. Although largely driven by mass-market commercial interests, the legal changes have important implications for scholarly communication. In response to these stakeholder pressures for institutional change, some universities and other organizations have boldly launched new services for managing and disseminating information.
A related trend is the emergence of open paradigms, which are realized as codifications of principles and social norms for interactions between creators and other stakeholders. These developing norms also result from the interplay of changing values, technologies, and social needs. Just as the open source software movement is defined by collaborative development, programs such as the Open Knowledge Initiative (for sharing learning resources), the Open Archives Initiative (for sharing research content), and the Open Access movement for journals embrace collaboration and a more open exchange of goods and services. These initiatives are challenging institutionalized practices surrounding the flow of information.
Much has been written about the transformation of publishing. Many authors focus on the new opportunities for dissemination introduced by the capability for self-publishing on the Web, but most of the authors in this book focus on formal publication systems. Formal publication involves a distinction between content creator and publisher, and the use of independent reviewers. Thus the institution embodies norms and routines for interactions between these separate stakeholders.
The early 1990s saw a number of significant experiments among publishers (e.g., Elsevier Science's TULIP project, the predecessor of the PEAK project discussed in several chapters in this book). These experiments typically focused on methods for creating and distributing electronic versions of print journals. Gradually, capabilities for linking, more complex search functions, and customization options emerged. These tools became increasingly important to users as the volume of electronic content grew through conversion of older literature and the aggregation of current titles.
While this extrapolation from print to digital production and distribution represented a significant development evident among commercial and non-profit publishers alike, several concurrent and alternative development paths took shape. Inflation in publication prices and concerns about constraints on rights for use and re-use prompted encouragement of alternative models, often within a non-profit context. Similarly, the open access movement to free up access to publications spurred new models for managing rights and supporting costs.
As one recent market analysis of the journal publishing industry reported, initiatives to launch alternative publication vehicles face significant obstacles:
Libraries and academics have been trying for over a decade to develop new ways of disseminating academic knowledge and research, but the barriers to entry enjoyed by the incumbent journals are just too high (loyal readership, brand recognition, 'boards' of academics who peer review research), as are the value proposition [sic] (they bring order to an anarchic process—the development of knowledge) (Morgan Stanley, 2002).
The picture would not be complete without discussing changes in individual user behavior and expectations. Several recent surveys report on the changing dimensions of user activity. The user base for libraries is expanding and the demand for instruction in use of library resources is also increasing. Yet there has been a downturn in circulation of physical collections and use of in-library reference services (Kyrillidou and Young, 2003). While the majority of students and faculty are using online library content, they report that they still desire a hybrid environment with both print and electronic collections (Friedlander, 2002). As the volume and complexity of electronic publications increase, users are also expressing a desire for greater personal control in managing access to electronic content (Cook et al., 2003).
Individual user preferences are strong forces for change, but do not represent the whole picture. Community practices and preferences within specific disciplines are also potent forces, and each discipline community has responded differently to these new opportunities for communicating and documenting research. Traweek's (1988) anthropological analysis of life among high-energy physicists captures the culture and practices of this community and depicts the social conventions that enabled the early and extremely rapid adoption of e-prints. In economics, by contrast, the first significant non-commercial e-print site started in 1993, but of the 2500 papers submitted to date, nearly half were submitted only in the last two years. More recently, behaviors among authors and editors within ecology have been analyzed to understand the decision processes that lead to publishing in electronic journals in that field (Hahn, 2001).
The changes associated with technology and community norms and values have been sufficiently radical that we should expect to see major transformations of the institution of the research library. Since institutions by nature do not adapt quickly, a period during which the foundations are so quickly transformed might be compared to an earthquake that causes cracks and damage as the institution responds to the quake. When a stable institution experiences increasing pressure to change rapidly, understanding the changes in the interactions between stakeholders is especially important.
The Economics of Libraries
An apparent contradiction has arisen: the information technology revolution has been driven by cost reductions, yet the costs of research libraries have been increasing during this period. Many observers (and budget administrators) expected a decrease in research library spending over the past decade. In fact, although the costs of the technology have decreased, these constitute only a fraction of the inputs to the research library. The costs are high for creating, adopting, implementing, maintaining and managing new information systems that rely on networked information technology, in part because these activities depend largely on human labor, not silicon or sand. Meanwhile, during the transformation, the older systems must still be staffed and maintained. Overall, the costs of transformation have been added to the ongoing costs of the institution, and the total cost is correspondingly higher.
The seeming paradox of lower input costs but higher total costs is no paradox at all. It follows directly from the nature of institutional transformation. The change in library constraints reflects a decrease in the costs of some inputs to a stable, functioning system. However, adapting to this change in constraints (undertaking the transformations needed to reach the new stable system) is itself costly, and during the transformation total costs can be much higher. The transformation of the USSR from a socialist to a market economy is a familiar example. The institutions of a market economy are likely to be much less costly and more efficient, but the transformation from one to the other is costly and slow.
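The argument can be made concrete with a toy cost index. Every number below is hypothetical, chosen only to illustrate the mechanism described above: during the transition the library pays for its legacy print systems, for the new networked systems, and for the (largely labor) cost of building and adopting them. The lower steady-state cost assumed for the "after" phase is also an illustrative assumption, not a finding.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    legacy_cost: float      # staffing and maintaining print-era systems
    digital_cost: float     # operating the new networked systems
    transition_cost: float  # creating, adopting and implementing them (mostly labor)

    def total(self) -> float:
        return self.legacy_cost + self.digital_cost + self.transition_cost


# Hypothetical cost index: a print-only library = 100.
phases = [
    Phase("before", legacy_cost=100, digital_cost=0, transition_cost=0),
    Phase("during", legacy_cost=90, digital_cost=30, transition_cost=25),
    Phase("after", legacy_cost=0, digital_cost=80, transition_cost=0),
]

for p in phases:
    print(f"{p.name:>6}: total cost index {p.total():.0f}")
```

Even though each new input is cheaper, the "during" total exceeds both endpoints because old and new systems run side by side while the transition itself consumes resources.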
1.3 Understanding the transformation
Research libraries are experiencing rapid, simultaneous shifts in constraints, needs, and power balances. Institutions develop slowly in response to such powerful, durable social forces, and rapid adaptation to changing forces is unlikely. Thus, during any period of intense change, institutional adaptations may appear to be falling behind, and there will be many false steps. We can potentially lower the social cost of rapid—and thus, wrenching—institutional transformations by understanding the interactions between the stakeholders and the process of change in those interactions.
The interactions between human stakeholders in an institution are loosely-coupled interfaces in a complex social system. That is, a signal on one side of an interaction typically elicits a response from the other side, but the mapping between signal and response is imperfect: a given signal may elicit a range of different responses at different times and locations.
As an example of one such loosely-coupled interface common in the research library, consider the practice of interlibrary loan, which is used to share access to collections. If a user at one library wished to read a print-on-paper document held in the collection of a different library, she could request that the distant library deliver the document for local use. Because the distant library owns the copy of the document in question and has an unrestricted right to let users read that particular copy, the library can deliver the physical document without involving the publisher.
Sometimes, however, the distant library would prefer to deliver a facsimile of the document to the user, so as to retain its copy for its own users, or to reduce the risk of loss or damage. Typically, a publisher holds the copyright on the document, and thus the user-library interaction (the request for a facsimile) could invoke an interaction between library and publisher: "can we make a copy to deliver to the requesting user?" Because this interaction between library and publisher is likely to occur many times, libraries and publishers have reached pre-arranged agreements (claiming fair use exemptions or relying on agreed-upon guidelines). Such agreements, the result of an interaction between at least two interested parties, specify the terms and conditions under which the library may create a facsimile and deliver it to the requesting user.
These are only two of the results that the originating signal—the user's interlibrary loan request—might invoke. The results might differ for the user in various ways; for example, if the original document is delivered, the user normally will be required to return it within a fixed period; if a facsimile is delivered, the user typically retains the copy. Another variation is that in some cases a requested facsimile might be created by the distant, owning library, whereas in other cases the requesting library might purchase a facsimile from a document delivery service (another interaction between two stakeholders); in such cases the conditions for the user (for example, whether the document must be returned or whether a fee is assessed) might be different. Thus, the interlibrary loan process is a loosely-coupled interface between two stakeholders (user and library): the request might or might not invoke further interactions between other stakeholders, and the response to the request might take on a variety of forms.
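The loosely-coupled character of the interlibrary loan interface can be sketched as a function that maps one signal (a request) to one of several responses depending on context. The class names, field names, and the order in which paths are tried are hypothetical illustrations, not a description of any real library system; the fee flags in particular are assumptions, since the text says only that conditions for the user "might be different."

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Fulfillment(Enum):
    LEND_ORIGINAL = auto()     # owning library ships its physical copy
    FACSIMILE_OWNER = auto()   # owning library makes a copy under a pre-arranged agreement
    FACSIMILE_VENDOR = auto()  # requesting library buys a copy from a document delivery service


@dataclass
class LoanRequest:
    """Context that shapes the response; every field is an illustrative assumption."""
    title: str
    owner_will_lend: bool   # is the owning library willing to part with its copy?
    copy_permitted: bool    # does an existing agreement cover making a facsimile?
    vendor_available: bool  # can a document delivery service supply the item?


@dataclass
class LoanResponse:
    method: Fulfillment
    must_return: bool
    user_fee_possible: bool  # assumed: only the vendor path may pass a fee to the user


def handle_request(req: LoanRequest) -> Optional[LoanResponse]:
    """Map one signal (an interlibrary loan request) to one of several possible responses."""
    if req.copy_permitted:
        # Owner keeps its copy for local users; the user typically keeps the facsimile.
        return LoanResponse(Fulfillment.FACSIMILE_OWNER, must_return=False, user_fee_possible=False)
    if req.owner_will_lend:
        # Original is delivered and must be returned within a fixed period.
        return LoanResponse(Fulfillment.LEND_ORIGINAL, must_return=True, user_fee_possible=False)
    if req.vendor_available:
        # Invokes a further interaction (library and document delivery service).
        return LoanResponse(Fulfillment.FACSIMILE_VENDOR, must_return=False, user_fee_possible=True)
    return None  # no available path fills the request


same_title = "Example article"
print(handle_request(LoanRequest(same_title, True, False, True)).method)
print(handle_request(LoanRequest(same_title, True, True, True)).method)
```

The same request for the same title produces different responses depending on conditions the user never sees, which is what makes the interface loosely coupled.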
With the development of networks and digital collections, the interlibrary loan process faces substantial pressure to change. In the print-on-paper world a loan might be delivered in several days to several weeks. In a networked digital world, users put much higher priority on quick, even immediate delivery. Libraries are likely to find that the costs of digital reproduction and networked delivery are much lower than print-on-paper costs, and thus to want to implement electronic delivery in response to the loan request. However, because electronic delivery typically requires making an electronic copy, this response is likely to invoke a new interaction between library and publisher: "can we make a digital copy and electronically deliver it to the requesting user?" The results of these ancillary interactions may be, and often have been, quite different from the results of requests for permission to deliver print-on-paper facsimiles in response to interlibrary loan requests. The publisher may impose different terms and conditions, or may even refuse permission for electronic interlibrary loans.
What does this example suggest about the interplay that ensues between stakeholders? As we noted, libraries and publishers typically have to deal with many interlibrary loan requests, and thus have generally found it efficient to establish pre-arranged agreements that cover most such requests. This routine speeds the process and lowers transaction costs. As research libraries increasingly rely on digital collections, requests for electronic interlibrary loans will become frequent, and thus considerable time and transaction costs will be saved if libraries and publishers can work out new pre-arranged agreements to cover these situations. However, the changes in technological costs and capabilities, and the concomitant changes in the interests of the stakeholders, mean that agreements covering print-on-paper interlibrary loans may not be satisfactory to all stakeholders, and a period of trial, error and negotiation may be necessary. The sooner participants understand the ways in which the constraints, costs and interests of stakeholders have changed, the sooner and better they can adapt the institution by adopting new standard agreements covering electronic interlibrary loans.
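The transaction-cost argument for pre-arranged agreements reduces to a break-even calculation. This sketch assumes a one-time cost of negotiating a standing agreement and constant per-request handling costs with and without it; the function name and the dollar figures are hypothetical.

```python
import math


def break_even_requests(agreement_cost: float,
                        cost_per_negotiation: float,
                        cost_under_agreement: float) -> int:
    """Smallest number of requests at which a standing agreement is strictly
    cheaper than negotiating permission for each request separately."""
    saving = cost_per_negotiation - cost_under_agreement
    if saving <= 0:
        raise ValueError("a standing agreement never pays off")
    # Need n * cost_per_negotiation > agreement_cost + n * cost_under_agreement,
    # i.e. n * saving > agreement_cost.
    return math.floor(agreement_cost / saving) + 1


# Hypothetical figures: $5,000 to negotiate an agreement, $15 to clear each
# request case by case, $1 per request once the agreement is in place.
print(break_even_requests(5_000, 15, 1))  # prints 358
```

The more frequent electronic loan requests become, the sooner any such fixed negotiating cost is recovered, which is why high request volumes favor standardized agreements.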
The emergence of digital collections as an instance of institutional transformation guides us to look for particular types of problems that arise as the institution adapts to the information technology revolution. We have selected the contributions to this volume to focus on two questions we think will be central.
The first is: who will pay to create and sustain digital materials? The information technology revolution has made large-scale digital collections technically feasible, and many of the stakeholders have increasingly demanded digital collections. The creation of materials to include in such collections is a necessary first step, and despite the reduction in the costs of some key inputs, the digital production process is costly. Therefore, who will pay? Various stakeholder interactions in a print-on-paper research library world involve financial transfers to pay for creation services. For example, some authors pay page charges to print publishers, and libraries pay to purchase printed books and to subscribe to printed journals. Among the stakeholders (authors, publishers, libraries, readers, and perhaps others), who will pay for digital material production? The constraints and needs (and perhaps power relationships) have been changed by information technology, and thus we should expect that the financial interactions between stakeholders will also change as the institution adapts. Perhaps we will see more author payments (recommended by some open access journals), or more charitable foundation support for digital publishing (JSTOR was started by the Andrew W. Mellon Foundation; the Public Library of Science has initiated production with a grant from the Gordon and Betty Moore Foundation). Perhaps funding will continue to flow primarily from libraries and individual readers, but the structure of payments is likely to change because of the differences in functionality between electronic and paper publications.
The model of institutional transformation leads us to a second question: How will readers use digital collections? Will users go to physical libraries and interact face-to-face with librarians, or communicate electronically and access documents from their desktops or wireless PDAs? How will user mobility affect the roles of libraries? Will usage increase for stakeholders who had less convenient access to print documents (e.g., students and faculty at smaller colleges, or scientists in developing countries)? Will users do more browsing using electronic search tools and less full-document reading? How will the relative usage of a document's several information features change? For example, when it is easier to find and search documents, will there be more demand for specific elements, such as the data tables or bibliography, and less for the text?
We organized the remaining chapters around these two fundamental questions. In the first section we present several chapters on a major field research project that simultaneously addressed collection building, publisher pricing models, library collection decisions, and usage: the PEAK (Pricing Electronic Access to Knowledge) project. It was our work on this project at the University of Michigan that led us to invite other authors to join us to create a volume that encompasses publishing economics and usage during this era of the digital transformation of libraries. The two organizing questions above were also the framing context for PEAK: Who will pay for digital materials? How will they be used?
In the second section the authors address digital publishing. They focus on publishing costs and the distribution of those costs to various stakeholders in the institution. In the third section we collect several essays on experiences in building and using digital collections. The authors emphasize user needs and the effect that digital collection usage has on building and maintaining the collections. Many of these chapters report on a specific project at a particular time, but they all shed light on the dynamics of stakeholder interactions and the role of these interactions in shaping the transformation of the encompassing institutions.
There are several lessons that are driven home by their common emergence in the chapters of this book. Regarding payment for materials, a variety of economic interactions exist in the social institution of research libraries. In the past century, most of the flow of resources has been to publishers from users (individuals and libraries) through subscriptions or society membership fees. The studies in this book demonstrate that a variety of alternative mechanisms are feasible in theory and in practice, and that there are reasons to expect some transformation of these economic interactions in response to the digital revolution. The PEAK project focused on interactions between users and publishers, but experimentally tested on-demand document delivery and found that a small but non-trivial number of scholars were willing to purchase immediate access to articles using a credit card. PEAK also tested a novel, generalized subscription scheme that is only feasible in a digital, networked collection. Other chapters consider quite different configurations of the institution, including proposals and projects based on the open paradigm, and models in which author fees to publishers primarily finance the publication functions.
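This chapter does not spell out how PEAK's generalized subscription worked, so the sketch below assumes one plausible token-based design: an institution prepays for a fixed number of article "unlocks," and once an article is unlocked it remains available to every user at that institution. The class and method names are hypothetical. The sketch also suggests why such a scheme depends on a digital, networked collection: the system must record which articles have been unlocked and check every access against that record.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GeneralizedSubscription:
    """Hypothetical token bundle: each prepaid token unlocks one article
    for all users at the subscribing institution."""
    tokens: int
    unlocked: Set[str] = field(default_factory=set)

    def access(self, article_id: str) -> bool:
        """Return True if access to `article_id` is granted."""
        if article_id in self.unlocked:
            return True            # already unlocked: repeat readers use no extra token
        if self.tokens == 0:
            return False           # bundle exhausted: user would fall back to per-article purchase
        self.tokens -= 1
        self.unlocked.add(article_id)
        return True


sub = GeneralizedSubscription(tokens=2)
requests = ["art-1", "art-2", "art-1", "art-3"]
granted = [sub.access(a) for a in requests]
print(granted, sub.tokens)
```

Under these assumptions the third request is free because another user already unlocked "art-1", while the fourth is refused once both tokens are spent; a print subscription offers no equivalent way to buy access to an arbitrary subset of articles.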
Regarding the ways in which digital collections are used, several chapters show that readers want and will use older scholarly material, not just new publications. Several successful projects have focused on creating digital collections of archival material. In the case of unique collections, accessibility has greatly increased because users no longer need to physically travel to the collection. Perhaps more surprising have been the dramatic levels of usage for older scholarly publications made available by JSTOR. To a large extent, these publications are available in print in the users' local research libraries. When made available over the network and digitally searchable, however, usage far exceeds all estimates of usage for the print-on-paper collections of scholarly journal back issues.
We also have three chapters that report on user and librarian experiences in traditional university and corporate libraries that made early investments in large-scale digital collection projects. One of the most striking findings is that the interactions between librarian and user are changing; users need different types of services from information professionals, in many cases delivered through different types of interactions. Thus, the human production systems embedded in the institution of research libraries must adapt to the technological changes, just as the collections themselves—and the economic models that support them—must adapt.
We are still in the early stages of the digital transformation of libraries, and more broadly, the social institutions of information creation, distribution, and retrieval. In this book we present the reflections and findings of leading scholars and practitioners who have been in the front lines of the transformation. We think the evidence is compelling that successful adaptation to the forces of change requires an understanding of stakeholder interactions as the many participants in the institution negotiate and navigate the details of the transformation.
1. Kyrillidou and Young (2003) report an average annual increase of 7.7% since 1986, well ahead of general inflation.
3. After the British burned Washington and the original Library of Congress during the War of 1812, the new Library of Congress collection was established when Thomas Jefferson sold his magnificent collection of nearly 7000 books.
4. See Hedstrom and King (2003) for an historical discussion of the emergence of the related institutions of libraries, archives and museums.
6. The median price for all houses sold in the U.S. in 1970 was only $23,000. See U.S. Census Bureau (2000), Table 1201.
7. Recent examples include the durable digital depository (DSpace) initiative from MIT and the University of California's e-scholarship program. Of course, experimental initiatives rise and fall quickly, so the landscape will look considerably different within a few short years. One initiative that appears to have succeeded is JSTOR, described by Kevin Guthrie in a chapter of this book.
8. For a discussion of the influence of open models in prompting new roles for libraries see Lougee (2002).
10. See, e.g., Suber (2002).
11. The Economics Working Papers Archive was created by Bob Parks at Washington University in St. Louis, and can be found at http://econwpa.wustl.edu/. 2500 papers is a very small fraction of the working papers disseminated in Economics in the past 10 years. For example, the more recent, commercial albeit not-for-profit, Economics Research Network now hosts over 36,000 full-text economics working papers.
12. In actual practice, interlibrary loans involve interactions among three parties: the individual requestor, the originating library, and the lending (distant) library. For the example in these three paragraphs we simplify, treating the loan as a direct transaction between the borrower and the distant library.
13. As an example, when a facsimile is made on paper, the quality is necessarily less than in the original document, and this degradation is compounded as copies are made of copies. This fact, among others, limits the extent to which the print-on-paper system could be abused to implement large-scale republishing of a document to avoid copyright payments to the publisher or author. Digital copying, on the other hand, generally maintains perfect fidelity. Thus, publishers are concerned that a digital copy delivered to a user at a different institution might be used to create multiple copies redistributed without associated compensation.
14. Of course, most of the payment to authors for the creation of their works has been through their employers: universities and the owners of government and private research labs. This bifurcation of payments, with one stream from users to publishers and another stream from sponsors to creators, is a bit unusual, and certainly quite different from commercial or popular publishing. Although these interactions are fascinating, in this book we have not addressed the economic relationships between information creators and their sponsors.
2. The Rapid Evolution of Scholarly Communication[†]
Traditional journals, even those available electronically, are changing slowly. However, scholarly communication is rapidly evolving to electronic formats. In some areas, electronic versions of papers are being read about as often as the printed versions. Although there are serious difficulties in comparing figures from different media, growth in the use of electronic scholarly information is sufficiently high that, if it continues for a few years, print versions will no doubt be eclipsed. Further, much electronic information is accessed outside the formal scholarly publication process. There is vigorous growth in forms of electronic communication that take advantage of the unique capabilities of the Web, and that simply do not fit into the traditional journal publishing format.
This paper presents statistics on the use of print and electronic information. It also discusses preliminary evidence about the changing patterns of usage. This evidence indicates that to stay relevant, scholars, publishers, and librarians will have to make even larger efforts to make their material easily accessible.
Traditional journals and libraries have been vital components of scholarly communication. They are evolving, but slowly. The reasons for this are discussed briefly in Section 2.2 and, in more detail, in Odlyzko (1997b). The danger is that they might be rapidly losing their value, and could become irrelevant.
At first sight, there seems to be little cause for concern. Print journal subscriptions are declining, but gradually. One often hears of attrition in subscriptions of 3-5% per year. For example, the American Physical Society, with high-quality and relatively inexpensive journals, has seen a steady decrease of about 3% per year (Lustig, 1997). At those rates, losing half the circulation takes between 14 and 24 years. On Internet time, that is almost an eternity. Preprints in most areas are still a small fraction of what gets published. Also, library usage is sometimes reported as declining, but again at modest rates. Yet these are not reasons for complacency. Why should there be any declines at all? Ours is an information age; the number of people getting college and postgraduate education is growing rapidly, and spending on R&D and implementation of new technologies is skyrocketing. Why should established journal subscriptions be dropping, and why should many of the recent specialized journals be regarded as successes if they reach a circulation of 300? Why should many research monographs be printed in runs smaller than the roughly 500 copies of the first edition of Copernicus' De revolutionibus orbium coelestium of 1543?
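The half-circulation figures quoted above can be checked with a short calculation. The sketch below assumes, for illustration, that attrition compounds: each year a constant fraction of the remaining subscriptions is lost. The function name `years_to_halve` is hypothetical. Under that assumption, 3% and 5% annual losses halve circulation in roughly 22.8 and 13.5 years, close to the range given in the text.

```python
import math


def years_to_halve(annual_attrition: float) -> float:
    """Years until circulation falls to half its starting level,
    assuming a constant fractional loss each year (compound decline)."""
    # Solve (1 - r)^t = 0.5 for t.
    return math.log(0.5) / math.log(1.0 - annual_attrition)


for rate in (0.03, 0.05):
    print(f"{rate:.0%} attrition: {years_to_halve(rate):.1f} years to lose half")
```

A linear decline (a fixed share of the original circulation lost each year) would give somewhat shorter times at 3%, so the exact figures depend on which model one prefers.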
My conclusion is that the current scholarly information system is badly flawed, and does not provide required services. This paper presents evidence that the demand for high quality scholarly information is indeed growing, and can only be satisfied through easy availability on the Web.
Some of the early studies of electronic usage, such as Lenares' interesting 1999 paper, concentrated on faculty at leading research institutions. Change might be expected to be slow in such places. Although such scholars usually have the resources to be pioneers, they have little incentive, since they have access to good libraries. The evidence to be presented later shows that the current system neglects the needs of growing ranks of scholars who are not at such institutions. Thus, it is better to concentrate on these scholars and their usage of information that is freely available over the Internet.
Tenopir et al. (2000) does show that, among established scholars, electronic resources play an increasing role, but that current usage is dominated by traditional media. However, it is important to look at growth rates rather than absolute numbers. In an early 1999 discussion in a librarians' mailing list, somebody pointed out that, in 1998, only 20% of the astronomy papers were submitted to Ginsparg's xxx paper archive, now called the arXiv, at http://www.arxiv.org . An immediate rejoinder from another participant was that, while this was true, the corresponding percentage had been around 7% in 1995. It is growth rates that tell us what is in our future.
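The point that growth rates matter more than levels can be made concrete with the compound annual growth rate. This minimal sketch uses only the two shares mentioned in the mailing-list exchange (about 7% of astronomy papers in 1995, 20% in 1998); the helper name `annual_growth` is an illustrative assumption. The share grew by roughly 42% per year over those three years, even though the 1998 level still looked small.

```python
def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two observations."""
    return (end / start) ** (1.0 / years) - 1.0


# Share of astronomy papers submitted to the arXiv, per the exchange above.
share_1995, share_1998 = 0.07, 0.20
rate = annual_growth(share_1995, share_1998, 1998 - 1995)
print(f"Implied growth in share: {rate:.0%} per year")
```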
This paper is only a brief attempt at finding patterns in the use of online information. At the moment, we have little data about online usage patterns. This is especially regrettable since these patterns appear to be in the midst of substantial change. What we need are careful studies, such as have been carried out for print media. Although the Web in principle makes it possible to provide extremely detailed information about usage, in practice there is little data collection and analysis, especially in scholarly publishing. Even when data are collected, they are seldom released. Thus one purpose in writing the initial draft of this paper was to stimulate further collection and dissemination of usage data. The main purpose, though, was to look for patterns even with the limited data available to me, to provide a starting point for further research.
Fortunately, many new studies of electronic resources have appeared recently. In general, they do support most of the tentative conclusions of this paper, which are:
Usage of online scholarly material is growing rapidly, and in some cases already appears to surpass the use of traditional print journals. Much online usage appears to come from new readers and often from places that do not have access to print journals.
We can expect the growth of online material to accelerate, especially as the information about usage patterns becomes widely known. Until recently, scholars did not have much incentive to put their works on the Web, as this did not create many new readers. While we can expect that snobbery will retard this step ("I can reach the dozen top experts in my field by publishing in Physical Review Letters, or by sending them my preprint directly, why do I care about the great unwashed?"), the attraction of a much greater audience on the Web and the danger that anything not on the Web will be neglected are likely to become major spurs to scholars making their works available online. For example, the recent study by Lawrence (2001) shows that papers in computer science that are freely available online are cited much more frequently than others. Anderson et al. (2001) might appear to suggest the opposite, since in this study free online availability was associated with lower citation frequency. However, that result is likely anomalous, in that the freely available online-only articles in the journal under study were apparently perceived widely, even if incorrectly, as of inferior quality.
The need for traditional peer review is overrated. Odlyzko (1995) discussed at length the inadequacy of conventional peer review, and how much more useful forms were likely to evolve on the Internet. That paper was written before the ascendancy of the Web. While open review and comments on published papers have been slow to take hold, online references and bibliographies are developing into a new form of peer review. People are coming to my Web page in large numbers looking for specific papers. While in almost all cases I do not know what brings them there, it is pretty clear that they are finding links to the material in a variety of sources, such as bibliographies and references on other home pages. This new form of peer review brings many readers even to papers published in obscure and unrefereed places.
Concerns about information overload and chaos on the Net are exaggerated. While better organization of the material would surely be desirable, people are finding their way to the serious information sources in growing numbers as is.
Ease of access and ease of use are paramount. Material on the Web is growing, and scholars, like the commercial content producers, are engaged in a war for the eyeballs. Readers will settle for inferior forms of papers if those are the ones that can be reached easily.
Novel forms of scholarly communication are evolving that are outside the boundaries of traditional journals.
These conclusions and predictions are supported by data in the rest of this paper. It does appear that while journals are not changing fast, scholarly communication as a whole is evolving rapidly.
2.2 Rates of technological change
The conventional notion of "Internet time," in which technological change is accelerated tremendously, is a myth. Rapid change does occur occasionally, and the adoption of Web browsers is frequently cited as an example. Less than 18 months after the release of the first preliminary version of the Mosaic browser, Web transmissions constituted more than half of Internet traffic. However, this was a singular exception. Cell phones, faxes, and ATM machines took much longer to spread. Even on the Internet, new systems are usually adopted much more slowly. How come IPv6 is still basically invisible? Why is HTTP/1.1 spreading so slowly? How about TeX and its various dialects (which go back more than two decades)? Even at universities, e-mail took a while to diffuse. The Internet has changed much, but it has not made for a dramatic increase in the pace at which new technologies diffuse. A typical time scale for significant changes is still on the order of a decade. This was noted a long time ago: "A modern maxim says: People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years." (Licklider, 1965, p. 17)
Further discussion of rates of change is available in Odlyzko (1997b), which presents many examples (such as music CDs, ATM machines, credit cards, and cell phones) supporting the thesis that consumer adoption of new technologies is slow. Thus we should not be surprised if electronic scholarly communication does not turn on a dime.
The rare rapid adoptions of new technologies (aside from unusual situations such as that of the Web) appear to be associated with the presence of forcing agents that can compel rapid change (Odlyzko, 1997b). On the other hand, sociological changes tend to be very slow, taking a generation or two.
Aside from simply observing that historically, new technologies have taken on the order of a decade to be widely adopted, one can also build simple quantitative models that explain this time scale. For instance, we know that usage of electronic forms of scholarly information has typically been growing at 50 to 100 percent per year. This is shown in various tables in this paper. On the other hand, print usage has shown little change. Supposing that print usage remains static, from the moment electronic usage breaks the one percent threshold at which it is likely to be noticed, growth rates of 50 to 100% would yield parity with print usage only after approximately a decade.
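The calculation in the last sentence can be written out in a few lines. The sketch assumes, as the text does, that print usage stays constant and that electronic usage starts at 1% of print and grows at a fixed annual rate; `years_to_parity` is an illustrative name. Electronic usage must grow a hundredfold, which takes about 6.6 years at 100% annual growth and about 11.4 years at 50%, consistent with the decade-long time scale.

```python
import math


def years_to_parity(start_share: float, annual_growth: float) -> float:
    """Years for electronic usage to match static print usage.

    start_share: electronic usage as a fraction of print usage at the start.
    annual_growth: fractional growth per year (0.5 means 50%).
    """
    # Electronic usage must grow by a factor of 1 / start_share.
    return math.log(1.0 / start_share) / math.log(1.0 + annual_growth)


for g in (0.5, 1.0):
    print(f"{g:.0%} growth: {years_to_parity(0.01, g):.1f} years to parity")
```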
2.3 Disruptive technologies
Clayton Christensen's book (1997) has become a modern classic. It helps explain the failure of successful organizations, such as Encyclopaedia Britannica, to adopt new technologies. The example of the Britannica, cited in Odlyzko (1995, 1999), is very instructive. It was and remains the most scholarly of the English-language encyclopedias. However, it could not cope with the challenges posed first by inexpensive CD-ROM encyclopedias, and more recently by the Web.
What Christensen calls disruptive technologies tend to have three important characteristics:
- they initially underperform established products
- they enable new applications for new customers
- their performance improves rapidly
Electronic publishing has these characteristics. Little material was available initially, screen resolution was poor, printers were expensive and not widely available, and so on. However, online material was easy to locate and access, and could provide novel features, such as the constant updating of the genome database. Moreover, costs, quality, and availability have all been improving rapidly. That is why direct comparisons of traditional journals or libraries with electronic collections miss the point. For example, the 1998 paper by Stevens-Rayburn and Bouton is effective in demonstrating that the Web at that time could not substitute for a regular library. It still can't, even in 2000. However, that is not the relevant question.
The mainframe was not dethroned by the PC directly. The PC could not replace the big machines in areas such as payroll processing. The computing power of the mainframes sold each year is still increasing, and has been increasing all along, even when IBM was going through its traumatic downsizing in the early 1990s. It's just that the PC market has been growing much faster. The mainframe has been consigned to a small niche, and the revenues from that niche have been declining. This is a useful analogy to keep in mind. Traditional journals and libraries are still playing a vital role, but, to quote from Odlyzko (1997b), "... journals are not where the interesting action is." The real issue, to quote Stevens-Rayburn and Bouton (1998), is that "in this new electronic age, if it isn't on-line, for many purposes it might as well not exist." Further, even if it is online, it might not matter if it is not easy to access or is not timely.
2.4 Effects of barriers to use
Even small barriers to access reduce usage significantly. Statistics collected by Don King and his collaborators show that as the physical distance to a library increases, usage decreases dramatically. A recent statistical tidbit of a similar nature is the reaction of the mathematicians at Penn State when all journal issues published before 1973 had to be sent to off-site storage because of space limitations. This move was widely disliked, even though any volume can be obtained within one day. The interesting thing is that the mathematical research community of about 200 faculty, visitors, and graduate students asks for only about 850 items to be recalled from storage per year. That is just over 4 items per person per year. It seems likely (based on extrapolations from circulation figures for bound journals that are immediately available on shelves) that usage of this material was much higher when it was easily accessible in the library in their building.
When subscriptions to journals are canceled, articles from those journals are obtained through interlibrary loans or document delivery services. Some libraries (Louisiana State University perhaps most prominent among them) have consciously decided to replace journal subscriptions with document delivery, after making a calculation of how much the journals cost per article read. While I do not have comprehensive statistics, my impression is that such moves save more than preliminary computations suggest. The secret behind this phenomenon is that usage of document delivery services is lower than that of journals available right on the spot. Having to fill out a request form and wait a day or a week reduces demand.
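The comparison described here can be sketched as follows. Every number in the sketch is hypothetical and chosen only for illustration: none comes from Louisiana State University or any other library, and the `retained_demand` fraction is an assumed stand-in for the demand lost when readers must fill out a form and wait. The function names are likewise illustrative. The point is structural: a preliminary computation that assumes unchanged demand underestimates the savings.

```python
def cost_per_article_read(annual_subscription: float, articles_read: int) -> float:
    """Subscription price divided by the number of articles actually read."""
    return annual_subscription / articles_read


def document_delivery_cost(articles_read: int, fee_per_article: float,
                           retained_demand: float) -> float:
    """Annual cost of replacing a subscription with document delivery.

    retained_demand: assumed fraction of former readings that still become
    delivery requests once access requires a form and a wait.
    """
    return articles_read * retained_demand * fee_per_article


# Hypothetical inputs, for illustration only.
subscription, reads, fee = 2000.0, 80, 20.0
print(f"Cost per article read: ${cost_per_article_read(subscription, reads):.2f}")
naive = document_delivery_cost(reads, fee, retained_demand=1.0)
actual = document_delivery_cost(reads, fee, retained_demand=0.5)
print(f"Delivery cost if demand were unchanged: ${naive:.0f}")
print(f"Delivery cost if demand were halved: ${actual:.0f}")
```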
Librarians have known for a long time that ease of use is crucial. They saw this with catalogs: materials whose entries were available only in the paper card catalogs, and not in the online catalogs, were not being used. Thus the current shift towards online usage had been anticipated.
... there's a sense in which the journal articles prior to the inception of that electronic abstracting and indexing database may as well not exist, because they are so difficult to find. Now that we are starting to see, in libraries, full-text showing up online, I think we are very shortly going to cross a sort of critical mass boundary where those publications that are not instantly available in full-text will become kind of second-rate in a sense, not because their quality is low, but just because people will prefer the accessibility of things they can get right away.
Clifford Lynch, 1997, quoted in Stevens-Rayburn and Bouton (1998)
Today, we have evidence that Clifford Lynch was correct. Note that Encyclopaedia Britannica has been a victim of this trend. Being the best did not protect it from declines in revenues, restructuring, and being forced to experiment with several business models.
The shift to online usage is exposing many of the limitations of the traditional system. Research libraries are wonderful institutions. They do provide the best service that was possible with print technology. However, in today's environment, that is not enough. A typical printed scholarly paper is available in something like 1,000 research libraries. Those libraries are accessible to a decreasing fraction of the growing population of educated people who need them. Further, even for those scholars fortunate enough to be at an institution with a good library, the sizes of the collections are making material harder to access. Hours of availability are limited. Also, studies have shown that even when a book being sought is in a given library's collection, in about 40% of the cases it cannot be found when needed.
The basic problem, of course, is that it is impossible in the print world to make everything easily accessible even in the best library in the world. Space constraints mean that some material will be far from the user. In practice, most libraries can store only a tiny fraction of the material that might be of interest to their patrons. While they have been careful about selecting what seemed to be most relevant, experience shows that when easy electronic access is provided to large bodies of material not normally available in the library, there is demand for it (Luther, 2001; Bensman and Wilder, 1998). That is a major factor propelling the move towards bundling of electronic journal offerings and consortium pricing (Odlyzko, 1999).
The easy access to online resources is leading to increasing usage, as will be discussed later, and is also documented in Anderson et al. (2001), Gazzale and MacKie-Mason (this volume), Guthrie (this volume) and Luther (2001). But not all online access is equal. Many scholars use Amazon.com's search page as a first choice in doing bibliographic searches for recent books, since it is more user-friendly than the electronic catalogs of the Library of Congress, say. Luther (2001) notes, "Both Academic Press and the American Institute of Physics (AIP) noted that they experienced surges in usage after they introduced new platforms that simplified navigation and access."
Ease of use has an important bearing on pricing. Odlyzko (1995) predicted that pay-per-view was likely doomed to fail in scholarly publishing, because of its deterrent effect on usage. Publishers have now, after experiments with PEAK and other pricing models, moved to this view as well. For example, Hunter (this volume) states that
[Elsevier's] goal is to give people access to as much information as possible on a flat fee, unlimited use basis. [Elsevier's] experience has been that as soon as the usage is metered on a per-article basis, there is an inhibition on use or a concern about exceeding some budget allocation.
Similarly, Luther (2001) points out that "Philosophically, Academic Press is opposed to a business model in which charges increase with use because it discourages use."
Easy access implies not only greater use, but also changing patterns of use. For example, a recent news story discussed how the Internet is altering the doctor-patient relationship (Kolata, 2000). The example that opens the story is of a lady who is reluctantly told by the doctor she might have lupus, and leaves the clinic terrified of what this might be. She then proceeds to obtain information about this disease from the Internet. When she returns to see a different, more pleasant physician, she is well-informed and prepared to question the diagnosis and possible treatment. What is remarkable about this story is that the basic approach of this patient was feasible before the arrival of the Web. She could have gone to her local library, where the reference librarians would have been delighted to point her to many excellent print sources of medical information. However, few people availed themselves of such opportunities before. Now, with the easy availability of the Web, we see a different story.
The arguments about effects of barriers to access and of lowering such barriers suggest that scholarly communication will undergo substantial changes. We should expect to see greater use of online material. We should also see much greater use of it by people outside the narrow disciplinary areas that produce it. Much of this use will come from outside the traditional academic and research institutions, but a considerable portion is likely to come from other departments within an institution. Further, the increasing volume of material, as well as the decreasing role of traditional peer review, are likely to lead to greater demand for survey and handbook material. With lower barriers to interactions and access to specialized literature, we should also see more interdisciplinary work.
2.5 Scholarly information as a commodity
Authors like to think of their articles as precious resources that are absolutely unique and for which no substitutes can be found. Yet a more accurate picture is that any one article is just one item in a river of knowledge, and that this river is rising. Substitutes exist for almost everything. Some people interested in Fermat's Last Theorem will want, for historical or other reasons, to see Andrew Wiles' original paper (Wiles, 1995). Many others will be happy with a reference to where and when that paper was published, and others will be satisfied with various popular accounts of the proof. Even those interested in the technical details will often be satisfied with, and often be better served by, other presentations, such as that in the Darmon, Diamond, and Taylor account of the proof (Darmon et al., 1997).
Thinking about a river of knowledge instead of a collection of unique and irreplaceable nuggets helps explain why scholars manage to function even with a badly flawed information system. Even though in 40% of the cases a desired book cannot be retrieved from the library's shelves, usually some other book covering the same topic can be found. Spending on libraries by research universities is correlated most strongly with the universities' total budgets, and very weakly with quality. Harvard spends about $70 million per year on its libraries, versus $25 million for Princeton. Yet would anyone claim that a Harvard education or Harvard's scholarly output is almost three times as good as that of Princeton?
The Internet is reducing the costs of production and distribution of information. As a result, there is a flood of material. Much is of low quality, but a substantial fraction is very good. Before looking at whether scholars are using this material, let us consider usage of print material.
2.6 Usage of print journals
We are fortunate to have an excellent recent survey of usage of print journals in the book by Carol Tenopir and Don King (2000). It shows that a typical technical paper is read between 500 and 1500 times, where a reading is defined as going beyond a glance at the title and abstract, though not necessarily reading the article carefully. These readings average about one hour in length, and in about half the cases represent the reader's first encounter with an article.
The estimate of 500 to 1500 readings per article is much higher than some earlier studies had come up with. The studies on which Tenopir and King base their estimates do have biases that may raise the reading estimates above the true value. For example, they are based on self-reporting by technical professionals, who may overestimate their readings. Further, those figures include articles in technical journals with large circulations (such as Science, Nature, and IEEE Spectrum) that are not typical of library holdings. If one considers library usage studies, such as those that have been carried out at the University of Wisconsin in Madison, one comes up with somewhat lower estimates for the number of readings per paper. Still, the basic conclusion that a typical technical paper is read several hundred times appears valid.
The studies reported in Tenopir and King (2000) also show that, in the print world, most readings of an article occur in the first half-year after publication. Afterwards, usage drops off sharply.
2.7 Growth in usage of electronic information
It is hard to measure online activity accurately. The earliest and still widely used measure is that of "hits," or requests for a file. Unfortunately, with the growth of complicated pages, that measure is harder to evaluate. When possible, I prefer to look at full article downloads. Finally, as a conservative measure, one can look at the number of hosts (unique IP addresses) that requested information from a server. Even then, there are considerable uncertainties. The same person may send requests from several hosts. On the other hand, common employment of proxies and caches means that many people may hide behind a single host address, and a single download may lead to multiple users obtaining copies (as happens when papers are forwarded via email as well).
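The three measures described here can be computed from the same server log, and they diverge quickly. This minimal sketch assumes a simplified log in which each record carries only a requesting host and a path, and assumes full articles can be recognized by a `.pdf` suffix; the `Request` type, the `summarize` function, and the sample entries are all hypothetical. It cannot correct for proxies, caches, or forwarded copies, which is exactly the uncertainty the text describes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    host: str  # requesting IP address (may be a proxy or cache)
    path: str  # requested file


def summarize(requests: list[Request], article_suffix: str = ".pdf") -> dict[str, int]:
    """Compute three increasingly conservative activity measures."""
    hits = len(requests)  # every file request, including images and icons
    downloads = sum(1 for r in requests if r.path.endswith(article_suffix))
    hosts = len({r.host for r in requests})  # distinct requesting addresses
    return {"hits": hits, "downloads": downloads, "hosts": hosts}


# Invented sample log: one page view pulls in an image as a separate hit.
log = [
    Request("10.0.0.1", "/index.html"),
    Request("10.0.0.1", "/logo.gif"),
    Request("10.0.0.1", "/papers/a.pdf"),
    Request("10.0.0.2", "/papers/a.pdf"),
    Request("10.0.0.2", "/papers/b.pdf"),
]
print(summarize(log))
```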
In addition to the uncertainties in interpreting the activity seen at a server, it is hard to compare data from different servers. Logs are set to record different things, and some Web pages are much more complicated than others that have the same or equivalent content. Thus comparing different measures of online activity is of necessity like comparing apples, oranges, pears, bananas, and onions. Some of the difficulties of such comparisons can be avoided by concentrating on rates of growth. If online information access is growing much faster than usage of print material, it will eventually dominate.
In spite of problems inherent in measuring online activity, it is obvious by most measures that the Internet is growing rapidly. Typical growth rates, whether of bytes of traffic on backbones or of hosts, are on the order of 100% per year (Odlyzko, 2000; Coffman and Odlyzko, 1998). When one looks at usage of scholarly information online, typical growth rates are in the 50 to 100% range. For example, Table 2.1 shows the utilization of the online resources of the Library of Congress. Growth, in terms of bytes transmitted, was over 100% per year for three years before decreasing to 90% in 1998, and then falling further, to 38%, in 1999. It then increased to 62% in 2000. Table 2.2 shows downloads from the AT&T Labs - Research Web site, at http://www.research.att.com/, which contains a variety of papers, software, data, and other technical information. The growth rate there in the number of requests has been around 50% per year for several years, but between 2000 and 2001, it jumped to over 120%.
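Because comparisons across servers are unreliable, it helps that growth rates from a single source can be chained. The sketch below uses only the three Library of Congress rates quoted here (90% in 1998, 38% in 1999, 62% in 2000) and builds an index with 1997 set to 1.0; `chain` is a hypothetical helper name. Compounded, these rates multiply 1997 traffic by about 4.25 by 2000, an average of roughly 62% per year, inside the 50 to 100% range.

```python
def chain(growth_rates: list[float]) -> list[float]:
    """Turn year-over-year growth rates into an index (first year = 1.0)."""
    index = [1.0]
    for g in growth_rates:
        index.append(index[-1] * (1.0 + g))
    return index


# Library of Congress bytes transmitted: growth in 1998, 1999 and 2000.
rates = [0.90, 0.38, 0.62]
index = chain(rates)
overall = index[-1]
average = overall ** (1.0 / len(rates)) - 1.0
print(f"1997 -> 2000 volume multiplied by {overall:.2f}")
print(f"Average annual growth: {average:.0%}")
```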
Some measures of electronic information usage are showing signs of stability, or even decreasing growth. For example, Table 2.3 shows utilization of Leslie Lamport's page devoted to material about a logic for specifying and reasoning about concurrent and reactive systems. Usage had been pretty stable in 1996 through 1998. When I corresponded with him about this in 1999, he thought usage had reached a steady state, with the entire community interested in this esoteric technical subject already accessing the page as much as they would ever need to do. However, the final counts for 1999 and 2000 showed substantial increases.
The next few sections discuss data about several online information sources that are freely available on the Internet.
2.8 Electronic journals and other organized databases
Some reports are already available on the dramatic increase in usage of scholarly information that has become easily available. Traditionally, theses and dissertations have been practically invisible: they were used primarily within the institution where they were written, and even there they were rarely accessed. Free access to digital versions is now leading to an upsurge in usage, as described in McMillan et al. (1999).
In the remainder of this section, even though it is not fully justified, I will equate a full article download with a reading as measured by Don King and his collaborators.
The entire American Mathematical Society e-math system was running at about 1.2 million "hits" per month in early 1999. The Ginsparg archive (arXiv) at Los Alamos was getting about 2 million hits per month, and the netlib system of Jack Dongarra and Eric Grosse about 2.5 million hits per month.
For detailed statistics on the usage and growth of JSTOR, see Guthrie (this volume). By the end of 1999, its usage was several million a month, whether one counts hits or full article downloads, and was growing at over 100% per year.
The Brazilian SciELO (Scientific Electronic Library Online) project, available at http://www.scielo.br/, started in early 1998 and appears to be still in its initial period of explosive growth. In January 1999, 4,943 pages were transmitted; a year later, that number had grown to 63,695. Pages were requested by 67,143 hosts during 1999, so it was not just a small group of users who were involved. It is too early to tell how fast usage will continue to grow, but the project is worth listing to show that even the less industrialized countries are participating in making literature freely available.
Paul Ginsparg's arXiv had about 100,000 papers in early 1999 and was running at a rate of about 7 million full article downloads per year, so on average each article was downloaded about 70 times per year. These download statistics cover just the main Los Alamos server. If we assume that the more than a dozen mirrors collectively see as much activity as the main server, we get a download rate of about 140 per article per year. This figure is misleading, though, since it mixes old and new papers, which have different utilization patterns.
If we look at download activity for arXiv articles as a function of time, we find that on average an article is downloaded around 150 times within one year of its submission, and then 20 to 30 times a year in subsequent years. In particular, even articles submitted around 1991 are downloaded that often. Since these figures again cover just the main server, we should probably multiply them by two to get total activity. If we do that, we get into the range of readings per article that established journals experience. The pattern of usage, however, differs from that observed by King and others for printed journal articles, which are read primarily in the six months after publication, after which the frequency of access decreases.
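To see how such a flat tail adds up, here is a Python sketch of cumulative readings for a single arXiv article under the pattern just described. It assumes 150 downloads in the first year, 25 per year thereafter (a midpoint of the 20 to 30 range), and the factor of two for mirror sites. The function and its defaults are hypothetical.

```python
def cumulative_downloads(years, first_year=150, later_per_year=25, mirror_factor=2):
    """Total downloads of one article over its first `years` years.

    Assumed model: a burst in the first year, then a constant annual rate,
    all scaled by `mirror_factor` to account for mirror sites.
    """
    if years <= 0:
        return 0
    main_server = first_year + later_per_year * (years - 1)
    return main_server * mirror_factor

for years in (1, 5, 10):
    print(years, cumulative_downloads(years))
# 1 300
# 5 500
# 10 750
```

By year ten, under these assumptions, more readings come from the long tail than from the first year. This is the opposite of the print pattern King observed.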
The Electronic Journal of Combinatorics had published about 200 articles by early 1999, and had about 30,000 full article downloads from its main site during 1999, an average of 150 downloads per article. Multiplying by two to account for its many mirror sites again gets us to about 300 downloads per article per year. Data on the distribution of downloads over time are not available.
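The per-article averages for arXiv and for the Electronic Journal of Combinatorics come from the same division, optionally doubled for mirror sites. Here is a trivial Python sketch using the figures quoted in the text. The function name is hypothetical.

```python
def downloads_per_article(downloads, articles, mirror_factor=1):
    """Average annual downloads per article; mirror_factor scales for mirror sites."""
    return downloads / articles * mirror_factor

# Figures from the text (early 1999 collection sizes, annual downloads).
print(downloads_per_article(7_000_000, 100_000))                   # 70.0  (arXiv, main server)
print(downloads_per_article(7_000_000, 100_000, mirror_factor=2))  # 140.0 (mirrors assumed equal)
print(downloads_per_article(30_000, 200))                          # 150.0 (EJC, main site)
print(downloads_per_article(30_000, 200, mirror_factor=2))         # 300.0
```

As noted for arXiv, such averages mix new and old papers, so they say little about how usage varies with an article's age.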
The general impression from the statistics quoted above is that articles in electronic archives and electronic journals may not yet be read as frequently as printed journal articles, but are getting close. On the other hand, some sources appear to be used much more frequently online than they would be in print.
Additional evidence that online access changes scholars' reading patterns is provided by First Monday, "the peer-reviewed journal of the Internet," at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ . Issues are made freely available on the first Monday of each month. First Monday started publication in May 1996. About 3,600 people subscribe to the e-mail notification service.
First Monday has provided me with access to the logs of its U.S. Web server from January 1999 through February 2000. This is not sufficient for a careful statistical study, but some interesting patterns can be discerned in the data.
Over this period, the number of full paper downloads has grown from 50,000 to 60,000 per month in early 1999 to between 110,000 and 120,000 per month in early 2000. The number of distinct hosts requesting articles has increased from between 12,000 and 15,000 to over 20,000 per month. Thus the growth rate of requests has been close to the 100% per year that has occurred frequently on the Internet. Since there are only about 3,600 subscribers, this suggests that many other readers learn of the material through word of mouth, e-mail, or other channels.
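The growth figures in this paragraph can be checked from the midpoints of the quoted ranges. This Python sketch takes midpoints as a simplifying assumption and treats the "over 20,000" host figure as a lower bound. The helper names are hypothetical.

```python
def midpoint(low, high):
    return (low + high) / 2

def growth_rate(before, after):
    """Fractional growth: 1.0 means usage doubled."""
    return after / before - 1

downloads_growth = growth_rate(midpoint(50_000, 60_000), midpoint(110_000, 120_000))
host_growth_min = growth_rate(midpoint(12_000, 15_000), 20_000)  # "over 20,000" -> lower bound
hosts_per_subscriber = 20_000 / 3_600  # monthly hosts vs. e-mail subscribers

print(f"{downloads_growth:.0%}")      # 109%
print(f"{host_growth_min:.0%}")       # 48%
print(f"{hosts_per_subscriber:.1f}")  # 5.6
```

Hosts are not people. One host may serve many readers, and one reader may use several hosts. Still, a ratio above five is consistent with readership reaching well beyond the subscriber list.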
In a typical month, the largest number of downloads is to articles from that month's issue. In subsequent months, accesses to an issue drop in a pattern similar to that found by Don King in his studies of print journals. Half a year later, downloads are usually down to a quarter or even a sixth of the first month's rate. At that stage, though, the story changes. Whereas for print journals, usage continues to decrease with time, for First Monday it appears to increase. For example, there were 9,064 full article downloads from all the 1997 issues in February 1999, and 19,378 in February 2000. Thus accesses to the 1997 issues kept pace with the general growth of usage. Of the articles that were most frequently downloaded in 1999, 6 of the top 10 had been published in previous years. This supports the thesis that easy online access leads to much wider usage of older materials.
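Two calculations make this paragraph's comparisons explicit. The first is the growth in downloads of the 1997 issues between February 1999 and February 2000. The second is the monthly retention factor implied by a drop to a quarter or a sixth of first-month usage after six months. That second calculation assumes purely geometric decay, which the logs do not establish. The Python helper names are hypothetical.

```python
def growth_rate(before, after):
    return after / before - 1

def monthly_retention(fraction_left, months=6):
    """Constant monthly factor r with r ** months == fraction_left (geometric decay)."""
    return fraction_left ** (1 / months)

old_issue_growth = growth_rate(9_064, 19_378)  # 1997 issues, Feb 1999 -> Feb 2000
slow_decay = monthly_retention(1 / 4)
fast_decay = monthly_retention(1 / 6)

print(f"{old_issue_growth:.0%}")             # 114%
print(f"{slow_decay:.2f} {fast_decay:.2f}")  # 0.79 0.74
```

Growth of about 114% for the 1997 issues is comparable to the roughly 100% growth in overall downloads. Under the geometric assumption, a new issue loses roughly 21 to 26% of its monthly downloads each month at first.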
My personal Web page, which was at AT&T until August 2001, and is now at http://www.dtc.umn.edu/~odlyzko/doc/internet.size.pdf, has also seen rapid growth in usage. However, it is hard to discuss growth rates meaningfully in a short space, since most of the growth came from new papers in new areas. Instead, I will discuss the usage patterns that I have observed.
During January 2000, there were 10,360 hits on my home page from 1,808 hosts, excluding requests for .gif files and hits from obvious crawlers. Most of these 1,808 hosts only looked at various index files. If we exclude those, as well as the ones that downloaded only my cv or only abstracts of papers, we are left with 656 hosts that downloaded 1,198 full copies of articles. Of those 656 hosts, 494 downloaded just a single paper. Many of those 494 requested the specific URL of an article, rather than looking at the home page for pointers, and then disappeared. Thus on average the people who visited my home page seemed to know what they were looking for, got it, and moved on.
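The kind of filtering described here can be sketched in a few lines of Python. This is not the procedure actually used to obtain the counts above. It assumes logs in the Common Log Format and treats any host name containing "crawl", "spider", or "bot" as a crawler. It also counts requests for .pdf, .ps, .tex, or .txt files as full-paper downloads. All of these heuristics, the function names, and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) \S+')

CRAWLER_HINTS = ("crawl", "spider", "bot")        # hypothetical crawler heuristic
PAPER_SUFFIXES = (".pdf", ".ps", ".tex", ".txt")  # hypothetical "full paper" test

def filtered_requests(lines):
    """Yield (host, path) for successful, non-.gif, non-crawler requests."""
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip malformed lines
        host, path, status = match.groups()
        if status != "200" or path.lower().endswith(".gif"):
            continue  # failed requests and .gif images
        if any(hint in host.lower() for hint in CRAWLER_HINTS):
            continue  # obvious crawlers
        yield host, path

def summarize(lines):
    """Return (hits, hosts, hosts fetching papers, paper downloads, single-paper hosts)."""
    requests = list(filtered_requests(lines))
    papers = defaultdict(list)
    for host, path in requests:
        if path.lower().endswith(PAPER_SUFFIXES):  # index pages, abstracts, etc. are skipped
            papers[host].append(path)
    hosts = {host for host, _ in requests}
    downloads = sum(len(paths) for paths in papers.values())
    single = sum(1 for paths in papers.values() if len(set(paths)) == 1)
    return len(requests), len(hosts), len(papers), downloads, single

SAMPLE = [
    'a.example.edu - - [10/Jan/2000:10:00:00 -0500] "GET /papers/x.pdf HTTP/1.0" 200 5120',
    'a.example.edu - - [10/Jan/2000:10:05:00 -0500] "GET /papers/y.ps HTTP/1.0" 200 9000',
    'b.example.org - - [10/Jan/2000:11:00:00 -0500] "GET /papers/x.pdf HTTP/1.0" 200 5120',
    'b.example.org - - [10/Jan/2000:11:00:01 -0500] "GET /img/logo.gif HTTP/1.0" 200 99',
    'c.example.com - - [10/Jan/2000:12:00:00 -0500] "GET /index.html HTTP/1.0" 200 800',
    'crawler1.search.example - - [10/Jan/2000:01:00:00 -0500] "GET /papers/x.pdf HTTP/1.0" 200 5120',
    'd.example.net - - [10/Jan/2000:13:00:00 -0500] "GET /papers/z.pdf HTTP/1.0" 404 210',
]
print(summarize(SAMPLE))  # (4, 3, 2, 3, 1)
```

Substring matching on host names is crude. A host such as one containing "abbott" would be wrongly dropped, which is one reason such counts are only approximate.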
Visitors to my Web page were remarkably quiet about some obvious faults. Many of the papers posted there, especially old ones, are incomplete: they are early versions, and usually lack the figures present in the printed versions. Still, this draws few complaints. For example, in 1999, a posting to a number theory mailing list resulted in 152 downloads of a paper in less than two weeks. However, only one person complained about the lack of figures in the Web version, even though they are very helpful in visualizing the behavior shown in the paper.
Another anecdote illustrates how people behave on the Web. Several times people have told me they were glad to meet me, as they had read my papers and benefited from them. Conversation showed that they were indeed familiar with the papers in question. However, they also told me that they had lost the URL, and asked me to remind them where my home page was. Even though my home page is easy to find on the Web, since my name is not a particularly common one, they evidently had not bothered to look for it. This, as well as the lack of complaints about incomplete papers, suggests a world of plenty. People are guided to Web pages by a variety of cues, get whatever they can from those pages, and move on to other things. It is not a world of a few precious treasures with no substitutes.
The importance of making material easily available was demonstrated graphically when I made .pdf versions of my technical papers available in April 1998: there was an immediate jump in the rate of downloads. Before then, mathematical papers were available only in .ps and .tex formats, and those on electronic publishing and related topics in .ps and plain text. Most PC owners do not have easy access to tools for reading .ps files, and were apparently bypassing material that required extra effort from them. This is similar to the observation by Academic Press and the American Institute of Physics (Luther, 2001) that better interfaces lead to higher usage.
The temporal pattern of article usage on my Web page shows the behavior already noted for arXiv and for First Monday. After an initial period, the frequency of access no longer depends on the age of an article and stays relatively constant over time, after discounting for the general growth in usage.
There is more evidence that easy online access changes usage patterns. For example, downloads from my home page go to requesters all over the world, and some lead to e-mail correspondence from places like Pakistan, the Philippines, or Mexico. This is not surprising in itself, since those countries have growing technically educated populations. What is interesting is that this correspondence refers predominantly to papers that have been downloaded electronically, or asks for copies of older papers that are not available in digital form and that the requesters learned about from my home page. This strongly suggests that easy availability is stimulating interest from a much wider audience. The conclusion is also supported by similar observations about correspondence from people in industrialized countries: much of it comes from people outside universities or large research institutions with good libraries, who would be unlikely to read my papers in print.
In a small fraction of cases, the referrer field of a request shows where the requester found the URL. Many such requests come from reading lists in college or graduate courses.
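When logs are kept in the Combined Log Format (an assumption, since many servers logged less), the referrer is the first quoted field after the status code and byte count. This Python sketch tallies the host names found in that field and skips the "-" that marks a missing referrer. The function name and sample lines are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format: Common Log Format fields followed by "referrer" "user-agent"
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"')

def referring_hosts(lines):
    """Return (number of parsed requests, Counter of referring host names)."""
    total, referrers = 0, Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines not in Combined Log Format
        total += 1
        referrer = match.group(1)
        if referrer and referrer != "-":  # "-" means the browser sent no referrer
            host = urlsplit(referrer).hostname
            if host:
                referrers[host] += 1
    return total, referrers

SAMPLE = [
    '1.2.3.4 - - [10/Jan/2000:09:00:00 -0500] "GET /papers/x.pdf HTTP/1.0" 200 5120 '
    '"http://www.cs.example.edu/course/reading.html" "Mozilla/4.0"',
    '5.6.7.8 - - [10/Jan/2000:09:01:00 -0500] "GET /papers/y.pdf HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]
total, referrers = referring_hosts(SAMPLE)
print(total, referrers.most_common())  # 2 [('www.cs.example.edu', 1)]
```

Because browsers supply the referrer voluntarily, a tally like this covers only the subset of requests that carry one.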
As a final note, spikes in usage often occur when one of my papers is mentioned in a newsletter or discussion group. For example, Bruce Schneier publishes CRYPTO-GRAM, a monthly e-mail newsletter on cryptography and computer security, with a circulation of about 20,000. In early August 1999, CRYPTO-GRAM mentioned a recent preprint of mine that I had not advertised much, and that was about to appear in a regular print journal. Over the next two weeks, more than a thousand copies were downloaded. I am convinced that this is more than the number of times the printed version will be read.
The CRYPTO-GRAM example, as well as other visits to my home page, suggests that informal versions of peer review are in operation. A recommendation from someone the reader trusts, or a reference in a paper the reader trusts, can serve to validate even unpublished preprints. Scholars pursue a variety of cues in selecting what material to access.
2.9 New forms of scholarly communication
A popular destination on the AT&T Labs - Research Web server is my colleague Neil Sloane's On-Line Encyclopedia of Integer Sequences, accessible from his home page, at http://www.research.att.com/~njas/. In January 2000, it attracted more than 6% of all the hits to the AT&T Labs - Research site. This "encyclopedia" is a novel combination of a database, software, and now also a new online journal. The integer sequence project enables people to find out what the next element is in a sequence such as
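The basic lookup idea can be illustrated with a toy version in Python. It searches a small table of well-known sequences for the query as a consecutive run of terms, and reports the term that follows each match. This is a hypothetical sketch, not how Sloane's system works; the real encyclopedia is vastly larger and more sophisticated. The table, function name, and queries are illustrative, and the sequence shown below is not in the toy table.

```python
# Toy table of well-known sequences (initial terms only).
DATABASE = {
    "Fibonacci numbers": [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89],
    "Catalan numbers": [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862],
    "Squares": [0, 1, 4, 9, 16, 25, 36, 49, 64, 81],
    "Primes": [2, 3, 5, 7, 11, 13, 17, 19, 23, 29],
}

def find_matches(query, database):
    """Return (name, next term) for each sequence containing query as a consecutive run."""
    n = len(query)
    hits = []
    for name, terms in database.items():
        # Stop early enough that a following term always exists.
        for i in range(len(terms) - n):
            if terms[i:i + n] == query:
                hits.append((name, terms[i + n]))
                break  # report only the first occurrence per sequence
    return hits

print(find_matches([1, 2, 5, 14], DATABASE))  # [('Catalan numbers', 42)]
print(find_matches([2, 3, 5], DATABASE))      # [('Fibonacci numbers', 8), ('Primes', 7)]
```

The second query shows why such a service returns candidate sequences rather than a single answer. A short run of terms can belong to several unrelated sequences.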
0, 1, 8, 78, 944, 13800, 237432, ...
This might seem like recreational mathematics, but it is very serious, as many research papers acknowledge the assistance of Sloane's database or, in earlier times, of his books on this subject. It serves to tie mathematicians, computer scientists, physicists, chemists, and engineers together, and to stimulate further research. It represents a novel form of communication that could not be captured in print.
Another popular site that is also a locus of mathematical activity is Steve Finch's "Favorite Mathematical Constants" page at http://www.mathcad.com/library/Constants/. It also shows rapid growth in usage, although that growth is harder to quantify, since the monitoring software was changed less than a year ago. Like Sloane's integer sequence page, it is becoming a kind of "portal" to mathematics, one that does not fit easily into traditional publication models.
2.10 Conclusions and predictions
Many discussions of the future of scholarly publishing have been dominated by economic considerations. Digitization has often been seen as a solution to the "library crisis," which forces libraries to cut down on subscriptions. So far there has been little effect in this area, as pricing trends have not changed much (Odlyzko, 1999).
It has long been clear that, in the long run, print will become irrelevant regardless of any economic pressures, since it is simply too inflexible. Gutenberg's invention imprisoned scholarly publishing in a straitjacket that will eventually be discarded. However, the inertia of the scholarly publishing system is enormous, and so traditional journals have not changed much. They are migrating to the Web, but operate just as they did in print. We are beginning to see new ventures that will lead to new modes of operation, but it will be a while before they become a sizable fraction of the total scholarly publishing enterprise.
The large majority of scholarly publications are unlikely to change much for several decades. However, there will be growing pressure to make them easily available. In particular, scholars are likely to press ever harder for free circulation and archiving of preprints. The realization will spread that anything not easily available on the Web will be almost invisible. Whether they like it or not, scholars are engaged in a war for eyeballs, and ease of access will be seen as vital.
Ease of access is likely to promote the natural evolution of scholarly work. There will be more interdisciplinary research, and more survey publications. Some of these trends are beginning to appear in the data discussed in this paper, and we are likely to get more confirmation in the next few years.
† I thank Steve Finch, Paul Ginsparg, Jim Gray, Eric Grosse, Kevin Guthrie, Stevan Harnad, Steve Heller, Patrick Ion, Don King, Kevin Kiyan, Greg Kuperberg, Leslie Lamport, Steve Lawrence, Carol Montgomery, Gary Mullen, Ann Okerson, Kimberly Parker, Robby Robson, Carol Tenopir, Ed Valauskas, Hal Varian, Tom Walker, and Herb Wilf, for providing comments, corrections, and helpful information.
1. For circulation figures for major research libraries in the U.S., see Association of Research Libraries, Statistics and Measurement Program (http://www.arl.org/stats/index.html).
5. For more evidence, see also Klopfenstein (1989) and the references there.
8. See endnote 10 to Chapter 2 of Buckland (1992) for references.
10. A summary is presented in King and Tenopir (this volume).
11. The University of Wisconsin study is available at http://wendt.library.wisc.edu/archive/journals/costben.html
12. This page is available at http://research.microsoft.com/users/lamport/tla/tla.html.
15. For an account of the project, see Sloane (1998).