
    Preface

    This monograph is the final product of a journey that began over 12 years ago. The University of Michigan was already a pioneer in the development of electronic access to scholarly publications. For example, it was one of a small number of participants in TULIP (The University Licensing Project), an early joint project between Elsevier and university libraries to develop workable systems for electronic distribution and usage. Michigan was also a recipient of one of six large grants in the first round of the NSF Digital Library program, launching the University of Michigan Digital Library (UMDL) project. Towards the end of TULIP, the project leaders from Michigan and Elsevier started meeting with researchers from UMDL to discuss a bold idea: to deploy a full-scale, production-quality digital access system to enable usage of content from all of Elsevier's (then about 1200) scholarly journals, and at the same time to conduct a field experiment to answer various questions about the interplay between pricing models and usage.

    Over the course of about a year, we put together production and research teams at Michigan, and met multiple times with Elsevier executives to negotiate the licenses and design the project. Elsevier gave us considerable freedom in the design of the pricing experiment, so that we could charge participating institutions different prices, and importantly, different pricing structures (e.g., per article vs. “all-you-can-eat”). We received substantial technical support from the digital library production specialists at the University Library, as well as from the production division at Elsevier. The details of the PEAK production system and experiment are described in chapters 4, 5 and 6 below.

    Near the end of the production system trial and field experiment, we decided to convene an international conference of academic librarians, publishers and scholars interested in the usage and economics of emerging digital library systems. This group of about 100 met for two days in Ann Arbor, during which these research papers were presented, with lively discussion throughout. Based on the success of this conference, we decided to develop the papers further and publish them as a monograph, the result of which you now have in front of you. Nearly all of the conference speakers agreed to make suggested revisions to include their papers in the volume. We (Lougee and MacKie-Mason) reviewed the papers and provided detailed suggestions for revision. We then had a professional copy and style editor comb through the drafts to seek greater uniformity of tone and style. The authors prepared their revisions and resubmitted the papers.

    Due to an unfortunate series of events, this process took far longer than we had planned, and our original agreement to publish the volume as a timely entry in the literature on emerging digital libraries fell through. A second publisher failed to deliver on its commitments after another year of delay. At that time the University of Michigan Scholarly Publishing Office stepped forward to publish this monograph, recognizing its value both as an historical record of the state of digital library development around the turn of the millennium and because many of the problems, solutions and conceptual frameworks advanced in these chapters are enduring. Though some of the production and usage facts may seem quaint by now, about eight years after the original drafts of the papers were written, the central issues and observations are still fresh.

    Content of the chapters was revised and updated by the authors through about 2004. During the final production process, during 2007-2008, all of the referenced hyperlinks were checked and corrected. Of course, hyperlinks on the web decay at a steady rate, but as of spring 2008 the references are complete and useful.

    PEAK was a ground-breaking effort in its day, and references to the project have continued over time. It raised important questions about the potential for highly functional journal content and new economic models of publishing. In today’s context of socially-enabled systems, interactive publishing, and open access publishing, the motivating questions of PEAK remain relevant.

    Wendy Pradt Lougee
    University of Minnesota
    Jeffrey K. MacKie-Mason
    University of Michigan
    May 2008

    Contributors

    David Alsmeyer
    BT Advanced Communications Research
    Maria Bonn
    Senior Associate Librarian
    Digital Library Initiative
    University of Michigan
    Ann Arbor, MI 48109
    Mary M. Case
    Director, Office of Scholarly Communication
    Association of Research Libraries
    21 Dupont Circle, NW
    Suite 800
    Washington, DC 20036
    marycase@arl.org
    Robert S. Gazzale
    Assistant Professor of Economics
    Williams College
    Williamstown, MA
    rgazzale@williams.edu
    Kevin M. Guthrie
    President, Ithaka Harbors Inc.
    151 East 61st Street
    New York, NY 10065
    Leah Halliday
    Department of Information Science,
    Loughborough University,
    Loughborough, Leicestershire
    LE11 3TU, UK
    l.l.halliday@virgin.net
    Karen Hunter
    Senior Vice President
    Elsevier Science
    k.hunter@elsevier.com
    Paul B. Kantor
    Tantalus Inc. and Rutgers University
    New Brunswick, NJ 08903
    kantor@scils.rutgers.edu
    Donald W. King
    Visiting Professor
    School of Information Sciences,
    University of Pittsburgh
    dwking@umich.edu
    Bruce R. Kingma
    School of Information Studies
    Syracuse University
    Syracuse, New York 13244-4100
    brkingma@syr.edu
    Thomas Krichel
    Palmer School of Library and Information Science
    Long Island University
    Brookville, New York 11548-1300
    Thomas.Krichel@liu.edu
    Wendy Pradt Lougee
    University Librarian
    University of Minnesota
    Minneapolis, Minnesota
    wlougee@umn.edu
    Jeffrey K. MacKie-Mason
    Arthur W. Burks Professor of Information and Computer Science
    Professor of Economics and Public Policy
    School of Information
    University of Michigan
    Ann Arbor, MI 48109-1092
    jmm@umich.edu
    Carol Mandel
    New York University
    carol.mandel@nyu.edu
    Mark J. McCabe
    School of Economics
    Georgia Institute of Technology
    Atlanta, GA 30318
    mark.mccabe@econ.gatech.edu
    Carol Hansen Montgomery
    Dean of Libraries
    Drexel University
    W. W. Hagerty Library
    Drexel University
    Philadelphia, PA 19104
    montgoch@drexel.edu
    Andrew M. Odlyzko
    Director, Digital Technology Center
    University of Minnesota
    499 Walter Library
    Minneapolis, MN 55455
    odlyzko@umn.edu
    Charles Oppenheim
    Department of Information Science
    Loughborough University
    Loughborough, Leicestershire LE11 3TU, UK
    c.oppenheim@lboro.ac.uk
    Hannelore B. Rader
    Ekstrom Library
    University of Louisville
    Louisville, KY 40292
    h.rader@louisville.edu
    Juan Riveros
    Managing Economist, Nathan Associates, Inc.
    2101 Wilson Boulevard, Suite 1200
    Arlington, VA 22201
    Michael P. Spinella
    JSTOR
    149 Fifth Avenue, 8th Floor
    New York, New York 10010
    mspinella@jstor.org
    Mary Summerfield
    Director, Business Development and Planning
    msummerfield@press.uchicago.edu
    Carol Tenopir
    Professor, School of Information Sciences,
    University of Tennessee

    David Alsmeyer manages BT's (formerly British Telecom's) library, information and translation services, which provides technical library and information services and a full translation and interpreting service to people throughout the BT Group.

    Maria Bonn is the Director of the Scholarly Publishing Office, University of Michigan.

    Mary M. Case is Director of the Office of Scholarly Communication of the Association of Research Libraries (ARL). The Office of Scholarly Communication undertakes activities to understand and influence the forces affecting the production, dissemination, and use of scholarly information.

    Robert Gazzale is an Assistant Professor of Economics at Williams College, Massachusetts. He was a member of the PEAK research team while a Ph.D. student at the University of Michigan.

    Kevin M. Guthrie is the president of JSTOR, an independent not-for-profit organization established to help the scholarly community take advantage of advances in information technology. He is the author of The New-York Historical Society: Lessons from One Nonprofit's Long Struggle for Survival, published by Jossey-Bass Publishers in January 1996.

    Leah Halliday obtained her Ph.D. at the Department of Information Science, Loughborough University, where she studied economic models of digital journals. She currently is the SUNCAT project manager in the Edinburgh University Data Library.

    Karen Hunter is Senior Vice President of Elsevier Science. With Elsevier since 1976, she has concentrated for several years on strategic planning and the electronic delivery of journal information. She was responsible for the TULIP experiment (1991-1995) in networked journal delivery at 9 universities and for the design and start-up of ScienceDirect, Elsevier Science's Web journals service.

    Paul Kantor is Professor in the Department of Library and Information Science of the School of Communication, Information and Library Studies at Rutgers University, where he also directs the Rutgers Distributed Laboratory for Digital Libraries, and the Alexandria Project Laboratory for study of the library function.

    Donald W. King is a research professor at the University of Pittsburgh School of Information Sciences. He has spent 40 years evaluating information systems and describing the communication environment.

    Bruce R. Kingma is Associate Dean and Associate Professor in the School of Information Studies, Syracuse University. He has authored two books and numerous refereed articles on the economics of scholarly information.

    Thomas Krichel is an assistant professor at the Palmer School of Library and Information Science, Long Island University. In 1993 he founded NetEc, a consortium of Internet projects for academic economists. In 1997, he founded the RePEc dataset to document research in economics.

    Wendy Pradt Lougee is the University of Minnesota Librarian and holds the McKnight Presidential Professorship. She was the Project Director for the PEAK project, at which time she was the Associate Director, University Library for Digital Library Services at the University of Michigan.

    Jeffrey K. MacKie-Mason is the Arthur W. Burks Professor of Information and Computer Science, and a Professor of Economics and Public Policy at the University of Michigan. He was the Research Director for the PEAK project.

    Carol A. Mandel is Dean of Libraries at New York University and Publisher, New York University Press. During the period of this study, she was Deputy University Librarian at Columbia University.

    Mark J. McCabe is an assistant professor in the School of Economics at Georgia Institute of Technology. Prior to this appointment, he was an economist in the Antitrust Division of the U.S. Department of Justice for seven years, where he analyzed mergers and other competitive strategies in a number of industries.

    Dr. Carol Montgomery is currently Dean of Libraries at Drexel University. She was formerly Director of the Institute for Academic Informatics and Associate Provost at Allegheny University of the Health Sciences. She is the co-author of several books.

    Andrew Odlyzko is the Director of the Digital Technology Center, holds an ADC Telecommunications Professorship, is a professor of mathematics, and is an Assistant Vice President for Research at the University of Minnesota.

    Charles Oppenheim has been Professor of Information Science at the Department of Information Science, Loughborough University, since 1998. In addition to other academic positions, he has held business development positions with the electronic publishing companies Pergamon Infoline, Derwent Publications and Reuters.

    Hannelore B. Rader has been University Librarian and Dean of Libraries at the University of Louisville since 1997. Her more than 70 publications have focused on information literacy and library administration. She was named 1999 ACRL Academic and Research Librarian of the Year.

    Dr. Juan Riveros is a consultant with Nathan Associates, Inc., where he performs analyses of competition in various industries. At the time of this project he was a Ph.D. candidate in economics at the University of Michigan, and a member of the PEAK research team.

    Michael Spinella is the Executive Director of JSTOR. Previously, he was the Director of Membership and Meetings at the American Association for the Advancement of Science, a position he had held since 1990.

    Mary Summerfield is an independent consultant in the information industry based in Oak Park, IL. During the period of this study, she was a Project Director at Columbia University with primary responsibility for the Online Books Evaluation Study.

    Carol Tenopir is a Professor at the School of Information Sciences, University of Tennessee, Knoxville. Dr. Tenopir is the author of over 200 articles and five books.

    Acknowledgements

    The evolution of digital libraries has been marked by important collaboration and this monograph is a prime example of a particularly fruitful enterprise at the University of Michigan. We are grateful to the many people who made this project possible. Starting at the beginning, we were introduced to each other by Dan Atkins, in the context of the University of Michigan Digital Library (UMDL) project, one of the first wave of large-scale NSF-funded digital library research projects in the U.S. In addition to heading UMDL, Dan was the founding Dean of the UM's School of Information (SI), and was committed to leveraging the expertise of multiple communities. We are grateful for his inspiration and leadership as we conceived and carried out the collaborative effort represented in the PEAK project.

    PEAK was a combined research and development project run at the University of Michigan in partnership with Elsevier Science. In addition to the online journal service we implemented, PEAK hosted an international research conference in 2000, at which the first versions of the papers in this monograph were presented. We are grateful to people involved in both the publishing project, and the conference and book production.

    PEAK would not have been possible without the vision and support of Karen Hunter, Senior Vice President at Elsevier. Karen already had been a leader in digital publishing projects for many years, and has been a great friend to academic librarians and digital library researchers during a time when relations between publishers and academia have often been tense. We are also grateful for the contributions made by Roland Dietz and Alexandra Jankovich, both previously at Elsevier.

    For the development and operation of PEAK at the University of Michigan we are especially grateful to John Wilkin, who managed the production team at UM, and Maria Bonn, who designed the interface, then led the services and assessment team. We also thank SI graduate student Ann Zimmerman for her diligent participation and support of the behavioral research component of the project, along with economics graduate students Juan Riveros and Bob Gazzale, who provided excellent assistance in the data analysis. A number of others from the University of Michigan Library made invaluable contributions to the services and technologies associated with PEAK.

    When we organized the conference, we received excellent support from Hannah Wilkins from SI, and Pat Hodges of the University Library. Generous support was provided by Elsevier Science, Wiley Interscience and the Council on Library and Information Resources. The School of Information hosted the conference and provided additional logistical support.

    We are especially grateful to the authors of the chapters in this monograph. They wrote the original versions for the conference, before we had the notion of publishing the collection, and they paid their own travel expenses. Then, during the exceedingly long, and sometimes painful process of creating this monograph, they were patient, supportive and responsive to our requests for revisions. We are also grateful to the approximately 100 other conference participants who attended at their own expense, and provided lively and detailed commentary and suggestions.

    Finally, we thank the professionals who helped us publish this monograph. Anne Pfaelzer de Ortiz provided superb style and copy editing, working patiently and hard to bring a consistent tone and style to the chapters. Economics graduate student Kan Takeuchi carefully typeset the first version in the LaTeX system. We also are very grateful to the staff of the UM Scholarly Publishing Office. When we approached them in 2007, they responded enthusiastically to the opportunity to bring this monograph to the public. They provided excellent production support, converting the entire (long) manuscript from its original LaTeX format to XML; they also did the design and provided project management support. The SPO is directed by Maria Bonn (who a decade earlier played a central role in the creation of the online PEAK service), and Kathleen Fear and Kevin Hawkins provided consistently excellent service. We are also grateful to Andrea McVittie for reviewing and correcting or updating all of the links to online references so they are accurate as of early 2008.

    I. Overview

    1. Stakeholders and successful digital transformation of the research library

    Libraries and library collections are experiencing transformation. Some aspects of this evolution are gradual, but others are jarring. The digital transformation has been a critical catalyst within publishing and scholarly communication. Before the 1990's, library collections were predominantly in print-on-paper formats, with modest investment in proprietary citation databases or other electronic reference works. Network-accessible content, of any type, was rare. In the relatively short intervening time, the quantity of popular, reference and scholarly information in digital format exploded, and an increasingly substantial share of library budgets is devoted to acquisition, processing and management of digital resources.

    In addition, libraries now make most of their digital content accessible to patrons over the Internet. In less than a decade nearly every library in the developed world has become "Internet enabled." A first step was often to make the existing catalogue accessible to patrons via the Internet. This extrapolation of existing catalogue functions to electronic databases created worldwide access to information about library holdings, further enhancing resource sharing and creating opportunities for libraries to re-conceive labor-intensive operations. Coupled with the availability of online journal indices, these developments set the stage for change both internal and external to the library. Lynch (2000) notes that during this period libraries played a key role in introducing information systems and the use of technology for information access to a campus audience, particularly in non-scientific disciplines.

    The development of network-accessible digital collections generated more dramatic impact. The capabilities for creating and distributing highly functional electronic content have empowered users for new types of inquiry, and libraries for fuller engagement in content creation, management, and use.

    The ongoing transformation is pronounced within both for-profit and not-for-profit research libraries. The vast majority of scholarly journal publications and reference tools are now available in electronic form. While the Association of Research Libraries' (ARL) data indicate library acquisitions budgets have lost significant purchasing power, largely due to high price inflation for journals[1], overall library investment in electronic content has risen dramatically in the last several years (ARL 2001). ARL figures for 2001-02 indicate an average 19.6% of library collections budgets is spent on electronic resources, a fivefold increase since 1992-93. Not surprisingly, interlibrary borrowing is rising as budgets are squeezed between inflation and the growing availability of new electronic publications. The increased expenditure for electronic resources is driven in no small part by user demand, and the results are evident in usage. As Guthrie (this volume) notes, when scholarly materials are Internet-accessible and full-text searchable, the number of user accesses is many multiples higher than in traditional print environments.

    The rapid transformation of library collections is an example of institutional transformation, with a broad definition of "institution." During a period of profound transformation, the patterns and forms of interactions among an institution's many stakeholders will necessarily change. The path of transformation and its success crucially depend on the way in which these stakeholder interactions develop. We develop our thesis in this chapter. The authors in the rest of this volume study institutional transformations with an emphasis on specific stakeholder interactions. In particular, the authors focus on economic interactions (between stakeholders such as publishers and libraries, authors and publishers, and libraries and employees, among others) and access or usage interactions (between readers and libraries).

    1.1 Institutional transformation

    We use "institution" in its social construct sense: "a custom, practice, relationship, or behavioral pattern of importance in the life of a community or society."[2] Institutions, such as the institution of marriage, are complex social constructs that arise and are shaped in response to social needs, constraints and balances of power among stakeholders. In this sense, research libraries as a whole are an institution; an individual library is an instance of the social institution. Institutions generally provide stability because they do not easily change quickly, although a particular instance of an institution, like a particular marriage, may quickly change.

    To illustrate by example, some institutions that bear a family resemblance to research libraries are K-12 education, the bookstore industry, and public parks. Each requires a flow of funds; each engages the interests of multiple parties; each is familiar to its participants and more or less uniform across different instances. Each is designed or emerges to serve a mixture of needs, subject to social constraints and power relationships. Each institution has numerous individual instances (e.g., individual bookstores).

    When there is significant change in the needs or constraints or power relationships on which an institution is based, the practices and behavioral patterns embodied in the institution may no longer effectively serve the new configuration of needs, constraints and power relationships. In the face of such change in external conditions, institutions adapt and change. Institutions, because they comprise a web of social conventions, practices and structures, adapt slowly. Slow adaptation is usually suitable because the foundational conditions (needs, constraints and power relationships) change slowly. However, during times of rapid change in needs and constraints, slow institutional adaptation may lead to frictions and stakeholder frustrations.

    The institution of the public access library emerged slowly as needs and constraints changed. The development of the university-based research library grew out of the monastic scholasticism of the early Enlightenment and Renaissance. This development was an early adaptation to changes in power and user needs: the declining power of the Church and the growing importance of secular study. Arguably the first university research library was founded at Oxford University by Thomas Cobham, Bishop of Worcester, in 1320. In 1598 Thomas Bodley set himself to restoring the library and opening it to students; in 1602 the famous Bodleian Library at Oxford officially opened (Bodleian Library, 2007).

    The great modern national libraries such as the British Library and the Library of Congress were at least in part a transformation in response to the growth of the bourgeois class and the emergence of populism. These libraries had their origins in mandatory deposit laws of the 17th century, and in the donation of royal collections to the citizenry in France and elsewhere during the 18th and early 19th centuries.[3] The familiar system of multiple, distributed, small-to-medium public libraries with open access to all local citizens developed in the latter half of the 19th century, with substantial help from Andrew Carnegie and other benefactors.[4]

    Similarly, the history of the scholarly research journal reflects a gradual evolution from the communication mission of the scientific societies founded in the 17th century, to the 18th century emergence of journal articles as a mechanism for registering ownership of discoveries. In the 19th century, journal publications became more closely associated with professional standing (Schauder, 1994).

    This three-century evolution of journals illustrates the gradual shaping of community practices as new vehicles for communication emerged and new practices were adopted.

    The practice or behavior pattern that defines the research library as an institution comprises a web of interactions between various stakeholders each with distinct interests in the collections of a library. These stakeholders include authors, publishers, librarians and users. Each has an interest in and participates in the processes of creation, publication, distribution, acquisition, organization, archiving, and usage. This web of interacting participants and processes shapes both the research library's collection and the success of the library in meeting the expressed needs of users over time. It is this interdependent, interactive context—the ecology of libraries—that frames this book.

    1.2 The Ecology of Libraries: Transformation in Context

    The research library is embedded in the broad social systems of information flow management: the production, distribution and use of information resources by members of society. This iterative process among stakeholders is the foundation of knowledge creation and transmission. As the information technology revolution has proceeded over the past three decades or so, all institutions participating in information flow management have been deeply affected. Consequently, the set of social needs, constraints, and power relationships that are served by the research library has changed due to developments in information technology.

    Simultaneously, new technology for information access and analysis enabled the development of new research methodologies and discipline interests. The changes in needs and capabilities have, in turn, stimulated pressure for institutional change in values and behavioral norms.[5] For example, new venues and methods for documenting and disseminating research have gradually become incorporated in practice, as the values associated with academic tenure have adapted to these opportunities. These interdependent forces have had an impact on individuals, on disciplines, and on libraries.

    Technology Forces

    The information technology revolution has affected all library types: public, K-12, academic, and organizational or special including corporate and not-for-profit research libraries. The "information technology revolution" is a vague moniker for a vast array of technological innovations and their rapid application throughout society. Understanding the effects of the revolution depends on understanding two fundamental facts that have driven substantial change: the exponential rate of decline in the costs of digital computation (silicon, i.e., microprocessors), and of digital communication (sand, i.e., fiber optics).

    Gordon Moore of Intel predicted in 1965 that the power of microprocessors available at a given cost would double about every 18 months (Moore, 1965). The equivalent form of Moore's law is that the cost of a given amount of microprocessor power would be cut in half every eighteen months. Moore's prediction has held for thirty years. There have been a number of astonishing technological and industrial advances over the past several centuries, but there is probably nothing that matches this rate of improvement. To give an idea of the power of this exponential cost decline, consider a luxurious house that cost $100,000 to build in 1970.[6] If housing costs had followed Moore's law, then this mansion would cost only $1.67 to build in 2003. Imagine how different the world would be today if mansions cost only $1.67!
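
    Stated as a formula (a minimal restatement of the cost form of Moore's law above, with the halving period left as a parameter), the cost of a fixed amount of computing power at time t is

    \[ C(t) = C(t_0)\, 2^{-(t - t_0)/T}, \]

    where C(t_0) is the cost at a starting date t_0 and T is the halving period (about eighteen months in the statement above). At that pace, each decade of sustained decline compounds to roughly a hundredfold drop in cost.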

    Digital communications technology experienced similar cost declines. MacKie-Mason and Varian (1994) documented that digital communications costs had decreased 30% annually for the previous thirty years and, if anything, the rate has accelerated in the past decade.

    The cost collapse in computation and communication fostered the rapid development and deployment of powerful networked information technologies. The explosion of digital networking, cheap storage, and powerful computation and display devices has radically changed the constraints, stakeholder needs, and power relationships on which the institution of research libraries is built.

    In addition to these dramatic cost/performance improvements in computational power and communication, several concurrent trends have shaped the landscape in which libraries interact with other stakeholders. The development of standards for creating digital content and of protocols for sharing it has stimulated electronic publishing and associated systems of access. As a result, we have seen the rapid development of large-scale repositories of electronic publications by major publishers, and increasingly sophisticated retrieval systems and tools.

    Information Creators

    The community of information "creators" for scholarly or research content includes individuals, the academy, and other research investors such as for-profit companies with their own research labs. These stakeholders are crucially interested in the values-based set of social practices defining intellectual property and its ownership. These values have been evolving both as modern societies depend increasingly on intellectual capital for wealth creation, and as digital information technology leads to changes in distribution, collection and information management practices. Institutional adaptation to changes in these social and technological forces has created significant volatility within the world of scholarly communication.

    For example, the academy has been revisiting and re-conceiving institutional intellectual property policies, often prompted by explorations of new distance-independent venues for courses (Knight Higher Education Collaborative, 2002). At the same time legislatures have been modifying copyright law. Although largely driven by mass-market commercial interests, the legal changes have important implications for scholarly communication. In response to these stakeholder pressures for institutional change, some universities and other organizations have boldly launched new services for managing and disseminating information.[7]

    A related trend is the emergence of open paradigms, which are realized as codifications of principles and social norms for interactions between creators and other stakeholders. These developing norms also result from the interplay of changing values, technologies, and social needs. Just as the open software movement is defined by collaborative development, programs such as the Open Knowledge Initiative (for sharing learning resources), Open Archives Initiative (for sharing research content), and the Open Access movement for journals embrace collaboration and a more open exchange of goods and services. These initiatives are challenging institutionalized practices surrounding the flow of information.[8]

    Publishers

    Much has been written about the transformation of publishing.[9] Many authors focus on the new opportunities for dissemination introduced by the capability for self-publishing on the Web, but most of the authors in this book focus on formal publication systems. Formal publication involves a distinction between content creator and publisher, and the use of independent reviewers. Thus the institution embodies norms and routines for interactions between these separate stakeholders.

    The early 1990's saw a number of significant experiments among publishers (e.g., Elsevier Science's TULIP project, the predecessor of the PEAK project discussed in several chapters in this book). These experiments typically focused on methods for creating and distributing electronic versions of print journals. Gradually capabilities for linking, more complex search functions, and customization options emerged. These tools were increasingly important to users as the volume of electronic content grew through conversion of older literature and the aggregation of current titles.

    While this extrapolation from print to digital production and distribution represented a significant development evident among commercial and non-profit publishers alike, several concurrent and alternative development paths took shape. Inflation in publication prices and concerns about constraints on rights for use and re-use encouraged alternative models, often within a non-profit context. Similarly, the open access movement to free up access to publications spurred new models for managing rights and supporting costs.[10]

    As one recent market analysis of the journal publishing industry reported, initiatives to launch alternative publication vehicles face significant obstacles:

    Libraries and academics have been trying for over a decade to develop new ways of disseminating academic knowledge and research, but the barriers to entry enjoyed by the incumbent journals are just too high (loyal readership, brand recognition, 'boards' of academics who peer review research), as are the value proposition [sic] (they bring order to an anarchic process—the development of knowledge) (Morgan Stanley, 2002).

    Users

    The picture would not be complete without discussing changes in individual user behavior and expectations. Several recent surveys report on the changing dimensions of user activity. The user base for libraries is expanding and the demand for instruction in use of library resources is also increasing. Yet there has been a downturn in circulation of physical collections and use of in-library reference services (Kyrillidou and Young, 2003). While the majority of students and faculty are using online library content, they report that they still desire a hybrid environment with both print and electronic collections (Friedlander, 2002). As the volume and complexity of electronic publications increase, users are also expressing a desire for greater personal control in managing access to electronic content (Cook et al., 2003).

    Individual user preferences are strong forces for change, but do not represent the whole picture. Community practices and preferences within specific disciplines are also potent forces, and each discipline community has responded differently to these new opportunities for communicating and documenting research. Traweek's (1988) anthropological analysis of life among high energy physicists captures the culture and practices of this community and depicts the social conventions that enabled the early and extremely rapid adoption of e-prints. In economics, by contrast, the first significant non-commercial e-print site started in 1993, but of the 2500 papers submitted to date, nearly half were only submitted in the last two years.[11] More recently behaviors among authors and editors within ecology have been analyzed to understand the decision processes that lead to publishing in electronic journals in that field (Hahn, 2001).

    The changes associated with technology and community norms and values have been sufficiently radical that we should expect to see major transformations of the institution of the research library. Since institutions by nature do not adapt quickly, a period during which the foundations are transformed so quickly might be compared to an earthquake: the institution cracks and suffers damage as it responds to the shock. When a stable institution experiences increasing pressure to change rapidly, understanding the changes in the interactions between stakeholders is especially important.

    The Economics of Libraries

    An apparent contradiction has arisen from the fact that the information technology revolution has been driven by cost reductions, while the costs of research libraries have been increasing during this period. Many observers (and budget administrators) expected a decrease in research library spending over the past decade. In fact, although the costs of the technology have decreased, these constitute only a fraction of the inputs to the research library. The costs are high for creating, adopting, implementing, maintaining and managing new information systems that rely on networked information technology, in part because these activities depend largely on human labor, not silicon or sand. Meanwhile, during the transformation, the older systems must still be staffed and maintained. Overall, the costs of transformation have been added to the ongoing costs of the institution, and the total cost is correspondingly higher.

    The seeming paradox of lower input costs but higher total costs is no paradox at all. It follows directly from the nature of institutional transformation. The change in library constraints reflects a decrease in the costs of some inputs to a stable, functioning system. However, adapting to this change in constraints—undertaking the transformations to reach the new stable system—is itself costly, and during the transformation total costs can be much higher. The transformation of the USSR from a socialist to a market economy is a familiar example. The institutions of a market economy are likely to be much less costly and more efficient, but the transformation from one to the other is costly and slow.

    1.3 Understanding the transformation

    Research libraries are experiencing rapid, simultaneous shifts in constraints, needs, and power balances. Institutions develop slowly in response to such powerful, durable social forces, and rapid adaptation to changing forces is unlikely. Thus, during any period of intense change, institutional adaptations may appear to be falling behind, and there will be many false steps. We can potentially lower the social cost of rapid—and thus, wrenching—institutional transformations by understanding the interactions between the stakeholders and the process of change in those interactions.

    The interactions between human stakeholders in an institution are loosely-coupled interfaces in a complex social system. That is, a signal on one side of an interaction typically elicits a response from the other side, but the mapping between signal and response is imperfect: a given signal may elicit a range of different responses at different times and locations.

    As an example of one such loosely-coupled interface common in the research library, consider the practice of interlibrary loan, which is used to share access to collections. If a user at one library wishes to read a print-on-paper document held in the collection of a different library, she can request that the distant library deliver the document for local use.[12] Because the distant library owns the copy of the document in question and an unrestricted right to let users read that particular copy, the library can deliver the physical document without involving the publisher.

    Sometimes, however, the distant library would prefer to deliver a facsimile of the document to the user, so as to retain its copy for its own users, or to reduce the risk of loss or damage. Typically, a publisher holds the copyright on the document, and thus the user-library interaction (the request for a facsimile) could invoke an interaction between library and publisher: "can we make a copy to deliver to the requesting user?" Because this interaction between library and publisher is likely to occur many times, libraries and publishers have reached pre-arranged agreements (claiming fair use exemptions or relying on agreed upon guidelines). Such agreements—the result of an interaction between at least two interested parties—specify the terms and conditions under which the library may create a facsimile and deliver it to the requesting user.

    These are only two of the results that the originating signal—the user's interlibrary loan request—might invoke. The results might differ for the user in various ways; for example, if the original document is delivered, the user normally will be required to return it within a fixed period; if a facsimile is delivered, the user typically retains the copy. Another variation is that in some cases a requested facsimile might be created by the distant, owning library, whereas in other cases the requesting library might purchase a facsimile from a document delivery service (another interaction between two stakeholders); in such cases the conditions for the user (for example, whether the document must be returned or whether a fee is assessed) might be different. Thus, the interlibrary loan process is a loosely-coupled interface between two stakeholders (user and library): the request might or might not invoke further interactions between other stakeholders, and the response to the request might take on a variety of forms.

    With the development of networks and digital collections the interlibrary loan process faces substantial pressure to change. In the print-on-paper world a loan might be delivered in several days to several weeks. In a networked digital world users put much higher priority on quick, even immediate delivery. Libraries are likely to find that the costs of digital reproduction and networked delivery are much lower than print-on-paper costs, and thus want to implement electronic delivery in response to the loan request. However, because electronic delivery typically requires the making of an electronic copy, this response is likely to invoke a new interaction between library and publisher: "can we make a digital copy and electronically deliver it to the requesting user?" The results of these ancillary interactions may be, and often have been, quite different than the request for permission to deliver print-on-paper facsimiles in response to interlibrary loan requests. The publisher may impose different terms and conditions, or may even refuse permission for electronic interlibrary loans.

    What does this example suggest in terms of the interplay that ensues between stakeholders? As we noted, libraries and publishers typically have to deal with many interlibrary loan requests, and thus have generally found it efficient to establish pre-arranged agreements that cover most such requests. This routine speeds the process and lowers the transaction costs. As research libraries increasingly rely on digital collections, requests for electronic interlibrary loans will become frequent, and thus considerable time and transaction costs will be saved if libraries and publishers can work out new pre-arranged agreements to cover these situations. However, the changes in technological costs and capabilities, and the concomitant changes in the interests of the stakeholders, mean that agreements to cover print-on-paper interlibrary loans may not be satisfactory to all stakeholders, and a period of trial, error and negotiation may be necessary.[13] The sooner participants understand the ways in which the constraints, costs and interests of stakeholders have changed, the sooner and better they can adapt the institution by agreeing on new standardized agreements covering electronic interlibrary loans.

    The emergence of digital collections as an instance of institutional transformation guides us to look for particular types of problems that arise as the institution adapts to the information technology revolution. We have selected the contributions to this volume to focus on two questions we think will be central.

    The first question is: who will pay to create and sustain digital materials? The information technology revolution has made large-scale digital collections technically feasible, and many of the stakeholders have increasingly demanded digital collections. The creation of materials to include in such collections is a necessary first step, and despite the reduction in the costs of some key inputs, the digital production process is costly. Therefore, who will pay? Various stakeholder interactions in a print-on-paper research library world involve financial transfers to pay for creation services. For example, some authors pay page charges to print publishers, and libraries pay to purchase printed books and to subscribe to printed journals. Among the stakeholders (authors, publishers, libraries, readers, and perhaps others), who will pay for digital material production? The constraints and needs (and perhaps power relationships) have been changed by information technology, and thus we should expect that the financial interactions between stakeholders will also change as the institution adapts. Perhaps we will see more author payments (recommended by some open access journals), or more charitable foundation support for digital publishing (JSTOR was started by the Andrew W. Mellon Foundation; the Public Library of Science has initiated production with a grant from the Gordon and Betty Moore Foundation). Perhaps funding will continue to flow primarily from libraries and individual readers, but the structure of payments is likely to change because of the differences in functionality between electronic and paper publications.

    The model of institutional transformation leads us to a second question: How will readers use digital collections? Will users go to physical libraries and interact face-to-face with librarians, or communicate electronically and access documents from their desktops or wireless PDAs? How will user mobility affect the roles of libraries? Will usage increase for stakeholders who had less convenient access to print documents (e.g., students and faculty at smaller colleges, or scientists in developing countries)? Will users do more browsing using electronic search tools and less full-document reading? How will the relative usage of a document's several information features change? For example, when it is easier to find and search documents, will there be more demand for specific elements, such as the data tables or bibliography, and less for the text?

    We organized the remaining chapters around these two fundamental questions. In the first section we present several chapters on a major field research project that simultaneously addressed collection building, publisher pricing models, library collection decisions, and usage: the PEAK (Pricing Electronic Access to Knowledge) project. It was our work on this project at the University of Michigan that led us to invite other authors to join us to create a volume that encompasses publishing economics and usage during this era of the digital transformation of libraries. The two organizing questions above were also the framing context for PEAK: Who will pay for digital materials? How will they be used?

    In the second section the authors address digital publishing. They focus on publishing costs and the distribution of those costs to various stakeholders in the institution. In the third section we collect several essays on experiences in building and using digital collections. The authors emphasize user needs and the effect that digital collection usage has on building and maintaining the collections. Many of these chapters report on a specific project at a particular time, but they all shed light on the dynamics of stakeholder interactions and the role of these interactions in shaping the transformation of the encompassing institutions.

    Several lessons are driven home by their common emergence across the chapters of this book. Regarding payment for materials, a variety of economic interactions exist in the social institution of research libraries. In the past century, most of the flow of resources has been to publishers from users (individuals and libraries) through subscriptions or society membership fees.[14] The studies in this book demonstrate that a variety of alternative mechanisms are feasible in theory and in practice, and that there are reasons to expect some transformation of these economic interactions in response to the digital revolution. The PEAK project focused on interactions between users and publishers, but experimentally tested on-demand document delivery and found that a small but non-trivial number of scholars were willing to purchase immediate access to articles using a credit card. PEAK also tested a novel, generalized subscription scheme that is only feasible in a digital, networked collection. Other chapters consider quite different configurations of the institution, including proposals and projects based on the open paradigm, and models in which author fees to publishers primarily finance the publication functions.

    Regarding the ways in which digital collections are used, several chapters show that readers want and will use older scholarly material, not just new publications. Several successful projects have focused on creating digital collections of archival material. In the case of unique collections, accessibility has greatly increased because users no longer need to physically travel to the collection. Perhaps more surprising has been the dramatic levels of usage for older scholarly publications made available by JSTOR. These publications to a large extent are available in print in the users' local research libraries. When made available over the network and digitally searchable, however, usage far exceeds all estimates of usage for the print-on-paper collections of scholarly journal back issues.

    We also have three chapters that report on user and librarian experiences in traditional university and corporate libraries that made early investments in large-scale digital collection projects. One of the most striking findings is that the interactions between librarian and user are changing; users need different types of services from information professionals, in many cases delivered through different types of interactions. Thus, the human production systems embedded in the institution of research libraries must adapt to the technological changes, just as the collections themselves—and the economic models that support them—must adapt.

    We are still in the early stages of the digital transformation of libraries, and more broadly, the social institutions of information creation, distribution, and retrieval. In this book we present the reflections and findings of leading scholars and practitioners who have been on the front lines of the transformation. We think the evidence is compelling that successful adaptation to the forces of change requires an understanding of stakeholder interactions as the many participants in the institution negotiate and navigate the details of the transformation.

    Notes

    1. Kyrillidou and Young (2003) report an average annual increase of 7.7% since 1986, well ahead of general inflation.

    2. American Heritage Dictionary, 3rd ed. (Houghton Mifflin, 1992).

    3. After the British burned Washington and the original Library of Congress during the War of 1812, the new Library of Congress collection was established when Thomas Jefferson sold his magnificent collection of nearly 7000 books.

    4. See Hedstrom and King (2003) for an historical discussion of the emergence of the related institutions of libraries, archives and museums.

    5. "Values" in this case refers to shared perceptions of relative status or perceived worth within a group or community.

    6. The median price for all houses sold in the U.S. in 1970 was only $23,000. See U.S. Census Bureau (2000), Table 1201.

    7. Recent examples include the durable digital depository (DSpace) initiative from MIT and the University of California's e-scholarship program. Of course, experimental initiatives rise and fall quickly, so the landscape will look considerably different within a few short years. One initiative that appears to have succeeded is JSTOR, described by Kevin Guthrie in a chapter of this book.

    8. For a discussion of the influence of open models in prompting new roles for libraries see Lougee (2002).

    9. There are two good recent surveys that guide one through the literature and contemporary practices: Friedlander and Bessette (2003), and Cox and Cox (2003).

    10. See, e.g., Suber (2002).

    11. The Economics Working Papers Archive was created by Bob Parks at Washington University in St. Louis, and can be found at http://econwpa.wustl.edu/. 2500 papers is a very small fraction of the working papers disseminated in Economics in the past 10 years. For example, the more recent and commercial albeit not-for-profit Economics Research Network now hosts over 36,000 full-text economics working papers.

    12. In actual practice, interlibrary loans involve interactions among three parties: the individual requestor, the originating library, and the lending (distant) library. For the example in these three paragraphs we simplify, treating the loan as a direct transaction between the borrower and the distant library.

    13. As an example, when a facsimile is made on paper, the quality is necessarily less than in the original document, and this degradation is compounded as copies are made of copies. This fact, among others, limits the extent to which the print-on-paper system could be abused to implement large-scale republishing of a document to avoid copyright payments to the publisher or author. Digital copying, on the other hand, generally maintains perfect fidelity. Thus, publishers are concerned that a digital copy delivered to a user at a different institution might be used to create multiple copies redistributed without associated compensation.

    14. Of course, most of the payment to authors for the creation of their works has been through their employers: universities and the owners of government and private research labs. This bifurcation of payments, with one stream from users to publishers, and another stream from sponsors to creators, is a bit unusual, and certainly quite different than in commercial or popular publishing. However, fascinating as they are, we have not addressed in this book the economic interactions between information creators and their sponsors.

    2. The Rapid Evolution of Scholarly Communication[†]

    Traditional journals, even those available electronically, are changing slowly. However, scholarly communication is rapidly evolving to electronic formats. In some areas, electronic versions of papers are being read about as often as the printed versions. Although there are serious difficulties in comparing figures from different media, growth in the use of electronic scholarly information is sufficiently high that, if it continues for a few years, print versions will no doubt be eclipsed. Further, much electronic information is accessed outside the formal scholarly publication process. There is vigorous growth in forms of electronic communication that take advantage of the unique capabilities of the Web, and that simply do not fit into the traditional journal publishing format.

    This paper presents statistics on the use of print and electronic information. It also discusses preliminary evidence about the changing patterns of usage. This evidence indicates that to stay relevant, scholars, publishers, and librarians will have to make even larger efforts to make their material easily accessible.

    2.1 Introduction

    Traditional journals and libraries have been vital components of scholarly communication. They are evolving, but slowly. The reasons for this are discussed briefly in Section 2.2 and, in more detail, in Odlyzko (1997b). The danger is that they might be rapidly losing their value, and could become irrelevant.

    At first sight, there seems little cause for concern. Print journal subscriptions are declining, but gradually. One often hears of attrition in subscriptions of 3-5% per year. For example, the American Physical Society, with high quality and relatively inexpensive journals, has seen a steady decrease of about 3% per year (Lustig, 1997). At those rates, losing half the circulation takes between 14 and 24 years. On Internet time, that is almost an eternity. Preprints in most areas are still a small fraction of what gets published. Also, library usage is sometimes reported as declining, but again at modest rates.[1] Yet these are not reasons for complacency. Why should there be any declines at all? Ours is an information age; the number of people getting college and postgraduate education is growing rapidly, and spending on R&D and implementation of new technologies is skyrocketing. Why should established journal subscriptions be dropping, and why should many of the recent specialized journals be regarded as successes if they reach a circulation of 300? Why should many research monographs be printed in runs smaller than the roughly 500 copies of the first edition of Copernicus' De revolutionibus orbium coelestium of 1543?
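    A back-of-the-envelope calculation confirms the half-life figures just cited; the sketch below (in Python, purely for illustration) uses only the 3% and 5% attrition rates mentioned above.

        import math

        # Years for circulation to fall to half its initial level when it
        # declines by a constant fraction r per year: solve (1 - r)**n = 0.5.
        def half_life(r):
            return math.log(0.5) / math.log(1.0 - r)

        print(round(half_life(0.03), 1))   # about 22.8 years at 3% attrition
        print(round(half_life(0.05), 1))   # about 13.5 years at 5% attrition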

    My conclusion is that the current scholarly information system is badly flawed, and does not provide required services. This paper presents evidence that the demand for high quality scholarly information is indeed growing, and can only be satisfied through easy availability on the Web.

    Some of the early studies of electronic usage, such as Lenares' interesting 1999 paper, concentrated on faculty at leading research institutions. Change might be expected to be slow in such places. Although such scholars usually have the resources to be pioneers, they have little incentive, since they have access to good libraries. The evidence to be presented later shows that the current system neglects the needs of growing ranks of scholars who are not at such institutions. Thus, it is better to concentrate on these scholars and their usage of information that is freely available over the Internet.

    Tenopir et al. (2000) does show that, among established scholars, electronic resources play an increasing role, but that current usage is dominated by traditional media. However, it is important to look at growth rates rather than absolute numbers. In an early 1999 discussion in a librarians' mailing list, somebody pointed out that, in 1998, only 20% of the astronomy papers were submitted to Ginsparg's xxx paper archive, now called the arXiv, at http://www.arxiv.org . An immediate rejoinder from another participant was that, while this was true, the corresponding percentage had been around 7% in 1995. It is growth rates that tell us what is in our future.

    This paper is only a brief attempt at finding patterns in the use of online information. At the moment, we have little data about online usage patterns. This is especially regrettable since these patterns appear to be in the midst of substantial change. What we need are careful studies, such as have been carried out for print media.[2] Although the Web in principle makes it possible to provide extremely detailed information about usage, in practice there is little data collection and analysis, especially in scholarly publishing. Even when data are collected, they are seldom released. Thus one purpose in writing the initial draft of this paper was to stimulate further collection and dissemination of usage data. The main purpose, though, was to look for patterns even with the limited data available to me, to provide a starting point for further research.

    Fortunately, many new studies of electronic resources have appeared recently.[3] In general, they do support most of the tentative conclusions of this paper, which are:

    1. Usage of online scholarly material is growing rapidly, and in some cases already appears to surpass the use of traditional print journals. Much online usage appears to come from new readers and often from places that do not have access to print journals.[4]

    2. We can expect the growth of online material to accelerate, especially as the information about usage patterns becomes widely known. Until recently, scholars did not have much incentive to put their works on the Web, as this did not create many new readers. While we can expect that snobbery will retard this step ("I can reach the dozen top experts in my field by publishing in Physical Review Letters, or by sending them my preprint directly, why do I care about the great unwashed?"), the attraction of a much greater audience on the Web and the danger that anything not on the Web will be neglected are likely to become major spurs to scholars making their works available online. For example, the recent study by Lawrence (2001) shows that papers in computer science that are freely available online are cited much more frequently than others. Anderson et al. (2001) might appear to suggest the opposite, since in this study free online availability was associated with lower citation frequency. However, that result is likely anomalous, in that the freely available online-only articles in the journal under study were apparently perceived widely, even if incorrectly, as of inferior quality.

    3. The need for traditional peer review is overrated. Odlyzko (1995) discussed at length the inadequacy of conventional peer review, and the more useful forms that were likely to evolve on the Internet. That paper was written before the ascendancy of the Web. While open review and comments on published papers have been slow to take hold, online references and bibliographies are developing into a new form of peer review. People are coming to my Web page in large numbers looking for specific papers. While in almost all cases I do not know what brings them there, it is pretty clear that they are finding links to the material in a variety of sources, such as bibliographies and references on other home pages. This is a new form of peer review, and it brings many readers even to papers published in obscure and unrefereed places.

    4. Concerns about information overload and chaos on the Net are exaggerated. While better organization of the material would surely be desirable, people are finding their way to the serious information sources in growing numbers as is.

    5. Ease of access and ease of use are paramount. Material on the Web is growing, and scholars, like the commercial content producers, are engaged in a war for the eyeballs. Readers will settle for inferior forms of papers if those are the ones that can be reached easily.

    6. Novel forms of scholarly communication are evolving that are outside the boundaries of traditional journals.

    These conclusions and predictions are supported by data in the rest of this paper. It does appear that while journals are not changing fast, scholarly communication as a whole is evolving rapidly.

    2.2 Rates of technological change

    The conventional notion of "Internet time," in which technological change is accelerated tremendously, is a myth. Rapid change does occur occasionally, and the adoption of Web browsers is frequently cited as an example. Less than 18 months after the release of the first preliminary version of the Mosaic browser, Web transmissions constituted more than half of Internet traffic. However, this was a singular exception. Cell phones, faxes, and ATM machines took much longer to spread. Even on the Internet, new systems are usually adopted much more slowly. How come IPv6 is still basically invisible? Why is HTTP1.1 spreading so slowly? How about TeX and its various dialects (which go back more than two decades)? Even at universities, e-mail took a while to diffuse. The Internet has changed much, but it has not made for a dramatic increase in the pace at which new technologies diffuse. A typical time scale for significant changes is still on the order of a decade. This was noted a long time ago: "A modern maxim says: People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years." (Licklider, 1965, p.17)

    Further discussion of rates of change is available in Odlyzko (1997b), which presents many examples (such as music CDs, ATM machines, credit cards, and cell phones) supporting the thesis that consumer adoption of new technologies is slow.[5] Thus we should not be surprised if electronic scholarly communication does not turn on a dime.

    The rare rapid adoptions of new technologies (aside from unusual situations such as that of the Web) appear to be associated with the presence of forcing agents that can compel rapid change (Odlyzko, 1997b). On the other hand, sociological changes tend to be very slow, taking a generation or two.

    Aside from simply observing that historically, new technologies have been taking on the order of a decade to be widely adopted, one can also build statistical time simulations that explain this time scale. For instance, we know that usage of electronic forms of scholarly information has typically been growing at 50 to 100 percent per year. This is shown in various tables in this paper. On the other hand, print usage has shown little change. Supposing that print usage remains static, from the moment electronic usage breaks the one percent threshold at which it is likely to be noticed, growth rates of 50 to 100% would only yield parity with print usage after approximately a decade.
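    A minimal version of such a calculation, assuming only what the preceding paragraph states (electronic usage starting at 1% of a static print base and growing at 50 to 100% per year), is sketched below.

        import math

        # Years for electronic usage, starting at a given fraction of static
        # print usage, to reach parity at a constant annual growth rate g:
        # solve start_fraction * (1 + g)**n = 1 for n.
        def years_to_parity(g, start_fraction=0.01):
            return math.log(1.0 / start_fraction) / math.log(1.0 + g)

        print(round(years_to_parity(0.5), 1))   # about 11.4 years at 50% growth
        print(round(years_to_parity(1.0), 1))   # about 6.6 years at 100% growth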

    2.3 Disruptive technologies

    Clayton Christensen's book (1997) has become a modern classic. It helps explain the failure of successful organizations, such as Encyclopaedia Britannica, to adopt new technologies. The example of the Britannica, cited in Odlyzko ( 1995, 1999), is very instructive. It was and remains the most scholarly of the English-language encyclopedias. However, it could not cope with the challenges posed first by inexpensive CD-ROM encyclopedias, and more recently by the Web.

    What Christensen calls disruptive technologies tend to have three important characteristics:

    • they initially underperform established products
    • they enable new applications for new customers
    • their performance improves rapidly

    Electronic publishing has these characteristics. Little material was available initially, screen resolution was poor, printers were expensive and not widely available, and so on. However, online material was easy to locate and access, and could provide novel features, such as the constant updating of the genome database. Moreover, costs, quality, and availability have all been improving rapidly.[6] That is why direct comparisons of traditional journals or libraries with electronic collections are of limited relevance. For example, the 1998 paper by Stevens-Rayburn and Bouton is effective in demonstrating that the Web at that time could not substitute for a regular library. It still can't, even in 2000. However, that is not the relevant question.

    The mainframe was not dethroned by the PC directly. The PC could not replace the big machines in areas such as payroll processing. The computing power of the mainframes sold each year is still increasing, and has been increasing all along, even when IBM was going through its traumatic downsizing in the early 1990s. It's just that the PC market has been growing much faster. The mainframe has been consigned to a small niche, and the revenues from that niche have been declining. This is a useful analogy to keep in mind. Traditional journals and libraries are still playing a vital role, but, to quote from Odlyzko (1997b), "... journals are not where the interesting action is." The real issue, to quote Stevens-Rayburn and Bouton (1998), is that "in this new electronic age, if it isn't on-line, for many purposes it might as well not exist." Further, even if it is online, it might not matter if it is not easy to access or is not timely.

    2.4 Effects of barriers to use

    Even small barriers to access reduce usage significantly. Statistics collected by Don King and his collaborators show that as the physical distance to a library increases, usage decreases dramatically.[7] A recent statistical tidbit of a similar nature is the reaction of the mathematicians at Penn State when all journal issues published before 1973 had to be sent to off-site storage because of space limitations. This move was widely disliked, even though any volume can be obtained within one day. The interesting thing is that the mathematical research community of about 200 faculty, visitors, and graduate students asks for only about 850 items to be recalled from storage per year. That is just over 4 items per person per year. It seems likely (based on extrapolations from circulation figures for bound journals that are immediately available on shelves) that usage of this material was much higher when it was easily accessible in the library in their building.

    When subscriptions to journals are canceled, articles from those journals are obtained through interlibrary loans or document delivery services. Some libraries (Louisiana State University's perhaps most prominent among them) have consciously decided to replace journal subscriptions with document delivery, after making a calculation of how much the journals cost per article read. While I do not have comprehensive statistics, my impression is that such moves save more than preliminary computations suggest. The secret behind this phenomenon is that usage of document delivery services is lower than that of journals available right on the spot. Having to fill out a request form and wait a day or a week reduces demand.

    Librarians have known for a long time that ease of use is crucial. They experienced this with card catalogs: materials whose catalog entries were available only in the paper card catalogs were seldom used. Thus the current shift towards online usage had been anticipated.

    ... there's a sense in which the journal articles prior to the inception of that electronic abstracting and indexing database may as well not exist, because they are so difficult to find. Now that we are starting to see, in libraries, full-text showing up online, I think we are very shortly going to cross a sort of critical mass boundary where those publications that are not instantly available in full-text will become kind of second-rate in a sense, not because their quality is low, but just because people will prefer the accessibility of things they can get right away.

    Clifford Lynch, 1997, quoted in Stevens-Rayburn and Bouton (1998)

    Today, we have evidence that Clifford Lynch was correct. Note that Encyclopaedia Britannica has been a victim of this trend. Being the best did not protect it from declines in revenues, restructuring, and being forced to experiment with several business models.

    The shift to online usage is exposing many of the limitations of the traditional system. Research libraries are wonderful institutions. They do provide the best service that was possible with print technology. However, in today's environment, that is not enough. Most printed scholarly papers are typically available in something like 1,000 research libraries. Those libraries are accessible to a decreasing fraction of the growing population of educated people who need them. Further, even for those scholars fortunate enough to be at an institution with a good library, the sizes of the collections are making material harder to access. Hours of availability are limited. Also, studies have shown that even when a book being sought is in a given library's collection, in about 40% of the cases it cannot be found when needed.[8]

    The basic problem, of course, is that it is impossible in the print world to make everything easily accessible even in the best library in the world. Space constraints mean that some material will be far from the user. In practice, most libraries can store only a tiny fraction of the material that might be of interest to their patrons. While they have been careful about selecting what seemed to be most relevant, experience shows that when easy electronic access is provided to large bodies of material not normally available in the library, there is demand for it (Luther, 2001; Bensman and Wilder, 1998). That is a major factor propelling the move towards bundling of electronic journal offerings and consortium pricing (Odlyzko, 1999).

    The easy access to online resources is leading to increasing usage, as will be discussed later, and is also documented in Anderson et al. (2001), Gazzale and MacKie-Mason (this volume), Guthrie (this volume) and Luther (2001). But not all online access is equal. Many scholars use Amazon.com's search page as a first choice in doing bibliographic searches for recent books, since it is more user-friendly than the electronic catalogs of the Library of Congress, say. Luther (2001) notes, "Both Academic Press and the American Institute of Physics (AIP) noted that they experienced surges in usage after they introduced new platforms that simplified navigation and access."

    Ease of use has an important bearing on pricing. Odlyzko (1995) predicted that pay-per-view was likely doomed to fail in scholarly publishing, because of its deterrent effect on usage.[9] Publishers have now, after experiments with PEAK and other pricing models, moved to this view as well. For example, Hunter (this volume) states that

    [Elsevier's] goal is to give people access to as much information as possible on a flat fee, unlimited use basis. [Elsevier's] experience has been that as soon as the usage is metered on a per-article basis, there is an inhibition on use or a concern about exceeding some budget allocation.

    Similarly, Luther (2001) points out that "Philosophically, Academic Press is opposed to a business model in which charges increase with use because it discourages use."

    Easy access implies not only greater use, but also changing patterns of use. For example, a recent news story discussed how the Internet is altering the doctor-patient relationship (Kolata, 2000). The story opens with a woman whose doctor reluctantly tells her that she might have lupus; she leaves the clinic terrified of what this might mean. She then proceeds to obtain information about this disease from the Internet. When she returns to see a different, more pleasant physician, she is well-informed and prepared to question the diagnosis and possible treatment. What is remarkable about this story is that the basic approach of this patient was feasible before the arrival of the Web. She could have gone to her local library, where the reference librarians would have been delighted to point her to many excellent print sources of medical information. However, few people availed themselves of such opportunities before. Now, with the easy availability of the Web, we see a different story.

    The arguments about effects of barriers to access and of lowering such barriers suggest that scholarly communication will undergo substantial changes. We should expect to see greater use of online material. We should also see much greater use of it by people outside the narrow disciplinary areas that produce it. Much of this use will come from outside the traditional academic and research institutions, but a considerable portion is likely to come from other departments within an institution. Further, the increasing volume of material, as well as the decreasing role of traditional peer review, are likely to lead to greater demand for survey and handbook material. With lower barriers to interactions and access to specialized literature, we should also see more interdisciplinary work.

    2.5 Scholarly information as a commodity

    Authors like to think of their articles as precious resources that are absolutely unique and for which no substitutes can be found. Yet a more accurate picture is that any one article is just one item in a river of knowledge, and that this river is rising. Substitutes exist for almost everything. Some people interested in Fermat's Last Theorem will want, for historical or other reasons, to see Andrew Wiles' original paper (Wiles, 1995). Many others will be happy with a reference to where and when that paper was published, and others will be satisfied with various popular accounts of the proof. Even those interested in the technical details will often be satisfied with, and indeed often better served by, other presentations, such as that in the Darmon, Diamond, and Taylor account of the proof (Darmon et al., 1997).

    Thinking about a river of knowledge instead of a collection of unique and irreplaceable nuggets helps explain why scholars manage to function even with a badly flawed information system. Even though in 40% of the cases a desired book cannot be retrieved from the library's shelves, usually some other book covering the same topic can be found. Spending on libraries by research universities is correlated most strongly with total budgets, and very weakly with quality. Harvard spends about $70 million per year on its libraries, versus $25 million for Princeton. Yet would anyone claim that a Harvard education or scholarly output is almost three times as good as that of Princeton?

    The Internet is reducing the costs of production and distribution of information. As a result, there is a flood of material. Much is of low quality, but a substantial fraction is very good. Before looking at whether scholars are using this material, let us consider usage of print material.

    2.6 Usage of print journals

    We are fortunate to have an excellent recent survey of usage of print journals in the book of Carol Tenopir and Don King (2000).[10] It shows that a typical technical paper is read between 500 and 1,500 times, where a reading does not necessarily mean careful study, but does mean going beyond just glancing at the title and abstract. These readings average about one hour in length, and in about half the cases represent the reader's first encounter with an article.

    The estimate of 500 to 1500 readings per article is much higher than some earlier studies had come up with. The studies on which Tenopir and King base their estimates do have biases that may raise the reading estimates above the true value. For example, they are based on self-reporting by technical professionals, who may overestimate their readings. Further, those figures include articles in technical journals with large circulations (such as Science, Nature, and IEEE Spectrum) that are not typical of library holdings. If one considers library usage studies, such as those that have been carried out at the University of Wisconsin in Madison, one comes up with somewhat lower estimates for the number of readings per paper.[11] Still, the basic conclusion that a typical technical paper is read several hundred times appears valid.

    The studies reported in Tenopir and King (2000) also show that, in the print world, articles are usually read mostly in the first half-year after publication. Afterwards, usage drops off sharply.

    2.7 Growth in usage of electronic information

    It is hard to measure online activity accurately. The earliest and still widely used measure is that of "hits," or requests for a file. Unfortunately, with the growth of complicated pages, that measure is harder to evaluate. When possible, I prefer to look at full article downloads. Finally, as a conservative measure, one can look at the number of hosts (unique IP addresses) that requested information from a server. Even then, there are considerable uncertainties. The same person may send requests from several hosts. On the other hand, common employment of proxies and caches means that many people may hide behind a single host address, and a single download may lead to multiple users obtaining copies (as happens when papers are forwarded via email as well).
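    To make these distinctions concrete, the sketch below shows one way such counts might be tallied from a Web server access log whose first whitespace-separated field is the requesting host. The file name, the test for what counts as a full article download, and the crawler filter are illustrative assumptions only, not a description of any of the servers discussed in this paper.

        # Tally hits, full-article downloads, and distinct requesting hosts
        # from a hypothetical access log (one request per line, host first).
        crawler_hints = ("crawler", "spider", "bot")   # crude, illustrative filter

        hits = downloads = 0
        hosts = set()
        with open("access.log") as log:                # hypothetical file name
            for line in log:
                if any(hint in line.lower() for hint in crawler_hints):
                    continue                           # skip obvious crawlers
                host, _, request = line.partition(" ")
                hits += 1                              # every request is a "hit"
                hosts.add(host)                        # distinct IP addresses
                if ".pdf" in request or ".ps" in request:
                    downloads += 1                     # proxy for a full download

        print(hits, downloads, len(hosts))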

    In addition to the uncertainties in interpreting the activity seen at a server, it is hard to compare data from different servers. Logs are set to record different things, and some Web pages are much more complicated than others that have the same or equivalent content. Thus comparing different measures of online activity is of necessity like comparing apples, oranges, pears, bananas, and onions. Some of the difficulties of such comparisons can be avoided by concentrating on rates of growth. If online information access is growing much faster than usage of print material, it will eventually dominate.

    In spite of problems inherent in measuring online activity, it is obvious by most measures that the Internet is growing rapidly. Typical growth rates, whether of bytes of traffic on backbones or of hosts, are on the order of 100% per year (Odlyzko, 2000; Coffman and Odlyzko, 1998). When one looks at usage of scholarly information online, typical growth rates are in the 50 to 100% range. For example, Table 2.1 shows the utilization of the online resources of the Library of Congress. Growth, in terms of bytes transmitted, was over 100% per year for three years before decreasing to 90% in 1998, and then decreasing further in 1999, to 38%. It then increased to 62% in 2000. Table 2.2 shows downloads from the AT&T Labs - Research Web site, at http://www.research.att.com/, which contains a variety of papers, software, data, and other technical information. The growth rate there in the number of requests has been around 50% per year for several years, but between 2000 and 2001, it jumped to over 120%.

    Table 2.1: Library of Congress electronic resource usage statistics.
    month GB requests (millions)
    Feb. 1995 14.0 1.1
    Feb. 1996 31.2 3.9
    Feb. 1997 109.4 15.1
    Feb. 1998 282.0 36.0
    Feb. 1999 535.0 48.6
    Feb. 2000 741.1 61.3
    Feb. 2001 1202.6 86.7
    NOTE: For each month, shows total volume of material sent out that month, in gigabytes, and the number of requests.
    Table 2.2: AT&T Labs - Research external Web server statistics.
    month requests hosts
    Jan. 1997 542,644 17,866*
    Jan. 1998 754,477 35,943
    Jan. 1999 1,204,664 67,191
    Jan. 2000 1,843,319 100,077
    Jan. 2001 4,190,362 178,923
    NOTE: Excludes most crawler activity.
    *Number of hosts for Jan. 1997 is an estimate.
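    The year-over-year growth rates implicit in Table 2.2 can be read off directly; the short calculation below simply recomputes them from the request counts in the table.

        # Requests to the AT&T Labs - Research Web server, from Table 2.2.
        requests = {1997: 542644, 1998: 754477, 1999: 1204664,
                    2000: 1843319, 2001: 4190362}

        for year in range(1998, 2002):
            growth = requests[year] / requests[year - 1] - 1.0
            print(year, "{:.0%}".format(growth))
        # 1998: 39%, 1999: 60%, 2000: 53%, 2001: 127%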

    Some measures of electronic information usage are showing signs of stability, or even decreasing growth. For example, Table 2.3 shows utilization of Leslie Lamport's page devoted to material about a logic for specifying and reasoning about concurrent and reactive systems.[12] Usage had been pretty stable in 1996 through 1998. When I corresponded with him about this in 1999, he thought usage had reached a steady state, with the entire community interested in this esoteric technical subject already accessing the page as much as they would ever need to do. However, the final counts for 1999 and 2000 showed substantial increases.

    Table 2.3: Visits to Leslie Lamport's Temporal Logic of Actions Web page.
    year visits hosts
    1996 18,800 5,300
    1997 19,000 5,600
    1998 18,400 5,300
    1999 31,100 8,000
    2000 33,500 8,000
    NOTE: approximate counts

    The next few sections discuss data about several online information sources that are freely available on the Internet.

    2.8 Electronic journals and other organized databases

    Some reports are already available on the dramatic increase in usage of scholarly information that is easily available. Traditionally, theses and dissertations have been practically invisible, used primarily within the institution where they were written, and even there, they were not accessed frequently. Free access to digital versions is now leading to an upsurge in usage, as is described in McMillan et al. (1999).

    In the remainder of this section, even though it is not fully justified, I will equate a full article download with a reading as measured by Don King and his collaborators.

    The entire American Mathematical Society e-math system was running at about 1.2 million "hits" per month in early 1999. The Ginsparg archive (arXiv) at Los Alamos was getting about 2 million hits per month. The netlib system of Jack Dongarra and Eric Grosse was at about 2.5 million hits per month.

    For detailed statistics on usage and growth of JSTOR, see (Guthrie, this volume). By the end of 1999, its usage was several million a month, whether one counts hits or full article downloads, and was growing at over 100% per year.

    The Brazilian SciELO (Scientific Electronic Library Online) project, available at http://www.scielo.br/ , started out in early 1998. It appears to be still going through the initial period of explosive growth. In January 1999, 4,943 pages were transmitted; a year later, that number had grown to 63,695. During 1999, 67,143 hosts requested pages, so it was not just a small group of users who were involved. It is too early to tell how fast it will continue to grow, but it seems worth listing this project to show that even the less industrialized countries are participating in making literature freely available.

    Paul Ginsparg's arXiv had about 100,000 papers in early 1999, and was running at a rate of about 7 million full article downloads per year. Thus on average each article was downloaded about 70 times per year. These download statistics were just for the main Los Alamos server. If we assume that the more than a dozen mirrors collectively see as much activity as the main server, then we get a download rate of about 140 times per year per article. This is misleading, though, since it mixes old and new papers, which have different utilization patterns.

    If we look at download activity for arXiv articles as a function of time, we find that on average an article gets downloaded around 150 times within one year of its submission, and then 20 to 30 times a year in subsequent years.[13] In particular, even articles submitted around 1991 get downloaded that often. Since this again covers just the main server, we probably should again multiply these numbers by two to get total activity. If we do that, we get into the range of readings per article that established journals experience. The pattern of usage differs from that observed by King and others for printed journal articles. Those are read primarily in the six months after publication, and then the frequency with which they are accessed decreases.
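    A crude extrapolation from these per-article figures shows how they compare with the 500 to 1,500 readings per article reported for print journals in Section 2.6. The ten-year horizon and the factor of two for the mirrors are assumptions carried over from the discussion above.

        # Download profile described above: ~150 downloads in the first year,
        # then 20-30 per year; mirrors assumed to double the main-server totals.
        first_year = 150
        later_per_year = 25     # midpoint of the 20-30 range
        years = 10              # illustrative article lifetime
        mirror_factor = 2

        total = mirror_factor * (first_year + later_per_year * (years - 1))
        print(total)            # 750, within the 500-1,500 range cited for print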

    The Electronic Journal of Combinatorics published about 200 articles by early 1999, and had about 30,000 full article downloads from its main site during 1999. That is an average of 150 downloads per article. Multiplying that by two to account for the many mirror sites again gets us to about 300 downloads per article per year. Data about distribution of downloads with time is not available.

    The general impression from the statistics quoted above is that articles in electronic archives and electronic journals may not yet be read as frequently as printed journal articles, but are getting close. On the other hand, some sources appear to be used much more frequently online than they would be in print.

    Additional evidence that online access changes scholars' reading patterns is provided by First Monday, "the peer-reviewed journal of the Internet," at http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ . Issues are made freely available on the first Monday of each month. First Monday started publication in May 1996. About 3,600 people subscribe to the e-mail notification service.

    First Monday has provided me with access to the logs of their U.S. Web server from January 1999 through February 2000.[14] This is not sufficient for a careful statistical study, but some interesting patterns can be discerned in the data.

    Over this period, the number of full paper downloads has grown from a range of 50,000 to 60,000 per month in early 1999, to between 110,000 and 120,000 per month in early 2000. Distinct hosts requesting articles have increased from between 12,000 and 15,000 to over 20,000 each month. Thus the growth rate of requests has been close to the 100% that has occurred frequently on the Internet. Since there are only 3,600 subscribers, this suggests many others learn of the material through word of mouth, e-mail, or other methods.

    In a typical month, the largest number of downloads is to articles from that month's issue. In subsequent months, accesses to an issue drop in a pattern similar to that found by Don King in his studies of print journals. Half a year later, downloads are usually down to a quarter or even a sixth of the first month's rate. At that stage, though, the story changes. Whereas for print journals, usage continues to decrease with time, for First Monday it appears to increase. For example, there were 9,064 full article downloads from all the 1997 issues in February 1999, and 19,378 in February 2000. Thus accesses to the 1997 issues kept pace with the general growth of usage. Of the articles that were most frequently downloaded in 1999, 6 of the top 10 had been published in previous years. This supports the thesis that easy online access leads to much wider usage of older materials.

    My personal Web page, which was at AT&T until August 2001, and is now at http://www.dtc.umn.edu/~odlyzko/doc/internet.size.pdf, has also seen rapid growth in usage. However, it is hard to discuss growth rates meaningfully in a short space, since most of the growth came from new papers in new areas. Instead, I will discuss the usage patterns that I have observed.

    During January 2000, there were 10,360 hits on my home page from 1,808 hosts, excluding .gif files, and hits from obvious crawlers. Most of these 1,808 hosts only looked at various index files. If we exclude those, as well as the ones that downloaded only my cv or only abstracts of papers, we are left with 656 hosts that downloaded 1,198 full copies of articles. Of those 656 hosts, 494 downloaded just a single paper. Many of those 494 requested a specific URL for an article as opposed to looking at the home page for pointers, and then disappeared. Thus on average the people who visited my home page seemed to know what they were looking for, got it, and moved on.

    Visitors to my Web page were remarkably quiet in the face of some obvious faults. Many of the papers posted on that page, especially old ones, are incomplete, in that they are early versions, and usually do not have figures that are present in the printed versions. Still, that occasions few complaints. For example, in 1999, a posting to a number theory mailing list resulted in 152 downloads of a paper in the space of less than two weeks. However, only one person complained about the lack of figures in the Web version, even though they are very helpful in visualizing the behavior shown in the paper.

    Another anecdotal piece of evidence demonstrates what happens on the Web. Several times people have told me they were glad to meet me, as they had read my papers and benefited from them. Conversation showed that they indeed were familiar with the papers in question. However, they also told me that they had lost the URL, and would I please remind them where my home page was? Even though finding my home page on the Web is easy, since my name is not a particularly common one, they obviously did not find it necessary to bother doing so. This, as well as the lack of complaints regarding incomplete papers, suggests a world of plenty. People are guided to Web pages by a variety of cues, get whatever they can from those pages, and move on to other things. It is not a world of a few precious treasures with no substitutes.

    The importance of making material easily available was demonstrated in a very graphic form when I made .pdf versions of my technical papers available in April 1998. There was an immediate jump in the rate of downloads. Prior to that, mathematical papers were available only in .ps and .tex formats, and the ones on electronic publishing and related topics in .ps and straight text. Most PC owners do not have easy access to tools for reading .ps papers, and were apparently bypassing the available material that required extra effort from them. This is similar to observations of Academic Press and the American Institute of Physics (Luther, 2001) that better interfaces lead to higher usage.

    The temporal pattern of article usage on my Web page shows the behavior that was already noted for arXiv and for First Monday. After an initial period, frequency of access does not vary with age of article, and stays relatively constant with time, after discounting for general growth in usage.

    There is more evidence that easy online access leads to changes in usage patterns. For example, downloads from my home page go to a variety of sources all over the world. Some are leading to email correspondence from places like Pakistan, the Philippines, or Mexico. This is not surprising in itself, since those countries do have technically educated populations that are growing. What is interesting is that this correspondence predominantly refers to papers that have been downloaded electronically, and to copies of older papers that are not available in digital form, and which the requesters had learned about from my home page. This does suggest strongly that easy availability is stimulating interest from a much wider audience. This conclusion is also supported by similar observations concerning correspondence with people in industrialized countries. Much comes from people outside universities or large research institutions that have good libraries and who would be unlikely to read my papers in print.

    In a small fraction of cases the referrer field on requests shows where the requester found the URL. In many cases, such requests come from reading lists in college or graduate courses.

    As a final note, spikes in usage often occur when one of my papers is mentioned in some newsletter or discussion group. For example, Bruce Schneier publishes CRYPTO-GRAM, a monthly email newsletter on cryptography and computer security, with a circulation of about 20,000. In early August 1999, CRYPTO-GRAM mentioned a recent preprint of mine which I had not advertised much, and which was about to appear in a regular print journal. Over the next two weeks over a thousand copies were downloaded. I am convinced that this is a higher figure than the number of times the printed version will be read.

    The CRYPTO-GRAM example, as well as the other patterns of visits to my home page, suggests that informal versions of peer review are in operation. A recommendation from someone, or a reference in a paper that the reader trusts, serves to validate even unpublished preprints. Scholars pursue a variety of cues in selecting what material to access.

    2.9 New forms of scholarly communication

    A popular destination on the AT&T Labs - Research Web server is my colleague Neil Sloane's On-Line Encyclopedia of Integer Sequences, accessible from his home page, at http://www.research.att.com/~njas/. In January 2000, it attracted more than 6% of all the hits to the AT&T Labs - Research site. This "encyclopedia" is a novel combination of a database, software, and now also a new online journal. The integer sequence project enables people to find out what the next element is in a sequence such as

    0, 1, 8, 78, 944, 13800, 237432, ...

    This might seem like recreational mathematics, but it is very serious, as many research papers acknowledge the assistance of Sloane's database or, in earlier times, his books on this subject. It serves to tie mathematicians, computer scientists, physicists, chemists, and engineers together, and stimulate further research.[15] It represents a novel form of communication that could not be captured in print form.

    Table 2.4: Requests to Neil Sloane's sequence server.
    month requests hosts
    Jan. 1997 6,646 550*
    Jan. 1998 33,508 2,294
    Jan. 1999 58,655 3,996
    Jan. 2000 135,843 7,851
    Jan. 2001 222,795 11,105
    NOTE: *Hosts for 1997 estimated.

    Another popular site that is also a locus of mathematical activity is Steve Finch's "Favorite Mathematical Constants" page at http://www.mathcad.com/library/Constants/ . It also shows rapid growth in usage, although growth that is harder to quantify, since the monitoring software was changed less than a year ago. Just as with Sloane's integer sequence page, it is becoming a form of "portal" to mathematics, one that does not fit easily into traditional publication models.

    2.10 Conclusions and predictions

    Many discussions of the future of scholarly publishing have been dominated by economic considerations. Digitization has often been seen as a solution to the "library crisis," which forces libraries to cut down on subscriptions. So far there has been little effect in this area, as pricing trends have not changed much (Odlyzko, 1999).

    It has long been clear that, aside from any economic pressures, print will eventually become irrelevant, as it is simply too inflexible. Gutenberg's invention imprisoned scholarly publishing in a straitjacket that will be discarded eventually. However, the inertia of the scholarly publishing system is enormous, and so traditional journals have not changed much. They are in the process of migrating to the Web, but operate just as they did in print. However, we are beginning to see new ventures that will lead to new modes of operations. Still, it will be a while before they become a sizable fraction of the total scholarly publishing enterprise.

    The large majority of scholarly publications are likely not to change much for several decades. However, there will be growing pressure to make them easily available. In particular, scholars are likely to press ever harder for free circulation and archiving of preprints. The realization will spread that anything not easily available on the Web will be almost invisible. Whether they like it or not, scholars are engaged in a war for the eyeballs, and ease of access will be seen as vital.

    Ease of access is likely to promote the natural evolution of scholarly work. There will be more interdisciplinary research, and more survey publications. Some of these trends are beginning to appear in the data discussed in this paper, and we are likely to get more confirmation in the next few years.

    Notes

    I thank Steve Finch, Paul Ginsparg, Jim Gray, Eric Grosse, Kevin Guthrie, Stevan Harnad, Steve Heller, Patrick Ion, Don King, Kevin Kiyan, Greg Kuperberg, Leslie Lamport, Steve Lawrence, Carol Montgomery, Gary Mullen, Ann Okerson, Kimberly Parker, Robby Robson, Carol Tenopir, Ed Valauskas, Hal Varian, Tom Walker, and Herb Wilf, for providing comments, corrections, and helpful information.

    1. For circulation figures for major research libraries in the U.S., see Association of Research Libraries, Statistics and Measurement Program (http://www.arl.org/stats/index.html).

    2. An excellent and up-to-date survey of those is presented in Tenopir and King (2000). See also a brief summary in King and Tenopir (this volume).

    3. Some of the notable ones are Anderson et al. (2001), Guthrie (this volume), Hunter (this volume), Lawrence (2001), Luther (2001) and Tenopir et al. (2000). They will be referenced later.

    4. Evidence can be found in Guthrie (this volume) and Luther (2001), for example, and in later sections of this paper.

    5. For more evidence, see also Klopfenstein (1989) and the references there.

    6. It should be noted that print also had these characteristics; see O'Donnell (1996) and Trithemius (1974).

    7. See Griffiths and King (1993) and Fig. 9.4 on p. 202 of Lesk (1997), reproduced from Griffiths and King (1993).

    8. See endnote 10 to Chapter 2 of Buckland (1992) for references.

    9. More evidence and arguments supporting that prediction were developed in Fishburn et al. (1997) and in Odlyzko (2001).

    10. A summary is presented in King and Tenopir (this volume).

    11. The University of Wisconsin study is available at http://wendt.library.wisc.edu/archive/journals/costben.html

    12. This page is available at http://research.microsoft.com/users/lamport/tla/tla.html.

    13. This is based on extrapolating very freely from data kindly supplied by Paul Ginsparg.

    14. The data for January 1999 is incomplete, since the main server was then in transition from Denmark to the U.S.

    15. For an account of the project, see Sloane (1998).

    II. Pricing Electronic Access to Knowledge: The PEAK Experiment

    Section Introduction: Pricing Electronic Access to Knowledge (The PEAK Experiment)

    The PEAK experiment (Pricing Electronic Access to Knowledge) grew out of Elsevier Science's TULIP (The University Licensing Program) project, an early effort to understand the requirements for providing access to electronic journals. Between 1991 and 1995, nine university libraries participated in TULIP. Coinciding with the birth of the World Wide Web, the project caught the wave of building excitement for the Internet and Web applications. TULIP, however, had a limited focus. Its primary achievement lay in working through technical requirements to deliver digital facsimiles of print journals. Each participant institution created a local implementation and engaged users in the study, yet the project barely scratched the surface of issues associated with emerging user preferences and behavior. TULIP also fell short of original goals to address pricing models for electronic products.

    As a successor to TULIP, PEAK picked up where TULIP left off in tackling economic and user issues. The project brought together critical expertise at the University of Michigan: librarians and technologists deeply engaged in early digital library development, as well as researchers from the School of Information's program in Information Economics, Management and Policy. The context in which PEAK took shape presented an extraordinary convergence of trends. Universities along with other large research organizations were grappling with early requirements of network infrastructure and core applications. The Web catalyzed significant interest, yet the rate of adoption and participation was uneven, often due to barriers within the institution. Libraries, comfortable with the successes of online catalogs and networked indices, were suddenly faced with new full-text products but had relatively little relevant experience to help guide the associated publisher negotiations. PEAK unfolded concurrently with these critical developments of institutional, library, and publisher infrastructure for digital resources.

    In her chapter, Karen Hunter captures the transformation underway in publishing in the mid-1990's. Early pricing proposals conjoined print and electronic pricing (e.g., print and electronic journal packages at 135% of print cost) and were conceived to sustain institutional spending levels during the transition from print to digital delivery. The industry's sales force was experiencing the teething stages of licensing, and publishers were pressured to re-conceive approaches to the library marketplace. New questions about functionality, license terms, and sustainability were added to the library's existing concerns about escalating costs.

    The environment of technology development at the University of Michigan provided a rich context for PEAK. Yet, as Bonn et al. describe in their chapter, the project presented a series of challenges as Elsevier's production processes evolved and as PEAK's research protocols shaped the necessary system design. In the end, PEAK delivered some 1,200 journals to 12 institutions while also creating an experimental context in which distinct subscription models could be explored.

    In retrospect, many of the lessons learned from PEAK seem obvious. PEAK shed light on the foundational requirements for institutional technology infrastructure. Pervasive network connectivity and robust authentication are now taken for granted. PEAK data also highlighted potentially useful attributes for electronic journal services. User behavior suggested a benefit of access to comprehensive collections, enabling use beyond known print title preferences. Obstacles to use, however small (e.g., the necessity of entering a user name and password), were found to have real impact. Not surprisingly, user-pay models were viewed as less than desirable by librarians and individual users alike. Analysis by PEAK's research team also suggested that the library's intermediary role between publisher and constituents may temper the direct market effects of user behavior.

    Gazzale and MacKie-Mason detail the research design behind PEAK's journal price models. The novel concept of generalized subscriptions addressed the desire among libraries for ownership of collections, while unbundling the convention of journal volumes and issues. The generalized model offered the opportunity for pre-payment for articles (at a higher price than traditional subscriptions) and development of institution-specific, customized archives of journal content based on user selections. Some would argue that the capability to search a large body of journals and to extract desired articles for local ownership was a model ahead of its time in offering a flexible alternative to traditional subscriptions. Today, we see a growing number of customizable services for digital content.

    PEAK's experimental design was groundbreaking in several respects. PEAK provided a context in which publisher price models could be controlled and manipulated for participant institutions. In its role, the University of Michigan took on the development, marketing, and management of a full-blown journal service for over 340,000 users. Elsevier Science provided content and also accepted the necessary distance from the research design to ensure the integrity of the experimental protocol. This type of collaboration and field research is a rarity.

    PEAK was perhaps unique in exploring the interaction of publisher, institutional (library), and user interests. Since PEAK, few (if any) opportunities have emerged to take such a holistic approach.

    4. The PEAK Project: A Field Experiment in Pricing and Usage of a Digital Collection

    Electronic access to scholarly journals is now an important and commonly accepted tool for researchers. The user community has become more familiar with the medium over time and has started to actively bid for alternative forms of access. Technological improvements in communication networks, paired with decreasing costs of hardware, create greater incentives for innovation. Consequently, although publishers and libraries face a number of challenges, they also have promising new opportunities.[1] Publishers are creating many new electronic-only journals on the Internet, while also developing and deploying electronic access to literature traditionally distributed on paper. They are modifying traditional pricing schemes and content bundles, and creating new schemes to take advantage of the characteristics of digital duplication and distribution.

    From 1997 to 1999, researchers in economics at the University of Michigan worked in collaboration with the University of Michigan Library to design and run a project called Pricing Electronic Access to Knowledge (PEAK). This project was both a production service for electronic journal delivery and an opportunity for experimental pricing research that provided access to the more than 1,100 journals then published by Elsevier Science—journals that include much of the leading research in the physical, life and social sciences. The project provided an opportunity for universities and other research institutions to have electronic access to a large number of journals. This access provided fast and sophisticated searching, nearly instantaneous document delivery, and new possibilities for subscriptions. The University of Michigan Library Digital Library Production Service (DLPS) provided a host service consisting of roughly three and a half years of content (January 1995—June 1999) of the Elsevier Science scholarly journals. Participating institutions had access to this content for over 18 months, after which the project ended and access through our system ceased. Michigan provided Internet-based delivery to over 340,000 authorized users at twelve campuses and commercial research facilities across the U.S. On top of this production system we implemented a field trial in electronic access pricing and usage.

    Our primary experimental objective was to learn how additional value can be extracted from existing content by means of innovative electronic product offerings and pricing schemes. We sought to determine how users respond to different pricing schemes and to assess the additional value created from different product offerings. We also analyzed the impact of the different pricing schemes on producer revenues. To a limited extent, we think our results generalize to various business models, customer populations and information goods. Finally, we compared our empirical results with the current conclusions of the economic literature on bundling of information goods.

    4.1 PEAK in context: Electronic journal publishing and the University of Michigan Library

    The scholarly journal has a tradition of purpose and structure dating back several centuries, with little change. Despite the combined effects of price inflation and fluctuations of currency exchange that libraries weathered in the 1970's and 1980's, the basic construct of journals and subscriptions remained stable and, in fact, the journal has continued to flourish in a world of scholarly publishing that is increasingly global and conglomerate. In contrast to this tradition-laden history, the rapid change stimulated by information technologies in the 1990's was remarkable and unprecedented.

    Early efforts to harness the potential of digital technology for journals focused primarily on distribution and access. A far more gradual and separate process of re-engineering editorial review and production processes emerged somewhat later. Major publishers undertook an array of projects with heightened activity evident at the dawn of the Web. Efforts such as Springer Verlag's Red Sage project and Elsevier Science's TULIP (The University LIcensing Program) initiative broke ground in testing the limits of Internet distribution and catalyzing the development of more robust access systems. TULIP involved nine institutions and addressed a broad set of issues, including both technical and behavioral concerns. The four-year project achieved significant progress, but failed to address issues of economics and pricing for the new electronic media (Elsevier Science, 1996).

    In the aftermath of this early experimentation in electronic journal publishing, a number of inter-related issues emerged that stimulated interest in the economic questions surrounding journals and their electronic versions. Nearly every major publisher launched electronic publishing initiatives and, typically, tackled issues of price, product, and market in a manner that extrapolated from print practices. Early pricing models tightly coupled electronic and print subscriptions. Often electronic versions were available as a companion to the print version, at a surcharge of 15% or more. Almost simultaneously, the phenomenon of electronic preprint services emerged. These factors—plus a growing appetite for enhanced journal functionality—have contributed to the heightened interest surrounding pricing and product models for scholarly journals.

    The University of Michigan was one of the institutional participants in TULIP, with a joint project team drawing from Engineering, the School of Information and Library Studies (now the School of Information), the Information Technology Division, and the University Library. Michigan was the first site to implement the 43 journals in materials science offered through TULIP and was also the first to move the service to the Web environment. TULIP's outcomes included a far better understanding of the distribution and access issues associated with electronic journals, but also underscored the inadequacy of an experiment offering too few journals to attract users on a regular basis.

    The TULIP experience, coupled with an early history of standardized markup language (i.e., SGML) development in the 1980's, provided a unique environment for digital library development and contributed to Michigan's selection as a technology service provider for the Mellon Foundation-funded JSTOR project (Guthrie, this volume, 1997). The unique organizational collaboration begun with TULIP was expanded in 1993 and institutionalized in a campus-wide digital library program that today encompasses a full production service and development capability (Lougee, 1998). Within this new program the TULIP legacy was pursued with an eye toward better understanding the value, price, and product for electronic journals.

    In 1996, an agreement was reached with Elsevier Science to launch PEAK in an attempt to address issues left outstanding in the TULIP process. Through PEAK, Michigan hoped to gain a better understanding of large-scale management of electronic journals through the development of production systems and processes to accommodate the large body of content published by Elsevier Science. While this goal was important, PEAK also provided a large-scale testbed in which to explore issues of pricing and product design for electronic journals.

    4.2 Issues guiding the design of PEAK

    Information goods such as electronic journals have two defining characteristics. The first and most important is low marginal (incremental) cost. Once the content is transformed into a digital format, the information can be repackaged and distributed at almost zero cost. The second is that information goods often involve high fixed ("first copy") costs of production. A production facility and distribution server must be in place in order to take advantage of the low costs of distribution. For a typical scholarly journal, most of the cost to be recovered by the producer is fixed.[2] The same is true for both publisher and distributor in an electronic access environment. With the cost of electronic "printing and postage" essentially zero, nearly all of the distribution cost consists of system costs for hardware, administration, database creation and maintenance—all costs that must be incurred whether there are two or two million users. Our experience with PEAK bears this out: the only significant variable operating cost was the service of the user support team who answered questions from individual users—a small part of the total cost of providing the PEAK service.

    Electronic access offers new opportunities to create and extract value from scholarly literature. This additional value can benefit readers, libraries, distributors and publishers. For distributors and publishers, additional value can help to recover the high fixed costs. Increased value can be created through the production of new products and services (such as early notification services and bibliographic hyperlinking). Additional value that already exists in current content can also be delivered to users and, in part, extracted by publishers through new product bundling and nonlinear pricing schemes that become possible with electronic distribution. For example, journal content can be unbundled and then rebundled in many different ways. Bundling enables the generation of additional value from existing content by targeting a variety of product packages for customers who value the existing content differently. For example, most four-year colleges subscribe to only a small fraction of Elsevier titles. With innovative electronic bundling options, this and other less-served populations may be able to access additional content.

    4.3 System design and implementation

    Our research team worked with the University Library to design an online system and market this system to a variety of information clients. We primarily targeted libraries, focusing on academic and corporate libraries. Contacts were made with institutions expressing interest and institutions already invested in digital library activity. Over thirty institutions were contacted as potential participants, of which twelve agreed to join the effort. Decisions not to participate were frequently driven by budget limitations, or by the absence of pricing options of interest to the institution.[3] The resulting mix of institutions was diverse in size and information technology infrastructure, as well as in organizational mission. PEAK participants were the University of Michigan, the University of Minnesota, Indiana University, Texas A & M, Lehigh University, Michigan Technological University, Vanderbilt, Drexel, Philadelphia College of Osteopathic Medicine, University of the Sciences in Philadelphia, Dow Chemical, and Warner-Lambert (now part of Pfizer Pharmaceuticals).

    The PEAK system provided full-text search of and retrieval from the entire body of Elsevier content for the duration of the experiment, including some content from earlier years that Elsevier provided to help establish a critical mass of content. Several search and browse options were available to users, including mechanisms that limited searches to discipline-specific categories designed and assigned by librarians at the University of Michigan. Any authorized user could search the system, view abstracts, and have access to all free content (see below). Access to "full-length articles" (a designation assigned by Elsevier) depended on the user's institutional subscription package. With this access, articles could be viewed on screen or printed.

    The delivery and management of such a large body of content (over 11,000,000 pages at the conclusion of the experiment) and the support of the PEAK experiment required a considerable commitment of both system and human resources. In addition to the actual delivery of content, project staff were responsible for managing the authentication mechanisms, collecting and extracting statistics, and providing user support for, potentially, tens of thousands of users.

    PEAK ran primarily on a four-processor Sun E3000, with content stored on several different RAID (redundant array of independent disks) configurations. User authentication and subscription/purchase information was handled by a subsidiary Sun UltraSPARC.

    Searching was conducted primarily with a locally-developed search engine called FTL (Faster Than Light). Bibliographic search used the OpenText search engine. The authentication/authorization server ran Oracle to manage user and subscription information. Several other types of software were also used in the system, including

    • Cartesian Inc.'s compression software, CPC, which allowed us to regain a significant amount of disk space through compression of the TIFF images;

    • Tif2gif software developed at the University of Michigan, which converted images stored in CPC to GIFs;

    • CPC, printps (for generating Postscript), and Adobe Distiller, which were used in combination to deliver images to users as PDF files; and

    • The Stronghold web server, which provided SSL encryption for the security of user information.

    Project staff at the University of Michigan Digital Library Production Service (DLPS) wrote middleware to manage the interoperation of the tools discussed above.

    Designing and maintaining the PEAK system, as well as providing user support and service for the participant institutions, required significant staff resources. Once the system was specified by the research staff, design and maintenance of the system were undertaken by a senior programmer working close to full time in collaboration with a DLPS interface specialist. DLPS programming staff contributed as needed, and the head of DLPS provided management. A full time programmer provided PEAK database support, collecting statistics for the research team and the participants, as well as maintaining the database of authorized users and the transaction database. Two librarians provided about one full-time equivalent of user support (one was responsible for the remote sites, the other for the University of Michigan community). Other UM library staff put in considerable time during the setup phases of PEAK to market the service to potential participants, some of whom required substantial education about the methods and aims of the experiment, and to formalize the licensing agreements with participants.

    In order to facilitate per-article purchases, PEAK also needed to have the capacity to accept and process credit card charges. In the early months of the service, this billing was handled by First Virtual, a third-party electronic commerce company. This commercial provider also verified the legitimacy of users and issued virtual PINs that were to be used as passwords for the PEAK system. Less than halfway through the PEAK experiment, First Virtual restructured and no longer offered these services. At that point, DLPS took over the processing of applications and passwords. Credit card operations were transferred to the University of Michigan Press.

    We designed the system to support the planned research as well as to serve the daily information needs of a large and varied user community. Accommodating both purposes introduced a number of complexities. We sought to balance conflicting demands and to adhere to some fundamental goals:

    • Providing meaningful intellectual access via a Web interface to a very large body of content.

    • Balancing the aims of the research team with the library's commitment to making the content as easily accessible as possible.

    • Enabling and supporting a number of different transaction types, taking into account that not all users have access to all types of transactions and that the suite of transaction choices may change over time, depending on the manipulation of experimental conditions.

    • Enabling and supporting a number of different access levels, based on whether the user authenticates by password, the location of the user, the date of the material, and the type of material (e.g., full-length articles vs. other materials).

    Tensions were exacerbated by our reliance on content from just one large commercial publisher and by the specific requirements for the research experiments. John Price-Wilkin, Head of DLPS, compared the production system problems to those of a standard service (Price-Wilkin, 1999):

    The research model further complicates these methods for access, where all methods for access are not available to all institutions, and not all institutions choose to take advantage of all methods available to them. This creates a complex matrix of users and materials, a matrix that must be available and reliable for the system to function properly. Independence from Elsevier was critical in order for us to be able to test these models, and the body of Elsevier materials was equally important to ensure that users would have a valuable body of materials that would draw them into the research environment. The ultimate control and flexibility of the local production environment allowed the University of Michigan to perform research that would probably not have otherwise been possible, or could not have been performed in ways that the researcher stipulated.

    4.4 Economic and experimental design

    The PEAK system built upon existing digital library infrastructure and information retrieval mechanisms described above, but its primary purpose was to serve as a testbed for economic experiments in pricing and bundling information goods. Central to the PEAK experiment were the opportunities that electronic access creates for unbundling and re-bundling scholarly literature. A print-on-paper journal is, in itself, a bundle of issues. Each issue contains a bundle of articles, each of which is again a bundle of bibliographic information, an abstract, references, text, figures and many other elements.[4] In addition, the electronic environment makes possible other new dimensions of product variations. For example, access can be granted for a limited period of time (e.g., day, month, and year) and new services such as hyperlinks can be incorporated as part of the content. Permutations and combinations are almost limitless.[5]

    Choosing what to offer from the different bundling alternatives is not an easy task. In the PEAK experiment, we were constrained by the demands of the experiment and the demands of the customers. Given the limited number of participants, bundle alternatives had to be limited in order to obtain enough experimental variation to support statistical analysis. The products had to be familiar enough to potential users to generate participation and reduce confounding learning effects. The economic models were designed by the University of Michigan research team, then reviewed and approved by a joint advisory board comprised of two senior Elsevier staff members (Karen Hunter and Roland Dietz), the University of Michigan Library Associate Director for Digital Library Initiatives (Wendy Pradt Lougee), and the head of the research team (Professor Jeffrey MacKie-Mason).

    After balancing the different alternatives and constraints, we selected three different bundle types as the products for the experiment. We refer to the product types as traditional subscriptions, generalized subscriptions and single articles (sometimes called the "per-article" model).[6] We now describe these three product offerings.

    Traditional subscription: A user or a library could purchase unlimited access to a set of articles designated as a journal by the publisher for $4/issue if the library already held a print subscription. The value-based logic supporting this model is that the content value is already paid in the print subscription price, so the customer is only charged for an incremental electronic delivery cost. If the institution was not previously subscribed to the paper version, the cost of the traditional subscription was $4 per issue, plus 10% of the paper version subscription price. In this case, the customer is charged for the electronic delivery cost plus a percentage of the paper subscription price to reflect the value of the content. The electronic journals corresponded to the Elsevier print-on-paper journal titles. Access to subscribed content continued throughout the project. The traditional subscription is a "seller-chooses" bundle, in that the seller, through the editorial process, determines which articles are delivered to subscribed users.

    Generalized subscription: An institution (typically with the library acting as agent) could pre-purchase unlimited access to a set of any 120 articles selected by users. These pre-purchases cost $548 for the bundle, which averages about $4.57 per article selected. This is a "user-chooses" bundle. With a generalized subscription, the user selected which articles were accessed, from across all Elsevier titles, after the user had subscribed. Once purchased, the article was available to anyone in that user community.

    Per-article: A user could purchase unlimited access to a specific article for $7/article. This option was designed to closely mimic a traditional document delivery or interlibrary loan (ILL) product. With ILL the individual usually receives a printed copy of the article that can be retained indefinitely. This is different from the "per-use" pricing model often applied to electronic data sources. The article was retained on the PEAK server, but the user could access a paid-for article as often as desired. This was a "buyer-chooses" scheme, in that the buyer selected the articles purchased.

    The per-article and generalized subscription options allowed users to capture value from the entire corpus of articles without having to subscribe to all of the journal titles. Once the content is created and added to the server database and the distribution system is constructed, the incremental cost of delivery is approximately zero. Therefore, to create maximal value from the content, it is important that as many users as possible have access. The design of the price and bundling schemes affected both how much value was delivered from the content (the number of readers), and how that value was shared between the users and the publisher.

    Institutional generalized subscriptions may be thought of as a way to pre-pay for individual document delivery requests. One advantage of generalized subscription purchases for both libraries and individuals is that the "tokens" cost substantially less per article than the per article license price. By predicting in advance how many tokens would be used (and thus bearing some risk), the library could essentially pre-pay for document delivery at a reduced rate. However, unlike commercial document delivery or an interlibrary loan, all users within the community have ongoing unlimited access to the articles obtained with generalized subscription tokens. Thus, for the user community, a generalized subscription combines features of both document delivery (individual article purchase on demand) and traditional subscriptions (ongoing shared access). One advantage to a publisher is that generalized subscriptions represent a committed flow of revenue at the beginning of each year, and thus shift some of the risk for usage (and revenue) variation onto the users. Another is that they allow access to the entire body of content to all users and, by thus increasing user value from the content, provide an opportunity to obtain greater returns from the publication of that content.

    A simplified example illustrates why a library might spend more purchasing generalized subscriptions than traditional subscriptions. Consider a traditional subscription with 120 seller-specified articles, selling for $120, and a generalized subscription that allows readers to select 120 articles from the larger corpus, also for $120. Suppose that in the traditional subscription, users get zero value from half of the articles, but something positive from the other half. Then, the library is essentially paying $2 per article for which users have positive value. In this example, a cost-benefit oriented library would only purchase traditional subscriptions as long as the average value for the articles users want is at least $2 each. In a generalized subscription, however, users select articles that they actually value (they are not burdened with unwanted articles), so the average cost is $1 per article the user actually wants to read. The library then might justify a budget that continues buying generalized subscriptions to obtain the additional articles that are worth more than $1 but less than $2 to users. The result is more articles and more publisher revenue than with traditional subscriptions. Of course, the library decision process is more complicated than this, but the basic principle is that users get more value for the dollar spent when they—not the sellers—select the articles to include, and thus, since additional spending creates more user value with the generalized subscription, over time the library might spend more.
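
    The arithmetic behind this example can be laid out explicitly. The following is a minimal sketch using only the hypothetical figures from the paragraph above (the bundle price, bundle size, and the assumption that half of the seller-chosen articles are unwanted); it is an illustration of the reasoning, not actual PEAK data.

        # Hypothetical figures from the example above, not actual PEAK prices.
        bundle_price = 120.0   # price of either bundle, in dollars
        bundle_size = 120      # articles in either bundle

        # Traditional subscription: the seller chooses the articles; suppose
        # users turn out to value only half of them.
        valued_traditional = bundle_size // 2
        cost_per_valued_traditional = bundle_price / valued_traditional      # $2.00

        # Generalized subscription: users choose, so every article is wanted.
        valued_generalized = bundle_size
        cost_per_valued_generalized = bundle_price / valued_generalized      # $1.00

        print(cost_per_valued_traditional, cost_per_valued_generalized)

    On these assumptions, any article worth between $1 and $2 to users justifies additional generalized-subscription spending but not additional traditional-subscription spending, which is the sense in which a library might spend more overall under the generalized model.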

    The twelve institutions participating in PEAK were assigned randomly to one of three groups, representing three different experimental treatments, which we labeled the Green, Blue and Red groups. Users in every group could purchase articles on a per-article basis. In the Green group they could also purchase institutional generalized subscriptions; in the Blue group they could purchase traditional subscriptions; in the Red group they could purchase both traditional and generalized subscriptions in addition to individual articles.

    Regardless of treatment group, access was further determined by whether the user had logged in with a PEAK password. Use of a password allowed access from any computer (rather than only those with authorized IP addresses) and, when appropriate, allowed the user to utilize a generalized subscription token. The password authorization protected institutions from excessive usage of pre-paid generalized subscription tokens by walk-in users at public workstations. The password was also required to purchase articles on a per-article basis (to secure the financial transaction) and to view previously purchased articles (to provide some protection to the publisher against widespread network access by users outside the institution).

    The password mechanism was also useful for collecting research data. Usage sessions were of two types: those authenticated with a PEAK password and those not. For unauthenticated sessions, the system recorded which IP addresses accessed which articles from which journal titles. When users employed their PEAK passwords, the system recorded which specific users accessed which articles. Uses that resulted in full-article accesses were also classified according to which product type was used to pay for access. Because we recorded all interface communications, we were able to measure complex "transactions". For example, if a user who requested an article (i.e., clicked on its link) was informed that the article was not available in the (traditional or generalized) subscription base, we could distinguish whether the user chose to pay for the article on a per-article basis or decided to forgo access.

    An important consideration for both the design and analysis of the experiment was the possibility of learning effects during the life of the project. If present, learning makes it more difficult to generalize results. We expected significant learning about the availability and use of the system due to the novelty of the offered products and the lack of user experience with electronic access to scholarly journals. To decrease the impact of learning effects, we worked with participating institutions to actively educate users about the products and pricing prior to implementing the service. Data were also collected for almost two years, which enabled us to isolate some of the learning effects.

    To test institutional choices (whether to purchase traditional or generalized subscriptions, and how many), we needed participation from several institutions. Further, to explore the extent to which electronic access and various price systems would increase content usage, we needed a diverse group of potential (individual) users. Therefore, we marketed the project to a diverse group of institutions: four-year colleges, research universities, specialized research schools, and corporate research libraries. A large user community clearly improved the breadth of the data, but also introduced other complications. For example, user behavior might be conditioned by the characteristics of the participating institutions (such as characteristics of the institution's library system, availability of other electronic access products, institutional willingness to reimburse per-article purchases, etc.).

    4.5 Pricing

    Pricing electronic access to scholarly information is far from being a well-understood problem. Contemporaneous with the PEAK experiment, Prior (1999) reported that, based on a survey of 37 publishers, when both print-on-paper and electronic versions were offered, 62% of the publishers had a single combined price, with a surcharge over the paper subscription price of between 8% and 65%. The most common surcharge was between 15% and 20%. Half of the respondents offered electronic access separately at a price between 65% and 150% of print, most commonly between 90% and 100%. Fully 30% of the participating publishers changed their pricing policy in just one year (1999). In this section, we will describe the pricing structure we implemented in the PEAK experiment and our rationale for it.

    For content that can be delivered either on paper or electronically, there are three primary cost categories: content cost, paper delivery cost and electronic delivery costs. The price levels chosen for the experiment reflect the components of cost, adjusted downward for an overall discount to encourage participation in the experiment.[7]

    To recover the costs of constructing and operating the electronic delivery system, participating institutions paid the University of Michigan an individually negotiated institutional participation license (IPL) fee, roughly proportional to the number of authorized users. To account for the content cost, institutions or individual users paid the access prices associated with each product type described above (traditional subscriptions, generalized subscriptions, or individual articles).

    Arbitrage possibilities impose some constraints on the relative prices between access options. Arbitrage arises when users can choose different options to replicate the same access. For example, the PEAK price per article in a per-article purchase had to be greater than the price per article in a generalized subscription, and this price had to be greater than the price per article in a traditional subscription. These inequalities impose the restriction that the user could not save by replicating a traditional subscription through purchasing individual articles or a generalized subscription, nor save by replicating a generalized subscription by paying for individual articles. Alternatively, arbitrage constrains publishers to charge a price for bundles that is less than the sum of the individual component prices.
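
    With the PEAK price levels described in this chapter, the ordering can be checked directly. The sketch below is illustrative only: the number of articles per issue is an assumption (it varied widely by title), and the traditional figure shown uses the $4-per-issue electronic fee for titles already held in print.

        # Illustrative check of the no-arbitrage ordering of the PEAK prices.
        price_single_article = 7.00              # per-article purchase
        generalized_per_article = 548.00 / 120   # about $4.57 per article in the bundle

        price_per_issue = 4.00                   # traditional electronic fee per issue
        assumed_articles_per_issue = 10          # assumption for illustration only
        traditional_per_article = price_per_issue / assumed_articles_per_issue

        # A user should not be able to replicate a cheaper option with a dearer one.
        assert price_single_article > generalized_per_article > traditional_per_article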

    The mapping of component costs to price levels is not exact, and in some cases the relationship is complicated. For example, although electronic delivery costs are essentially zero, there is some additional cost to creating the electronic versions of the content (especially at the time of the PEAK experiment, because Elsevier's production process at the time was not fully unified for print and electronic publication). Therefore, the electronic access price might be set in a competitive market to recover both the content value and some amount of incremental electronic delivery cost.

    Based on the considerations above, and on negotiations with the publisher, we set the following prices: per article at $7; generalized subscriptions at $548 for 120 articles; and traditional subscriptions at $4 per issue plus 10% of the paper subscription price. A substantial amount of material, including all content available that was published two calendar years prior, was available without any additional charge after an institution paid the IPL fee for the service. We refer to this as "unmetered". Full-length articles from the current two calendar years were "metered": users could access them only if the articles were paid for under a traditional or generalized subscription, or purchased on a per-article basis.
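
    The metering rule can be summarized compactly. The function below is a sketch of the rule as stated in the text (full-length articles from the current two calendar years are metered; older material and non-article content are unmetered); it is illustrative and is not code from the PEAK system.

        def is_metered(publication_year, access_year, full_length_article):
            # Sketch of the PEAK metering rule as described in the text: only
            # full-length articles from the current two calendar years required
            # payment via a subscription, a token, or a per-article purchase.
            return full_length_article and publication_year >= access_year - 1

        # Examples consistent with the text: in 1999, pre-1998 content was unmetered.
        print(is_metered(1997, 1999, True))   # False -> unmetered
        print(is_metered(1998, 1999, True))   # True  -> metered
        print(is_metered(1999, 1999, False))  # False -> not a full-length article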

    4.6 Results

    In this section we report some descriptive results from both the experiment and the production service. See Gazzale and MacKie-Mason (this volume) for a more detailed study of the economic experiment results.

    Revenues and costs

    In Table 4.1 we summarize PEAK revenues. The actual total was over $580,000 (the sum of total revenue in the first two rows).[8] The first and third rows report annual revenues, with 1999 adjusted to reflect an estimate of what revenues would have been if the service were to run for the full year (it ended in August 1999, but only six months of content were included, and prices were adjusted accordingly). On an annualized basis, two-year revenues were about $712,000.

    Between the first and second year of the service, the number of traditional subscriptions substantially decreased: this occurred because two schools cancelled all of their (electronic) subscriptions. By reducing the number of journal titles under traditional subscription, the users of these libraries needed to rely more heavily on the availability of unused generalized subscription tokens, or they had to pay the per-article fee. We see from the table that the annualized revenues for per-article purchasing are seventeen times higher in 1999 than in 1998, and that the 1999 generalized subscription revenues (annualized) are 8% lower than in 1997-1998.

    A full calculation of the costs of supporting the PEAK service is difficult, given the mix and dynamic nature of costs (e.g., hardware). We estimate that total expenditures by the University of Michigan were nearly $400,000 during the 18 month life of the project. Of this cost, roughly 35% was expended on technical infrastructure and 55% on staff support (i.e., system development and maintenance, data loading, user support, authentication/authorization/security, project management). Participant institution (IPL) fees covered approximately 45% of the project costs, with vendor and campus in-kind contributions covering another 20-25%.[9] The UM Digital Library Production Service contributed resources to this effort, reflecting the University of Michigan's desire to provide this service to its community, and also its support for the research.

    Table 4.1: PEAK Revenues
    Year | Traditional subscriptions (quantity / revenue) | Generalized subscriptions (quantity / revenue) | Individual articles (quantity / revenue) | Total access revenue | IPL**** revenue | Total revenue
    1997-1998 | 1939 / $216,018 | 151 / $82,748 | 275 / $1,929 | $300,691 | $140,000 | $440,691
    1999* | 1277 / $33,608 | 92 / $50,416 | 3186 / $22,302 | $106,326 | $42,000 | $148,326
    Annualized 1999** | 1277 / $78,996 | 138 / $75,624 | 4779 / $33,453 | $188,073 | $84,000 | $272,073
    Total 1997-1999*** | 3216 / $295,014 | 289 / $158,372 | 5054 / $35,378 | $488,764 | $224,000 | $712,764
    *Partial year results, January to August 1999; new articles available only if published before June.
    **Annualization done by scaling the quantity of generalized subscriptions and per-article purchases. Traditional subscriptions are priced at the full-year rate.
    ***Annualized.
    ****The "IPL" is the Institutional Participation License, an annual fee charged to each participating institution.
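
    The year-over-year comparisons quoted in the text above can be reproduced from the figures in Table 4.1; the following is a minimal arithmetic check (a sketch, using only the table's figures).

        # Checking the comparisons in the text against Table 4.1 (annualized 1999 row).
        per_article_1998 = 1929                   # individual-article revenue, 1997-1998
        per_article_1999_annualized = 33453
        print(per_article_1999_annualized / per_article_1998)      # about 17.3 ("seventeen times")

        generalized_1998 = 82748                  # generalized subscription revenue, 1997-1998
        generalized_1999_annualized = 75624
        print(1 - generalized_1999_annualized / generalized_1998)  # about 0.09 (the roughly 8% decline)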

    4.7 User demographics

    In the PEAK project design, unmetered articles and articles covered by traditional subscriptions could be accessed by any user from a workstation associated with one of the participating sites (authenticated by the computer's IP address). If users wanted to use generalized subscription tokens or to purchase individual articles on a per-article basis they had to obtain a password and use it to authenticate.[10] We have more complete data on the subset of users who obtained and used passwords.

    Table 4.2: Distribution of users with passwords by status and academic division
    Division | Faculty | Staff | Graduate student | Undergraduate | Other | Total
    Engineering, science and medicine | 408 | 214 | 1032 | 211 | 38 | 1903
    Architecture and urban planning | 103 | 11 | 47 | 16 | 19 | 196
    Education, business, information/library science and social science | 91 | 43 | 287 | 46 | 2 | 469
    Other | 178 | 240 | 350 | 176 | 34 | 978
    Total | 780 | 508 | 1716 | 449 | 93 | 3546

    In Table 4.2 we report the distribution of the more than three thousand users who obtained passwords and who used PEAK at least once. Most of the users are from engineering, science and medicine, reflecting the strength of the Elsevier collection in these disciplines. Seventy percent of these users were either faculty or graduate students (see Figure 4.1). The relative fractions of faculty and graduate students vary widely by discipline (see Figure 4.2). Our sample of password-authenticated users, while probably not representative of all electronic access usage, includes all those who accessed articles via either generalized subscription tokens or per-article purchase. It represents the interested group of users, who were sufficiently motivated to obtain and use a password. Gazzale and MacKie-Mason (this volume) discuss the effects of passwords and other user costs on user behavior.

    Figure 4.1: Distribution of users who obtained passwords and used them to access PEAK
    Figure 4.2: Users with Passwords Who Accessed PEAK

    In Table 4.3 we summarize usage of PEAK through August 1999. Authorized users joined the system gradually over the first nine months of 1998. There were 208,104 different accesses to the content in the PEAK system over 17 months.[11] Of these, 65% were accesses of unmetered material (not-full-length articles, plus all 1998 accesses to content published pre-1997, and all 1999 accesses to pre-1998 content).[12] However, one should not leap to the conclusion that users will access scholarly material much less when they have to pay for it, though surely that is true to some degree. To correctly interpret the "free" versus "paid" accesses we need to account for three effects. First, to users much of the metered content appeared to be free: the libraries paid for the traditional subscriptions and the generalized subscription tokens. Second, the quantity of unmetered content in PEAK was substantial: on day one, approximately January 1, 1998, all 1996 content and some 1997 content was in this category. On January 1, 1999, all 1996 and 1997 content and some 1998 content was in this category. Third, the nature of some unmetered content (for example, letters and announcements) is different from metered articles, which might also contribute to usage differences.

    Table 4.3: Total number of unique content accesses by treatment group and type of access (Jan 1998-August 1999)
    Access type | Green | Red | Blue | All groups
    Unmetered | 24632 | 96658 | 13911 | 135201
    Traditional subscription articles, 1st use | N/A | 27140 | 2881 | 30021
    Traditional subscription articles, 2nd or higher use | N/A | 11914 | 597 | 12511
    Generalized subscription articles, 1st use | 8922 | 9467 | N/A | 18389
    Generalized subscription articles, 2nd or higher use | 3535 | 4789 | N/A | 8324
    Individually purchased articles, 1st use | 194 | 75 | 3192 | 3461
    Individually purchased articles, 2nd or higher use | 108 | 26 | 63 | 197
    Total accesses | 37391 | 150069 | 20644 | 208104
    NOTE: See definitions of treatment groups in Section 4.4.

    Generalized subscription "tokens" were used to purchase access to 18,389 specific articles ("1st use"). These articles were then distinctly accessed an additional 8,324 times ("2nd or higher use"), for an average of 1.45 accesses per generalized subscription article. Traditional subscription articles had an average of 1.42 accesses per article. A total of 3461 articles were purchased individually on a per-article basis; these were accessed 1.06 times per article on average. The difference in the number of accesses per article for articles obtained by generalized subscription and by per-article purchase is likely due to the difference in who may access the article after initial purchase. All authorized users at a site could access an article once it had been purchased with a generalized subscription token, while only the individual making a per-article purchase had the ability to re-access that article. Thus, we estimate that for individually selected articles (whether obtained with a generalized subscription token or by per-article purchase), the initial reader accessed the article 1.06 times on average, and additional readers accessed these articles a further 0.39 times. That is, there appears to be, on average, at least one-third of an additional reader per article under the more lenient access provisions of a generalized subscription token.
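
    These averages follow directly from the counts in the all-groups column of Table 4.3; the following minimal sketch simply reproduces the arithmetic.

        # Average unique accesses per article, computed from Table 4.3 (all groups).
        generalized_first, generalized_repeat = 18389, 8324
        traditional_first, traditional_repeat = 30021, 12511
        single_first, single_repeat = 3461, 197

        avg_generalized = (generalized_first + generalized_repeat) / generalized_first  # ~1.45
        avg_traditional = (traditional_first + traditional_repeat) / traditional_first  # ~1.42
        avg_single = (single_first + single_repeat) / single_first                      # ~1.06

        # Extra accesses attributable to readers other than the initial purchaser.
        print(avg_generalized - avg_single)   # about 0.39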

    Figure 4.3: Concentration of article accesses across different journal titles

    In Figure 4.3 we show a curve that reveals the concentration of usage among a relatively small number of Elsevier titles. We sorted articles that were accessed from high to low in terms of how often they were accessed. We then determined the smallest number of articles that, together, comprised a given percentage of total accesses, and counted the number of journal titles from which these articles were drawn. For example, 37% of the 1200 Elsevier titles generated 80% of the total accesses. 40% of the total accesses were accounted for by only about 10% of the journal titles.
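
    The curve in Figure 4.3 can be built with a simple procedure: rank articles by access count, accumulate accesses until a target share is reached, and count the journal titles the accumulated articles span. The sketch below illustrates that procedure; the data layout and names are assumptions for illustration, not part of the PEAK system.

        def titles_needed_for_share(article_accesses, share):
            # article_accesses: {article_id: (journal_title, access_count)} -- illustrative layout.
            ranked = sorted(article_accesses.values(), key=lambda x: x[1], reverse=True)
            total = sum(count for _, count in ranked)
            covered, titles = 0, set()
            for title, count in ranked:
                if covered >= share * total:
                    break
                covered += count
                titles.add(title)
            return len(titles)

        # Toy example: the two most-accessed articles cover 80% of 100 accesses
        # and are drawn from 2 of the 3 journal titles.
        toy = {"a1": ("J. Alpha", 50), "a2": ("J. Beta", 30),
               "a3": ("J. Alpha", 15), "a4": ("J. Gamma", 5)}
        print(titles_needed_for_share(toy, 0.80))   # 2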

    Figure 4.4: Percentage of model used by experimental group: Jan 1998-Aug 1999

    In Figure 4.4 we compare the fraction of accesses within each treatment group that are accounted for by traditional subscriptions, generalized subscriptions and per-article purchases. Recall that the Green and Blue groups only had two of the three access options.[13] When institutions had the choice of purchasing generalized subscription tokens, their users purchased essentially no access on a per-article basis. Of course, this makes sense as long as tokens are available: it costs the users nothing to use a token, but it costs real money to purchase on a per-article basis. Indeed, our data indicate that institutions that could purchase generalized subscription tokens tended to purchase more than enough to cover all of the demand for articles by their users; i.e., they didn't run out of tokens in 1998. We show this in aggregate in Figure 4.5: only about 50% of the tokens purchased for 1998 were in fact used. Institutions that did not run out of tokens in 1999 appear to have done a better job of forecasting their token demand for the year (78% of the tokens purchased for 1999 were used). Institutions that ran out of tokens used about 80% of the tokens available by around the beginning of May.

    Figure 4.5: Percentage of pre-paid tokens used as a percentage of time available

    Articles in the unmetered category constituted about 65% of use across all three groups, regardless of which combination or quantity of traditional and generalized subscriptions an institution purchased. The remaining 35% of use was paid for with a different mix of options depending on the choices available to the institution. Evidently, none of the priced options choked off use altogether.

    Figure 4.6: Total accesses per potential user: Jan 1998-August 1999

    We show the total number of accesses per potential user for 1998 and 1999 in Figure 4.6. We divide by potential users (the number of people authorized to use the computer network at each of the participating institutions) because different institutions joined the experiment at different times. This figure thus gives us an estimate of learning and seasonality effects in usage. Usage per potential user was relatively low and stable for the first 9 months. However, it then increased to a level nearly three times as high over the next 9 months. We expect that this increase was due to more users learning about the existence of PEAK and becoming accustomed to using it. Note also that the growth begins in September 1998, the beginning of a new school year with a natural bulge in demand for scholarly articles. We also see pronounced seasonal effects in usage: local peaks in March, November and April.

    To see the learning effect without interference from the seasonal effect, we calculated usage by type of access in the same three months (March-May) of 1998 and 1999; see Table 4.4. Overall, usage increased 167% from the first year to the second.

    Table 4.4: Learning: usage comparison across two years (March-May averages)
    Access type | 1998 | 1999 | Percentage change
    Unmetered | 19291 | 55745 | 189%
    Traditional | 6374 | 10560 | 66%
    1st token use | 1648 | 4805 | 192%
    1st per-article purchase | 1 | 1288 | N/A
    2nd or higher token use | 3060 | 8166 | 167%
    2nd or higher per-article purchase | 8 | 472 | 5800%
    Total | 30382 | 81036 | 167%

    We considered the pattern of repeat accesses distributed over time. In Figure 4.7 we show that about 93% of articles accessed were accessed no more than two times. To further study repeat accesses, we selected only those articles (7%) that were accessed three or more times between January 1998 and August 1999 (high use articles). We then counted the number of times they were used in the first month after the initial access, the second month after, and so forth; see Figure 4.8. What we see is that almost all access to even high use articles occurred during the first month. After that, a very low rate of use persisted for about seven more months, then faded out altogether. Thus, we see that, even among the most popular articles, recency was very important.

    Figure 4.7: Percentage of Articles by Number of Times Read
    Figure 4.8: The distribution of usage for high use articles

    Although recency appears to be quite important, we saw in Table 4.3 that over 60% of total accesses were for content in the unmetered category, most of which was over one year old. Although we pointed out that the monetary price to users for most metered articles was still zero (if accessed via institution-paid traditional or generalized subscriptions), there were still higher user costs for much of the more recent usage. If a user wanted to access an article using a generalized subscription token, then she had to obtain a password, remember it (or where she put it) and use it. If the article was not available in a traditional subscription and no tokens were available, then she had to do the above plus pay for the article with hard currency. Therefore, there were real user cost differences between the unmetered and metered content. The fact that usage of the older, unmetered content was so high, despite the clear preference for recency, supports the notion that users respond strongly to costs of accessing scholarly articles.[14]

    4.8 Discussion

    PEAK was a unique project. During a relatively early stage of the transition to digital scholarly collections, we delivered a large-scale production service containing several years of content from over 1100 journal titles, for a total of about 10 million pages. On top of the commitment to a production-quality service, we also implemented a field experiment to test usage response to several different pricing and bundling models, one of which (generalized subscriptions) was quite novel. We summarize our most important general conclusions here:

    • Of the purchasing options we offered, the generalized subscription—our innovation—was the most popular and generated the most interest. Libraries saw the generalized subscription as a way of increasing the flexibility of their journal budgets and of tying purchasing more closely to actual use. The generalized subscription provides fast and easy access to articles in demand from the complete corpus, not just from a subscribed subset of titles.

    • We observed great interest in the monthly statistical reports on local article use that we generated for participating libraries. Participants were eager to use these to help assess current subscription choices and to further understand user behavior.

    • The user cost of access—comprised not just of monetary payments, but also of time and effort—has a significant effect on the number of articles that readers access. (See Gazzale and MacKie-Mason (this volume) for further detail.)

    • There was a substantial learning period during which users became aware of the service and accustomed to using it. It appears that usage was increasing even after a year of service. By the end of the experiment, usage was at a rather high level: approximately five articles accessed per month per 100 potential users, with potential users defined broadly (including all undergraduate students, who rarely use scholarly articles directly).

    • It has long been known that overall readership of scholarly literature is low. We have seen that even the most popular articles are read only a few times, across 12 institutions. We did not, however, measure how often those articles were being read in print versions during the same periods.

    • Recency is very important: repeat usage dropped off considerably after the first month. (This was also reflected in user comments, not reported above.)

    PEAK had a limited life by design, and today most of the major publishers of scholarly journals have implemented their own PEAK-like service. The environment is far from stable, however, and service options, pricing and bundle offerings continue to evolve. Our results bear on the economics and usage of digital collections today and in the future, and provide some support for particular design choices.

    Notes

    1. See MacKie-Mason and Riveros (2000) for a discussion of the economics of electronic publishing.

    2. Odlyzko (1995) estimates that it costs between $900 and $8,700 to produce a single math article; 70% of the cost is editorial and production, and 30% is reproduction and distribution.

    3. This assertion and others concerning user preferences are based on our analysis of the ethnographic records compiled during the marketing process by Ann Zimmerman, and the results from a series of user surveys we administered during the project.

    4. One pragmatic and unresolved issue for applying bundling models is the treatment of journal content other than articles. Many notices and reviews, as well as editorial content integral to a journal's identity, cannot be categorized as articles. How and when these items are indexed (thus becoming searchable) as well as how they should be priced are still open questions in electronic journal delivery and pricing.

    5. See MacKie-Mason and Riveros (2000) for a more complete discussion of the economic design space.

    6. All older materials (i.e., pre-1998) were freely available to all project participants, as were bibliographic and full-text searches, with no charges levied for viewing citations, abstracts or non-article material such as reviews and notices. The remaining content (essentially, full-length articles published after 1997) could be purchased through one of the product types.

    7. If the scholarly publishing market is competitive, then we expect long-run prices to reflect incremental component costs. Whether this market is competitive is a contentious issue; see McCabe (this volume) for one view on this question.

    8. Due to delays in starting the project, the first revenue period covered content from both 1997 and 1998, although access was available only during 1998. For this period, prices for traditional subscriptions were set to equal $6/issue, or 1.5 times the annual price of $4/issue, to adjust for the greater content availability.

    9. The University of Michigan production service retained IPL annual participation fees. The publisher received the content charges, minus a modest service fee for the credit card service provided by the University of Michigan Press.

    10. Through an onscreen message we encouraged all users to obtain a password and use it every time in order to provide better data for the researchers. From the data, we concluded that only a small fraction chose to obtain passwords solely because of our urging; most who did apparently obtained passwords because they were necessary to access a specific article.

    11. We limited our scope to what we call "unique accesses," counting multiple accesses to a given article by a single individual during a PEAK session as only one access. For anonymous access (i.e., access by users not entering a password), we define a "unique" access as any number of accesses to an article within 30 minutes from a particular IP address. For authenticated users, we define a "unique" access as any number of accesses to an article by an authenticated user within 30 minutes of the first access.
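
    A sketch of this counting rule follows; the data layout and names are illustrative, not the PEAK logging format.

        from datetime import datetime, timedelta

        def count_unique_accesses(events, window=timedelta(minutes=30)):
            # events: list of (requester, article_id, timestamp); requester is a
            # user ID for authenticated sessions or an IP address otherwise.
            # Repeat requests within 30 minutes of the first access count once.
            first_access, unique = {}, 0
            for requester, article, ts in sorted(events, key=lambda e: e[2]):
                key = (requester, article)
                if key not in first_access or ts - first_access[key] >= window:
                    unique += 1
                    first_access[key] = ts
            return unique

        # Example: two requests within 20 minutes count once; a third 45 minutes
        # after the first starts a new unique access.
        t0 = datetime(1998, 3, 1, 10, 0)
        events = [("10.0.0.1", "art-1", t0),
                  ("10.0.0.1", "art-1", t0 + timedelta(minutes=20)),
                  ("10.0.0.1", "art-1", t0 + timedelta(minutes=45))]
        print(count_unique_accesses(events))   # 2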

    12. See the definition of unmetered material in the text above.

    13. Individual article purchase was available to both; Green institutions could also purchase generalized subscriptions, and Blue could purchase traditional subscriptions.

    14. In another preliminary test of the impact of user cost on usage, we compared the usage of the Red and Blue groups. Red institutions had both generalized and traditional subscriptions available; Blue had only traditional. We calculated the number of paid articles accessed (paid by generalized tokens or per-article) for each group, after normalizing by the number of traditional subscriptions, and the number of potential users at the institutions. We found that when generalized subscriptions were available, which have a much lower user cost since the library pays for the tokens, three times as many articles were accessed as at institutions which had to pay for each of these articles on a per-article basis. See Gazzale and MacKie-Mason (this volume).

    5. PEAK and Elsevier Science

    This paper reviews Elsevier Science's participation in the PEAK experiment. Elsevier and the University of Michigan had been partners in the TULIP experiment (1991-95), the first large-scale delivery of journals to university desktops. PEAK was designed by Michigan to address some of the economic questions unresolved by TULIP. Once the design of the experiment was agreed upon, Elsevier's day-to-day role was limited, but its interest in the outcome was high enough to justify the risk of participating.

    PEAK operated in parallel with Elsevier's own development of local and web-based commercial journal database services. The issues associated with parallel experimentation and commercialization were significant. Pricing policies and product attributes for the commercial offering were developed and implemented at the same time as the PEAK experiment was ongoing. This created points of comparison and potential tension.

    This paper reviews Elsevier Science's relation to PEAK and the application at Elsevier of what has been learned from PEAK.

    5.1 Pre-PEAK experimentation with the University of Michigan

    Starting in the late 1980s, Elsevier Science had been approached by several universities to do experiments with them that would test the delivery of full text to the desktop over campus local-area networks. We had discussions with the Stanford University Medical School, the University of Pennsylvania, Cornell University, and Carnegie Mellon University, among others. In each case, the project would be unique to that university and, while perhaps helpful to the university, would do little to give us a test that would be scalable or that would provide us with sufficient market data to aid in practical product development. We progressed the farthest with Carnegie Mellon and carried those discussions over into a broader forum — the newly-formed Coalition for Networked Information. At the spring 1991 CNI meeting, it was agreed that if ten or fifteen universities would commit to the same basic experiment, then a publisher could justify investing in the creation of a major testbed. Fifteen universities organized a project on the spot and the challenge was on.[1]

    Out of this challenge came TULIP — The University LIcensing Program. Ultimately, nine of the initial fifteen universities became a part of TULIP with the other six remaining as observers. The participants were Carnegie Mellon, Cornell, Georgia Tech, MIT, the University of California (all campuses), the University of Michigan, the University of Tennessee, the University of Washington, and Virginia Tech. The experiment went live in January, 1993, and continued through 1995. Elsevier scanned print copies of initially 40 and ultimately 83 materials science journals (starting with the 1992 issues), creating TIFF files with edited, structured ASCII headers and raw OCR-generated ASCII full-text files. By the end of the project, more than 500,000 pages were in the system.

    These files were shipped to each university, where they were made available to users via software systems developed at each site. Although there was a technical working group and some of the development was shared among more than one site, essentially each implementation was unique. This was the intent of the project, as the thinking at the time was that each university would want to integrate these journals with other information held on its campus and present it in its own context (e.g., within the MELVYL system at the University of California, where there was a central host that served all campuses).

    There were three basic goals of the TULIP project: (1) to determine the technical feasibility of networked distribution, (2) to study reader usage patterns under different distribution situations, and (3) to understand — through the implementation of prototypes — alternative costing, pricing, subscription and market models that might be viable in electronic journal systems. The project was enormously successful in many ways and an incredible amount was learned, particularly about the technical issues. For example, this was in the very early days of the Internet. Elsevier Science had no Internet server expertise and had to go outside (to Engineering Information, now a part of Elsevier) to be our technical host. Initially, all shipments of files were over the Internet, but this proved unsatisfactory. Not only did we at times account for 5% of Internet traffic (the 4,000 TIFF pages sent every two weeks individually to nine sites was a slow and heavy load), but the logistics on the receiving end did not work well either. In 1994 there was a switch from push to pull shipments from our central server to the campus machines and, finally, in 1995 to delivery on CD-ROM. TULIP also made clear the need for high-speed networks in the links among the University of California campuses.

    Mosaic made its appearance in the middle of the project, immediately changing the perspective from unique university installations to a generic model. Indeed, one of the participants — having developed a Unix-based system, only to find that the Materials Science department all used Macs — gave up when Mosaic appeared, as it seemed pointless to convert their Unix system for the Mac when something else was going to move in.

    In the end, of the nine implementations, it was fair to say that three were very good, three were more limited but satisfactory, and three never really got underway. The outstanding player was clearly the University of Michigan. They organized their effort with a highly-motivated interdepartmental team, were the first to go live, and put up not one but three implementations. There was a general, relatively low-functionality implementation through MIRLYN (the NOTIS-based OPAC); a much higher-functionality approach on the College of Engineering's CAEN (Computer-Aided Engineering Network) system; and, finally, a web-based approach once the advantages of a web system were clear. Michigan became the TULIP lead site and they graciously showed many visitors their implementation. To some degree Michigan's involvement with JSTOR came out of its TULIP participation, or at least the expertise gained during TULIP.

    5.2 From TULIP to PEAK

    One of the Elsevier dilemmas during TULIP was what to do when the project ended at the end of 1995. Was there a marketable product here? Had we learned enough about networked delivery of journals to go beyond the limited scope of TULIP? The decision was made in 1994 to scan all 1,100+ Elsevier journals and to launch a commercial version of TULIP called Elsevier Electronic Subscriptions (EES). Michigan became one of the first subscribers when TULIP ended. EES, like TULIP, consisted of files delivered to the campus for local loading and delivery over the campus network.

    In designing the transition, Michigan made it clear that there was one thing they were particularly disappointed in with respect to TULIP: namely, that so little was learned about the economics of networked journals. There were many reasons for that — including preoccupation with technical and user behavior issues and reluctance on both sides to take risks — but the reality was that Michigan was correct: we had not gathered the hoped-for economic data. In addition, at this time the number of scholarly journals available in electronic form was slowly growing and the pricing used by publishers could be described at best as experimental. Elsevier, for reasons described below, had introduced and briefly priced its EES product at a combined print and electronic price of 135% of the print. Michigan had concerns about the variability and volatility in the pricing models they were seeing. Therefore, in deciding to proceed with the EES program, Michigan stressed the importance of continuing our experimental relationship, looking specifically at pricing. Out of that discussion came the PEAK (Pricing Electronic Access to Knowledge) experiment.[2]

    5.3 PEAK design

    Unlike TULIP, where Elsevier took a leading role in the design and oversight of the experiment, PEAK was a University of Michigan experiment in which Elsevier was a participant. Wendy Pradt Lougee, from the university library, was PEAK project director and Jeffrey K. MacKie-Mason, a professor in the School of Information and Department of Economics, led the economic design team that included economics graduate students, librarians and technologists. It fell to Jeff's team to do much of the experimental design and, later, the analysis of the results. Wendy took on the thankless task of recruiting other institutional participants and managing all of the day-to-day processes. They were ably assisted by others within the School of Information.

    PEAK was similar to TULIP in having a central data provider and a number of participating libraries. The University of Michigan was the host, offering web access to all Elsevier titles from its site. The participating institutions totaled twelve in all, ranging from a small, highly specialized academic institution to corporations and large research universities. The goal of the experiment was to understand more about how electronic information is valued and to investigate new pricing options.

    The participating libraries were assigned by Michigan to one of three groups (Red, Green and Blue). There were also three pricing schemes for content access being tested and each library group had some (but not full) choice among these three pricing schemes.

    In addition to the content fees (which came to Elsevier), Michigan charged a "participation fee" to offset some of its costs, ranging from $1,000 to $17,000 per year. The fee was differentially set based on the relative size (e.g., student population) of the institution.

    What were the pricing choices?

    1. Per article purchase — The charge was $7 per article. After this type of purchase, the article was available without additional charge to the requesting individual for the duration of the experiment.

    2. Generalized subscription — $548 for a bundle of 120 articles ($4.57 per article). Bundles had to be purchased at the beginning of the year and the cost was not refundable if fewer articles were actually used. Articles in excess of the number purchased as bundles were available at the $7 per article price. Articles accessed under this option were available to the entire user community at no additional charge for the duration of the experiment.

    3. Traditional subscription — $4 per issue (based on the annual number of issues) if the title was subscribed to in paper; $4 per issue plus 10% of the print price if the title was not subscribed to in paper; full price of the print if the print were to be cancelled during the experiment. Those purchasing a traditional subscription had unlimited use of the title for that year.

    In addition to the paid years (1998 and 1999), there were back years (1996-1997) available for free.
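    To make the relative cost of these options concrete, the following sketch (ours, not part of the original experiment materials) computes what a participant would pay under each scheme; the demand figures and the 24-issue journal in the example are illustrative assumptions.

        # Illustrative comparison of the three PEAK pricing options.
        # The demand figures used in the example are hypothetical.

        PER_ARTICLE_FEE = 7.00   # option 1: $7 per article
        BUNDLE_PRICE = 548.00    # option 2: $548 per bundle ...
        BUNDLE_SIZE = 120        # ... of 120 article "tokens"
        ISSUE_FEE = 4.00         # option 3: $4 per issue (title already held in paper)

        def per_article_cost(articles_wanted):
            return articles_wanted * PER_ARTICLE_FEE

        def generalized_cost(articles_wanted, bundles_bought):
            # Bundles are prepaid and non-refundable; demand beyond the prepaid
            # tokens falls back to the $7 per-article price.
            covered = min(articles_wanted, bundles_bought * BUNDLE_SIZE)
            overflow = articles_wanted - covered
            return bundles_bought * BUNDLE_PRICE + overflow * PER_ARTICLE_FEE

        def traditional_cost(issues_per_year, held_in_paper=True, print_price=0.0):
            # $4 per issue; add 10% of the print price if the title is not held in paper.
            cost = issues_per_year * ISSUE_FEE
            if not held_in_paper:
                cost += 0.10 * print_price
            return cost

        # Hypothetical demand of 300 article accesses in a year:
        print(per_article_cost(300))                    # 2100.0
        print(generalized_cost(300, bundles_bought=3))  # 1644.0 (360 tokens prepaid)
        # One traditionally subscribed title with 24 issues (covers that title only):
        print(traditional_cost(issues_per_year=24))     # 96.0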

    Elsevier participated in the pricing in the sense that we had discussions with our Michigan counterparts on pricing levels and in the end agreed to the final prices. There was some give and take on what the prices should be and how they should be measured (e.g., using issues as a measurement for the traditional subscriptions was a compromise introduced to permit some reflection of the varying sizes of the journals). We had hesitation about the low levels of the prices, feeling these to be unrealistic given real costs and the usage levels likely to develop. But in the end we were persuaded by the economic and experimental design arguments of Jeffrey MacKie-Mason.

    Once the prices were set, the Red group had all three pricing alternatives to choose from, Green had choices 1 and 2 and Blue had choices 1 and 3. In making choices, some decided to take all three, some to take only the per article transactions or only the generalized subscription. As the experiment ran more than one year, there was an opportunity to recalibrate at the end of 1998 based on what had been learned to date and to make new decisions for 1999.

    The process of agreeing to and setting up the experiment and then actually getting underway took much longer than any of the participants expected. We had all hoped for an experiment of at least two years (1997-1998). We started our discussions no later than late 1995 or early 1996. The experiment was actually live in 1998 and ended in August, 1999. It is hard now to reconstruct what happened to delay the experiment. Perhaps most of the initial long delay was a result of Elsevier's hesitation on pricing issues (more on this below), although the experimental design also took time at Michigan. The difficulties later were more in the implementation process. Signing up institutions was difficult. Many institutions that were approached were unsure about the price in general and wanted, for example, a lower participation fee, hence the ultimate range of fees negotiated. They were also concerned about participating in an experiment and felt there could be some confusion or difficulty in explaining this to their users. Once signed, start-up also took time at each location. In addition, there was a need, not always immediately recognized, for marketing and promotion of PEAK availability on campus.

    5.4 Elsevier's ScienceDirect activities during PEAK

    During the PEAK experiment, the management of the experiment and day-to-day contact with participants, including all billing, was handled by Michigan. There were times when we at Elsevier wanted to be more involved in the daily activities, including sending in our sales support staff to assist in training or promotion of the service. We were concerned about the slow start-up at many sites, fearing this would be interpreted as low demand for the journals instead of the effect of needing to promote and acquaint users with the service, something we had learned from TULIP and ScienceDirect. Michigan discouraged this for valid reasons: (1) it was not our system, so we were not familiar with its features and (2) this could interfere with the experimental design.

    Instead, we focused on production of the electronic files — and we had plenty to be concerned about. Elsevier was the supplier of the testbed journals and, in that context, was most active in trying to improve delivery performance. The product delivered to Michigan under the EES program at that time was the same as TULIP — images scanned from the paper copy. That meant they were, by definition, not timely. Also, there were issues missing and problems within the files. Additionally, not all format changes were handled with appropriate forewarning on our part. Occasional problems also occurred on the Michigan end, as when it was discovered that there were about 50 CDs that had inadvertently not been loaded onto the server by Michigan. This type of problem was not unique to Michigan but is symptomatic of the problems encountered with local hosting.

    Stepping back, it is important to understand Elsevier's product and pricing development during this same time period. The EES product line (the commercialization of TULIP) was available at the time TULIP was ending. It was being sold on a "percentage of print" pricing model, where subscribing institutions paid an additional percentage to receive the electronic files and supplied their own hardware and software. When first introduced, the electronic price had been announced at 35% of the underlying print subscription price (i.e., a total of 135% for paper and electronic). This percentage was chosen because it was the amount that would be required to compensate if all duplicate paper subscriptions were cancelled. It was quickly clear that this was too high, both in terms of what libraries were prepared to pay and what the product was worth. We lowered the price, in the case of Michigan for example, to less than a 5% charge in addition to the paper price. We set such a low price because we very much wanted Michigan to continue with the electronic package.

    While Elsevier actively sold the EES product, it had also started in 1995 with the design of what would become ScienceDirect, our web-based electronic journal system. This would be driven by the direct output of a single journal production system that would create electronic files from which both the paper and electronic products could be produced. The new system, which would feed ScienceDirect online, would offer journal articles both in HTML (from SGML files) and in PDF.[3] ScienceDirect would also incorporate some of the lessons learned from TULIP, including the integration of a broader abstracting and index layer with the full text.

    ScienceDirect was in beta testing in late 1997, just as the implementation of PEAK was underway. It was available for full commercial use (without charge) in 1998 and sold starting in 1999. That means that the pricing decisions for this product — as well as EES, later SDOS — were going on simultaneously with PEAK. That, in itself, was a source of frustration to the PEAK participants, as there appeared to be a hope that PEAK would lead to a new pricing scheme and that Elsevier would not make pricing decisions until PEAK was completed. That was an unrealistic hope from the beginning and one that Elsevier should have done more to temper.

    ScienceDirect pricing had as its fundamental initial objective a desire to smooth the transition from paper to electronic. That meant from our side there was a strong incentive to try to maintain the present library spending level with Elsevier Science. We were hoping to reduce the cancellation of subscriptions. Pricing also had to be explainable by a new team of sales managers and librarians still going through the teething stages of licensing. It could not be too difficult to explain or require too much of a change in thinking.

    We also felt that we could not depart too quickly from the present paper-based journal title, volume, year subscription model. We considered pricing models that were based on separating the content charge from a media charge or that were more reflective of use or the number of users. We also considered making electronic journals the principal charge, with paper as an add-on. In the end, we decided it was too early and too little was known to make substantive changes from the "print plus" model. No one at this stage was prepared to cancel print subscriptions and go electronic-only. The electronic services were too new, electronic archiving had not been addressed in any meaningful way, and the bottom line to be considered by a library was generally, "How much is this going to cost me in addition to what I am already paying, i.e., in addition to print?"

    This translated into the following pricing formula, which was offered for the 1999 subscription year and continued into 2000:

    1. The fees paid by a ScienceDirect customer had three components: platform fee, content fee and transactional fee.

      1. The platform fee was essentially a utility fee to help compensate for the basic costs of developing and maintaining the service. The platform fee reflected the type of institution (academic or corporate) and the number of people within the class of people authorized as users of the system. This was the most novel (and controversial) part of the pricing model from the library's point of view.

      2. If standard accounts wanted to either select only a subset of their paper journals to receive electronically or to cancel duplicates or unique titles from their paper collection, then the content fee was 15% of the paper list price for the titles involved. There was a 10% discount for electronic-only subscriptions.

      3. Finally, subscribers could purchase single copies of articles in journals outside of their subscriptions on a per-transaction basis; the standard price per transaction was $30.

    2. For customers prepared to make a "full commitment," which meant to continue spending at the level currently spent on paper, then the content fee was reduced to 7.5%. Also a significant transactional allowance permitted these customers to get articles outside of their subscribed titles at no cost, and then at $15 per copy if the allowance was used up. They could also substitute within the total spending commitment — that is, cancel a duplicate or unique title (particularly as they got more usage data) and substitute titles of equal value. This permitted an institution to recalibrate its collection as it gathered more usage data.

    3. For consortium customers, it was possible to construct situations where either all members of the consortium had access to all of the Elsevier titles or, for a "cross-access fee," each member could access anything subscribed to by another member of the consortium.

    4. Finally, there were sometimes areas for negotiation, such as the actual amount of the platform fee, the possibility of price caps on print increases from year to year (in multiple year contracts) and, in some cases, the content fee percentage as well. Negotiation was a reflection of the individual needs, goals and readiness levels of specific institutions or consortia, making the comparison of any two licenses difficult.
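    As a rough illustration of how these components combined, the sketch below computes a hypothetical 1999-style ScienceDirect invoice. The platform fee, print spending and transaction counts are invented inputs; the 15% and 7.5% content rates, the 10% electronic-only discount and the $30/$15 transaction prices follow the description above, and actual licenses were, as noted, often individually negotiated.

        # Hypothetical ScienceDirect (1999-era) invoice sketch; all input values
        # are illustrative, and real licenses were often individually negotiated.

        def sciencedirect_invoice(platform_fee, print_list_spend, full_commitment,
                                  electronic_only=False, transactions=0,
                                  transaction_allowance=0):
            # Content fee: 7.5% of print list spending for "full commitment"
            # customers, 15% for standard accounts; a 10% discount applied to
            # electronic-only subscriptions.
            content_rate = 0.075 if full_commitment else 0.15
            content_fee = content_rate * print_list_spend
            if electronic_only:
                content_fee *= 0.90

            # Per-transaction purchases outside subscribed titles: full-commitment
            # customers had a free allowance, then paid $15 per article; standard
            # accounts paid $30 per article.
            per_transaction = 15.0 if full_commitment else 30.0
            allowance = transaction_allowance if full_commitment else 0
            billable = max(0, transactions - allowance)
            return platform_fee + content_fee + billable * per_transaction

        # Example: a full-commitment customer spending $500,000 on print, with a
        # hypothetical $20,000 platform fee, 1,000 cross-title downloads and a
        # 600-article transactional allowance.
        print(sciencedirect_invoice(20000, 500000, full_commitment=True,
                                    transactions=1000, transaction_allowance=600))
        # 20000 + 0.075 * 500000 + 400 * 15 = 63500.0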

    5.5 ScienceDirect versus PEAK pricing

    It may be helpful to compare the ScienceDirect pricing as in effect during the transition from PEAK with PEAK pricing. From the following table one sees: (1) both had a charge to help defray host service costs (and in both cases, that charge did not provide full cost recovery); (2) the ScienceDirect and PEAK content fees were somewhat similar, although PEAK charges were on a flat fee basis and were lower (which was of concern to us from the beginning, as that seemed unrealistic); (3) the transactional fees were also lower for PEAK and permitted continued electronic re-use by the individual purchaser; and (4) nothing in ScienceDirect was comparable to PEAK generalized subscriptions.

    Pricing feature | ScienceDirect | PEAK | Comments
    Host charge | platform fee | participation fee | often subject to negotiation; SD fees generally higher than PEAK, but not always
    Content charge (SD) or traditional subscription access (PEAK) | % of print charge | flat $4 per issue | PEAK cheaper; e.g., for 2000, Physics A has 56 issues and costs $4,374, so $4/issue = $224 for PEAK vs. 7.5% = $328 for SD
    Transactions | free transactional allowance, then $15 or $30 | $7 per article | in SD, a window of 24 hours of access for each transaction; in PEAK, the individual has continuing online access to the article
    Generalized subscription | nothing comparable | bundles at $548 for 120 articles | continuing online community access to purchased articles

    The two parallel tracks came together during 1999 when it was necessary to plan for the end of PEAK and a transition to ScienceDirect for those libraries wishing to continue to have access to Elsevier journals. We were not willing to continue the experiment beyond August, 1999. As it developed, PEAK participants did not have the option to continue accessing the journals from Michigan on a commercial rather than experimental basis, as Michigan decided it was ready to stop serving as a host and, indeed, did not want to continue to receive and mount journals locally for their own use either. Michigan chose to become a ScienceDirect online subscriber. Michigan's decision to cease local hosting was a combination of two factors: the end of the research project and the relative cost-effectiveness for Michigan of ScienceDirect online versus a local implementation. (Elsevier does have arrangements with other SDOS sites — University of Toronto, Los Alamos, and OhioLINK among others — where one institution serves as a host for other libraries.) It was necessary to make a transition plan early in 1999, before PEAK had ended and before the data could be evaluated. What could we take from PEAK (and from non-PEAK ScienceDirect experience) to inform the transition process?

    5.6 Access to the whole database

    There were two messages we heard from institutions involved in PEAK: the desire for flexibility in pricing (an ability to make choices) and the value of providing access to the entire database. Perhaps not surprisingly, the second message was one we also heard from other customer environments, such as OhioLINK and Toronto, where users have access to the entire database: namely, that there is significant use of articles from non-subscribed titles. Therefore, anything we could do to increase access to the whole database would be a win-win solution for ScienceDirect subscribers.

    It follows, therefore, that the most satisfied PEAK participants were those using the generalized subscription model. They liked the notion of having access to the entire database and of not having to pre-select on a journal title basis. Even though almost everyone overbought bundles in 1998, that was generally a reflection of the slower start-up (i.e., if one annualized the monthly use near the end of 1998, the total purchase for the year would generally have been correct). The purchases for 1999 were much more accurate. This also reflects, in our judgment, the need for marketing and promotion (only recognized late in the process) and the need to build a knowledgeable user base.

    It is worth briefly considering the experiences of one PEAK customer, Vanderbilt University, in a bit more detail.[4] Going into PEAK, Vanderbilt subscribed to 403 of the 1,175 Elsevier journals in PEAK at a cost of approximately $700,000 per year. They chose to use only the generalized model, paying $24,600 for 5,400 tokens in 1998. For many reasons (including the Michigan requirement for a registration process, normal ramping up and the introduction, only part way through the year, of a critical link from their OPAC), tokens were not used at the rate anticipated, and 1998 ended with slightly more than 2,800 tokens used. More interesting, however, is what was purchased with these tokens. First, there was heavy use of the engineering titles, attributed to the generally poor quality of the engineering paper collection. Haar commented, "Thus information-starved engineering quickly recognized PEAK as a dream come true..." Second, looking more broadly, Vanderbilt users accessed articles from 637 journals. Of those, 45% (289) were also subscribed to in paper. The remaining 55% (348 titles) were not subscribed to in paper. And, of the 403 titles Vanderbilt subscribed to in paper, 114 (28%) were not used online.

    In his paper, John Haar of Vanderbilt ascribes some of this behavior to the engineering situation and some to the fact that there was little promotion of PEAK availability within the medical community. I agree with Haar that some of the lower online use of titles subscribed to in paper may be attributable to problems Elsevier had in providing current issues. If an important journal can be read more quickly in paper than online, then it may not be surprising that the online use is modest. That is, admittedly, a more optimistic spin on the data, but one that we believe has to be considered in evaluations.

    It is interesting to compare data during essentially the same time (April 1998 - March 1999) on the use of Elsevier and Academic Press journals by OhioLINK. At the annual American Library Association meeting in June, 1999, Tom Sanville, Executive Director of OhioLINK, presented these average use figures for the 13 universities within OhioLINK:

    • 1,345 Elsevier and Academic journals were available, of which on average (at the institutional level) 362 were owned in print

    • 1,035 journals had articles downloaded from them

    • of the 1,035 journals from which articles were downloaded, on average, 318 were held in print and 735 (about 70%) were not held in print

    • 19,284 articles were downloaded, of which 9,231 (48%) were from journals not held in print

    This reinforced for us an essential message: there is tremendous value in finding ways to give people access to the entire database. Although some collection development librarians have continued to argue strenuously on listservs that it is essential to select and acquire only on a title-by-title basis, the facts do not support that position. Clearly, in an era of limited funding and budgets inadequate to acquire everything needed by faculty and students, systems that make it easy to access a broad range of refereed information offer significant user advantages. Having said this, however, it is clear that there is still room for much more research in how users actually use services such as ScienceDirect, what value they place on which functionalities and content, and which enhancements they will appreciate most.

    5.7 Transition from PEAK to ScienceDirect

    Given this, there was a push by some PEAK participants for a continuation of the generalized model in the commercial ScienceDirect service. At Elsevier this was extensively discussed. While it would not be immediately possible to switch to a system where one "permanently" buys access at the article level, over the long term it was certainly possible to consider doing this. There was a sense that this "permanent" access gave the libraries a sense of ownership not otherwise present in a license arrangement. Yet, to date we have not adopted the generalized model, preferring other ways of giving full access to the database and providing long-term access rights.

    The obvious question is: why not adopt this model? The reason is that from the Elsevier perspective this model runs counter to what we would like to achieve. Our goal is to give people access to as much information as possible on a flat fee, unlimited use basis. Our experience has been that as soon as usage is metered on a per-article basis, there is an inhibition on use or a concern about exceeding some budget allocation. The generalized model, although offering access to the whole database, is in the end simply a transaction model where the cost of the transaction has been discounted in return for non-refundable prepayment. It is a hybrid — a subscription-based transactional system. It also carries with it increased costs of selling, education and general marketing to reach the level of use that a flat-rate, all-you-can-use system offers automatically.

    What, then, did we do in the transition from PEAK to ScienceDirect for those making the transition? We gave all PEAK customers, as thanks for participating in the experiment and as a way of continuing the transition, unlimited access through 2001 to all titles in the database, but with fees based solely on subscribed content in 1999. For some of the smaller institutions this was an incredibly generous offering. For all schools except Michigan (which would have "earned" a large free transaction allowance in any case) it was a significant improvement over either a normal ScienceDirect license or a PEAK generalized subscription model. What will happen after 2001? We plan to have new pricing plans in place that will make a continuation of access to the whole database possible for the former PEAK — and all other ScienceDirect — accounts.

    Also in the transition, in order to address some of the "ownership" concerns, we formalized an electronic archiving and long-term access policy. This policy guarantees that we will assume responsibility for the long-term availability of the archive of the Elsevier Science-owned electronic journals and provide access to that archive on an ongoing basis for all years for which an electronic subscription was purchased. Implementation of this policy continues to evolve, including exploration of deposit arrangements with various major libraries worldwide and special consideration of how to accommodate journals which are sold (i.e., for which the publishing contract is lost) or which are otherwise no longer published by Elsevier. We believe that in the long run these policies will result in better certainty of access and a more scalable system than tracking individual institutional purchases forever at the individual article level, as the generalized model requires.

    5.8 Longer term effect of PEAK on ScienceDirect

    What, then, has been the effect of PEAK on Elsevier Science thinking? As was noted above, there were two outstanding lessons we took from PEAK. One is the value of access to the whole database, which was core to our new product and pricing discussions in 2000. The second is the desire to have choices, to be able to tailor what is purchased to local needs.

    In response to this second point, Elsevier moved in January, 2000, to introduce a second product line called ScienceDirect Web editions, which provides free access to PDF files for all titles subscribed to in paper. Initially, Web editions did not have all the functionality of the full ScienceDirect and were limited to a nine-month rolling backfile. For many libraries, this is the "choice" they want to have, and they have decided to sign up for the Web editions rather than the full ScienceDirect. This was positive for Elsevier as well, as it met more libraries' needs.[5]

    There are other product and pricing changes in discussion at the Elsevier board level. The discussions leading up to these changes reflect what we have learned from ScienceDirect and PEAK to date. PEAK has provided significant input to the broad thinking process and we are grateful to the University of Michigan, and in particular to Wendy Pradt Lougee and Jeffrey MacKie-Mason, for their insight and persistence in making this happen. We hope that the discussion of the pricing and packaging of electronic products, particularly journals, will continue in a spirited way.

    Notes

    1. Full information on TULIP is available at http://www.elsevier.com/wps/find/librariansinfo.librarians/tulip .

    2. Full information on PEAK is available at http://web.archive.org/web/20011127031111/www.lib.umich.edu/libhome/peak/.

    3. Recall that EES (later renamed ScienceDirect On Site — SDOS) was at that time taking the paper product and scanning it, creating a time delay. SDOS currently delivers PDF files created directly in the production process, avoiding the time delays for most journals.

    4. (Haar, 1999).

    5. During 2000, Web editions were dramatically increased in functionality and coverage was increased to twelve months.

    6. User cost, usage and library purchasing of electronically-accessed journals

    6.1 Introduction

    Electronic access to scholarly journals has become an important and commonly accepted tool for researchers. Technological improvements in, and decreased costs for, communication networks and digital hardware are inducing innovation in digital content publishing, distribution, access and usage. Consequently, although publishers and libraries face a number of challenges, they also have promising new opportunities.[1] Publishers are creating many new electronic-only journals on the Internet, while also developing and deploying electronic access to literature traditionally distributed on paper. They are modifying traditional pricing schemes and content bundles, and creating new schemes to take advantage of the characteristics of digital duplication and distribution.

    The University of Michigan operated a field trial in electronic access pricing and bundling called "Pricing Electronic Access to Knowledge" (PEAK). We provided a host service offering access to roughly four and a half years of content (January 1995 - August 1999), including all of Elsevier Science's approximately 1200 scholarly journals. Participating institutions had access to this content for over 18 months.[2] Michigan provided Internet-based delivery to over 340,000 authorized users at twelve campuses and commercial research facilities across the U.S. The full content of the 1200 journals was received, catalogued and indexed, and delivered in real time. At the end of the project the database contained 849,371 articles, and of these 111,983 had been accessed at least once. Over $500,000 in electronic commerce was transacted during the experiment. For further details on this project, including the resources needed for implementation, see Bonn et al. (this volume).

    We elsewhere describe the design and goals of the PEAK research project (MacKie-Mason and Riveros (2000)). In MacKie-Mason et al. (2000) we detail the pricing schemes offered to institutions and individual users. We also report and analyze usage statistics, including some data on the economic response of institutions and individuals to the different price and access options.

    In this paper, we focus on an important behavioral question: how much does usage respond to differences in user cost? We pay careful attention to the effect of both pecuniary costs and non-pecuniary costs such as time and inconvenience.

    An interesting aspect of the PEAK project is the role of the library as economic intermediary and the effects of its decisions on the costs faced by end users.[3] In the first stage of the decision process, the library makes access product purchasing decisions. These decisions then have a potentially large effect on the costs that users face in accessing particular electronic journal articles, whether it be the requirement that users obtain and use a password or pay a monetary cost. The consumer then decides whether she will pay these costs to access a given article.

    The standard economic prediction is that a user will access an article if the marginal benefit she expects from the article (i.e. the incremental value) is greater than her marginal cost. Different users are going to have different valuations for electronic access to journal articles. Furthermore, even the same user will not place the same value on all requested articles. Information regarding users' sensitivity to user cost (known to economists as the elasticity of demand) for various articles is important to an institutional decision-maker who wants to maximize, or at least achieve a minimally acceptable level of, user welfare.[4] Demand elasticity information is also vital to firms designing access options and systems because design decisions will affect non-pecuniary costs faced by the users, and thus overall demand for access.

    It is well known that the usage of information resources responds to the monetary cost users bear. We find that even modest per article fees drastically suppressed usage. It is also true, but perhaps less appreciated, that non-pecuniary costs are important for the design of digital information access systems. We find that the number of screens users must navigate, and the amount of external information they must recall and provide (such as passwords), have a substantial impact on usage. We estimate the amount of demand that was choked off by successive increases in the user cost of access. Further, we find preliminary evidence that users were more likely to bear these costs when they are expected. Finally, given the access options and prices offered in the PEAK experiment, we calculate the least costly bundles of access options an institution could have purchased to meet the observed usage, and compare this to the actual bundles purchased in each year. From this comparison we learn about the nature of institutional forecasting errors, and the potential cost savings to them from the detailed usage information of the sort provided by PEAK.

    6.2 Access options offered

    To choose which access products (and their prices) to offer PEAK participants, we balanced a complex set of considerations. These included the desire to study innovative access options, the desire to create substantial experimental variation in the data, and the need to entice institutions to participate. Hunter (this volume) gives a fuller account of these deliberations. In the end, participating institutions in the PEAK experiment were offered packages containing two or more of the following three access products:

    1. Traditional Subscription: Unlimited access to the material available in the corresponding print journal.

    2. Generalized Subscription: Unlimited access (for the life of the project) to any 120 articles from the entire database of currently priced content. Articles are added to the generalized subscription package as users request articles that were not already otherwise paid for, until the subscription is exhausted.[5] Articles selected for generalized subscriptions may be accessed by all authorized users at that institution.

    3. Per Article: Unlimited access for a single individual to a specific article. If an article is not available in a subscribed journal, nor a generalized subscription, nor are there unused generalized subscription tokens, then an individual may purchase access to the article, but only for his or her use (for the life of the project).

    The per-article and generalized-subscription options allow users to capture value from the entire corpus of articles, without having to subscribe to all journal titles. Once the content is created and added to the server database, the incremental delivery cost (to the publisher and system host) is approximately zero. Therefore, to create maximal value from the content, it is important that as many users as possible have access. The design of the pricing and bundling schemes affect both how much value is delivered from the content (the number of readers) and how that value is shared between the users and the publisher.

    Generalized subscriptions may be thought of as a way to pre-pay (at a discount) for interlibrary loan requests. One advantage of generalized subscription purchases is that the "tokens" cost substantially less per article than the per-article license price. Institutions did, however, need to purchase tokens at the beginning of a year and thus bore some risk. There is an additional benefit: unlike an interlibrary loan, all users in the community have ongoing unlimited access to the articles obtained via generalized subscription token. To the publisher, generalized subscriptions represent a committed flow of revenue at the beginning of each year, and thus shift some of the risk to the users. Another benefit to the publisher, as noted by Hunter (this volume), is that they open up access to the entire body of content to all users. Generalized subscriptions thus offer one method for the publisher to increase user value from already produced content, creating an opportunity to obtain greater returns from the publication of that content.

    Table 6.1: Access models
    Institution ID      Group   Traditional   Generalized   Per Article
    5, 6, 7, 8          Green        -              X             X
    3, 9, 10, 11, 12    Red          X              X             X
    13, 14, 15          Blue         X              -             X
    NOTE: An "X" indicates that the option was available to the institutions listed in that row of the table; a "-" indicates that it was not.

    Participating institutions were assigned randomly to one of three different experimental treatments, which we labeled as the Red, Green and Blue groups. Institutions in every group could purchase articles on a per-article basis. Those in the Green group could purchase generalized subscriptions, while those in the Blue group could purchase traditional subscriptions. Institutions in the Red group could purchase all types of access. Twelve institutions participated in PEAK: large research universities, medium and small colleges and professional schools, and corporate libraries. Table 6.1 shows the distribution of access models and products offered to the participating institutions.

    6.3 Summary of user costs

    The PEAK experiment was designed to assess user response to various pricing and access schemes for digital collections. Since the content was traditional refereed scholarly literature, we implemented access through the traditional intermediary: the research library. The reliance on research libraries affected the design of the experiment and thus the research questions we could investigate. As we noted above, the intermediary, by choosing the combination of access products available to users, determines the costs faced by its users. The individual users then make article-level access decisions.[6] Thus, there are two different decision makers playing a role in access decisions. We must take both into account when analyzing the usage data.

    When confronted with the PEAK access options and prices, nearly all of the participating libraries purchased substantial prepaid (traditional or generalized subscription) access on behalf of their users. As a consequence, relatively few users were faced with the decision of whether or not to pay a pecuniary charge for article access. Although we measured over 200,000 unique individual uses of the system, we estimate that a user was asked to pay a pecuniary cost in only about 1200 instances. Therefore we focus as much on user response to non-pecuniary costs as to pecuniary costs.

    Access at zero user cost. Substantial amounts of PEAK content were available at zero user cost. This content included:

    • all "unmetered" content, which included articles published at least two calendar years prior as well as all non-full-length articles;

    • articles in journals to which the institution purchased an electronic traditional subscription; and

    • articles which had previously been purchased by a user at the institution with a generalized subscription token.

    All such access required authentication, but this was most often accomplished automatically by system verification that the user's workstation IP address was associated with the participating institution. Thus, most such authentications required no user time, effort or payment, and the overall marginal user cost per access was zero.[7]
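    To illustrate the kind of automatic check involved, a minimal sketch of IP-range authentication follows; the address ranges and names are hypothetical placeholders, not PEAK's actual configuration.

        # Minimal sketch of automatic authentication by institutional IP range.
        # The networks below are documentation-only placeholder ranges.
        import ipaddress

        INSTITUTION_NETWORKS = {
            "institution_A": [ipaddress.ip_network("192.0.2.0/24")],
            "institution_B": [ipaddress.ip_network("198.51.100.0/24")],
        }

        def authenticate_by_ip(client_ip):
            """Return the institution whose registered network contains client_ip, if any."""
            addr = ipaddress.ip_address(client_ip)
            for institution, networks in INSTITUTION_NETWORKS.items():
                if any(addr in net for net in networks):
                    return institution
            return None  # caller falls back to password authentication

        print(authenticate_by_ip("192.0.2.45"))   # institution_A
        print(authenticate_by_ip("203.0.113.9"))  # None -> prompt for password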

    Access at medium user cost. For some access, users incurred a higher cost because they were required to enter a password. The transactions cost of password entry ranged from small to substantial. In the worst case, the user needed to navigate elsewhere in the system to fill out a form requesting a password, and then wait to receive it via e-mail. Once received, the user had to enter the password. If the user previously obtained a password, then the only cost to her was to find or recall the password and enter it. Content accessible via password entry included:

    • articles in journals to which the institution did not have a traditional subscription, assuming that the institution had generalized tokens available;

    • subsequent access to an article which an individual previously purchased on a per-article basis.

    Access at high user cost. If the institution did not have any unused generalized subscription tokens, then content not available at zero cost could be accessed by payment of a $7 per-article fee. The user who wished to pay the per-article fee would also bear two non-pecuniary costs: (1) password recall and entry, as above for the use of a generalized subscription token, and (2) credit card recall and entry.[8] In many cases, institutions subsidized, either directly or indirectly, the per-article fee. Although subsidized, access of this type still resulted in higher transactions costs. In the indirect subsidy case, a user needed to submit for reimbursement. In the direct case, except at institution 15, users needed to arrange for the request to be handled by the institution's interlibrary loan department.

    Exceptions. Several of the access procedures — and thus users' costs — were different at institutions 13 and 14. At both, per-article access for all requests was paid (invisibly to the user) by the institution, so users never faced a pecuniary cost.[9] At institution 14, a user still faced the non-pecuniary cost of finding her password and entering it to access "paid" content.[10] However, all users at institution 13 accessing from associated IP addresses were automatically authenticated for all types of access. Thus users at institution 13 could access all PEAK content at zero total (pecuniary and non-pecuniary) cost. These differences in access procedures were negotiated by the production and service delivery team during the participant acquisition phase, with the approval of the research team. In our analyses below we use the differences in user cost between these two institutions and the others as a source of additional experimental variation.

    Complexity. From the description above, it might appear that the PEAK access program was much more complicated than one would expect to find in production services. If so, then our results might not generalize readily to these simpler production alternatives.

    In fact, most of the complexity is at the level of the experiment, and as such creates a burden on us (the data analysts), and on readers, but not on users of the PEAK system. Because this was an experiment, we designed the program to have different treatments for different institutions. We had to keep track of these differences, but users at a single institution did not need to understand the full project (indeed, they were not explicitly informed that different variations of PEAK were available elsewhere). In most cases they did not even need to understand all three access options, because most institutions had only two options available to them.

    Among our three access options, the traditional subscription and per-article fee options were designed to closely mimic familiar access schemes for printed journals, and as such they did not cause much confusion. The generalized subscription was novel, but the details largely were transparent to end users: they clicked on an article link, and either it was immediately available, or they were required to enter a password, or they were required to pay a per-article fee. Whether the article was available through a traditional or generalized subscription was not relevant to individual users. Thus, to the user the access system had almost identical complexity to existing systems: either an article is available in the library or not, and if not the user can request it via interlibrary loan (and/or with a per-article fee from a document delivery service).

    The librarians making the annual PEAK purchasing decisions needed to understand the differences between traditional and generalized subscriptions of course. We prepared written explanatory materials for them, and provided pre-purchase and ongoing customer support to answer any questions. In section 6.6 below we discuss some evidence on how learning about the system changed behavior between the first and second year, but we did not observe any significant effects we could attribute to program complexity.

    6.4 Effects of user cost on access

    In this section, we measure the extent to which user costs to access PEAK content affected the quantity and composition of articles actually accessed. Clearly the costs and benefits of accessing the same information via other means, particularly via an institution's print journal holdings, will have an enormous impact on a user's willingness to bear costs associated with PEAK access. We do not explicitly model these costs, although we do control for them at an institutional level. Kingma (this volume) provides estimates of some costs associated with information access via several non-electronic media.

    As noted above, user costs for accessing PEAK content depended on a variety of factors. One factor is the type of content requested ("metered" versus "unmetered"). Looking only at metered content, the pecuniary and non-pecuniary costs associated with access depended in large part on the access products purchased by a user's institution. Further, the access costs faced by users within a given institution depended on the specific products selected by an institution (i.e. the specific journals to which an institution holds a traditional subscription, and the number of generalized subscription tokens purchased), individual actions (whether a password had already been obtained) and also on the actions of other users at the institution (whether a token had already been used to purchase a requested article, and how many tokens remain). In the following sections, we estimate the effects of these incremental costs on the quantity and composition of metered access.

    Non-pecuniary costs

    To gauge the impact of user cost on aggregate institutional access, we compared the access patterns of institutions in the Red group with those in the Blue group. Red institutions had both generalized and traditional subscriptions available; Blue had only traditional. Users in both groups could obtain additional articles at the per-article price. We constructed a variable we call "Normalized Paid Accesses" to measure the number of "paid" accesses to individual articles (paid by generalized tokens or by per-article fee) per 100 unmetered accesses, normalized to account for the number of traditional subscriptions. Adjusting for traditional subscriptions accounts for the amount of prepaid content provided by the user's institution; adjusting for unmetered accesses adjusts for the size of the user community and the underlying intensity of usage in that community.[11]
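    Purely for illustration, the sketch below computes paid accesses per 100 unmetered accesses and applies a simple subscription-size adjustment; the adjustment shown is our own simplified stand-in, not necessarily the normalization used in the study (see the note cited above).

        # Illustrative "paid accesses per 100 unmetered accesses" statistic.
        # The subscription-size adjustment is a simplified stand-in for the
        # normalization actually used in the study.

        def normalized_paid_accesses(paid_accesses, unmetered_accesses,
                                     traditional_subscriptions, total_titles):
            # Raw rate: paid accesses per 100 unmetered accesses.
            rate = 100.0 * paid_accesses / unmetered_accesses
            # Simplified adjustment: scale by the share of titles NOT covered by
            # traditional subscriptions, since only those titles can generate
            # paid (token or per-article) demand.
            uncovered_share = 1.0 - traditional_subscriptions / total_titles
            return rate / uncovered_share if uncovered_share > 0 else float("nan")

        # Hypothetical institution: 500 paid and 4,000 unmetered accesses, with
        # traditional subscriptions to 300 of 1,200 titles.
        print(normalized_paid_accesses(500, 4000, 300, 1200))  # 12.5 / 0.75 = 16.7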

    Table 6.2: Normalized paid access per 100 unmetered accesses, by institution.
    Institution Access group Normalized paid accesses per 100 unmetered accesses
    3 Red 13.5
    9 Red 20.4
    10 Red 31.7
    11 Red 7.59
    12 Red 26.4
    Average Red 15.1
    13 Blue 51.0
    14 Blue 15.1
    15 Blue 4.72
    NOTE: Average not reported for Blue institutions because of variations in experimental conditions; see text for details.

    We use our statistic, Normalized Paid Accesses, as a measure of relative (cross-institution) demand for paid access. We present the statistic in Table 6.2. Even after controlling for the size of an institution's subscription base and the magnitude of demand for unmetered content, paid demand differed among institutions with the same access products. This suggests that there are institution-specific attributes affecting demand for paid access. It is also possible that we incompletely control for subscription size. One possibility is that the number of traditional subscriptions affects the cost a user expects to have to pay for an article before the actual cost is realized. Users at an institution with a large traditional subscription base, such as institution 3, would have had a lower expected marginal cost for access as a large percentage of the articles are accessible at zero cost. Some users at these institutions might attempt to access articles via PEAK, expecting them to be free, while not willing to pay the password cost when the need arises. This difference between expected and actual marginal cost may be important; we return to this point later.

    We can make some interesting comparisons between institutions in the Red group and those in the Blue group. While institution number 13, as a member of the Blue group, only had traditional subscriptions and per-article access available, users at this institution did not need to authenticate for any content, and thus faced no marginal cost in accessing any paid content. Most users at Red institutions faced the cost of authenticating to spend a token.[12] We would therefore expect a higher rate of paid access at institution 13, and this is in fact the case.

    Paid access at institution 14 was similarly subsidized by the institution. However, in contrast to institution 13, authentication was required. Thus the marginal user cost of paid access at institution 14 was exactly the same as at the Red institutions. We therefore expected that demand for paid access would be similar. This is in fact the case: Normalized Paid Access is 15.1 at both. Finally, per-article access for users at institution 15 was not automatically subsidized. Thus, users faced very high marginal costs for paid content. In addition to the need to authenticate with a password, users at this institution needed either to: a) pay the $7.00 per-article fee and enter their credit card information; or b) arrange for the request to be handled via the institution's interlibrary loan department. In either case, the user cost of access was higher than password only, and, as we expected, the rate of paid access was much lower than in the Red group.

    Table 6.3: Estimated effects of user cost on access.
    No month dummies Month dummies
    Constant 87.535* 108.615*
    (10.394) (14.643)
    Blue: Credit Card (Inst. 15) -280.490* -270.879*
    (37.627) (35.508)
    Red + Institution 14 -58.999* -57.764*
    (7.900) (7.186)
    Out of Tokens -25.070* -25.665*
    (1.635) (2.533)
    Graduate Students/Faculty Ratio 43.821* 41.748*
    (7.301) (6.912)
    Percentage Engineering, Science and Medicine -225.913* -215.767*
    (7.535) (36.553)
    Sample Size 530 530
    R2 0.171 0.229
    NOTE: Standard errors are shown in parentheses.
    Dependent variable is weekly normalized paid access per 100 free accesses.
    * Significant at the 99% level.

    Table 6.3 summarizes the results from a multiple regression estimate of the effects of user cost on access. We controlled for differences in the graduate student / faculty ratio and the percentage of users in Engineering, Science and Medicine.[13] The dependent variable, Paid accesses per 100 unmetered accesses, controls for learning and seasonality effects. We thus see the extent to which paid access, starting from a baseline of access to paid content at zero marginal user cost, falls as we increase marginal costs. Imposition of a password requirement reduces paid accesses by almost 60 accesses per 100 unmetered accesses (Red and institution 14), while the depletion of (institution-purchased) tokens results in a further reduction of approximately 25 accesses (per 100 unmetered).
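    For readers who want to run a comparable estimate on their own usage logs, a sketch of a Table 6.3-style regression follows; the data file and column names are hypothetical placeholders, while the specification (weekly normalized paid access regressed on user-cost indicators and the two institutional controls) follows the description above.

        # Sketch of a Table 6.3-style regression; "peak_weekly_usage.csv" and its
        # column names are hypothetical placeholders for institution-week records.
        import pandas as pd
        import statsmodels.formula.api as smf

        weekly = pd.read_csv("peak_weekly_usage.csv")

        # Column 1 of Table 6.3: no month dummies.
        base = smf.ols(
            "normalized_paid_access ~ credit_card_required + password_required"
            " + out_of_tokens + grad_faculty_ratio + pct_stm",
            data=weekly,
        ).fit()
        print(base.summary())

        # Column 2: add month dummies to absorb learning and seasonality effects.
        with_months = smf.ols(
            "normalized_paid_access ~ credit_card_required + password_required"
            " + out_of_tokens + grad_faculty_ratio + pct_stm + C(month)",
            data=weekly,
        ).fit()
        print(with_months.summary())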

    We use the distinction between metered and unmetered access to further test the extent to which increased user costs throttle demand. As a reminder, full-length articles from the current year are metered: either the institution or the individual must pay a license fee to gain access. Other materials (notes, letters to the editor, tables of contents, and older full-length articles) are not metered: anyone with institutional access to the system can access this content after the institution pays the institutional participation license fee. Some of the unmetered content comes from journals that are covered by traditional subscriptions, some from journals not in subscriptions. We calculate the ratio of this free content accessed from the two subsets of content. If we make the reasonable assumption that, absent differential user costs, the ratio of metered content from the two subsets would be the same as the ratio of unmetered content, then we can estimate what the demand would be for metered content outside of paid subscriptions if that content were available at zero user cost (e.g., if the institution added the corresponding journals to its traditional subscription base). Our estimate is calculated as:

    predicted paid accesses outside subscriptions (at zero user cost) = metered accesses within traditional subscriptions × (unmetered accesses outside traditional subscriptions / unmetered accesses within traditional subscriptions)
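    A minimal sketch of this calculation, with hypothetical counts for one institution-year, follows.

        # Predicted paid (metered) accesses outside traditional subscriptions,
        # assuming the metered/unmetered ratio is the same inside and outside the
        # subscription base; all counts below are hypothetical.

        def predicted_paid_accesses(metered_in_subs, unmetered_in_subs,
                                    unmetered_outside_subs):
            return metered_in_subs * (unmetered_outside_subs / unmetered_in_subs)

        metered_in_subs = 2000         # metered accesses to subscribed titles (zero user cost)
        unmetered_in_subs = 5000       # unmetered accesses to subscribed titles
        unmetered_outside_subs = 3000  # unmetered accesses to non-subscribed titles
        actual_paid = 500              # token or per-article accesses actually observed

        predicted = predicted_paid_accesses(metered_in_subs, unmetered_in_subs,
                                            unmetered_outside_subs)
        print(predicted)                        # 1200.0
        print(100.0 * actual_paid / predicted)  # 41.7 -> "actual as percent of predicted"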

    Table 6.4: Paid access as percentage of average predicted for zero user cost.
    Institution | Year | Actual as % of predicted | % free access password authenticated | Credit card required | % passwords entered when prompted
    3 | 1998 | 21.1% | 11.1% | 0 | 6.69%
    10 | 1998 | 146.2% | 45.4% | 0 | 13.5%
    11 | 1998 | 16.4% | 8.81% | 0 | 2.6%
    12 | 1998 | 83.3% | 51.7% | 0 | 7.14%
    13 | 1998 | 125.9% | 98.8% | 0 | 100.0%
    14 | 1998 | 79.3% | 54.5% | 0 | 44.4%
    15 | 1998 | 0.00% | 22.2% | 1 | 8.06%
    3 | 1999 | 31.4% | 19.1% | 0 | 10.4%
    10 | 1999 | 123.4% | 43.9% | 0 | 13.4%
    11 | 1999 | 20.8% | 18.5% | 0 | 14.1%
    13 | 1999 | 77.7% | 100.0% | 0 | 100.0%
    14 | 1999 | 56.7% | 63.2% | 0 | 17.8%
    15 | 1999 | 19.5% | 12.2% | 1 | 2.39%
    "% free access password authenticated" indicates the percentage of times that users accessing free material were already password authenticated (which is not in fact necessary for free accesses).
    "Credit card required" means the user was required to pay a per-article fee.

    In Table 6.4 we present actual paid access (when customers face the actual user cost) as a percentage of predicted access (at zero user cost) for all institutions that had traditional subscriptions in a given year. All observations except three (institutions 10 and 13 in 1998, and institution 10 in 1999) show actual access substantially below predicted when users bear the actual user cost. We conjecture that the surprising result for institution 10 might be partially due to the fact that they had the fewest traditional subscriptions. Because relatively little was available at zero user cost, users at this institution might have expected to bear the user cost (password recollection and entry in this case) for every access. If this were the case, then our method of predicting access at zero user cost is biased and the results for institution 10 are not meaningful. As for institution 13, recall that its users in fact faced no incremental user cost to access paid materials. We thus expect its paid accesses to be closer to that predicted for zero user cost, and are not surprised by this result.

    Though not related to our focus on user cost, there are two other statistical results reported in Table 6.3 that bear mention. First, usage is substantially, and statistically significantly, higher when the graduate student / faculty ratio is higher. It is not implausible that graduate students make more frequent use of the research literature, reading more articles while taking classes and working on their dissertations, than more established scholars. This may also reflect life cycle differences in effort and productivity. However, it is also possible that a higher graduate student ratio is proxying for the intensity of research (by both graduate students and faculty) at the institution, which would be correlated with higher access.

    The other, more surprising, result is that the higher the percentage of engineering, science and medicine (STM) users, the lower the usage, by a large and statistically significant amount. We cannot be sure about the interpretation of this result, either. We were surprised because the Elsevier catalogue is especially strong in STM, reflected in breadth, depth and quality of content. Perhaps the nature of study and research in STM calls for less reading of journal articles, but this conjecture cannot be tested without further data.

    For all other institutions we generally see that the user costs associated with paid access caused an appreciable reduction in the number of paid articles demanded. We also present in Table 6.4 factors which we believe help explain this shortfall, namely the percentage of free access that is password authenticated, whether or not a credit card is required for all paid access, and the rate at which passwords were entered for paid access when prompted.

    Table 6.5: Estimation results of effects of user cost on actual paid accesses as percent of predicted accesses
    Independent variable Coefficient (standard error)
    Percent Free Psswd. Auth. 2.12*
    (.45)
    Prompted Login Percent -1.05**
    (.54)
    Credit Card Required -.213
    (.25)
    Sample Size 13
    R2 0.85
    NOTE: Standard errors shown in parentheses. Dependent variable is actual paid access as a percentage of predicted.
    *Significant at the 99% level; **Significant at the 95% level.

    In Table 6.5 we summarize the results from the estimation of the effects of user cost on actual paid access as a percentage of predicted accesses. Despite the small sample size, the results clearly demonstrate that, as we increase the number of individuals who can access paid content without additional marginal costs (proxied by the percent of free access that is password authenticated, which indicates that the password user cost has already been incurred), more paid access is demanded. The dummy variable for credit card required (for per-article payment) is not significant, but there was almost no variation in the sample from which to measure this effect.[14] The coefficient for the percent of prompted users who log in is of the wrong sign to support our hypothesis: we expected that the higher the number of users who are willing to bear the non-pecuniary costs of login, the higher would be the access to paid material.

    Pecuniary costs

    If an institution did not purchase any tokens, or depleted its supply, a user wanting to view a paid article not previously accessed had three choices.[15] She could pay $7.00 to view the article, and also incur the non-pecuniary cost of entering credit card information and waiting for verification. If the institution subscribed to the print journal, she could use the print journal article rather than the electronic product. She could also request the article through a traditional interlibrary loan, which also involves higher non-price costs (effort to fill out the request form, and waiting time for the article to be delivered) than spending a token.[16]

    Due to details of the system design, we are unable to determine the exact number of times that users were faced with the decision of whether or not to enter credit card information in order to access a requested article. We were able to identify in the transaction logs events consistent with the credit card decision (hereafter we call these "consistent events"). These consistent events are, however, a noisy signal for the actual number of times users faced this decision.

    We used evidence from the experimental variation to estimate the actual rate of requests for credit card payment. In some months some institutions had unused tokens and thus there were no credit card (per-article) purchases, since unused tokens are always employed first. For these months we divided the number of consistent events by the number of access requests handled by the system for that institution, to obtain a measure of the baseline rate of consistent events that are not actual credit card requests. For each institution that did deplete its supply of tokens, we then subtracted this estimated baseline rate from the total number of consistent events to measure requests for credit card payment. For institutions that never had tokens, we use the weighted average of the estimated baseline rates for institutions with tokens.
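
    A sketch of this correction, under our reading of the procedure (monthly counts and variable names are hypothetical): in months with unused tokens every "consistent event" is noise, which pins down a baseline rate; in months after depletion the expected noise is subtracted from the observed events.

        def estimate_cc_requests(months):
            """months: list of dicts with keys 'tokens_available' (bool),
            'consistent_events' (int) and 'access_requests' (int)."""
            # Months with unused tokens: no per-article purchases were possible,
            # so all consistent events in these months are noise.
            noise = [m for m in months if m['tokens_available']]
            baseline_rate = (sum(m['consistent_events'] for m in noise)
                             / sum(m['access_requests'] for m in noise))
            # Months after depletion: subtract the expected noise from observed events.
            depleted = [m for m in months if not m['tokens_available']]
            estimate = sum(m['consistent_events'] - baseline_rate * m['access_requests']
                           for m in depleted)
            return max(estimate, 0.0)

        # Hypothetical institution: tokens lasted three months, then ran out.
        months = [
            {'tokens_available': True,  'consistent_events': 4,  'access_requests': 900},
            {'tokens_available': True,  'consistent_events': 6,  'access_requests': 1100},
            {'tokens_available': True,  'consistent_events': 5,  'access_requests': 1000},
            {'tokens_available': False, 'consistent_events': 80, 'access_requests': 1200},
            {'tokens_available': False, 'consistent_events': 95, 'access_requests': 1300},
        ]
        print(estimate_cc_requests(months))   # estimated requests for credit card payment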

    Table 6.6: Credit card payments as a percent of requests, estimated from transaction log evidence
    Institution Estimated Credit Card Requests Credit Card Payments Percent
    3 53 13 25.5%
    6 260 194 74.6%
    9 190 1 0.5%
    11 562 61 10.9%
    15 137 73 53.3%

    In Table 6.6 we present the number of actual payments as a percent of estimated requests for credit card payments. The relative percentages are consistent with our intuition. Institutions 6 and 15 never had any tokens. We thus expect that users at these institutions expected a relatively high cost of article access, and would not bother accessing the system or searching for articles if they were not prepared to pay fairly often.[17] Among the institutions at which tokens were depleted, the payment rate is appreciably higher at institutions 3 and 11, which is consistent with the fact that at these institutions the user could make an interlibrary loan request for articles through PEAK, and the institution would pay the per article charge on behalf of the user.

    We gain further understanding of the degree to which differences in user cost affects the demand for paid article access by looking at only those institutions that depleted their supply of tokens at various points throughout the project. There were three institutions in this category: institution 3 ran out of tokens in November 1998 and again in July 1999; institution 11 in May 1999; and institution 9 in June 1999.

    For institutions that had tokens available at certain times, we can estimate the number of credit card requests (by PEAK, to the user) based on the number of tokens spent per free access. If we make the assumption that this rate of token expenditure would have remained constant had tokens still been available, we can estimate the number of credit card requests to be equal to the estimated number of tokens that would have been spent had tokens been available.
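
    In code, this second estimate reduces to a single rate calculation; the figures below are hypothetical and the function is ours, not part of the PEAK system.

        def requests_from_token_rate(tokens_spent, unmetered_while_tokens_lasted,
                                     unmetered_after_depletion):
            # Tokens spent per unmetered ("free") access while tokens were available...
            rate = tokens_spent / unmetered_while_tokens_lasted
            # ...applied to unmetered traffic after depletion gives the implied
            # number of requests that would have drawn a credit card prompt.
            return rate * unmetered_after_depletion

        print(requests_from_token_rate(1200, 30000, 9000))   # -> 360.0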

    Table 6.7: Credit card payments as a percent of requests, estimated from token expenditure rate
    Institution Credit Card Requests Credit Card Payments Percent
    3 128 13 10.2%
    9 366 1 0.3%
    11 1128 61 5.4%

    In Table 6.7 we present the number of actual payments as a percent of estimated requests for credit card payments. The relative percentages are consistent with our previous estimates for these institutions. The estimated number of requests for credit card payment is about twice as high as the estimates in Table 6.6. One possible explanation is that when users know they are going to face a credit card payment request (tokens have run out, which they learn on their first request for an article that is not prepaid), they may make fewer attempts to access material; this would be another measure of the effect of transaction payments on service usage.

    Table 6.8: Effect of token depletion on demand for paid content
    Institution 3 Institution 3 Institution 9 Institution 11
    1998 1999 1999 1999
    30 days prior 13.6 18.4 20.2 16.0
    30 days after 0.25 0.29 0.00 0.35
    Percentage Decrease -98.2% -98.4% -100.0% -97.8%
    NOTE: Units: Normalized paid access per 100 unmetered accesses.

    To further quantify the decrease in demand for paid access resulting from a depletion of tokens, in Table 6.8 we present the normalized accesses of metered content per hundred accesses of free content at these institutions for the 30 days prior and subsequent to running out of tokens. Usage plummeted after tokens ran out and users were required to pay per article for access to metered content.

    Summary: Effects of user costs

    The results we presented in this section demonstrate that increases in user costs substantially diminish demand for paid content. In particular, the decisions made by thousands of users demonstrate that non-pecuniary costs, such as password use, have an impact on demand that is of the same order of magnitude as direct monetary costs.

    6.5 Effects of Expected User Cost on Access

    As we showed in Table 6.4, at most institutions actual paid usage when users directly paid the user cost was substantially below predicted usage with zero user costs. Users at institution 10 were a notable exception. We hypothesized that users at this institution might have expected to bear more cost, and were therefore willing to pay more often when confronted with costs. We explore this hypothesis in this section.

    According to our hypothesis, the frequency with which users are asked to pay for content will affect a user's ex ante estimation of how much she will need to pay. This effect on her estimate can stem from either her previous direct experience, or through "word of mouth" learning. It is our hypothesis that the expected access cost affected the probability that a user paid for access when requested.

    We have two conjectures about user behavior that would cause willingness to pay to depend on prior expectations about cost. The first concerns an induced selection bias. The higher the expected cost to access an article, the fewer the users who will even attempt to access the information via PEAK. In particular, users with a low expected benefit for an article will generally be less likely to use PEAK at all. The result would be that those who do use PEAK are more likely to pay necessary article access fees. Our second conjecture is that the context of the request for payment matters, i.e., there is a "framing" effect. It is possible that if a user is habituated to receiving something for free, she will be resistant to paying for that object, even if her expected benefit is greater than the actual cost.[18] Unfortunately, the data that we have do not permit us to distinguish between these two scenarios.

    Table 6.9: Effect of subscription coverage on paid access
    Institution Normalized paid accesses per 100 unmetered Estimated expected rate of zero cost access Percent who log in when requested
    3 13.5 83.6% 8.48%
    10 31.7 6.9% 13.5%
    11 7.6 74.2% 2.6%
    12 26.4 11.1% 7.1%
    14 15.1 31.4% 29.6%
    NOTE: See text for more complete definitions of the variables.
    Correlation coefficients: Paid access and % of unmetered in subscription base: -0.87
    Prompted login and % of unmetered in subscription base: -0.36

    In Table 6.9 we present some evidence that users' expectations do matter. To explore this hypothesis, we rely on the difference in user cost between accesses to traditional subscription material (no password required) and generalized subscription material (password required). Therefore, we report all institutions at which password entry was required in order to spend a generalized subscription token, plus institution 14, at which users faced similar costs. We use accesses of unmetered content—which has zero incremental user cost for all material, whether in traditional subscriptions or not—as our comparison benchmark. In the second column we report the fraction of unmetered content accesses that fell within the institution's traditional subscription base. We use this as an estimate of the user's expected user cost of access. For example, if 75% of unmetered access came from traditional subscription material, then we estimate that the user also expects 75% of her demand for metered material to be from traditional subscriptions (with zero incremental user cost), and only 25% of requests for metered material to involve the password user cost (for generalized subscription content).

    In the last two columns we present measures of user willingness to bear user cost. The institution's normalized paid access is a scaled measure of the rate at which (metered) generalized subscription material was accrued (and thus how soon the password cost was incurred). The percent who log in when requested is another measure of user willingness to bear the password user cost.

    The data are consistent with our hypothesis that users with lower expected access costs (see column 2) will be less likely to bear the user cost of password retrieval and entry. The correlation between the expected rate of zero-cost access and normalized paid access is -0.87. We also see a negative correlation of -0.36 between the expected rate of zero cost access and willingness to enter a password when requested.
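
    The two correlations can be checked directly from the figures in Table 6.9; the short computation below (a numpy sketch, not the authors' code) reproduces them.

        import numpy as np

        # Columns of Table 6.9, in institution order 3, 10, 11, 12, 14.
        expected_zero_cost = np.array([83.6, 6.9, 74.2, 11.1, 31.4])   # percent
        paid_access        = np.array([13.5, 31.7,  7.6, 26.4, 15.1])  # per 100 unmetered
        login_when_asked   = np.array([8.48, 13.5,  2.6,  7.1, 29.6])  # percent

        print(np.corrcoef(expected_zero_cost, paid_access)[0, 1])       # approx. -0.87
        print(np.corrcoef(expected_zero_cost, login_when_asked)[0, 1])  # approx. -0.36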

    6.6 Improving library budgeting with usage information

    Librarians are in an unenviable position when they select subscriptions to scholarly journals.[19] They must determine which journals best match the needs and interests of their community subject to two important constraints. The budgetary constraint has become increasingly binding because renewal costs have risen faster than serials budgets (Haar, 1999). The second constraint is that libraries have incomplete information about community needs. A traditional print subscription forces libraries to purchase publisher-selected bundles of information (the journal), while users are interested primarily in the articles therein. Users only read a small fraction of articles,[20] and the library generally lacks information about which articles the community values. Further compounding the problem, a library makes an ex ante (before publication) decision about the value of a bundle, while the actual value is realized ex post.

    The PEAK electronic access products relaxed these constraints. First, users had low-cost access to articles in journals to which the institution did not subscribe. This appeared to be important: at institutions that purchased traditional subscriptions, 37% of the most accessed articles in 1998 were outside the institution's traditional subscription base. This figure was 50% in 1999. Second, the transaction logs that are feasible for electronic access allowed us to provide libraries with monthly reports not only on which journals their community valued, but also which articles. Detailed usage reporting should enable libraries to provide additional value to their communities. They can better allocate their serials budgets to the most valued journal titles or to other access products.

    In this section we present analyses of the extent to which improved information available from an electronic usage system could lead to reduced expenditures and better service.

    Improved budgeting with improved usage forecasts

    We first estimate an upper bound on how much the libraries could benefit from better usage data. We analyze each institution's accesses to determine what would have been its optimal bundle if it had been able to perfectly forecast which material would be accessed. We then calculate how much this bundle would have cost the institution, and compare this perfect foresight cost with the institution's actual expenditures. Obviously even with extensive historical data, libraries would not be able to perfectly forecast future usage, so the realized efficiencies from better usage data would be less. Below we analyze how the libraries used the information from 1998 to change their purchasing decisions in 1999.
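
    The following is a deliberately simplified sketch of the perfect-foresight benchmark, not the authors' exact optimization: given the realized per-title accesses, it prices a traditional subscription against covering the same accesses with generalized-subscription tokens (sold in blocks of 120; see note 5) or the $7.00 per-article fee, and sums the cheaper choices. The titles, subscription prices, and the $548 bundle price in the example are hypothetical.

        from math import ceil

        PER_ARTICLE = 7.00        # per-article fee stated in the text
        TOKENS_PER_BUNDLE = 120   # articles per generalized subscription (note 5)

        def perfect_foresight_cost(accesses_by_title, trad_price_by_title, bundle_price):
            subscribed_cost, uncovered = 0.0, 0
            for title, n in accesses_by_title.items():
                # Cost of serving this title's accesses without a subscription,
                # using the cheaper of pro-rated tokens or per-article fees.
                alternative = n * min(PER_ARTICLE, bundle_price / TOKENS_PER_BUNDLE)
                price = trad_price_by_title.get(title, float("inf"))
                if price < alternative:
                    subscribed_cost += price
                else:
                    uncovered += n
            # Remaining accesses: whole token bundles or per-article fees, whichever is cheaper.
            rest = min(ceil(uncovered / TOKENS_PER_BUNDLE) * bundle_price,
                       uncovered * PER_ARTICLE)
            return subscribed_cost + rest

        # Hypothetical three-title institution; $548 is an assumed bundle price.
        usage  = {"Journal A": 140, "Journal B": 9, "Journal C": 30}
        prices = {"Journal A": 500, "Journal B": 700, "Journal C": 900}
        print(perfect_foresight_cost(usage, prices, bundle_price=548))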

    We present these results by access product in Table 6.10. We found that actual expenditures were markedly higher than optimal purchases in 1998. In particular, institutions in the Red and Blue groups purchased far more traditional subscriptions than would be justified if they had perfect foresight. Most institutions purchased more generalized subscriptions than would have been optimal with perfect foresight. We believe that much of the budgeting "error" can be explained by a few factors:

    • First, institutions overestimated demand for access, particularly for journals for which they purchased traditional subscriptions.[21]

    • Second, institutional practices, such as "use it or lose it" budgeting and a preference for fixed, predictable expenditures, might have affected decisions. A preference for predictable expenditures would induce a library to rely more heavily on traditional and generalized subscriptions, and less on reimbursed individual article purchases or interlibrary loan.[22] However, Kantor et al. (this volume) report the opposite: that libraries dislike bundles because they perceive them as forcing expenditures for low-value items.

    • Third, because demand foresight is necessarily imperfect, libraries might want to "over-purchase" to provide insurance against higher than expected usage demand. Of course, per-article purchases (possibly reimbursed to users) provide insurance (as does an interlibrary loan agreement), but at a higher cost per article than pre-purchased generalized subscription tokens, or than traditional subscriptions.

    Table 6.10: Actual versus optimal expenditures per access product for 1998-1999
    Year Instid Trad. Actual Trad. Optimal Gen. Actual Gen. Optimal Per Art. Actual Per Art. Optimal Total Actual Total Optimal $ Savings % Savings
    1998 3 25,000 17,000 2,740 3,836 7 133 27,747 20,969 6,778 24.43%
    5 N/A 0 15,344 6,576 0 169 15,344 6,745 8,599 56.04%
    6 N/A 0 0 548 672 0 672 548 124 18.45%
    7 N/A 0 24,660 12,604 0 0 24,660 12,604 12,056 48.89%
    8 N/A 0 13,700 2,740 0 0 13,700 2,740 10,960 80.00%
    9 0 556 13,700 6,576 0 56 13,700 7,188 6,512 47.53%
    10 4,960 323 8,220 7,672 0 483 13,180 8,478 4,701 35.67%
    11 70,056 5,217 2,192 13,700 0 84 72,248 19,001 53,247 73.70%
    12 2,352 107 2,192 1,096 0 98 4,544 1,301 3,243 71.37%
    13 28,504 139 N/A 0 952 1,120 29,456 1,259 28,197 95.73%
    14 17,671 0 N/A 0 294 504 17,965 504 17,461 97.19%
    15 18,476 0 N/A 0 0 1,176 18,476 1,176 17,300 93.63%
    1999 3 12,500 10,528 2,740 1,096 84 0 15,324 11,624 3,699 24.14%
    5 N/A 0 8,708 2,740 0 399 8,708 3,139 5,569 63.96%
    6 N/A 0 0 548 686 0 686 548 138 20.12%
    7 N/A 0 10,960 9,864 0 511 10,960 10,375 585 5.34%
    8 N/A 0 6,028 5,480 0 462 6,028 5,942 86 1.43%
    9 0 278 7,124 6,576 7 182 7,131 7,036 94 1.33%
    10 2,480 1,401 8,768 6,576 0 210 11,247 8,187 3,060 27.21%
    11 0 576 4,384 2,740 427 532 4,559 3,848 711 15.60%
    12 0 0 1,644 548 0 539 1,644 1,087 557 33.88%
    13 9,635 7,661 N/A 0 19,964 7,175 29,599 14,836 14,763 49.88%
    14 0 0 N/A 0 623 623 623 623 0 0%
    15 8,992 1,058 N/A 0 511 1,694 9,502 2,751 6,751 71.04%
    Table 6.11: Predicted vs. actual direction of expenditure change for traditional and generalized subscriptions (by institution, 1998-99).
    Change in expenditure 1998-99
    Traditional Generalized
    Institution Predicted Actual Predicted Actual
    3 - 0 + +
    5 N/A N/A - -
    6 N/A N/A + 0
    7 N/A N/A - -
    8 N/A N/A - -
    9 + 0 - -
    10 - 0 - +
    11 - - + +
    12 - - - +
    13 - 0 N/A N/A
    14 - 0 N/A N/A
    15 - + N/A N/A
    NOTE: Predicted change direction is based on whether the institution over- or under-purchased that product in 1998.
    "0" indicates no change; "N/A" indicates the access product was not available to that institution; "+" and "-" indicate an increase and decrease, respectively.

    We also analyzed changes in purchasing behavior from the first to the second year of the project. The PEAK team provided participating institutions with regular reports detailing usage. We hypothesized that librarian decisions about purchasing access products for the second year (1999) might be consistent with a simple learning dynamic: increase expenditures on products under-purchased in 1998 and decrease expenditures on products they over-purchased in 1998. For each institution we compared the direction of 1998-99 expenditure change for each access product to the change we hypothesized.[23] We present the results in Table 6.11.

    Six of the nine institutions adjusted the number of generalized subscriptions in a manner consistent with our hypothesis.[24] Fewer adjusted traditional subscriptions in the predicted direction. Two of the seven institutions that purchased more traditional subscriptions in 1998 than was ex post optimal decreased the number purchased in 1999. Indeed, only three of the eight institutions made any changes at all to their traditional subscription lineup. This suggests an inertia that cannot be explained solely by direct costs to the institution. Perhaps libraries see a greater insurance value in having certain titles freely available through traditional subscriptions than in having generalized subscription tokens available that can be used on articles from any title. Generalized subscription tokens are also more expensive per article than traditional subscriptions, so by retaining traditional subscriptions libraries purchase more potential usage with their budgets. Another explanation might be that libraries were more cautious about purchasing generalized subscriptions because it was a less familiar product.

    Table 6.12: Estimation results for forecast error
    Independent variable Coefficient (standard error)
    Year 1999 -35.7*
    (9.3)
    Green 54.6*
    (10.0)
    Red 53.3*
    (8.1)
    Blue 85.8*
    (9.2)
    Sample Size 24
    R2 0.85
    NOTE: Dependent variable is forecast error (in percent).
    No constant term is included in the regressions.
    Standard errors are shown in parentheses.
    * Significant at the 99% level.

    We performed a regression analysis to assess the difference in apparent over-purchasing between 1998 and 1999. Our dependent variable was the percentage difference between actual expenditure and the perfect forecast expenditure, which we call the "forecast error". In Table 6.12 we report the effects of learning (the change in the error for 1999) and the average differences across experimental groups. The perfect foresight overspending over the life of the project averaged between 53% (Red) and 86% (Blue). However, the overspending was on average 36 percentage points lower in 1999. This represents a reduction of about one-half in perfect foresight overspending.[25]

    We also considered other control variables, such as the institution's level of expenditures, fraction of the year participating in the experiment and number of potential users, but their contribution to explaining the forecast error was not statistically significant. The between-group variation and the 1999 improvement account for about 85% of the variation, as measured by the R2 statistic.

    Decisions about specific titles

    In addition to comparing the total number of subscriptions for an institution with the optimal number, we can also identify the optimality for each particular title subscribed. We calculate, based on observed usage and prices, which titles an institution with perfect foresight should have obtained through traditional subscriptions, and call this the optimal set. Then we calculate two measures of actual behavior. First, we determine which titles in the optimal set an institution actually purchased. Second, we determine which traditional subscription titles the institution would have been better off foregoing because actual access would have been less expensive using other available access products.
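
    A small sketch of this title-level classification (our simplification, with hypothetical usage and prices, and a token pro-rated at an assumed $548/120 per article): a title is in the optimal set when its subscription price is below the cost of serving its observed accesses through the cheapest alternative product.

        def classify_titles(accesses, sub_price, alt_cost_per_access, subscribed):
            # Optimal set: titles worth subscribing to, given realized usage.
            optimal = {t for t, n in accesses.items()
                       if sub_price[t] < n * alt_cost_per_access}
            well_chosen = subscribed & optimal     # subscribed and should have been
            regretted   = subscribed - optimal     # subscribed but better off foregone
            missed      = optimal - subscribed     # in the optimal set but not purchased
            return well_chosen, regretted, missed

        accesses  = {"A": 200, "B": 15, "C": 90}
        sub_price = {"A": 600, "B": 650, "C": 480}
        print(classify_titles(accesses, sub_price, 548 / 120, subscribed={"A", "B"}))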

    In Table 6.13 we present our analysis of the traditional subscription titles selected by institutions. There is wide variation both in the percent of purchased subscriptions that are in the optimal set, and in the percent of journals in the optimal set to which the institution did not subscribe.[26] Overall, there is substantial opportunity for improvement. This is not a criticism of institutional decisions. Rather, it indicates the opportunity for improved purchasing decisions if libraries obtain the type of detailed usage information PEAK provided.

    We do generally see better decisions in 1999. However, in both years a rather large percentage of subscribed journals were not accessed at all.

    Table 6.13: Optimality of subscription choices
    Institution Year Total subscriptions Percent subscribed that are in optimal set Percent of optimal set that were not subscribed Percent of subscriptions accessed at least once
    3 1998 907 53.3% 3.4% 92.5%
    10 1998 23 0.0% 100.0% 65.2%
    11 1998 663 3.6% 0.0% 84.5%
    12 1998 22 0.0% 100.0% 81.8%
    13 1998 205 0.5% 0.0% 12.7%
    14 1998 72 0.0% N/A 36.1%
    15 1998 102 0.0% N/A 48.0%
    3 1999 907 75.0% 7.7% 97.0%
    10 1999 23 13.0% 76.9% 65.2%
    13 1999 205 29.8% 62.6% 86.8%
    14 1999 72 0.0% N/A 20.8%
    15 1999 102 10.8% 8.3% 84.3%

    Dynamic Optimal Choice

    Access product purchasing decisions made by institutions have a profound impact on the costs faced by users, and thus on the realized demand for access. Therefore, in deciding which access products, electronic or otherwise, to purchase, an institution must consider not only the demand realized at a particular level of user cost, but also what would be demanded at differing levels of user cost. Likewise, in our determination of the optimal bundle of access products, we should not take the observed set of accesses as fixed and exogenous. As a simple example, suppose that a subscription to a given journal requires 25 accesses in order to pay for itself. Now assume that the institution in question did not subscribe to that journal, and that 20 tokens were used to access its articles in the time period. At first look, it appears as though the institution did the optimal thing. Assume, however, that we know accesses would increase by 50%, to 30, when no password is required. It now appears that the institution should have subscribed, since the reduced user costs would have stimulated sufficient demand to justify the subscription price.
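
    The numerical example above can be written out as a simple decision rule; the 50% uplift stands in for the user-cost effect estimated from Table 6.4, and the numbers are the illustrative ones from the text rather than estimates for any actual institution.

        def should_subscribe(observed_paid_accesses, breakeven_accesses, barrier_free_uplift):
            # Project usage as if the password and payment barriers were removed,
            # then compare against the break-even access count for a subscription.
            projected = observed_paid_accesses * (1 + barrier_free_uplift)
            return projected >= breakeven_accesses

        print(should_subscribe(20, 25, 0.0))   # static view: 20 < 25  -> False
        print(should_subscribe(20, 25, 0.5))   # barrier-free: 30 >= 25 -> True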

    Table 6.14: Optimal bundles with barrier-free access: Selected institutions
    Institution Year Trad. Subscriptions (Actual Optimal, Rescaled Optimal) Addit. Articles (Actual Optimal, Rescaled Optimal) Increase in Total Optimal Cost Access Increase
    3 1998 500 556 1099 1130 9.39% 12.53%
    3 1999 737 805 236 146 4.85% 7.46%
    11 1998 24 31 2532 3019 21.11% 21.09%
    12 1998 1 1 254 287 17.76% 13.67%
    14 1999 0 0 168 249 48.21% 48.21%
    15 1999 12 17 242 366 47.56% 60.36%

    In Table 6.4 we reported results that allow us to estimate how much usage would increase if no passwords or other user costs were incurred. We now calculate the product purchases that would have optimally matched the usage demand that we estimate would have occurred had the library removed or absorbed all user costs. We report the results in Table 6.14.[27] For most institutions, the optimal number of journal subscriptions increases, because greater usage makes the subscription more valuable. In general, the estimated institution cost of the optimal bundle would not increase greatly to accommodate the usage increase that would follow from eliminating user costs. Although we cannot quantify a dollar value for the eliminated user costs (because they include nonpecuniary costs such as those from requiring a password), we show in the last two columns that the modest institutional cost increase would be accompanied by comparable or larger increases in usage. The greatest cost increase (48%) occurs for the institutions (14 and 15) at which generalized subscription tokens were not available and the institution did not directly subsidize the per-article fee, i.e. at those institutions where users faced the highest user costs. Thus, the higher institutional costs should be weighed against high savings in user costs (including money spent on per-article purchases).

    6.7 Conclusion

    Experience from the early years of electronic commerce indicates that low user costs—non-pecuniary as well as pecuniary—are critical to the success of electronic distribution systems. In the PEAK experiment, we have evidence that for the information goods in question, non-pecuniary costs are of the same magnitude as significant pecuniary costs. In a two-tiered decision problem such as in this project, where intermediaries determine the user costs required to access specific content, both the quantity and the composition of demand are greatly affected by users' reactions to these costs. Therefore any determination of what the intermediary "ought" to do must take these effects into account. Furthermore, we have initial evidence suggesting that users who come to expect information at zero marginal cost are far less likely to pay these non-monetary costs when requested than their counterparts who expect to bear such costs. This finding is of great import both to those who design electronic information delivery and pricing systems and to any intermediaries controlling information access and costs.

    In the second part of the chapter we investigated the extent to which libraries could have improved their purchasing decisions if they had detailed usage information that provided a reliable basis for forecasting future usage. We found that with perfect foresight about next year's usage, libraries could have substantially reduced their expenditures. They could also have substantially improved the match between what titles they purchased and what articles users want to access.

    We then linked the two sets of analyses by showing how much greater usage would be if the library absorbed or removed the pecuniary and non-pecuniary user costs we observed. The result would be substantial increases in usage. The library expenditures would have to increase by comparable percentage amounts; however the institution should recognize that these costs would be offset by the lower user costs incurred by its constituents, and the net cost, if any, would support substantial increases in usage.

    Notes

    1. See MacKie-Mason and Riveros (2000) for a discussion of the economics of electronic publishing.

    2. See Bonn et al. (this volume) and Hunter (this volume) for accounts of the genesis of this project.

    3. Kingma (this volume) provides a good discussion of the role of library as intermediary.

    4. As we further discuss below, user cost may include several components only one of which is a standard price. The other components may include, for example, time and inconvenience. We expect these user costs, taken together, and not price alone, to determine usage.

    5. 120 is the approximate average number of articles in a traditional printed journal for a given year. We refer to this bundle of options to access articles as a set of tokens, with one token used for each article added to the generalized subscription during the year.

    6. For example, a Green institution first decides how many generalized subscriptions to purchase (if any). Users then access articles using generalized subscription "tokens" at zero pecuniary cost until the tokens run out, and thereafter pay a fee per article for additional articles. The library determines how many articles (not which articles) are available at the two different prices.

    7. To access PEAK from other IP addresses, users entered a password. Once access was granted, all content in these categories was available without further user cost.

    8. In the first eight months of the experiment, users paid with a First Virtual VPIN account, rather than with a credit card. Because a VPIN was an unfamiliar product, the non-pecuniary costs were probably higher than for credit card usage, although formally the user needed to undertake the same steps.

    9. When the user accessed an article for which per-article payment was required, the institution was automatically billed by the PEAK service.

    10. Paid content is metered content, not including articles in journals to which an institution purchased a traditional subscription.

    11. Formally, Normalized Paid Access is equal to (Apaid / Aunmetered) × Scale, where Apaid is the total number of paid accesses, Aunmetered the total number of unmetered accesses, and Scale is equal to the total number of free accesses divided by the total number of accesses of free content in journals to which the institution does not have a traditional subscription. We multiply by Scale because the more that accesses are covered by traditional subscriptions, the less likely a user is to require paid access. Scaling by access to unmetered content also controls for different overall usage intensity (due to different numbers of active users, differences in the composition of users, differences in research orientation, differences in user education about PEAK, etc.). The number of unmetered accesses proxies for the number of user sessions, and therefore our statistic is an estimate of paid accesses per session.

    12. Only 28% of unmetered accesses from Red group users were password authenticated. This suggests that a large majority of users attempting to access paid content would not already be password authenticated. For these users, the need to password authenticate would truly be a marginal cost.

    13. The Elsevier journal catalogue is especially strong in these subject areas, so we expect differences in usage when the subject area concentration of the user community differs.

    14. In only two cases were credit cards required, and both were at the same institution.

    15. Recall that all users at an institution could access, without password authentication, any article previously purchased by that institution with a generalized token. For articles purchased on a per-article basis, only the individual who purchased the article could view it without further monetary cost.

    16. The libraries at institutions 3 and 11 processed these requests electronically, through PEAK, while the library at institution 9 did not and thus incurred greater processing delays.

    17. In addition, institution 6 is a corporate institution. It is possible that its users' budgetary constraints were not as binding as those associated with academic institutions.

    18. This phenomenon was widely discussed—though not, to our knowledge, sufficiently demonstrated—during the early years of widespread public access on the Internet. Many businesses and commentators asked whether users would pay for any content after being accustomed to getting most Internet-delivered information for free.

    19. For an excellent discussion of the collection development officer's problem, see Haar (1999).

    20. The percentage of articles read through June 1999 for academic institutions participating in PEAK ranged from .12% to 6.40%. An empirical study by King and Griffiths (1995) found that about 43.6% of users who read a journal read five or fewer articles from the journal and 78% of the readers read 10 or fewer articles.

    21. Project implementation delays exacerbated the demand forecasting problem. For example, none of the institutions in the Blue Group started the project until the third quarter of the year.

    22. With print publications and some electronic products libraries may be willing to spend more on full journal subscriptions to create complete archival collections. All access to PEAK materials ended in August 1999, however, so archival value should not have played a role in decision making.

    23. As 1999 PEAK access ran for 8 months, the number of 1999 generalized subscriptions was multiplied by 1.5 for comparison with 1998.

    24. One of the institutions that increased token purchases despite over-purchasing in 1998 was more foresightful than our simple learning model: its usage increased so much that it ran out of tokens less than six months into the final eight-month period of the experiment.

    25. E.g., the Green group had average overspending of about 55% so a 36-point change represents a shift from about 73% in 1998 to about 37% in 1999.

    26. The calculations in the two columns are independent and should not generally sum to one. The first column indicates the percent of titles that were subscribed that should have been subscribed (given perfect foresight). A high percent means there were not many specific titles subscribed that should not have been. However, this does not indicate that a library subscribed to most of the titles that it should have. A library that subscribes to zero journals will get 100% on this measure: no journals were subscribed that should not have been. The second column addresses this question: what percent of those titles that should have been subscribed were missed? The two columns correspond to Type I and Type II error in classical statistical theory. The first should be high, and the second low if the institution is forecasting well (and following our simple model of "optimal" practice).

    27. We performed the calculation for those institutions for which we have a good estimate of the user cost effect (see Table 6.4), and for which there were enough article accesses for meaningful estimation.

    III. Digital Publishing Economics

    In this section of the book several distinguished authors address the economics of scholarly journals in the digital age. Our authors include academics, publishers and leaders of innovative not-for-profit projects. They focus on the economic issues facing publishers and academic libraries, with some attention to other stakeholders.

    These diverse authors agree on the fundamental facts. Case (chapter 12) provides a pithy summary: "The library community has been faced with high and ever-rising prices for scholarly resources. A number of factors have contributed to this situation, most fundamentally, the commercialization of scholarly publishing". King and Tenopir (chapter 8) document the rising prices, calculating several different metrics based on one of the most extensive data sets ever collected on scholarly publishing. McCabe (chapter 11) and Case also report evidence on the faster-than-inflation price increases for scholarly journals.

    The price increases, particularly from commercial publishers, are not in dispute; the important question is what forces are driving them. In their chapters, McCabe and King and Tenopir analyze data to assess the extent to which different causes explain rising prices. King and Tenopir find that some of the increase in prices is due to rising costs. For example, the number of articles per journal has increased. More subtly, the average number of subscribers for new (often more specialized) journals has been decreasing, necessitating higher average prices per page to recover the fixed costs of publishing a journal. However, King and Tenopir conclude that only part of observed price increases can be explained by higher costs.

    McCabe uses multivariate statistical methods to analyze his data on journal prices. Like King and Tenopir, he concludes that costs can explain part but not all of the rise in journal prices. He also finds evidence that increases in quality, which in turn increase production costs, explain part of the price increases. However, McCabe further concludes that commercial publishers have gained what economists call market power, which is the power to raise prices above the level necessary to recover costs and earn a normal rate of profit. He finds that increases in market power can explain a substantial share of the gap between price and cost increases.

    Spinella reports on the role of cost increases from a publisher's perspective. Many observers have written that with all-electronic publishing, variable (per copy, or per subscriber) costs are approximately zero, focusing the issue of pricing entirely on the fixed costs of publishing. Spinella points out that this characterization is generally incorrect. For example, all-electronic publications have variable costs from maintaining subscriber records, sending renewal notices and bills, providing subscriber services, and so forth. He also notes that the importance of variable costs differs across journals, depending on such factors as the frequency of publication, the size of their circulation, and whether they distribute globally. Further, digital publishing usually is accompanied by the creation of new reader and author services, many of which generate new variable costs of their own.

    To the extent that price increases are driven by rising quality and production costs, libraries and other buyers are getting what they pay for. Improvements in price necessarily depend on system improvements that reduce costs. Each author in this section agrees that the shift to electronic publication offers some hope for tempering costs. On the other hand, new service possibilities available from digital publishing (for example, hyper-linking from bibliographic references to the referenced article) may only be obtained at higher cost (Spinella, chapter 10). In a well-functioning market, library and reader demand should determine the extent to which publishers create new services that require higher prices: if purchasers find the services to be worth less than they cost, competitive publishers will not develop or offer the services.

    However, if McCabe and Case are right that a substantial share of the price increases is due to limited market competition, then cost reductions alone will not abate the path of rising prices. When the market is not highly competitive, publishers with market power may choose to bundle in more high-priced new services than purchasers actually want, or publishers may simply raise prices above cost for traditional services. McCabe, Case, and Halliday and Oppenheim (chapter 9) each explore the role of competition in the pricing of scholarly journals.

    Case describes the Scholarly Publishing and Academic Resources Coalition (SPARC), a project to increase competition, particularly in the market for scientific, technical and medical journals. SPARC is a collective of libraries and scholarly societies. One of its programs is the development of new journals targeted to compete with especially high-priced commercially published journals. Case argues that pressure from efforts like SPARC has started to have an effect. McCabe conjectures that the characteristics of electronic publishing are especially well-suited to reduce publisher market power.

    Halliday and Oppenheim construct an economic/business process model of journal publishing to analyze the impact of different cost-generating activities on prices. They model three different pricing scenarios: a traditional commercial model, a non-commercial author-pays model, and a free market model with payments to and from authors as well as readers. They conclude that, given the cost structure of digital publishing, the traditional professional publisher model is most efficient. However, the sharing of the benefits of cost efficiency between authors, readers and publisher shareholders depends on the competitiveness of the publishing market. If market power is substantial, authors and readers might be better off with less efficient publishing business models in which they receive a higher share of the value created.

    Krichel (chapter 13) describes a working example of a non-commercial effort to reduce costs and improve access. Research Papers in Economics (RePEc) is an "open" digital library: open to contribution by third parties, and open to implementation of new user services. The core idea is to create a centralized metadata library that facilitates access to the distributed "library" of scholarly articles (in the field of economics, in this case) available across the Internet. Although the protocols have been implemented and the database has received a large number of contributions, Krichel notes that a number of economic issues remain that will determine the viability of his open library model. In particular, it is not clear whether the incentives for users to contribute are sufficient to establish critical mass in various scholarly fields. There is also no business model to recover the costs of quality control on the metadata repository.

    Taken together, the chapters in this section document the facts about rising prices, provide evidence on the causes for these increases, and investigate a variety of approaches to improve the situation. If prices for scholarly publications continue to rise faster than general inflation, there is no question that research libraries will either have to reduce access (that is, acquire access to a smaller fraction of published material), or reduce spending on other valuable library services. These authors have tackled one of the fundamental troublesome puzzles of the digital information age: how do we ensure that access increases rather than decreases during a time when the quantity of information produced is increasing and the technologies for using that information are improving?

    8. Scholarly Journal and Digital Database Pricing: Threat or Opportunity?

    8.1 Introduction

    For over three centuries, scientific scholarly journals have demonstrated remarkable stability. A large number of studies performed during the past few decades have shown their continued use, usefulness, and value. However, two phenomena have evolved over the last thirty years that have the potential either of destroying the scholarly journal system or of substantially enhancing its considerable usefulness and value. These two phenomena are the maturation and integration of communication technologies and the economics of the journal system, particularly the pricing of traditional journal subscriptions and access to digital full-text databases through site licensing and package "deals". Certainly, the new technologies should, if deployed with care, enhance the journal system (e.g., Tenopir et al., 2003), but contemporary pricing policies have been a greater threat to the journal system. Up to the mid-1990s, rapid and little-understood price rises posed a significant threat to the system; more recently, policies of site licensing and negotiated journal packages have become commonplace even though little is known about their sustainability.

    The early pricing policies resulted in substantially reduced personal subscriptions, increased reliance on library access, library prices raised far higher than inflation or increased journal sizes would warrant, and libraries and scientists having to rely more heavily on obtaining separate copies of articles through interlibrary loan, document delivery, preprints, reprints, and photocopies or electronic copies from authors and colleagues. Recently, most academic libraries in the U.S. and many other types of libraries have negotiated licenses with individual publishers, library consortia, and other vendors to obtain access to multiple journals. While there are appreciable benefits to both publishers and libraries from such arrangements (King and Xu, 2003), there are considerable concerns as well (Frazier, 2001). One concern is that negotiation seems to vary from deal to deal, and it is not at all clear that long-term revenue to publishers will be sufficient. In this chapter, we discuss the early pricing policies and why prices spiraled upward, and we show that the problems leading to this dilemma are also inherent in the current licensing policies.

    This chapter provides some insights gained from analysis of over 15,000 responses to readership surveys of scientists; cost analysis of publishing, library services and scientists' communication patterns; tracking of a sample of scholarly journals from 1960 to 2002; and review of over 600 publications dealing with scientific scholarly journals. It attempts to dispel some myths concerning communication costs, system participants' incentives, and reasons for increased prices. It also presents perspectives on pricing that might help in an electronic age and offers some suggestions concerning subscription pricing, site licensing, and online access to separate copies of articles.

    8.2 Are Scientific Scholarly Journals Worth Saving?

    Over the years there have been a number of skeptics regarding the use, usefulness, and value of scientific scholarly journals. However, since the 1950s, there have been over twenty studies showing that scientists in general rely more on journals than on any other source for their information, although this is not true for engineers or "technologists" (King and Tenopir, 2000; Tenopir and King, 2004). Consider evidence from surveys of scientists conducted by King Research from 1977 to 1998, the University of Tennessee School of Information Sciences in 2000 and 2001, Drexel University in 2002, and the University of Pittsburgh in 2003. A 1977 national survey of scientists showed that they averaged 105 readings of scholarly journals per scientist per year, and a follow-up survey in 1984 revealed about 115 readings per scientist; several surveys in organizations from 1993 to 1998 yielded combined estimates of 120 readings; and surveys in 2000-2003 resulted in a weighted average of 134 readings, suggesting that the amount of reading might have increased over the years.[1] Extrapolated to the entire population of scientists and articles published, these data indicate that articles averaged about 640 readings each in 1977 and about 900 readings in the late 1990s. Three studies in the 1960s and 1970s estimated the amount of reading per article by asking sampled scientists to indicate which articles listed on recently published tables of contents they had read. Average readings per article, extrapolated to the population of scientists sampled, showed that psychology articles averaged 520 readings per article (Garvey and Griffith, 1963), economics articles averaged 1,240 readings (Machlup and Leeson, 1978), and Journal of the National Cancer Institute articles averaged 1,800 readings per article[2] (King, McDonald, and Olsen, 1978), or 756,000 readings for the entire volume of 12 issues. Thus, there is ample evidence that scientists read many scholarly articles and that journals are well read.[3]

    Scholarly articles are read for many purposes, ranging from supporting specific research projects and teaching to administrative purposes. They are also read by people wanting to keep current in their disciplines. A number of studies have shown the importance of scholarly articles for these and other purposes. Our recent surveys of university scientists show that readings for teaching purposes are rated high in importance (5.10 on a scale from 1, not at all important, to 7, absolutely essential), while readings for research are rated even higher (5.32). One-third of the readings are said to be "absolutely essential" to the teaching or research. Similar results are observed in surveys of non-academic scientists, who individually read fewer articles than university scientists, but who in total account for about three-fourths of all reading because of their overwhelming numbers.

    Machlup (1979) defines two types of value of the information provided by scholarly journals: purchase value and use value. Purchase value is what scientists are willing to pay for the information in monies exchanged and time expended in obtaining and reading the information. The purchase value expended on scholarly journal information exceeds $5,400 per year per scientist, most of which involves their time spent obtaining and reading the information. In fact, the price paid in scientists' time tends to be five to ten times the price paid in purchasing journals, separate copies of articles, and other journal-related services. Of twenty studies by various researchers that provide estimates of time spent reading, the median time spent is 9.0 hours per month or about 108 hours per year per scientist. Our recent surveys show that scientists annually spend about 130 hours reading scholarly articles, up from 80 hours in 1977. Also, scientists are spending more time obtaining articles because they more often use library-provided articles than their own personal subscriptions (more is said about this later).

    Use value involves the outcomes or consequences of using scholarly journal information. Examples of use value from our surveys include evidence of producing work with greater quality, faster, or at a lower cost in time or money. Several studies, dating back to the 1950s, have shown that amount of reading is correlated with productivity. Our surveys established that amount of reading is positively correlated with five indicators of productivity (i.e., outputs and input time measured in five ways) (Griffiths and King, 1993). Another indicator of use value is that scientists whose work has been formally recognized through awards, special assignments, or designation by their personnel department (for our survey purposes) tend to read more than others.[4] This was observed in the 1960s (Lufkin and Miller, 1966) and was observed in all 21 of our surveys. Thus, there is also abundant evidence of the purchase and use values of scholarly journals, and one must conclude that any changes in the future should ensure that the use, usefulness, and value of scholarly journals be retained.

    8.3 Scholarly Journals Examined from a Systems Perspective

    In the late 1970s King Research performed a series of studies for the National Science Foundation on scientific and technical information communication, with particular emphasis on scientific scholarly journals.[5] As part of these studies we identified and characterized all the principal functions performed in the journal system, participants who performed the functions, and hundreds of detailed activities necessary to perform the many functions. For each activity we established quantities of output and amount of resources required (with dollar amounts placed on the resources). We traced the flow of messages transmitted among participants, which, in 1978, numbered in the billions. We also examined all of the activities in terms of the introduction of evolving technologies to assess when comprehensive electronic journals were likely to become commonplace.

    As a result of our 1978 systems study we indicated that:

    Recent technological advances, which were developed largely independently of the scientific and technical communication, provide all the components of a comprehensive electronic journal system. Such a system would provide enormous flexibility, particularly because individual articles can be distributed in the most economically advantageous manner. Much-read articles may still be distributed in paper form, and infrequently read articles can be requested and quickly received by telecommunication when they are needed (King et al., 1981).

    We went on to say that:

    This comprehensive electronic journal system is highly desirable and currently achievable. It is believed that within the next twenty years, a majority of articles will be handled by at least some electronic processes throughout but not all articles will be incorporated into a comprehensive electronic journal system.

    At that time (1978), some communications researchers scoffed at this "pessimistic" view of when electronic journals would become widespread, and some at NSF were disappointed because other studies forecast much quicker implementation of electronic journals.

    One aspect of the systems analysis done at the time was to sum the resource costs applied to all the activities identified in order to establish an overall journal system cost in the U.S. In 1975 we estimated the total amount of resources expended that year on scientific journals to be $5.05 billion (or about $15.6 billion in 1998 dollars, considering increases in resource costs). A reasonable estimate of the corresponding total system cost in 1998 is $45 billion.[6] This systems approach ignores the amount of money exchanged between participants, such as the price paid by scientists and libraries for subscriptions purchased, the price paid for online bibliographic searches, fees paid for document delivery services, and so on. Including such transfer payments would only duplicate the costs of system resources applied by publishers, online vendors, and document delivery services. Thus, the additional cost to the U.S. economy (or scientific community) for processing and using scientific journals was another $5.05 billion in 1975 (or $15.6 billion in 1998 dollars) and $45 billion in 1998. The $15.6 billion (1998 dollars) comes to about $7,000 per scientist or about $69 per article reading. In 1998 we estimated the comparable system cost to be about $7,100 per scientist or $59 per reading.

    The 1998 total system cost per scientist ($7,100 per scientist) is sub-divided as follows: authors ($640), publishers ($500), libraries and other intermediaries ($420), and readers ($5,540). Thus, scientists' time spent with writing and reading dominates the total system costs (i.e., 87% of the total costs). The costs per scientist of authorship, publishing, and libraries and other intermediaries have all decreased over time, but readers' cost per scientist has increased. The reader increase in cost per scientist is attributable to an increase in their time spent acquiring and reading articles. The number of personal subscriptions of scientists has decreased by over one-half, with nearly all prior reading from personal subscriptions replaced by reading from library-provided journals. Thus, scientists spend more time obtaining articles, and they also appear to spend more time reading an article (due perhaps to an increase in size of articles as shown later). The decrease in cost per reading is due to relative decreases in library and publishing resources expended.

    The relative resource expenditures of libraries (and other intermediary services) are down, whether calculated by cost per scientist or cost per reading. The library cost per scientist is down because of relative reduction in library budgets, but also because of efficiencies due in part to library automation, resource sharing (King and Xu, 2003), and replacement of print journals by electronic versions (see Section 8.9). The library cost per reading is down due in large part to the increase in the amount of reading from library-provided journals resulting from the shift from personal subscriptions to library-provided articles. For example, from 1977 to the current era, the number of personal subscriptions declined from 5.8 to 2.4 and number of readings from library collections increased from 15 to 66 readings per scientist.

    The relative cost of publishing has apparently also decreased. For example, the cost per page published is down, due in part to use of technologies, increased efficiencies, and increased sizes of journals. The cost per scientist is down, due in part to the factors mentioned above, but also to the fact that there is an average of over three fewer subscriptions circulated per scientist. The publishing cost per reading is also down due, in addition to the factors above, to a greater amount of reading. This discussion of the systems perspective causes us to ask this question: Why have average prices risen by a factor of nearly nine over a period of time during which the relative cost of publishing has actually decreased?

    8.4 To Understand Price One Must Understand Publishing Costs

    While there have been literally hundreds of articles written about the price of scholarly journals in recent years, very little has been written about the cost of publishing journals. To understand why prices are what they are, one must know about the cost of publishing journals. One reason that costs are not often discussed in the literature is that publishers do not want their competitors to know their costs. Also, costs vary a great deal among journals, depending on the characteristics of journals such as manuscript rejection rates, number of articles, number of pages, number of issues, and circulation and the type of resources used such as location and experience of editors, technologies applied, and quality of paper. With that concern in mind, we decided to develop a cost model of journal publishing in order to analyze effects of circulation, changes in characteristics of journals over time, and how such factors might affect the price of journals. We formulated a cost model using data we collected for the 1978 journal systems analysis and more recent pieces of information gleaned from the literature. The model has been reviewed by staff from different types of journal publishers, who found it reasonable with the caveats mentioned above. We also compared our model data with other published data and found them a good source of validation.[7]

    The cost model consists of five functions or groups of activities as follows:

    • Article processing including manuscript receipt processing, initial disposition decision-making, identifying reviewers or referees, review processing, subject editing, special graphic and other preparation, formatting, copy editing, processing author approval, indexing, coding, redaction, and preparation of master images.

    • Non-article processing including many of the same activities involving editorials, letters to the editor, brief communications, and book reviews. It also includes preparation of issue covers (for paper versions), tables of contents, and indices.

    • Reproduction involving printing, collating, binding of issues, and printing for reprints (none of which is necessary for electronic versions).

    • Distribution of paper versions involving wrapping, labeling, sorting by zip code, and mailing; distribution of electronic versions including storage and access. Subscription maintenance is required for both versions.

    • Support activities including marketing and promotion, rights management and other legal activities, administration, financing, and other indirect activities.

    In 2002 the average US science journal characteristics were estimated to be 10.8 issues, 154 articles, 213 manuscripts submitted, 1,910 article pages, 397 special graphics, 2,215 total pages, and 4,800 subscriptions.[8] The cost model estimates for these functions are $255,897 for article processing, $22,957 for non-article processing, $215,392 for reproduction and distribution, and $197,908 for support, for a total of $692,154. The article processing cost is $1,660 per article, and the reproduction and distribution cost is about $45 per subscription (without allocation of support costs).
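
    The unit costs quoted above follow directly from the model totals. The short sketch below simply restates those published figures; it is illustrative only and does not reproduce the model's twenty underlying cost parameters (see note 8).

        # Illustrative restatement of the 2002 cost-model totals quoted in the text
        costs = {
            "article processing": 255_897,
            "non-article processing": 22_957,
            "reproduction and distribution": 215_392,
            "support": 197_908,
        }
        articles, subscriptions = 154, 4_800

        total = sum(costs.values())                                 # $692,154
        cost_per_article = costs["article processing"] / articles   # about $1,660
        cost_per_subscription = costs["reproduction and distribution"] / subscriptions  # about $45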

    By holding all other journal characteristics and cost parameters constant, we can assess the effects of journal characteristics on the total and unit cost. For example, we find that the cost per hypothetical subscription varies substantially by number of subscribers (see Table 8.1).

    Table 8.1: Publishing Unit Cost Per Subscription by Various Numbers of Subscribers: 2002 Dollars.
    Subscribers Cost per Subscription
    500 $993
    1,000 $519
    2,500 $235
    5,000 $140
    10,000 $93

    The price necessary to recover costs at 500 subscribers is at least $993 per subscriber, but it decreases sharply up to the 2,500-5,000 subscription range, beyond which the unit cost declines slowly, approaching an asymptote (the incremental reproduction and distribution cost). At 500,000 subscribers the cost per subscription is only about $2 above these incremental costs. Of course, in reality the journal characteristics and cost parameters vary among journals. For example, large circulation journals tend to publish more issues, have expensive photos and graphics, reject more manuscripts, and use more expensive covers and paper. Spinella (this volume) makes this point in discussing publication of large circulation journals, such as Science. However, by holding non-circulation characteristics and cost parameters constant we get a good picture of the effect of size of circulation. Halliday and Oppenheim (this volume) present results similar to those above, but expand on them by showing the effects of varying overhead and profit levels (which we call support above).
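
    The pattern in Table 8.1 is what one would expect if total publishing cost were split into a fixed, circulation-independent component and an incremental per-subscription component. The figures below are not taken from the cost model itself; they are back-calculated from Table 8.1 as an illustration, so they reproduce the table only approximately.

        # Approximate decomposition implied by Table 8.1 (illustrative back-fit, not model output)
        FIXED = 474_000    # first-copy costs: article and non-article processing plus support
        INCREMENTAL = 46   # reproduction and distribution per additional subscription

        def cost_per_subscription(subscribers):
            return FIXED / subscribers + INCREMENTAL

        for n in (500, 1_000, 2_500, 5_000, 10_000, 500_000):
            print(n, round(cost_per_subscription(n)))   # ~994, 520, 236, 141, 93, 47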

    Similarly, by varying the number of articles published from, say, 50 to 200, we find that cost per subscriber increases from $77 to $172 (at 4,800 subscribers). The direct article processing costs per article do not vary much—$1,747 per article with 50 articles and $1,651 per article with 200 articles in a journal—but the difference in cost per article is substantial when non-article processing, reproduction and distribution, and support functions are included ($7,375 vs. $4,130). Similarly, the cost per article received by subscribers decreases from $1.54 per article with a 50 article journal to $0.86 per article for a journal with 200 articles. That cost per article decreases as journal size increases may be the reason that publishers have steadily increased the size of journals over the years (from an estimated average of 85 articles per title in 1975 to 154 in 2002).
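
    These per-article figures follow from simple arithmetic on the stated circulation of 4,800 subscribers. The sketch below is an illustrative check only; the small differences from the quoted $7,375 and $4,130 reflect rounding of the per-subscriber costs.

        # Illustrative check of the journal-size effect (4,800 subscribers assumed)
        subscribers = 4_800
        for articles, cost_per_subscriber in ((50, 77), (200, 172)):
            full_cost_per_article = cost_per_subscriber * subscribers / articles   # about $7,392 and $4,128
            cost_per_article_per_subscriber = cost_per_subscriber / articles       # $1.54 and $0.86
            print(articles, round(full_cost_per_article), round(cost_per_article_per_subscriber, 2))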

    8.5 What do Average Prices Mean?

    Prior to discussing reasons why journal prices have increased so much, it is worth noting that there are several ways in which one can measure average price. In the literature, average price is nearly always calculated as the average price per title. That is, the prices of a set of journals are summed and divided by the total number of journal titles in the set. This average has specific meaning. For example, it makes sense for an individual library to estimate the average price for its collection in this way, particularly for comparison over time. However, from a total systems perspective it makes more sense to measure average price by the price per subscription. That is, one takes the total price of all journals circulated and divides by the total circulation. This average price is much lower than the average price per title and has a much different meaning. The point can be made through a simple arithmetic example, taking into account that low circulation journals have higher prices due to relatively higher fixed costs. In 1995 we observed equal numbers of journals in four ranges of circulation (i.e., quartiles), with the average circulation observed in each quartile shown in Table 8.2.[9] In the table we also present the price necessary to recover publishing costs at the average circulation and with the other characteristics and cost parameters mentioned in the previous section held constant.

    Table 8.2: Circulation Quartiles of U.S. Scholarly Journals, Average Circulation, and Price Necessary to Recover Publishing Costs: 1995
    Circulation No. of Journals Avg. Circulation Price
    < 900 1,693 520 $747
    901 - 1,900 1,693 1,310 $316
    1,901 - 5,700 1,693 3,290 $145
    > 5,700 1,693 18,100 $53
    ALL 6,772 5,805 $315
    NOTE: Source: Tenopir and King 2000

    Average price per journal can be roughly estimated by summing the four sets of prices of all journals in each quartile (e.g., $747 x 1,693) and dividing the total of the four quartiles by 6,772 journal titles (recognizing that this estimate is below the real average). As shown, the average cost/price per journal title is $315.

    The average price per subscription is estimated by summing the four sets of prices of all subscriptions in each quartile (e.g., $747 x 1,693 x 520) and dividing the total of the four quartiles by the total number of subscriptions in 1995, which is about 39.3 million (i.e., 6,772 journal titles x 5,805 subscriptions per title). The average price per subscription is $96—far less than the price per journal title ($315). Thus, it is clear that the highly skewed distribution of journal circulation means that large circulation journals dominate average price calculated in this way. Yet this measure of average price is more meaningful when considering the impact of price on the U.S. economy or when examining price trends for the entire scientific community, not just for individual libraries.
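
    Both averages can be reproduced directly from the figures in Table 8.2. The sketch below uses only those quoted quartile figures; because the journal counts are equal across quartiles, they cancel out of the per-title average.

        # Average price per title and per subscription from Table 8.2 (1995 figures)
        journals_per_quartile = 1_693
        prices       = [747, 316, 145, 53]       # cost-recovery price in each circulation quartile
        circulations = [520, 1_310, 3_290, 18_100]

        total_subscriptions = journals_per_quartile * sum(circulations)   # about 39.3 million
        price_per_title = sum(prices) / len(prices)                       # about $315
        price_per_subscription = (sum(p * c for p, c in zip(prices, circulations))
                                  / sum(circulations))                    # about $96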

    8.6 Reasons Why Journal Subscription Prices Spiraled Upward

    There is overwhelming evidence that individual scholarly journal prices increased dramatically from 1960 to 1995. For example, we sampled 430 U.S. scientific scholarly journals and tracked them from 1960 to 1995.[10] In this sample, prices rose from an average of $8.51 per title in 1960 to $284 in 1995. One particular concern is that the rate of increase accelerated, even in constant dollars. There are many reasons that prices increased in this manner. Okerson (1989) provides an excellent discussion of some reasons for this phenomenon, and below we present some numeric examples as to why prices per title increased so much.

    Some of the high increases in price over these two decades can be explained by inflation and increases in the size of journals. Referring back to the publishing cost model above, one can establish an indication of how much increasing journal size has affected prices over time. As mentioned earlier, the average number of articles published in science journals has increased from 85 to 154 articles per title from 1975 to 2002. Other journal characteristics (e.g., number of issues, pages, special graphics) increased in size as well. By substituting 1975 and 2002 characteristics in the cost model and keeping number of subscriptions and cost parameters at 2002 levels we estimate that the cost per subscription for the 1975 size journal is about one-half that of the 2002 journal. Thus there is evidence that the increased size of journals has resulted in a substantial increase in journal publishing cost and, therefore, the necessity to increase prices accordingly.

    A more subtle factor is that the estimated number of scientific scholarly journals increased from 4,447 in 1975 to 6,772 in 1995. Most of the new journals had a small circulation and, therefore, must have a higher-than-average price per title. Consequently, the continued addition of new journals had the effect of increasing average price both per title and per subscription. In fact, journal prices increased at a rate greater than inflation since at least 1960, when there were only 2,815 scientific journals provided by U.S. publishers (Tenopir and King, 2000).

    This phenomenon can be documented by examining the 1975 number of journals in the quartile ranges shown for 1995 above and applying the same calculation of average price per journal title and per subscription, as shown in Table 8.3. One can see that, relative to 1975, in 1995 there were proportionally more small-circulation journals and fewer large ones.

    Table 8.3: 1995 Circulation Quartiles of U.S. Scholarly Journals and the Number and Proportion of Journals in the Ranges in 1975 and 1995
    No. of Journals Proportion of Journals (%)
    Circulation 1975 1995 1975 1995
    < 900 880 1,693 19.8 25
    901 - 1,900 805 1,693 18.1 25
    1,901 - 5,700 1,579 1,693 35.5 25
    >5,700 1,183 1,693 26.6 25
    ALL 4,447 6,772 100.0 100
    NOTE: Source: Tenopir and King 2000.

    In order to make unbiased comparisons, we again assume that all cost parameters remain the same and that average prices in the four ranges do as well. We find that the average price per journal title of 1975 journals with their circulation would be about $270 per title compared with $315 in 1995. Thus, this average price per journal would have increased about 17 percent due only to the change in distribution of circulation. A much smaller increase is observed in the average price per subscription, from $91 per subscription for 1975 circulation to $96 in 1995.[11] Note that the average circulation per title did not decrease much from 1975 to 1995, from 6,100 to 5,800 subscriptions, but the median dropped from about 2,900 to 1,900 subscriptions.
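
    Re-running the previous calculation with the 1975 journal counts from Table 8.3, and holding the quartile prices and average circulations at their 1995 values as the text does, reproduces the figures quoted above. This is only an illustration, since the true 1975 quartile circulations differed slightly (see note 11).

        # Weighted averages using 1975 journal counts (quartile prices and circulations held at 1995 values)
        counts       = [880, 805, 1_579, 1_183]
        prices       = [747, 316, 145, 53]
        circulations = [520, 1_310, 3_290, 18_100]

        price_per_title = sum(n * p for n, p in zip(counts, prices)) / sum(counts)        # about $270
        price_per_subscription = (sum(n * c * p for n, c, p in zip(counts, circulations, prices))
                                  / sum(n * c for n, c in zip(counts, circulations)))      # about $91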

    The shifts in the distribution of circulation are attributable to more than the influx of new, small-circulation journals. Increased prices had a spiraling effect. As mentioned above, the average number of personal subscriptions per scientist dropped more than 50 percent over a twenty-year period. Had the average remained constant, there would be about 19 million more personal subscriptions than there actually were in 1995. Even at modest personal subscription prices, publishers undoubtedly lost billions in annual revenue from cancelled personal subscriptions, in which case they probably tried to recover the lost revenue through exceptionally high price increases to libraries. They would have been able to do this because library demand is much less sensitive to price changes than personal subscription demand.[12] Both personal and institution (library) prices jumped dramatically in the late 1970s due to high inflation, fluctuating international exchange rates, and other factors. When this happens, subscriptions can decrease even though the number of scientists interested in a discipline continues to increase. With small-circulation journals, decreases in circulation result in an accelerated increase in cost per subscription. For example, if circulation decreases by 100 subscribers from a 2,500 level, the cost to publishers at 2,400 subscribers would be $6 more per subscriber. However, a 100-subscriber decrease from 500 to 400 subscribers would require an increased cost of $186 per subscriber in order to recover costs. Examples of required cost increases are outlined in Table 8.4:

    Table 8.4: Required Publishing Cost Increases at Various Decreases in Circulation: 1995 Dollars
    Circulation Decrease Required Cost Increase
    2,500 to 2,400 $6
    2,000 to 1,900 $8
    1,500 to 1,400 $18
    1,000 to 900 $41
    500 to 400 $186
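
    The pattern in Table 8.4 again reflects a roughly fixed first-copy cost being spread over fewer subscribers. The figures below are back-calculated from the 1995 quartile data in Table 8.2 as an illustration, not taken from the cost model itself, so they match the table only approximately.

        # Illustrative back-fit: fixed first-copy cost spread over a shrinking subscriber base (1995 dollars)
        FIXED_1995 = 371_600   # approximate fixed cost implied by the Table 8.2 prices and circulations

        def required_increase(subs_before, subs_after):
            return FIXED_1995 * (1 / subs_after - 1 / subs_before)

        for before, after in ((2_500, 2_400), (2_000, 1_900), (1_500, 1_400), (1_000, 900), (500, 400)):
            print(before, after, round(required_increase(before, after)))   # ~6, 10, 18, 41, 186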

    Thus, the accelerated publishing cost increases can result in corresponding price increases and further decreases in circulation, leading to higher costs and in turn, by necessity, to spiraling prices. Since personal subscriptions are much more sensitive to price changes than library subscriptions, the spiraling effect was initially observed with personal subscriptions.

    Even with these reasons for the price increases of the past few decades, other factors must contribute as well. One explanation is that publishers have grown substantially in terms of the number of journals published. Some of this is due to publishers starting new journals and "splitting" journals into two or more when they increase in size, although the trend in recent decades has been to let them grow in size. Another factor has been growth through mergers. McCabe (this volume) provides evidence that such growth results in higher prices of journals due to market power. We believe that labor-intensive organizations such as publishers tend to have relatively higher support costs as they grow in size. In our cost model for 2002, we estimated support costs to be about $198,000 or 29 percent of all costs. Others have speculated that commercial journal publishers are making an exorbitant profit by increasing prices, although this has yet to be proven for all commercial publishers. Furthermore, net revenue may also be positive for some society and other non-profit publishers. Case (this volume) emphasizes the importance of competition among publishers in order to minimize the potential for monopolization of the system.

    8.7 Factors That Affect Demand

    Clearly, demand for scientific journals is affected by price, but other factors affect demand as well. Scientists are willing to pay more for better journal attributes such as special electronic journal features, quality, speed of publishing, comprehensiveness and relevance of articles, and reputation of authors. In fact, studies in the 1970s suggest that such attributes were more important at that time than price. Our studies have shown that availability and relative cost of alternative sources of information determine to a large degree whether or not scientists and libraries will purchase journals. For scientists there are three types of alternative information sources. One alternative, discussed in Odlyzko (this volume), involves information from other research that has led to the research reported in an article or from near equivalent research done by others. A second alternative source exists because research results are often reported via a number of different channels, such as discussions, presentations, conference proceedings, technical reports, patents, and books, in addition to journal articles. A third alternative source involves the many distribution means and media in which journal articles are found. Alternative distribution means from which scientists can choose include personal subscriptions, library subscriptions, and separate copies of articles such as preprints, reprints, interlibrary loans and document delivery, and copies provided by colleagues, authors, and others. These distribution means can be in paper, electronic, or microform. The point is that numerous combinations of distribution means and media are used by scientists based on their assessment of availability and relative access costs.

    Sources of articles that are read have changed dramatically over the years as shown by the proportion of readings from three sources in Table 8.5:

    Table 8.5: Proportion of Readings of Articles by Source of Articles Read: U.S. 1977, 1993-1998, 2000-2003
    Proportion of Readings by Years of Observation
    Source of Article 1977 1993-1998 2000-2003
    Personal Subscriptions 68.4% 27.5% 31.7%
    Library-provided 14.7% 55.0% 52.7%
    Other 16.9% 17.6% 15.6%
    Total 100.0% 100.0% 100.0%
    NOTE: Source: Tenopir and King 2000, University of Tennessee, Oak Ridge Nat'l Lab., Drexel University and University of Pittsburgh

    Clearly, scientists are reading less from their personal subscriptions, which undoubtedly is due to their subscribing to fewer journals (5.8 per scientist in 1977 to 2.4 in 2000-2003). Library-provided articles have been the alternative source of choice. The proportion of readings from other sources (e.g., shared department collections, colleagues, and authors) has remained consistent over the years. Few of these readings are currently from author web sites or preprint archives.

    Our cost studies show that there is a break-even point in the amount of reading above which it is less costly to subscribe to a journal and below which going to the library or to the author is less expensive. The break-even point, of course, is higher with higher prices. By knowing the distribution of sources among journals, we have determined the sensitivity of demand to personal subscription prices. We have also shown that scientists' time is an important component in the cost equations, and that scientists generally behave in an economically rational manner in deciding whether or not to purchase a journal. For example, distance to the library also affects the break-even point and the purchase of journals. As corroborating evidence, we have observed that

    • Scientists close to libraries purchase fewer personal subscriptions than those further away (e.g., 1.8 subscriptions per person for those less than ten minutes away versus 2.6 for those further away).

    • Scientists close to libraries and shared department print collections read more from these sources than from personal subscriptions (e.g., 91 percent of readings by those less than 5 minutes away; 65 percent for those 5 to 10 minutes away; 43 percent for those more than 10 minutes away).

    • Even with availability of electronic personal subscriptions, most scientists prefer to subscribe to print versions. This may be because, as we have observed, it takes them less time to browse current print journals than electronic versions. However, when library journals are available online, scientists prefer to browse these journals online because it saves nearly 15 minutes per reading by not having to go to the library to browse or obtain older articles.

    It is clear that the relative cost of alternative sources is important and that scientists' time is an essential component of cost that must be kept in mind. Now that scientists can obtain some copies of articles online, the choice is complicated somewhat. However, as will be discussed later, amount of reading from a journal and scientists' time both remain dominant factors in the decision.

    Libraries are faced with similar choices between purchasing (in paper or electronic media) or relying on obtaining separate copies of articles. The amount of reading of specific journals, their price, and the cost of obtaining separate copies are all important factors which should play a role in decision-making.[13] Over time, scientists pretty well know how much they will read a journal, but it is more difficult for libraries to establish the extent to which individual journals are used, particularly with electronic journals. With print versions, common practice is to ask library users to leave journal issues and bound volumes on the table to be counted when re-shelved (or to use circulation bar codes). A weakness in this method of observation is that use of an issue (or bound volume) may involve reading of several articles and all readings should be counted when deciding between purchase or obtaining separate copies of articles. However, reasonable adjustments can be made to the use data.
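
    The break-even logic can be made concrete with a small, hedged example. The subscription price, processing cost, and per-copy figures below are assumptions introduced for illustration only (the text quotes document delivery at roughly $15 to $30 per article); an actual decision should use a library's own readership and cost data.

        # Hypothetical break-even between subscribing and obtaining separate copies (assumed figures)
        subscription_price = 750       # assumed annual subscription price for the journal
        processing_cost = 90           # assumed library cost to receive, process, and shelve the subscription
        cost_per_separate_copy = 20    # assumed interlibrary loan / document delivery cost per article

        breakeven_readings = (subscription_price + processing_cost) / cost_per_separate_copy   # 42 readings per year
        # If expected annual readings of the title exceed this figure, subscribing is the cheaper option.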

    8.8 What Are We Really Buying?

    We mentioned earlier that scientists consider journal attributes to be important in their decision-making process and that availability and relative costs of alternative sources of information are important as well. Another perspective is that scientists are buying two product components: (1) the information contents and their attributes and (2) combinations of distribution means and media. With traditional scientific scholarly journals (and articles) the information contents and attributes remain the same regardless of combination of distribution means and media used.[14] Furthermore, article processing cost required to provide the information contents is essentially the same regardless of distribution means and media. That is, regardless of where scientists obtain articles—from personal subscriptions in paper or electronic medium, library-provided articles in paper or electronic medium, or in separate copies from a database, colleague, or author—the article processing cost is about the same for all distribution alternatives. Thus, one can ignore the article processing costs and focus on the costs of the alternative distribution means and media.

    First, just a note of clarification concerning the article processing costs. In the literature one finds widely varying estimates of these costs, say, from $400 per 20-page article (Harnad, quoted in Halliday and Oppenheim, this volume) to $8,000 per article in mathematics journals (Odlyzko, 1995). The lower estimates tend to be made by those publishing exclusively electronic journals and who are strong advocates for doing away with the paper medium. Yet, in a sense, making such cost comparisons is a moot point because journals in which costs are as low as $400 per article could just as easily be distributed in paper issues at the additional cost of reproduction and distribution (i.e., about $40 to $50 per subscription). Thus, in order to at least break even, the price that publishers charge would have to recover two components of their cost: (1) article processing to provide information content (i.e., anywhere between $400 and $8,000 per article) and (2) the cost of the distribution means/media of the version preferred by users. Obviously, distribution cost by electronic media is negligible, whether through access by subscription or by separate copy of articles. Paper distribution of subscriptions tends to be in the $40 to $50 per subscription range, compared with paper distribution by interlibrary loan or document delivery, which tends to be in the $15 to $30 per article range (see also Spinella, this volume).

    Thus, based on the added $40 to $50 per subscription of paper distribution, it might seem that electronic distribution would always be the alternative of choice. However, when amount of reading and costs to users other than the price paid are taken into account, the choices are not so clear. For example, most of the readings of current articles are identified through browsing for the purpose of keeping up with the literature. Assuming the $50 paper distribution cost and that a scientist reads 50 articles from a year's subscription, the distribution component of the price would cost the scientist only $1 per reading versus near zero cost for electronic access. Yet when the cost of scientists' time for browsing and equipment are included, it appears that the paper version costs less per reading or is very close to that of the electronic version. Other aspects of the two versions could then prevail in decision-making, which may explain why scientists overwhelmingly choose print over electronic personal subscriptions, but electronic over print for use of library collections.
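
    The comparison can be stated as a hedged per-reading calculation. The browsing-time difference and the value placed on a scientist's time below are assumptions introduced purely for illustration; the text reports only that browsing current print journals takes somewhat less time than browsing electronic versions.

        # Hypothetical per-reading comparison for a personal subscription (assumed time and wage values)
        readings_per_year = 50
        paper_distribution_cost = 50      # added reproduction and distribution cost of the paper version
        value_of_time_per_minute = 1.00   # assumed value of a scientist's time (about $60 per hour)
        extra_minutes_on_screen = 2       # assumed extra browsing time per reading for the electronic version

        paper_cost_per_reading = paper_distribution_cost / readings_per_year                # $1.00
        electronic_cost_per_reading = extra_minutes_on_screen * value_of_time_per_minute    # $2.00 in reader time
        # Under these assumed values the paper personal subscription costs less per reading,
        # consistent with the preference for print personal subscriptions noted in the text.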

    Similar arguments can be made for library decisions concerning purchase of paper or electronic subscriptions or access to separate copies of articles. Here the unit cost per reading of paper distribution can also be negligible because readings number in the hundreds for some journals. Thus, again, libraries can choose one or both versions depending on factors other than the price paid and the cost of processing electronic or print issues.

    Of course, publishers do not distinguish between the information content and distribution components of price. However, Harnad and others have suggested that authors or their funders pay for the information content (i.e., article processing) and then journals would be "free," since articles would be distributed electronically (Halliday and Oppenheim, this volume). This suggestion ignores the potential desirability of the paper distribution medium that might be less expensive to some users and/or preferred for some other reason.

    The point is that there is some merit in distinguishing between the information content and distribution components of costs/prices. The article processing costs have remained relatively stable or perhaps decreased some over the years, and these costs are now recovered primarily from library budgets, versus an earlier combination of lower library payments and payments by scientists through subscriptions, albeit often from discretionary funds provided by their employers. This transfer of cost recovery from scientists to libraries resulted in publishers being publicly criticized or blamed for spiraling prices, libraries paying more for less information, and scientists paying more in the scarce resource of their time. Funders of the scientists and libraries are questioning the whole process, even though in fact they may be paying less in cost per reading considering all resources expended.

    One can make a strong argument for authors' funders paying the information content costs, since they already pay for authors' time. The 2003 survey at the University of Pittsburgh yielded an estimate of 95 hours of scientists' time per article authored. Thus, their funders appear to pay far more than the cost to publishers of processing articles. At least two initiatives are trying this approach to publishing. The Public Library of Science proposes to charge $1,500 per article (not too different from our $1,660 article processing model cost above) and BioMedCentral proposes a $500 per article fee (with some institutional membership alternatives).

    However, for such initiatives to be widely accepted, all system participants must be convinced of the economic incentives involved. We believe that this can be achieved by understanding the flow of funds among the participants. For example, preliminary analysis of the flow of funds from sources (e.g., government, universities, industry, etc.) to organizational R&D performers and then to authors and readers suggests that authors' funds come roughly from the following sources: industry (25%), government (33%), foundations (7%), and solely universities (35%). However, readership funds are not allocated in nearly the same proportions (i.e., solely university [20%]; universities funded through external sources [4%]; and solely non-university [76%]).

    There are other important aspects of the flow of funds as well. For example, where do publishing R&D funds come from—government, foundations, commercial investors, and so on? What is the international "balance of information" determined by authorship and reading? For example, the 2003 University of Pittsburgh survey shows that 9 percent of articles read by these scientists are authored by US scientists, 24 percent by non-US scientists, and 7 percent are collaborations between US and non-US scientists.

    8.9 Era of Site Licensing and Package Arrangements

    The previous sections have dealt largely with the traditional journal system and pricing policies. We have tried to describe the journal system environment and what led to spiraling journal prices in this environment. Recently, however, with the growth of electronic publishing, publishers and libraries have taken a new approach to their participation in the system through site licenses involving multiple journal packages over an extended period of time, say up to five years. This is a form of economic bundling. The multiple journal packages are sometimes negotiated directly between a library and a publisher, but more often libraries have formed or used existing library consortia to negotiate arrangements between groups of libraries and publishers, or libraries have made arrangements through aggregators. Such arrangements have proven to be beneficial in many ways to both libraries and publishers, not the least of which is that libraries can plan their budgets more accurately (often with lower prices) and publishers can build a steadier revenue flow (King and Xu, 2003).

    An example is given below concerning the many sources of journals used by libraries in electronic journal acquisition. In 1998, the medium-sized W.W. Hagerty Library (Drexel University) had gone through a phase in which many of its high-price core journals had been cancelled and its acquisitions were down to about 1,700 titles at an average price of $120 per title. The new Library Dean, Carol Montgomery, and the university administrators decided to migrate to a nearly all-electronic journal collection. In fact, by 2002 the Library acquired only 370 print journals and 8,600 unique electronic journal titles (Montgomery and King, 2002; Montgomery, this volume). The Library made several different arrangements that are categorized as follows:

    • Individual subscriptions. Almost always purchased from a subscription agent (e.g., Wiley titles, specialty design arts titles).

    • Publishers' packages. May or may not be a part of a consortium or from the publisher directly (e.g., Science Direct, Kluwer titles).

    • Aggregator journals. From vendors that provide access to different publishers' journals. The aggregators do not drop content, only add (so far). The collections started as full-text content and added searching (e.g., JSTOR, MUSE).

    • Full-text database journals. Provide access to electronic journals from different publishers but do not make title or issue level access available (except ProQuest). Examples are WilsonSelect and Lexis/Nexis. Titles are added or removed regularly according to the database vendor's contracts with publishers. They often have an embargo on current issues of six months or more. There is considerable overlap among the journals in these collections, and between the full-text database journals and the other two types.

    This example demonstrates the complexity resulting from site licensing and the various kinds of arrangements that can be made.

    These arrangements meant that there was an overlap in electronic titles acquired (e.g., 13,500 total titles, but only 8,600 unique titles in 2002). As a result, many acquired electronic journals are not used and some of the cancelled high-price journals have very high use (for other observations see Davis, 2002; Nicholas and Huntington, 2003; Sanville, 2000). The price per title varied among the four types of arrangements made: $432 per title for individual electronic subscriptions; $134 per title for publishers' packages; $60 per title for aggregator journals; and $6 per title for full text database journals. However, the migration to electronic journals has affected library costs in many more ways than the price paid for the journals (King et al., 2003; Montgomery and King, 2002). The library operational costs and staffing patterns have shifted. For example, the electronic journal collection has required higher costs for collection development and license negotiation, training of staff and users, reference support, and equipment and systems. On the other hand, print input processing and space costs are down, as are reshelving, photocopying, and directional reference costs. On balance, overall operational costs are less for the electronic journal collection than for the print collection.

    A particularly revealing way to examine the effects of an electronic journal collection is to compare the cost per use of the alternative collection services; that is, access to the electronic, current periodicals and bound volume collections. Drexel obtained publisher and vendor online use statistics and maintained its own server use counts by journal title. Drexel also observed reshelving counts for the current periodicals and bound volume collections. However, there are well-documented flaws with such methods, and thus measured electronic use is not fully comparable to measured print use. We also obtained estimates of the amount of actual reading from user surveys that, while flawed as well, at least provide a common measure of use for the three access services (King and Montgomery, 2002). The costs per reading (including price paid and operations) are: $2.00 per reading of the electronic collection; $3.90 per reading of current periodicals; and $23.50 per reading of bound volumes. One particularly important cost is that of users' time: users may save as much as 24 hours per year per person by having external access to library journals.

    The Drexel situation is unique in that the library migrated to a nearly all-electronic collection. Most libraries are not doing this, but rather maintain some duplication of print and electronic collections due to concerns about the viability of long-term archives. The problem with large duplication is that electronic collection use dominates (i.e., over 80% of library use at Pittsburgh and Drexel). Thus, the cost per use increases substantially for print collections. In fact, if Drexel had also continued its core print collection, the cost would probably be about $7.80 per reading versus $3.10 under the strategy Drexel chose. While there are journals in the large electronic collection that are infrequently read, the overall subscription and processing costs of the electronic collection are less than the cost would have been had Drexel continued its core print collection and not acquired an electronic collection.

    Thus, the impact of site licensing and multiple journal arrangements appears to be highly advantageous to libraries and their users. However, the long-term advantages to publishers are not as clear. Ultimately, as with single subscriptions, publishers must recover the high cost of processing articles and any other related activities. Generally, declines in the reproduction and distribution costs of print journals have been counter-balanced by extensive computer and systems costs, so that large costs must still be recovered. The question then becomes whether the many, varied license arrangements can produce sufficient revenue over time to cover these costs. While long-term licenses help reduce revenue volatility, there is no guarantee that license policies provide the solution to the library and publisher problems.

    8.10 Some Alternative Pricing Policies

    One way in which the two cost/price component approach can be addressed is with site licenses. We have suggested one possible scheme to achieve this type of site license, as detailed below:

    • The license would cover the price paid for all journals provided to the organization by the publisher, regardless of whether the organization's library, department, or any employee subscribes to the journal.

    • The library and publisher would establish the current subscription cost of all print subscriptions to the publisher's journals in the organization.

    • The library would estimate the total readership in the organization of the currently purchased journals and estimate the subscription cost per reading (i.e., current revenue divided by total readings).

    • The first annual access cost would be this current total subscription amount.

    • Any electronic access to currently purchased print journals would be free. Electronic access to any other journals available from the publisher would be at the calculated cost per reading plus some allocated support costs.[15] Distribution of paper issues from any of the journals would be at the reproduction, distribution, and allocated support costs.[16]

    • During the first year, each access to the articles would be counted electronically and used as a basis for future charges on a cost per reading basis (a minimal sketch of this calculation follows the list).[17]

    • The publisher must agree to ensure future access to all the journals covered by the term of the agreement, thus permitting the library to discard all relevant paper issues.[18]
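
    The following sketch shows how the charges in such a license might be computed. The spend and readership figures are assumptions for illustration; the 29% and 15% support allocations follow notes 15 and 16, though exactly how a publisher would apply them is itself an assumption here.

        # Hypothetical first-year calculation under the proposed site license (assumed figures)
        current_print_spend = 120_000    # assumed current subscription spend with this publisher
        estimated_readings = 30_000      # assumed annual readings of those journals across the organization

        first_year_access_fee = current_print_spend                   # equal to current subscription spend
        rate_per_reading = current_print_spend / estimated_readings   # $4.00 per reading

        support_on_processing = 0.29     # allocated support on article processing (note 15)
        other_title_charge_per_reading = rate_per_reading * (1 + support_on_processing)   # about $5.16

        repro_dist_per_issue = 45 / 10.8   # reproduction and distribution per issue, from the 2002 model figures
        paper_issue_charge = repro_dist_per_issue * 1.15   # plus about 15% allocated support (note 16)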

    This type of site license provides advantages to every participant. While libraries and their constituents pay the same amount to publishers as they do now, they achieve considerable savings in input processing, storage and maintenance (e.g., approximately $90 per subscription for a large library and $125 for a small one). They also save an estimated $1.43 per reading by avoiding current reshelving, directional reference and photocopying costs which, for a frequently-read journal, can be as much as the subscription price. Libraries also save on interlibrary borrowing or document delivery costs from journals in the publisher's database that they did not purchase. Finally, the library has the option to retain certain current periodicals or department collections in paper. These savings exceed any advantages that might have been achieved from reduced electronic journal prices.

    Publishers have the advantage of retaining any cost savings they might obtain from electronic publishing, plus they receive additional revenue from distribution of electronic separates, either from their digital databases or royalties from document delivery services, that previously took place outside of their control.

    Readers benefit by having the choice of obtaining articles in paper or electronic versions, both at substantial savings in their time and to their parent organizations. In other words, by this kind of negotiation, publishers win, libraries win, readers win, and funding sources win.

    This kind of agreement, of course, may have downsides, but it is given to demonstrate the need to arrive at arrangements that can be beneficial to all participants in order to end the adverse effects of traditional pricing strategies.

    Another pricing approach is to extend current price differentiation to reflect potential readership by purchasers. Varian (1996) argues that small niche markets, which accurately describe most scholarly publishing, are generally not well served if the producer is required to charge a uniform, single price. As mentioned earlier, purchasers/users always have alternative sources available to them if cost per reading is too high. Thus, amount of reading serves as a useful means for identifying classes of purchasers for differentiation. In fact, negotiating "bundles" of journals can achieve this objective. Furthermore, electronic journals provide a useful vehicle for charging on a transaction or potential transaction basis.

    In another vein, Getz (1999) has suggested that readers be given personal debit accounts with libraries to access separate copies of articles. This would permit scientists to order separate copies from services depending on attributes of speed, image, quality, and accessibility that are provided at appropriate prices. This interesting notion, of course, can be extended to subscriptions in print or electronic media and other related services as well. Getz feels that such an account would end up serving users more effectively and relieve libraries of some clerical activities. The examples given involve academic libraries but are even more feasible in a special library environment.

    Several alternative approaches to distribution that will require careful pricing policies are presented by others in this book. For example Halliday and Oppenheim (this volume) discuss three alternative models: one that follows traditional print, without the reproduction and distribution of print; one suggested by Harnad in which authors bear the article processing costs by producing and archiving the articles and providing them free of charge on the web (although recently he advocates institutional archiving); and a free-market model suggested by Fishwick and colleagues. Hunter (this volume) and Gazzale and MacKie-Mason (this volume) explore results of the PEAK experimentation. Hunter presents some innovative approaches to pricing and their advantages and disadvantages. Gazzale and MacKie-Mason examine three access products, how they are used, and what they cost users. Case (this volume) discusses the SPARC initiative and argues its merits.

    All of these and other approaches warrant detailed examination, but one must keep in mind that the scholarly journal system has been successful because it has achieved certain minimal objectives, including

    • serving as a means of communication of new, peer-reviewed, and edited information. Thus, the information should be trustworthy and, to the degree possible, supported by other research findings;

    • being readily available to readers and accessible to an unlimited audience beyond the author's primary or immediate community;

    • providing permanent, locatable, and retrievable archives for the information, since many articles are read years after they are published;

    • continuing to provide alternative distribution means and media so that authors and readers can choose from alternatives that satisfy their specific needs and requirements, particularly to minimize their time and effort;

    • protecting against plagiarism, copyright ownership violation, and unauthorized modification or altering of the record of ideas, discoveries, and hypotheses tested;

    • properly conveying the concept of prestige and recognition for authors, their research, and their institutions.

    Any proposals for changes in pricing policies or other modifications in the scholarly journal system should take such desirable objectives into account. Then the system use, usefulness, and value will be maintained and future pricing can be an opportunity and not a threat.

    Notes

    1. Surveys involved national probability samples of scientists (1977, 1984), audiences of Science and the Journal of the National Cancer Institute, and samples of scientists in organizations such as the National Institutes of Health, AT&T Bell Labs, Oak Ridge National Lab, The Johns Hopkins University, University of Tennessee, Drexel University, University of Pittsburgh and American Astronomical Society members. There may be some bias in organization surveys because the organizations are self-selected.

    2. Estimates of readership of articles by this survey method are in fact biased on the low side because they miss readings that take place after the survey responses, they do not include readings of separate copies of articles (over 100 million currently), and they miss other article distribution means.

    3. All uncited data come from Tenopir and King (2000) or are new, unpublished results.

    4. Of course, it may be that intelligent professionals read more and get more recognition for their work, but the latter for their intelligence, not necessarily because they read a lot. Regardless, it shows that this resource is important to them.

    5. For example: King, D.W., D.D. McDonald, N.K. Roderer, and B. Wood. 1976. Statistical Indicators of Scientific and Technical Communication. (1960-1980): Vol. 1 A Summary Report. GPO 083-000-00295-3 and King, D.W. and N.K. Roderer. 1978. Systems Analysis of Scientific and Technical Communications in the U.S.: The Electronic Alternative to Communication Through Paper-Based Journals. NTIS: PB281-847.

    6. The increase in total cost is largely attributable to an increase in estimated number of scientists who are active in research, teaching, and/or other endeavors that involve reading scholarly journals; i.e., 2.23 and 6.38 million scientists in 1975 and 1998 respectively. Estimates of number of scientists are inexact (see Science & Engineering Indicators-2000, p.3-3 to 3-5).

    7. (e.g., Halliday and Oppenheim this volume, Holmes 1997, Marks 1995, and Shaw and Price 1998)

    8. The cost model also included 20 fixed and variable cost parameters such as setup costs associated with each issue, cost per page of editing and proofing.

    9. We use 1995 data in Table 8.2, 8.3 and 8.4 because we had better data on circulation in 1995. Also, introduction of licenses and negotiated packages of journals has diminished the meaning and count of circulation. In 2002 we estimate average circulation to be about 4,800 subscriptions.

    10. The tracking process took into account births, deaths, and splitting of journals into two or more journals.

    11. There is a small distortion in the 1975 average circulation in that calculation from the data gives 6,300 subscriptions per title, but the average calculated from the sampled journals was 6,100.

    12. See Tenopir and King (2000) for detailed evidence of this phenomenon.

    13. Detailed examples of economic break-even points are given for decisions with personal subscriptions vs. use of the library and library subscriptions vs. obtaining separate copies in Tenopir and King 2000.

    14. Of course, there are some attributes achievable through technology, such as links to back and forward citations, searchable databases, numeric data sets, moving graphics, and so on (Halliday and Oppenheim, this volume; Tenopir et al., 2003).

    15. Support costs vary greatly among publishers. Our average is 29% above direct article processing costs. Halliday and Oppenheim (this volume) present other amounts.

    16. We have observed allocated support costs of about 15% on direct reproduction and distribution costs.

    17. Of course, one must establish what constitutes a "reading" based on electronic use as pointed out in Odlyzko (this volume).

    18. The question of archiving journal articles is a contentious one between libraries and publishers, but it must ultimately be resolved. Some are proposing institutional archiving (see, for example, Harnad's September Forum).

    9. Economic Models of Digital-Only Journals

    We are exploring economic aspects of digital-only journals using Ithink Analyst, a modelling software package. We have produced three models and have used simulations to test model sensitivities. We will first describe some background to the models. We will then describe the software tool and how we used it. We will then describe each model in turn and, finally, describe our plans to further develop models of digital journal production and delivery.

    9.1 Background

    Much development of digital journals, especially digital parallels of print journals, has been conducted by commercial publishers. Their pricing models do nothing to address the serials crisis. More innovative pricing models have been developed by stakeholders from within the higher education (HE) community (e.g., Harnad and Hemus, 1997; Harnad, 1995b; Fishwick et al., 1998; Harnad, 1996). They seek an effective and affordable system for disseminating peer-reviewed scholarly articles. Their models often bypass commercial publishers; in other words, the journals are produced by the HE community. Proponents of these models claim that digital publishing can be significantly cheaper than print publication. They argue that as much as 70% of the total cost of journal production and distribution is incurred by printing and distributing print copy and that this is saved in a digital environment (Harnad, 1995; Duranceau, 1995; Harnad and Hemus, 1997; Harnad, 1996). This is contested by publishers, who claim that variable costs, including printing and distribution, account for only 20-30% of the total (Garson, 1996; Arjoon, 1999; Noll, 1993; Rowland et al., 1995; Fisher, 1997). Some of the difference between these positions is related to the level of functionality that writers assume is necessary.

    Proponents of alternative models argue that many publisher functions are unnecessary. Their models are often based on production of unsophisticated text articles produced at significantly lower cost. This approach can be criticised for two reasons. First, journal users expect additional functionality (Elsevier Science, 1996; SuperJournal, 1999a,b). They anticipate that digital journals will allow them to work more efficiently. Users consider core features to include the ability to browse, search and print, good system performance, critical mass and currency, and the facility for seamless discovery and access (SuperJournal, 1999a; McKnight, 1997; Electronic Publishing Services Ltd [EPS Ltd]; Armstrong and Lonsdale, 1998; Butterworth, 1998; Jenkins, 1997; Rowland et al., 1997; Fletcher, 1999; Prior, 1997; Rusch-Feja and Siebeky, 1999; Petersen Bishop, 1998; SuperJournal, 1999b). User acceptance is essential if digital journals are to succeed.

    The second criticism is that the elimination of some of the filtering and organisation that is traditionally done by publishers increases the work of librarians and end users. The net effect on the academic community may be increased cost. For these reasons, we did not study an end product consisting of unsophisticated text. Our models assume the core level of functionality that users demand. The development and inclusion of this enhanced functionality requires technical skill that is expensive. Publishers claim that the additional costs more than compensate for any savings from print and distribution. They argue that digital journals cost at least as much to produce and distribute as print journals.

    It is difficult to compare the cost of digital and print journal production and distribution. Publishers are reluctant to disclose costs. Even if they did so, it would be difficult to compare journal costs across companies because different accounting practices are employed. The publishing industry does not employ activity-based costing. There has been academic work on activity-based costing of print journals, notably that of Carol Tenopir and Don King (see Chapter 8 and Tenopir and King 2000). The costs associated with digital publication are, as yet, unknown. The activities involved in digital publishing have yet to stabilise, making it difficult to determine costs.

    We are building activity-based models so that we can develop a better understanding of the production and delivery of digital-only journals and of the different roles and costs involved in that process. These models also allow us to explore alternative cost-recovery and pricing mechanisms.

    To date, we have built and tested three models of digital-only journal production and delivery. These models were based on a review of the literature supplemented by personal communication with practitioners. The models were built as part of a project which evaluated economic models of a number of aspects of the digital library within a four-month period.[1] In 2000, Leah Halliday conducted interviews with several stakeholder groups and revised the models in line with the data that she collected. The results suggest that publication is most efficiently undertaken by professional publishers within an organisation that is dedicated to journal publishing (Halliday and Oppenheim, 2000a, 2000b, 2001a, 2001b).

    9.2 The models

    We will now describe the three models that we have developed. Journal production and delivery is an international business, but these models were built from a UK perspective. Thus, for example, staff costs are based on UK figures and where value added tax is applicable in the UK it is applied at the rate of 17.5%. Where we quote figures, however, we have converted them to US$ at the exchange rate in February 2000.[2]

    We refer to the first model as "traditional". It models a process similar to that of print journals. This model is included for comparison with current practice but does not include production of print. In this traditional model, authors, referees and editors are unpaid. Editors receive from the publisher only a contribution towards editorial office costs. Production and delivery costs are recovered through sales of subscriptions and individual articles. The model differs from print production in that the entire editorial process is conducted electronically and the product is delivered to libraries in electronic form.

    The second model is of a non-commercial journal that is available for use free of charge on the Internet. This model is based on the work of Stevan Harnad (Harnad and Hemus, 1997; Harnad, 1995b,1996). His model is based on the premise that academics submitting papers to journals for publication seek to disseminate their findings widely and would contribute to costs to facilitate widespread dissemination. In a print environment, it was necessary to accept access restrictions because print publication is expensive and publishers had to recoup their costs. In a digital environment, Harnad argued, costs can be reduced by as much as 70%, bringing them to a level that can be recovered from authors rather than subscribers. Harnad proposed that authors pay page charges and that journals be available to all users free of charge on the Internet. He suggested that the author fee should be around $400 for a 20-page article. Recovering costs from authors would actually contribute to cost reduction as subscription administration would be unnecessary.

    The third model is a free-market model. It is based on a supporting study commissioned by the UK Electronic Libraries Programme (eLib) and conducted by Fishwick et al. (1998). Fishwick et al. compared a number of different models for pricing electronic scholarly journal articles. Their report suggested that the current academic information delivery chain is inefficient due to a number of distortions in the supply-demand chain. Among these are that: (1) authors represent a principal source of demand for publication but make no contribution to publication costs; (2) those consuming the information, i.e. the readers, seldom pay for it, preferring instead to obtain it from libraries; and (3) much of the journal publication work is undertaken by editors and referees without payment, or with minimal honoraria.

    Fishwick et al. proposed an alternative model which introduced `normal' market feedback mechanisms into the academic information delivery chain with a view to developing an efficient market for scholarly articles. Publication would be funded by a combination of author submission fees and sales of subscriptions and/or individual articles. Thus, both authors and users would contribute to costs, reflecting the fact that both contribute to demand. Editors and referees would be paid to encourage efficiency, and authors would receive royalties. Fishwick et al. argued that if authors paid to have their work published and received royalties based on the number of copies sold, they would submit for publication only their best work. Rather than publishing as many papers as possible in `minimum publishable units' to maximise their perceived research output, they would concentrate their best work in fewer, high-quality papers, submitting for publication only material of the highest quality. The system includes a mechanism to support authors who cannot afford to pay a submission fee. The editorial office would apply to charitable foundations to fund these papers. Papers would then be available individually or in customised bundles from the publisher database.

    Fishwick et al. also suggested that the facility to print from digital journals be rationed even when the library obtains a journal or database of articles by site license and thus has paid in advance for unlimited access by end users. They argued that this would force end users to identify and select only journal articles that they genuinely read rather than filtering after printing. This would generate usage data for librarians (and possibly publishers) that would reflect real need, argued Fishwick et al.

    This recommendation suggests that end-users currently waste resources by gathering information that they do not need. Given that researchers' time is scarce, this seems unlikely. Rather than making the system more efficient, rationing might prejudice researchers' ability to do their jobs. This is a potential practical problem. There are also cultural barriers to the market model. It is important to some academics in their roles as authors, editors and referees, that scholarly publishing operate independently of market forces. They believe that direct financial remuneration introduces motives that have no place in the system (L. Halliday, unpublished data 2001).

    All three of our models represent the full publication cycle, from receipt of manuscripts by the editor to delivery to end users. The resources required to produce and deliver journals are similar in each model. Staff costs are the most significant. All of the models include two half-time staff responsible for production and systems. In the market model, where editors and referees are paid, the total financial cost is substantially higher than in the other models. We included an overhead on staff costs which represents, for example, buildings and support such as personnel and training, i.e. resources that are not related directly to products such as journals. We pitched the overhead rate at 120%, which reflects true costs in a large organisation such as a university. As these alternative models are proposed as HE-based operations, we think it realistic that they be costed as if housed in universities. It is important to recognise that work undertaken without charge is not necessarily cost-free. In economic terms, production that distracts an academic from her/his core tasks, i.e. research and teaching, may be more expensive than production undertaken by someone with the required skills who is dedicated to journal production. Nevertheless, we recognise that it may be possible to produce journals in a leaner organisation, so we also applied the overhead at 60% and re-ran the model simulations for comparison. We also varied the surplus applied from zero to 20% in two of the models, assuming that some surplus would be required for development of the journal. The free-access model is much leaner and does not include a surplus; development would have to be funded through grants or other sources.

    9.3 Modelling software and simulations

    The software package we used is called Ithink Analyst.[3] Four key element types are used to build Ithink models.

    Stock

    A stock represents an accumulation. Items accumulate by flowing into and/or out of the stock (see the description of the `flow' below). At each time period in a model simulation, the total content amounts to the inflow minus the outflow. In many of the stocks represented in our models the inflow and outflow are equal. For example, a journal editor receives a number of manuscripts every year. Of those, he or she rejects a very small percentage and the remainder are sent for peer review; the same number of manuscripts enter and leave the editorial office.

    Flow

    A flow either fills or drains a stock in the direction of the flow arrow. A cloud at either end of a flow indicates an infinite source or destination of the material flowing to or from a stock; in other words, the source of material passing through the flow is beyond the scope of the model.

    Converter

    A converter informs other elements in the model. It may contain a constant value, e.g. tax at 17.5%; an incremental value, e.g. a value of 1 in year 1 that rises by 1 in each subsequent year; a variable that can be manipulated by a model user; or an algebraic relationship between different elements in the model.

    Connector

    A connector is like a wire which transmits information between elements in a model. For example, in Figure 9.1 the flow labelled `xfer to ref' represents the number of manuscripts that are sent to referees to be reviewed. The value of this flow is determined by the number of manuscripts received by the editor (`MS received') and the number rejected immediately, e.g. because the subject is unsuitable; these values are conveyed to the flow by connectors. A minimal sketch of this logic in code follows Figure 9.1.

    Figure 9.1: A chunk of model consisting of one stock, three flows and two connectors.
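    To make the element types concrete, the following sketch, written in Python rather than Ithink, steps through the editorial-office example of Figure 9.1. The manuscript volume and immediate-rejection percentage are illustrative assumptions, not values taken from the models described here.

```python
# A minimal sketch of the stock/flow/converter logic described above, using
# the editorial-office example from Figure 9.1. The numeric values are
# illustrative assumptions, not figures from the chapter's models.

def simulate_editorial_office(years=5, ms_received_per_year=600,
                              immediate_reject_rate=0.05):
    """Stock: manuscripts held by the editor.
    Flows: 'MS received' (inflow); 'immediate reject' and 'xfer to ref' (outflows).
    Converters: the constant arguments above; connectors carry their values."""
    editor_stock = 0.0
    history = []
    for year in range(1, years + 1):
        inflow = ms_received_per_year                     # flow into the stock
        rejected = inflow * immediate_reject_rate         # converter-driven outflow
        to_referees = inflow - rejected                   # the 'xfer to ref' flow
        editor_stock += inflow - rejected - to_referees   # net change is zero here
        history.append((year, inflow, rejected, to_referees, editor_stock))
    return history

for year, received, rejected, to_ref, stock in simulate_editorial_office():
    print(f"year {year}: received {received}, rejected {rejected:.0f}, "
          f"to referees {to_ref:.0f}, stock {stock:.0f}")
```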

    Each of our models consists of four interconnected sectors: content origination, publication, information brokerage, and the library function. The models all simulate production of a small journal which publishes 120 10-page papers per annum. We used Ithink to represent graphically the interrelationships that characterise each system. We then defined each element in the model numerically. Some of these definitions are equations which describe the relationship between two or more elements in the model. The bases of the equations and the assumptions in each model element are described within the model in element `documents.'[4] These can be viewed by a model user. The models are designed to be used rather than viewed: although we deliberately kept them as simple as possible, the systems modelled are fairly complex, and a whole model cannot be captured in a single page.

    Figure 9.2 is an example of the publisher sector from the market model. It includes two stocks: "publication" and "publ budget". "Publication" is the accumulation of articles published in the journal; the flow from the "origination" sector into this stock is not shown. "Publ budget" is the publisher's budget: costs ("publ spend") and profit ("publ profit") flow out of it and revenues flow in. The flows representing revenues from other sectors are not shown in Figure 9.2. Converters are used to calculate various values in the model. For example, total publishing costs are calculated with reference to publication costs, editorial costs and the overhead applied to those. The total publication cost informs the authors' contribution, i.e. the value of the author fee. Converters with rectangular buttons in them (e.g. "overhead" and "profit margin") represent values that are determined by the model user. A simple rendering of this converter algebra follows Figure 9.2.

    Figure 9.2: The publisher sector of the `Market model'
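    The converter algebra just described can be expressed compactly in code. The sketch below is an illustrative rendering only: the cost and revenue figures are placeholders, and the author-fee line is one simple reading of how the total cost informs the authors' contribution; the actual Ithink model splits costs more finely.

```python
# An illustrative rendering of the converter relationships described for
# Figure 9.2. The cost and revenue figures below are placeholders, not the
# chapter's inputs; only the relationships follow the text: total publishing
# cost is publication plus editorial costs grossed up by the overhead, and
# the author fee is informed by that total.

def publisher_sector(publication_costs, editorial_costs, overhead,
                     profit_margin, papers_per_year, revenues):
    total_cost = (publication_costs + editorial_costs) * (1 + overhead)
    author_fee = total_cost / papers_per_year   # one simple reading of the authors' contribution
    publ_spend = total_cost                     # outflow from the 'publ budget' stock
    publ_profit = total_cost * profit_margin    # outflow from the 'publ budget' stock
    publ_budget = revenues - publ_spend - publ_profit   # stock level after one period
    return {"total_cost": total_cost, "author_fee": round(author_fee),
            "publ_profit": publ_profit, "publ_budget": publ_budget}

print(publisher_sector(publication_costs=60_000, editorial_costs=40_000,
                       overhead=1.2, profit_margin=0.1,
                       papers_per_year=120, revenues=250_000))
```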

    9.4 Results

    We varied the value of elements in each of the three models and ran a series of simulations to establish the costs and benefits for different stakeholders in manipulating elements in this way and also to identify model sensitivities. As is evident from Figure 9.2, which shows only one quarter of a model, each model has a large number of elements that could be varied. The time period of the project severely limited the number of simulations that we were able to run.[5] We will now report the results of some of those simulations.

    9.5 Traditional model

    First, we ran a series of simulations to determine the subscription price of a traditional-model journal if the following elements were varied: the overhead rate, the profit margin, and the size of the subscription base. We display the results in Table 9.1.

    Table 9.1: Traditional electronic journal model simulations to determine subscription fee if overhead rate, profit margin and subscription base size are varied.
    Overhead rate 120% 60%
    Profit margin 0% 10% 20% 0% 10% 20%
    No. of subscribers Subscription fee ($)
    200 1,062 1,167 1,274 772 849 927
    500 425 467 510 308 340 371
    1,000 212 233 255 154 170 186
    2,000 105 116 127 77 85 93
    20,000 11 11 13 8 8 9

    It is clear from these figures that a journal making a modest profit and recovering full costs can be supplied to users for a modest fee, as long as the subscription base consists of at least 500 subscribers. This gives an idea of how inexpensive journals can be without adopting an alternative cost-recovery model. We acknowledge, however, that the journal modelled is slightly smaller than the average scientific journal: ours publishes 1,200 article pages per annum whereas an average journal publishes 1,434 article pages per annum (Tenopir and King, 2000, p.237). The effect of this difference is likely to be negligible.
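    The arithmetic behind figures of this kind is straightforward. The sketch below assumes the subscription fee is simply the annual production cost, grossed up by overhead and margin, divided over the subscriber base; the base cost of roughly $96,500 is inferred from Table 9.1 rather than stated in the text, so the output matches the table only to within rounding.

```python
# A back-of-envelope sketch of the arithmetic apparently behind Table 9.1. It
# assumes the subscription fee is the journal's annual production cost,
# grossed up by overhead and profit margin, divided over the subscriber base.
# The base annual cost of roughly $96,500 is our own inference from the table,
# not a figure stated in the chapter.

BASE_ANNUAL_COST = 96_500   # assumed pre-overhead annual production cost

def subscription_fee(subscribers, overhead, margin, base_cost=BASE_ANNUAL_COST):
    return base_cost * (1 + overhead) * (1 + margin) / subscribers

for subs in (200, 500, 1_000, 2_000, 20_000):
    fees = [round(subscription_fee(subs, overhead, margin))
            for overhead in (1.2, 0.6) for margin in (0.0, 0.1, 0.2)]
    print(subs, fees)
```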

    9.6 Free-access model

    We also ran simulations to determine the level of author fees that would be required to fund the free-access model. We present the results in Table 9.2; the fee varies depending on the overhead rate and the rejection rate. These fees are submission fees, i.e. they are based on the assumption that all authors whose papers are refereed contribute to costs. It has been argued that all authors should contribute to journal costs, as some costs are related to administration and refereeing of papers regardless of whether they are accepted. It may be unrealistic, however, to expect UK authors whose papers are rejected to contribute. Some journals in the USA charge non-returnable submission fees, but UK authors are less willing to pay fees for submission or publication (L. Halliday, unpublished data, 2001).

    Table 9.2: Harnad electronic journal model simulations to determine author submission fee if overhead rate and journal rejection rate are varied.
    Overhead rate 30% 60% 120%
    Rejection rate 10% 90% 10% 90% 10% 90%
    Submission fee ($) 816 58 1005 112 1383 154
    Per Page ($) 81 6 101 11 138 15

    Harnad suggested that fees of tens of dollars a page rather than hundreds of dollars a page would be acceptable, and estimated that it would cost approximately $400 to produce a 20-page article. This gives a page charge of $20, which is insufficient to support our model. That is not surprising: ours is a model involving the employment of paid professionals to produce a journal with what we consider core functionality, whereas Harnad suggested that professional publishing staff are unnecessary and his model relied largely on unpaid contributions. Nevertheless, the fees generated by our model fall within a range that some authors consider acceptable. Acceptance of the free-access model requires authors to take a system-wide view of the costs and benefits of scholarly publishing as they affect the whole organisation, including the library (see Tenopir and King, 2000). The main barrier to implementation of this model is cultural: getting authors to accept the principle of page charges. Few journals have tested this model. One example is the New Journal of Physics, published by the Institute of Physics Publishing, whose authors pay $500 per accepted paper. Submissions to this journal have been slow, but this is the case for any new journal, and authors' reluctance to publish in NJP may be related to concerns about digital publication per se rather than to the pricing model.
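    The trade-off between submission fees, rejection rates and per-page charges can be illustrated with a small calculation. In the sketch below the annual cost to be recovered and the 50% rejection rate are assumptions chosen purely for illustration; only the journal size (120 ten-page papers) follows the models above.

```python
# A simplified sketch of the relationship between author charges, the
# rejection rate and per-page costs in a free-access journal. The annual cost
# figure and 50% rejection rate are assumptions for illustration; the 120
# accepted 10-page papers come from the models described in the chapter.

def author_charges(annual_cost, accepted_papers=120, pages_per_paper=10,
                   rejection_rate=0.5):
    refereed = accepted_papers / (1 - rejection_rate)   # manuscripts sent to referees
    submission_fee = annual_cost / refereed             # every refereed author pays
    per_page = submission_fee / pages_per_paper         # the convention used in Table 9.2
    per_accepted_paper = annual_cost / accepted_papers  # if only accepted authors paid
    return submission_fee, per_page, per_accepted_paper

fee, per_page, per_accepted = author_charges(annual_cost=130_000)
print(f"submission fee ${fee:,.0f}; per page ${per_page:,.0f}; "
      f"per accepted paper ${per_accepted:,.0f}")
```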

    9.7 The market model

    Clearly, the financial cost of producing a market-model journal is high because editors and referees are paid and authors receive royalties on their papers. Again, we report on subscription fees and author fees. Fishwick et al. suggested that published papers be sold to users either by subscription to the publisher's whole list, by subscription to specific parts (e.g., within a specific subject area), by a two-part tariff which consists of a reduced subscription price combined with a reduced transaction cost per individual article, or simply on a pay-per-use basis. We were unable to explore the likely proportion of subscriptions to sales of individual articles, but we did consider the effect of sales of individual articles on author royalties. The author fee pays for editorial and refereeing work and contributes 10% of production costs. The author receives a royalty of 5% on subscription income and sales of articles. The administration of royalty fees adds to costs in this model, as do additional tasks associated with unfunded papers; Fishwick et al. suggested that the editorial office should seek funding for these from appropriate charitable foundations. In the model, this administration is undertaken by a half-time secretary who, we estimated, would be capable of processing 600 manuscripts per annum (a journal with an 80% rejection rate that publishes 120 papers per annum would process 600). Table 9.3 presents the submission fee when the overhead rate and rejection rate are varied.

    Table 9.3: Fishwick electronic journal model simulations to determine author submission fee if overhead rate and journal rejection rate are varied.
    Overhead rate 60% 120%
    Rejection rate 10% 90% 10% 90%
    Submission fee ($) 651 562 703 579
    Per Page ($) 65 56 70 58

    Obviously, the rejection rate has little impact on submission fees in this model because author fees contribute only 10% of production costs. The fee is collected primarily to pay editors and referees, who are unpaid in the other models. In effect, the author pays for the peer-review function while subscribers pay for publication of journal articles.
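    The insensitivity to the rejection rate can be illustrated as follows. The sketch treats editorial and refereeing payments as a roughly per-submission cost and spreads 10% of production costs over submissions; the specific payment and cost figures are assumptions, not the model's inputs.

```python
# A hedged sketch of why the rejection rate barely moves the fees in Table
# 9.3: most of the fee pays editors and referees (roughly a per-submission
# cost), and only 10% of production costs is spread over submissions. The
# payment and production-cost figures below are illustrative assumptions.

def market_model_submission_fee(production_cost, review_cost_per_submission,
                                accepted_papers=120, rejection_rate=0.8):
    submissions = accepted_papers / (1 - rejection_rate)
    authors_share_of_production = 0.10 * production_cost / submissions
    return review_cost_per_submission + authors_share_of_production

for rejection_rate in (0.1, 0.9):
    fee = market_model_submission_fee(production_cost=150_000,
                                      review_cost_per_submission=500,
                                      rejection_rate=rejection_rate)
    print(f"rejection rate {rejection_rate:.0%}: submission fee ~${fee:,.0f}")
```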

    In this model, we varied the value of the following elements to determine the effect on subscription price: rate of overhead, profit margin, and size of subscription base. The results, reported in Table 9.4, show that the subscription price of a market-model journal is generally 10-12% less than that of the traditional-model journal, although the traditional model does not also charge authors a submission fee.

    Table 9.4: Fishwick electronic journal model simulations to determine subscription price if overhead rate, profit margin and number of subscribers are varied.
    Overhead rate 120% 60%
    Profit margin 0% 10% 20% 0% 10% 20%
    No. of subscribers Subscription fee ($)
    200 955 1,051 1,147 695 765 834
    500 382 420 458 278 305 333
    1,000 190 211 230 138 153 167
    2,000 96 105 115 69 77 83
    20,000 9 11 11 6 8 8

    Royalty income is related to the sale of subscriptions and individual articles. The royalty is included in the market model as an incentive to publish only high-quality material. The royalty rate is related to journal income. Income is static as any increase in subscriber numbers is used to reduce the price of subscriptions and articles. Thus, author royalties increase only in relation to those of other authors published in the same journal, i.e. a relatively popular paper will generate more income for its author than one that is not frequently read.
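    One way to picture this mechanism is to divide a fixed royalty pool among authors in proportion to how often their articles are bought or read. The allocation rule and the usage figures below are our assumptions for illustration; only the 5% royalty rate comes from the model description above.

```python
# An illustrative sketch of the royalty mechanism described above: a fixed
# share of (static) journal income is divided among a journal's authors, so
# one author's royalty rises only relative to the others'. The 5% rate comes
# from the text; the allocation-by-usage rule and usage figures are assumed.

def allocate_royalties(journal_income, article_usage, royalty_rate=0.05):
    pool = journal_income * royalty_rate
    total_usage = sum(article_usage.values())
    return {article: round(pool * uses / total_usage, 2)
            for article, uses in article_usage.items()}

usage = {"paper A": 400, "paper B": 100, "paper C": 25}   # hypothetical downloads
print(allocate_royalties(journal_income=120_000, article_usage=usage))
```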

    Finally, we revised the traditional model to explore the figures generated if both authors and subscribers contributed to costs. This would effectively distribute costs across two groups, both of which contribute to demand. The subscription fees generated by the traditional model are modest without author contributions, and author fees reduce them further; however, administration of both sets of fees would add to costs. It is often argued that authors and end users are drawn from the same group, so the distinction is unnecessary. This is not entirely true: many journal readers never write papers, and readers from industrial, professional and clinical settings often are not part of the academic research community. Thus, journals funded only by author fees would subsidise these users. The question is whether this matters as long as scholarly publication is as efficient as possible for the academic community.

    9.8 Discussion

    These models are first drafts. They contain flaws and omissions, some of which we have discovered although some may remain to be discovered. One example is that we were unable to separate subscriptions administration and maintenance from other publisher costs. We would have liked to represent costs associated with subscriptions by calculating part of the overhead as a percentage of sales income. This would reflect the fact that costs vary with the number of subscriptions. However, calculating the overhead in that way would have required a circular connection between model elements which is prohibited by the software package. It is important that we isolate subscriptions-related costs because they are eliminated when costs are recovered from authors. A fair comparison between models that recover costs only from authors and those charging subscription fees is impossible unless we can do so.

    During 2001, L. Halliday built two models based on data from interviews with established commercial and learned society publishers, and with alternative publishers who publish from within universities. Subscriptions administration costs are isolated in these models.

    Another important factor is the staffing level required to produce a digital journal. The models documented here were criticised as overstaffed. Halliday's work during 2001, however, suggested that the models described here are understaffed. All of the activities associated with publishing the journal, including production, marketing, and development, are undertaken by these staff. Interview data suggest that a journal publishing 120 papers per annum on this basis would require two full-time employees. As staff costs and the overheads on them are the most substantial costs, alteration to staffing levels would have a significant impact on total costs.

    Despite their flaws, these models have been useful for developing our understanding of the digital-journal production and delivery process, and for eliciting feedback. The models allowed us to explore journal publishing and elicited feedback that informed the design of a project, conducted during 2000 and 2001, in which Halliday built models that break journal publishing costs into discrete functions. The costs incurred by libraries in providing end users with access to digital journals were not modelled, as none of the librarians interviewed had a clear idea of the activities involved, let alone the costs of those activities. The model building and simulation were supplemented with qualitative exploration of digital journal publishing and use. Many of the barriers to implementing `alternative' models, or to the success of digital-only journals, are cultural and centre on getting authors to accept new charging mechanisms. Full details of this work have yet to be published.

    Notes

    1. The report from this project is available at the following URL: http://www.ukoln.ac.uk/services/elib/papers/supporting/#ukoln.

    2. $1.60 = £1.

    3. Information about Ithink can be found at the following URL: http://www.iseesystems.com/Softwares/Business/ithinkSoftware.aspx.

    4. They are also documented in the report of our project, which is available at the following URL: http://www.ukoln.ac.uk/services/elib/papers/supporting/#ukoln.

    5. Copies of the complete models are available to anyone who would like to manipulate them. They can be opened and simulations run using a free runtime version of the Ithink software, which is available from the Ithink Web site.

    10. Electronic Publishing Models and The Pricing Challenge[†]

    How will the Internet change scholarly publishing, and how should it? Will print publishing become obsolete, or only be supplemented by online, searchable articles? Will the Internet somehow lead to a whole new system where raw `self-published' material is commented on widely by an expert community, supplanting the traditional notion of peer review by only a few, often anonymous, experts?

    Amid these long-term, philosophical questions about the very nature and purpose of publishing, there is more immediate interest in understanding the economics of online publishing. Publishers are under great pressure to supply new capabilities available through online publishing, and to develop business models to ensure the future viability of scientific publishing in the new medium. Those who fall behind, or so it is feared, may ultimately fail as publishers, or face a kind of publishing oblivion as more and more readers rely on the Internet for locating information resources.

    At the same time, librarians—facing the ongoing explosion of new information sources and consequent rising costs—hope that Internet publishing will somehow lead to radical declines in publication prices. Lower prices are sought partly as a response to institutional budget pressures, and partly out of frustration with the perceived unreasonable pricing practices of some publishers. But a longer-term and more important pressure exists as well: amidst the explosion of new information products, librarians are properly searching for a means to ensure that libraries continue to fulfill their traditional role of providing broad access to comprehensive collections of information.

    In this chapter, I will describe the current state of the transition underway in scholarly publishing from the publisher's point of view. I will also review a number of the electronic publishing models currently in use, focusing on Science and contrasting it with the typical scholarly publication. Finally, I will discuss the challenges faced by publishers in setting online prices, noting the critical importance of sales volume as a driver of online prices.

    10.1 Scholarly Publishing in Transition

    Scholarly publishing used to be a quiet, respectable backwater of the much larger, and more volatile, publishing industry. Where many consumer publications are launched each year and fail quickly, scientific journals seem to prosper over time once they overcome the considerable entry barriers. Cross-platform searchability and other features of the Internet are ideal tools for improving the utility of scholarly journals. This has thrust scholarly publishers onto the leading edge of the publishing industry, and has forced them to explore both the technical and business aspects of online publishing. Yet many scholarly publications are managed by non-profit associations, or by relatively conservative commercial publishers, who have little experience in managing such a complex transition.

    Besides the reader benefits, publishers are attracted to Internet publishing for a number of reasons. One key attraction is the promise of wider readership and recognition for the publication. Publishers see an opportunity for brand preservation and extension in this new medium which has rapidly become widely available. These opportunities for expanded readership and branding, of course, spell a possible business opportunity in selling subscriptions and advertising online. Some see new revenue possibilities that never existed in print, such as the chance for widespread pay-per-article, an extended economic life for older content, and e-commerce in conjunction with advertisers or other new partners. Finally, many publishers also view their online presence in a defensive mode: bringing the traditional printed work online may be necessary in order to maintain the interest of current readers and protect the revenue base that already exists in print.

    The Internet is regarded as a 'disruptive' technology, one that changes the service norms and economics of publishing (and other industries) in unpredictable ways. Therefore, online publishing is seen as a threat. To understand the current milieu of scholarly publishing, it may be useful to consider how online publishing has changed four of the forces that act on publishers: competitors, suppliers, buyers, and the market environment.

    The Internet fundamentally reduces the cost of publication distribution. Thus, it lowers barriers to entry, inviting many new players into the field alongside the traditional, known competitors. Start-ups that seek to exploit a new technology are generally more nimble than traditional players, partly because they have less existing revenue at risk and tend to eschew ingrained ideas about how the business should operate.

    Furthermore, a new technology with uncertain parameters leaves incumbent competitors unsure of how to manage the new situation. As leading companies attempt different models for creating a viable business, they send confusing or unreadable signals into the market. Competitors do not know how to assess these moves, or how to respond to them.

    Suppliers to the scholarly publishing industry are also undergoing realignment as the Internet comes into wide use. Key suppliers for scientific publishers are the researchers themselves, who submit papers describing their latest findings for review and publication. As authors find they have more outlets for their work and discover new ways to communicate findings to their colleagues, they are empowered in their dealings with publishers. Though they still seek the imprimatur of independent refereed journals, they have more leverage to demand services and concessions. Technology workers have become similarly empowered as publishers (and most other industries as well) increasingly depend on their capabilities. These influences will tend to increase costs as publishers must compete for the raw materials and resources of their business.

    On the flip side, suppliers on the distribution side of publishing have lost ground. Postal services and other distribution outlets (such as international carriers) face significant threats if many publications ultimately choose to publish online only. But this phenomenon doesn't benefit publishers by bringing them cost savings unless they can entirely abandon the distribution chain. In the short run costs will, perversely, tend to rise. Since distribution costs are to some extent volume based, publishers may face rising delivery costs per unit as their volumes drop. The same problem applies to the printers of scholarly publications. Savings are only available to the publisher by abandoning the medium altogether. Reductions in print volume will only tend to increase costs per unit.

    Demand is another force operating on publishers. Publishers are affected by both buyers and readers, who for scholarly journals are often not the same persons. They include individual subscribers and the libraries that support many scholarly publications. Consumers of information gain from the Internet by obtaining much better access to alternative information sources. As competition increases among information sources trying to bring value online, consumers' expectations will tend to increase for quality benefits and features from the publishers they support. Meanwhile, the sharing and copying of digitized information is far easier and less expensive for users than any previous copying method or device. This will tend to lower demand for subscriptions, licensing or pay-per-view services through which publishers may have expected to generate revenue. In addition, the interactivity enabled by chat groups and listservs gives consumers access to more accurate pricing and term information, which in turn makes them better negotiators for the services of publishers. Publishers will find that, with the easy access provided through convenient and inexpensive e-mail, their customer service costs will rise.

    The general market environment for publishers is also undergoing transitions with unforeseeable consequences. Electronic distribution of information and the ability of users to duplicate and redistribute materials at very low cost hold unclear legal implications for copyright enforceability. In any case, assuming a constant level of enforcement, we should expect that lower copying costs will result in more copying, even if it is illegal. This would tend to suggest that publisher revenues are more vulnerable. Meanwhile, increased consumer awareness of privacy issues places pressures and restrictions on publishers' use of the data they gather about their readers, and has a dampening effect on marketing activities.

    At the same time, some aspects of the market environment may be changing in favor of publishers. Some book publishers, for example the National Academy Press, are said to have seen an increase in sales as a result of making their content available for free on the Internet. This seemingly paradoxical result may be due to the general preference for reading things in print, combined with the wider resource location possibilities presented by the Internet. The availability of the products online for free may have served as an effective marketing tool. Where traditional marketing may have been too expensive or inaccessible to the small non-profit publisher, online the credibility of the publisher and the ability of the reader to `sample' the product by reading a chapter or two may have stimulated sales.

    Might similar effects apply to scholarly periodicals? The remarkable usage figures of JSTOR articles reported by Guthrie (this volume) suggest that articles may enjoy a longer shelf-life by being digitized and made easily searchable online as part of a collection of refereed works. Nevertheless, it is not clear how or even whether increased usage of archived articles might lead to enhanced sales for the periodicals themselves.

    Amidst downward pricing pressures from their buyers, publishers face increasing demands for sophisticated online services, which adds to cost. At the same time, it appears that a `centralization' of buyers is occurring. Even though a publisher may locate more readers online, it is less clear that these readers will become paying subscribers. Library services have, in effect, broken out of the library, as site-wide subscriptions bring the information to users' desktops. This is, of course, a good thing for the availability of information, but it does not necessarily lead to lower prices, since it may discourage personal subscriptions.

    One reaction of publishers to all these influences and conflicting goals has been to develop a proliferation of access and pricing models in search of viable solutions to the business challenges they face. Next, I will examine some of the main access systems and pricing models in place among scholarly publishers, with an emphasis on the models in use by Science Online.

    10.2 Access and Pricing Models

    A variety of revenue strategies exist in periodical publishing. Controlled-circulation magazines rely almost exclusively on advertisers to pay the cost of producing a journal for a targeted market. The key to success in this type of publishing is to achieve very high coverage of a targeted market that also is a focal point for advertisers; it is an uncommon strategy for peer-reviewed journals, however. The scholarly publishing industry primarily relies either on library sales or on personal subscriptions or memberships in a non-profit society for the ongoing revenue to produce the journal. Some rely on a mixture of these circulation revenue streams, along with advertising sales.

    In these early days, it is not surprising that online strategies tend to reflect the print strategy of the publisher. Table 10.1 provides a summary of the main print business models, and their online corollaries.

    Table 10.1: Scholarly Publishing Business Models and Online Corollaries
    Print Model Revenue Stream Market Features Online Corollary
    Controlled circulation Advertisers or external funds Requires high market coverage; Seldom used for refereed journals Free, ad-supported site; or free with print and registration
    Personal or Member subscriptions Individuals Refereed; tends to large audience; may be more general Free or small fee with print; may allow online only
    Institutional subscriptions Libraries Small, specialist audience; tend to higher prices Site-wide for fee or free with print
    Mix Members and libraries (+ads?) Complex interplay of markets Unsettled, but usually fee + print required

    Controlled-circulation magazines will tend to put their contents online for free to users, attempting to attract advertising revenue to the site. Those journals relying on library subscriptions will most likely develop a library site-wide access model, while those relying on personal subscriptions or memberships may treat the online product as an added-value benefit of the print subscription. Mixed revenue models may include a mixture of advertising, individual subscriptions, and library subscriptions, not to mention licensing and author fees, to provide revenue. Of course, nearly every magazine may have some mixture of these various types of revenue, but what is important in devising an online strategy is to understand fully which revenue stream or streams are the main drivers of the business.

    Science receives some revenue from every one of the sources named above, and so is a rather complex case. However, there is no question that the economic drivers of the journal are membership dues and advertising. Though subscription sales and membership dues represent less direct revenue than advertising, they may nevertheless be seen as the underlying driver for the publication, since advertising sales are also premised on the journal's relatively large circulation. Library subscription sales represent a significant third revenue stream. Though the smallest revenue stream of the three, they represent a critical part of the mix. If there were no library sales, personal subscription rates would be appreciably higher. The corollary is that if there were no personal subscription sales, library rates would be significantly higher. The other revenue sources named, such as licensing, are modest in comparison. There is little reason to think the journal could be sustained on these revenue sources alone.

    I have jumped directly to the complex case of Science, but it is worth mentioning the other paid subscription models. Essentially, there are two: library driven, and membership or personal-subscription driven. A rule of thumb, for which there may be many exceptions, is that if the circulation of a magazine is under 5,000, its underlying economics are probably library-subscription driven. Many scholarly journals fall into this category, even those published by associations. Of course, the economic underpinnings of a given journal may be as much a reflection of publisher's choices as of market conditions. Some journals may have low circulation because they serve a small, highly specialized audience and are only sustainable through library subscriptions. Others may have restricted their personal subscription support by setting prices at too high a level, or by deliberately choosing to focus on the library market.

    What are the online corollaries to these print subscription models? Many scholarly journals, whether commercial or non-profit, are struggling with this question, and many different experiments with access types are being conducted. Among the most widely in use are

    • free public access after an embargo period,

    • free personal online access with membership or paid print subscription,

    • institutional site-wide access free with print subscription,

    • institutional site-wide subscription,

    • institutional access by subscription and restricted in some way, such as via

      • embargo on when the content becomes available online,

      • incomplete content,

      • limited geographic access, for instance to a library or a portion of campus or a single `site' (which may be a building, a campus, or a city),

      • limited virtual access to certain workstations or a subnet.

    The online strategy a given publisher will pursue is closely related to the publisher's view of the likely interaction between print and online publishing in the short run. Since there are many difficulties and unknowns with the revenue model for online access, publishers will often hedge their bets by pursuing what may be thought of as a forced print model. In these models, online access may or may not be charged, but is conditioned on the retention of a print subscription. In most cases, the online product will be treated as a supplement to the print, and charged (if at all) as an ancillary service. Though this is a conservative approach, the forced print model cannot be written off as merely a reactionary and futile attempt to preserve print. Science has substantial user feedback suggesting print is still highly valued among readers. Online values such as immediacy and searchability are highly desirable as complements to the print, but not as substitutes. Offering the two media together currently seems to be the best way for many journals to provide the best of both worlds and, coincidentally, to protect the principal revenue streams that emanate from the print product.

    At the other extreme, some publishers will seek to capture the cost-saving potential of online publishing and, perhaps, to steal a march on competitors in the transition to electronic-only publishing. This approach will be particularly appealing to start-ups, although it is certainly not unheard of among traditional publishers. The strategy is reflected in business models that encourage the buyer to purchase online access only, and provide pricing incentives for doing so. In the case of many start-ups, the strategy could be called a forced online strategy, i.e., there is no print product at all. Other publishers will continue to provide print in response to reader demand, while setting discounted prices for online only, usually at 80% to 90% of the print price, to give buyers incentives to make the switch. Forced online strategies are relatively risky because of the many unknowns about user acceptance, ability to generate revenue from subscriptions or advertising, and sustainability of the system at reasonable costs. But they do make sense for smaller circulation publishers, especially of high frequency or high page-count journals, where substantial savings can be gained by pushing toward online delivery.

    Forced print models attempt to preserve a journal's established revenue base of print subscriptions by offering online access as a free or low-priced added-value service. Generally, this will be a less risky approach than a forced online model. However, forced print models carry their own set of risks, and can be difficult to administer if the publication relies on both personal and institutional subscriptions. This is because institutional site-wide online subscriptions impinge on the personal subscription market, both online and in print, far more severely than institutional print subscriptions do. In print, accessibility to library copies is limited (one-at-a-time usage) and inconvenient (the reader must go to the library), so most frequent readers and many occasional readers will be strongly motivated to acquire personal subscriptions to the journals they find most important or useful. With the advent of site-wide access, however, the contents of journals are far more readily available to all users, thus presenting a temptation, especially among the marginal readers, to forego personal subscriptions. With site-wide subscriptions, there is no ability to reserve online access exclusively for paying individual subscribers. Further complications arise if an advertising revenue stream in print needs to be either preserved or migrated to online. Responses to this situation are the most complex and wide-ranging, in part because no one knows which will be most effective. Thus, many publishers are pursuing a variety of access models simultaneously. Table 10.2 shows the main access models currently in use by Science Online. Note that some of the access models provide only partial content in order to approach certain market segments, or achieve different business goals.

    Table 10.2: Science Online Access Models
    Access Model Target Audience Business Goal
    Personal access
    Free Samples/Searching All potential users Attract prospects
    Abstracts with registration Moderate user Readership for advertising; attract subscription prospects
    Pay-per-View Infrequent user Attract subscription prospects
    Full text access with fee Members only Subscription revenue; readership for advertising
    Institutional access
    Workstation access Libraries, mainly public or high school, or colleges with minimal science focus Economy access for broad range of primary ed. institutions
    Site-wide full text access Universities/Research Institutes/Corporations Subscription revenue; readership for advertising
    Consortial access Universities, 2-year, and HE institutions Expand site-wide subscription market; readership for advertising
    Licensed content with embargoes or usage limits Library segments with specialized needs Ancillary revenue

    If the proliferation of access models appears confusing, the price structures in use for these many different models are all the more so. Some principles from print subscription pricing do seem to carry over into the online world so far, although not always with the same results.

    There are several regularities in traditional scholarly journal pricing. In general, institutional subscription prices are well above the price for personal subscriptions. Higher circulation journals tend to have relatively lower prices than small circulation specialty journals. And lastly, the narrower titles serving very small populations tend to rely on library sales much more heavily than personal subscription sales. All these rough principles seem to hold true, at least so far, in online pricing. However, as we shall see, publishers face a series of perplexing problems and risks in setting online subscription prices. These issues are far from settled at this time.

    10.3 Pricing Challenges

    It is widely understood that the Internet presents an opportunity for substantial decreases in the costs of scholarly publishing. Because paper, printing and postage—the principal variable manufacturing costs of publishing—are quite substantial for nearly all publications, there is an opportunity for both publishers and buyers to capture some cost savings through online delivery. But there is also a good deal of misunderstanding about the economics underlying print publication costs and pricing.

    There is more to publishing than covering these variable manufacturing costs. In accounting terms, any price must cover variable costs, fixed costs and margin. When a journal has other sources of revenue than subscription sales, of course, the costs may be spread out over different sources; thus, the subscription price will reflect a contribution to the total fixed and variable costs, but not necessarily full coverage. Even so, many scholarly journals are largely dependent on a single revenue stream for their existence, and more often than not, that single revenue source is library subscriptions.

    Not all journals experience the same level of variable costs. The cost of serving each new subscriber can vary greatly, depending on factors such as the frequency of publication, whether the journal is distributed globally, the size of the circulation, and the number of pages per issue. In general, we can expect online distribution to reduce these costs significantly, in theory making it possible for low-circulation journals to publish many pages, circulate them worldwide, and publish as frequently as needed. But, of course, these savings will only be realized if and when publishers can abandon print publication altogether.

    Another misunderstanding that may need clarification is the idea that manufacturing costs are the only variable costs a journal faces. They are not. For instance, the cost of maintaining subscriber records, sending renewal notices and bills, and providing customer service are all variable costs that rise as circulation of the journal increases. Some of these items may also be improved, but not eliminated, by use of the Internet. Science, for example, has begun to accept orders and renewals online, and this source of orders has increased rapidly, relative to more traditional sources such as direct mail.

    The fixed costs of publishing cover things like overhead and the cost of all the staff (not just editors) needed to run a professionally produced journal. Fixed costs are not uniform across all types of print journals. They may vary based on the depth of peer review undertaken, the breadth of disciplines and issues covered, and the extent of the marketing and other support efforts needed to produce the journal. Staffing costs are not likely to be reduced by online delivery, and in fact may increase substantially. Increased reader expectations can drive demand for more editors, more technical staff, and more sophisticated customer services.

    Besides staff costs, fixed costs include major technical systems required to maintain the publication. One reason for the pricing disarray that exists in scholarly publishing right now is the uncertainty about what the steady-state cost structure of online publishing will be. Everyone, by now, has come to realize that merely throwing a few files onto a server will not constitute a viable publishing operation. Publishers are expected to provide value-added services that exploit the special features of the Internet to improve searchability, linking to outside resources, and other aspects of the readers' experience; to maintain a number of back issues indefinitely; and to provide for a more permanent archive. Quality control is also a much larger problem online than in print. With the expectation of retaining back issues online indefinitely and integrating them with new material for searchability, quality control is a job that is, in a very real sense, never completed. All these activities represent new costs associated only with online publishing, and until a more settled view of expectations is reached, it will be difficult for publishers to assess accurately what their fixed costs will be.

    Further complicating the situation is the centralization of buyers. As mentioned earlier, there is some reason to think that, even though the Internet may bring more readers than ever to a journal, there will be fewer paying subscribers. Libraries used to maintain multiple subscriptions to the most popular journals, but will purchase only one site license to online publications, no matter how popular they become.

    More important—if the publisher relies on individual subscriptions—is the problem of library subscriptions cannibalizing the publication's personal subscription base. In print, this phenomenon is a minor factor, because many people will still decide to purchase their own copies for convenience and portability. Some individuals also like to retain their own personal collection of key journals. All this is swept away by institutional site licenses to journals. Many of the compelling benefits of personal print subscriptions are lost if the very same product is available online at one's desktop through the university. Though most readers still report a preference for the look and feel of print, and for its portability, these benefits are weighed against the economic incentive to drop print and save the subscription cost.

    If buying centralization continues to grow, it means that the fixed publication costs will be spread over a smaller number of payers, and thus will rise as a portion of total price. Depending on how much the size of the buying market declines, the effects on price can be surprisingly steep.

    Margin, the third component of pricing, is usually expressed as a percentage of the gross cost of production. There may be endless arguments about how much margin (profit) is appropriate for a scholarly journal, or even whether any margin should be charged by non-profit entities. The fact of the matter is that nearly every important and vibrant publication will charge some sort of margin. A publisher cannot produce cash for improvements, fund startup projects (whether charitable or commercial), or even merely ensure that the journal has enough financial flexibility to weather an unforeseen crisis or to pursue an unexpected opportunity without generating some revenue in excess of the precise costs of producing the journal.

    In a durable business, margin is expected to increase with risk. Among the risks faced by publishers navigating the transition from print to online publishing are

    • new competitive challenges,

    • increased demand for technically sophisticated information products,

    • potentially diminished print revenue base,

    • unclear cost basis,

    • centralization of buyers.

    All these risk factors have been mentioned in other contexts in this paper. Given the number of unknowns and their financial implications, it may be predicted that publishers will price their new online products to compensate for substantial risk.

    This review of publishing costs should provide a more nuanced understanding of the complexity of moving from print to online publication of scholarly journals. Although some publication costs will decrease in the transition to online, others will increase. Further, the total cost may be borne by a smaller number of paying subscribers. The net effect on subscription pricing is uncertain.

    A simple example, summarized in Table 10.3, will illustrate the point. See the King and Tenopir (this volume) chapter for a substantive discussion of these effects, using actual industry cost and price averages. For this example, assume no other major revenue stream that will share costs or be affected by a transition to online, and no price differentiation among target market segments. The purpose is only to illustrate the effects of changes to the paying base on the pricing for a journal. Imagine a print periodical with 10,000 subscribers and a frequency of 12 issues per year. Suppose the fixed costs for producing the journal are $1 million, the manufacturing and distribution costs are $2 per issue per subscriber, and other variable costs are $.50 per issue per subscriber. Then the cost base per subscriber, exclusive of margin, would be $130: $24 in manufacturing costs, $6 in other variable costs, and $100 for fixed cost contribution. The publisher would likely add between $13 and $26 of margin to produce a price per subscriber of, say, $149.

    Table 10.3: Illustration of the impact of buyer centralization on online pricing
    Print Scenario Online, no centralization Online, w/ centralization
    Circulation 10,000 10,000 7,000
    Total Fixed Cost ($) $1,000,000 $1,000,000 $1,000,000
    Fixed contribution/subscriber/year $100 $100 $143
    Variable cost/subscriber/year $30 $11 $11
    Straight margin $19 $19 $19
    Percent margin 15% 17% 12%
    Total subscription price $149 $130 $173

    Now suppose this journal switches to online publication, entirely abandoning print as a medium. Again, this scenario is simplistic in order to underscore what the economics of a fully online journal might look like after a transition is completed. I am ignoring, for now, the effects on pricing from producing the journal in two media simultaneously, although this is the reality facing many scholarly publishers today.

    The middle scenario in Table 10.3 demonstrates the ideal circumstances for a journal moving online. Assuming that fixed costs remain the same, variable costs decline, and circulation sales hold steady when the journal moves online, there is reason to expect that both buyers and the publisher will gain by making the transition. With online production, the variable costs will decrease quite significantly. Suppose the manufacturing decreases by 75% to only $.50 per issue ($6 per year), while the other variable costs reduce to $5 per year. Costs that had represented $30 of the print price now represent only $11. Of course, if all other factors remained equal, this should be a boon to all parties. The publisher could lower the price and still maintain the same gross profit as before.

    But all other factors do not remain the same. In all likelihood the fixed costs will be higher, as reader expectations increase. Even if the fixed costs do remain the same, if there is a decrease in the number of buyers, the fixed costs will have to be spread across a smaller group. The percentage increase in the fixed-cost portion of the price can be greater than the percentage decrease in subscriptions. If, for example, the number of buyers falls by 30%, dropping from 10,000 to 7,000, the fixed cost portion of the price will rise by nearly 43%, from $100 to $142.85. And overall, this would result in a higher base cost of $142 + $11 = $153. Even if the publisher accepts the same gross margin (which would be a thinner percentage), the final price would rise to $172, a 15% price increase to subscribers, despite the substantial decrease in variable costs.

    The last scenario in Table 10.3 summarizes the pricing effects of a 30% decline in circulation sales due to cannibalization from moving the contents of the journal online. Among the distortions caused by this scenario are that the price for buyers increases 15% over the print price, and that the publisher receives a lower percentage margin for a riskier model, which defies normal business practice. If the publisher decided to maintain the same percentage margin as print, the end price would rise even further, to $177, nearly a 19% increase for subscribers. The greater the pricing impact, the more probable it becomes that circulation will decline further. If circulation declines even more sharply than the projected figures, it could set off a "death spiral" reaction, also described in the King and Tenopir (this volume) chapter, in which prices keep rising to cover the circulation shortfalls, thereby dampening demand even further.
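    The arithmetic of Table 10.3 is easy to reproduce, and the sketch below does so using only the inputs given in the example: $1 million in fixed costs, the print and online variable costs per subscriber, and a flat $19 margin per subscriber.

```python
# A sketch reproducing the illustrative arithmetic of Table 10.3. The inputs
# come from the example in the text: $1 million in fixed costs, $30 (print)
# or $11 (online) variable cost per subscriber per year, and a flat $19
# margin per subscriber.

def subscription_price(fixed_cost, variable_cost, subscribers, margin=19):
    fixed_per_sub = fixed_cost / subscribers
    base_cost = fixed_per_sub + variable_cost
    return {"fixed/subscriber": round(fixed_per_sub),
            "base cost": round(base_cost),
            "price": round(base_cost + margin),
            "margin %": round(100 * margin / base_cost)}

scenarios = {
    "print":                     dict(variable_cost=30, subscribers=10_000),
    "online, no centralization": dict(variable_cost=11, subscribers=10_000),
    "online, w/ centralization": dict(variable_cost=11, subscribers=7_000),
}
for name, kwargs in scenarios.items():
    print(name, subscription_price(fixed_cost=1_000_000, **kwargs))
```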

    Of course, this is just an illustration. For many reasons the end result for a particular journal may be different. For instance, a creative publisher could turn an online presence into other revenue opportunities, such as advertising. But these opportunities may be wishful thinking. There is little reason to believe that a publisher who cannot sell ads in the print journal would succeed much better merely for having the journal online. Indeed, if the publisher does sell print advertising, there may be a loss of revenue, since many advertisers remain skeptical of the online medium, and highly resistant to paying prices similar to print advertising rates.

    Another possibility is that subscription losses of this magnitude may not occur. It is certainly true that the scenario described above allows some flexibility for the publisher to lose subscriptions. The break-even amount of subscription loss in the case above is around 16%. That is, assuming a drop in subscriptions to 8,400, and assuming the fixed costs for producing the journal stay the same, then the variable cost savings are enough to offset the increased portion of the price dedicated to fixed costs. So the problem for publishers isn't whether they will lose any subscriptions, but a more complicated problem: how much will fixed costs increase due to online publishing, how much will subscriptions decline, and how much variable-cost savings will there really be? It is the complicated interplay of these uncertain effects, along with the enticing but uncertain prospect of developing other revenue streams, that leaves the pricing of online journals a very tricky matter.
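    That break-even figure follows directly from the example's numbers, as the short calculation below shows: circulation can fall until the rise in fixed cost per subscriber offsets the $19 per-subscriber variable-cost saving from going online.

```python
# A quick check of the break-even figure quoted above, using the Table 10.3
# example: circulation can fall until the rise in fixed cost per subscriber
# exactly offsets the $19 per-subscriber variable-cost saving.

fixed_cost = 1_000_000
print_subscribers = 10_000
variable_saving = 30 - 11                                   # $19 per subscriber per year

fixed_per_subscriber_print = fixed_cost / print_subscribers  # $100
break_even_subscribers = fixed_cost / (fixed_per_subscriber_print
                                       + variable_saving)
loss = 1 - break_even_subscribers / print_subscribers
print(f"break-even circulation ~{break_even_subscribers:,.0f} "
      f"({loss:.0%} subscription loss)")
```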

    10.4 Conclusions

    From the above example, we can ascertain a few principles to help guide publishers in assessing the risks and costs of a transition to electronic publishing. First, when the publication's variable costs are a larger portion of the total cost than the fixed, moving online will likely be less risky. This is because the greater the savings that can be accomplished from electronic publication, the deeper the subscription losses would have to be before they caused the fixed-cost distribution to rise more than the variable-cost savings. This enables us to create a profile of the type of journal that would be the best candidate for moving to online publication rapidly:

    • low circulation,

    • high page counts and/or high frequency,

    • narrow, focused editorial scope,

    • small, if any, reliance on advertising revenue.

    Low circulation and high page counts would tend to lead to poor economies of scale, thus one would expect substantial variable costs. In addition, if the circulation were mainly library subscriptions, not personal, and were mainly purchased at one copy per institution, the likelihood of revenue cannibalization or centralization of buyers impacting the journal would be smaller. Narrow editorial scope would contribute to maintaining relatively lower fixed costs. Lack of advertising revenue would simplify the risk assessment for moving online and reduce the chances of revenue cannibalization. These circumstances describe a very significant number of scholarly journals, particularly those published by non-profit discipline-focused societies.

    Should we, therefore, not expect to see journals moving online that do not meet these criteria? In some ways, high-circulation journals with a variety of revenue streams might seem to have everything to lose and nothing to gain by undertaking the transition. However, recall from the discussion at the beginning of the chapter that the attractions of online publishing for larger scholarly journals are many:

    • greatly enhanced reader benefits,

    • broader and more convenient accessibility,

    • brand extension and preservation,

    • possibility of substantial variable-cost savings combined with the promise of novel revenue streams,

    • ability to remain up-to-date and relevant to readers, to defend against obsolescence.

    With both reader demand and library demand for enhanced service so high, it is inevitable that journals of all stripes will begin moving online. One important task for the larger publications will be to find some way either to preserve their print subscriptions or to translate the broad print audience of buyers into an equally broad audience of buyers online. The more widely fixed costs can be distributed, the lower the price will be for all parties. In the print world, of course, this would be a commonplace understanding. Ironically, however, in the current context of online publishing, this modest insight seems like heterodoxy, since it follows from the counterintuitive assertion that, in some circumstances, online publishing could actually result in higher-priced subscriptions than print.

    Publishers, librarians and readers of scientific journals are all rightly inspired and intrigued by the great possibilities for electronic publishing to revolutionize and democratize scholarly communication. But if the publishing and peer-reviewing processes add value to those communications—and most observers continue to agree that they do—then these cooperating parties will need to come to a fuller understanding of the economics that underlie the process. Care must be taken that in the rush to implement new technologies to benefit readers, we do not undermine the fundamentals that make publishing a useful, as well as a financially viable, enterprise.

    Notes

    Thanks to Jeffrey MacKie-Mason for his many excellent and thought-provoking clarifications and improvements to this chapter. I would also like to thank several readers at AAAS who offered comments on earlier drafts, particularly Phil Blair, Colleen Struss, and Marlene Zendell.

    11. A Portfolio Approach to Journal Pricing[†]

    11.1 Introduction

    In recent years access to print journals has been threatened.[1] Beset by persistent journal price inflation (especially in the so-called STM fields, or science, technology and medicine) and stagnant budgets, many university libraries have been forced to re-allocate dollars from monographs to journals, to postpone the purchase of new journal titles, and in some cases, to cancel titles. As a consequence, libraries have often relied on interlibrary loans to satisfy faculty demands. This situation and its possible causes have been studied at great length in the library science literature. With few exceptions, a consensus has evolved which focuses on the growing importance of commercial publishers in the market for scholarly journals: Over the past decade or more, commercial firms have aggressively raised prices at a rate disproportionate to any increase in costs or quality. This appears to be especially true for the largest commercial firms.[2]

    The research discussed in this paper is the first to assess the merits of this consensus from an economic perspective.[3] Have changes in journal costs and quality accounted for most of the price inflation or has the exercise of market power by publishers played an important role? In addressing this question, I offer both theoretical and empirical support for the latter alternative. A model of journal pricing is proposed that reflects the underlying demand behavior of libraries. Although individual users are interested in just a handful of STM journals, libraries maximize the usage of broadly-defined collections, e.g. all biomedical journals, subject to a budget constraint. The result is demand for a portfolio of titles. In practice this means that libraries rank titles according to cost per use, from lowest to highest, and then subscribe to as many titles as they can afford, starting from the top of that ranking. In other words, unlike most markets involving differentiated products, it is not appropriate to model demand as a discrete choice process. Rather, the typical library attempts to provide access to as many STM journals as possible through a combination of subscriptions and interlibrary exchanges.

    Given this portfolio demand, publisher pricing strategies are determined by the distribution of budgets and a title's relative quality. Since all journals in a particular demand portfolio compete for the same budget dollars, relative quality determines demand for individual titles (if prices are equal, higher quality journals experience greater demand). In turn, the budget distribution influences whether, for example, high quality titles choose low prices and sell to most libraries or set high prices and sell only to the largest-budget institutions. Furthermore, the pricing model predicts that in some cases firms controlling larger portfolios of journals have an incentive to charge higher prices, all else equal. Thus, past publishing mergers may account for some of the observed price increases.

    To evaluate this and other conjectures, a unique data set was assembled that includes cost, US price, and quality information for 900 biomedical titles as well as holdings information for these same journals at 194 biomedical libraries.[4] These data are used to estimate a structural model to identify the separate impacts of journal costs, quality, and publisher market power. The results indicate that the firm-level demand for journals is highly inelastic, that quality- and cost-adjusted price increases have been substantial over the past decade, and that past mergers have contributed to these price increases. The fact that firm-level journal demand is inelastic, e.g. demand for a firm's titles decreases less than 1% when its prices increase 1%, is a sufficient condition for the exercise of market power. But the econometric estimates suggest that firms are not profit-maximizing, at least not in a short-term sense. One possible explanation is that, in anticipation of future growth in library budgets, publishers preserve future sales by pricing less aggressively today. This story can also account for the estimated annual price increases. The third result is that merger-related price increases for the acquired firms' titles were substantial, about 25%. Yet US antitrust authorities expressed no concerns about the respective mergers.

    These results raise a number of policy questions: (1) Since STM journal content is a public good (funded in most cases by tax dollars), does the performance of commercial STM publishing constitute a market failure? If so, do better alternatives exist? (2) Do antitrust authorities need a new paradigm for academic publishing and other portfolio-type markets? (3) How will the growing transition to electronic distribution affect the status quo? I briefly address these questions at the conclusion of the paper.

    The chapter is organized as follows. I first discuss journal demand. Next, I describe the journal pricing model. I then discuss the empirical model, describe the data, and present the estimation results. Finally, I conclude by discussing the policy issues mentioned above.

    11.2 Journal Demand

    From the perspective of a journal user, it might seem that demand for each unique journal title should be treated separately. For example, articles in Brain Research are distinct from and cannot easily substitute for articles in the New England Journal of Medicine, much less those in the American Economic Review. If demand for each unique title is independent, then the publishers of individual titles have the capacity to obtain monopoly returns. Mergers won't matter.

    The notion that the demands for individual titles are unrelated is incorrect because it misidentifies the purchaser. Libraries, not readers, buy most of the subscriptions, especially for expensive STM journals. Thus it is the demands by libraries for different titles that determine whether mergers will create additional market power. Discussions with dozens of librarians revealed that their purchases of academic journals are generally based on two factors: annual subscription price and expected usage. To assemble and maintain their collections, most libraries appear to construct a cost per use ratio for each title.[5] Given a budget for a relevant academic field, e.g., biomedicine, they then proceed to rank journals from lowest to highest in that field according to this ratio, and identify a cutoff above which titles are not subscribed.
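
    A minimal sketch of this selection rule, with invented titles, prices, expected-use figures, and budget, is shown below: titles are ranked by cost per use and subscribed in order until the field budget runs out, which is the cutoff behavior just described.

        # A minimal sketch of the cost-per-use selection rule; the titles,
        # prices, expected uses, and budget are invented for illustration.

        def select_portfolio(titles, budget):
            """Rank titles by cost per use (ascending) and subscribe in order
            until the next title would exceed the serials budget (the cutoff)."""
            ranked = sorted(titles, key=lambda t: t["price"] / t["uses"])
            chosen, spent = [], 0.0
            for t in ranked:
                if spent + t["price"] > budget:
                    break            # cutoff: remaining titles are not subscribed
                chosen.append(t["name"])
                spent += t["price"]
            return chosen, spent

        titles = [
            {"name": "Journal A", "price": 300.0, "uses": 1500},   # cost/use 0.20
            {"name": "Journal B", "price": 900.0, "uses": 1200},   # cost/use 0.75
            {"name": "Journal C", "price": 1500.0, "uses": 4000},  # cost/use 0.375
            {"name": "Journal D", "price": 250.0, "uses": 200},    # cost/use 1.25
        ]
        chosen, spent = select_portfolio(titles, budget=2000.0)
        print(chosen, spent)   # ['Journal A', 'Journal C'] 1800.0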

    From year to year, as budgets and titles' usage change, collections are adjusted accordingly.[6] Over the past decade or so the general trend is for increases in library budgets to lag journal price inflation; a consequence is that many libraries have been forced to re-allocate dollars from monographs to journals, to postpone the purchase of new journal titles, and, in some cases, to cancel titles.

    The most interesting aspect of library demand for journals is that individual titles within a given field are considered simultaneously. That is, the content may be unique, but on a cost per use basis different titles are substitutes. Titles compete with each other for budget dollars across an entire field of users served by the library, rather than demand for each title being independent as the user perspective suggests. Demand by libraries, the actual purchasers of subscriptions, is for a portfolio of titles drawn from a rather broad set.

    11.3 For-Profit Journal Pricing

    Given this demand structure, how do for-profit publishers price their journals?[7] Commercial journal publishers, like firms in any industry, will take into account the structure of demand and the likely strategies of competitors when setting prices. As described earlier, libraries — which constitute the bulk of demand for STM journals — attempt to purchase the most usage given their serials budgets.

    To model how prices are set in this demand environment I assume that there are two types of library budgets, small and large.[8] I assume that each journal title is sold by a separate publisher. No price discrimination is allowed, i.e., each title's annual subscription is sold to all buyers at a single price. Journal production includes two components: fixed, first-copy costs and a marginal cost per subscription. I assume the latter equals zero.

    I consider a two-stage game. In the first period, each of the firms considers whether to target (through choice of content) all libraries or just those with large budgets.[9] Once these sunk investments have been made, each firm takes into account the pricing strategies of firms that have made a similar marketing choice.

    Given these and some additional assumptions, we can show that firms owning high-use titles will target all libraries, and that the remaining firms will focus on the large-budget customers (an ordered equilibrium). The intuition for the ordered equilibrium is that differences in journal use confer a competitive advantage on higher-use titles and a corresponding disadvantage on lower-use titles. All else equal, libraries will purchase higher-use titles. And if we assume that there is a sufficient number of small-budget libraries, firms owning the high-use titles will find it profit maximizing to sell to all libraries, while the remaining firms sell only to the large-budget customers. Although the latter could set a price low enough to attract large- and small-budget customers, it is not optimal for them to do so.

    Furthermore, journal pricing for each target population is similar: owners of high-use titles charge higher prices. On the other hand, journal prices decrease as the aggregate usage of competing titles increases. The explanation for the first result is straightforward: since libraries rank titles according to cost per use, firms that own high-use titles have an incentive to set prices that exceed those of lower-use titles. Aggregate use matters since budgets are finite in size, i.e. as total usage increases, the competition for a fixed number of budget dollars intensifies, forcing a title (whose usage is fixed) to lower its price.
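
    Both comparative statics can be illustrated with a small toy exercise that relies only on the cost-per-use cutoff rule described in section 11.2; it is not the chapter's formal equilibrium model, and all prices, usage figures, and the budget are invented. The routine scans upward for the highest price at which a title with a given expected use is still purchased by a single representative library.

        # Toy illustration of the pricing comparative statics, using only the
        # cost-per-use cutoff rule from section 11.2; all numbers are invented.

        def is_purchased(price, uses, competitors, budget):
            """True if a title priced at `price` with expected `uses` falls above
            the library's cost-per-use cutoff and is therefore subscribed."""
            titles = competitors + [{"name": "ours", "price": price, "uses": uses}]
            ranked = sorted(titles, key=lambda t: t["price"] / t["uses"])
            spent = 0.0
            for t in ranked:
                if spent + t["price"] > budget:
                    return False       # cutoff reached at or before our title
                if t["name"] == "ours":
                    return True
                spent += t["price"]
            return False

        def highest_sustainable_price(uses, competitors, budget, step=1.0):
            """Scan upward from zero; return the last price (to the nearest
            step) at which the title is still purchased."""
            price = 0.0
            while is_purchased(price + step, uses, competitors, budget):
                price += step
            return price

        competitors = [
            {"name": "X", "price": 600.0, "uses": 3000},
            {"name": "Y", "price": 600.0, "uses": 1500},
            {"name": "Z", "price": 600.0, "uses": 600},
        ]
        budget = 2000.0
        # Higher-use titles can sustain higher prices under the same ranking rule:
        print(highest_sustainable_price(500, competitors, budget))    # ~500
        print(highest_sustainable_price(1500, competitors, budget))   # ~800
        # Adding another heavily used competitor raises the aggregate usage
        # competing for the same budget and pushes the sustainable price down:
        crowded = competitors + [{"name": "W", "price": 400.0, "uses": 4000}]
        print(highest_sustainable_price(1500, crowded, budget))       # ~600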

    How do mergers affect outcomes in this simple model? There are a number of potential scenarios: mergers within budget classes, those across budget classes, and some combination of these first two cases. Consider the case of a within-class merger involving two high-use titles. What pricing strategy does the merged firm adopt? As we noted earlier, a journal's profitability decreases in aggregate class usage. This suggests that the merged firm might benefit from raising the price of one of its titles enough to cause the small-budget libraries to drop it and replace it with a lower-use title. This "jumping" between budget classes lowers the aggregate usage of titles sold to all libraries, and thus enhances the profitability of the merged firm's remaining general circulation title. The profitability of the "dropped" title may go up or down, depending on the model's parameters.[10] The sum of these two components will determine the post-merger pricing strategy. If the net effect is positive, then the merger is harmful: the average quality of library collections decreases.

    11.4 Testing the Portfolio Theory

    The Institute for Scientific Information (ISI) tracks citations in peer-reviewed titles for over 8,000 STM journals in various fields. Not surprisingly, the number of publishers, both commercial and non-profit, is large as well. With respect to biomedical journals, ISI tracks titles published by at least 70 companies. Over the past decade a flurry of merger activity has been observed in the STM publishing market, particularly in the past two years. Since the latter half of 1997 alone, at least six major commercial publishers have been purchased by competitors. In addition, numerous small-scale transactions involving one or two journal titles occur every year.

    Although these recent natural experiments will provide a rich empirical opportunity in the near future (once several years of post-merger data are available), two mergers that occurred in the early 1990s should shed some light on the likely impact of this ongoing merger wave. In 1991, Reed-Elsevier purchased Pergamon and its large portfolio of STM titles, including some 57 ISI-ranked biomedical journals. At the time, Elsevier's biomedical portfolio numbered 190 ranked titles. During the same period, Wolters-Kluwer added Lippincott's 15 ISI-ranked biomedical titles to its collection of 75 ranked biomed journals. Since that time both companies' portfolios have grown further. In 1998, according to ISI data, Elsevier's portfolio stood at 262 ranked titles; Kluwer controlled 112 ranked journals.

    Empirical Models

    Previous empirical studies of journal pricing have not attempted to assess the extent of market power in the academic publishing market. Chressanthis and Chressanthis (1994) specified a reduced form hedonic model to study the determinants of pricing for economics journals. Their results suggest that prices are related to journal characteristics (e.g., as journal quality and size increase, so does price). Lieberman et al. (1992) estimated a supply and demand system using data for 225 ISI-ranked science journals. They find that supply is downward-sloping, consistent with the notion that publishing is characterized by scale economies at the individual title level. Based on this evidence they indirectly argue that entry by new titles has lowered circulation for existing journals, forcing the latter to raise prices to cover fixed costs. However, their model is unable to explain a significant portion of the observed price increases.

    Results for two empirical models are reported here. First, to test whether libraries' acquisition strategies reflect a ranking of journals according to cost/use values, I estimated an exponential cumulative distribution function (cdf).[11] The expectation is that cost per use and journal demand are inversely related. Confirmation of this hypothesis provides support for the portfolio approach to demand.
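
    The chapter does not spell out the exact specification, but a minimal sketch of this kind of fit, assuming that the number of holding libraries decays exponentially in cost per use and using synthetic data, might look as follows.

        # A minimal sketch of an exponential fit of holdings against cost per use,
        # on synthetic data; the chapter's actual specification (covariates,
        # estimator, year effects) is not reproduced here.
        import numpy as np

        rng = np.random.default_rng(0)
        n_libraries = 194

        # Synthetic journals: cost-per-use values and noisy holdings counts drawn
        # around an assumed exponential decay n_libraries * exp(-lam * c).
        cost_per_use = rng.uniform(0.05, 1.2, size=300)
        true_lam = 2.5
        holdings = np.maximum(
            1, rng.poisson(n_libraries * np.exp(-true_lam * cost_per_use))
        )

        # Fit log(holdings) = log(N) - lam * cost_per_use by least squares.
        slope, intercept = np.polyfit(cost_per_use, np.log(holdings), 1)
        lam_hat = -slope

        print(f"estimated decay parameter: {lam_hat:.2f} (true value {true_lam})")
        c = 0.5
        predicted = np.exp(intercept) * np.exp(-lam_hat * c)
        print(f"predicted holdings at cost/use {c}: about {predicted:.0f} of {n_libraries}")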

    Second, I estimated a structural two-equation model of supply and demand that measures firm-specific demand elasticities and explicitly accounts for the possibility of increased market power due to past mergers. Recall that inelastic firm-level demand is a sufficient condition for the exercise of market power by publishers. Evidence of merger-related price increases is consistent with a portfolio market definition as well as the type of strategic behavior implied by the pricing model.

    Data

    For the period 1988-98, the U.S. Department of Justice collected publisher and price data for some 3000 journals, and holdings information from various libraries. I supplement these data with additional information extracted from the ISI's Journal Performance Indicators database (JPIOD). This database allows me to calculate annual citation rates for individual journals;[12] JPIOD also includes the number of papers published annually by each journal during the sample period.

    My empirical analysis is focused on a subset of these journals, namely, biomedical titles. The reasons for this choice are several. First, based on my discussions with various librarians, biomedical libraries are most likely to evaluate their purchases using the portfolio approach described earlier; furthermore, these libraries typically make no distinctions among various biomedical disciplines, permitting us to consider all biomedical titles as part of a single, large portfolio.[13] Finally, practical considerations, including the fact that biomedical holdings data are reported in a relatively standard fashion, supported an initial focus on this subset of titles.[14]

    During the sample period, almost two thousand ISI-ranked biomedical journals were published; complete time series were available for about 1800 of these titles. Of this latter group, almost 1400 were published by organizations with at least three ISI-ranked titles. For the analysis presented here, only journals sold by commercial firms with portfolios consisting of ten or more titles were considered (thus excluding journals distributed by small private publishers as well as the non-profits), or about 900 titles. Complete holdings data for 194 U.S. medical libraries were collected, representing in aggregate some 60,000 subscriptions to ISI-ranked journals; the libraries were randomly selected from the approximately 1500 Medical Library Association members. Libraries of all sizes are represented in the sample, some holding less than ten subscriptions, while others report collections exceeding 1,300 titles.

    The sample period, 1988-1998, is useful in at least two respects. First, it is sufficiently long to assess whether price increases continue in the journal market. Second, as described above, the period contains a number of natural experiments, i.e., publishing mergers, that enable me to identify the impact of mergers on pricing. Growth via merger should be distinguished from internal growth arising from the introduction of new titles. The latter may produce benefits (such as coverage of emerging fields of study) that help to offset any associated competitive harm. Harm associated with acquisitions, on the other hand, is less likely to be balanced by substantial benefits. Journals are simply reshuffled and, based on the public statements made by merging firms, the fixed cost savings seem to be small.[15]

    Descriptive Statistics

    Using the ISI-defined biomedical portfolio and the corresponding library holdings, I calculate the actual size of various commercial publishers' journal portfolios as well as the number of titles subscribed to by the libraries in the sample (see Table 11.1).

    Table 11.1: ISI-Ranked Medical Titles from Major Commercial Publishers, 1998.
    Publisher*    # of ISI titles published    # of subscribed ISI titles**    Share subscribed
    Blackwell 112 99 0.88
    Churchill-Livingstone 17 12 0.71
    Elsevier 262 225 0.86
    Harcourt 118 109 0.92
    Karger 45 39 0.87
    Mosby 27 25 0.93
    Plenum 22 20 0.91
    Springer 99 87 0.88
    Taylor 19 16 0.84
    Thomson 41 36 0.88
    Waverly 37 35 0.95
    Wiley 78 70 0.90
    Wolters-Kluwer 112 98 0.88
    Totals 989 871 0.88
    *Major firms are those with at least 10 ISI-ranked biomedical journals.
    **Subscribed data based on holdings for 194 medical libraries, during 1988-98 period.

    It is clear from this table that significant variation in portfolio size exists in the industry. Note that, based on the ISI numbers, the proposed 1998 merger between Reed/Elsevier, Wolters/Kluwer and Thomson would have affected about 42% of the biomedical titles owned by large commercial publishers.

    In Table 11.2, I present information on average price, citations, cost per use (price/citation), and number of papers published for each publisher in the years 1988 and 1998.

    Though prices, citations and paper counts generally increased during the period, the rate of change for prices was far more striking, resulting in higher cost/use numbers by the end of the period. For example, Elsevier's average journal price more than tripled during the period, while the corresponding citation and paper counts increased less than 25%.

    I provide average circulation rates for titles by publisher in 1988 and 1998 in Table 11.3.[16] Given that nominal prices increased dramatically over the sample period, the apparent inelasticity of demand indicated by these numbers is notable. It suggests that library serials budgets increased sufficiently during the period to absorb most of the price increases.

    Table 11.2: Selected Descriptive Stats, Avg. Values by Publisher
    Publisher    1988: Price ($), Cites, Cost per Use, Papers    1998: Price ($), Cites, Cost per Use, Papers
    Blackwell 193 1575 0.40 123 508 2652 0.55 156
    Churchill-Livingstone 183 1726 0.26 103 721 2821 0.62 146
    Elsevier 482 3477 0.36 179 1548 4222 0.78 204
    Harcourt 209 3713 0.18 164 518 5294 0.34 171
    Karger 321 893 0.59 86 711 935 1.01 79
    Mosby 100 4071 0.07 248 241 5369 0.15 269
    Plenum 233 1352 0.25 92 759 1733 1.86 121
    Springer 481 2268 0.44 141 1057 2386 0.84 153
    Taylor 259 759 0.48 74 658 572 1.67 55
    Thomson 207 1210 0.46 92 733 2788 0.45 140
    Waverly 119 3171 0.10 188 277 5770 0.16 237
    Wiley 333 2205 0.38 128 1409 3338 1.10 145
    Wolters-Kluwer 176 2535 0.19 154 504 3519 0.52 153
    Unweighted Averages 253 2227 0.32 136 742 3184 0.77 156
    NOTE: Numbers based on journals that commenced publication prior to 1989 and had >= 100 cites in 1988 or 1998.
    Cost per use ($/cite) is the average value of price/cites where the latter quantity is first calculated for each individual journal before averaging.

    Table 11.3: Avg. Circulation for ISI-Ranked Journals by Publisher
    Publisher    1988 (avg. # subscribers)    1998 (avg. # subscribers)
    Blackwell 31.72 30.16
    Churchill- Livingstone 34.00 31.20
    Elsevier 30.08 27.92
    Harcourt 50.51 53.23
    Karger 28.81 22.77
    Mosby 94.50 96.55
    Plenum 27.61 22.89
    Springer 21.60 19.03
    Taylor 11.67 12.08
    Thomson 13.50 19.42
    Waverly 61.67 63.41
    Wiley 24.41 23.51
    Wolters-Kluwer 41.62 42.28
    Unweighted Avgs 36.28 35.73
    NOTE: All numbers based on holdings for 194 medical libraries, during 1988-98 period. All titles commenced publication prior to 1989.

    Estimation Results

    The results for the exponential cdf model are consistent with expectations. They suggest that higher cost/use journals are purchased by fewer libraries. For example, in 1991, the marginal journal for a $100,000 budget library has a cost/use value equal to about 0.22 and, using the parameter estimates, is held by about 30 of the 194 libraries in the sample. The marginal journal for a $200,000 budget library has a cost/use value equal to about 0.59, and is held by some 17 libraries.

    Turning to the structural model results, the estimates imply that, after controlling for changes in citation rates and costs, publishers increased annual journal prices some 140% over the 1988-98 period (over the same period the Consumer Price Index increased by 37%). In addition, as a journal's citation rate improves relative to the average value in the sample, demand increases. Demand is apparently very inelastic. No firm-specific demand elasticity was more than 0.50 in absolute value. These small elasticities imply that publishers have an incentive to more than exhaust existing library serial budgets and any anticipated increases. This observation is consistent with numerous librarians' experiences and with what some publishers have privately acknowledged.[17] However, these estimates suggest that firms are not profit-maximizing, at least not in a short-term sense. One possible explanation is that, in anticipation of future growth in serials budgets, publishers preserve future sales by pricing less aggressively today. Under such circumstances, the estimated, firm-specific demand elasticities should lie somewhere between zero and one, in absolute terms.[18] Note that this story can also account for the estimated annual price increases.
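
    The incentive implied by such small elasticities is easy to see with a constant-elasticity demand curve; the functional form is assumed purely for illustration and is not the chapter's estimated model. When the absolute elasticity is below one, revenue rises whenever price rises, so a short-run profit maximizer facing this demand would keep raising prices.

        # Illustration: with constant-elasticity demand q = scale * p**(-eps)
        # (an assumed functional form), revenue p*q = scale * p**(1 - eps)
        # increases in p whenever eps < 1.

        def revenue(price, eps, scale=1000.0):
            quantity = scale * price ** (-eps)
            return price * quantity

        eps = 0.4   # in the range of the estimated firm-level elasticities (< 0.5)
        for p in (100.0, 200.0, 400.0):
            print(f"price {p:>5.0f}  ->  revenue {revenue(p, eps):>10.0f}")
        # Revenue keeps rising as the price doubles and doubles again.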

    Did the two publishing mergers earlier in the decade enhance the participating firms' market power? With respect to the Reed-Elsevier/Pergamon transaction the answer seems clear. Post-merger (1992-1998), Elsevier journal prices were unchanged but the former Pergamon titles experienced a 27% increase. This asymmetry is observed in the Kluwer-Lippincott merger as well. Post-merger, the former Lippincott titles experienced a 30% price increase while the Kluwer prices were unchanged. However, in this case the Lippincott price increase is not solely a consequence of enhanced market power. The results suggest that demand for Lippincott titles became slightly more inelastic in the post-merger period, contributing at least partially to the observed 30% price increase.

    11.5 Policy Implications and Future Directions

    Market Failure?

    Efficient pricing is not sustainable in the declining average cost environment of academic publishing. This raises the question of how the performance of commercial publishers compares to a second-best break-even standard. The analysis above suggests that prices far exceed marginal costs, but do they exceed average costs? One way to assess this question is to examine the pricing of comparable non-profit titles; presumably non-profit publishers set prices closer to, if not equal to, average costs. If the latter prove to be cheaper, then scholars have a real alternative for disseminating scholarly information in a more efficient fashion.

    Though a comprehensive analysis of non-profit journals is beyond the scope of the present paper it is useful to report some initial qualitative results.[19] In Table 11.4, I calculate average prices and citation rates for both commercial and non-profit ISI-ranked biomedical journals.

    Table 11.4: 1998 Statistics for ISI-Ranked Biomedical Titles, By Date of Initial Publication
    Period of first      Non-Profit                        Commercial
    publication          N     Price ($)    Cites          N      Price ($)    Cites
    1978-1987            27    287          10304          343    736          2159
    1968-1977            26    306          13907          221    919          2720
    1958-1967            17    446          14163          101    1316         5067
    1948-1957            11    289          13445          58     625          3774
    1938-1947            17    379          8946           28     838          7913
    1928-1937            4     139          3547           19     591          3695
    1918-1927            9     294          6402           22     483          2949
    Before 1918          16    292          12593          69     702          3365

    Titles are aggregated according to the decade of initial publication, going backward from 1987.[20] The discrepancy in average prices and citations for the two groups is striking. For example, if we compare titles that originated at similar points in time, we find that the average non-profit subscription price is between fifty and seventy-five percent less than the commercial rates for titles of similar vintage. At the same time, average citation rates for the non-profit journals greatly exceed those of the commercial publishers in most instances, sometimes by a factor of five. Among commercial journals, prices and citations are positively correlated. Thus, the substantially lower prices of comparable non-profit titles suggest that commercial publishers are setting prices well in excess of average costs.[21] Despite this apparent superiority,[22] the population of ranked non-profit titles is far smaller than that of the commercial journals, 148 versus 1032. Has the lucrative journals market induced too much entry or have the non-profits been too slow to exploit emerging research areas? Although this question deserves further attention, it seems clear that two distinct publishing models exist, each successful in its own way.

    Antitrust Paradigms

    When the proposed 1998 merger between Reed Elsevier and Wolters Kluwer collapsed, opposition from antitrust authorities in Europe and the U.S. was cited as a primary cause. Although no formal complaints were filed by agencies on either side of the Atlantic, regulators had sent a variety of signals indicating their serious concerns. Negotiations with the European Union had progressed the farthest and it appeared that the proposed deal would proceed only if the parties agreed to significant divestitures. It was widely reported at the time that the EU's preferred set of divestitures upset the financial logic of the merger and resulted in its demise.

    What is interesting here is that the EU's main focus was not on academic journals, but rather legal publishing (in Europe), and that its theory of anti-competitive harm was based on a user-based approach to publishing mergers: excessive overlap in content (and therefore similar to the DOJ's approach to the 1996 merger of legal publishers Thomson and West). The U.S. focus was far different, in part because European legal publishing was not germane and because the model of harm relied upon was novel.

    Though one can only speculate on how a U.S. antitrust case might have proceeded, it is clear that the combined Reed-Elsevier/Wolters-Kluwer entity would have controlled large journal portfolios in a number of broad fields, including biomedicine. Assuming that these broad fields constituted antitrust markets, some of these portfolios would have crossed the U.S. government's concentration threshold (based on the Antitrust Guidelines) with shares in excess of 30-35%. Based on the results discussed here, such a merger may have resulted in substantial price increases over time. If the U.S. had filed a complaint and had been successful with this market definition, an important legal precedent would have been set, one that would have made it easier to employ a portfolio theory in mergers involving combined market shares less than the threshold, e.g. the subsequent merger of Wolters-Kluwer and Waverly, and/or a large firm buying a relatively small portfolio of journals. The recent reluctance of the Antitrust Division to oppose several mergers in the publishing industry can be partially attributed to insufficient market shares. However, since many future deals are likely to be relatively small in scope, opposition to journal mergers will need to adopt novel approaches in the definition of both markets and concentration thresholds.[23]

    A Digital Future

    Scholarly journals render at least three services: research communication, archiving, and quality certification. Digital technology offers the potential to transform the first two by providing instantaneous access to current and past research. With modest investments in computer hardware and software, global scientific communities can dramatically lower the costs of exchanging information.[24] Though these innovations might seem to threaten the future of the traditional journal, the latter's role as a quality filter may be sufficient to preserve its existence, albeit in modified form. Although it is possible to conceive of new mechanisms for evaluating journal quality, e.g. measuring the number of hits generated by a journal website, it seems likely that the existing expert-based system for assessing new research will survive.[25]

    Commercial publishers have begun to exploit these new opportunities by bundling their individual journal titles and providing libraries with electronic access to article databases.[26] In doing so, the economics of commercial publishing may change in (subtle) ways. Portfolio size will still matter, but the number of journals may matter less than the total article population. Digital technology will make it feasible to control, monitor and price access in new and myriad ways, suggesting that sophisticated price discrimination schemes could be observed someday. The prospect of bundling and price discrimination, of course, will inevitably raise antitrust issues. A few large portfolios might reduce transaction costs for libraries yet have the potential to influence new entry as well as pricing.[27]

    11.6 Conclusions

    This chapter offers a new framework for understanding the interaction between libraries and commercial publishers. A portfolio approach to journal demand is proposed that is consistent with the observed pattern of journal purchases. This approach to demand can be used to explain for-profit publisher pricing as well as the incentives for mergers in this market. Estimation of a structural model of supply and demand reveals that the firm-level demand for journals is highly inelastic, that quality- and cost-adjusted price increases have been substantial over the past decade, and that past mergers have contributed to these price increases. Together these theoretical and empirical results raise a number of policy questions regarding (1) the performance of commercial publishers, (2) the efficacy of current antitrust paradigms and (3) the possibility that electronic distribution may mitigate existing problems in the market for scholarly journals.

    Notes

    I would like to thank many of my former colleagues at the Department of Justice, including Craig Conrath, Renata Hesse, Aaron Hoag, Russ Pittman, David Reitman, Dan Rubinfeld, and Greg Werden, as well as Jonathan Baker, Cory Capps, George Deltas, Luke Froeb, Jeffrey MacKie-Mason, Roger Noll, Dan O'Brien, Richard Quandt, Lars-Hendrik Röller, Steve Salop and Margaret Slade; seminar participants at the Federal Trade Commission, Georgia Tech, SUNY Stony Brook, and the Wissenschaftszentrum Berlin; and participants at the meetings of the American Economic Association, the European Association for Research in Industrial Economics, the Southern Economics Association, and the Western Economics Association. The Association of Research Libraries and its members, the National Library of Medicine, the Georgia Tech Library, and the Georgia Tech Foundation have provided invaluable assistance. Expert data support was provided by a large group of individuals, including Deena Bernstein, Claude Briggs, Pat Finn, Doug Heslep and Steve Stiglitz. Finally, I would like to thank the dozens of librarians and publishers who have provided me with important insights.

    1. Increasingly, journals are available in both print and electronic versions, and for some new titles only an electronic format is available. The advent of electronic journals is very recent, however, and is unlikely to have influenced behavior during the sample period analyzed in this paper. See Tenopir and King (2000), chapter 15, for a discussion of these changes.

    2. See Tenopir and King (2000), Chapter 13, for a review of this literature. An alternative explanation for journal price inflation has been offered by Lieberman, Noll, and Steinmuller in their 1992 working paper, The Sources of Scientific Journal Price Increase, Center for Economic Policy Research, Stanford University. They argue that entry by new titles over time has lowered circulation for existing journals, forcing the latter to raise prices to cover fixed costs. They estimate a supply and demand system for a set of journals and find that supply is downward sloping, consistent with this notion that individual titles exhibit scale economies. However, after controlling for this and other factors there remains a significant inflation residual that is unexplained by the model.

    3. See my working paper, Academic Journal Pricing and Market Power: A Portfolio Approach, November 2000, for a complete exposition.

    4. This data collection effort began in 1998 while I was still employed by the U.S. Justice Department's Antitrust Division. At that time, the Division was reviewing a number of proposed mergers between commercial publishers of STM journals, including (1) Reed-Elsevier, Wolters-Kluwer and Thomson; (2) Wolters-Kluwer and Waverly; and (3) Harcourt and Mosby.

    5. This claim is generally true for medical libraries; though other types of academic libraries may not be as precise in their processes, they appear to behave in similar fashion. In any case, this is an empirical question that is tested using the holdings data.

    6. Of course, this raises the question of how libraries measure usage for titles to which they do not currently subscribe. Presumably, evidence from interlibrary loans and citation data provide the basis for these measurements.

    7. Commercial and non-profit journal publishers have different objectives. The latter are intent generally on disseminating knowledge, whereas the former are interested primarily in profits. I assume that the non-profit firms set prices to cover average costs and I ignore them in the analysis that follows.

    8. I analyze this case, and also the case in which each library budget is unique (McCabe, 2000).

    9. The choice of content will influence a journal's use in a library. So, for example, a general journal is likely to be used far more at a particular institution than a narrower, niche-oriented title.

    10. Of course, a journal jump between budget classes influences the prices charged by other firms. In simulations of the merger scenario described above, the non-jumping journals experience modest price changes compared to the jump journals. The merged firm's high-use, jump journal exhibits large price increases; the non-merger, low-use, jump journal shows relatively large price decreases. This pattern persists as one increases the number of titles and budget classes. However, if the journal populations of particular budget classes are unchanged after a merger then the prices for those titles remain unchanged. Since it is likely that any observed merger will involve titles in different budget classes, it is possible that the merging firms' titles will jump in both directions, i.e. higher-use titles will jump "up" by increasing prices while lower-use titles will jump "down" by lowering their prices.

    11. For those not familiar with the general concept of a cdf, consult any introductory probability textbook. Note that by specifying an exponential cdf I am assuming that cost per use and journal demand are inversely related. If this particular model "fits" the data, then there is support for the hypothesis.

    12. The University of Wisconsin Libraries "Cost Per Use Statistics", previously available from http://www.wisc.edu/wendt/journals/costben.html (archived at http://wendt.library.wisc.edu/archive/journals/costben.html).

    13. Note that confirmation of my portfolio hypothesis in the current context does not necessarily generalize to other acquisition environments in which cost per use is relied upon less or is more difficult to measure.

    14. Unlike their counterparts in most fields, biomedical scholars enjoy the use of the National Library of Medicine's central database that contains information on several thousand medical collections. Although this data source offered substantial benefits with respect to the initial phase of data collection, the data were not ideally organized for analysis. One of the major difficulties was that many of the data (some 25%) were too idiosyncratic for data processing; as a consequence several hundred additional hours of manual effort were required to transform the data into usable form.

    15. Furthermore, if publishing mergers do result in cost savings, economic theory implies that post-merger prices should decline, everything else equal.

    16. These numbers exclude titles that commenced publication after 1988. Including these newer titles would tend to lower the reported 1988 figures relative to the later 1998 numbers.

    17. According to one former publishing executive, "If we didn't raise our prices each year, our competitors would grab the surplus dollars available from our customers."

    18. In a single period game, each publisher would attempt to forecast the size of journal budgets, and set prices so that its average absolute demand elasticity was close to one. However, in a multi-period context, with budgets increasing each period, a firm's pricing strategy changes. It is possible to show that firms will set prices so that absolute elasticities in each period lie between zero and one. The intuition is that lowering the price (and thus the absolute elasticity) in each period preserves future sales and, combined with budget growth, raises total profits.

    19. The results described here are based on data for journals first published no later than 1988 and sold by publishers with at least three ISI-ranked titles.

    20. Pre-1918 titles are considered together with the oldest titles dating from the 1820s. For younger titles, the average prices for the two groups are similar but the non-profit citation rates are about five times larger.

    21. It is possible that smaller subscription bases account for the commercial titles' higher prices, since, all else equal, a smaller subscription base raises average cost per subscription. I also calculated the average number of subscriptions for both types of publishers by decade. For some of these decade groupings, i.e. 1928-1937 and 1938-1947, the commercial titles actually exhibited larger subscription bases. In the other instances, commercial subscription bases were smaller than those for non-profit titles, but not enough, it would seem, to account for the observed price discrepancies. Except for the pre-1918 titles, commercial subscription bases were only 20 to 40% smaller than for the corresponding non-profit titles. In each of these cases, calculated revenues for commercial titles exceeded those of the non-profit titles. The average revenue level for commercial titles exceeded the corresponding non-profit values by 145%.

    22. Some of this citation gap may be due to the more general subject matter of many non-profit titles, compared to the niche strategy of some commercial journals.

    23. To avoid future antitrust scrutiny the large firms of the journal publishing world are likely to grow by adding relatively small numbers of journals at frequent intervals. If pursued diligently, this stealth strategy can be just as successful as any blockbuster merger.

    24. The Los Alamos physics server is perhaps the best example to date of this digital future (http://xxx.lanl.gov/). This website, funded by US government sources, has become the standard method of exchange for physics working papers.

    25. One important justification for this claim is that professional advancement within (academic) institutions relies on and supports the existing approach to quality assessment.

    26. See the chapters by Gazzale and MacKie-Mason, and Hunter. For example, Elsevier's database product, ScienceDirect, www.sciencedirect.com, contains articles from its more than 1100 peer-reviewed journals in various disciplines. To gain access to the entire database or some customized subset, a library is required to maintain its Elsevier paper subscriptions. The access price is typically calculated as a percentage markup on the library's Elsevier "paper budget." Recently, Elsevier has begun to offer smaller bundles of titles that correspond to broad disciplinary markets, such as biomedicine.

    27. For example, in the print context, cancellation of expensive journals has provided libraries with some modest ability to influence publisher pricing. If and when libraries begin to rely primarily on large, digital bundles for providing access to peer-reviewed research, the credibility of a threat to cancel an entire bundle will be far lower, reducing the effectiveness of this strategy. The impact of this change becomes particularly acute once a bundle grows beyond 50% of a specific market. It is easy to show that once this threshold is passed, perhaps due to a merger, the profitability of the bundle is greatly enhanced.

    12. Capitalizing on Competition: The Economic Underpinnings of SPARC

    Over the last 15 years the library community has been faced with high and ever-rising prices for scholarly resources. A number of factors have contributed to this situation, most fundamentally, the commercialization of scholarly publishing. While libraries have tried a number of strategies to ameliorate the effects of high prices, the development of SPARC, the Scholarly Publishing and Academic Resources Coalition, finally seems to be having some positive effects.

    This paper will review the current library environment, outline the elements that contribute to the marketplace for science, technology, and medical publishing, and briefly discuss the various calls for more competition in the scholarly publishing market. I will then discuss SPARC, a major initiative intended to introduce low-priced, high-value alternatives to compete with high-priced commercial publications for authors and subscribers.

    12.1 The Environment

    Over the past 15 years, libraries have struggled with the growing gap between the price of scholarly resources and their ability to pay. Data collected by the Association of Research Libraries (ARL), a membership organization of over 120 of the largest research libraries in North America, reveal that the unit cost paid by research libraries for serials increased by 207% between 1986 and 1999 (Association of Research Libraries, 1999). While serial unit costs increased at 9% a year, library materials budgets increased at only 6.7% a year. Libraries simply could not sustain their purchasing power with such a significant gap. Even though the typical research library spent 170% more on serials in 1999 than in 1986, the number of serial titles purchased declined by 6%. More dramatically, book purchases declined by 26%. With such a drastic erosion in the market for books, publishers had no choice but to raise prices (although not nearly as much as journal publishers did). In 1999, the unit cost of books had increased 65% over 1986 costs. As points of comparison, over the same time period, the consumer price index increased 52%, faculty salaries increased 68%, and health care costs increased 107% (Association of Research Libraries, 1999; Bureau of Labor Statistics, 2000; American Association of University Professors, 1986, 1999).
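
    The annual and cumulative figures quoted above are mutually consistent under simple annual compounding over the thirteen years from 1986 to 1999, as the quick check below shows; it assumes nothing beyond constant annual growth rates.

        # Quick consistency check of the ARL growth figures, assuming constant
        # annual growth compounded over 1986-1999.

        years = 1999 - 1986   # 13 years
        serial_unit_cost_growth = 1.09 ** years - 1     # 9% per year
        materials_budget_growth = 1.067 ** years - 1    # 6.7% per year

        print(f"9.0% per year over {years} years -> +{serial_unit_cost_growth:.0%}")  # about +207%
        print(f"6.7% per year over {years} years -> +{materials_budget_growth:.0%}")  # about +132%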

    At the same time price increases were straining library budgets, an explosion in the volume of new knowledge and new formats was adding yet more stress. According to Ulrich's International Periodicals Directory, the number of serials published increased over 54% between 1986 and 2000, from 103,700 to over 160,000 titles (Ulrich's, 2000; Okerson, 1989). While the majority of these titles are not scholarly journals and would not be collected by research libraries, the data do give some indication of the health of the serials publishing industry. According to figures from UNESCO, over 850,000 books were published worldwide in 1996 (Greco, 1999). Data from the top 15 producing countries reveal that book production increased 50% between 1985 and 1996 (Greco, 1999; Grannis, 1991). In the meantime, electronic publishing is booming, with the number of peer-reviewed electronic journals increasing more than 570-fold between 1991 and 2000 (Association of Research Libraries, 2000a). While worldwide output of information resources increases dramatically, the research library is purchasing a smaller and smaller proportion of what is available. The typical library that subscribed to 16,312 serial titles and 32,679 monographs in 1986 is now able to afford only 15,259 serials and 24,294 monographs (Association of Research Libraries, 1999).

    The overall high prices and significant price increases of journals have been traced to titles in science, technology, and medicine (STM). Price increases in these areas have averaged from 9 to 13% a year over at least the past decade (Albee and Dingley, 2000). Many librarians believe that the dominance of commercial publishers in STM journals publishing is one of the underlying causes of the high prices. In addition, the consolidation going on in the publishing industry raises even more concerns that fewer companies with greater market power will exacerbate current trends.

    The growth of the commercial presence in scholarly publishing has introduced a market economy to an enterprise that had been considered by scholars as a "circle of gifts." Scholars have always been interested in the widest possible dissemination of their work and the ability to build on the work of others. To share their findings with colleagues and claim precedent for their ideas they have been willing to give away their intellectual effort for no direct financial remuneration. Their rewards come in the form of reputation within their fields and promotion and tenure from their institutions. Scholars trusted that the publishers to whom they gave their work were operating in their best interests, intent on furthering the scholarly enterprise through wide distribution of research results.

    For a long time, this arrangement worked well. Publishers helped shape the disciplines by collecting manuscripts in specific fields, managing the peer review process, and marketing and selling subscriptions. But as a few large publishers recognized the earnings potential of a constant supply of free content that must be purchased by libraries, they raised prices higher and higher. As noted by King and Tenopir in Chapter 8, individuals who are more sensitive to price changes were the first to cancel their subscriptions. Eventually, however, even libraries were forced to launch major cancellation projects, decreasing access and resulting in additional price increases for the remaining subscribers. While it may seem counter-intuitive that selling fewer copies could increase revenues, there is some evidence to suggest that particularly in the case of mergers this is in fact the case (see McCabe, Chapter 11). So the wide distribution desired by authors can be directly opposed to the strategy used by publishers to maximize their profits.

    It is important to acknowledge that commercial publishers are doing exactly what their stockholders would expect them to do. They are behaving responsibly toward their shareholders, their highest priority. This need to protect shareholder value, however, sometimes conflicts directly with the need for scholars to both distribute their own work widely and have ready access to the work of others.

    12.2 The Noncompetitive STM Marketplace

    Publisher profits begin to reveal the nature of the scholarly publishing enterprise, particularly that of journals publishing in science, technology, and medicine, and increasingly in business and economics. Some of the world's largest journal publishers are companies owned by Wolters Kluwer and Reed Elsevier. Data from these companies, compared with data from the periodicals publishing industry as a whole, show margins 2-4 times as high and return on equity almost twice as high (Wyly, 1998). In his analysis of these companies' financial data, Wyly notes that in conjunction with other evidence, "a high return on equity is at least a potential indicator that equity holders are benefiting from investing in activities not subject to competitive forces" (p. 9).

    Other evidence consistent with high margins is noted by McCabe in Chapter 11, in his discussion of descriptive statistics. McCabe's data show that while the prices of 1000 biomedical titles roughly tripled between 1988 and 1998, the average number of subscriptions held by 194 medical libraries decreased by only 1.5%. McCabe concludes that demand is very price inelastic at the firm level, a sufficient condition for the exercise of market power, and thus the existence of high margins. Can we conclude from this information that the market for scholarly journals is non-competitive? Of course high margins are not necessarily inconsistent with competition. Profits may be low due to large fixed costs. However, McCabe's econometric estimates reveal that quality- and cost-adjusted price increases have been substantial over the past decade. This evidence suggests that profits are high and that the market for scholarly journals is not competitive.
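
    The inelasticity reading follows from a back-of-the-envelope ratio of the two percentage changes. The sketch below assumes prices roughly tripled (an increase of about 190%) and, unlike McCabe's structural model, ignores changes in quality and in library budgets.

        # Back-of-the-envelope firm-level elasticity implied by the figures above.

        pct_change_quantity = -0.015   # subscriptions per title fell about 1.5%
        pct_change_price = 1.90        # prices roughly tripled (about +190%)

        elasticity = pct_change_quantity / pct_change_price
        print(f"implied demand elasticity: {elasticity:.3f}")   # about -0.008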

    A number of factors contribute to this environment in which journal publishers operate. While faculty may desire to publish to share their work with colleagues, they are also driven to publish by the promotion and tenure system and the need to obtain grants. Faculty will submit their articles to the most prestigious journals in their fields or to the title that will most likely accept their work. As a title gains in prestige as measured by its impact factor (the frequency with which the average article in a journal has been cited in a particular year), it becomes more and more attractive to faculty, both as authors and readers, and takes on a dominance in the field. Faculty expect their libraries to subscribe to both the prestigious titles and to the second tier titles in which they publish.

    Libraries, whose mission is to serve current and future scholars, purchase as many titles as their budgets will allow. With journals, their practice had been to set up a subscription which was generally not reviewed unless an unusual event, such as a dramatic price increase, drew attention to it. Once a title was in a library's journals collection, there was little chance of it being canceled. As publishers raised prices, libraries did everything they could to protect their journals budgets. But they had also inadvertently protected faculty from the reality of journal prices. With the professionalization of collection development in the 1960s and 1970s, responsibility for selection had migrated from faculty to librarians. Faculty were no longer aware of the institutional prices that were being charged for the titles in which they published and to which they expected the library to subscribe. It is more typical now for libraries to review all of their journal titles on a routine basis to cull little-used, low-value, or no-longer-relevant titles.

    Yet another factor contributing to the market dynamic was the hesitance of scholarly societies to compete with well-established titles or to launch new titles in developing fields. Launching titles in new fields or niche areas could draw papers away from established society titles, jeopardizing both their clout and their financial stability. Thus, authors turned to commercial publishers to support their interests. As these new journals grew in size and prominence, it was less and less likely that societies would take the financial risk to compete. They could not afford to carry the loss for the approximately 5-7 years needed for a new journal to break even (Tenopir and King, 2000).

    Libraries undertook a number of strategies to cope with the continuing increase in journal prices. They reduced dramatically the purchase of monographs, asked their administrations for special budget increases, and when these were not enough, canceled millions of dollars worth of serials. Libraries also turned to document delivery services and developed strategies to improve interlibrary lending performance. They sought to re-invigorate cooperative collection development programs. More recently, site licensing of electronic resources has helped eliminate the need for duplicate print subscriptions, and consortial arrangements are reducing unit costs at individual institutions by spreading costs across a wider range of libraries. While all of these strategies can help local institutions better manage their budgets, none of them has changed the underlying dynamics of a system in which the publisher, operating in a non-competitive environment, can unilaterally set prices without countervailing pressure from competitive forces.

    12.3 Calls for Competition

    In 1988, at the request of ARL, Economic Consulting Services Inc. (ECS) undertook a study of trends in average subscription prices and publisher costs from 1973-1987. The study compared the price per page over time of a statistically valid sample of approximately 160 journal titles published by four major commercial publishers (Elsevier, Pergamon, Plenum, and Springer-Verlag) with an estimated index of publishing costs over the same time period. The study concluded that the growth in the price per page of the journals exceeded the growth in costs by 2.6 to 6.7% a year. This meant that these companies could be enjoying operating profits of 33 to 120% a year. ECS concluded that "If such estimated rates of growth are reasonably accurate, then the library community would benefit greatly from such measures as the encouragement of new entrants into the business of serials publishing, and the introduction of a program to stimulate greater competition among publishers by injecting a routine of competitive bidding for publishing contracts of titles whose ownership is not controlled by the publishers (Economic Consulting Services Inc., 1989)."

    In a companion piece to the ECS study, a contract report by Ann Okerson defined the causes of the "serials crisis" and proposed a set of actions to confront the problems. The report concluded that "the distribution of a substantial portion of academic research results through commercial publishers at prices several times those charged by the not-for-profit sector is at the heart of the serials crisis (Okerson, 1989)." The report went on to note that:

    Satisfactory, affordable channels for traditional serials publication already exist. For example, there are reasonably priced commercial serials publishers. Many of the non-profit learned societies are already substantial publishers. University presses could substantially expand their role in serials publishing.... The serials currently produced by these organizations are significantly less expensive than those from the commercial publishers, even though they may increase in price at similar rates. Several analyses of the "impact" of serials, in terms of the readership achieved per dollar, show that those produced by non-commercial sources have a higher impact than commercial titles. (p.43)

    Among the recommendations in the report was one centered on introducing competition: "ARL should strongly advocate the transfer of publication of research results from serials produced by commercial publishers to existing non-commercial channels. ARL should specifically encourage the creation of innovative non-profit alternatives to traditional commercial publishers." (p. 42)

    Over the next several years, ARL directed great energy at engaging stakeholders beyond the library community, such as societies, university presses, and university administrators, in the discussions of the scholarly communication crisis. The Association of American Universities (AAU) formed a series of task forces to address key issues related to research libraries. The Task Force on a National Strategy for Managing Scientific and Technological Information took up, as its name suggests, the issues related to scholarly journals publishing. In its report of May 1994, the Task Force called for competition, but this time competition facilitated through electronic publishing. The recommendation stated that the community should "introduce more competition and cost-based pricing into the marketplace for STI by encouraging a mix of commercial and not-for-profit organizations to engage in electronic publication of the results of scientific research (Association of American Universities, 1994)."

    As a result of the work of the task force, ARL proposed several projects intended to address the crisis in scholarly publishing. These were rejected as too narrow or too broad or not directed at the appropriate leverage point in the system. Reaching consensus among the membership on a way forward seemed less and less likely. In the meantime, prices continued to climb. Finally, in May of 1997, at an ARL membership meeting, Ken Frazier, Director of Libraries at the University of Wisconsin, Madison, proposed that "If 100 institutions would put up $10,000 each to fund 10 start-up electronic journals that would compete head to head with the most expensive scientific and technical journals to which we subscribe, we would have $1 million annually.... I don't see any way around the reality that we have to put the money out in order to make this start to happen (Michalak, 2000)."

    Within six months, Frazier's proposal had a name: SPARC, the Scholarly Publishing and Academic Resources Coalition. A business plan was in development, and potential partnerships were under discussion. In June 1998, a SPARC Enterprise Director (Richard Johnson) was hired and the first partnership, with the American Chemical Society, was announced.

    12.4 SPARC

    SPARC is a membership organization whose mission is to restore a competitive balance to the STM journals publishing market by encouraging publishing partners (for example, societies, academic institutions, small private companies) to launch new titles that directly compete with the highest-priced STM journals or that offer new models that better serve authors, users and buyers. In return, libraries agree to purchase those titles that fall within their collections parameters. By leveraging their subscription dollars, libraries reduce the financial risk for publisher-partners allowing them the time to build the prestige needed to attract both authors and readers.

    Over 200 libraries and library organizations from Hong Kong, Australia, Belgium, Denmark, Germany, England, Canada, and the United States now belong to SPARC. Members pay a modest annual membership fee and agree to the purchase commitment.

    A number of strategies must be pursued for SPARC to be successful. First, it must be able to deliver on library subscriptions to partners. This includes marketing support that reaches both SPARC members and the broader library community. SPARC is also working with prestigious societies and editorial boards. This is essential to build name recognition for SPARC and early interest in new titles. Raising faculty awareness of the issues in scholarly publishing is also a critical component of the SPARC program. Faculty who understand the context and are reconnected with the reality of journal prices are more likely to change their submission habits if there is a reasonably priced prestigious or promisingly prestigious alternative. This educational effort is also intended to encourage editors to become more engaged in the business aspects of the titles for which they work. Editors (or societies) can renegotiate contracts, move their titles, or start up competitors. SPARC must also catalyze the development of capacity and scale within the not-for-profit sector. Numerous studies have consistently demonstrated that journals published by societies or other non-profit publishers are significantly lower in price and higher in quality than commercial journals (See for example: Cornell, 1998; McCabe, 1999; Wisconsin, 1999; Bergstrom, 2001). However, STM publishing is clearly dominated by commercial companies. A recent market analysis by Outsell, Inc., estimates that commercial companies account for 68% of the worldwide revenue for STM primary and secondary publishers (Outsell, Inc., 2000). For a true competitive environment to exist, much greater capacity in the non-profit sector is essential.

    As it has developed over the past two years, SPARC has categorized its efforts into three programmatic areas: SPARC Alternatives, SPARC Leading Edge, and SPARC Scientific Communities. In addition, SPARC is also supporting the Open Archives Initiative, an effort to develop standards to link distributed electronic archives. SPARC views the development of institutional and disciplinary e-archives as an important strategic direction for the future of scholarly communication.

    SPARC Alternatives

    The first and most directly competitive of SPARC's programs is the SPARC Alternatives. SPARC Alternatives are the titles that compete directly with high-priced STM journals. The first partnership in this category was with the American Chemical Society (ACS), which agreed to introduce three new competitive titles over three years. Organic Letters, the first of these, began publication in July 1999. Organic Letters competes with Tetrahedron Letters, a $9,036 title (the subscription price in 2001) published by Elsevier Science. ACS, one of the largest professional societies in the world and highly respected for its quality publications program, was able to attract three Nobel laureates and 21 members of the National Academy of Sciences to its new editorial board. Two hundred and fifty articles were posted on the Organic Letters website and more than 500 manuscripts were submitted in its first 100 days (Ochs, 2001).

    A 2001 subscription to Organic Letters costs $2,438. The business plan calls for a fully competitive journal offering 65-70% of the content at 25% of the price. The effects of this new offering have already been felt. The average price increase for Tetrahedron Letters for several years had been about 15%. For 2000, just after Organic Letters was introduced, the price increase of Tetrahedron Letters was only 3%; in 2001 it was 2%. For 2000, the average price increase across all of the Elsevier Science titles was 7.5% and for 2001 it was 6.5%. If the price of Tetrahedron Letters had continued to increase at the rate of 15%, it would cost $12,070 in 2001. Subscribers have saved over $3,000 as a result of competition. Even if the title had increased at the more modest average rate of the Elsevier Science titles for 2000 (7.5%) and 2001 (6.5%), subscribers would be paying over $800 more for Tetrahedron Letters in 2001 than they are currently paying.

    Even more importantly, the introduction of Organic Letters has had a significant impact on the number of pages and articles published by Tetrahedron Letters.[1] During the second half of 1999, the number of articles in Tetrahedron Letters declined by 21% compared to the same period in 1998, and the number of pages declined by 12%. In the first half of 2000, the number of articles decreased 16% compared to the first half of 1999, while the number of pages actually increased 5%. The loss in articles has been compensated for by an increase in the number of pages per article: 11% in the second half of 1999 and 24% in the first half of 2000. Organic Letters, in the meantime, surpassed its projected pages and articles and has clearly demonstrated that quality, low-cost alternatives can attract authors. The second ACS SPARC Alternative, Crystal Growth and Design, will be introduced in 2001.

    Another high profile SPARC Alternative is Evolutionary Ecology Research (EER), a title founded by Michael Rosenzweig, a Professor of Ecology and Evolutionary Biology at the University of Arizona. In the mid-1980's, Rosenzweig founded and edited Evolutionary Ecology with Chapman & Hall. The title was subsequently bought and sold, most recently in 1998 to Wolters Kluwer. During these years, the journal's price increased by an average of 19% a year. Fed up with the price increases and the refusal of the publishers to take their concerns seriously, the entire editorial board resigned. In January 1999, they launched their own independent journal published by a new corporation created by Rosenzweig. A subscription to EER was priced at $305, a fraction of the cost of the original title ($800).[2]

    As of the end of 2000, EER had published 16 issues while the original title published only 6. Authors had no qualms submitting their papers to this new journal edited by respected scholars in the field. In fact, 90% of the authors withdrew their papers from Evolutionary Ecology when the editorial board resigned. EER was quickly picked up by the major indexes, surmounting yet another hurdle that faces new publications. And, most significantly, EER broke even in its first year. SPARC played a significant role in generating publicity about and, more importantly, subscriptions to EER. EER is another example of how a new title can quickly become a true competitor.

    SPARC has a number of other titles in the Alternatives program. These include PhysChemComm, an electronic-only physical chemistry letters journal published by the Royal Society of Chemistry; Geometry & Topology, a title that is free of charge on the web with print archival versions available for a fee; the IEEE Sensors Journal, to be published by the Institute for Electrical and Electronics Engineers in 2001; and Theory & Practice of Logic Programming, a journal founded by an entire editorial board who resigned from another title after unsuccessful negotiations with the publisher about library subscription prices. New titles added recently include Algebraic & Geometric Topology, a free online journal hosted at the University of Warwick Math Department, and the Journal of Machine Learning Research, a computer science publication offered in a free web version. A number of other partnerships are under negotiation.

    SPARC Leading Edge Partnerships

    To support the development of new models in scholarly publishing, SPARC has created a "Leading Edge" program to publicize the efforts of discipline-based communities that use technology to obtain competitive advantage or introduce innovative business models. Titles in this program include the New Journal of Physics, the Internet Journal of Chemistry and Documenta Mathematica.

    The New Journal of Physics, jointly sponsored by the Institute of Physics (U.K.) and the German Physical Society, is experimenting with making articles available for free on the web and financing production through the charging of fees to authors whose articles are accepted for publication. That fee is currently $500.

    The Internet Journal of Chemistry is experimenting with attracting authors by offering them the opportunity to exploit the power of the Internet. This electronic-only journal was created by an independent group of chemists in the U.S., the U.K., and Germany. It offers the ability to include full 3-D structures of molecules, color images, movies and animation, and large data sets. It also allows readers to manipulate spectra. Institutional subscriptions to the journal cost $289.

    Documenta Mathematica is a free web-based journal published by faculty at the University of Bielefeld in Germany since 1996. A printed volume is published at the end of each year. Authors retain copyright to articles published in the journal and institutional users are authorized to download the articles for local access and storage.

    SPARC Scientific Communities

    Another important program area for SPARC is the Scientific Communities. These projects are intended to support broad-scale aggregations of scientific content around the needs of specific communities of interest. Through these projects, SPARC encourages collaboration among scientists, their societies, and academic institutions. The Scientific Communities program helps to build capacity within the not-for-profit sector by encouraging academic institutions to develop electronic publishing skills and infrastructure, and it seeks to reduce the sale of journal titles to commercial publishers by providing small societies and independent journals with alternative academic partners for moving into the electronic environment.

    One of the most ambitious projects in the Scientific Communities is BioOne, a non-profit, web-based aggregation of peer-reviewed articles from dozens of leading journals in adjacent areas of biological, environmental, and ecological sciences. Most of these journals are currently available only in print. While there is a risk to societies in offering electronic versions of their titles through institutional site licenses, i.e., the loss of personal member subscriptions, there is a greater danger that scholarship not in electronic form will be overlooked and marginalized. But many of the societies do not have the resources or expertise to create web editions on their own. BioOne provides that opportunity.

    BioOne, to be launched in early 2001 with 40 titles out of an eventual 150 or more, is a partnership among SPARC, the American Institute of Biological Sciences, the University of Kansas, the Big 12 Plus Library Consortium, and Allen Press. In an unprecedented commitment to ensuring that the societies not only survive but play an expanding role in a more competitive and cost-effective marketplace, SPARC and Big 12 Plus Library Consortium members have contributed significant funds to the development of BioOne. These funds will be returned over a five year period as credits against their subscriptions. BioOne offers participating societies a share in the revenues, protection against accelerated erosion of print subscriptions, and no out-of-pocket costs for text conversion and coding.

    Several other Scientific Communities projects have received support from SPARC. These include eScholarship from the California Digital Library, Columbia Earthscape, and MIT CogNet. The goal of California's eScholarship project is to create an infrastructure for the management of digitally-based scholarly information. eScholarship will include archives of e-prints, tools that support submission, peer-review, discovery and access, and use of scholarship, and a commitment to preservation and archiving. Columbia's Earthscape is a collaboration among Columbia University's press, libraries, and academic computing services. The project integrates earth sciences research, teaching, and public policy resources. MIT CogNet is an electronic community for researchers in cognitive and brain sciences that includes a searchable, full-text library of major reference works, monographs, journals, and conference proceedings, virtual poster sessions, job postings, and threaded discussion groups. All three of these projects received funding from SPARC in a competitive awards process.

    12.5 Evaluating the SPARC Model

    The SPARC Purchasing Commitment

    As SPARC was being developed, several key decisions had to be made to determine its scope of action. It was clear that the main goal of SPARC was to reduce the price of STM journals. Based on the several analyses of the journals crisis mentioned above, the SPARC founders believed that introducing direct head-to-head competition with high priced titles would be the most effective strategy for achieving this goal. But would SPARC itself be the publisher and actually fund and distribute the competing journals? Or would it provide development funds to established publishers who would launch the new titles? Or would the promise of library subscriptions be enough to encourage publishers to participate?

    The SPARC working group quickly rejected the notion of SPARC becoming a publisher. Many able and sympathetic publishers already existed. Moreover, SPARC itself did not yet have name recognition. SPARC-supported titles would need to develop prestige quickly to attract editors, authors, and readers, as well as subscribers. While prestige necessarily takes time to establish, the working group members believed that partnering with traditional scholarly societies and university presses known for their high-quality publications could help speed the process along. In addition, working with prestigious partners would help SPARC establish its own reputation.

    SPARC, then, saw its role as a catalyst to encourage primarily not-for-profit scholarly publishers to create the new titles. Many working group members indicated their willingness to contribute substantial amounts of money to SPARC to allow it to provide incentives to publishers in the form of development funds. But early conversations with some potential partners revealed that, at least for traditional publishers, what was needed most was libraries' subscription dollars. The publishers were willing to absorb the up-front development costs if they could be assured that libraries would subscribe early on to the new titles. This would ensure wide visibility from the beginning, reduce the amount of time publishers would need to recover their investments, and avoid possible legal entanglements that could result from external funding arrangements. Hence the evolution of SPARC's incentive plan for publishers: a commitment by SPARC member libraries that they would subscribe to SPARC partner journals as long as the titles fit into their collections profile.

    While the purchase commitment is one of the greatest attractions of SPARC for publishers, it is one of the most controversial parts of SPARC's program for some of its members. In essence, SPARC's alternatives program is creating new titles that members are expected to buy (or is contributing to journal proliferation, as some would say). The founders of SPARC recognized that changing the system would require investment by libraries. While they hoped that university administrators would provide special allocations to support SPARC fees and purchase commitments, it is more likely the case that funds are coming from already over-stretched collections budgets. Purchase of a new SPARC title likely requires the cancellation of another title. In theory, that other title should be the existing high-priced journal. But these are often established journals and cannot easily be cancelled. Over time, as competition works, the high-priced titles should lose authors to the new titles and should ultimately be forced to lower their prices or at least curtail their price increases. As valuable content is lost, the titles will become easier to cancel. But this takes time, and, in the meantime, some publishers have started to bundle their products, eliminating the opportunity to cancel.

    Nevertheless, as the number of new SPARC alternatives grows, it may be possible for libraries to cancel only a few of the competitors to be able to recoup their investment in SPARC titles. In early 2001, the 10 commercial titles with which SPARC alternatives compete head-to-head cost a total of over $40,000. The 10 SPARC titles cost a total of just over $5,200. The cancellation of only a few of the established titles would easily pay for the SPARC titles.

    In the meantime, SPARC has launched a program intended to make the cancellation of the original title easier. Called Declaring Independence, this effort is directed at journal editors and encourages them to evaluate the effectiveness of their current journals in meeting the needs of the researchers in their community. If the findings are unsatisfactory and they are unable to negotiate improvements with their current publishers, Declaring Independence gives the editorial board members suggestions for moving their journals elsewhere. As demonstrated by Evolutionary Ecology Research, prestigious boards will take authors with them creating a vulnerable time for the original journal as it struggles to find new editors and rebuild its author base. This is an opportune moment for libraries to cancel.

    The Emergence of New Pricing Models

    The founders of SPARC understood that publishing, however streamlined, costs money. The pledge of member subscriptions was a recognition of this reality. Most of the SPARC partners, particularly the traditional publishers, have maintained the typical subscription model for their new titles. A few community-based titles, however, are experimenting with alternative models. Three journals hosted by university mathematics departments are taking advantage of the ease of web-based publishing to offer their products online for free. Geometry & Topology and Algebraic & Geometric Topology are both hosted by the University of Warwick (U.K.). Documenta Mathematica is published at the University of Bielefeld in Germany. All three journals are run by faculty who are committed to "open-access e-journals [that] provide to authors and readers ... broad dissemination and rapid publication of research (SPARC, 2001)." All three produce a printed volume at the end of the year which is available at a minimal cost. According to the editors of Geometry & Topology, the most time-consuming part of the publishing process is the formatting of papers (Buckholtz, 2001). This work is being subsidized in part through the sale of the paper editions. This model may work while some libraries still feel compelled to purchase paper, but it is not clear what will happen when archiving and cultural issues are resolved.

    Another model used by a SPARC partner is the charging of a fee to authors whose papers are accepted for publication. The New Journal of Physics (NJP), published by the Institute of Physics and the German Physical Society, is an electronic-only journal that is available to readers for free. A fee of $500 is charged to authors whose works are published. In order to encourage faculty to consider publishing in the NJP, a few libraries have offered to pay the fee for their faculty members. Approximately 60 papers have been published by the NJP in the last two years. As faculty have become less and less accustomed to paying page charges, however, such fees may prove difficult to sustain.

    Yet a third model is being explored by one of SPARC's newest partners, the Journal of Machine Learning Research (JMLR). JMLR is published by JMLR, Inc. in partnership with the MIT Press. Two electronic versions are offered: a free site maintained by JMLR, Inc., and a paid electronic edition available on the CatchWord Service. The paid version provides additional features including linking to abstracting and indexing services, archiving, and mirror sites around the world. Quarterly paid print editions are also available from MIT Press. It will be interesting to see whether the community will pay for enhanced features when a free edition is available and whether that choice may vary by "subscriber" type, i.e., a library or an individual.

    12.6 Measuring Success

    When SPARC was being designed, the developers set out a number of measures by which its success could be determined. These included

    • SPARC-supported projects are financially viable and significantly less expensive;

    • SPARC-supported products are attracting quality authors and editors;

    • New players have entered the STM marketplace;

    • An environment where editorial boards have been emboldened to take action has been created; and

    • STM journal price increases have moderated significantly.

    It was anticipated that it might take as much as five years to begin to see the effects of SPARC.

    At this point, SPARC has been in existence for only three years. But there are already signs that it is having the desired impact. Evolutionary Ecology Research is financially viable and is offering quality content for under 40% of the price of the alternative title. Organic Letters is on track to meet its financial goals and has been able to attract high quality editors and editorial board members. In addition, it has quickly attracted authors away from its competitor, as has EER. Others report strong starts and encouraging prospects.

    Through the Scientific Communities program, SPARC is supporting new players in the market—partnerships have included libraries, library consortia, and academic computing centers working with societies, university presses, independent journal boards, and individual faculty. These projects are in their very early development but give a clear indication of the long term possibilities for expanding not-for-profit publishing capacity.

    SPARC has also been very successful to date in focusing attention on issues through its advocacy and public communications efforts. This in turn has created an environment where editorial boards and societies are beginning to question their publishers about pricing and other policies. Some of these negotiations are successful, leading to lower prices, as happened recently in the case of the American Journal of Physical Anthropology. The American Association of Physical Anthropologists was concerned over the many cancellations of its journal that had resulted from high prices. The Association and its Publications Committee informed the publisher of the title that they were considering options, including the possible launch of a competitive journal. After extensive negotiations, the publisher and the Association were able to come to terms, which resulted in a reduction in the subscription price of more than 30% (Albanese, 2000).

    Other negotiations between editorial boards and commercial publishers have not been as successful. In the case of the Journal of Logic Programming, the entire editorial board resigned after 16 months of unsuccessful negotiations about the price of library subscriptions. They have founded a new journal, Theory and Practice of Logic Programming, which began publication in January 2001 (Birman, 2000).

    The ultimate aim of SPARC is to make scientific research more accessible by lowering prices for STM journals across the board. In 2000, the overall average increase in STM journal subscriptions fell below 9% for the first time since 1993 (Albee and Dingley, 2000). Elsevier Science, the largest STM journals publisher in the world, announced in 1999 that it was ending the days of double-digit price increases and set increases for 2000 at 7.5% and 2001 at 6.5% (Elsevier Science, 2000). These changes are significant.

    For most SPARC member libraries, the savings represented by this decline are far more than their investment in SPARC and the creation of a more competitive market environment.

    While SPARC may not be the only cause of these changes, it does seem clear that by raising the profile of the issues and achieving some early `proof of concept' success, SPARC has emboldened librarians, scholars, and societies to take action. Competition can work.

    Notes

    1. The following analysis is based on data collected by SPARC, July 2000.

    2. An account of the development of Evolutionary Ecology Research can be found in Rosenzweig (2000).

    13. RePEc, an Open Library for Economics[†]

    arXiv.org, the eprints archive founded by Paul Ginsparg at the Los Alamos National Laboratory, continues to be the leading provider of free eprints in the world. Its subject focus is Physics, Mathematics and Computer Science. There is no evidence supporting the idea that similar collections can be built for other subject areas. This chapter is concerned with an alternative approach as exemplified by the RePEc digital library for economics. RePEc has a different business model and a different content coverage than arXiv.org. This chapter addresses both differences.

    As far as the business model is concerned, RePEc is an instance of an "Open Library". Such a library is open in two ways. It is open for contribution (third parties can add to it), and it is open for implementation (many user services may be created). Conventional libraries, including most digital libraries, are closed in both senses.

    As far as the content coverage is concerned, the term RePEc stands for Research Papers in Economics. However, RePEc has a broader mission. It seeks to build a relational dataset about scholarly resources and other content relating to these resources. The dataset would identify all the authors, papers and institutions involved in Economics research. Such an ambitious project can only be achieved if the cost to collect data is decentralized and low, and if the benefits of supplying data are large. The Open Library provides a framework where these conditions are fulfilled.

    13.1 Introduction

    In this chapter I am not concerned with the demand for documents, nor am I concerned with the supply of documents.[1] Instead, I focus on the supply of information about documents. For some documents, holding detailed information about the document is as good as holding the document itself. This is typically the case when the document can be accessed on the Internet without any access restriction. Such a document will be called a public access document. Collecting data about documents is therefore particularly relevant for public access documents.

    The main idea brought forward in this paper is the "Open Library". Basically, an open library is a collaborative framework for the supply and usage of document data. Stated in this way the idea of the open library seems quite trivial. To fully appreciate the concept, it is useful to study one open library in more detail. My example is the RePEc dataset about Economics. In Section 13.2, I introduce RePEc as a document data collection. In Section 13.3, I push the RePEc idea further. I discuss the extension of RePEc that allows one to describe the discipline, rather than simply the documents that are produced by the members of the discipline. In Section 13.4, I make an attempt to define the open library more precisely. The example of RePEc demonstrates the relevance of the open library concept. I conclude the paper in Section 13.5.

    The efforts of which RePEc is the result go back to 1992. I deliberately stayed away from a description of the history of the work to concentrate on the current status. Therefore, insufficient attribution is given to the people who have contributed to the RePEc effort. See Krichel (1997) for an account of the early history of the NetEc projects. These can be regarded as precursors of RePEc.

    13.2 The RePEc document dataset

    Origin and motivation of RePEc

    A scholarly communication system brings together producers and consumers of documents. For the majority of the documents, the producers do not receive a monetary reward. Their effort is compensated through a wide circulation of the document and peer approval of it. Dissemination and peer approval are the key functions of scholarly communication.

    Scholarly communication in Economics has largely been journal-based. Peer review plays a crucial role. Thorough peer review is expensive in time. According to Trivedi (1993), a paper commonly takes over three years from submission to publication in an academic journal, not counting rejections. Informal evidence suggests that the slow rise in publication delays has been curbed in the past few years, as journal editors have fought hard to cut down on what have been perceived to be intolerable delays.

    Researchers at the cutting edge cannot rely solely on journals to keep abreast of the frontiers of research. Prepublication through discussion papers or conference proceedings is now commonplace. Access to this informally-disseminated research is often limited to a small number of readers. It relies on the good will of active researchers to disseminate their work. Since good will is in short supply, insider circles are common.

    This time gap between informal distribution and formal publication can only fundamentally be resolved by reforming the quality control process. The inconvenience resulting from the delay can, however, be reduced by improving the efficiency of the informal communication system. This is the initial motivation behind the RePEc project. Its traditional emphasis has been on documents that have not gone through peer review channels. Thus RePEc is essentially a scholarly dissemination system, independent of the quality review process, on the Internet.

    Towards an Internet-based scholarly dissemination system

    The Internet is a cost-effective means for scholarly dissemination. Many economics researchers and their institutions have established web sites. However, they are not alone in offering pages on the Web. The Web has grown to an extent that the standard Internet search engines only cover a fraction of the Web, and that fraction is decreasing over time (Lawrence and Giles, 1999). Since much of economics research uses common terms such as "growth", "investment" or "money", a subject search on the entire Web is likely to yield an enormous number of hits. There is no practical way to find which pages contain economics research. Due to this low signal-to-noise ratio, the Web per se does not provide an efficient mechanism for scholarly dissemination. An additional classifying scheme is required to segregate references to materials of interest to the economics profession.

    The most important type of material relevant to scholarly dissemination is the research paper. One way to organize this type of material has been demonstrated by the arXiv.org preprint archive, founded in 1991 by Paul Ginsparg of the Los Alamos National Laboratory, with an initial subject area in high energy physics. Authors use that archive to upload papers that are stored there. ArXiv.org has now assembled over 150,000 papers, covering a broad subject range of mathematics, physics and computer science, but concentrating on the original subject area. An attempt has been made to emulate the arXiv.org system in economics with the "Economics Working Paper Archive" (EconWPA) based at Washington University in St. Louis, but success has been limited. There are a number of potential reasons:

    • Economists do not issue preprints as individuals; rather, economics departments and research organizations issue working papers.

    • Economists use a wider variety of document formatting tools than physicists. This reduces the functionality of online archiving and makes it more difficult to construct a good archive.

    • Generally, economists are not known for sophisticated computer skills and are more likely to encounter significant problems with uploading procedures.

    • There is considerable confusion as to the implications of networked pre-publication on a centralized, high-visibility system for the publication in journals.

    • Economics research is not confined to university departments and research institutes. There are a number of government bodies—central banks, statistical institutes, and others—which contribute a significant amount of research in the field. These bodies, by virtue of their size, have more rigid organizational structures. This makes the coordination required for the central dissemination of research more difficult.

    An ideal system should combine the decentralized nature of the Web, the centralized nature of the arXiv.org archive, and a zero price to end users. I discuss these three requirements in turn.

    The system must have decentralized storage of documents. To illustrate, let us consider the alternative scenario. This would be one where all documents within a certain scope, say within a discipline, would be held on one centralized system. Such a system would not be ideal for two reasons. First, those authors who are rejected by that system would have no alternative publication venue. Since Economics is a contested discipline, this is not ideal. Second, the storage and description of documents is costly. The centralized system may levy a charge on contributors to cover its cost. However, since it enjoys a monopoly, it is likely to use this position to extract rent from authors. This would not be ideal.

    On the other hand, we need access points to the documents both for usage of the documents by end users and for the monitoring of this usage. These activities are best conducted when centralized document storage is available, such as the one that arXiv.org affords. Otherwise the economics papers become lost in the complete contents of the web, and their usage is recorded in the web logs of many servers. Such usage logs are private to the management of the web servers. They cannot be used to monitor usage.

    To explain why the end-user access to the dissemination system should be free, it is useful to refer to Harnad's distinction between trade authors and esoteric authors (1995a). Authors of academic documents are esoteric authors rather than trade authors. They do not expect payments for the written work; instead, they are chiefly interested in reaching an audience of other esoteric authors, and to a lesser extent, the public at large. Therefore the authors are interested in wide dissemination. If a tollgate to the dissemination system is established, then the system will fall short of ideal.

    Having established the three criteria for an ideal system, let me turn to the problem of implementing it. The first and third objectives could be accomplished if departments and research centers allow public access to their documents on the Internet. But for the second, we need a library to hold an organized catalog. The library would collect what is known as "metadata": data about documents that are available using Internet protocols. There is no incentive for any single institution to bear the cost of establishing a comprehensive metadata collection, without external subsidy. However, since every institution will benefit from participation in such an effort, we may solve this incentive problem by creating a virtual collection via a network of linked metadata archives. This network is open in the sense that persons and organizations can join by contributing data about their work. It is also open in the sense that user services can be created from it. This double openness promotes a positive feedback effect. The larger the collection's usage, the more effective it is as a dissemination tool, thus encouraging more authors and their institutions to join, as participation is open. The larger the collection, the more useful it becomes for researchers, which leads to even more usage.

    Bringing a system to such a scale is a difficult challenge. Change in the area of scholarly communication has been slow because academic careers are directly dependent on its results. Change is most likely to be driven from within. Therefore, a scholarly dissemination system on the Internet is more likely to succeed if it enhances current practice without threatening to replace it. In the past, the distribution of informal research papers has been based on institutions issuing working papers. These are circulated through exchange arrangements. RePEc is a way to organize this process on the Internet.

    The architecture of RePEc

    RePEc can be understood as a decentralized academic publishing system for the economics discipline. RePEc allows researchers' departments and research institutes to participate in a decentralized archival scheme which makes information about the documents that they publish accessible via the Internet. Individual researchers may also openly contribute, but they are encouraged to use EconWPA.

    Each contributor needs to maintain a separate collection of data using a set of standardized templates. Such a collection of templates is called an "archive". An archive operates on an anonymous ftp server or a Web server controlled by the archive provider. Each archive provider has total control over the contents of its archive. There is no need to transmit documents elsewhere. The archive provider retains the liberty to post revisions or to withdraw a document.

    An example archive. Let us look at an example: the archive of the OECD, at http://web.archive.org/web/20010829193045/http://www.oecd.org/eco/RePEc/oed/. In that directory we find two files. The first is oedarch.rdf:

    Template-Type: ReDIF-Archive 1.0
    Handle: RePEc:oed
    Name: OECD Economics Department
    Maintainer-Email: eco.contact@oecd.org
    URL: http://www.oecd.org/eco/RePEc/oed

    This file gives basic characteristics about the archive. It associates a handle with it, gives an email address for the maintainer, and most importantly, provides the URL where the archive is located. This archive file gives no indication about the contents of the archive. The contents list is in a second file, oedseri.rdf:

    Template-type: ReDIF-Series 1.0
    Name: OECD Economics Department working papers
    Type: ReDIF-Paper
    Provider-Name: OECD Economics Department
    Provider-Homepage: http://www.oecd.org/eco/eco/
    Maintainer-Email: eco.contact@oecd.org
    Handle: RePEc:oed:oecdec

    This file lists the content as a series of papers. It associates some provider and maintainer data with the series, and it associates a handle with the series. The format that both files follow is called ReDIF. It is a purpose-built metadata format. Appendix B discusses technical aspects of the ReDIF metadata format that is used by RePEc. See Krichel (2000) for the complete documentation of ReDIF.
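    Because ReDIF templates are plain attribute-value text, they are straightforward to process mechanically. The following short Python sketch is purely illustrative: it is not part of the RePEc software, and the function name is invented here. It parses one template into a list of field-value pairs, keeping repeated fields such as Author-Name in order and folding continuation lines (for example, the extra lines of a postal address) into the preceding value.

    import re

    def parse_redif(text):
        """Parse one ReDIF template into an ordered list of (field, value) pairs."""
        fields = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            match = re.match(r"([A-Za-z][A-Za-z0-9-]*):\s*(.*)$", line)
            if match:
                # A new "Field: value" line; repeated fields are kept in order.
                fields.append((match.group(1), match.group(2)))
            elif fields:
                # A line without a field name continues the previous value.
                name, value = fields[-1]
                fields[-1] = (name, value + " " + line)
        return fields

    example = [
        "Template-Type: ReDIF-Archive 1.0",
        "Handle: RePEc:oed",
        "Name: OECD Economics Department",
        "Maintainer-Email: eco.contact@oecd.org",
        "URL: http://www.oecd.org/eco/RePEc/oed",
    ]
    for field, value in parse_redif("\n".join(example)):
        print(field, "=", value)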

    The documents themselves are also described in ReDIF. The location of the paper descriptions is found by appending the handle to the URL of the archive, i.e. at http://web.archive.org/web/20010627025821/www.oecd.org/eco/RePEc/oed/oecdec/. This directory contains ReDIF descriptions of documents. It may also contain the full text of documents. It is up to the archive to decide whether to store the full text of documents inside or outside the archive. If the document is available online (inside or outside the archive), a link may be provided to the place where the paper may be downloaded. Note that the document need not be the full text of an academic paper; it may also be an ancillary file, e.g. a dataset or a computer program.
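    As a small illustration of this convention, the fragment below derives the directory of a series from the archive URL and the series handle. It is a sketch only, assuming that the final component of the series handle (for example "oecdec" in RePEc:oed:oecdec) names the subdirectory, as in the OECD example; the authoritative resolution rules are those of the ReDIF documentation.

    def series_directory(archive_url, series_handle):
        """Derive the directory holding a series' ReDIF document templates."""
        series_code = series_handle.split(":")[-1]
        return archive_url.rstrip("/") + "/" + series_code + "/"

    # series_directory("http://www.oecd.org/eco/RePEc/oed", "RePEc:oed:oecdec")
    # returns "http://www.oecd.org/eco/RePEc/oed/oecdec/"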

    Participation does not imply that the documents are freely available. Thus, a number of journals have also permitted their contents to be listed in RePEc. If a user's institution has made the requisite arrangements with publishers (e.g. JSTOR for back issues of Econometrica or the Journal of Applied Econometrics), RePEc will contain links through which the user can directly access the documents.

    Using the data on archives. One way to make use of the data would be to have a web page that lists all the available archives and to allow users to navigate the archives in search of documents of interest. However, that would be a primitive way to access the data. First, the data as shown in ReDIF form is not itself hyperlinked. Second, there is neither a search facility nor any filtering of contents.

    Providing services that allow for convenient access is not a concern for the archives, but for user services. User services render the RePEc data in a form that makes it convenient for the user. User services are operated by members of the RePEc community, libraries, research projects, etc. Each service has its own name. There is no "official" RePEc user service. A list of the services available at the time of writing may be found in Appendix A.

    User services are free to use RePEc data in whatever way they see fit, as long as they observe the copyright statement for RePEc. This statement places some constraints on the usage of RePEc data:

    You are free to do whatever you want with this data collected on the archives that are described here, provided that you
    (a) Don't charge for it or include it in a service or product that is not free of charge.
    (b) When displaying the contents of a template (or part of a template) the following fields must be shown if they are present in the template: Title, Author-Name, File-Restriction and Copyright (if present).
    (c) You must contribute to RePEc by maintaining an archive that actively contributes material to RePEc.
    (d) You do not contravene any copyright statement found in any of the participating archives.

    Within the constraints of that copyright statement, user services are free to provide all or any portion of the RePEc data. Individual user services may place further constraints on the data, such as quality or availability filters.

    Because all RePEc services must be free, user services compete through quality rather than price. All RePEc archives benefit from simultaneous inclusion in all services. This leads to an efficient dissemination that a proprietary system cannot afford.

    Building user services. The provision of a user service usually starts with putting frequently updated copies of RePEc archives on a single computer system. This maintenance of a frequently updated copy of archives is called "mirroring". Everything contained in an archive may be mirrored. For example, if a document is in the archive, it may be mirrored. If the archive management does not wish the document to be mirrored, it can store it outside the archive. The advantage of this remote storage is that the archive maintainer will get a complete set of access logs to the file. The disadvantage is that every request for the file will have to be served from the local archive rather than from the RePEc site that the user is accessing.

    An obvious way to organize the mirroring process overall would be to mirror the data of all archives to a central location. This central location would in turn be mirrored to the other RePEc sites. The founders of RePEc did not adopt that solution because it would be quite vulnerable to mistakes at the central site. Instead, each site installs the mirroring software and mirrors its own data. Not all sites adopt the same frequency of updating. Some may update daily, while some may only update weekly. A disadvantage of this system is that it is not known how long it takes for a new item to be propagated through the system.
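    A rough Python sketch of a single mirroring pass for the archive-level metadata files is given below. It is not the actual RePEc mirroring software; the list of archives and the file-naming convention (an archive code such as "oed" with files oedarch.rdf and oedseri.rdf, as in the OECD example) are assumptions made for illustration.

    import os
    import urllib.request

    ARCHIVES = [
        # (archive code, base URL) pairs for the archives this site mirrors
        ("oed", "http://www.oecd.org/eco/RePEc/oed"),
    ]

    def mirror_archive(code, base_url, target_dir="mirror"):
        """Fetch the archive and series description files of one archive."""
        os.makedirs(os.path.join(target_dir, code), exist_ok=True)
        for suffix in ("arch.rdf", "seri.rdf"):
            filename = code + suffix
            url = base_url.rstrip("/") + "/" + filename
            with urllib.request.urlopen(url) as response:
                data = response.read()
            with open(os.path.join(target_dir, code, filename), "wb") as out:
                out.write(data)

    for code, base_url in ARCHIVES:
        mirror_archive(code, base_url)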

    The documents available through RePEc

    Over 160 archives, some of them representing several institutions, in 25 countries currently participate in RePEc. Over 100 universities contribute their working papers, including U.S. institutions such as Berkeley, Boston College, Brown, Maryland, MIT, Iowa, Iowa State, Ohio State, UCLA, and Virginia. The RePEc collection also contains information on all NBER Working Papers, the CEPR Discussion Papers, the contents of the Fed in Print database of the US Federal Reserve, and complete paper series from the IMF, World Bank and OECD, as well as the contributions of many other research centers worldwide. RePEc also includes the holdings of EconWPA. In total, at the time of writing in March 2001, over 37,000 items are downloadable.

    The bibliographic templates describing each item currently provide for papers, articles, and software components. The article templates are used to fully describe published articles. They are currently in use by the Canadian Journal of Economics, Econometrica, the Federal Reserve Bulletin, IMF Staff Papers, the Journal of Applied Econometrics, and the RAND Journal of Economics. These are only a few of the participating journals.

    The RePEc collection of metadata also contains links to several hundred "software components"—functions, procedures, or code fragments in the Stata, Mathematica, MATLAB, Octave, GAUSS, Ox, and RATS languages, as well as code in FORTRAN, C and Perl. The ability to catalog and describe software components affords users of these languages the ability to search for code applicable to their problem—even if it is written in a different language. Software archives that are restricted to one language, such as those maintained by individual software vendors or volunteers, do not share that breadth. Since many programs in high-level languages may be readily translated from, say, GAUSS to MATLAB, this breadth may be very welcome to the user.

    13.3 The ReDIF metadata

    From the material that we have covered in the previous section, we can draw a simple organizational model of RePEc as:

    Many archives ⇒ One dataset ⇒ Many services

    Let us turn from the organization of RePEc to its contents. RePEc is about more than the description of resources. It is probably best to say that RePEc is a relational database about economics as a discipline.

    One possible interpretation of the term "discipline" is given by Karlsson and Krichel (1999). They have come up with a model of a discipline as consisting of four elements arranged in a table:

    resource collection
    person institution

    A few words may help to understand that table. A "resource" is any output of academic activity: a research document, a dataset, a computer program, or anything else that an academic person would claim authorship for. A "collection" is a logical grouping of resources. For example, one collection might be comprised of all articles that have undergone the peer review process. A "person" is a physical person; a person may also be a corporate body acting as a physical person in the context of RePEc.

    These data collectively form a relational database describing not only the papers, but also the authors who write them, the institutions where the authors work, and so on. All this data is encoded in the ReDIF metadata format, as illustrated in the following examples.

    A closer look at the contents

    To understand the basics of ReDIF it is best to start with an example. Here is a piece of ReDIF data describing the paper available at http://www.econ.surrey.ac.uk/discussion_papers/RePEC/sur/surrec/surrec9601.pdf:[2]

    Template-Type: ReDIF-Paper 1.0
    Title: Dynamic Aspect of Growth and Fiscal Policy
    Author-Name: Thomas Krichel
    Author-Person: RePEc:per:1965-06-05:thomas_krichel
    Author-Email: T.Krichel@surrey.ac.uk
    Author-Name: Paul Levine
    Author-Email: P.Levine@surrey.ac.uk
    Author-WorkPlace-Name: University of Surrey
    Classification-JEL: C61; E21; E23; E62; O41
    File-URL: ftp://www.econ.surrey.ac.uk/pub/RePEc/sur/surrec/surrec9601.pdf
    File-Format: application/pdf
    Creation-Date: 199603
    Revision-Date: 199711
    Handle: RePEc:sur:surrec:9601

    When we look at this record, the ReDIF data resembles a standard bibliographical format, with authors, title, etc. The only thing that appears a bit mysterious here is the "Author-Person" field. This field quotes a handle that is known to RePEc. This handle leads to a record maintained at a RePEc handle server.[3]

    Template-Type: ReDIF-Person 1.0
    Name-Full: KRICHEL, THOMAS
    Name-First: THOMAS
    Name-Last: KRICHEL
    Postal: 1 Martyr Court
    10 Martyr Road
    Guildford GU1 4LF
    England
    Email: t.krichel@surrey.ac.uk
    Homepage: http://openlib.org/home/krichel
    Workplace-Institution: RePEc:edi:desuruk
    Author-Paper: RePEc:sur:surrec:9801
    Author-Paper: RePEc:sur:surrec:9702
    Author-Paper: RePEc:sur:surrec:9601
    Author-Paper: RePEc:rpc:rdfdoc:concepts
    Author-Paper: RePEc:rpc:rdfdoc:ReDIF
    Handle: RePEc:per:1965-06-05:THOMAS_KRICHEL

    In this record, we have the handles of documents that the person has written. This record will allow user services to list all the papers by a given author. This is obviously useful when we want to find papers that one particular author has written. It is also useful to have a central record of the person's contact details. This eliminates the need to update the relevant data elements on every document record. In fact, the contact data in the paper template may be considered a historical record, valid at the time when the paper was written, while the address in the person template is the one that is currently valid.
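    The following toy Python fragment shows how these handles let a user service join the three kinds of record, walking from a paper to its registered author and on to the author's institution. It is illustrative only; in a real service the dictionaries would be populated by parsing ReDIF templates mirrored from the archives and the registration services.

    # Toy records keyed by their RePEc handles.
    papers = {
        "RePEc:sur:surrec:9601": {
            "Title": "Dynamic Aspect of Growth and Fiscal Policy",
            "Author-Person": ["RePEc:per:1965-06-05:thomas_krichel"],
        },
    }
    persons = {
        "RePEc:per:1965-06-05:thomas_krichel": {
            "Name-Full": "KRICHEL, THOMAS",
            "Workplace-Institution": "RePEc:edi:desuruk",
        },
    }
    institutions = {
        "RePEc:edi:desuruk": {"Primary-Name": "University of Surrey"},
    }

    def describe_paper(handle):
        """Print a paper's title with its registered authors and their institutions."""
        paper = papers[handle]
        for person_handle in paper.get("Author-Person", []):
            person = persons.get(person_handle)
            if person is None:
                continue  # author not registered; only the name on the paper is known
            inst = institutions.get(person.get("Workplace-Institution", ""), {})
            print(paper["Title"], "-", person["Name-Full"],
                  "(" + inst.get("Primary-Name", "unknown institution") + ")")

    describe_paper("RePEc:sur:surrec:9601")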

    In the person template, we find another RePEc identifier in the "Workplace-Institution" field. This points to a record that describes the institution, stored at another RePEc handle server.

    Template-Type: ReDIF-Institution 1.0
    Primary-Name: University of Surrey
    Primary-Location: Guildford
    Secondary-Name: Department of Economics
    Secondary-Phone: (01483) 259380
    Secondary-Email: economics@surrey.ac.uk
    Secondary-Fax: (01483) 259548
    Secondary-Postal: Guildford, Surrey GU2 5XH
    Secondary-Homepage: http://www.econ.surrey.ac.uk/
    Handle: RePEc:edi:desuruk

    The information in this record is self-explanatory. Less apparent is the origin of these records.

    Institutional registration

    The registration of institutions is accomplished through the Economics Departments, Institutions and Research Centers (EDIRC) project, compiled by Christian Zimmermann, an Associate Professor of Economics at the Université du Québec à Montréal, on his own account as a public service to the economics profession. The initial intention was to compile a directory of all economics departments that have a web presence. Many departments have a web presence now; about 5,000 of them are registered at the time of this writing. All these records are included in RePEc. For each institution, data on its homepage is available, as well as postal and telephone information. For some, there is even data on the main area of work. Thus it is possible to find a list of institutions where—for example—a lot of work in labor economics is being done. At the moment, EDIRC is mainly linked to the rest of the RePEc data through the HoPEc[4] personal registration service. Other links are possible, but are rarely used.

    Personal registration

    HoPEc has a different organization from EDIRC. It is impossible for a single academic to register all persons who are active in Economics. One possible approach would be to ask archives to register the people who work at the related institution. This would make archive maintainers' work more complicated, but the overall maintenance effort would be smaller once all current authors were registered. However, authors move between archives, and many have work that appears in several archives. To date, there is no satisfactory way to deal with moving authors. For this reason, author registration is carried out using a centralized system.

    A person who is registered with HoPEc is identified by a string that is usually close to the person's name and by a date that is significant to the registrant. HoPEc suggests the birth date but any other date will do as long as the person can remember it. When registrants work with the service, they first supply such personal information as the name, the URL of the registrant's homepage, and the email address. Registrants are free to enter data about their academic interests—using the Journal of Economic Literature Classification Scheme—and the EDIRC handle of their primary affiliation.

    When the registrant has entered this data, the second step is to create associations between the record of the registrant and the document data that is contained in RePEc. The most common association is the authorship of a paper; however, other associations are possible, for example the editorship of a series. The registration service then looks up the name of the registrant in the RePEc document database. The registrant can then decide which potential associations are relevant. Because authentication methods are weak, HoPEc relies on honesty.
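    To make the association step concrete, the following is a minimal sketch of how a registration service might propose candidate papers by matching a registrant's name against author names in the document records. The data structures and the matching rule are illustrative only; they are not the actual HoPEc implementation, which is more elaborate.

    # Illustrative sketch only; not the actual HoPEc code.
    documents = [
        {"handle": "RePEc:sur:surrec:9601",
         "authors": ["Thomas Krichel", "Paul Levine"]},
        {"handle": "RePEc:sur:surrec:9702",
         "authors": ["Thomas Krichel", "Paul Levine"]},
    ]

    def candidate_papers(registrant_name, documents):
        """Return handles of documents whose author list loosely matches the name."""
        target = registrant_name.lower()
        hits = []
        for doc in documents:
            if any(target in author.lower() or author.lower() in target
                   for author in doc["authors"]):
                hits.append(doc["handle"])
        return hits

    # The registrant would then confirm or reject each proposed association.
    print(candidate_papers("Thomas Krichel", documents))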

    There are several significant problems that a service like HoPEc faces. First, since there is no historical precedent for such a service, it is not easy to communicate the raison d'être of the service to a potential registrant. Some people think that they need to register in order to use RePEc services. While this delivers data about who is interested in using RePEc services—and about the people to whom we have failed to communicate that these services are free—it clutters the database with records of limited usefulness. Last but by no means least, there are all kinds of privacy issues involved in the composition of such a dataset.

    To summarize, HoPEc provides information about a person's identity, affiliation and research interests, and links these data with resource descriptions in RePEc. This allows the identification of a person and the maintenance of related metadata in a timely and cost-efficient way. These data could fruitfully be employed for other purposes, such as maintaining membership data for scholarly societies or lists of conference participants.

    13.4 The open library

    This section attempts to find a general theory applicable to a wide set of circumstances in which systems similar to RePEc are desirable. I call this general concept the open library. The parallel to the open source concept is intentional. It is therefore useful to review the open source concept first.

    The open source concept

    There is no official and formal definition of what the term "open source" means. The Open Source Initiative at http://opensource.org/ offers an elegant introduction to the idea:

    The basic idea behind open source is very simple. When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing.

    We in the open source community have learned that this rapid evolutionary process produces better software than the traditional closed model, in which only a very few programmers can see the source and everybody else must blindly use an opaque block of bits.

    Open source software imposes no restrictions on the distribution of the source code required to build a running version of the software. As long as users have no access to the source code, they may be able to use a running version of the software, but they cannot change the way the software behaves; doing so requires changing the source code and rebuilding the running version from it. Since building the software from the source code is quite straightforward, software whose source code is freely available is essentially free.

    Open Source and open library

    The open source movement claims that the building of software in an open, collaborative way—enabled by the sharing of the source code—allows software to be built better and faster. The open library concept is an attempt to apply the concept of the open source to a library setting. We start off with the RePEc experience.

    Within the confines of RePEc as a document collection, it is unrealistic to expect free distribution of a document's source code. Such a source code is, for example, the word processor file of an academic paper. If such a source code were available for others to change, then the ownership of the intellectual property in the document would be dissolved. Since intellectual ownership over scientific ideas is crucial in the academic reward system, it is unlikely that such source code distribution will take place. Within the confines of RePEc's institutional and personal collection, there is no such source code that could be freely shared.

    To apply the open source principle to RePEc we must conceptualize RePEc as a collection of data. In the language adopted by the open source concept, the individual data record is the "source code". The way the data record is rendered in the user interface is the "software" as used by the end user. We can then define the open library as a collection of data records that has a few special properties.

    The definition of the open library

    An open library is a collection of data records that has the following characteristics:

    • Every record is identified by a unique handle. This requirement distinguishes the library from an archive. It allows for every record to be addressed in an unambiguous way. This is important if links between records are to be established.

    • The syntax of field names and field values is homogeneous across all records. This constraint causes the open library to appear like a database. If this requirement were not present, all public access pages on the Web would form an open library. Note that this requirement does not constrain the open library to contain a single, homogeneous record format.

    • The documentation of the record format is available for online public access. For example, a collection encoded in MARC format would not qualify as an open library because access to the documentation of MARC is restricted. Without this requirement the cost of acquiring the documentation would be an obstacle to participation.

    • The collection is accessible on a public access computer system. This is the precondition to allow for the construction of user services. Note that user services may not necessarily be open to public access.

    • Contributing to the collection is without monetary cost. There are of course non-monetary costs to contribute to the open library. However the general principle is that there is no need to pay for either contributing or using the library. The copyright status of data in an open library should be subject to further research.

    The open library and the Open Archive

    Stimulated by the work of Van de Sompel, Krichel, Nelson, et al. (2000), there have been recent moves towards improving the interoperability of e-print archives such as arXiv.org, NCSTRL, and RePEc. This work is now called the Open Archives Initiative; see http://www.OpenArchives.org. The basic business model proposed by the OAI is very close to that of the RePEc project. In particular, the open archive technical protocols allow the provision of data to be separated from the implementation of user services, a key feature of the open library model as pioneered by RePEc since 1997. In addition, because of their ability to transport multiple data sets, the open archive protocols allow for several open libraries to be established on one physical system.

    The conceptual challenge raised by the open library

    The open library as defined above may be a relatively obvious concept. It certainly is not an elaborate intellectual edifice. Nevertheless, the open library idea raises some interesting conceptual challenges.

    Supply of information. To me as a newcomer to the Library and Information Studies (LIS) discipline, there appears to be a tradition of emphasizing the behavior of the user who demands information rather than the publisher—I use the word here in its widest sense—who supplies it. I presume this orientation comes from the tradition that almost all bibliographic data were sold by commercial or not-for-profit vendors, just like the documents that they describe. Libraries then see their role as intermediaries between the commercial supply and the general public. In that scenario, libraries take the supply of documents and data as given.

    The open library proposes to build new supply chains for data. If all libraries contribute metadata—data about data—about the objects that are local to them (what "local" means would have to be defined), then a large open library can be built.

    An open library will only be as good as the data that contributors give to it. It is therefore important that research be conducted on what data contributors are able to contribute; on how to provide documentation that the contributor can understand; and on understanding a contributor's motivation.

    Digital updatability. For a long time, libraries could only purchase material that is essentially static. It might decay physically, but the content is immutable. The advent of digital resources provoked a debate. Because they may be changed at any time, digital resources may be used for more than the preservation of ideas. Traditionally inclined libraries have demanded that digital resources be like non-digital resources in all but appearance, and view the mutability of digital data more as a threat than as an opportunity. The open library, however, is more concerned with digital updatability than preservation. Clearly, this transition from static to dynamic resources poses a major challenge to the LIS profession.

    Metadata quality control. In the case of a decentralized dataset, an important problem is to maintain metadata quality. Some elements of metadata quality cannot be controlled by a computer, but many can be checked automatically. For example, each record must use a structure of fields and associated values in order to be interoperable with other records, and in some cases a field value only makes sense if it has a certain syntax, as with an email address. One way to achieve quality control is through the use of relational metadata. Each record has an identifier, and records can use the identifiers of other records. It is then possible to update elements of the dataset in an independent way, and it is simple to check whether a handle referenced in one record corresponds to a valid handle in the dataset. Highly controllable metadata systems are an important research concern related to the open library concept.
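    As a simple illustration of the kind of relational check a computer can perform, the sketch below verifies that every handle referenced in a record actually exists in the dataset. The record structure and field names are invented for this example; they merely mimic the ReDIF fields shown earlier.

    # Illustrative sketch only; field names mimic, but do not reproduce, ReDIF.
    dataset = {
        "RePEc:sur:surrec:9601": {"Author-Person": ["RePEc:per:1965-06-05:thomas_krichel"]},
        "RePEc:per:1965-06-05:thomas_krichel": {"Workplace-Institution": ["RePEc:edi:desuruk"]},
        "RePEc:edi:desuruk": {},
    }

    def dangling_references(dataset):
        """Return (record, field, handle) triples that point to handles not in the dataset."""
        known = {handle.lower() for handle in dataset}
        problems = []
        for handle, record in dataset.items():
            for field, values in record.items():
                for value in values:
                    if value.lower() not in known:
                        problems.append((handle, field, value))
        return problems

    print(dangling_references(dataset))   # an empty list means all references resolve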

    13.5 Conclusions

    To my knowledge, Richard Stallman was the pioneer of open source software. In 1984, when he founded the GNU ("GNU's Not Unix") project to write a free operating system to replace Unix, few people believed that such an operating system would come about. Building GNU took a long time, but in the late 1990s, the open source movement basically realized Stallman's dream. My call for an open library may face similar skepticism, but the obstacles it faces are fewer and less daunting than those faced by the open source movement:

    • A metadata collection is far less complex than the operating system of a modern computer.

    • Computer programming is a highly profitable activity for individuals capable of doing it; the opportunity cost of participating in what is essentially unpaid work is therefore high. These costs are much lower for the academic or the academic librarian who would participate in building an open library.

    • A network effect arises when the open library has reached a critical mass. At some stage the cost of providing data is much smaller than the benefit—in terms of more efficient dissemination—of contributing data. When that stage is reached, the open library can grow without external public or private subsidy.

    It remains to be seen how great an inroad the open library concept will make into the library community.

    Appendix A: The main user services [5]

    BibEc at http://netec.mcc.ac.uk/bibec.html & WoPEc at http://netec.mcc.ac.uk/wopec.html provide static html pages for all working papers that are only available in print (BibEc) and all papers that are available electronically (WoPEc). Both datasets use the same search engines. There are three search engines: a full-text WAIS engine, a fielded search engine based on the mySQL relational database, and a ROADS fielded search engine. The mySQL database is also used for the control of the relational components in the RePEc dataset. BibEc and WoPEc are based at Manchester Computing, with mirror sites in Japan and the United States.

    EDIRC at http://edirc.repec.org/ provides web pages that represent the complete institutional information in RePEc.

    IDEAS at http://ideas.repec.org/templates.html provides an Excite index of static html pages that represent all paper, article and software templates. This is by far the most popular RePEc user interface.

    NEP: New Economics Papers at http://nep.repec.org/ is a set of reports on new additions of papers to RePEc. Each report is edited by subject specialists who receive information on all new additions and then select the papers that are relevant to the subject of the report. These subject specialists are PhD students and junior researchers who work as volunteers. As of 14 March 2000, 2,753 different email addresses subscribed to at least one list.

    The Tilburg University Working Papers & Research Memoranda service was at http://www.kub.nl/~dbi/demomate/repref.htm, but is now closed. The interface is archived at http://web.archive.org/web/20010305214804/cwis.kub.nl/~dbi/demomate/repref.htm

    Socionet at http://socionet.ru is a user service in Russian. Its maintainers also provide archival facilities for Russian contributors.

    INOMICS at http://www.inomics.com/ not only provides an index of RePEc data but also allows simultaneous searches in indices of other Web pages related to Economics.

    HoPEc at http://authors.repec.org/ provides a personal registration service for authors and allows searches for personal data.

    Appendix B: The ReDIF metadata format

    The ReDIF metadata format is inspired by Deutsch et al. (1994), commonly known as the IAFA templates. In particular, it borrows the idea of clusters from that draft:

    There are certain classes of data elements, such as contact information, which occur every time an individual, group or organization needs to be described. Such data as names, telephone numbers, postal and email addresses etc. fall into this category. To avoid repeating these common elements explicitly in every template below, we define "clusters" which can then be referred to in a shorthand manner in the actual template definitions.

    ReDIF takes a slightly different approach to clusters. A cluster is a group of fields that jointly describe a repeatable attribute of the resource. This is best understood by an example. A paper may have several authors. For each author we may have several fields of interest: name, email address, homepage, etc. If we have several authors, then we have several such groups of attributes. In addition, each author may be affiliated with several institutions, and each institution may in turn be described by several attributes for its name, homepage, etc. Thus, a nested data structure is required. It is evident that this requirement is best served by a syntax that explicitly allows for it, such as XML. However, when ReDIF was designed in 1997, XML was not available. While the template syntax is more human-readable and easier to understand, the computer cannot tell which attributes belong to the same cluster unless some ordering is introduced. Therefore we proceed as follows. For each group of attributes that make up a cluster, we specify one attribute as the "key" attribute. Whenever the key attribute appears, a new cluster begins. For example, if the cluster describes a person, then the name is the key. If an "author-email" appears without an "author-name" preceding it, the parsing software aborts the processing of the template.
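    The following is a minimal sketch of the key-attribute rule just described. It reads a flat sequence of field/value pairs and groups author fields into clusters, starting a new cluster whenever the key field appears; the data structures and error handling are simplified for illustration and do not reproduce the actual ReDIF parsing software.

    # Illustrative sketch of clustering by key attribute; not the actual ReDIF parser.
    fields = [
        ("author-name", "Thomas Krichel"),
        ("author-email", "T.Krichel@surrey.ac.uk"),
        ("author-name", "Paul Levine"),
        ("author-email", "P.Levine@surrey.ac.uk"),
        ("author-workplace-name", "University of Surrey"),
    ]

    def parse_author_clusters(fields, key="author-name"):
        """Group author-* fields into clusters; a new cluster starts at each key field."""
        clusters = []
        current = None
        for name, value in fields:
            if name == key:
                current = {name: value}
                clusters.append(current)
            elif name.startswith("author-"):
                if current is None:
                    # mirrors the rule that an author-email appearing before any
                    # author-name aborts the processing of the template
                    raise ValueError("%s appears before any %s" % (name, key))
                current[name] = value
        return clusters

    print(parse_author_clusters(fields))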

    Note that the designation of key attributes is not a feature of ReDIF. It is a feature of the template syntax of ReDIF. It is only the syntax that makes nesting more involved. I do not think that this is an important shortcoming. I believe that the nested structure involving the persons and organizations should not be included in the document templates. What should be done instead is to separate the personal information out of the document templates into separate person templates. This approach is discussed extensively in the main body of the paper.

    ReDIF is a metadata format that comes with tools to make it easy to use in a framework where the metadata is harvested. A file that is harvested from a computer system could contain any type of digital content, so the harvested data must be parsed by special software that filters it. This task is accomplished by the rr.pm module written by Ivan V. Kurmanov. It parses ReDIF data and validates its syntax. For example, any date within ReDIF has to be of the ISO 8601 form yyyy-mm-dd. A date like "14 Juillet 1789" would not be recognized by the ReDIF reading software and would not be passed on to the application software that a service provider uses.
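    A minimal sketch of that kind of syntax check is shown below, using a regular expression for dates. The pattern is illustrative only and is not taken from rr.pm; in particular, the exact set of date forms that rr.pm accepts may differ.

    # Illustrative date check; not taken from the rr.pm module.
    import re

    # accepts yyyy, yyyy-mm, or yyyy-mm-dd; rr.pm's actual rules may differ
    ISO_DATE = re.compile(r"^\d{4}(-\d{2}){0,2}$")

    def valid_redif_date(value):
        """Accept ISO 8601 style dates; reject free-form dates such as '14 Juillet 1789'."""
        return bool(ISO_DATE.match(value))

    print(valid_redif_date("1997-11"))          # True
    print(valid_redif_date("14 Juillet 1789"))  # False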

    The rr.pm software uses a formal syntax specification, redif.spec. This formal specification is itself encoded in a purpose-built format code-named spefor. Therefore, it is possible for ReDIF-using communities to change the syntax restrictions or even design a whole new ReDIF metadata vocabulary from scratch.

    Notes

    The work discussed here has received financial support from the Joint Information Systems Committee of the UK Higher Education Funding Councils through its Electronic Library Programme. A version of this paper was presented at the PEAK conference at the University of Michigan on 24 March 2000. I am grateful to Ivan V. Kurmanov for comments on that version. In March 2001, I revised and updated the paper following suggestions by Jeffrey K. MacKie-Mason and Emily Walshe. Neither of them bears responsibility for any remaining errors. This paper is available online at http://openlib.org/home/krichel/salibury.html.

    1. Reports of research results in research "papers" form the bulk of academic digital or digitisable data. I refer to these as documents.

    2. I suppress the Abstract: field to conserve space.

    3. I leave out a few fields to conserve space.

    4. HoPEc stood initially for Home Page Papers in Economics, but this would be totally misleading now.

    5. I list them by order of historical appearance. The "Tilburg University working papers & research memoranda" service is operated by a library-based group that has received funding from the European Union. INOMICS is operated by the Economics consultancy Berlecon Research. All the other user services are operated by junior academics.

    IV. Building and Using Digital Libraries

    14. Building and Using Digital Libraries

    The forces that converged in the 1990s were extraordinary in loosening the physical and temporal constraints on information. Technology tools and capacity grew rapidly. Network infrastructure became more robust. Information in digital form—both converted and born digital—came online in unprecedented volume and with new functionality. The impact on library organizations and their users was significant, yet far from straightforward. The volatile mix of digital resources, organizations, and individual behaviors set in motion shifts in expectations, in roles of stakeholders, and in the distribution of costs. The authors in this section explore, through project and program descriptions, the experiences of exemplary digital library development.

    In his paper, JSTOR's Kevin Guthrie notes that old metrics, methods, and intuition are not reliable guides for our sense of value in evaluating digital libraries. The projects featured here, such as JSTOR, Early Canadiana Online (Kingma), and Columbia University's Online Books (Kantor, Summerfield, and Mandel), drive home the changing notions of value. These studies document the increased access to information that is realized through digital content, and also shed light on the potential that is created for reduced costs, new research capability, and innovation. As Kantor points out, there is also a symbolic utility. Digital libraries may not always perform as hoped; nonetheless, they have significant utility. Digital libraries, despite early clumsiness, have stimulated considerable user interest and exploration.

    The realization of digital potential within library contexts is captured here in the organizational descriptions of Drexel University (Montgomery), University of Louisville (Rader), and British Telecommunications (Alsmeyer). Notably, the capabilities of digital libraries prompted these organizations to rethink the existing configuration of resources and shift focus to new user services. Although libraries may realize cost savings, there are also new costs associated with technology infrastructure and more expert staff. Mission-based questions also arise as digital libraries take shape. Do libraries still have an archival role if publishers manage digital collections? Can or should libraries add value to the provision of intellectual access that publishers offer to their digital content? In the disintermediated context of online resources, how (and by whom) are users supported? As several authors note, users are neither uniformly able nor ready to exploit the capabilities of digital media, so the library's instructional and outreach roles may become far more critical.

    The descriptions presented also suggest unfolding tensions between stakeholders. Libraries and publishers have yet to agree on policies governing digital content related to resource sharing, course environments, and archives. While users and libraries may derive new value from digital content, publisher arguments for increased costs are seldom acceptable to the library community. Tensions are also evident between libraries and their users. At a time of constrained or modest growth in resources, libraries are often unable to meet user expectations for both the traditional resources and new digital titles. And as the complexity of the digital environment grows, demands for greater integration and interoperability between and among library systems, publisher resources, and user tools will also grow, further entangling responsibilities and interests of each stakeholder.

    Clearly, agreement on the differentiation of roles and allocation of costs in a complex, interdependent environment will be increasingly central to the evolution of digital libraries. Digital library developers and directors face several questions: Who is responsible for what functions? Who adds value and at what cost? How flexibly can resources be shifted to accomplish new roles? How will roles and responsibilities be sustained over time as the environment takes shape? The chapters that follow capture some answers found in early instances of digital programs. While the digital library environment has matured as more recent developments have taken hold, these fundamental questions remain.

    15. The Economics of Digital Access: The Early Canadiana Online Project[†]

    This project examined the economics of the production, storage, and distribution of information in print, microfiche, and digital format for the Early Canadiana Online Project. The Early Canadiana Online Project digitized over 3,300 titles and over 650,000 images from the Canadian Institute for Historical Microreproductions collection of pre-1900 print materials published in Canada. An economic model was developed of the stakeholders—publishers, libraries, and patrons—and of the costs to stakeholders of the three formats—print, microfiche, and digital. A detailed cost analysis was performed to estimate the costs of each of the three formats. The results of this cost analysis can be used as benchmarks for estimating the costs of other digitization projects. The analysis shows that digital access can be cost-efficient so long as a number of libraries receive sufficient benefit that they are willing to share the costs of digitization and access.

    15.1 Introduction

    Digital texts in a networked environment hold the promise of lower-cost access to information by a greater number of users than print texts. Projects such as The Making of America, Project MUSE, JSTOR, and the Early Canadiana Online project investigated in this study offer access to digital texts over the Internet to millions of potential users. These digital projects also offer the promise of lower costs by avoiding the cost of printing and shipping multiple copies of a text for patrons. In theory, once the fixed costs of digitization are incurred, the marginal cost of providing an additional electronic copy is zero.

    The potential benefits of digital access are considerable. Patrons who previously traveled to a repository of rare books or a microfiche room at a research library can instead access historical information from their desktops. This dramatically decreases the time and effort patrons spend traveling to the source of the information. This also increases the potential benefits to new patrons who can now access historical texts that previously were only available at sites too distant for them to consider. The economic question is whether the cost of digitization is lower than this stream of future benefits.

    This study examines the economics of digital, microfiche, and print access for the Early Canadiana Online project. The costs for these three methods of access include the costs of archiving and providing access to original print materials, microfiche copies of these materials, and digital copies accessible over the Internet. The study examines the production and storage costs and the opportunity costs to patrons of digital access to the Early Canadiana Online collection.

    Data collected and analysed for this study will be important in determining the level of investment for future digitization projects of historical materials. Other studies at Cornell University (Kenney, 1997), Yale University (Conway, 1996), and Columbia University (Kantor et al., this volume) investigated the costs of online texts. The study at Columbia University described in this text examines the cost of using publisher-provided electronic files to produce text in HTML format. The studies at Cornell University and Yale University, like this study, examine the cost of digitizing print or microfiche. The Cornell and Yale studies measure the marginal costs per image of primarily in-house scanning. This study includes all costs associated with the production, cataloging, and sales of texts in microfiche or digital format. The cost estimates in this study are considerably higher than the marginal cost estimates in previous studies but are a more accurate estimate of the full costs of the production of microfiche or digital projects from start to finish.

    This study also investigates the benefits of digitization. The primary benefit of these digital projects is the return to patrons from accessing these materials. Once digitized, stored, and made accessible over a campus network or the Internet, the materials are more easily accessible to more patrons. Patrons who previously had to travel to a library with the original or microfiche copies of the materials can now view them online from home or the office. Analysis of the data collected on use of the digital images, microfiche, and original texts will be helpful in predicting the use and benefits of other digital projects of historical materials. This will enable researchers to determine the return to investment of future digital projects.

    The remainder of the paper is organized as follows: first, an economic model of digital access, which includes a stakeholder analysis, is examined. This provides a general framework for analyzing the costs of print, microfiche, and digital access. Second, a cost analysis of the Early Canadiana Online Project is presented. This cost analysis includes estimates of the cost of print, microfiche, and digital access; an analysis of the economies of scale for digitization projects; and an analysis of the institutional and user costs of access to digital information.

    15.2 The Costs and Benefits of Digital Access to Information

    Digitization of information lowers the costs of producing, distributing, and accessing information for producers, consumers, and intermediaries relative to print products. Digital access provides on-demand access to information for patrons or consumers, lowering the opportunity cost of access. Consumers can more easily view digital information over networks without having to spend time traveling to the library. The low cost of web development and word processing lowers the cost of producing information in digital form. Producers do not have to print and distribute copies but can instead mount digital products on a local server for network distribution. Likewise, digital access lowers the cost of distribution by intermediaries such as libraries, saving the costs of storing and circulating printed materials.

    The Early Canadiana Online project includes all three of the stakeholders in the production and consumption of information. Libraries with rare book collections such as the University of Toronto Library and the Laval University Library provide access to original print texts. These libraries also provide access to the Canadian Institute for Historical Microreproductions' (CIHM) microfiche copies of these print materials. In this instance, CIHM is the producer of the information while the libraries are intermediaries in providing access to patrons. Patrons of this information include students and faculty accessing the CIHM collection whether in print, microfiche, or digital form.

    Patrons of Early Canadiana Online

    With the creation of Early Canadiana Online, patrons have three possible methods for accessing this information: digital, fiche, or original copy. Patrons incur a cost of access depending on their choice of method of access. These costs can be divided into fixed and variable costs. Variable costs are costs incurred each time information is reproduced or retrieved. Fixed costs are costs incurred regardless of the number of items retrieved.

    A patron's choice of access will depend on which method provides lower total costs. A patron viewing images from a single text may have lower fixed and marginal costs in using the print than in using the fiche or digital formats. Accessing the print may require only travel to the library, selection of the text, and turning the pages. There are no learning costs or costs of expensive machines or network connections specifically related to using a print item. Accessing the fiche may require travel, selection, and determining how to use the fiche. Accessing the digital format requires the use or purchase of a computer with a network connection as well as determining how to search and use the digital collection.

    Microfiche or digital access is more likely to have a lower total cost when more than a single text is used. While multiple texts may be found in the same library, the fiche collection may contain items not found in a library's print collection. Accessing print items from another library would require the patron to incur an additional cost of traveling to a second library. Learning how to use the microfiche collection is likely to have a lower total cost than traveling to more than one library in order to use the needed items in print form. Likewise, digital access may provide access to more images at a lower total cost than fiche or print.

    Figure 15.1 illustrates total patron costs for access to information in the three formats. Figure 15.1 assumes that the fixed cost of digital access is greater than the fixed cost of fiche, which is greater than the fixed cost of print. Figure 15.1 also assumes the number of images available in digital form is greater than the number available in fiche at one library, which is greater than the number of print images available at one library.

    The break-even points represent the levels of use at which two methods of access have the same total cost. Initially a patron will have a lower total cost from print texts. As use of images increases and the library's print collection is exhausted, a patron must incur the additional fixed costs of traveling to another library. At this point the total cost of fiche access is lower than the total cost of print access. As use continues to increase, the total cost of digital access becomes lower than the total cost of fiche and print access.
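    As a rough numerical illustration of these break-even points, the sketch below compares total cost curves of the form fixed cost plus a per-image cost for each format. The dollar figures are invented purely for illustration; they are not estimates from this study, and the simple linear model ignores the step increase in print costs when a second library must be visited.

    # Hypothetical cost parameters, for illustration only (not study estimates).
    formats = {
        "print":   {"fixed": 5.0,  "per_image": 0.50},
        "fiche":   {"fixed": 15.0, "per_image": 0.25},
        "digital": {"fixed": 40.0, "per_image": 0.05},
    }

    def total_cost(fmt, images):
        p = formats[fmt]
        return p["fixed"] + p["per_image"] * images

    def cheapest(images):
        return min(formats, key=lambda fmt: total_cost(fmt, images))

    # With these numbers, print is cheapest for light use, fiche for moderate use,
    # and digital once use is heavy enough to spread its higher fixed cost.
    for n in (5, 50, 200, 500):
        print(n, cheapest(n))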

    If Figure 15.1 accurately reflects the fixed and variable costs of access then high-frequency users who require more access to more digital images are more likely to use digital assets. These users incur a high total cost of access to the digital copy but gain greater access to more information. Patrons desiring only a few images from a single text are more likely to look at the original, print copy if it is available in their library. Mid-level users are more likely to use the fiche.

    Figure 15.1: Patron Costs of Access to Images

    However, what may be an inaccurate assumption in Figure 15.1 is that digital access has higher fixed costs and marginal costs equal to those of fiche or print. The fixed costs of digital access include the learning costs that patrons unfamiliar with digital copy must incur, as well as the costs of having access, including a personal computer with a network connection. Once these costs are incurred, patrons may face a fixed cost of digital access lower than the fixed costs of fiche or print access. Patrons can also avoid the fixed costs of traveling to the library if they have home or office access to the network. The marginal costs of digital access may also be less than those of print or fiche; patrons familiar with digital access are less likely to print materials, instead saving electronic copies to a disk or drive. If both the marginal and fixed costs of digital access are lower than those of print and fiche, then digital access will have a lower cost at all levels of access and patrons will only access the information online.

    Producers

    The initial promise of digital information was that production costs would decrease when the costs of printing and distribution were replaced in the networked environment. These lower costs have led to an increasing number of free electronic journals published by faculty at colleges and universities. However, digital production also has costs. HTML programming costs, patron service costs, and production in both print and digital formats can increase the costs of production. Traditional print publishers have found that additional costs of production are necessary to publish a journal in both print and digital format, increasing subscription costs to libraries that require access to both forms.

    Digital costs are lumpy, with a large fixed cost of production, and zero marginal cost to produce an additional digital copy over the Internet. However, digital copies have the same, if not greater, patron service costs as print and microfiche. Patrons need service in any environment. In the digital environment patron services include the cost of server maintenance, the cost of updating web pages, and the cost of answering electronic mail from patrons who are having difficulties with access. Since the networked environment allows more patrons to access the information than at the library, the cost of patron service may be greater for the digital information producer. Unlike print publications that are produced and then sent to information intermediaries, customer service in the digital environment requires the information producer to provide direct service to patrons.

    Pricing in the networked environment can also be a difficult problem for information producers. Classic economic theory would indicate that the price of access should be set equal to the marginal cost of zero to achieve economic efficiency. However, a zero price does not allow for information producers to recover the costs of production. Access to digital products will be sold above the marginal cost of reproduction in the same way that books, journals, and other print products are sold above the marginal cost of an additional copy. This pricing, based on the value of the information good to consumers rather than the cost of providing an additional copy, is necessary in the networked environment to recover the costs of production.

    The role of the library as intermediary is critical in the pricing of information. Libraries purchase information materials and provide access to patrons, typically without an access fee. Patrons use the information efficiently since, in the networked digital environment, providing the information has no marginal cost and patrons are not charged for access. The charge to libraries covers the cost of producing the information, while the absence of a charge to patrons ensures economic efficiency.

    Intermediaries

    Libraries serve a crucial economic role as intermediaries in the distribution of and access to information. Libraries serve as a point of collective demand for information products, providing access to information as a public good to patrons.

    The economic role of the library as an information intermediary is to estimate the collective demand of patrons and to purchase and provide access to information goods. The collective value of any information product in a library is the sum of the value or benefit all patrons receive from it. This can be estimated by the number of times the information product is used multiplied by an estimate of the benefit from each use. If this collective value exceeds the purchase price, then it is economically efficient for the library to purchase the product and provide access to patrons. Additional access should be priced at zero to ensure economic efficiency.
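    Expressed as a simple decision rule, with all quantities hypothetical and purely illustrative:

    # Purely illustrative figures; the decision rule follows the text above.
    uses_per_year = 400          # estimated annual uses of the product
    benefit_per_use = 2.00       # estimated benefit to a patron per use, in dollars
    purchase_price = 500.00      # price charged to the library

    collective_value = uses_per_year * benefit_per_use
    purchase = collective_value >= purchase_price   # buy if patrons' total benefit covers the price

    print(collective_value, purchase)   # 800.0 True; access itself is then priced at zero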

    For digital products there are two possible benefits to library patrons. If the library does not subscribe to the print or fiche copy of the information, then patrons benefit by accessing information previously not available. If patrons have access to the fiche or print original, and access to the digital copy is available over the Internet or campus network, then the benefit of digital access is equal to the value of time saved from using the digital copy from the home or office instead of the fiche or original at the library.

    Individual and Shared Costs and the Role of Information Intermediaries

    The costs for information products and services can be categorized into private and shared costs or the costs of individual demand and public demand for a good. Private costs are the costs to an individual or consumer of his purchase of a good or service. Private costs include the costs of a personal subscription, personal home computer, photocopying papers, and downloading and printing information from the Internet. Shared costs are the costs of information products purchased for public use. Shared costs include the costs of library goods and services. The costs of library goods and services are shared among patrons through tuition payments, tax revenue, membership fees or other sources of revenue used to support the library.

    Information intermediaries also have what can be considered private and shared costs. A subscription to a print or electronic database can be considered a private cost to the library, paid from the library's budget, although it is a shared cost to the library's patrons. The fixed cost of producing the database or print journal that is purchased by several libraries is a shared cost among the subscribing libraries. Each subscribing library pays for a share of the fixed costs of production.

    The costs of digital information in a networked environment are shared. On the Internet, the costs of reproduction and distribution are zero. The fixed costs of production and storage are, by definition, shared among the patrons or information intermediaries that purchase access.

    Market Forces: Demand and Supply

    Digital information in a networked environment results in lower costs of reproduction and distribution for producers and intermediaries and lower opportunity costs for users. The lower costs contribute to an increase in the supply of information. Lower costs also result in more producers providing more methods of access to more information.

    Lower costs mean new information products are produced. New publishers including universities, libraries, faculty and students find that they can produce web-based journals using low cost desktop publishing tools. This has led to an explosion in the supply of new electronic journals.

    Paradoxically, this explosion has raised some costs while lowering others. Although many new electronic journals cost relatively less than print journals, for libraries this ever-increasing supply of digital information can dramatically increase total costs: as the number of subscriptions purchased rises, so do the staff costs involved in cataloging these journals, and both elements weigh on library budgets. Patrons find that the opportunity cost of accessing any given source of information has declined, while the overwhelming increase in the number of information products means that more time is spent on digital information than was spent consuming print products. The digitization of an information product previously available only in print or microfiche can lower the cost per unit of production, the cost per unit of subscription for libraries, and the cost of access to the information by patrons, while at the same time dramatically increasing the supply of information products. This increase in the number of information products results in substantially higher total costs of production, subscription, and access to information.

    15.3 Cost Estimates of Early Canadiana Online

    Estimating the costs of digital projects is necessary to determine efficient investments in the digitization of print or microfiche information products. The primary goals of this project are to estimate and compare the costs of three methods of information delivery: print, microfiche, and digital. Data from the University of Toronto, Laval University, and the Canadian Institute for Historical Microreproductions were collected to estimate these costs. Data on the cost of constructing a new electronic library at the University at Albany were also collected to provide current library construction and maintenance costs.

    One significant contribution of the cost estimates in this paper is that average costs are estimated for the production, storage, and use of information in print, microfiche, and digital formats. Previous estimates have either focused on one type of cost—production, storage, or use; one format—print, fiche, or digital; or have focused on the marginal costs rather than the full costs of production. In this paper all costs for each format are included.

    The Cost of Print

    Table 15.1 shows the cost estimates for book storage and access. These costs are based on the cost of the Thomas Fisher Rare Book Library at the University of Toronto. Construction costs are based on the 1999 library construction project at the University at Albany. Special environmental controls used in a rare book library imply that the construction costs in Table 15.1 may underestimate the actual construction costs. All costs are shown in Canadian dollars (CD).

    Table 15.1: Annual Average Cost of Book Storage and Access

                                                 Cost         Cost/volume   Cost/use
    Construction, utilities and maintenance      $1,586,056   $3.17         $72.51
    Salaries                                     $1,105,031   $2.21         $50.52
    Equipment and supplies                       $255,799     $0.51         $11.69
    TOTAL                                        $2,946,885   $5.89         $134.72

    NOTE: Costs of utilities, maintenance, salaries, equipment and supplies are based on University of Toronto cost estimates. Construction costs are based on new library construction at the University at Albany. All costs are in Canadian dollars. Exchange rate used is 1.5257. Costs are amortized using a 5% rate of interest and life spans as follows: construction 25 years, equipment 5 years, and computers 3 years. Cost per volume is based on a library capacity of 500,000 volumes. Cost per use is based on 21,874 transactions.

    The cost of construction, utilities, and maintenance is comparable to an estimate of $4.68CD (Bowen, 1998) and a 30-year amortization of $6.33CD reported in this book (Kantor et al., this volume). However, the cost per use of $134.72CD is significantly higher than the $1.50CD cost of retrieval previously reported (Bowen, 1998), the $3CD cost of retrieval for the New York Public Library and $6CD for the Harvard Depository Library (Lesk, 1998), or the $9CD maximum retrieval cost reported elsewhere (Getz, 1997). In Table 15.1, the cost per use is derived by dividing total cost by the number of annual requests for books. This inflates the cost per retrieval by folding the costs of storage into the equation. However, it is important to note that the "service" of a library is the use of its materials. Dividing all costs by the use of those materials gives an average cost of service, which will be higher than an estimate that separates out only the retrieval portion of these costs.

    For a comparison with other estimates of retrieval costs, an estimated 80 percent of salaries at the Thomas Fisher Rare Book Library are for access. Taking 80 percent of salary costs yields an estimate of $40CD per transaction for labor, still significantly higher than other estimates. However, a rare book library has concerns of preservation that require additional staff care and monitoring for patron access. In addition, this estimate includes the total cost of administration, vacations, and benefits for employees rather than the marginal cost of retrieval based on a staff member's time spent multiplied by his salary.
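    Both of these figures follow directly from Table 15.1, as the short calculation below shows.

    # Figures taken from Table 15.1 and the surrounding text.
    total_cost = 2_946_885      # annual storage and access cost, CD
    salaries = 1_105_031        # annual salary cost, CD
    transactions = 21_874       # annual requests for books

    cost_per_use = total_cost / transactions            # about 134.72 CD
    labor_per_use = 0.80 * salaries / transactions      # about 40 CD, using the 80% access share

    print(round(cost_per_use, 2), round(labor_per_use, 2))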

    Table 15.1 does not include the cost of purchasing a book. This cost matters in principle, but it becomes a small percentage of total costs once the purchase price is amortized over the expected life of storage and use of the book. For example, a rare book that costs $500 but is expected to last 100 years in storage has an annual amortized cost of $4.80. Table 15.1 also does not include the value of the land. This can be significant but differs depending on the location of the library.

    Figure 15.2 shows the categories of rare book annual storage and access costs as a percentage of total costs. As expected, the largest component of total costs for books is the cost of space to store them.

    Figure 15.2: Storage and Access Costs for Print

    The Cost of Microfiche

    The annual costs of microfiche storage and access at the University of Toronto are shown in Table 15.2. Cost per volume is based on a 216-page text, the average size of a text digitized in the Yale Open Book Project.[1] As with Table 15.1 these costs represent the average cost per unit for storage or access. Just as the cost of purchasing a book is not included in Table 15.1, the cost of purchasing the microfiche is not included in Table 15.2.

    Both the cost of storage per volume and the cost per use are significantly lower for microfiche than for rare books. This is not surprising since microfiche is intended to provide access to and storage of information at a lower cost than print.

    The cost per use is derived by dividing the total costs of microfiche storage and access by total use. As with Table 15.1, this assumes that the value of microfiche storage is for access to patrons. If salaries and equipment are the only costs for access, and 80 percent of salaries are for access, then the cost per transaction can be estimated as $3.75CD, which is comparable to estimates of the costs of book retrieval. Both retrieval functions are similar in that staff must locate, check out, and re-shelve the requested materials.

    Table 15.2: Annual Average Cost of Microfiche Storage and Access

                                                 Cost       Cost/volume   Cost/use
    Construction, utilities and maintenance      $170,527   $0.06         $2.71
    Salaries                                     $251,602   $0.09         $4.00
    Equipment and supplies                       $34,423    $0.01         $0.55
    TOTAL                                        $456,552   $0.16         $7.26

    NOTE: All costs are amortized, with the exception that amortization for microfiche readers used a 15-year life span. University of Toronto microtext use was 62,856 in 1997, for 3,387,777 units stored in a room of 810 square meters.

    Figure 15.3 illustrates that microfiche costs are more salary intensive. Salaries constitute a larger percentage of the costs of microfiche than in the case of rare books. Rare books take up more space and therefore have a higher percentage of costs in construction, utilities and maintenance.

    Figure 15.3: Storage and Access Costs for Microfiche

    Table 15.2 does not include the subscription price of the microfiche to the library. These costs are part of the economic cost of producing microfiche and are shown in Table 15.3. By counting these costs in the production but not the purchase of the microfiche, we avoid double-counting them. The costs of microfiche production are shared costs: library subscription fees, grants, and donations jointly finance the production of the microfiche as a public good.

    Table 15.3 includes all economic costs of microfiche production, including the value of the space the Canadian Institute for Historical Microreproductions uses at the National Library of Canada. While this space is donated to CIHM, it still represents an economic cost of producing microfiche. As with previous tables, the average cost of production is calculated by dividing total cost by the number of units. All costs are in Canadian dollars.

    Table 15.3: Average Cost of Microfiche Production

                                                 Cost               Cost/fiche    Cost/image    Cost/volume
    Master copies                                $150,000           $16.07        $0.22         $46.88
    Salaries                                     $602,932           $64.58        $0.87         $188.43
    Equipment and supplies                       $125,880           $13.48        $0.18         $39.34
    Construction, utilities and maintenance      $187,066           $20.04        $0.27         $58.46
    TOTAL (shared costs)                         $1,065,878         $114.17       $1.54         $333.11
    Cost of microfiche reproduction and sales    $236,092
    TOTAL COST                                   $1,301,970
    Total cost per library (30-42 copies)        $43,399-$30,999    $4.65-$3.32   $0.06-$0.04   $13.56-$9.69
    Annual cost per library (30-42 copies)       $2,254-$1,487      $0.22-$0.16   $0.01-$0.00   $0.65-$0.46

    The first four rows of Table 15.3 show the cost of producing master copies of microfiche: $114.17CD per fiche, $1.54CD per image, or $333.11CD per 216-page volume. These master copies are then used to produce additional microfiche copies for distribution to subscribing libraries. The cost of the master copies is a shared cost for all subscribing libraries.

    Comparing the cost per volume of creating and storing a master microfiche copy with that of creating and storing a print copy, microfiche is expensive to create but offers significant savings in storage ($0.16CD per volume per year, against $5.89CD for print). However, at an annual savings of $5.73CD, it would take over 50 years to cover the cost of creation ($333.11CD) if the master copies were created solely for the use of one library.

    Microfiche is produced by CIHM, not to have a single copy, but to provide multiple copies to libraries that would not otherwise have access to early Canadian literature. With a limited number of print copies, microfiche becomes a cost-effective alternative for providing access. CIHM produces several copies of each microfiche to sell as subscriptions for libraries throughout Canada, the United States, and the rest of the world. By purchasing a subscription, these libraries share the costs of the original microfiche production.

    CIHM produces about 30 copies each year for library subscriptions and additional copies of individual microfiche at an additional cost of $236,092CD. The last two rows in Table 15.3 show how these costs are shared among the subscribing libraries. If the full cost of microfiche production is averaged over the 30 copies, the cost of annual production is $43,399CD per library. This includes the shared costs of production plus the costs of making copies. If an additional 12 copies of each fiche are sold, the average cost is $30,999CD per library.

    The average cost per fiche, per image, and per volume for 30-42 copies are shown in the final three columns of Table 15.3. The sharing of the full costs of production among subscribing libraries reduces the cost to $0.04CD-$0.06CD per image or $9.69CD-$13.56CD per volume. This compares favorably to the cost of each library acquiring a printed manuscript. At an annual savings of $5.73CD per volume for storage and access relative to print for each library, it takes 1.7-2.4 years for the microfiche to cover the costs of creation ($9.69CD-$13.56CD).
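    The payback periods quoted here and in the single-library case above follow from a simple division of the creation cost per volume by the annual storage savings, as the short calculation below reproduces.

    # Figures taken from Tables 15.1-15.3 and the surrounding text (CD per volume).
    annual_savings = 5.89 - 0.16      # print storage cost minus microfiche storage cost = 5.73

    payback_single_library = 333.11 / annual_savings   # master copies for one library: about 58 years
    payback_shared_30 = 13.56 / annual_savings          # shared across 30 subscribers: about 2.4 years
    payback_shared_42 = 9.69 / annual_savings           # shared across 42 subscribers: about 1.7 years

    print(round(payback_single_library, 1),
          round(payback_shared_30, 1),
          round(payback_shared_42, 1))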

    Once produced, it is anticipated that a microfiche copy of a text will last for 100 years. The purchase of microfiche is an investment in an archival copy of materials that is expected to provide access to patrons to the information for many years. If the cost of the microfiche is spread out or amortized over a 100-year period, then the annual cost of microfiche production is only $0.65CD-$0.46CD per 216-page volume per year. When this is added to the cost of storage from Table 15.2, the annual cost comes to $0.81CD-$0.62CD per volume per year for producing, storing, and providing access to a text in microfiche format.

    These costs indicate that when microfiche is produced in large numbers to accommodate several libraries, it costs significantly less to produce, store, and provide access to microfiche than to books. This shared cost per library declines further if the number of libraries acquiring subscriptions increases. In addition, the CIHM microfiche subscription provides access to a larger collection of texts than is likely to exist in any single library of rare books. These cost estimates show that microfiche is the more cost-effective alternative to library storage of print to provide patron access to out-of-print texts.

    The Cost of Digital

    The previous section showed that microfiche is a cost-effective alternative to print. Digitization of texts may be able to provide even greater savings relative to microfiche and print. Unlike print and microfiche, which must be produced and delivered to a library, digital texts can be stored remotely and accessed globally via the Internet. The marginal cost of reproducing and distributing digital information in a networked environment is effectively zero. The only costs are the one-time fixed costs of production and the annual fixed costs of storing and maintaining the data. These fixed costs can be shared by subscribing libraries, which in theory could drive the cost per library well below the cost of microfiche.

    In the Early Canadiana Online Project, microfiche was converted to digital format: fiche were sent to Preservation Resources for scanning and to the University of Michigan for optical character recognition (OCR). The cost estimates shown in Table 15.4 are based on the contractual costs for scanning and OCR.

    The total cost of production is $236.08CD per title or $1.20CD per image. Costs in the second and subsequent years for digital storage and access are $35.76CD per title or $0.18CD per image. This includes the cost of salaries for maintaining the ECO Project database (1.5 full-time equivalents for administration, server and database maintenance) and the annual costs of hardware storage. Although producing a digital copy from fiche costs less than producing the microfiche itself, storage and access for the digital copies in this project are more expensive. This is because the costs are averaged over a much smaller number of digital images, yielding a higher average than the cost per fiche in a university micro-text room that contains hundreds of thousands of microfiche.

    Table 15.4: Average Cost of Microfiche to Digital Production and Storage
    Item Cost Cost/title Cost/image Cost/volume
    Digitization $439,548 $132.87 $0.67 $145.67
    OCR $159,098 $48.09 $0.24 $52.73
    Salaries $153,264 $46.33 $0.24 $50.79
    Equipment & supplies $7,975 $2.41 $0.01 $2.64
    Construction, utilities & maintenance $21,053 $6.36 $0.03 $6.98
    TOTAL $780,938 $236.08 $1.20 $258.82
    Annual costs of storage & access $118,290 $35.76 $0.18 $39.20

    There are two factors that significantly lower the average cost per image of digital production and storage: the number of libraries subscribing to the database and the number of images stored. The production costs of the digital images are fixed costs that are constant regardless of the number of libraries that subscribe to the database. If 30 libraries subscribe to the database, the cost per library is $8.63CD per volume. An increase in the number of libraries or other organizations that subscribe to the database will decrease the "cost-share" for each organization. In addition, the annual cost of storage and access to the database is also a "shared" cost. If this cost is shared among 30 libraries it decreases to $1.31CD per volume per library per year.
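
    To illustrate how this cost-sharing scales, the following short sketch divides the Table 15.4 per-volume figures by the number of subscribing organizations. It is illustrative only; the subscriber counts other than 30 are hypothetical.

        # Cost-sharing sketch using the Table 15.4 averages: $258.82CD production
        # and $39.20CD annual storage/access per 216-page volume.
        def cost_per_library(subscribers, production=258.82, storage=39.20):
            """Return (one-time production share, annual storage/access share) per volume."""
            return production / subscribers, storage / subscribers

        for n in (30, 60, 120):  # 30 matches the text; 60 and 120 are hypothetical
            prod, store = cost_per_library(n)
            print(f"{n:>3} subscribers: ${prod:.2f}CD production, ${store:.2f}CD per year")

    With 30 subscribers this reproduces the $8.63CD and $1.31CD figures cited above; doubling the number of subscribers halves each share.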

    As the number of images available in the ECO Project increases, the cost per volume will also decline. Space costs (utilities, construction, etc.) and salaries for maintaining and updating the database and server constitute 97 percent of the costs of storage and access, and these costs are incurred regardless of the number of images. Hardware storage itself accounts for only about $0.90CD of the annual cost per volume. As the number of images in the database increases, total storage costs will rise, but the average cost per volume will continue to decline.

    The cost estimates from Table 15.4 can be compared to similar recent studies estimating the cost of digital production. Estimates from studies at Cornell University and Yale University are shown in Table 15.5. (Cost estimates from Cornell and Yale are shown in Canadian dollars for comparison. Cost per volume is based on a 216-page text.)

    Table 15.5: Average Cost of Digital Image Production
    Cost/Image Cost/Volume Type of estimate
    Early Canadiana Online $1.20 $258.82 Average cost estimate
    Yale $0.40 $83.96 Marginal cost estimate
    Cornell $0.43 $91.37 Marginal cost estimate

    These earlier studies show a significantly lower cost of digitization. The Cornell study created digital copies from paper, while the Yale study created digital copy from microfiche. The major difference between the Early Canadiana Online Project and these earlier studies is the method used for estimating costs. Both the Yale and Cornell studies estimated costs by timing staff as they scanned pages of print or microfiche; these estimates are therefore marginal costs of scanning images and producing digital copy. The cost estimates for the ECO Project are average costs, obtained by dividing total project costs by the number of images, titles, or volumes. The ECO Project cost analysis includes the full cost of producing digital copies and mounting the database on a server for access over the Internet. The ECO Project is also larger in scope, number of titles, and number of images, and its costs include all salaries, space costs, and the outsourcing of digitization and OCR. This cost analysis should therefore be viewed as a liberal (upper-end) estimate of the cost of a large digitization project with Internet access to the database.

    Economies of Scale

    The ECO project scanned a larger number of titles and images than the projects at Yale University and Cornell University. The ECO project scanned 3,308 titles compared to the 1,270 titles scanned at Cornell or the 2,000 titles scanned at Yale. Table 15.6 compares fixed, variable, and total cost estimates for the three projects.

    Table 15.6: Digital Project Average Cost Comparisons
    Columns: annual fixed project costs (equipment & salaries); per-image variable cost estimate; total cost estimate; extras; project size
    Yale University Project Open Book (film to digital, 1994/95, marginal cost estimate): $142,420; $0.182 per image (based on 600 volumes timed); $221,177; none listed; 432,000 images; 2,000 volumes
    Cornell University (paper to digital to COM, 1994/96, marginal cost estimate): $27,931; $0.319 per image manual, $0.288 auto (150 volumes timed); $197,275 manual, $183,353 auto; $80,417 extra (COM costs); 450,000 images; 1,270 volumes
    Early Canadiana Online (fiche to digital, 1998/99, average cost estimate): $161,239; $0.674 per image (digitization contract); $600,787; $159,098 (OCR) and $21,053 (space) extra; 651,742 images; 3,308 titles

    The variable cost estimates in Table 15.6 for the ECO Project include only the cost of scanning the images; OCR, space, and other salary costs contribute to total costs. For comparison with the Yale and Cornell studies, however, the vendor's cost of producing digital copy may be the more relevant figure: if texts were digitized without OCR, the cost would be $0.674CD per page image. The relative costs and sizes of the three projects are shown in Table 15.7 and Figure 15.4.

    Figure 15.4 illustrates the increase in cost per image and cost per title across the three projects. This may indicate diseconomies of scale, i.e., an increasing average cost as output increases. Larger projects may require more staff or involve greater task complexity, resulting in higher costs per unit. However, much of the difference shown may simply be the result of different methods of estimating costs.

    Table 15.7: Digital Project Average Cost Comparisons
    Number of images Cost/image Number of titles Cost/title
    Yale University (fiche to digital, 1994/95, marginal cost est.) 432,000 $0.51 2,000 $111
    Cornell University (paper to digital to COM, 1994/96, marginal cost est.) 450,000 $0.42 1,270 $155
    CIHM Early Canadiana Online (fiche to digital, 1998/99, average cost est.) 651,742 $0.92 3,308 $182
    Figure 15.4: Project Cost Comparisons by Number of Images and Volumes

    Cost of Access to Digital Information

    The cost of access to digital information is very difficult to quantify. Access to digital information includes the personal computer, network connection, and space used by the patron. Since these are all fixed costs of access that a patron or library must incur regardless of what information is accessed, the marginal cost of accessing any image or database is zero.

    We can attempt to quantify the average cost per use to the library of providing access to digital information. This is shown in Table 15.8.

    Table 15.8: Average Annual Cost of Access to Digital Images
    Item Cost Cost per internal use Cost per use
    Construction, utilities & maintenance $186,274.25 $0.01 $0.00
    Salaries $114,000.00 $0.01 $0.00
    Equipment & salaries $398,000.00 $0.03 $0.01
    TOTAL $698,274.25 $0.04 $0.01

    Table 15.8 includes the cost of computers within the library, staff to maintain the server and network, and the cost of space for each computer. Cost per use is shown in terms of internal use and all uses of library databases regardless of the source. Internal use is defined as the number of unique and significant hits to the library server which originate from within the library (0.3 million per week). Use is the number of hits regardless of source (1.2 million per week). Regardless of which definition of use is applied, access to digital documents comes at a very low average cost per use. This is significantly lower than the average cost per use for microfiche or rare books.
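
    The per-use figures in Table 15.8 follow from spreading the annual cost over a year of hits at the stated weekly rates (assuming 52 weeks of use):

        \text{cost per internal use} \approx \frac{\$698{,}274}{0.3 \text{ million} \times 52} \approx \$0.04, \qquad
        \text{cost per use} \approx \frac{\$698{,}274}{1.2 \text{ million} \times 52} \approx \$0.01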

    Table 15.8 also illustrates the importance of understanding the difference between total, average, and marginal costs. Like the previous tables, Table 15.8 shows total and average costs per use. The total cost of providing electronic access within a university library is significant, but the high level of use of terminals within the library results in a very low average cost per use. The marginal, or additional, cost of each patron's use is zero. All costs in Table 15.8 are fixed costs, incurred whether or not a patron uses a terminal. Investments in information technology within university libraries can be expensive, even though digital documents in a networked environment have a zero marginal cost of distribution.

    User Costs of Access

    The final economic cost of access is the cost to the user. With print and microfiche the user must travel to the library to use the information. Any library will hold only a limited collection of print titles, so to read other titles from the collection in print a patron may have to travel to another research library. With the CIHM microfiche collection a research library can offer patrons access to a greater number of titles than are typically available in print, although the patron must still travel to the library to access the microfiche.

    Digital copies are accessible to all patrons of subscribing libraries with a network connection. This increased accessibility of the collection to patrons may result in a greater number of subscribing libraries and greater access to the CIHM collection of materials.

    The cost to patrons of using information is the opportunity cost of their time spent in acquiring and consuming it. The value of access to information by patrons is reflected in the demand for using the database. Hypothetical demand for use of Early Canadiana Online is illustrated in Figure 15.5.

    In theory, if the user has a cost of time of $10 per use of a manuscript in a rare books library, he may only use the manuscript 5 times a month. If the patron's opportunity cost of time spent consuming the information decreases then use will increase.

    Microfiche takes less time and effort to use than books in a rare book library: delivery of microfiche by library staff takes less time than retrieval of a rare book, and once a patron understands how to use a microfiche reader he can view several books with relative ease. In addition, patrons do not have to travel to another library to view early Canadiana texts if their library holds the entire CIHM collection on microfiche. If we assume that the cost to a patron of accessing an Early Canadiana text on microfiche is $5, then patron use of the microfiche will increase to 30 times a month.

    Figure 15.5: Demand for Early Canadiana

    Digital access lowers the opportunity cost of access to the information even further. It enables patrons to view the information from their personal computer in their home or office or from a computer terminal in the library. Instant access to a large collection of images from the CIHM collection means faster, searchable access to the images. Using Figure 15.5, if we assume that the opportunity cost of patron access is only $2 per access for digital images, patron use of digital access will increase to 50 uses a month.

    To patrons, the benefit of digital access has two parts. First, there is the value of lower cost access to images they would otherwise have traveled to the library to view on microfiche. If a patron would have used microfiche 30 times a month at a cost of $5 per use, and this cost declines to $2 per use in digital form, then the patron's cost of access falls by $3 for each of those 30 uses, a saving of $90 a month. Second, there are additional uses of digital access that provide additional benefits to patrons. These additional uses can be assigned an average value of $1.50 each, or one-half of the $3 saving on the first 30 uses a month. If use increases to 50, the additional 20 uses per month provide a benefit to this patron of roughly $30. The total value to this patron is $120: the $90 in lower costs plus the additional $30 in benefit from increased access.
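
    This is the standard linear-demand approximation of the change in consumer benefit, with existing uses valued at the full price reduction and new uses at half of it:

        \Delta\text{benefit} \approx \underbrace{Q_0\,\Delta p}_{\text{existing uses}} + \underbrace{\tfrac{1}{2}\,\Delta Q\,\Delta p}_{\text{new uses}}
        = 30 \times \$3 + \tfrac{1}{2} \times 20 \times \$3 = \$90 + \$30 = \$120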

    During this study, patron use of the print, fiche, and digital collection was observed. Patrons were also asked questions about their use and travel time to the library. Annual use of the collection at the University of Toronto and Laval University increased from 2,984 for print and microfiche to an estimated 7,030 uses of the digital texts. Travel time to the library for print and microfiche patrons varied from less than 30 minutes to more than one day, with 90 percent of patrons needing one hour or less.

    If we assume that digital access saves print and microfiche patrons 30 minutes of travel time and that the value of this time is $10 per hour, then the annual savings of 7,030 uses equals $25,035.[2] This represents a lower-end estimate of the savings from accessing the CIHM collection online versus traveling to the library to use the microfiche or print. Some patrons are likely to save more than 30 minutes of travel time. Other patrons are likely to have an opportunity cost of time greater than $10 per hour. Most significantly, use of the Early Canadiana Online collection is likely to increase as more scholars and students are made aware of it.

    15.4 Conclusions

    This project estimated the cost of access to information in print, microfiche, and digital format. The results include

    • The average cost of producing a 216-page book in digital format from microfiche is $258.82CD plus any copyright fees. The annual cost of storage and access is $39.20CD per book. In theory, these costs can be shared by the libraries and patrons that access the digital copy over the Internet, significantly lowering the cost per library.

    • The average cost of producing a book on microfiche is $333.11CD. Given the number of copies sold by CIHM, the cost per library is between $9.69CD and $13.56CD per book. The annual cost of storage and access in a university library microtext room is $0.16CD per book.

    • The cost of a book in print format is the purchase price of the book. The annual cost of storage and access in a rare book library is $5.89CD.

    These cost estimates show that digitization of texts can provide significant savings if shared by a sufficient number of subscribing organizations. Networked access to digitized texts also provides several economic benefits to users including (1) increasing the availability of these texts to patrons of organizations with access to the Internet, (2) decreasing the opportunity cost of patrons' time spent accessing digital copies rather than traveling to libraries to use print or microfiche copies, and (3) providing electronically searchable texts making it easier for users to find items of interest. These increased benefits should result in a significant increase in use of the digital information relative to use of the print or microfiche copies.

    Previous studies have estimated the marginal costs of production, acquisition, and storage of books, microfiche, and digital copies of texts. This study included all costs associated with the production, cataloging, and sales of texts in microfiche or digital format. Therefore, the estimates of the cost per book in print, microfiche, or digital format are average cost estimates based on digitizing over 3,300 titles and 650,000 images. Individual libraries engaging in small digitization or microfiche projects may have lower costs per text but the final product may not be of a quality needed for national or international sales. Large-scale projects that include cataloging and sales of several thousand texts are likely to experience a similar cost structure as estimated in this study.

    The remaining paradox of digital information is finding the correct financial strategy to collect sufficient revenues to pay for the benefits of digitization. Digital information provides greater access to information at a lower cost. However, funding the production, archiving, and accessibility of the information requires creative financing, including value-based pricing of information as well as the solicitation of grants and donations.

    Information production and access come at a cost. An accurate measurement of the full economic costs of different methods of information delivery is essential in determining the most cost-effective method. This study has shown the costs of three methods of access: print, microfiche, and digitization of microfiche. The cost of digital information is lower on a per-library or per-patron basis so long as a sufficient number of libraries are interested in subscribing to the database.

    In general, the lower cost of digital production will continue to result in more information products appearing in digital format on the Internet. The increase in the number of digital products will further contribute to the information overload of patrons and librarians. Information consumers are confronted with too many journals, databases, and research sources for the limited amount of time and attention they can give to any one source. Given a limited amount of time for information consumption, patrons will seek out higher-quality information for the time they spend. Any new digital product must offer an assurance of quality in order to convince patrons and librarians that it is worth their time. Manuscripts of historical significance, such as those in the ECO Project, produced by a trusted organization such as CIHM, provide libraries and patrons with that assurance of quality.

    Notes

    I would like to thank Pam Bjornson, Meredith Butler, Marshall Clinton, Malcolm Getz, Wendy Lougee, Tim Nef, Guy Teasdale, and Karen Turko for their comments and assistance. Funding for the Early Canadiana Online Project and this economic analysis was provided by the Andrew Mellon Foundation.

    1. Volumes are considered to contain 216 images. Images are page images. Each microfiche image has two page images.

    2. This comes from the original 2,984 uses multiplied by $5 saved per use, plus an additional 4,046 uses (7,030 minus 2,984) multiplied by an average saving of $2.50 per use.

    16. The Columbia University Evaluation Study of Online Book Use[†]

    16.1 Introduction

    This paper reports some observations about cost, use, and users of online books during the Columbia experiment. From winter 1995 to autumn 1999, the Online Books Evaluation Project at Columbia University explored the potential for online books to become significant resources in the academic world. The project prepared books in HTML format, a choice that seemed reasonable at the time it was made (1995); later observation of user behavior makes us less certain of that choice. The evaluation component of the project included monitoring of the national technological environment. The project analyzed (1) the Columbia community's adoption of and reaction to online books, (2) the relative life cycle costs of producing and owning online books and their print counterparts, and (3) the implications of traditions in scholarly communications and publishing. The experience involved the integration of two very diverse cultures, and has taught us the relevance of the following joke.

    A manager, an engineer and a computer scientist are all traveling in a car in the mountains when the brakes fail and the car careens down the road and eventually stops just hanging over the edge of a cliff. They carefully climb out of the car and the manager says, "Well, now we'll have to form a focus team for a matrix review of vision and objectives." The engineer says, "Let me have a screwdriver; I may be able to fix this in 10 minutes." And the computer scientist says, "Let's push this back up to the top of the hill and see if the brakes fail again."

    Our approach to online books at Columbia was like that of the engineer, but "10 minutes" has been more like four years. One of the lessons learned is that, as libraries become more interdependent with online information services, we must become more accustomed to the kind of trial-and-error approach exhibited in the joke.

    Figure 16.1: Abstract variables of interest

    We started from an abstract formulation of the relation between users, libraries and the constraints upon each of them; see Figure 16.1. Our goal is to understand the behavior of the user of the system, shown in the middle of the diagram. The capabilities of individual users obviously influence their behavior, as do their disciplines, the overall environment (including technology and attitudes toward computers), and the resources available in the library. In turn, library management controls those resources. Our study is an effort to insert the dotted line shown in Figure 16.1: to provide management with feedback about the behavior of users, which it can use to better manage library resources.

    16.2 Economic Perspective

    We take an economic perspective on the complex problem of establishing an online books service at an existing major research library. We presume that the actors involved weigh the costs and benefits of various alternatives available to them. Each applies some kind of personal utility function to those costs and benefits, and chooses the action with the largest personal utility. In this complex setting there are many different kinds of economic actors: students, faculty, and staff. Indeed, the library itself, and even the entire university, can be thought of as "actors."

    Individual economic factors

    User Costs. What are some of the forces affecting individuals? First, there are costs of two kinds, capital and continuing. The capital costs are the cost of equipment needed to be able to use the digital library or online books, and the cost of acquiring the needed skills. Since, in the setting of our project, there is no transfer of funds from users to the library associated with use events, the continuing costs are (a) the cost of connecting to the library and (b) the mental costs or efforts associated with use. Not a lot is known about these costs to the user, at this point. However, in the transition from page-based books to the HTML format that we chose, we sense that certain kinds of mental landmarks that readers have developed over years of working with print on paper are removed. It seems likely that this results in additional mental cost to the users.

    User Benefits. There are also benefits to the users. First among these, of course, is ubiquity of access. Also, our system provided a search capability. In addition, the book-marking system (supported through the browser) permits users to store pointers to important locations within an extended text. Our system did not directly support annotations, but obviously annotations can be established in the users' own computers. Finally, using a system like this provides the intangible benefit of being up-to-date relative to one's peers.

    Beyond all this, having and using a digital library provides symbolic utility. Symbolic utility is a concept introduced by the philosopher Robert Nozick to represent the utility assigned to something good to have or to do, even if it doesn't necessarily "work" (Nozick, 1993, p. 226). In this case, members of the Columbia community may have felt proud about contributing early to the development of digital library systems.

    Given the nature and variety of benefits, it seems probable that they outweigh the costs to individual users.

    Staff Economic Factors

    Staff Costs. Economic forces also affect the staff of any library that introduces a substantial digital component. One important cost is the learning curve, representing costs that must be incurred in order to get the system to work. The other, which is becoming a pervasive feature of the library world today, is the cost of continuous change, which involves not only learning but psychological stress as well. Because some older librarians did not foresee and do not enjoy those stresses, we will probably see a gradual change in the psychological profile of the profession.

    Staff Benefits. Among benefits to the staff, the first and most important is the ability to provide better service to patrons. Another important benefit is the ability, in the online books or digital library situation, to adapt materials developed by others. In focus groups conducted at New York University, we heard for the first time librarians reporting that they were pleased to be able to develop web resources in which pointers to resources developed by librarians at other institutions played a major role. A final benefit to staff is the fact that by working in the digital environment they are developing skills that are much more portable than traditional library skills.

    Library Economic Factors

    Library Costs. There are several incremental costs to the library in implementing this project. The most notable are costs of equipment, materials development, and training.

    Library Benefits. Among the important library benefits are the contribution to the competitiveness of the university, and the contribution that digitizing the library makes to the shared professional goals of growth and service.

    Publisher Economic Factors

    Publishers must consider the potential of electronic books in terms of their business plans and goals. While publishers share some objectives with libraries, authors, and readers, the relationship is sometimes antagonistic, because some portion of the price to readers and libraries goes to publishers rather than authors. We presume that for-profit publishers seek to maximize profit, while non-profit publishers seek to maximize the net of income over expenses attributable to each book.

    16.3 Issues affecting the design of the studies

    Based on the economic framework above, we studied the environment, publishing costs, and library costs. We explored various views on the function and design of online books. We conducted numerous and diverse studies of use and of user preferences. In this chapter we summarize some of what we've learned and discuss implications for the future.

    In concrete terms, the Columbia University Online Books Evaluation Project repackaged books for online delivery, studied the use of those books, and estimated the costs for publishers and libraries of providing print and online books. There were four publishing partners in the project: Columbia University Press, Oxford University Press, Garland Publishing, and Simon and Schuster Higher Education. We analyzed the costs of development and delivery, and the use of digital texts. We sought to relate those costs to that use, within the context of university library service, and to the potential for service. Our analysis of the potential for service is by no means complete.

    In the remainder of this section we review several considerations that framed our studies and motivated particular choices we made in the design of the studies.

    Why put books online?

    Online books have several advantages. First, we anticipate that online books will be cheaper to produce, to purchase, to acquire, and to maintain. We also expect that online books will provide increased functionality such as searching and linking. They offer obvious potential for enriched content through the addition of links to multimedia, computer simulations, and other features. There is also potential for developing expanded products, rather like a collection of books linked through a web site. Not least in importance, online books can provide availability around the clock and calendar.

    Why not put books online?

    The issue of whether to put books online at all was a serious one in 1994 and 1995, when the project was planned and launched. At that time, it seemed that the most important negative point was users' objections to reading books online. We do not know how true this is in the year 2000. There are definitely many who do not want to read books online, but we must entertain the possibility that most of those users are of an older generation, and will eventually be replaced by people who do want to read online.

    Usability. When the project began, it was anticipated that online books would be difficult to use. At that time (1995), it was not even apparent that Web technology would be easy to use. We were also concerned that there was no feasible market model for the development of online books. We cannot say today that there is a clearly defined market, but the activities of netLibrary, Questia and other online, commercial academic libraries show that there are multiple possible paths into the market for scholarly books, aimed at libraries and students, respectively.

    Accessibility. We were concerned about the adequacy of access and connectivity. We had in mind, primarily, people working at home, connecting over telephone lines with a top speed of 14.4 kilobits per second, which seemed likely to be inadequate. However, about halfway through the project the typical home-access speed had moved up to about 56 kbps, and increases in access speed continue to occur.

    Production Cost. We were concerned that online books would be too costly to produce. In fact, we shall see that the production method we employed was relatively costly. Nonetheless, when compared with the total life cycle cost of paper, online production is something of a bargain.[1]

    Author Interests. We also believed that authors might oppose the presentation of their books in online form. Authors might fear loss of royalties, and object to aesthetic compromises: HTML is a limited rendering language, and conversion to HTML might remove important aspects of layout. Also important is a fear, on the part of young scholars, that exclusive publication in an online form might become common for first-time authors and that this would demean their works and lessen their chances for career advancement. On the other hand, many academic authors are concerned with documenting the impact of their works and the extent to which they are being read. The online environment is ideal for this.

    National Environment: Access

    Reviewing the environment for online books from 1995 to 1999, we see a number of changes. First is the improved price-to-power ratio for personal computers, discussed further below. We saw penetration of Internet use to more than 50% of all USA households by 1999. In addition, by 1999 half of all adults in the USA were Internet users. There was little improvement in Internet service provider pricing between 1997 and 1999. Hand-held book readers emerged in 1998, and some of our focus group work suggests that this will be important in the future growth of the online book market.

    National Environment: Computer Pricing

    Figure 16.2: Prices do not follow Moore's Law (impressionistic)

    Moore's famous law is that computing power at a given price doubles every 18 months. The inverse formulation is that the cost of a given amount of computing power falls by half every 18 months. However, the corollary that consumer prices for computers fall at the same rate does not hold. Starting from when the base price of an adequate computer was about $4,000, we would have expected that by the end of our study this price would have dropped to well below $1,000. What we actually saw, through a program of tracking advertisements for entry-level computers, is that prices dropped fairly rapidly to around $2,000 and held there for some time. Towards the end of the study period there was a new break, down to $1,000. Apparently the strategy of manufacturers was to identify market price points acceptable to consumers and to improve the configuration of the computers rather than drop the price past those points. If Moore's law held strictly, a general purpose computer adequate for using online books over 56 kbps lines should now cost only $300.
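
    The $300 figure is a straightforward halving-every-18-months extrapolation from the $4,000 baseline. The text does not state the exact time span, but roughly five and a half years of halving produces it:

        p(t) = \$4000 \times 2^{-t/1.5\,\text{yr}}, \qquad p(5.5\ \text{yr}) \approx \$4000 \times 2^{-3.7} \approx \$300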

    Local Columbia Environment

    The local environment at Columbia for online books changed substantially during the period 1995 to 1999. By the end of this period there was Ethernet connectivity to every building and dormitory. By 1997, which is the last time that we could justify the costs of surveying to ask the question, 80% of students and faculty had adequate access to a network computer. By 1997, most library users reported an average of six hours per week of online activity of all kinds. That works out to about an hour a day and we estimate that by now this has probably at least doubled if averaged over the entire community.

    By spring 1999, online use of complete texts had become common at Columbia. For example, the level of JSTOR use was equal to one use per month per potential user, on average. We found that most online book use was from on-campus computers. This is consistent with a concern that access from home might not be adequate. It is quite possible that, as bandwidth to the home increases, the usage of online books will increase further.

    16.4 Cost data

    We developed a variety of sources of data in the online books evaluation project. We conducted surveys online, by mail, by telephone and in class. We also conducted individual in-person and telephone interviews of scholars and a number of focus groups involving users, potential users, and librarians. In this report, we focus on cost analyses and on Web data.[2]

    In a traditional print production environment, preparing texts for online access incurs an additional cost. We found an amazing range of estimates for this cost, from four cents per page to more than $2.00 per page, which works out to approximately $100 to $1000 per title. The range of cost is due to the enormous variation in the format and quality of source files from the publisher at the time, and in the conversion processes employed by various projects. Achieving the low-end cost requires a very standard and well-behaved PostScript source file. In addition, these figures include some unknown component of experimentation cost, as this project and others adapted to variations in input, and in desired presentation format.

    Table 16.1: Sample e-book production costs ($)
    1.51/pg. Conversion: OCR or SGML to HTML
    1.00/pg. Conversion: ASCII to HTML
    0.04/pg. Conversion: Postscript to PDF
    20.00/title Conversion management
    1.00/MB/yr. Server maintenance

    In Table 16.1 we present some sample electronic book production costs. One conversion route is from OCR (or from SGML) to HTML, and the other, somewhat less expensive route begins with ASCII and goes to HTML. Conversion from PostScript to PDF is done using software from Adobe and yields a cost of about four cents per page. Note that this process, which has been tested at the University of Pennsylvania, does not yield fully navigable HTML files, but yields PDF output only. Management of conversion is estimated to have cost about $20/title at Columbia. Maintaining books on the server is steadily less expensive, estimated by the end of 1999 to cost about $1/megabyte per year.

    A fully electronic production process (bypassing print) would be less expensive. Through conversations with scholarly publishers, we estimate that the potential savings from moving to an online-only format would be about 10% at the plant (that is, changes in typesetting costs) and perhaps an additional 15% in costs avoided for paper, printing and binding. Also eliminated would be costs associated with warehousing and shipping, which we did not attempt to estimate.

    On the other hand, there are offsets to these savings for online production. They include costs of customer service, continuing file maintenance, and migration. These latter, archival, functions are very important. A rational economic publisher will only maintain the file for a book as long as the discounted total expected future revenue from sales exceeds the total discounted projected cost of keeping the file. Thus, libraries cannot rely on publishers to maintain the files of books with very low demand, unless they are willing to pay service fees that cover the publishers' expenses.

    From our review of the literature we have prepared an estimate of life cycle costs to the library for online and paper books. These, projected over a thirty-year life cycle and discounted at a 5 percent real cost of money, are lower for online books. Our summary is shown in Table 16.2. The difference is essentially equal to the avoidance of the costs of managing circulation. In addition, long run costs for online books would likely be quite a bit lower as copy cataloging would prevail rather than the original cataloging experienced, and included in the costs, for this project. Original cataloging costs about $25 per title while copy cataloging would cost significantly less per title.

    Table 16.2: Estimated life cycle costs ($)
    Print Online
    Acquisition/Processing 47.00 39.00
    Storage/Maintenance 14.00 38.00
    Circulation 44.00 (included above)
    TOTAL 105.00 77.00
    NOTE: Calculated over a 30-year life, at a 5% discount rate.
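
    As a rough illustration of the discounting behind Table 16.2, a recurring annual cost can be converted to a 30-year present value as sketched below. This is illustrative only, not the study's actual worksheet, and the $2.86 annual circulation cost is a hypothetical figure chosen to show how a stream of annual costs maps to the $44 total.

        # Present value of a recurring annual cost over a 30-year life at a 5% real rate.
        def present_value(annual_cost, years=30, rate=0.05):
            return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

        # Hypothetical example: an annual circulation cost of about $2.86 per print
        # title discounts to roughly the $44 circulation figure shown in Table 16.2.
        print(round(present_value(2.86), 2))  # about 43.97
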
    Design Considerations: Librarians

    We conducted focus groups with librarians to identify market and design features that they consider important in building a collection of online books. The first feature emerging is the ability to search across selected groups of titles. A second, rather technical issue is the existence of "stable, granular" URLs. Stable means that the URLs remain the same over time, or at least that the system does not have to be manually updated. Granular has to do with the level of specificity with which a user can access a book. In the Columbia approach to online books, an individual file corresponds to a chapter within a book. We found that librarians want good bibliographic control of online books, with direct linking from the catalog into the book. But they would also like to see usage data on individual titles in some standard form. This usage data can feed back to rationalize online book acquisition policies. Finally, librarians want to be assured that an online book system will support reliable migration to new platforms.

    Design Considerations: Scholars

    Both in-depth interviews and focus groups with scholars generated a somewhat different list of desired design features. Scholars would like to be able to move directly into the online book via a direct link from the online catalog. They would like to be able to define groupings of texts on the fly, and search across that collection of texts. They would like a comprehensive and detailed table of contents, with direct linking into the book (providing, in effect, analytic indexing). When images are a significant part of the text they would like to see browsable, linked, thumbnail images. They would like screens and displays supporting the ability to show two nonconsecutive pages at once, permitting comparisons. They would like to be able to see footnotes and text displayed in parallel on the same screen, even if the "footnotes" are actually endnotes. They would also like to see pagination matching the print version, not only for navigational bearings, but also because, frequently, the citation that led them to a book specified a particular page.

    Scholars would prefer that, whenever the collection contains the relevant material, references be hyperlinked directly into the cited material. They would also like to be able to link to a dictionary. They would like to be able to adjust fonts and formats for easier reading on screen. They would like to have annotation and highlighting capability that they could store with the book. They also expressed an interest in having the ability to share annotations on a single text.

    16.5 Study of Users

    The remainder of this paper discusses some of what we have learned about the users. The first interesting point is a relation among technology, behavior, and attitudes. We expected that the technology, as it grew, would influence the attitudes of scholars, both faculty and students, which in turn would influence their behavior. However, we tracked attitudes carefully over the entire study and saw only the smallest movement towards believing that online books are a better way to do one's scholarly work. This forces us to conclude that, in fact, technology effectively influences behavior and that attitudes simply have to catch up. This may mean that scholars are moved to technology by a subliminal perception of benefits, which they cannot articulate. On the other hand, it may mean that fashions in scholarly behavior are simply no more rational than any other kinds of fashion.

    Analysis of individual use

    A key innovation in the Columbia online books project was the introduction, in 1997, of the ability to identify the activity of unique users. This was a fortunate byproduct of the security system, developed to permit people to read online books from home. To maintain confidentiality of the users, system analysts replaced the identities of individual users with uninformative labels.

    Table 16.3: Status of users at time of first use
    Frequency Percent
    Undergraduate Student 2088 58.0
    Other 607 16.9
    Missing 328 9.1
    Graduate Student 295 8.2
    Other Student 145 4.0
    Faculty 136 3.8
    TOTAL 3599 100.0

    With anonymity ensured, we were permitted to link usage to administrative files containing demographic information about the users. Typical results are those shown in Table 16.3, reporting the distribution of the status of individual users at the time they first used a particular resource. The resource in this case was the online version of the Oxford English Dictionary. While we had a number of reference works available online and, by the close of the project, close to 200 books in online form, the total usage of the OED represented approximately 50 percent of all online usage, and so it is used here to illustrate the types of analyses that we performed.

    N Stem Leaves
    1049.00 0 00000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111
    491.00 0 22222222222222222222222222333333333333333333
    265.00 0 444444444444455555555555
    202.00 0 666666666667777777
    156.00 0 88888889999999
    140.00 1 0000000111111
    92.00 1 22223333
    102.00 1 4444455555
    86.00 1 66667777
    62.00 1 888999
    68.00 2 0000111
    49.00 2 22233
    57.00 2 44455
    48.00 2 6677
    38.00 2 899
    38.00 3 001
    41.00 3 2233
    29.00 3 45
    35.00 3 667
    32.00 3 899
    34.00 4 011
    25.00 4 23
    18.00 4 45
    20.00 4 67
    16.00 4 89
    11.00 5 0&
    Figure 16.3: Stem and Leaf diagram of time spent viewing the OED
    NOTE: This figure is a stem-and-leaf diagram that represents a histogram of use. It will look like an ordinary histogram if rotated 90 degrees to the left, but contains rather more detail, as individual values are represented by numbers, rather than marks. Each leaf digit represents an observation. The value of the observation in minutes of usage is equal to 10*Stem+Leaf. The last row represents fifty or more minutes of usage. The value in the first column (N) is the total number of observations summarized in that row.

    There were 3,600 individuals who used the OED during the study period. Just over 2,000 of these were undergraduate students at the time of first use. Nearly 300 were graduate students and close to 140 were faculty members.

    We analyzed the ways in which individual users used the resource. To do this we introduced the rule that an inactive period of 15 minutes or more was considered to mark the end of a session. This is a reasonable rule based on detailed analysis, which showed that there was a natural break in the distribution (over all users) of the interval between "clicks" at somewhere around 10-15 minutes. We interpret this as meaning that continuation of a session over a break of this duration will be a rare event, which we can safely ignore. We also studied the total amount of use that individuals made of specific resources. This is illustrated by data on the OED. The mode (that is, the most common) number of clicks that an individual user made on the OED is somewhere between 2 and 3. Above that number, the number of clicks that a person made on the OED drops exponentially. The rate of the drop is such that the chance to go on to two more clicks is about 2/3 at any point. (The chance to add one more click is the square root of this number, or about 83%.)
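
    The session rule is easy to state precisely. The sketch below is illustrative; the log format is assumed, not the project's actual schema. It groups one user's click timestamps into sessions whenever 15 or more minutes of inactivity elapse.

        # Group one user's click timestamps (in seconds) into sessions, where a gap
        # of gap_minutes or more of inactivity starts a new session.
        def split_into_sessions(clicks, gap_minutes=15):
            sessions, current = [], []
            for t in sorted(clicks):
                if current and (t - current[-1]) >= gap_minutes * 60:
                    sessions.append(current)
                    current = []
                current.append(t)
            if current:
                sessions.append(current)
            return sessions

        # Example: three clicks, a 20-minute pause, then two more clicks -> 2 sessions.
        print(len(split_into_sessions([0, 60, 300, 300 + 20 * 60, 300 + 21 * 60])))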

    As shown in Figure 16.3, the time spent using the OED online follows an exponential distribution. This indicates that at any time in the course of using the OED an individual has a constant probability of just quitting and deciding never to use it again (roughly 100%-83%=17%).
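
    Put another way, if p is the probability of continuing for one more click, the observations above imply

        p^2 \approx \tfrac{2}{3} \;\Rightarrow\; p \approx \sqrt{2/3} \approx 0.82, \qquad 1 - p \approx 0.18,

    which is the roughly 17% per-click quitting probability cited above.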

    This apparently exponential behavior is intriguing and we pursued it in another way. Since we could anonymously track individual users, we could plot how much an individual used the resource against how long it had been since that individual first used it. If adopters continued to use the resource at a steady rate, this graph would be roughly linear. We show the actual data for the OED (which had heavy use) in Figure 16.4.

    Figure 16.4: Scatter plot of total use against time since first use

    Figure 16.4 is a scatter plot. Each point represents one individual user. The y-coordinate of the point represents the number of sessions that an individual had with the OED and the x-coordinate represents the number of days since that individual first used the OED. The steep line represents the expected usage relationship if adopters continued to use the resource at a steady rate.[3] In fact, a regression analysis shows that the best fit is nearly horizontal, which indicates there is little ongoing use by individuals. It is apparent that many observations are not well-predicted by this model, and indeed, that some usage did persist.

    We can plot this data in a more familiar form by showing the distribution of time since first use, without paying attention to how much use there has been. We do so by projecting the preceding figure onto a horizontal axis; see Figure 16.5. We see, as have most researchers in the academic setting before us, that it is very easy to discover the existence of the semester. Each of the five peaks in this graph corresponds to an academic semester. There might be some cause for optimism in the fact that the leftmost peak, which represents the most recent surge in use (spring 1999), rises higher than any of the earlier ones. However, we don't know quite what to make of the fact that the one before it (fall 1998) represents a drop from the preceding fall.

    Online Versus Paper: Usage Data

    Our data (based on comparison between the online book usage figures and data collected through circulation statistics and slips placed in corresponding reference titles in the library) suggest that online books were used more than their print counterparts. If we count circulation alone we find that there were about three times as many accesses per book online as for the paper version. After consultation with librarians we believe that a reasonable correction for in-house use is to increase circulation by 50%. This would reduce the ratio to twice as many online uses per book.
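
    The correction works as follows: if in-house use adds 50% to recorded circulation, total print use becomes 1.5 times circulation, and the observed ratio of online to print use falls from three to two:

        \frac{\text{online uses per book}}{\text{print uses per book}} \approx \frac{3 \times \text{circulation}}{1.5 \times \text{circulation}} = 2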

    Figure 16.5: Histogram of time since first use
    NOTE: Height of the bar is the number of sessions logged by users starting the indicated number of days before data collection.

    We conjecture that higher usage for online books is due to lower convenience costs than for other access options. Having purchased a paper copy for the library does not ensure that the book is available: the book might be in circulation, or missing from the shelf, and if the library is closed the paper copy of the book is not available to a user. A common access option is an online public access catalog (OPAC). However, an OPAC does not support even the roughest form of browsing into the book until the book itself is put online. An OPAC provides so little information about a book that a scholar might not be aware that it contains material relevant to his work; if so, the mere ownership of that book by his library does not make it truly available to him. Catalog records enhanced with tables of contents and book indexes are a relatively new offering and a major asset to the scholar in locating books relevant to his or her research, but they do not eliminate the higher convenience costs of accessing the physical book at the library.

    Hence, the online access to a full book represents a quantum leap in the availability of the contents of that book, and, we believe, lowers the barriers to access for many modalities. Perhaps the only modality for which it is not clear that online access is preferable is "plain old reading at length."

    We were also interested in studying patterns of access when readers use online books. We have approached this in two different ways. One is essentially qualitative, in which we asked people in surveys and in interviews how they used online books. In doing that we were able to identify at least the following kinds of activity: browsing, grazing (that is, reading portions of text scattered through the book, punctuated by visits to the index or table of contents), citation checking, finding individual facts or quotations, reading on reserve for a course, determining the need for a paper copy, printing (that is, turning the online book into paper), and directly reading online.

    We have also, because we can track individual users, been able to break some new ground in quantitative analysis of how people use books online. Generally, each chapter is a separate file, and hence a separate entry in the web server log. Thus, by analyzing the sequence of clicks on chapters, we are able to distinguish a number of different ways in which individuals use online books. The first style we characterize as linear use: an individual reads chapters of a book in exactly the same order in which they appear in the printed volume. The second pattern of use is quasi-linear, in which the sections of the book are visited in some personalized order but each section is read once and only once. We also observe a pattern we call hyper-linear, in which sections are visited in an arbitrary order and some sections are visited more than once. Hyper-linear usage occurs about 12% of the time. See Figure 16.6.
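
    The three patterns can be stated operationally. The sketch below is a simplification: it assumes the chapter requests for a single user are available as an ordered list, and it takes "linear" to mean that each chapter is viewed once, in ascending (printed) order.

        # Classify one user's sequence of chapter requests.
        def classify_pattern(chapters):
            each_once = len(set(chapters)) == len(chapters)
            if not each_once:
                return "hyper-linear"   # some sections visited more than once
            if list(chapters) == sorted(chapters):
                return "linear"         # sections read in printed order, each once
            return "quasi-linear"       # personalized order, each section once

        print(classify_pattern([1, 2, 3, 4]))   # linear
        print(classify_pattern([3, 1, 4, 2]))   # quasi-linear
        print(classify_pattern([2, 5, 2, 7]))   # hyper-linear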

    Figure 16.6: Patterns of motion in online books
    Figure 16.7: Use of index in online books

    There are several ways that a use pattern may involve use of the index (or, more generally, search tools); see Figure 16.7. The first format is to use a search tool once, at the outset, and then to view portions of the book in some linear or quasi-linear order. Another possibility involves using the index, going to a section, and then going back to the index and out to another section and continuing in this pattern. Whether this is a natural behavior evolving in the presence of online books or an artifact introduced by the fact that returning to some index or search tool may be the easiest way to get to the next section is something we don't know at this point. In thinking about these patterns of use, we may compare them to what a person might do with the book in hand, at the library shelf, or with access to the catalog, in some online format.

    16.6 Economic Behavior

    Economic Behavior of Scholars

    Given our original framework, we would like to bring together everything that we have learned to formulate an economic model of scholars' preferences among modalities of book access. We believe that, for this issue, one key variable is cost, which we characterize simply as low or high. (For the moment, let us imagine that this is the purchase price of the book, as far as the scholar is concerned.) We propose that the other key variable is whether the scholar intends to read much or read little. We believe that whether the book is cheap or expensive, if only a little of it is to be read, the scholar will prefer to get it online. Based on data available to us during the span of this project, we believe that if much of the book is to be read, the scholar will prefer to get it in paper form. If the cost is low, the scholar will buy it; and if the cost is high, the scholar would like the library to buy it so that he or she can borrow it.

    In short, what we seem to find is that users want online books for convenient access and for assured availability. They also want online books for many of the purposes discussed above. They are particularly attracted by the added functionality of annotating and hyperlinking. Nonetheless, our results indicate that when scholars want to read books at length, they still want them in paper form.

    Economic Perspective of Librarians

    Complementary to this analysis of when scholars will prefer online books, our focus group studies with librarians indicate that librarians want online books for high-demand titles (for example, instead of buying a second copy). Librarians also want online books to meet transient demand, rather than having to purchase additional copies that will go unused later. And, of course, librarians want online books for the anticipated cost savings.

    On the other hand librarians are concerned about having to pay separately for the online version of a book that they hold in paper. They are concerned about the uncertainty of preservation and migration of digital forms. They also are concerned about the appearance of unwanted and unused material in bundled packages. While bundling in general can increase both consumer and producer benefit (Shapiro and Varian, 1999), librarians are particularly concerned with the flow of cash from the institution to the publishers, and would like to have the finest possible detailed control to optimize the allocation of those funds, by avoiding materials that are less in demand.

    Speculations on Marketing Strategies

    We have tried to speculate on options for library-oriented strategies for the introduction of online books. For example, one might imagine that online versions are made available for little or no additional cost to purchasers of paper copies. One might hope to see entire collections of online material priced very attractively. At the other end of the bundling spectrum, one might see some kind of on-demand licensing, or on-demand print ordering. netLibrary offers yet another alternative by mimicking the circulation system for print books: it provides online books to individual libraries or library consortia and allows just one user at a time for each book. Other marketing strategies are more reader-oriented and less tailored to the concerns of a library. These include Questia's effort to build an online book collection the size of a college library (250,000 volumes) and to sell subscriptions to students. Another path is the hand-held device and downloadable book that is now coming to market. Generally speaking, in reader-oriented strategies, pricing of the electronic form will be unrelated to print purchase, as there is little chance that consumers can be persuaded to buy the same "book" twice.

    We speculate, but at this point can only ask, whether different strategies will emerge for different classes of print materials such as textbooks, scholarly books, and narrow-interest (sometimes called endangered) scholarly books.

    At the end of our study it appears that a number of transitional strategies are available or being developed. The leading one is the dual provision by publishers of publications in print and online. Among other virtues of the strategy, there is the possibility of electronic publication of both a backlist (the books that have been available for over a year) and a front list (the books newly published). Since publishers still need to protect ultimate paper sales, some limits may be placed on the accessibility or functionality of new titles that are presented in front-list form.

    16.7 Concluding Unscientific Postscript

    Use of online books can be tracked at a micro level, providing valuable information for authors and publishers. In fact, scholarly authors must become concerned about these data since their advancement may depend on being able to document the degree to which their works are used, as well as the degree to which they are cited.

    Having studied the provision and usage of online books for four years, we feel emboldened to make a few predictions. Due to cost, complex functionality will be reserved for books that have large sales or are developed in subsidized projects. We anticipate that endangered monographs will be available from academic or society servers, from sites like the Los Alamos Preprint site, or from the individual authors themselves. In other words, they won't be "published" as we understand it today. Many books will appear in both electronic and print versions. Commercial enterprises or academic organizations, not library experiments, will define the product that eventually comes to dominate.

    Notes

    This research has been supported by the Andrew W. Mellon Foundation and Columbia University. The views expressed herein are not necessarily those of that Foundation or the University. The first author acknowledges support from Columbia University through a contract with Tantalus Inc., SCILS, Rutgers University, the Fulbright Foundation, and Ragnar Nordlie and the Journalism, Library and Information Science Department of the Oslo University College, Norway. At Columbia the authors are indebted to many individuals in the Libraries, in Academic Information Systems, and in the academic departments for their participation, encouragement, and cooperation. Elaine Sloan, University Librarian, was critical to the formulation of the project and an insightful supporter. Walter Bourne, David Millman and Gordon Dahlquist were particularly important to the process of creating the online books and various online questionnaires. Lynn Jacobsen Rohrs was a key project participant as the analyst of the web server data. Kate Wittenberg of Columbia University Press, Leo Balk of Garland Press, and Ursula Bollini of Oxford University Press provided books from their presses and shared their insights into the publishing business and critical issues for our research. The authors thank the editors for careful revision of the manuscript.

    1. As we will hear during this conference, publishers are working hard to reduce those prices to make online books very competitive.

    2. The reader may visit the project web site to review other studies and reports.

    3. This is a qualitative relationship: our prediction is merely that the relationship would be linear and rising, but the slope in the figure is arbitrary.

    17. Revitalizing Older Published Literature: Preliminary Lessons from the Use of JSTOR[†]

    Perhaps it would be best to begin this chapter by stating explicitly what it is not. This chapter does not present a scientific study. It does not purport to present evidence that will lead the reader to a carefully argued conclusion. Rather, it is an attempt to highlight some of the questions that usage of the JSTOR database is enabling us to ask and to begin to assess whether there are answers that will prove interesting or valuable to the scholarly community. At this stage, and with the relatively small amount of data and minimal degree of analysis that has been conducted, this report should be regarded as highly preliminary.[1]

    JSTOR began as a research project sponsored by The Andrew W. Mellon Foundation at the University of Michigan. Its original objective was to test whether the digitized versions of older research journals might serve as a substitute for the paper versions, thereby offering libraries the possibility of long-term savings in shelving and archiving costs while simultaneously improving their usability. A pilot database was created that included the back runs of ten journals — five in history and five in economics — and access was made available at five liberal arts colleges and the University of Michigan.[2] By the summer of 1995, it was apparent that the concept held great promise, and JSTOR was established as an independent not-for-profit organization. JSTOR was founded to carry on the original objective stated above, but with the added charge that it develop an economic model that would allow it to become self-sustaining.

    The JSTOR Phase I database now includes the backfiles[3] of 117 journal titles (see Table 17.1) from 15 academic disciplines, a collection numbering nearly 5,000,000 pages. As of March 2000, more than 650 academic institutions from 30 countries were participants in this collaborative enterprise, with approximately 100 colleges and universities having had access to the database since early 1997. The amount of usage of the resource and its growth rate have been surprising. In 1999, over 1.4 million articles were printed from the JSTOR database, over 4 million searches were performed, and users accessed the database more than 17 million times.[4]

    Table 17.1: JSTOR Phase I Journal Titles
    • African American Review
    • The American Economic Review
    • The American Historical Review
    • American Journal of International Law
    • American Journal of Mathematics
    • American Journal of Political Science
    • American Journal of Sociology
    • American Literature
    • American Mathematical Monthly
    • The American Political Science Review
    • American Quarterly
    • American Sociological Review
    • The Annals of Applied Probability
    • Annals of Statistics
    • Annual Review of Anthropology
    • Annual Review of Ecology and Systematics
    • Annual Review of Sociology
    • Anthropology Today
    • Applied Statistics
    • Biometrika
    • Callaloo
    • The China Journal
    • Contemporary Sociology
    • Current Anthropology
    • Demography
    • Ecological Applications
    • Ecological Monographs
    • Ecology
    • Econometrica
    • The Economic Journal
    • Eighteenth-Century Studies
    • ELH
    • Ethics
    • Family Planning Perspectives
    • Harvard Journal of Asiatic Studies
    • International Family Planning Perspectives
    • International Organization
    • The Journal of American History
    • Journal of Animal Ecology
    • Journal of Applied Econometrics
    • Journal of Asian Studies
    • Journal of Black Studies
    • The Journal of Blacks in Higher Education
    • The Journal of Business
    • Journal of Ecology
    • The Journal of Economic History
    • Journal of Economic Literature
    • The Journal of Economic Perspectives
    • The Journal of Finance
    • The Journal of Financial and Quantitative Analysis
    • Journal of Health and Social Behavior
    • Journal of Higher Education
    • Journal of Industrial Economics
    • The Journal of Military History
    • The Journal of Modern History
    • Journal of Money, Credit and Banking
    • Journal of Negro Education
    • Journal of Negro History
    • The Journal of Philosophy
    • The Journal of Political Economy
    • The Journal of Politics
    • The Journal of Southern History
    • Journal of Symbolic Logic
    • Journal of the American Mathematical Society
    • Journal of the American Statistical Association
    • Journal of the History of Ideas
    • Journal of the Royal Anthropological Institute/Man
    • Journal of the Royal Statistical Society
      • Series A (Statistics in Society)
      • Series B (Statistical Methodology)
    • Mathematics of Computation
    • Mind
    • MLN
    • Monumenta Nipponica
    • Nineteenth-Century Literature
    • Noûs
    • Pacific Affairs
    • Philosophical Perspectives
    • Philosophical Quarterly
    • The Philosophical Review
    • Philosophy and Phenomenological Research
    • Philosophy and Public Affairs
    • Political Science Quarterly
    • Population and Development Review
    • Population Index
    • Population Studies
    • Population: An English Selection
    • Proceedings of the American Mathematical Society
    • Proceedings of the American Political Science Association
    • Proceedings of the Royal Anthropological Institute of Great Britain and Ireland
    • Public Opinion Quarterly
    • The Quarterly Journal of Economics
    • Renaissance Quarterly
    • Representations
    • The Review of Economic Studies
    • The Review of Economics and Statistics
    • The Review of Financial Studies
    • Reviews in American History
    • Shakespeare Quarterly
    • SIAM Journal on Applied Mathematics
    • SIAM Journal on Numerical Analysis
    • SIAM Review
    • Social Psychology Quarterly
    • Sociology of Education
    • Speculum
    • Statistical Science
    • Statistician
    • Studies in Family Planning
    • Studies in the Renaissance
    • Transactions of the American Mathematical Society
    • Transition
    • William and Mary Quarterly
    • World Politics
    • Yale French Studies

    Figure 17.1 illustrates the growth in the total number of accesses since the database was first made available.

    Figure 17.1: Total Accesses, September 1997 - December 1999

    When JSTOR was established, many people questioned the wisdom of converting journal backfiles. With comparatively little use of these materials in paper form, one could not help but wonder whether there would be sufficient interest in gaining access to the resource to warrant the substantial investments that would have to be made to create it. It is clear that it would not have been possible even to conceive of pursuing a project like JSTOR without the interest of the Mellon Foundation. Through its grant-making, the Foundation provided the financial resources necessary to establish the technological infrastructure required to create the database. Perhaps more importantly, however, the Mellon Foundation contributed staff time, most notably that of its President, William G. Bowen, to launch the enterprise.

    The investments of the Mellon Foundation have made it possible for JSTOR to pursue and begin to fulfill its important not-for-profit mission, one component of which is to enhance the accessibility of little-used and inconvenient-to-retrieve journal literature. Another primary component of JSTOR's mission is to act as a trusted archive for the material under its care. This part of JSTOR's mission is reflected in the number of articles in the database that are not being heavily used today, but which may someday be a critical component of a new line of argument for an important paper or research article.

    Early analysis of JSTOR's usage data allows us to begin to ask questions about how scholars and students use older literature in electronic form. Do scholars and students make use of the older articles? Are the materials being used more now than they were in paper format only? Can these data provide guidance about what material should be digitized? Does the usefulness of the older literature vary by academic discipline? These are some of the questions that we hope JSTOR will answer over the long run.

    17.1 Comparing JSTOR Use to the Usage of the Journals in Paper Format

    As part of the original JSTOR pilot project, an effort was made to collect circulation and usage information for the ten pilot journals. The hope was that the data would serve as a benchmark for comparison purposes. Unfortunately, it was not easy to collect reliable data. Since many of the journals were available in open stacks, it was not possible to obtain accurate circulation figures (although some circulation data were obtained from the University Reserves office at the University of Michigan Library). Instead of regular circulation data, two counting methods were employed to obtain information about use of these journals. First, slips of paper were placed in each journal volume with a request that users mark the slip when they had used the volume. Signs were also posted near the journals to inform users of the survey being conducted. Second, staff at the library pilot sites were instructed to check the shelves each business day for several months and make note of which volumes were not on the shelves. The volumes not on the shelves were counted as having been used.

    Also, only the journal volumes housed on the main library shelves at the participating pilot libraries were included in this work. Usage of the paper volumes in faculty offices or in departmental libraries was not captured. Because of the lack of a controlled environment and the relatively narrow scope of this study, one must be careful about conclusions drawn when comparing these data to site license access to JSTOR at the institutions.

    It does appear, however, that the electronic articles in JSTOR are being used much more frequently than they were used in paper form. The paper usage data were collected over varying lengths of time at the five institutions that returned data, but a minimum of three months of information was collected. There were a total of 692 uses of the ten journals at the five test sites over the course of the entire survey period. Usage of the same journals in JSTOR at the same five sites for the months of September, October and November of 1999 yields a total of 7,696 article views. In addition, although there is presumably substantial overlap between articles viewed and those printed, 4,885 articles were printed — a total of 12,581 views and prints during the three-month period. When compared to the 692 uses in the benchmarking survey, it would seem that the convenience of electronic access is facilitating greatly increased use of the material.

    Another way to assess whether usage of the older journals in electronic form is greater than in paper is by evaluating the growth in usage. As Andrew Odlyzko points out in Chapter 2, growth rates may matter more than absolute numbers. It is rather unlikely that the usage of older articles in paper form was growing at measurable rates. That contrasts markedly with usage of JSTOR (as well as of other resources discussed in this book). Aggregate use of the JSTOR database has grown dramatically in the period since 1997, when it first became available. Table 17.2 below shows the total accesses to the database by institution type; a brief computational check follows the table.[5] Total accesses to all content in the database increased 4.4 times from 1997 to 1998 and 3 times from 1998 to 1999.

    Table 17.2: Total accesses to the JSTOR database by institution type, 1997-1999
    JSTOR Class Accesses 1997 Accesses 1998 1997-1998 Growth Factor Accesses 1999 1998-1999 Growth Factor
    Very Large 817,893 3,291,648 4.0 8,550,945 2.6
    Large 160,700 785,244 4.9 2,766,100 3.5
    Medium 110,254 637,950 5.8 2,468,666 3.9
    Small 110,312 490,854 4.4 1,323,894 2.7
    Very Small 43,754 207,170 4.7 704,870 3.4
    Totals 1,242,913 5,412,846 4.4 15,814,475 2.9
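
    As a simple check, the growth factors in Table 17.2 can be reproduced directly from the access counts. The sketch below uses the Totals row of the table; rounding accounts for the published figures of 4.4 and 2.9:

```python
# Accesses from the Totals row of Table 17.2.
accesses = {1997: 1_242_913, 1998: 5_412_846, 1999: 15_814_475}

for year in (1998, 1999):
    factor = accesses[year] / accesses[year - 1]
    print(f"{year - 1}-{year} growth factor: {factor:.1f}")
# Prints 4.4 and 2.9, matching the Totals row of the table.
```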

    Because some of the growth in aggregate usage of JSTOR is a result of new institutions signing up for the database during this time period, we have compiled usage figures at institutions that had JSTOR installed prior to April 1, 1997. Aggregate accesses at these institutions increased by a factor of 3.4 from 1997 to 1998 and by a factor of 2.5 from 1998 to 1999. The cumulative growth of usage over the three-year time period at existing sites is 740%!

    As one contemplates this impressive growth in JSTOR usage, it is perhaps valuable to note that JSTOR is available "for free" to end users. Libraries have paid participation site license fees that allow authorized users (faculty, staff, and students) to make unlimited use of the resource. For the most part, authentication is handled by IP address, thereby making the authentication process virtually invisible. This unfettered access contributes to the rapid growth in use of the resource; it is consistent with the kind of growth one is seeing in other resources available on the World Wide Web. This picture might be very different indeed if JSTOR were charging either users or libraries based on usage.

    17.2 The Interdisciplinary Appeal of JSTOR

    An additional variable that is likely to be a contributing factor to the increasing use of JSTOR is the addition of new content. Since 1997 JSTOR has been digitizing new journals and making them available to participating institutions. Content in new academic disciplines introduces new scholars and students to the resource. Additional content in existing fields broadens the appeal of the resource within that discipline.

    As the resource has grown, it is evident that its cross-title and interdisciplinary appeal has grown as well. A pull from the search logs for a recent week of JSTOR use shows that approximately 68,000 searches were conducted. Of these, just under 62,000 (91%) specified more than one title. Because JSTOR offers the option to search by cluster (pre-defined discipline-specific collections[6]), it is convenient for users to search across journals in a single discipline. Approximately 58,000 searches specified clusters. Of those cluster searches, 69% specified more than one cluster. This is quite significant because the JSTOR interface does not offer an option to select all clusters. Judging from this behavior, the ability to search across disciplines is important to users.
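
    A rough sketch of the kind of log tabulation behind these figures is shown below; the record layout, field names, and miniature data set are hypothetical and do not reflect JSTOR's actual log format:

```python
# Hypothetical search-log records: each search lists the journal titles and
# clusters (discipline groups) the user selected.
searches = [
    {"titles": ["Econometrica", "The Economic Journal"], "clusters": ["Economics"]},
    {"titles": ["Ecology"], "clusters": ["Ecology", "Statistics"]},
    {"titles": ["Mind", "Ethics"], "clusters": ["Philosophy"]},
]

total = len(searches)
multi_title = sum(1 for s in searches if len(s["titles"]) > 1)
cluster_searches = [s for s in searches if s["clusters"]]
multi_cluster = sum(1 for s in cluster_searches if len(s["clusters"]) > 1)

print(f"{multi_title / total:.0%} of searches specified more than one title")
print(f"{multi_cluster / len(cluster_searches):.0%} of cluster searches "
      f"specified more than one cluster")
```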

    17.3 Nature and Distribution of Use

    There are a total of 831,087 articles in the JSTOR database. Our use of the term "article" may be a bit misleading in that it refers to all items that are indexed as items for retrieval. Full-length articles, of which there are presently 356,978, are a subset of this total. Other "articles" are items like book reviews, letters to the editor, membership lists, and the like.

    The distribution of the use of JSTOR is interesting because it speaks to the extent to which JSTOR functions as an archive. Many libraries, particularly research and academic libraries, have a mission not only to collect material that is likely to be used today, but also to collect and care for information that may be valuable in the future. JSTOR has surprised us in the extent and degree to which it has been used, but there is something to be learned also from what has not been used.

    After three years, 430,429 different articles have been viewed, representing 51.8% of all articles in the database. (Many of these articles have been viewed multiple times; the figure above relates to whether the article has ever been viewed.) 248,683 articles have been printed, representing 29.9% of all articles.

    Figure 17.2: The number of article views accounted for by the top n articles

    The complement to the statement above is that nearly half of the articles in the JSTOR database have never been viewed or printed. Will they ever be used? We do not know. Further, we find the distribution of use among the articles to be rather concentrated. Figure 17.2 presents the number of article views accounted for by the top n articles. For example, the top 100 articles viewed represent 112,072 views, or 2% of the total article views. The top 10,000 most viewed articles were viewed 1,987,982 times, or 36% of the total. And the top 100,000 most viewed articles were viewed 4,613,610 times, or 82% of the total. This last figure means that 12% of the articles accounted for 82% of the views. This high concentration may be somewhat misleading because our count of total "articles," as mentioned before, includes all items in the database, such as reviews and front and back matter, not just full-length articles. Since it is natural that many of these items may never be viewed or cited, but they are included in JSTOR to present a complete and comprehensive digital version of the originally published journal, this level of concentration probably should be expected. In any event, it is not a concern to JSTOR, since its mission is to serve as an archive and not to base decisions about preservation of content on the amount of use of the various articles in the database.
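
    The concentration figures quoted above (the shares of total views accounted for by the top 100, top 10,000, and top 100,000 articles) can be computed from per-article view counts, as sketched below; the miniature data set is hypothetical and stands in for the 831,087-item database:

```python
# Hypothetical per-article view counts (article id -> number of views).
views = {"a1": 500, "a2": 120, "a3": 60, "a4": 10, "a5": 0, "a6": 0}

def share_of_top(views, n):
    """Share of all views accounted for by the n most-viewed articles."""
    ranked = sorted(views.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total

for n in (1, 3, 5):
    print(f"top {n} articles account for {share_of_top(views, n):.0%} of views")
```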

    17.4 Selection Criteria

    Since it is generally accepted that it will not be possible to digitize all journals that have ever been published, an important question for any digitization project is how to select the retrospective content to be made available electronically. In JSTOR a variety of factors are taken into consideration in the selection process, including surveys of faculty and library professionals in the field in question, library subscription levels, citation impact factor measures, and length of the run, among other things.

    Looking at JSTOR usage at the article level, it is evident that citations should not be used as the sole factor in determining what content should be digitized. To test whether citation frequency correlates with database usage, we conducted a preliminary analysis of the use of particular articles in JSTOR. First, we identified the ten most frequently used articles for each of the 117 journals in the database. We then looked up their citation data using ISI Social Science Citations. What we found was that usage and citation data were not correlated. For the purpose of illustrating the point, Table 17.3 displays an abbreviated version of the data we collected. Shown below are the top three articles in terms of JSTOR use since 1997 (through March 20, 2000) for three economics titles. The number of citations to each article in the period from 1997 to 1999 is displayed,[7] as is the average number of citations per year to each article for the period from 1972 through 1999 (a brief sketch of that calculation follows the table).

    Table 17.3: JSTOR Usage — Economics Cluster
    Journal Title Number of Times Cited Average cites/year JSTOR views Year of Publication
    American Economic Review
    Article 1 79 24.1 1,670 1968
    Article 2 77 15.7 1,232 1945
    Article 3 181 35.9 1,316 1981
    Quarterly Journal of Economics
    Article 1 175 32.4 2,426 1970
    Article 2 104 26.6 2,400 1992
    Article 3 216 50.9 1,583 1991
    Journal of Political Economy
    Article 1 4 0.5 1,895 1973
    Article 2 8 21.1 1,480 1990
    Article 3 93 17.2 1,258 1983
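
    The "average cites/year" column was computed as described in note 7 below. A minimal sketch of that calculation follows; the citation counts are made up, and the exact counting of years is our reading of the note rather than a reconstruction of the original worksheet:

```python
def average_cites_per_year(cites_by_year: dict, pub_year: int,
                           first_data_year: int = 1972,
                           last_year: int = 1999) -> float:
    """Average citations per year from publication (or 1972, whichever is
    later) through 1999, per note 7."""
    start = max(pub_year, first_data_year)
    total = sum(c for y, c in cites_by_year.items() if start <= y <= last_year)
    years = last_year - start + 1
    return total / years

# Hypothetical article published in 1990, with invented citation counts.
cites = {1991: 30, 1995: 25, 1998: 12}
print(round(average_cites_per_year(cites, pub_year=1990), 1))  # 6.7
```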

    Citations do not appear to provide anything like a complete picture of the potential usefulness of a journal article. The most notable example of this point is the number one article for the Journal of Political Economy. Even though this 1973 article has rarely been cited (4 times between 1997 and 1999, and an average of only 0.5 times per year between 1972 and 1999), it has emerged as the most often used article from that journal. This article has been viewed 1,895 times and printed 1,402 times during the period that it has been accessible in JSTOR. What this example reveals is not only that citation data may not be the most useful measure for determining what should be digitized, but also that citations focus on what might be called the "reference" or "documentation" value of an article, not its usefulness defined more broadly. Articles with four citations may end up, for a variety of reasons, being the most used. Alternatively, highly cited articles may not be used very often at all. This is a factor to keep in mind when selecting content for digitization initiatives.

    17.5 Age of Useful Articles

    Table 17.4 shows summary data for the most frequently used articles in each of the 15 JSTOR clusters. The purpose of this assessment was to take an initial snapshot of the relative value of older literature in each of the JSTOR fields. The chart was assembled by first collecting the number of article views from the JSTOR database, ranking the articles from most viewed to least viewed, and, as in the analysis above, pulling out the ten most frequently used articles for each title. We know the year of publication for each article, so we were able to calculate the average age of the top ten articles for each title. We then averaged these figures across each discipline to provide an estimate of the average age of the most-used articles in each field. When evaluated in this way, it was apparent that some older articles have truly lasting value, that in most of the JSTOR fields older articles were well represented among the "top ten," and that the value of older material seems to vary with the discipline.
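
    The procedure just described can be sketched in a few lines of code; the records, field layout, cutoff parameters, and reference year below are hypothetical and stand in for the actual JSTOR data:

```python
from collections import defaultdict

# Hypothetical records: (journal title, discipline, publication year, views).
articles = [
    ("Journal A", "Economics", 1968, 1670),
    ("Journal A", "Economics", 1981, 1316),
    ("Journal A", "Economics", 1994, 900),
    ("Journal B", "Economics", 1970, 2426),
    ("Journal B", "Economics", 1992, 2400),
]

TOP_N = 2            # the chapter uses the top 10 articles per title
REFERENCE_YEAR = 2000

# 1. Rank the articles within each title by views and keep the top N.
by_title = defaultdict(list)
for title, discipline, year, views in articles:
    by_title[(title, discipline)].append((views, year))

ages_by_discipline = defaultdict(list)
for (title, discipline), rows in by_title.items():
    top = sorted(rows, reverse=True)[:TOP_N]
    avg_age = sum(REFERENCE_YEAR - year for _, year in top) / len(top)
    ages_by_discipline[discipline].append(avg_age)

# 2. Average the per-title figures across each discipline.
for discipline, ages in ages_by_discipline.items():
    print(discipline, round(sum(ages) / len(ages), 1))
```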

    Again, to use the field of economics as an example, a surprising number of older articles have emerged as the most heavily used. The average age of the articles in the top ten most printed and viewed articles in the economics cluster is 13 years. This is rather surprising, as our expectation before starting JSTOR would have been that usage of economics journals would be much more focused on more recent issues.

    Table 17.4: Summary data for the most frequently used articles in each of the 15 JSTOR clusters.
    Cluster Number of Titles Num. of Views from Top 10 Share of Top 10 Views Avg. First Year of Publication Avg. Most Recent JSTOR Year Avg. Age in Years of Top 10 Articles
    African American Studies 7 16,637 4% 1959 1996 3
    Anthropology 6 12,301 3% 1954 1994 4
    Asian Studies 4 5,433 1% 1936 1994 11
    Ecology 6 19,293 5% 1943 1996 11
    Economics 13 87,711 22% 1936 1994 13
    Education 4 13,153 3% 1946 1995 11
    Finance 5 13,201 3% 1958 1995 10
    History 15 58,365 15% 1934 1995 12
    Literature 11 23,992 6% 1946 1995 7
    Mathematics 11 7,344 2% 1932 1994 32
    Philosophy 10 16,538 4% 1931 1994 16
    Political Science 9 52,201 13% 1933 1995 8
    Population/Demography 8 15,808 4% 1965 1995 5
    Sociology 9 41,387 11% 1945 1994 6
    Statistics 11 8,480 2% 1936 1994 9

    An even more dramatic example is Mathematics, where the average age of the most used articles in the field is 32 years! This result is consistent with what mathematicians have told us about their field; that is, that older mathematics literature remains valuable. (Mathematicians are some of the most enthusiastic supporters of JSTOR and regularly urge us to include more mathematics titles.) However, it is worth pointing out that usage of the mathematics cluster in JSTOR has lagged behind usage in other fields. With the long runs of its 11 journals, mathematics has the highest number of pages of any cluster in JSTOR, and yet usage of the mathematics cluster represents just 3.3% of total usage. One reason for making this point here is that there simply is not enough data to make too much of the average age of the most used articles in mathematics. With a small number of total accesses for the field, the actions of a few people can sway the data significantly. As mentioned earlier, one has to be careful about drawing conclusions from the data.

    Nevertheless, the apparent contradiction between the qualitative value of JSTOR to mathematicians and the usage of the mathematics journals in JSTOR dramatically illustrates an extremely important point. One must define clearly what one means by "value". Usage does not necessarily equate to value in the research sense. Older articles may be absolutely vital to the continuation of high-quality scholarship and research in the field, but that may not lead to extensive use. Increasingly, one hears that libraries are planning to use electronic usage data to help make subscription decisions. If relied upon exclusively, this could prove to be a very dangerous tool, making it more difficult for lesser-used but valuable research journals to survive. Other measures, like citation data, need to be incorporated as well. The nature of these data will also change with the availability of electronic resources. One wonders, for example, whether the number of citations to older articles in JSTOR will increase as the older articles become more conveniently accessible. This possibility is worth monitoring, but with the understanding that it will take years before changes in scholars' behavior manifest themselves in the citation data. Understanding the nature of a field and the way that research materials are used in the field is essential before making selection and cancellation decisions. It is our hope that, over the long run, JSTOR can contribute to this kind of understanding.

    17.6 Conclusion

    This paper provides a brief overview of preliminary information emerging from JSTOR usage data. As JSTOR usage increases, more interesting questions about the way that retrospective electronic collections are used can and should be asked and investigated. Although it is still too early to draw conclusions, and much more data will need to be collected, evidence points to preliminary hypotheses in five primary areas.

    1. Electronic access seems to have increased the use of older materials at JSTOR participating sites.

    2. The interdisciplinary nature of JSTOR seems to be valued by researchers and students.

    3. Citation data alone is not a good predictor of electronic usage, and probably should not be used to make digitization decisions for retrospective content.

    4. Older literature seems to remain valuable in many fields.

    5. Care should be taken to ensure that there is a clear understanding of the definition of "value" for research articles. Judging by the nature of the JSTOR articles that are most used, valuable research articles are not always those that push forward the research and intellectual understanding of an academic discipline; they may very well be "popular" articles used in larger classes. "Value" needs to be clearly defined as libraries consider acquisition and cancellation decisions for electronic content.

    What this preliminary evaluation of JSTOR usage does indicate is that electronic databases are leading us into new territory. Their availability impacts the use of scholarly resources in profound ways. It should come as no surprise that improving the convenience of access to an article increases the likelihood and frequency of use of that article. But does that impact the inherent value of the article? In evaluating usage of these materials, we will have to take a long view, as we cannot rely on old metrics, methods, and intuition to guide our sense of value. It will take time before we reach a new level of understanding — a kind of new equilibrium — of the relevant measures that will enable us to make useful comparisons between and among various resources.

    Notes

    This paper was first presented at the Pricing Electronic Access to Knowledge (PEAK) conference entitled "Economics and Usage of Digital Library Collections," held at the University of Michigan in Ann Arbor, Michigan on March 23-24, 2000.

    1. Participating JSTOR libraries and publishers have requested that JSTOR exercise care when presenting and distributing JSTOR usage data. We therefore aggregate these data whenever possible and do not identify the usage at individual sites, for individual publishers, or for individual articles by title.

    2. The original test site libraries were Bryn Mawr College, Swarthmore College, Haverford College, Denison University, and Williams College, in addition to the University of Michigan.

    3. JSTOR digitizes and makes accessible participating journals starting with the first volume published. In order to protect publishers' subscription revenue stream, JSTOR does not include current issues, but offers access up to a "moving wall" negotiated with each publisher.

    4. In assembling usage statistics, JSTOR counts significant accesses, not server hits. Accesses include actions such as viewing electronic tables of contents or citation data, viewing an article, printing an article, and executing a search.

    5. JSTOR uses the Carnegie Classes of U.S. Institutions of Higher Education to place colleges and universities into one of five classes ranging from Very Large to Very Small. For a description of our methodology, see http://www.jstor.org/about/us.html#classification.

    6. JSTOR's initial database offering, called Arts & Sciences I, includes journals from 15 academic disciplines; the journals are displayed in the interface in these discipline groups, which are sometimes referred to as clusters.

    7. Citation data were determined using the Dialog service to access ISI Social Science Citations. Individual articles were located, and citations to these articles were analyzed by year. Citation counts for 1997-1999 inclusive were determined by simple addition. The average number of cites per year was calculated by totaling citations from the year of publication or 1972, whichever is later, through the end of 1999, and dividing by the number of years from that starting year through 1999. 1972 is the earliest year of data provided by ISI in the version of the database we consulted.

    18. Measuring the Impact of an Electronic Journal Collection on Library Costs: A Framework and Preliminary Observations[†]

    18.1 Introduction

    Much has been written about the economic impact of electronic publishing on publishers. There also has been considerable discussion of the cost of subscribing to electronic publications. This paper addresses another important organizational impact triggered by the migration to electronic journals that has heretofore received little attention in the literature: the changes in the library's operational costs associated with shifts in staffing, resources, materials, space and equipment.

    In 1998 the W.W. Hagerty Library of Drexel University made rapid migration to an electronic journal collection a key component of its strategic plan. If a journal is available electronically, only the electronic version is purchased whenever possible.[1] The sole exceptions are (1) when the electronic journal lacks an important feature of the print version (e.g., equivalent visuals) and (2) when the journal is part of the browsing collection (e.g., Scientific American and Newsweek). With the year 2000 renewals, Drexel's journal collection consisted of 800 print-only subscriptions and 5,000 electronic journals; in 2001 the library will subscribe to about 300 print-only journals and over 6,000 electronic journals. A dramatic change in staff workload is the most immediate impact on library operations, but space, equipment, and even supply needs are affected. Some aspects of this transformation were obvious and predictable; others were not. This paper describes the changes experienced so far in the Drexel Library.

    A common assumption is that converting library journals to digital format will ultimately improve library service and lower costs, but this has yet to be proven. Understanding the total costs associated with the library model for delivering digital information has become a requirement for library survival, since in the digital world, as opposed to print, the library has many viable competitors. Our goal is to develop a framework for assessing the shifts in personnel and costs that can be used for planning and budgeting at Drexel and that can provide guidance to other academic libraries that are not yet so far down this path.

    18.2 Background

    Commonly, journal cost analyses use subscription costs and ignore the operational costs associated with a journal collection. For example, White and Crawford (1998) undertook a cost-benefit analysis to determine whether acquiring Business Periodicals Online (BPO), a full-text database, was more cost-effective for supplying articles than obtaining articles in the database through interlibrary loan (ILL). They found that the out-of-pocket costs (ILL transaction costs versus the BPO subscription costs) were similar, but the level of service was much greater with BPO.

    Hawbaker and Wagner (1996) also compute only subscription costs when comparing the costs of print subscriptions to online access of full-text. They conclude that, for a full-text business database, the University of the Pacific's library can offer more than twice as many journals for a 15 percent increase in expenditures.

    The Electronic Libraries Programme (eLib, 2001), funded by the Joint Information Systems Committee (JISC) in the United Kingdom, consists of a series of major library projects investigating issues of digital library implementation and integration. The guidelines for evaluation of the eLib projects call for "modelling of functional, cost, organisational and technical variables" as one of the desired components (Kelleher et al., 1996). Pricing models for electronic journal subscriptions, licensing agreements, and infrastructure requirements to provide access are themes explored in these projects. Each project tends to be fairly focused in terms of the range of digital content and services offered or the topic addressed. Many deal with organizational and management models that can serve as the basis for scaling up initiatives.

    JSTOR (1998), which builds journal backfiles, does address building-related costs. One of the JSTOR objectives is "To reduce long-term capital and operating costs of libraries associated with the storage and care of journal collections." By guaranteeing online availability of backfiles, JSTOR not only makes these files more accessible, but also allows libraries to discard old journal runs without decreasing service to their users.

    In an analysis completed several years ago, Tenopir and King (2000) report the average cost of acquiring and maintaining a print journal collection to be $71 per title in academic libraries and $81 in special libraries. These are 1998 figures adjusted for inflation.

    Odlyzko (1999) also focuses on non-subscription costs. He points out additional factors to consider in evaluating the impact of journal growth on libraries:

    Journal subscription costs are only one part of the scholarly information system...internal operating costs of research libraries are at least twice as high as their acquisition budgets. Thus for every article that brings in $4,000 in revenues to publishers, libraries in aggregate spend at least $8,000 on ordering, cataloging, shelving, and checking out material, as well as on reference help. The scholarly journal crisis is really a library cost crisis. If publishers suddenly started to give away their print material for free, the growth of the literature would in a few years bring us back to a crisis situation.

    Odlyzko's analysis shows that the library's non-subscription (i.e., operational) costs are on average double its subscription costs. His figures are derived from the Association of Research Libraries (ARL) statistics (Association of Research Libraries, 2000b). His is a macro-level measurement that does not take into account, for example, the different processing costs for books and journals, or library costs unrelated to the collections, which might cause the non-subscription figure to be over-estimated. On the other hand, ARL statistics do not report the considerable costs associated with constructing and maintaining library buildings, a factor which, if added to Odlyzko's number, would lead to a higher estimate of non-subscription costs. Even if off by a factor of two, Odlyzko's estimate is astounding to consider, and it points out the importance of looking at how these operational costs shift in the transition to an electronic model.

    18.3 Development of Drexel's Electronic Journal Collection

    In the spring of 1998 only one full-text journal collection was accessible via the Drexel Library's web site, and database access was limited to text-based systems. That summer the web site was completely re-designed and by the fall more than 20 databases and several collections of full-text journals were available. The total number of print journals was 1,700 titles at the time. For 1999 and 2000 the number of print-only journal subscriptions was reduced to 1,100 and about 800 respectively. Some of the reductions were made because we had subscribed to an electronic counterpart; the other journals were not renewed primarily on the basis of low use. During the fall of 1998 through 1999, and into 2000, electronic subscriptions were sought out aggressively and added as they became available, bringing the fall 2000 total to over 6,000 unique electronic titles. Print-only renewals for 2001 were about 300.

    The selection/ordering/acquiring process is far more complex for electronic journals than for print journals. The methods for purchase often include buying packages of titles or services, many with value-added features. Reviewing and negotiating proper terms for e-journal licenses is a major aspect of the complexity. Additionally, new variables must be considered (e.g., graphics, linking options, comparability with the corresponding print publication, web interface functionality, and other value-added features).

    When the publisher's policy is to require purchase of the print journal in order to obtain access to the electronic journal, we attempt to negotiate a discount for the e-journal only. This has met with limited success so far, but does have the advantage of educating publishers about our needs. Because of the added cost of receiving, processing, binding and storing the print issues, we do not retain print journals even if we have paid for them. Some of these journals are never shipped by the publishers; others are retained by our serials vendor for their back issue file; and still others arrive and are given to our back issue jobber.

    Drexel's approach to back files of print journals will seem cavalier, if not totally irresponsible, to those concerned with the archival role of libraries. Our position is that archival storage in most subject areas is not part of the mission of the Drexel Library.[2] On a national — even international — basis, archiving of old, little-used journals would be much more cost effective if done centrally or in a few places for redundancy. This is true of both electronic and print formats. We are willing to make the leap of faith that this will happen, and are ready to pay the cost of access to the archived materials when they are needed. There are numerous well-qualified national and international organizations addressing this issue, including the Research Libraries Group and OCLC.

    18.4 Impact on Library Staffing and Other Costs

    Here I discuss changes in each area of the library's operations with particular attention to the changes in staffing patterns and shifts in costs. Table 18.1 summarizes these operational effects. No functional area of the library has been left untouched. I describe the changes in each library department below.

    Table 18.1: The Transition from Print to Electronic Journals: Changes in Staffing and Other Costs.
    Function Activity Electronic Format Print Format Net Impact
    Infrastructure Systems & Space campus network completely upgraded in last 2 years ↑ increased capital costs
    computer hardware (servers and workstations) 100% replacement/upgrade of library computers ↑ increased equipment costs
    computer systems maintenance installing software, imaging (1.0 FTE) ↑ increased staffing
    hardware, software maintenance service contracts ↑ increased costs
    setting up access new activity, requires troubleshooting ↑ increased staffing
    software purchase & development new activity to manage more complex process ↑ increased staffing
    printing increased activity ↑ increased costs and revenue
    space utilization content stored remotely fewer items added/extensive collection weeding ↓ reduced space needs
    Administration/Management negotiating contracts new activity ↑ increased staffing
    managing the change closer oversight required ↑ increased staffing
    attention to decisions increased number of variables ↑ increased staffing
    budgeting greatly increased tracking and planning time ↑ increased staffing
    subscription fees titles added titles reduced ↑ increased costs
    Technical Services print journal check-in fewer items to check in ↓ reduced staffing
    acquisitions requires higher skill level fewer items to purchase ↑ increased staffing
    claiming URL maintenance fewer items to claim ? net impact unclear
    bindery staffing effort and fees fewer items; costs down ↓ reduced staffing & costs
    cataloging new items significant increase in number of items significant decrease in number of items ↑ increased staffing
    OCLC transactions increased OCLC charges decreased OCLC charges ↑ increased costs overall
    catalog/e-journal list maintenance significant level of new effort expected decrease over time ↑ increased staffing
    Circulation/Access reshelving fewer items to shelve ↓ reduced staffing
    collecting use data complex, requires higher skill level to organize fewer items to count, takes less effort ↑ increased staffing
    stack maintenance fewer items out of place ↓ reduced staffing
    user photocopying fewer copies made; down 20% ↓ reduced use & revenue
    Reserve article file maintenance fewer articles on reserve ↓ reduced staffing
    article checkout fewer items checked out ↓ reduced staffing
    maintaining e-reserves requires equipment, higher skill level ↑ increased staffing
    Document Delivery faculty copy service copies from e-journals copies from print journals ? net impact unclear
    interlibrary loan-borrowing slight decline in activity ↓ reduced vendor charges
    interlibrary loan-lending slight decline in fees - all services expected to decline
    Information Services references at desk fewer but some longer transactions fewer transactions; down 15% ? net impact unclear
    instruction/promotion increased need - expect increase
    preparing documentation increased number of items greater level of review ↑ increased staffing
    journal selection more detailed evaluation process ↑ increased staffing
    Infrastructure

    Systems. While space is the most important requirement for the print format, networks, computer hardware and software, and systems staff are required to provide access to electronic resources. These resources are rapidly becoming key components of a well-functioning operation in all academic institutions, as they are essential for so many other reasons. None of the library's systems is used exclusively for electronic journals, since we provide access to many other applications. The webmaster easily spent 30 percent of his time on electronic journal access during the start-up period. He maintains the entire library web site, which initially included over 200 static HTML pages listing e-journals by title and by subject. When it became clear that maintaining this many continually changing static HTML pages was a major burden, the webmaster developed an e-journal maintenance database using MySQL and Perl scripts to manage the lists and deliver them to the web dynamically.
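
    As a rough illustration of the dynamic approach, a by-subject title list can be generated from the maintenance database on each request instead of being kept in hundreds of static pages. Python and SQLite stand in here purely for the sketch (the library's actual system used MySQL and Perl), and the table layout, journal names, and URLs are invented:

```python
import sqlite3

# A toy stand-in for the e-journal maintenance database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ejournal (title TEXT, subject TEXT, url TEXT)")
db.executemany("INSERT INTO ejournal VALUES (?, ?, ?)", [
    ("Journal of Applied Widgetry", "Engineering", "http://example.org/jaw"),
    ("Widget Economics Review", "Business", "http://example.org/wer"),
])

def subject_list_html(subject: str) -> str:
    """Build the by-subject title list on demand instead of maintaining
    a static HTML page for every subject."""
    rows = db.execute(
        "SELECT title, url FROM ejournal WHERE subject = ? ORDER BY title",
        (subject,),
    ).fetchall()
    items = "\n".join(f'<li><a href="{url}">{title}</a></li>' for title, url in rows)
    return f"<h2>{subject} e-journals</h2>\n<ul>\n{items}\n</ul>"

print(subject_list_html("Engineering"))
```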

    Space Utilization. The chief impact of print journals on infrastructure is in the physical space for growth of the collection over time. The transition to electronic journals essentially eliminates space concerns—no more trimming the collection, converting it to microfilm, or moving it to a remote location to make space for new volumes. Eventually, because of retrospective conversion efforts like JSTOR, we will be able to reclaim journal storage space for other purposes. The cost savings, both on a capital and annual basis, are considerable. Estimating $100 per square foot (Fox, 1999), the minimum cost for library buildings in large urban centers, the 20,000-square-foot space currently occupied by the Drexel journal stacks would cost $2 million to construct. Estimating annual maintenance costs at $12 per square foot, the cost of maintaining the space occupied by the library's journal collection is approximately $240,000 per year.
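
    The space figures work out as follows; this is simply a restatement of the numbers in the paragraph above:

```python
journal_stack_sqft = 20_000      # space occupied by the Drexel journal stacks
construction_per_sqft = 100      # minimum construction cost (Fox, 1999), $ per sq ft
maintenance_per_sqft = 12        # estimated annual maintenance, $ per sq ft

print(f"Construction cost: ${journal_stack_sqft * construction_per_sqft:,}")      # $2,000,000
print(f"Annual maintenance: ${journal_stack_sqft * maintenance_per_sqft:,}")      # $240,000
```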

    Subscription Costs

    Budget allocations reflect the decision to shift from print to electronic subscriptions. Purchase decisions are based on two processes. First, we undertake a major initiative to analyze our current print and electronic holdings prior to renewal of subscriptions for the coming year. Second, throughout the year we invest significant staff resources to keep current with all e-journal offerings from vendors, publishers and consortia within the scope of our collection, and we initiate negotiations for pricing and packages tailored to our needs. In particular, we seek out electronic equivalents of current print holdings and replace the print with the electronic version of the title unless the title meets the exception criteria. Eventually, we expect to have a browsing collection of fewer than 100 titles.

    As a result of these efforts Drexel's total journal subscription costs will be approximately $636,000 for 2001. See Table 18.2 for the breakdown. Aggregator subscription costs are difficult to calculate since these resources are part database, part electronic journals. With a "best guess" allocation of the cost of these services, we are spending or expect to spend a total of $600,000 for electronic journals.

    Table 18.2: Subscription Costs FY2001
    Category # of Titles Amount
    Print only subscriptions 300 $36,000
    Electronic subscriptions* 2800 $550,000
    Aggregator/databases with full-text content** 3500 $45,000
    Total E-Journals (Unique titles) 6300 $595,000
    NOTE: *Approximately 200 of these titles are "bundled" and require print plus electronic subscriptions
    **These are services such as ProQuest and Wilson Select which are part database, part journal full-text. Half the cost of these services has been allocated to electronic subscriptions. The count is of unique titles.

    On a raw per-title basis the e-journal subscription dollar has superior purchasing power when the aggregators' titles are included. Our print-only journal subscriptions now cost an average of $120 per title while e-journals are $95 per title. This difference is far greater when one considers that nearly all the electronic journals come, even when a subscription is first entered, with several years of backfiles. In addition, the electronic subscriptions include many titles that cost several thousand dollars in print. The 300 print journals consist mainly of humanities and social science publications, along with some popular titles, all of which are low cost historically. The increased value of electronic journals is even more evident when coupled with use statistics, since our figures show that electronic journals are used more heavily than their print counterparts (see Montgomery and Sparks, 2000).
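
    The per-title comparison follows directly from the title counts and spending figures given above; the sketch below uses the roughly $600,000 electronic total mentioned earlier and the counts from Table 18.2:

```python
print_titles, print_cost = 300, 36_000     # print-only subscriptions (Table 18.2)
e_titles, e_cost = 6_300, 600_000          # unique e-journal titles, estimated total spend

print(f"Print-only: ${print_cost / print_titles:.0f} per title")   # about $120
print(f"E-journal:  ${e_cost / e_titles:.0f} per title")           # about $95
```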

    Administration/Management

    Academic library directors have always paid considerable attention to journal subscriptions. Journal costs usually take most of the materials budget in science and technology libraries. Faculty often have strong feelings about particular titles which they do not hesitate to make known. Traditionally, the decision to subscribe to a new journal has required careful consideration because of the implication that the subscription is a long term commitment. For the last two decades, as prices escalated so dramatically, directors became increasingly involved in both advocating for additional funding to pay for journals and overseeing the time-consuming annual journal evaluation processes and cost-cutting measures. Electronic journals raise new issues that require the director's involvement to an even greater extent. Activities that are new or escalated for a director who makes a major commitment to electronic journals include:

    • communicating and obtaining institutional funding and support,

    • joining consortia and other "buying clubs,"

    • negotiating and reviewing contracts,

    • determining and revising strategies for e-resource acquisition,

    • building a library staff with the appropriate skills, and

    • managing the change in budget allocation.

    Drexel created a new position, Electronic Resources Librarian (ERL), to provide a focal point for integrated development of all electronic resources. This position crosses the traditional departmental functions of management, systems, technical services and reference. The person in this position shares the responsibility of keeping up-to-date on the availability of new electronic resources with the Information Services (IS) librarians who do collection development. The ERL initiates contacts with vendors to negotiate favorable pricing and packaging and arranges trials for each new service considered for purchase. She also reviews licenses and contracts and negotiates appropriate amendments and corrections to these documents. For example, one of our goals is to provide remote access to content we make available to our users; some contracts do not allow this. The ERL also interacts with consortia for purchase of electronic resources and evaluates the cost/benefits of going with a particular group offer. Once the purchase decision is made, IP information is communicated to vendors and content changes are made on our web site. The ERL collaborates closely with the webmaster in designing and populating our e-journal database. Finally, gathering and organizing use statistics for electronic resources is a major aspect of her responsibilities. A recent ARL SPEC Kit (Bleiler and Plum, 1999) describes these activities and the various ways large academic research libraries have structured themselves to deal with them.

    It is always more difficult and time-consuming to manage change than to maintain the status quo. The amount of time spent managing and overseeing this transition is substantial, when one includes major ongoing efforts to restructure workflow and reorganize staff to respond to the migration to electronic journals.

    Technical Services

    In the Technical Services department, the transition to e-journals has had a direct impact on the day-to-day work of each staff member. Changes in workflow and procedures are dramatic, with very large shifts in costs. It is clear that the significant reduction in print titles has directly decreased workload related to the print format. Less time is needed to check in print issues, claim non-arrivals, replace missing pages, and prepare and receive bindery shipments. Also, direct costs for cataloging new print titles and maintaining existing MARC records (OCLC charges) have been reduced. Bindery fees are also reduced accordingly.

    Offsetting the decrease in activity levels and costs related to the print format is a large increased workload for both the serials acquisitions and cataloging functions related to providing access to electronic journals. Updating the e-journal maintenance database that now creates our e-journal lists is a major new task. The e-journal collection is much more volatile than a print collection: links break, coverage changes, and sometimes the electronic journals themselves are available through a new distributor. An advantage of electronic distribution that creates extra work is that we are not bound to calendar-year-only subscriptions, so journals are added continuously and sometimes discontinued during the year.

    Another activity that has greatly affected Technical Services is an expanded review process for journal renewals that includes the IS librarians, who represent the various colleges in the University. During the past two years we have evaluated every journal title — print and electronic — before renewal. The coordination and tracking of the renewal decisions has increased significantly.

    Not only has the format of materials shifted, but the volume of materials has increased more than three-fold. We are now managing over 6,000 journal titles as opposed to 1,700 titles two years ago. Unfortunately, we are not always able to switch existing staff to e-journal tasks. In the process of "re-engineering" the entire department, we upgraded two positions, added one temporary position, and replaced one position. We now require detail-oriented support staff who have advanced computer skills and who can adjust to continuous changes in procedures and methods as our environment evolves.

    Circulation/Access/Stack Maintenance

    Staffing. Obviously, shelving decreases when journals are no longer physically stored in the library. Bound journal re-shelving has been reduced by 40 percent and re-shelving of current journal issues is down over 20 percent over the past two years. At Drexel, the collection of print journal re-shelving statistics is only partly automated. Shelvers track use by title as they shelve bound volumes and current issues. Fewer journals to shelve also translates to less time collecting print re-shelving statistics.

    Electronic Use Statistics. In theory, it is easier to collect use statistics and richer, more accurate demographic and search information for electronic journal usage because data collection can be automated and expanded. In reality, at this time it is very difficult and labor intensive to obtain useful and comparable title-by-title use data for electronic journals and compile them in a way that is helpful for making management decisions. Activity measures and, in particular, comparable activity measures across journal vendor services are frustratingly difficult to come by. Mercer (2000) describes the problems encountered in trying to collect and analyze the vendor information to use it for service evaluation and decision-making. Among the statistics reported are session length, number of searches, journal title hits, page hits, types of pages hit, top XX titles accessed each month, "turnaways," form and type of articles downloaded, and number of unique IP addresses using a service or journal title.

    Since the data for print volumes are not strictly comparable, they must be interpreted carefully. Our print statistics represent volumes or issues re-shelved rather than actual articles copied or read, while the e-journal statistics below represent articles accessed which may or may not have been read. The print use data is somewhat under-reported because, even when asked not to, users re-shelve journals after they look at them. Even so, from preliminary data we can say confidently that our users are accessing the electronic journals in numbers far exceeding our print collection.

    Photocopying. Since our print journal usage statistics have decreased so dramatically, it is only logical that photocopier use would also decrease, as copying journal articles is one of the primary uses of our library photocopiers. Photocopy use has decreased about 20 percent since electronic journals were introduced.

    Reserve

    Circulation of reserve materials, which had been steady at about 30,000 items per year, dropped by 50 percent during the 1999/2000 academic year. How much of this drop is attributable to electronic journals and how much to other e-resources is an open question. We do expect this trend to continue for the print reserve format, particularly when our electronic reserve module is fully functional later this year. It appears that not only are students using fewer reserve materials, but our faculty also are placing fewer items on reserve. With respect to staffing impact, we have reorganized some of the work assignments in this department due to the reduced workload, and upgraded the reserve room supervisor position in anticipation of the electronic reserves activity.

    Document Delivery/Interlibrary Loan (DD/ILL)

    Our expectation with the implementation of electronic journals was that we would see a significant decrease in user requests for journal articles via our DD/ILL services. In fact, "borrowing" photocopies of journal articles from other libraries increased by 16 percent in FY1999-2000. The library's document delivery service, which provides copies of articles from the Drexel Library collections free of charge to faculty and distance learners, delivered 1,122 articles from the electronic journal collection in this same time period. The majority of these articles are for faculty who presumably are not aware of the ready accessibility of e-journals, or who either cannot or choose not to retrieve the articles themselves. At the moment, it is not possible to measure the net impact of the electronic journals on the DD/ILL department volume because too many other factors are influencing use of the service. Research activity has grown dramatically at Drexel in the past two years, and the provision of over 100 web-based databases likely has increased demand for articles. Our prediction is that ultimately we will see a decrease in net requests for this service as our users become increasingly self-sufficient and as electronic content continues to expand.

    Reference/Information Services (IS)

    Reference services are nearly always affected by any significant change in library resources. At Drexel the Information Services/Reference staff are responsible for materials selection in addition to the usual functions of answering questions, teaching classes, and performing public relations functions such as promoting the availability of services. So they are involved in several stages of the "life cycle" of electronic journals at Drexel. They share responsibility for identifying e-journal candidates for purchase, evaluating potential purchases, helping students and faculty use the e-journals effectively, incorporating information about them in their classes, and helping publicize them to their constituencies.

    Reference Service. Some interesting trends are occurring at the reference desk. Questions decreased in 1999/2000 by about eight percent, although more of the transactions that do occur turn into "teaching opportunities" for those users who are less self-sufficient. Staff observe that students using the web-enabled computers in the "hub" near the reference desk, in particular, are increasingly self-sufficient.

    Instructional Program. Offsetting the decrease in reference questions is the amount of time IS staff are spending on instruction and outreach activities to make faculty and students aware of the library's resources and services. Workshops and teaching sessions have increased. Vendor presentations are more frequent. IS librarians are engaging in greatly expanded public relations through personal visits and presentations, email updates to departments, exhibits and other activities. The preparation of both online and printed documentation to help users understand how to use electronic journals has also expanded.

    Staffing. The electronic journal option and new processes have most certainly increased the workload for selecting journals. We expect that over time this increase will level off as the collections and offerings stabilize in the electronic environment. No new staff positions have been added in the IS department, but there has been significant turnover and, again, we carefully screen new hires for expanded computer skills and experience using, selecting and promoting electronic resources. Much of the increased journal evaluation work comes in the summer, a time when many of the other activities of the department are reduced. So far the staff have been able to handle the additional work at current staff levels.

    18.5 Discussion

    Drexel is probably farther along in the transition to an all-electronic journal collection than most, if not all, academic libraries in the United States. A late 1997/1998 survey of ARL and non-ARL academic libraries found that just 29 and 33.5 percent, respectively, had cancelled print journals in favor of electronic access in the previous 12 months (Shemberg and Grossman, 1999). Fifty-one percent of the ARL libraries and 40 percent of the non-ARL libraries had not cancelled print subscriptions in favor of electronic and declared that they will not in the future. Their reluctance was attributed to the enormous change required in academia to relinquish print.

    This has not been a problem for Drexel. Faculty and students have embraced the transition almost universally. Organizational readiness, important in any successful organizational change, has been critical to the ability of the Drexel Library to move so rapidly to a new model. The most important factors have been: (1) a highly computer-literate faculty and student body; (2) programmatic emphases in science, technology and business, areas where publishers have been quickest to provide e-journals; (3) the existence of a high-speed ubiquitous network; (4) general dissatisfaction with the print journal collection; (5) a supportive administration that provided a significant increase in funding; (6) a strong and growing distance education program; and (7) a large number of academic institutions in the Philadelphia area, including the nearby University of Pennsylvania, with substantial libraries that are available to Drexel faculty and students.

    This description of the Drexel experience should be useful to others because our transition is indicative of what most academic libraries will eventually experience. There are accredited academic institutions that are functioning with completely digital libraries, i.e., they never had a print library. Examples are Jones International University (2000) and the University of Phoenix (2000). Other libraries have created large electronic journal collections—e.g., the University of California system (California Digital Library, 2000) and most, if not all, large research libraries—but they are maintaining large print collections concurrently. The approach Drexel is implementing—substituting electronic for print—will be the typical scenario in most academic libraries because it will be necessary to make electronic collections affordable.

    Preliminary cost comparisons for processing print versus electronic journals indicate that the electronic collection is substantially more expensive to maintain. We estimated staff costs by allocating percentages of individual staff members' time to the various tasks and projects described in this paper, using the functional cost analysis approach of Abels, Kantor, and Saracevic (1996). The amount of time spent per task was determined by interviewing staff and supervisors to analyze the impact for each area and by reviewing library statistics and other records. Then we computed the annual cost in salaries using individual rates of pay. The result indicated that the substantial costs of maintaining an electronic journal collection more than offset the savings from eliminating the clerical chores associated with maintaining a print journal collection. While fewer staff are needed, the new staff are more skilled, and therefore more highly compensated. Likely, as the electronic journal publishing industry and related service industries mature, the change process will become easier, and thereby less costly, for libraries.
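    The arithmetic behind this kind of functional cost analysis is straightforward: the annual cost of an activity is the sum, over staff members, of each member's salary multiplied by the fraction of time allocated to that activity. The short Python sketch below illustrates the calculation; the salaries, activity names, and time fractions are hypothetical placeholders, not Drexel figures.

        # Sketch of a functional cost allocation in the spirit of Abels, Kantor,
        # and Saracevic (1996). All figures below are hypothetical.
        staff = [
            # (annual salary, {activity: fraction of time spent on it})
            (42000, {"e-journal database maintenance": 0.40, "print check-in": 0.10}),
            (38000, {"print check-in": 0.30, "license review": 0.20}),
            (55000, {"license review": 0.50, "e-journal database maintenance": 0.15}),
        ]

        costs = {}
        for salary, allocation in staff:
            for activity, fraction in allocation.items():
                # annual cost of an activity = sum of (time fraction x salary)
                costs[activity] = costs.get(activity, 0.0) + fraction * salary

        for activity, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
            print(f"{activity}: ${cost:,.0f} per year")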

    Drexel's per-title subscription costs are lower for electronic journals. While this is a function of our selection process and the particular "deals" we have been able to obtain, we suspect that the majority of academic libraries will have the same experience, particularly if they purchase a large number of titles through aggregator collections. Since use is much higher for e-journals, the cost benefit is even greater. We plan further analysis to refine our calculations of operational costs, as well as subscription costs that include factors such as backfiles and use data, in order to arrive at good estimates of "real" per-title costs that incorporate all of these factors.[3]

    There are many areas where improvements made by publishers and vendors could decrease the library workload. Of particular value would be

    • better information about the existence of electronic journals and their characteristics,

    • standards for presentation of use data by vendors,

    • easier methods of providing access to electronic journals, either through cataloging or in list form, and

    • an assured solution to archiving.

    As the entire electronic publishing system matures, we anticipate that these improvements will come.

    Notes

    Reprinted with permission from D-Lib Magazine, Vol. 6, No. 10, October 2000. (http://www.dlib.org/dlib/october00/montgomery/10montgomery.html) To conform with publication standards this publication has some format differences from the D-Lib article. Also, some notes have been added.

    1. Some publishers continue to insist that print journals must be purchased to gain access to the electronic equivalent, i.e., bundling. We have developed strategies to discontinue receiving and storing the print copies.

    2.

    3. The Drexel Library recently received a $128,000 grant from the Institute of Museum and Library Services to study these costs and develop models for measurement.

    19. The Impact of Digital Collections on Library Use: The Manager's Perspective

    This paper focuses on the impact of digital collections on library use based on three years of experience in a metropolitan research university. Through statistics and observations it will be demonstrated that an academic library can become user-centered in the electronic environment. It will also be demonstrated that new educational initiatives from the state government and the university administration can help the library gain a more central place within the academic enterprise. Various education initiatives in Kentucky from 1997 to the present will be used as a case study to demonstrate that the electronic information environment can lend itself to improving learning and teaching outcomes for an educationally underdeveloped population. Information will be presented on cooperative and consortia-based initiatives to contain costs and to expand access to electronic databases.

    19.1 Introduction

    The availability of electronic information continues to increase. Based on a study at the University of California, Berkeley, the world's total annual production of information amounts to about 250 megabytes for each man, woman and child on earth.[1] The challenge will be to learn how to navigate in this sea of endless information. Information is also becoming more accessible each year as more people acquire personal computers. Based on 1997 statistics there were 407 computers for each 1,000 people in the United States, the highest computers-per-person ratio in the world, and 42.1 percent of the households in the United States had a computer.[2] In 1998 higher education spending on computer hardware, including personal computers, reached $1.4 billion, and this is expected to increase seven percent annually.[3] The growth of the Internet and related Web information during the past several years has likewise been phenomenal and continues at a rapid pace. For example, the sale of electronic books (e-books) was $40 million in the year 2000. The marked increase in Internet and Web use has caused analysts to project that e-book sales will surge to $2.3 billion in 2005. This growth projection is based on the convenience of updating data and information, especially in college textbooks and reference books.[4] As people access and use the Internet and the Web their expectations for finding information quickly and conveniently are growing rapidly, especially in the higher education environment, and academic libraries increasingly experience the effects of these growing information expectations.

    In the current information and technology environment academic library users, students, professors, and researchers have a variety of expectations. They continue to need and want print materials (monographs, serials, documents, manuscripts, maps, photographs, archives, and related items) for teaching, learning, and research. They also want multi-media formats for learning and curriculum support such as films, slides, videos, digital videodisks (DVD), compact disks, tapes, and microforms. Ultimately, academic library users want and need information electronically. It must be available anytime, anywhere, for multipurpose uses, quickly, conveniently, and in a portable and easy-to-use form. Library users not only want to locate the information quickly, they also want to be able to take it with them, either by printing it, copying it, sending it by e-mail to their personal computer, or by downloading it onto a disk.

    Librarians must ensure that they are capable of satisfying their users' diverse information needs. In cooperation with the faculty, and based on the curriculum and research needs, academic librarians must continue to build appropriate print and multimedia collections. They must offer convenient access, preferably Web-based, to a large array of electronic information resources including books, documents, journals, and other digitized information, all in full text. They must also provide adequate computer workstations, strong and supportive networks, and printing and downloading capabilities. Finally, they must ensure that their users have or learn the skills to find, access, evaluate, organize, and use electronic information appropriately and responsibly.

    In the 1999-2000 academic year 1,787 academic libraries spent almost $56 million, or 4.7 percent, of their $1.2 billion in acquisition expenditures on electronic resources.[5] To accommodate their users' increasing demands for efficient access to and use of electronic information, most academic libraries have had to update their technologies as well as their infrastructures at substantial cost and effort.

    Users want up-to-date computing, and they need good training and instruction. Academic librarians have begun to rethink their operations and services in terms of the electronic information environment and their users' needs and demands for electronic information access. They have had to gain expertise in evaluating electronic information and appropriate access mechanisms, as well as the technical expertise to handle the networking and computer infrastructure. While in many academic institutions use of the physical library and of print resources has begun to decrease slightly in recent years, the use of electronic information sources has increased rapidly. Although national and international standards for reporting statistical data related to electronic information use have not yet been developed, substantial amounts of data on electronic information use are being accumulated.

    Librarians are slowly beginning to understand information-seeking behavior in the electronic information environment and how users search for online information. They are starting to work with vendors and aggregators of electronic information to produce better designs for electronic product use and adequate methods to collect appropriate use statistics for electronic information formats.

    19.2 Electronic Information Environment — Kentucky

    Kentucky's population is undereducated for the challenges it will face in the 21st-century information environment. The average income per person falls in the lower third for the United States. The majority of Kentucky citizens do not have a college degree, and Kentucky's full-time enrollment in higher education is one of the lowest in the United States.

    Under the leadership of Kentucky Governor Paul Patton, the Legislature and leaders in education have worked together to upgrade the state's total education system. In 1997 the Governor collaborated with the leaders in higher education to increase the percentage of Kentucky's population who have access to higher education. He allocated more than $167 million in additional resources to higher education during the 1998-2000 biennium to improve research initiatives, technology, development of the workforce, physical facilities and to increase financial aid for students.

    Governor Patton supported the creation of a new governing structure for higher education by creating the Council of Post-Secondary Education and giving it responsibility for technical schools, community colleges, comprehensive and research universities and, most recently, continuing education and the newly created Kentucky Virtual University. The Kentucky Virtual University was created in 1998 to help address the problem of access to higher education for Kentucky's citizens living in remote rural areas. The goal of the virtual university is to provide Kentucky's citizens with access to higher education, both undergraduate and graduate, no matter where they reside, through electronic learning and online educational support. Using a computer with online access at any library, school, community center, or other location, any citizen can have access to information and to instruction.

    The Kentucky Virtual Library (KYVL, at www.kyvl.org) is a library consortium including all types of libraries. KYVL is funded jointly by the Council of Postsecondary Education and all participating libraries. One of the consortium's initial major initiatives was to ensure that all state universities and community colleges utilize the same client-server library system, Endeavor, to provide common electronic access to these collections. The Kentucky Virtual Library is accessible twenty-four hours a day, seven days a week, from any Internet-connected computer and provides access to a variety of commercial, state and local databases for all citizens of Kentucky. Online tutorials help citizens learn valuable information skills. Timely document delivery, online reference services, cooperative digitizing projects and a common interface help all citizens have equitable access to information, including access to such databases as OCLC's FirstSearch and EBSCO.

    19.3 University of Louisville — Libraries

    The University of Louisville (U of L) is the second largest metropolitan research institution in Kentucky with more than 21,000 students. With close to 12 percent of its student body belonging to minorities, U of L has the highest percentage of minority students in the state. Most students, 85.3 percent, are from Kentucky, while the other 14.7 percent are from other states and other countries. The University is a Research I institution with 1,350 faculty and 163 degree programs, including 30 doctoral programs. There are seven libraries, including a medical library located on the health sciences campus, a music library, an art library, a science and engineering library, and a law library, as well as Ekstrom Library, the main library. In addition to serving the university community, the libraries are a net lender of library materials for the state of Kentucky. The libraries possess more than 1.8 million volumes, 16,000 current print serial subscriptions and diversified special collections and other media. Access to several hundred databases and more than 25,000 electronic full-text journals is provided.

    At the U of L expenditures for electronic resources during the past five years have more than tripled, from 6.7 percent to 15.3 percent of the acquisitions budget, or from $301,000 to $1,259,000. That trend is continuing. It is noteworthy that U of L's expenditure for electronic information is more than three times Kentucky's average and three times the national average. To support the growing expenditures for electronic information resources at the U of L, $2.5 million was spent during the past three years on a new client-server system and to update both the technological infrastructure and library computers for staff and the public. The libraries went from a mainframe computer system to a state-of-the-art client-server system, from no network to an Internet network featuring 100-megabit connections and a wireless computer environment, from no servers to seven servers, and from fifty "dumb" terminals to 550 state-of-the-art computer workstations. Through major rethinking and reallocation the libraries' technology department grew from four to seven full-time staff and gained the support structure of a ten-member technology team. The libraries have several state-of-the-art interactive computer classrooms utilized for more than 900 class sessions with 11,000 students a year to teach curriculum-related information skills. A state-of-the-art computer laboratory is used by more than 75,000 persons during one year.

    19.4 Library Use — Statistics and Observations

    From 1997 to 2002 overall library use increased by forty percent. This use statistic includes reference services, circulation (including in-house use), reserve, and interlibrary loan.

    Approximately 1.8 million users enter the libraries physically each year and use library services including electronic information. The number of persons coming to the libraries has steadily increased in each of the last five years. Based on annual assessment data, collected by the Library Assessment Team using student and faculty surveys and focus groups, it has been found that people like the services provided for them because they are based on their information needs. The campus community also appreciates the state-of-the-art electronic information environment in the libraries and, last but not least, they enjoy such amenities as the computer lab and e-mail facilities.

    In 1998-99 more than seven million electronic uses of the online catalog, web sites and electronic journals were registered; in 1999-2000 such electronic use went up to eleven million, a fifty-seven percent increase. In 1999-2000, only 38 percent of total use of electronic materials came from inside the U of L libraries; external accesses thus outnumbered internal ones. Close partnerships with the faculty have resulted in teaching information skills to more than 8,000 students a year while beginning to integrate information literacy throughout the curriculum.

    The U of L libraries feature eighteen distinct web sites comprising 1,158 pages and more than 54,000 links. These web sites are updated and expanded on a regular basis. Last year alone seven million uses of the electronic catalog and web sites were recorded, an increase of 350 percent over the previous year. It must be noted that the libraries are only at the very beginning of collecting use statistics related to electronic information and the Web. Much more has to be learned to ensure that these statistics are truly meaningful in measuring use.

    The U of L libraries are beginning to allocate significant resources for, and to rethink services related to, electronic information. In 2002-2003 library users have been given access to 270 electronic databases, compared to forty-two databases in 1996-97, more than a six-fold increase. Among these new resources are large databases and services with access to abstracts and full-text articles, such as ABI/Inform, FirstSearch, EBSCO, Biological Abstracts, Beilstein, INSPEC, Medline, Science Direct, Lexis-Nexis, and Web of Science. Databases in almost all subject areas and covering a variety of sources, such as reference books, theses, encyclopedias and biographies, are available.

    The U of L libraries, similar to other academic and research libraries, have been forming partnerships and cooperative agreements with one another to ensure preservation and cost containment for electronic and scholarly publications. The U of L libraries participate in several of these, such as SPARC, the Scholarly Publishing and Academic Resources Coalition; MUSE, a consortium of more than twenty-six journals from University Presses and Scholarly Societies; JSTOR, a consortium of over 1,000 academic libraries and preserver of online back files; and IDEAL (International Digital Electronic Access Library), a publisher consortium for online journal subscriptions.

    Access to and utilization of many electronic journals is achieved through a variety of interfaces and search engines such as OVID, a search engine for psychology and health sciences databases. Users prefer this search engine since it gives them options for using and understanding complex databases. The OVID search engine enjoys heavy use among health sciences and science students and faculty. Use of this search engine continues to increase dramatically; for example, 43,000 uses were recorded in 1999, compared to 89,036 in 2000.

    Another example of user preferences is Web of Science, a citation database for the sciences, social sciences, arts and humanities that covers thousands of research journals across many disciplines and offers searchable author abstracts as well as citations to support research. In 1998 access to all components of the Web of Science and its substantial back files was offered for the first time at the U of L libraries. Use statistics indicate increasing use of this database: in 1999, 37,378 searches were recorded, compared to 44,221 in 2000.[6]

    19.5 Electronic Information Use in Distance Education

    Access to the multitude of electronic information resources has had a major impact on library users in distance education. The University of Louisville has been offering a variety of distance education programs both within the United States and in other countries. At this time programs are offered in several countries, including Greece, Egypt, Panama, Czech Republic and Germany. Approximately 3,500 course enrollments per year are registered in distance education offered through the University and each of the participants receives timely and individualized information support.

    During the past nine years the libraries have developed a special program in support of distance education programs. Included in the library support program are document delivery in all types of formats, reference services and instruction in information skills. In 1998 a library distance education office was created with two staff members, a library faculty member and a technology support person, and with state-of-the-art technology including a proxy server to keep track of students and faculty involved in these programs. The libraries have installed Ariel software in various locations outside of the United States to facilitate document delivery activities. They work with teaching faculty to create appropriate Web pages for the courses, incorporating appropriate library and information support. Based on five years of experience the librarians have also developed cost data for library support to distance education students and faculty.[7]

    19.6 Assessment of Library Users

    Assessment of library users takes several forms. During the past two years the libraries' assessment team has completed several surveys of students and faculty. Last year the University, including the libraries, contracted with an assessment firm, Dey Systems, Inc., to develop instruments for measuring students' educational outcomes. In addition, librarians hold meetings and focus groups with graduate students, undergraduate students and faculty to assess the information needs and concerns of these groups. Suggestion forms are available throughout the libraries and on the Web sites and help the library staff address library and information needs and concerns.

    The libraries have utilized information obtained from the student and faculty surveys in 1997, 1998, 1999 and 2000 to improve library services and document delivery. Users indicated a need for more computers, more electronic information, more books, more journals, better photocopying, quicker interlibrary loans and additional hours. In response, the libraries have regularly added to and improved library holdings and access to electronic information. The libraries have extended library hours, updated and added computers, and instituted a state-of-the-art photocopying service. The libraries also began loaning wireless laptops for use in the libraries.

    The trends in user needs and utilization of the libraries during the past three years show an increase in the use of all library services, but especially in reference, the online catalog and electronic databases.

    19.7 Future

    Experience at the University of Louisville during the past several years indicates that users want access to electronic information wherever and whenever possible. They need state-of-the-art computing equipment and strong, supportive networks to make this possible. They also need much training and professional advice to be successful information users. These findings, based on library surveys and interviews conducted at U of L, are similar to information presented at library conferences and in discussions with colleagues around the country. Computers and software products can make information use difficult. Librarians need to provide assistance and make information more usable. Librarians already provide value-added services such as instructional tools, teaching sessions and reference assistance to create a layer of intervention between the user and the products. Librarians are concerned with user needs and provide a user-centered environment. Building good web sites, with attractive and easy-to-use design, intuitive navigation, currency, and appropriate text links, helps librarians provide more user-centered information. Librarians facilitate information retrieval, helping users avoid aimless information surfing.

    Librarians have been utilizing the Web and electronic information while working with vendors for several years now to provide their users with the best possible access to online information. They need to work with vendors and providers of electronic information to ensure consistency and user control. They must also work with electronic information providers to utilize feedback from users. Vendors of electronic information and databases should work with librarians to create better common interfaces to electronic databases and consistent statistical reporting. Such statistical reports should include the number of logons, the number of actual searches of a particular database, the number of actual users of full-text articles, the types of subject searches completed, and how many users were unsuccessful. Such data will enable librarians to assess actual use of particular databases and specific journal articles so they can make electronic material selection decisions based on actual user needs.

    Librarians need to assess the use of digital collections in terms of comparisons to print use; previously underserved populations; change in usage patterns; value of the collections for campus information support; change in and preservation of scholarly communication; and finally, the effect on overall expenditures.

    Furthermore, librarians need to regularly assess the impact of electronic information on library operations and services. Already, operations have been and are in the process of changing, especially in terms of cataloguing, processing and collection building. More outsourcing of processing to obtain shelf-ready monographs is becoming the norm. Use of approval plans is increasing. Networking in cataloguing facilitates faster and less expensive cataloguing.

    Services are similarly changing in terms of electronic information provision, reference, instruction, reserves and document delivery. Academic libraries have been implementing electronic reserves to facilitate access and faster document delivery. They have been implementing software packages such as Ariel and ILLiad to improve interlibrary loan processes. Academic librarians have energetically made electronic information available through their libraries and they are beginning to rethink reference in an electronic environment.

    User studies are beginning to indicate that the following factors statistically influence the use of electronic information: form of access; available technology; available guidance and instruction; full text availability. Librarians need to work closely with teaching faculty to assess the impact of digital information in terms of learning outcomes for students and with researchers to assess the effect of electronic information on research results.

    Notes

    1. Lyman, Peter and Hal R. Varian, How Much Information? 2000. Retrieved from http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html on 24 July 2003.

    2. Statistical abstracts of the United States. Washington, D.C.: U.S. Dept. of Commerce, Bureau of the Census, 1999.

    3. U.S. industry & trade outlook. New York: DRI/McGraw-Hill: Standard & Poor's; Washington, D.C.: U.S. Dept. of Commerce/International Trade Administration, 2000.

    4. Standard & Poor's Corporation. Standard & Poor's industry surveys. New York: Standard & Poor's Corp., 2000, p. 6.

    5. Although there are 4,723 academic libraries in the United States, only 1,787 report information on their acquisition expenditures nationally, and the above statistics originated from these 1,787 libraries. The Bowker annual library and book trade almanac. New York: Margaret M. Spier, R.R. Bowker, 2000, pp. 420-421.

    6. More cooperation is needed from vendors to ensure that such statistics are recorded and that they measure different types of usage.

    7. Edge, S. M. (2000). Faculty-librarian collaboration in online course development. In S. Reisman (Ed.), Electronic Learning Communities - Issues and Practices (pp. 135-185). Greenwich, CT: Information Age Publishing, Inc.

    20. Economics and Usage of a Corporate Digital Library

    This paper analyses the usage of the journals and books available at the BT Digital Library, looking in detail at the usage by the 3,500 people at BT's development site at Adastral Park, near Ipswich, and the impact of this usage on purchase decisions in the library.

    20.1 Background

    British Telecommunications (BT) is a leading provider of telecommunications services. Its main products and services are providing local, long distance, and international calls; providing telephone lines, equipment, and private circuits for homes and businesses; providing and managing private networks; and supplying mobile communications services.

    BT's library is organisationally located in a unit called Advanced Communications Engineering (ACE). ACE's 3,500 people mainly work in software and telecoms engineering. ACE develops advanced communication technologies for the companies across the BT Group and for selected other businesses.

    ACE is headquartered at Adastral Park, Martlesham, in the East of England. It is the centre of technical expertise for the BT Group and works with the company's businesses worldwide to help them deliver new products and services to their customers and to build infrastructure for their future. Its reputation was established with BT's pioneering work in the field of optical communications.

    While the majority of ACE's people are based at Adastral Park, significant numbers are located in offices worldwide, including locations in Asia, continental Europe, and North America. They include many who are in the forefront of their specialist fields, leading the development of standards and new technologies in areas including multimedia, IP and data networks, mobile communications, network design and management, and business applications.

    Like other companies in high-tech sectors, BT experiences major pressures on costs and products, requiring a challenging combination of cost-cutting and innovation to maintain competitiveness. These pressures have led to a change of direction in BT's research focus, moving from research in the optics area towards software and internet engineering.

    20.2 The BT Library

    Until the mid-1990s, BT's library had a large collection to meet the needs of these researchers, with more than 800 titles, 23 staff, and accommodation totalling 450 square meters. The library provided a full range of services including document delivery from its own collection, inter-library lending, and carrying out online searches for its users.

    The pressures on BT have fed through to the library. Costs were increasing at a time of intense pressure to reduce overheads. The library's user community was changing as BT shifted its research efforts away from pure science into more directed research. As they moved into these new areas, users became much less inclined to visit the physical library, preferring the information they needed to be delivered to them.

    In 1994, the library's management team realised it could no longer pursue the path of looking for incremental budget cuts and savings. Radical change was needed. Benefiting from an extensive study of the usage of library material and building on increasing confidence in the in-house enhancements to library automation systems, the library chose to fundamentally rethink the way it provided services to its users.

    The library collection was cut to a core of 250 journals that were heavily used by people visiting the library in person. Accommodation and staff were trimmed by 67%, with library staff being redeployed elsewhere in the company. Attention was focussed on establishing a digital library that provided users with access to content over the network through online journals backed up by commercial document delivery.

    The provision of loans and photocopies was outsourced with the development of the BLADES system. BLADES accepts and validates user requests, tracks the progress of requests and provides status reports for users and input for billing systems for those users who pay for requests (Broadmeadow, 1997). User requests are transmitted semi-automatically to the British Library for fulfilment, with the BL delivering photocopies and loans directly to the user. The substantial savings in journal subscriptions more than offset the cost of commercial document delivery. Part of the savings were due to a communications campaign that highlighted the real cost to BT of requesting a photocopy, thus reducing demand.

    The BT Library provides over 800 online journals to its users, either loading them onto its own server or linking through to the publishers' or aggregators' servers. The Inspec and ABI/Inform databases act as gateways to these journals, using software developed by BT's knowledge management research team for searching, current awareness and collaboration features. The databases are used to provide end-user searching and browsing, with the provision for users to save searches for selective dissemination of information (SDI) as the databases are updated each week. A table of contents service is also provided.
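    In essence, the SDI mechanism is a stored query replayed against each weekly batch of new records, with matches delivered to the user who saved the search. The Python sketch below shows one way such a run could work; the function names, record layout, and keyword-matching rule are illustrative assumptions, not a description of BT's software.

        # Minimal sketch of a selective dissemination of information (SDI) run:
        # each user's saved search is re-run against the records added in the
        # weekly database update, and any matches are delivered to that user.
        # Names, record layout and the matching rule are assumptions for
        # illustration only.

        def matches(record, query_terms):
            """Crude keyword match against a record's title and abstract."""
            text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
            return all(term.lower() in text for term in query_terms)

        def run_sdi(saved_searches, new_records, deliver):
            for user, query_terms in saved_searches.items():
                hits = [r for r in new_records if matches(r, query_terms)]
                if hits:
                    deliver(user, hits)  # e.g. e-mail this week's matches

        # Toy example with hypothetical saved searches and new records
        searches = {"alice": ["optical", "amplifier"], "bob": ["multicast"]}
        update = [{"title": "Optical amplifier design", "abstract": "..."},
                  {"title": "IP multicast routing", "abstract": "..."}]
        run_sdi(searches, update,
                lambda user, hits: print(user, "->", [r["title"] for r in hits]))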

    In the physical library, the librarian could readily see when users were floundering in their search for information and discreetly offer assistance. In the digital library, users are relatively invisible to the librarian. The Digital Library is developing methods for understanding its users' behavior more effectively. Studies of user behavior are intended to highlight the server's problem areas, which can then be redesigned to make them easier to use, and to develop ways of automatically profiling users' interests and work areas. The unspoken purpose of this analysis is also to develop a compelling case showing how effectively the library supports BT's business processes.

    20.3 Methodology

    Although no detailed study of the usage of the Digital Library has been undertaken, it is important to understand how users access the collection, what problems they find there, and what their usage patterns are. Data about where users are in the organisation are important because of the need to allocate costs and charges.

    The Digital Library studies the library server's log files and uses a number of channels to encourage user feedback. The difficulties of log file analysis are well documented (Wright, 1999). The log files are huge and processing them can be time-consuming. Unless user authentication is required, logs record only machine addresses and not personal identifiers. Each server transaction is logged, so a user retrieving a page with five graphics is recorded in six lines in the log file. Users share machines at cybercafes, operate behind proxies, or use dynamic IP addresses, so the IP address cannot readily be tied to a single user. Although Wright is describing the problems of log file analysis for servers on the internet, the Digital Library faces the same challenges. BT's intranet is large and proxies and firewalls are installed between different portions of the network. Users share machines in public spaces or borrow colleagues' PCs. Dynamic allocation of IP addresses is used within the intranet to increase flexibility.

    A number of packages are available to assist in log file analysis (Busch, 1997). These packages report statistics such as the total hits on the server; the number of Not Modifieds (304s), Redirects (302s), Not Founds (404s), and Server Errors (500s); the number of unique URLs served; the number of unique client hosts accessing the server; the total kilobytes transferred; the top one-second, one-minute, and one-hour periods; the most commonly accessed URLs; and the top five client hosts accessing the server. These deliver a higher level of management information than the librarian needs and are not used in the BT Library.
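    To make concrete what such packages count, the Python sketch below tallies a few of these server-level figures in a single pass over an access log, assuming lines in the common CLF-style layout; the file name and field positions are assumptions for illustration, not the BT Library's practice.

        # Sketch of the server-level tallies generic log analysers report,
        # assuming access-log lines in the usual CLF-style layout:
        #   host ident user [timestamp] "METHOD /path HTTP/1.x" status bytes
        # File name and field positions are illustrative assumptions.
        import re
        from collections import Counter

        LOG_LINE = re.compile(
            r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" (\d{3}) (\d+|-)')

        hosts, urls, statuses = Counter(), Counter(), Counter()
        total_bytes = 0

        with open("access.log") as log:
            for line in log:
                m = LOG_LINE.match(line)
                if not m:
                    continue  # skip malformed lines
                host, url, status, size = m.groups()
                hosts[host] += 1
                urls[url] += 1
                statuses[status] += 1  # e.g. 200, 302, 304, 404, 500
                total_bytes += 0 if size == "-" else int(size)

        print("total hits:", sum(statuses.values()))
        print("unique URLs:", len(urls), "| unique hosts:", len(hosts))
        print("kilobytes transferred:", total_bytes // 1024)
        print("top 5 client hosts:", hosts.most_common(5))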

    Wright describes techniques for grouping unidentified readers into "constituencies", based on their usage patterns (Wright, 1999). These constituencies, such as robot checkers, users checking the What's New page, new users, or demonstrators, are identified by analysing the server's log files and can then be used to observe navigation of the site and spot usability problems.

    The BT Digital Library adapted Wright's ideas in its own analysis of its log files. The purpose of the analysis is

    • to understand who is using the Digital Library,

    • to track individual usage of the library, to enable personalisation and collaborative filtering,

    • to understand which library resources are being used and the extent of that usage to inform renewal decisions,

    • to track which material was being requested through the document delivery system to suggest additions to the collection,

    • to ensure that usage of material is within the licenses agreed upon with publishers.

    Perl scripts are used to extract meaningful usage data from the server's log, concentrating on the html pages and pdf files read and the server's cgi-scripts run, and ignoring less meaningful traces of usage. Accesses from robot checkers and from the library's own staff are excluded from the analysis. (A sketch of this kind of extraction appears after the list below.) Weekly reports are prepared detailing

    • the number of distinct IP addresses accessing the server (as a proxy for the number of individual users),

    • the number of users who logged into the server,

    • the number of searches done in each of the library databases,

    • the number of individuals searching each of the library databases,

    • the total number of online journal articles read from the library server,

    • the number of readers of the library's collection of online books,

    • the number of articles read from each of the online journals purchased through individual subscriptions,

    • the number of users accessing their SDI pages,

    • the number of users who subscribe to journals' tables of contents and the number of journals which have subscriptions to their tables of contents,

    • the number of users annotating database records and the number of annotations made.
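    As noted above, the extraction itself is done with Perl scripts. The following Python sketch conveys the flavour of such an extraction for two of the weekly figures (distinct reader addresses and article reads per journal); the path pattern and the addresses excluded as robot checkers or library staff are hypothetical.

        # Sketch of extracting two of the weekly figures listed above from the
        # server log. The real scripts are written in Perl; this Python version,
        # the path pattern and the excluded addresses are illustrative
        # assumptions only.
        import re
        from collections import Counter

        ARTICLE = re.compile(r'^(\S+) .*"GET (\S+\.pdf) ')  # a pdf fetch ~ an article read
        EXCLUDED = {"10.0.0.5", "10.0.0.6"}  # hypothetical robot-checker/staff addresses

        readers = set()               # distinct IP addresses, as a proxy for users
        reads_per_journal = Counter()

        with open("access.log") as log:
            for line in log:
                m = ARTICLE.match(line)
                if not m:
                    continue  # ignore hits that are not article reads
                host, path = m.groups()
                if host in EXCLUDED:
                    continue  # drop robot checkers and the library's own staff
                readers.add(host)
                # assume paths of the form /journals/<journal-id>/<issue>/<file>.pdf
                parts = path.split("/")
                journal = parts[2] if len(parts) > 3 else "unknown"
                reads_per_journal[journal] += 1

        print("distinct reader addresses:", len(readers))
        print("article reads:", sum(reads_per_journal.values()))
        for journal, n in reads_per_journal.most_common(10):
            print(" ", journal, n)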

    20.4 Qualitative feedback

    The Digital Library actively encourages user feedback, although it has not yet carried out formal user surveys. Less formal methods of obtaining feedback are used, such as user meetings and publicity events, e-mailing users when they have had their password reset, and user feedback links on the server.

    20.5 Usage of the Digital Library

    Journals

    The BT Library offers 800 online journals to its user community, within the limits of the publishers' agreements. The collection of online journals is supplemented by a disintermediated document delivery service providing what might be called near-online journals. Articles not available online on the Library's server can be requested in a straightforward way and are usually delivered in two to three days.

    A noticeable impact of the move from the physical to the digital library is in the distribution of the user base. In 1994 the library served the research community almost exclusively. In spite of current awareness bulletins, which were distributed throughout the company, and a document delivery service to supplement this, approximately 90% of the library's usage was from the Adastral Park site. By the end of 1998, when ACM, IEE, and IEEE journals became available on the server in addition to the in-house journals and a selection of titles from Elsevier, this figure had gone down to 61%. In 1999 the Library's collection was enhanced with the addition of material from ABI/Inform. Since then, the balance has shifted so that only 40% of users come from BT's Adastral Park site.

    As part of this study, the usage of the 3,500 potential users at Adastral Park was examined. These users are readily traceable, because they use relatively static IP addressing, allowing a more detailed study of individual usage.

    In 1999, 1,091 users from BT ACE read 9,108 journal articles from the digital library 12,919 times. (These figures exclude journals from the ACM Digital Library and from the selection of other journals available only on the publishers' sites, where usage data is not available.) In comparison, the library had 1,500 users registered for access to the physical library and lent fewer than 8,000 documents in the same period.

    IEL

    The IEEE's IEL product offers IEE and IEEE journals and conference proceedings in Adobe Acrobat format. These are received monthly and loaded onto the library server, so that data on usage are available for study. In the BT implementation, user registration is not necessary to access these publications, so user analysis is limited to studying access by IP address. Data on usage are available from November 1998, which is too short a period to do more than speculate on seasonal variations beyond noticing the obvious drop in readership at the Christmas/New Year period.

    A mean of 39 users read IEL papers each week, with a minimum of 7 and a maximum of 56. On average, these users read 11 papers each, with a minimum of 4 and a maximum of 38.

    The IEL collection offers a typical usage pattern, with 80% of journal usage concentrated on 21% of the titles in the package.

    ABI/Inform

    ABI/Inform offers a wide range of management and trade journals. Some of these, such as Harvard Business Review or trade journals in the telecoms area, are of key interest to the library's user base. ABI/Inform has been fully available through the Digital Library for only two months, which restricts the possibility for usage analysis.

    During this limited period, a mean of 97 users a week have read ABI/Inform papers, with a minimum of 25 and a maximum of 214. These people have read an average of 3 papers each, with a range of 2 to 6 papers per user.

    Elsevier

    The Digital Library holds a collection of more than 20 Elsevier titles online. Unlike IEL and ABI/Inform, which offer a package of publications, the Elsevier collection is based on the set of journals the library took in paper form. These were selected as core journals, based on library usage and on the library's understanding of BT's research interests. Because access to these online journals can be recorded more easily than access to paper copies, usage patterns can be tracked more readily, showing the librarian which titles are no longer appropriate to hold in the library's collection. In addition, the library's document delivery system records the publishers of documents requested, allowing the librarian to monitor new titles for inclusion. Three titles are heavily used, but even these are not accessed at all in 30% of the weeks. BT's recent decision to stop research in the speech processing area is reflected in a sharp drop in accesses to journals in that area. In between these two extremes are the majority of journals, which are used sporadically. In spite of the additional data on how frequently these journals are used, collection management is still difficult because the masses of data now available make it challenging to extract meaningful information.

    Books

    The Library has made a limited start at providing online books to its user base. Twenty-four computing books from O'Reilly on Perl, Java, Unix, and networking were made available in 1999. On average, 87 users a week access one of the O'Reilly books online, with the number of users ranging from 8 to 179 in any one week. These books, serving as reference books for problem-solving as well as textbooks, are ideal for online publishing. They have certainly produced the most positive unsolicited feedback.

    20.6 Conclusion

    The corporate librarian is pressed to demonstrate his or her value to the parent organisation, leading to efforts to reduce costs and improve library usage. In BT's case, these pressures resulted in outsourcing labour-intensive activities, such as document delivery, and in replacing paper-based publications with online versions. Susan Rosenblatt is reported as commenting that "available information drives patterns of usage" (Odlyzko, 1997a). BT's experience bears this out. Making more information available on the intranet increases library usage both by local users choosing online access in preference to using the library in person and by remote users who previously had no practical means of access.

    Bibliography

    E. G. Abels, P. B. Kantor, and T. Saracevic. Studying the costs and value of library and information services: Applying functional cost analysis to the library in transition. Journal of the American Society for Information Science, 47: 217-227, 1996.

    A. Albanese. AAPA negotiates Wiley journal price cut! Library Journal Academic Newswire, September 12, 2000.

    B. Albee and B. Dingley. U.S. periodicals prices-2000. American Libraries, pages 78-82, May 2000.

    American Association of University Professors. Faculty salaries 1985-86. Chronicle of Higher Education, 32 (5): 25, 1986.

    American Association of University Professors. Average salaries for full-time faculty members, 1998-99. Chronicle of Higher Education, 43 (33): A18-A21, 1999.

    K. Anderson, J. Sack, L. Krauss, and L. O'Keefe. Publishing online-only peer-reviewed biomedical literature: Three years of citation, author perception, and usage experience. Journal of Electronic Publishing, 6 (3), March 2001.

    R. Arjoon. (In)efficient market models: the reality behind economic models in the publishing industry. Learned Publishing, 12 (2): 127-133, 1999.

    C. J. Armstrong and R. Lonsdale. The Publishing of Electronic Scholarly Monographs and Textbooks. An eLib Supporting Study. London: South Bank University , 1998.

    Association of American Universities. A national strategy for managing scientific and technological information. In Reports of the AAU Task Forces, pages 43-98. Washington, DC: Association of Research Libraries, 1994.

    Association of Research Libraries. ARL Statistics 1998-99. Washington, DC: Association of Research Libraries, 1999.

    Association of Research Libraries. Directory of Scholarly Electronic Journals and Academic Discussion Lists. Washington, DC: Association of Research Libraries, 2000a.

    Association of Research Libraries. Statistics and measurement program, 2000b.

    Association of Research Libraries. Supplementary statistics, 2000-2001. Available from http://www.arl.org/bm~doc/sup01.pdf , 2001.

    S. J. Bensman and S. J. Wilder. Scientific and technical serials holdings optimization in an inefficient market: A LSU serials redesign project exercise. Library Resources and Technical Services, 42 (3), July 1998.

    T.C. Bergstrom. Free labor for costly journals? Journal of Economic Perspectives, 15 (4): 183-198, 2001. Available at http://www.econ.ucsb.edu/~tedb/Journals/jeppdf.pdf .

    J. S. Birman. Scientific publishing: A mathematician's viewpoint. Notices of the AMS, 47 (7): 772-773, 2000.

    R. Blieler and T. Plum. Networked Information Resources. SPEC Kit 253. Washington, DC: Association of Research Libraries, 1999.

    Bodleian Library. (2007). In Encyclopædia Britannica. Retrieved December 7, 2007, from Encyclopædia Britannica Online: http://www.britannica.com/eb/article-9080363

    Maria S. Bonn, Wendy Pradt Lougee, Jeffrey K. MacKie-Mason, and Juan F. Riveros. The PEAK project: A field experiment in pricing and usage of a digital collection. In this volume.

    W. Bowen. JSTOR and the economics of scholarly communication. In M. Butler and B. Kingma, editors, The Economics of Information in the Networked Environment. Haworth Press, 1998.

    S. Broadmeadow. Outsourcing document supply — the BT experience. Interlending & Document Supply, 25 (3): 108-112, 1997.

    A. Buckholtz, 2001. E-mail to the SPARC Steering Committee, 22 March.

    M. K. Buckland. Redesigning Library Services: A Manifesto. American Library Association, 1992.

    Bureau of Labor Statistics. Consumer price index—All urban consumers, medical care, 2000. Retrieved from http://www.bls.gov/ on 12 June 2000.

    D. D. Busch. Count your blessings. Internet World, 1997.

    I. Butterworth. Introduction. In I. Butterworth, editor, The Impact of Electronic Publishing on the Academic Community. London: Portland Press, 1998. Available at http://www.portlandpress.com/pp/books/online/tiepac/session1/intro.htm .

    California Digital Library, 2000.

    M. Case. Capitalizing on competition: The economic underpinnings of SPARC. In this volume.

    George A. Chressanthis and June D. Chressanthis. The determinants of library subscription prices of the top-ranked economics journals: An econometric analysis. Journal of Economic Education, 25 (4): 367-382, 1994.

    C. M. Christensen. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Harvard Business School Press, 1997.

    K. G. Coffman and A. M. Odlyzko. The size and growth rate of the Internet. First Monday, 3 (10), October 1998. http://firstmonday.org/issues/issue3_10/coffman/index.html . Also available at http://www.dtc.umn.edu/~odlyzko/ .

    P. Conway. Yale University Library's Project Open Book. D-Lib Magazine, 1996. Available at http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/dlib/february96/yale/02conway.html .

    Colleen Cook, Fred Heath, and Bruce Thompson. Zones of tolerance in perceptions of library service quality: a LibQUAL+ study. portal: Libraries and the Academy, 3 (1): 113-123, January 2003.

    Cornell. Journal price study: Core agricultural and biological journals, 1998. Ithaca, N.Y.: Faculty Taskforce, College of Agriculture and Life Sciences, Albert R. Mann Library, Cornell University. Available at http://jps.mannlib.cornell.edu/jps/jps.htm .

    John Cox and Laura Cox. Scholarly publishing practice: the ALPSP report. (Association of Learned and Professional Society Publishers). Available from http://www.alpsp.org/ngen_public/article.asp?id=200&did=47&aid=278&st=&oaid=-1 , 2003.

    H. Darmon, F. Diamond, and R. Taylor. Fermat's last theorem. In Elliptic Curves, Modular Forms & Fermat's Last Theorem (Hong Kong, 1993), pages 2-140. Internat. Press, 1997.

    P. M. Davis. Patterns in electronic journal usage: Challenging the composition of geographic consortia. College & Research Libraries, 2002.

    Peter Deutsch, Alan Emtage, Martijn Koster, and Markus Stumpf. Publishing information on the internet with anonymous FTP. Internet draft, expired March 1, 1995, 1994.

    E. Duranceau. The economics of electronic publishing. Serials Review, 21 (1): 77-90, 1995.

    Economic Consulting Services Inc. A study of trends in average prices and costs of certain serials over time. In Report of the ARL Serials Prices Project. Washington, DC: Association of Research Libraries, May 1989.

    Electronic Publishing Services Ltd (EPS Ltd). Scientific, Technical and Medical Journal Publishing: a Market in Transition. London: EPS Ltd, 1999.

    eLib. The electronic libraries programme, 2001.

    Elsevier Science, 2000. Elsevier Science Ad, Science, 289:890.

    Elsevier Science. TULIP Final Report. New York: Elsevier Science, 1996.

    P. C. Fishburn, A. M. Odlyzko, and R. C. Siders. Fixed fee versus unit pricing for information goods: Competition, equilibria, and price wars. First Monday, 2 (7), July 1997. Also in Internet Publishing and Beyond: The Economics of Digital Information and Intellectual Property, B. Kahin and H. Varian, eds., MIT Press, 2000.

    J. H. Fisher. Comparing electronic journals to print journals: Are there savings? In Session 1, Economics of Electronic Publishing: cost issues. Association of Research Libraries, 24-25 April 1997. Scholarly Communication and Technology, a conference at Emory University organized by The Andrew W. Mellon Foundation.

    F. Fishwick, L. Edwards, and J. Blagden. Economic Implications of Different Models of Publishing Scholarly Electronic Journals for Professional Societies and Other Small or Specialist Publishers. London: South Bank University, 1998. Report to the Joint Information Systems Committee Electronic Libraries Programme.

    L. A. Fletcher. Developing an integrated approach to electronic publishing: tailoring your content for the web. Learned Publishing, 12 (2): 107-117, 1999.

    B. L. Fox. Library buildings 1999: Structural ergonomics. Library Journal, 124: 57-67, 1999.

    K. Frazier. The librarians' dilemma: contemplating the costs of the 'big deal'. D-Lib Magazine, 7 (3), 2001. Available at http://www.dlib.org/dlib/march01/frazier/03frazier.html .

    Amy Friedlander. Dimensions and Use of the Scholarly Information Environment. Introduction to a Data Set Assembled by the Digital Library Federation and Outsell, Inc. Council on Library and Information Resources, November 2002. Available from http://www.clir.org/pubs/reports/pub110/contents.html (accessed July 19, 2003).

    Amy Friedlander and Rändi S. Bessette. The Implications of Information Technology for Scientific Journal Publishing: A Literature Review. National Science Foundation, Division of Science Resources Statistics, Arlington, VA, 2003. Available from http://www.nsf.gov/statistics/nsf03323/ .

    L. R. Garson. Can electronic journals save us? — a publisher's view. In M. E. Butler and B. R. Kingma, editors, The Economics of Information in a Networked Environment, pages 115-121, 1996. Proceedings of the Conference Challenging the Marketplace Solutions to Problems in the Economics of Information, Washington, DC, 18-19 September 1995.

    W. D. Garvey and B. Griffith. The American Psychological Association's Project on Scientific Exchange in Psychology, Report No. 9. Washington, D.C.: APA, 1963.

    R. Gazzale and J. K. MacKie-Mason. PEAK: System design, user costs and electronic usage of journals. In this volume.

    M. Getz. Evaluating digital strategies for storing and retrieving scholarly information. In S. H. Lee, editor, Economics of Digital Information: Collection, Storage and Delivery. Haworth Press, 1997.

    M. Getz. Electronic publishing in academia: An economic perspective. The Serials Librarian, 36 (1-2): 263-300, 1999.

    C.B. Grannis. U.S. book exports and imports, 1989. In Library and Book Trade Almanac, 1991, pages 436-445. R.R. Bowker, New Providence, N.J., 1991.

    A. N. Greco. International Book Title Output: 1990-1996. Bowker Annual. New Providence, NJ: R.R. Bowker, 1999.

    J.-M. Griffiths and D. W. King. Special Libraries: Increasing the Information Edge. Washington, D.C.: Special Libraries Association, 1993.

    K. M. Guthrie. Revitalizing older published literature: Preliminary lessons from the use of JSTOR. In this volume.

    Kevin M Guthrie. JSTOR: From project to independent organization. D-lib Magazine, 3, July/August 1997. Available at http://www.dlib.org/dlib/july97/07guthrie.html .

    J. Haar. Project PEAK: Vanderbilt's experience with articles on demand, June 1999.

    Karla L. Hahn. Electronic Ecology: A Case Study of Electronic Journals in Context. Association of Research Libraries, 2001.

    L. Halliday and C. Oppenheim. Economic models of digital-only journals. In this volume.

    L. Halliday and C. Oppenheim. Comparison and evaluation of some economic models of digital-only journals. Journal of Documentation, 56 (6): 660-673, 2000a.

    L. Halliday and C. Oppenheim. Economic models of digital only journals. Serials, 13 (2): 59-65, 2000b.

    L. Halliday and C. Oppenheim. Economic aspects of a national electronic reserve service. Journal of Documentation, 57 (3): 434-443, 2001a.

    L. Halliday and C. Oppenheim. Economic aspects of a resource discovery network. Journal of Documentation, 57 (2): 296-302, 2001b.

    S. Harnad. PostGutenberg galaxy: How to get there from here, 1995a.

    S. Harnad. Electronic scholarly publication: Quo vadis? Serials Review, 21 (1): 70-72, 1995b.

    S. Harnad. Implementing peer review on the net: scientific quality control in scholarly electronic journals. In R. Peek and G. Newby, editors, Scholarly Publishing: the Electronic Frontier, pages 103-118. Cambridge: MIT Press, 1996.

    S. Harnad and M. Hemus. All or none: no stable hybrid or half-way solutions for launching the learned periodical literature into the post-Gutenberg galaxy. In I. Butterworth, editor, The Impact of Electronic Publishing on the Academic Community. London: Portland Press, 1997.

    A. C. Hawbaker and C. K. Wagner. Periodical ownership versus full-text online access: A cost-benefit analysis. Journal of Academic Librarianship, 22: 105-109, 1996.

    M. Hedstrom and J. L. King. On the lam: Library, archive, and museum collections in the creation and maintenance of knowledge communities. Working paper, 2003. University of Michigan School of Information.

    A. Holmes. Electronic publishing in science: Reality check. Canadian Journal of Communication, 22: 105-16, 1997.

    K. Hunter. PEAK and Elsevier Science. In this volume.

    C. Jenkins. User studies: electronic journals and user response to new models of information delivery. Library Acquisitions: Practice and Theory, 21: 355-363, 1997.

    Jones International University, 2000.

    JSTOR. The need, 1998.

    P.B. Kantor, C. Mandel, and M. Summerfield. The Columbia University Evaluation Study of Online Book Use. In this volume.

    Sune Karlsson and Thomas Krichel. RePEc and S-WoPEc: Internet access to electronic preprints in economics. Presented at the Third ICCC/IFIP Conference on Electronic Publishing in Ronneby, May 10-12, 1999. Available at http://swopec.hhs.se/admin/elpub99.pdf , 1999.

    J. Kelleher, E. Sommerlad, and E. Stern. Evaluation of the electronic libraries programme: Guidelines for eLib project evaluation, 1996.

    A. R. Kenney. Digital to microfilm conversion: A demonstration project 1994-1996, 1997.

    D. W. King and C. H. Montgomery. After migration to an electronic journal collection: Impact on faculty and doctoral studies. D-Lib Magazine, 8 (12), 2002.

    D. W. King and C. Tenopir. Scholarly journal & digital database pricing: Threat or opportunity? In this volume.

    D. W. King and H. Xu. The role of library consortia in electronic journal services. In A. Pearce, editor, The consortia site licence: Is it a sustainable model? Ingenta Institute, Cambridge, 2003.

    D. W. King, D. D. McDonald, and C.H. Olsen. A survey of readers, subscribers, and authors of the JNCI, 1978.

    D. W. King, D. D. McDonald, and N.K. Roderer. Scientific journals in the United States: Their production, use and economics, 1981. Out of print.

    D. W. King, P. Boyce, C. H. Montgomery, and C. Tenopir. Library economic metrics: Examples of the comparison of electronic & print journal collections and collection services. Library Trends, 51 (3), 2003.

    D.W. King and J.M. Griffiths. Economic issues concerning electronic publishing and distribution of scholarly articles. Library Trends, 43 (4): 713-740, 1995.

    D.W. King and C. Tenopir. Using and reading scholarly literature. In M.E. Williams, editor, Annual Review of Information Science and Technology, volume 34. Information Today, Inc., Medford, NJ, 2000.

    Bruce R. Kingma. The economics of digital access: The early Canadian online project. In this volume.

    B. Klopfenstein. Problems and potential of forecasting the adoption of new media. In J. L. Salvaggio and J. Bryant, editors, pages 21-41. L. Erlbaum Associates, 1989.

    Knight Higher Education Collaborative. Who owns teaching? Policy Perspectives, 10 (4), August 2002. http://www.thelearningalliance.info/Docs/Jun2003/DOC-2003Jun9.1055175454.pdf .

    G. Kolata. Web research transforms visit to the doctor. New York Times, pages A1 & A6, March 2000.

    Thomas Krichel. ReDIF version 1. Available at http://openlib.org/acmes/root/docu/.papers/redif_1.a4.pdf , 2000.

    Thomas Krichel. About NetEc, with special reference to WoPEc. Computers in Higher Education Economics Review, 11 (1): 19-24, 1997. Available at http://netec.mcc.ac.uk/doc/hisn.html .

    Martha Kyrillidou and Mark Young. ARL Statistics 2001-2002: Research Library Trends. Association of Research Libraries, 2003. Available from http://www.arl.org/bm~doc/arlstat02.pdf .

    S. Lawrence. Online or invisible? Nature, 411 (6837): 521, 2001.

    Steve Lawrence and C. Lee Giles. Accessibility of information on the web. Nature, 400 (8 July): 107-109, 1999.

    D. Lenares. Faculty use of electronic journals at research institutions. In Proc. ACRL April 1999 conference, 1999.

    M. Lesk. Substituting images for books: The economics for libraries, 1998.

    M. Lesk. Practical Digital Libraries. Morgan Kaufmann, 1997.

    J. C. R. Licklider. Libraries of the Future. MIT Press, 1965.

    L. Lieberman, R. Noll, and W. E. Steinmuller. The sources of scientific journal price increase. Center for Economic Policy Research, Stanford University, working paper, 1992.

    Wendy P. Lougee. The University of Michigan Digital Library Program: A retrospective on collaboration within the academy. Library Hi Tech, 16 (1): 51-59, 1998.

    Wendy Pradt Lougee. Diffuse Libraries: Emergent Roles for the Research Library in the Digital Age. Council on Library and Information Resources, August 2002. Available from http://www.clir.org/pubs/abstract/pub108abst.html .

    J.M. Lufkin and E.H. Miller. The reading habits of engineers: A preliminary study. IEEE Transactions on Education, E9 (4), 1966.

    H. Lustig. Electronic publishing: Economic issues in a time of transition. In A. Heck, editor, Electronic Publishing for Physics and Astronomy. Kluwer, 1997.

    J. Luther. White paper on electronic journal usage statistics. J. Electronic Publishing, 6 (3), March 2001.

    Clifford Lynch. From automation to transformation: Forty years of libraries and information technology in higher education. Educause Review, 35: 60-68, January/February 2000.

    F. Machlup. Uses, value and benefits of knowledge. In Knowledge, Creation, Diffusion, and Utilization. Sage Publications, Beverly Hills, CA, 1979.

    F. Machlup and K.W. Leeson. Journals, volume 2 of Information Through the Printed Word: The Dissemination of Scholarly, Scientific and Intellectual Knowledge. Praeger, New York, 1978.

    J. MacKie-Mason and H. Varian. Economic FAQ's about the internet. Journal of Economic Perspectives, 8 (3): 75-96, 1994.

    Jeffrey K. MacKie-Mason and Juan F. Riveros. Economics and electronic access to scholarly information. In B. Kahin and H. Varian, editors, Internet Publishing and Beyond: The Economics of Digital Information and Intellectual Property. MIT Press, 2000.

    Jeffrey K. MacKie-Mason, Juan F. Riveros, and Robert S. Gazzale. Pricing electronic access to knowledge (PEAK): A field experiment. In I. Vogelsang and B.M. Compaine, editors, The Internet Upheaval: Raising Questions, Seeking Answers in Communications Policy. MIT Press, 2000.

    R.H. Marks. The economic challenges of publishing electronic journals. Serials Review: Economics of Electronic Publishing, 21: 85-88, 1995.

    M.J. McCabe. The impact of publisher mergers on journal prices: An update. ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, 207: 1-5, December 1999.

    M.J. McCabe. Academic journal pricing and market power. Georgia Institute of Technology, working paper, 2000.

    M.J. McCabe. A portfolio approach to journal pricing. In this volume.

    C. McKnight. Electronic journals: What do users think of them? In S. Sugimoto, editor, Proceedings of International Symposium on Research, Development and Practice in Digital Libraries 1997, ISDL '97, pages 23-27. University of Library and Information Science, Tsukuba, Ibaraki, Japan, 18 November 1997.

    G. McMillan, E. A. Fox, and J. L. Eaton. The evolving genre of electronic theses and dissertations, 1999.

    L. S. Mercer. Measuring the use and value of electronic journals and books. Issues in Science and Technology Librarianship, 2000.

    S.C. Michalak. The evolution of SPARC. Serials Review, 26 (1): 10-21, 2000.

    C. Montgomery and J. Sparks. Framework for assessing the impact of an electronic journal collection on library costs and staffing patterns, 2000.

    C. H. Montgomery and D. W. King. Comparing library and user-related costs of print and electronic journal collections. D-Lib Magazine, 8 (10), 2002.

    C.H. Montgomery. Measuring the impact of an electronic journal collection on library costs: A framework and preliminary observations. In this volume.

    G. Moore. Cramming more components onto integrated circuits. Electronics, 38 (8), April 19 1965.

    Morgan Stanley. Scientific publishing: Knowledge is power. Available from http://www.econ.ucsb.edu/~tedb/Journals/morganstanley.pdf , September 2002.

    D. Nicholas and P. Huntington. Big deals: Results and analysis from a pilot analysis of web log data. In A. Pearce, editor, The consortia site licence: Is it a sustainable model? Ingenta Institute, Cambridge, 2003.

    R. Noll. The economics of information. Journal of Library Administration, 26 (1/2): 47-55, 1993.

    Robert Nozick. The Nature of Rationality. Princeton University Press, Princeton, 1993.

    J.P. Ochs, 2001. E-mail from Ochs, American Chemical Society, to the author, 5 March.

    A. M. Odlyzko. Tragic loss or good riddance? The impending demise of traditional scholarly journals. Intern. J. Human-Computer Studies (formerly Intern. J. Man-Machine Studies), 42: 71-122, 1995. Also in the electronic J. Univ. Comp. Sci., pilot issue, 1994.

    A. M. Odlyzko. Competition and cooperation: Libraries and publishers in the transition to electronic scholarly journals. J. Electronic Publishing, 4 (4): 163-185, June 1999.

    A. M. Odlyzko. Silicon dreams and silicon bricks: the continuing evolution of libraries. Library Trends, 46 (1): 152-167, 1997a.

    A. M. Odlyzko. The rapid evolution of scholarly communication. In this volume.

    A. M. Odlyzko. The slow evolution of electronic publishing. In A. J. Meadows and F. Rowland, editors, Electronic Publishing — New Models and Opportunities, pages 4-18. ICCC Press, 1997b.

    A. M. Odlyzko. Internet growth: Myth and reality, use and abuse. iMP: Information Impacts Magazine, November 2000.

    A. M. Odlyzko. Internet pricing and the history of communications. Computer Networks, 36: 493-517, 2001.

    J. J. O'Donnell. The pragmatics of the new: Trithemius, McLuhan, Cassiodorus. In G. Nunberg, editor, The Future of the Book. Univ. California Press, 1996.

    A. Okerson. Of making many books there is no end: Report on serial prices for the Association of Research Libraries. In Report of the ARL Serials Prices Project. Washington, DC: Association of Research Libraries, 1989.

    Outsell, Inc. Industry trends, size and players in the scientific, technical & medical (STM) market. Information About Information, 3 (18): 1-42, August 2000.

    A. Petersen Bishop. Measuring access, use and success in digital libraries. Journal of Electronic Publishing, 4, 1998.

    John Price-Wilkin. Moving the digital library from 'project' to 'production'. Presented at DLW99 in Tsukuba, Japan. Available at http://jpw.umdl.umich.edu/pubs/japan-1999.html , March 1999.

    A. Prior. Normal service continues — the role of intermediaries in electronic publishing. Learned Publishing, 10: 331-338, 1997.

    Albert Prior. Electronic journals pricing—still in the melting pot? UK Serials Group 22nd Annual Conference / 4th European Serials Conference, 12-14 April, UMIST, Manchester, England, 1999.

    M.L. Rosenzweig. Protecting access to scholarship: We are the solution, 2000. Retrieved from http://www.evolutionary-ecology.com/citizen/spring00speech.pdf on 12 June 2000.

    F. Rowland, C. McKnight, and J. Meadows. Project Elvyn: an experiment in electronic journal delivery, facts and figures and findings. London: Bowker Saur, 1995.

    F. Rowland, I. Bell, and C. Falconer. Human and economic factors affecting the acceptance of electronic journals by readers. Canadian Journal of Communication, 22: 61-75, 1997.

    D. Rusch-Feja and U. Siebeky. Evaluation of usage and acceptance of electronic journals. D-Lib Magazine, 5 (10), 1999.

    T. J. Sanville. A method out of the madness: OhioLINK's collaborative response to the serials crisis, 2000. North American Serials Interest Group proceedings.

    Don Schauder. Electronic publishing of professional articles: attitudes of academics and implications for the scholarly communication industry. Journal of the American Society for Information Science, 45 (2): 73-100, March 1994.

    Carl Shapiro and Hal R. Varian. Information Rules: A Strategic Guide to the Network Economy. Harvard Business School Press, Boston, Mass., 1999.

    D. Shaw and D. Price, editors. The Perils of Oversimplification: What Are the Real Costs of Online Journals?, 1998. ICSU Press Workshop on Economics, Real Costs, and Benefits of Electronic Publishing in Science — a Technical Study.

    M. Shemberg and C. Grossman. Electronic journals in academic libraries: a comparison of ARL and non-ARL libraries. Library Hi Tech, 17 (1): 26-45, 1999.

    N. J. A. Sloane. My favorite integer sequences, 1998.

    SPARC. SPARC collaboration with math journals supports promising publishing model, 2001. Retrieved from https://arl.org/Lists/SPARC-MEDIA/Message/39.html on 5 February 2001.

    M. Spinella. Electronic publishing models and the pricing challenge. In this volume.

    S. Stevens-Rayburn and E. N. Bouton. "If it's not on the Web, it doesn't exist at all": Electronic information resources — Myth and reality, 1998.

    Peter Suber. Open access to the scientific journal literature. Journal of Biology, 1 (1), June 2002.

    SuperJournal. Stakeholder workshops, 1999a.

    SuperJournal. Summary of superjournal findings, 1999b.

    C. Tenopir and D. W. King. Towards Electronic Journals: Realities for Scientists, Librarians and Publishers. Washington, DC: SLA Publishing, 2000.

    C. Tenopir and D. W. King. Communication Patterns of Engineers. J.Wiley/IEEE, 2004.

    C. Tenopir, D. W. King, R. Hoffman, E. McSween, C. Ryland, and E. Smith. Scientists' use of journals: Differences (and similarities) between print and electronic. In M. E. Williams, editor, Proc. 22nd National Online Meeting. Information Today, 2000.

    C. Tenopir, D. W. King, P. Boyce, M. Grayson, Y. Zhang, and M. Ebuen. Patterns of journal use by scientists through three evolutionary phases. D-Lib Magazine, 9 (5), 2003. Available at http://www.dlib.org/dlib/may03/king/05king.html .

    Sharon Traweek. Beamtimes and Lifetimes: The World of High Energy Physicists. Harvard University Press, Cambridge, MA, 1988.

    J. Trithemius. In Praise of Scribes: De Laude Scriptorum, edited with Introduction by K. Arnold, translated by R. Behrendt. Coronado Press, 1974. Original manuscript circulated in 1492, first printed in 1494.

    Pravin K. Trivedi. An analysis of publication delays in econometrics. Journal of Applied Econometrics, 8 (2): 93-100, 1993.

    Ulrich's. Ulrich's International Periodicals Directory. R.R. Bowker, New Providence, N.J., 2000.

    University of Phoenix. University of Phoenix online collections, 2000.

    U.S. Census Bureau. Statistical Abstract of the United States. U.S. Census Bureau, 2000.

    Herbert Van de Sompel, Thomas Krichel, Michael L. Nelson, et al. The UPS prototype project: exploring the obstacles in creating a cross e-print archive end-user service. Old Dominion Computer Science Tech Report. Available at http://openlib.org/home/krichel/upsproto.ps , 2000.

    H.R. Varian. Differential pricing and efficiency. First Monday, 1, 1996.

    G. W. White and G. A. Crawford. Cost-benefit analysis of electronic information: a case study. College and Research Libraries, 59: 503-510, 1998.

    A. Wiles. Modular elliptic curves and Fermat's last theorem. Ann. Math. (2), 141: 443-551, 1995.

    Wisconsin. Measuring the cost-effectiveness of journals: Ten years after Barschall, 1999. Retrieved from http://www.library.wisc.edu/projects/glsdo/cost.html on 12 June 2000.

    M. J. Wright. Constituencies for users: How to develop them by interpreting logs of web site access. In AAAI Spring Symposium on Intelligent Agents in Cyberspace, Stanford, CA, USA, March 22-24 1999.

    B. J. Wyly. Competition in scholarly publishing? What publisher profits reveal. ARL: A Bimonthly Report on Research Library Issues and Actions from ARL, CNI, and SPARC, 200: 7-13, October 1998.