Economics and Usage of Digital Libraries: Byting the Bullet
III. Digital Publishing Economics
In this section of the book several distinguished authors address the economics of scholarly journals in the digital age. Our authors include academics, publishers and leaders of innovative not-for-profit projects. They focus on the economic issues facing publishers and academic libraries, with some attention to other stakeholders.
These diverse authors agree on the fundamental facts. Case (chapter 12) provides a pithy summary: "The library community has been faced with high and ever-rising prices for scholarly resources. A number of factors have contributed to this situation, most fundamentally, the commercialization of scholarly publishing". King and Tenopir (chapter 8) document the rising prices, calculating several different metrics based on one of the most extensive data sets ever collected on scholarly publishing. McCabe (chapter 11) and Case also report evidence on the faster-than-inflation price increases for scholarly journals.
The price increases, particularly from commercial publishers, are not in dispute; the important question is what forces are driving them. In their chapters, McCabe and King and Tenopir analyze data to assess the extent to which different causes explain rising prices. King and Tenopir find that some of the increase in prices is due to rising costs. For example, the number of articles per journal has increased. More subtly, the average number of subscribers for new (often more specialized) journals has been decreasing, necessitating higher average prices per page to recover the fixed costs of publishing a journal. However, King and Tenopir conclude that only part of observed price increases can be explained by higher costs.
McCabe uses multivariate statistical methods to analyze his data on journal prices. Like King and Tenopir, he concludes that costs can explain part but not all of the rise in journal prices. He also finds evidence that increases in quality, which in turn increase production costs, explain part of the price increases. However, McCabe further concludes that commercial publishers have gained what economists call market power, which is the power to raise prices above the level necessary to recover costs and earn a normal rate of profit. He finds that increases in market power can explain a substantial share of the gap between price and cost increases.
Spinella reports on the role of cost increases from a publisher's perspective. Many observers have written that with all-electronic publishing, variable (per copy, or per subscriber) costs are approximately zero, focusing the issue of pricing entirely on the fixed costs of publishing. Spinella points out that this characterization is generally incorrect. For example, all-electronic publications have variable costs from maintaining subscriber records, sending renewal notices and bills, providing subscriber services, and so forth. He also notes that the importance of variable costs differs across journals, depending on such factors as the frequency of publication, the size of their circulation, and whether they distribute globally. Further, digital publishing usually is accompanied by the creation of new reader and author services, many of which generate new variable costs of their own.
To the extent that price increases are driven by rising quality and production costs, libraries and other buyers are getting what they pay for. Improvements in price necessarily depend on system improvements that reduce costs. Each author in this section agrees that the shift to electronic publication offers some hope for tempering costs. On the other hand, new service possibilities available from digital publishing (for example, hyper-linking from bibliographic references to the referenced article) may only be obtained at higher cost (Spinella, chapter 10). In a well-functioning market, library and reader demand should determine the extent to which publishers create new services that require higher prices: if purchasers find the services to be worth less than they cost, competitive publishers will not develop or offer the services.
However, if McCabe and Case are right that a substantial share of price increases is due to limited market competition, then cost reductions alone will not abate the path of rising prices. When the market is not highly competitive, publishers with market power may choose to bundle in more high-priced new services than purchasers actually want, or publishers may simply raise prices above cost for traditional services. McCabe, Case, and Halliday and Oppenheim (chapter 9) each explore the role of competition in the pricing of scholarly journals.
Case describes the Scholarly Publishing and Academic Resources Coalition (SPARC), a project to increase competition, particularly in the market for science, technology and medical journals. SPARC is a collective of libraries and scholarly societies. One of its programs is the development of new journals targeted to compete with especially high-priced commercially published journals. Case argues that pressure from efforts like SPARC has started to have an effect. McCabe conjectures that the characteristics of electronic publishing are especially well-suited to reduce publisher market power.
Halliday and Oppenheim construct an economic/business process model of journal publishing to analyze the impact of different cost-generating activities on prices. They model three different pricing scenarios: a traditional commercial model, a non-commercial author-pays model, and a free market model with payments to and from authors as well as readers. They conclude that, given the cost structure of digital publishing, the traditional professional publisher model is most efficient. However, the sharing of the benefits of cost efficiency between authors, readers and publisher shareholders depends on the competitiveness of the publishing market. If market power is substantial, authors and readers might be better off with less efficient publishing business models in which they receive a higher share of the value created.
Krichel (chapter 13) describes a working example of a non-commercial effort to reduce costs and improve access. Research Papers in Economics (RePEc) is an "open" digital library: open to contribution by third parties, and open to implementation of new user services. The core idea is to create a centralized metadata library that facilitates access to the distributed "library" of scholarly articles (in the field of economics, in this case) available across the Internet. Although the protocols have been implemented and the database has received a large number of contributions, Krichel notes that a number of economic issues remain that will determine the viability of his open library model. In particular, it is not clear whether the incentives for users to contribute are sufficient to establish critical mass in various scholarly fields. There is also no business model to recover the costs of quality control on the metadata repository.
Taken together, the chapters in this section document the facts about rising prices, provide evidence on the causes for these increases, and investigate a variety of approaches to improve the situation. If prices for scholarly publications continue to rise faster than general inflation, there is no question that research libraries will either have to reduce access (that is, acquire access to a smaller fraction of published material), or reduce spending on other valuable library services. These authors have tackled one of the fundamental troublesome puzzles of the digital information age: how do we ensure that access increases rather than decreases during a time when the quantity of information produced is increasing and the technologies for using that information are improving?
8. Scholarly Journal and Digital Database Pricing: Threat or Opportunity?
8.1 Introduction
For over three centuries, scientific scholarly journals have demonstrated remarkable stability. A large number of studies performed during the past few decades have shown their continued use, usefulness, and value. However, two phenomena have evolved over the last thirty years that have the potential either to destroy the scholarly journal system or to substantially enhance its considerable usefulness and value. These two phenomena are the maturation and integration of communication technologies and the economics of the journal system, particularly pricing of traditional journal subscriptions and access to digital full-text databases through site licensing and package "deals". Certainly, the new technologies should, if deployed with care, enhance the journal system (e.g., Tenopir et al., 2003), but contemporary pricing policies have been a greater threat to the journal system. Up to the mid-1990s, rapid and little-understood price rises posed a significant threat to the system; more recently, policies of site licensing and negotiated journal packages have become commonplace even though little is known about their sustainability.
The early pricing policies resulted in substantially reduced personal subscriptions, increased reliance on library access, library prices raised far higher than inflation or increased journal sizes would warrant, and libraries and scientists having to rely more heavily on obtaining separate copies of articles through interlibrary loan, document delivery, preprints, reprints, and photocopies or electronic copies from authors and colleagues. Recently, most academic libraries in the U.S. and many other types of libraries have negotiated licenses with individual publishers, library consortia, and other vendors to obtain access to multiple journals. While there are appreciable benefits to both publishers and libraries of such arrangements (King and Xu, 2003), there are considerable concerns as well (Frazier, 2001). One concern is that negotiation seems to vary from deal-to-deal and it is not at all clear that long-term revenue to publishers will be sufficient. In this chapter, we discuss the early pricing policies and why prices spiraled upward and we show that problems leading to this dilemma are also inherent to the current licensing policies.
This chapter provides some insights gained from analysis of over 15,000 responses from readership surveys of scientists; cost analysis of publishing, library services and scientists' communication patterns; tracking of a sample of scholarly journals from 1960 to 2002; and review of over 600 publications dealing with scientific scholarly journals. This chapter will attempt to dispel some myths concerning communication costs, system participants' incentives, and reasons for increased prices. It will also present perspectives on pricing that might help in an electronic age and provide some suggestions concerning subscription pricing, site licensing, and online access to separate copies of articles.
8.2 Are Scientific Scholarly Journals Worth Saving?
Over the years there have been a number of skeptics regarding the use, usefulness, and value of scientific scholarly journals. However, since the 1950s, there have been over twenty studies that show that scientists in general rely more on journals than any other source for their information, although this is not true for engineers or "technologists" (King and Tenopir, 2000; Tenopir and King, 2004). Consider evidence from surveys of scientists conducted by King Research from 1977 to 1998, the University of Tennessee School of Information Sciences in 2000 and 2001, Drexel University in 2002, and the University of Pittsburgh in 2003. A 1977 national survey of scientists showed that they averaged 105 readings of scholarly journals per scientist per year, and a follow-up survey in 1984 revealed about 115 readings per scientist; several surveys in organizations from 1993 to 1998 yielded combined estimates of 120 readings; and surveys in 2000-2003 resulted in a weighted average of 134 readings, thus suggesting that the amount of reading might have increased over the years.[1] Extrapolated to the entire population of scientists and articles published, these data indicate that the average number of readings per article was about 640 in 1977 and about 900 in the late 1990s. Three studies in the 1960s and 1970s estimated the amount of reading per article by asking sampled scientists to indicate which articles listed on recently published tables of contents they had read. Average readings per article, extrapolated to the population of scientists sampled, showed that psychology articles averaged 520 readings per article (Garvey and Griffith, 1963), economics articles averaged 1,240 readings (Machlup and Leeson, 1978), and Journal of the National Cancer Institute articles averaged 1,800 readings per article[2] (King, McDonald, and Olsen, 1978), or 756,000 readings for the entire volume of 12 issues.
Thus, there is ample evidence that scientists read many scholarly articles and that journals are well read.[3]
Scholarly articles are read for many purposes ranging from supporting specific research projects and teaching to administrative purposes. They are also read by people wanting to keep current in their disciplines. A number of studies have shown the importance of scholarly articles for these and other purposes. Our recent surveys of university scientists show that readings for teaching purposes are rated high in importance (5.10 on a scale of 1-not at all important to 7-absolutely essential) while readings for research are rated even higher (5.32). One-third of the readings are said to be "absolutely essential" to the teaching or research. Similar results are observed in surveys of non-academic scientists, who individually read fewer articles than university scientists but collectively account for about three-fourths of all reading because of their much larger numbers.
Machlup (1979) defines two types of value of the information provided by scholarly journals: purchase value and use value. Purchase value is what scientists are willing to pay for the information in monies exchanged and time expended in obtaining and reading the information. The purchase value expended on scholarly journal information exceeds $5,400 per year per scientist, most of which involves their time spent obtaining and reading the information. In fact, the price paid in scientists' time tends to be five to ten times the price paid in purchasing journals, separate copies of articles, and other journal-related services. Of twenty studies by various researchers that provide estimates of time spent reading, the median time spent is 9.0 hours per month or about 108 hours per year per scientist. Our recent surveys show that scientists annually spend about 130 hours reading scholarly articles, up from 80 hours in 1977. Also, scientists are spending more time obtaining articles because they more often use library-provided articles than their own personal subscriptions (more is said about this later).
Use value involves the outcomes or consequences of using scholarly journal information. Examples of use value from our surveys include evidence of producing work with greater quality, faster, or at a lower cost in time or money. Several studies, dating back to the 1950s, have shown that amount of reading is correlated with productivity. Our surveys established that amount of reading is positively correlated with five indicators of productivity (i.e., outputs and input time measured in five ways) (Griffiths and King, 1993). Another indicator of use value is that scientists whose work has been formally recognized through awards, special assignments, or designation by the personnel department (for our survey purposes) tend to read more than others.[4] This was observed in the 1960s (Lufkin and Miller, 1966) and was invariably observed in 21 of our surveys. Thus, there is also abundant evidence of the purchase and use values of scholarly journals, and one must conclude that any changes in the future should ensure that the use, usefulness, and value of scholarly journals be retained.
8.3 Scholarly Journals Examined from a Systems Perspective
In the late 1970s King Research performed a series of studies for the National Science Foundation on scientific and technical information communication, with particular emphasis on scientific scholarly journals.[5] As part of these studies we identified and characterized all the principal functions performed in the journal system, participants who performed the functions, and hundreds of detailed activities necessary to perform the many functions. For each activity we established quantities of output and amount of resources required (with dollar amounts placed on the resources). We traced the flow of messages transmitted among participants, which, in 1978, numbered in the billions. We also examined all of the activities in terms of the introduction of evolving technologies to assess when comprehensive electronic journals were likely to become commonplace.
As a result of our 1978 systems study we indicated that:
Recent technological advances, which were developed largely independently of the scientific and technical communication, provide all the components of a comprehensive electronic journal system. Such a system would provide enormous flexibility, particularly because individual articles can be distributed in the most economically advantageous manner. Much-read articles may still be distributed in paper form, and infrequently read articles can be requested and quickly received by telecommunication when they are needed (King et al., 1981).
We went on to say that:
This comprehensive electronic journal system is highly desirable and currently achievable. It is believed that within the next twenty years, a majority of articles will be handled by at least some electronic processes throughout but not all articles will be incorporated into a comprehensive electronic journal system.
At that time (1978), some communications researchers scoffed at this "pessimistic" view of when electronic journals would become widespread, and some at NSF were disappointed because other studies forecast much quicker implementation of electronic journals.
One aspect of the systems analysis done at the time was to sum the resource costs applied to all the activities identified in order to establish an overall journal system cost in the U.S. In 1975 we estimated the total amount of resources expended that year on scientific journals to be $5.05 billion (or about $15.6 billion in 1998 dollars, considering increases in resource costs). A reasonable estimate of the corresponding total system cost in 1998 is $45 billion.[6] This systems approach ignores the amount of money exchanged between participants, such as the price paid by scientists and libraries for subscriptions purchased, the price paid for online bibliographic searches, fees paid for document delivery services, and so on. Including such transfer payments would only duplicate the costs of system resources applied by publishers, online vendors, and document delivery services. Thus, the additional cost to the U.S. economy (or scientific community) for processing and using scientific journals was another $5.05 billion in 1975 (or $15.6 billion in 1998 dollars) and $45 billion in 1998. The $15.6 billion (1998 dollars) comes to about $7,000 per scientist or about $69 per article reading. In 1998 we estimated the comparable system cost to be about $7,100 per scientist or $59 per reading.
The 1998 total system cost per scientist ($7,100 per scientist) is sub-divided as follows: authors ($640), publishers ($500), libraries and other intermediaries ($420), and readers ($5,540). Thus, scientists' time spent with writing and reading dominates the total system costs (i.e., 87% of the total costs). The costs per scientist of authorship, publishing, and libraries and other intermediaries have all decreased over time, but readers' cost per scientist has increased. The reader increase in cost per scientist is attributable to an increase in their time spent acquiring and reading articles. The number of personal subscriptions of scientists has decreased by over one-half, with nearly all prior reading from personal subscriptions replaced by reading from library-provided journals. Thus, scientists spend more time obtaining articles, and they also appear to spend more time reading an article (due perhaps to an increase in size of articles as shown later). The decrease in cost per reading is due to relative decreases in library and publishing resources expended.
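The 87% share can be checked directly from the four components quoted above. The short Python sketch below only restates those per-scientist amounts; the variable names are ours and purely illustrative.

```python
# 1998 system cost per scientist (USD), as quoted in the text.
cost_per_scientist = {
    "authors": 640,
    "publishers": 500,
    "libraries_and_intermediaries": 420,
    "readers": 5_540,
}

total = sum(cost_per_scientist.values())

# Scientists' own time: writing (authors) plus acquiring and reading (readers).
scientist_time = cost_per_scientist["authors"] + cost_per_scientist["readers"]
scientist_share = scientist_time / total

print(f"total per scientist: ${total:,}")
print(f"share attributable to scientists' time: {scientist_share:.0%}")
```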
The relative resource expenditures of libraries (and other intermediary services) are down, whether calculated by cost per scientist or cost per reading. The library cost per scientist is down because of relative reduction in library budgets, but also because of efficiencies due in part to library automation, resource sharing (King and Xu, 2003), and replacement of print journals by electronic versions (see Section 8.9). The library cost per reading is down due in large part to the increase in the amount of reading from library-provided journals resulting from the shift from personal subscriptions to library-provided articles. For example, from 1977 to the current era, the number of personal subscriptions declined from 5.8 to 2.4 and number of readings from library collections increased from 15 to 66 readings per scientist.
The relative cost of publishing has apparently also decreased. For example, the cost per page published is down, due in part to use of technologies, increased efficiencies, and increased sizes of journals. The cost per scientist is down, due in part to the factors mentioned above, but also to the fact that there is an average of over three fewer subscriptions circulated per scientist. The publishing cost per reading is also down due, in addition to the factors above, to a greater amount of reading. This discussion of the systems perspective causes us to ask this question: Why have average prices risen by a factor of nearly nine over a period of time during which the relative cost of publishing has actually decreased?
8.4 To Understand Price One Must Understand Publishing Costs
While there have been literally hundreds of articles written about the price of scholarly journals in recent years, very little has been written about the cost of publishing journals. To understand why prices are what they are, one must know about the cost of publishing journals. One reason that costs are not often discussed in the literature is that publishers do not want their competitors to know their costs. Also, costs vary a great deal among journals, depending on the characteristics of journals, such as manuscript rejection rates, number of articles, number of pages, number of issues, and circulation, and on the type of resources used, such as location and experience of editors, technologies applied, and quality of paper. With that concern in mind, we decided to develop a cost model of journal publishing in order to analyze effects of circulation, changes in characteristics of journals over time, and how such factors might affect the price of journals. We formulated a cost model using data we collected for the 1978 journal systems analysis and more recent pieces of information gleaned from the literature. The model has been reviewed by staff from different types of journal publishers, who found it reasonable with the caveats mentioned above. We also compared our model data with other published data and found them a good source of validation.[7]
The cost model consists of five functions or groups of activities as follows:
Article processing including manuscript receipt processing, initial disposition decision-making, identifying reviewers or referees, review processing, subject editing, special graphic and other preparation, formatting, copy editing, processing author approval, indexing, coding, redaction, and preparation of master images.
Non-article processing including many of the same activities involving editorials, letters to the editor, brief communications, and book reviews. It also includes preparation of issue covers (for paper versions), tables of contents, and indices.
Reproduction involving printing, collating, binding of issues, and printing for reprints (none of which is necessary for electronic versions).
Distribution of paper versions involving wrapping, labeling, sorting by zip code, and mailing; distribution of electronic versions including storage and access. Subscriptions maintenance is required of both versions.
Support activities including marketing and promotion, rights management and other legal activities, administration, financing, and other indirect activities.
In 2002 the average U.S. science journal characteristics were estimated to be 10.8 issues, 154 articles, 213 manuscripts submitted, 1,910 article pages, 397 special graphics, 2,215 total pages, and 4,800 subscriptions.[8] The cost model estimates for these functions are $255,897 for article processing, $22,957 for non-article processing, $215,392 for reproduction and distribution, and $197,908 for support, for a total of $692,154. The article processing cost is about $1,660 per article, and the reproduction and distribution cost is about $45 per subscription (without allocation of support costs).
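The total and the two unit costs follow from simple arithmetic on the figures just given. A minimal Python sketch, using only the numbers quoted in this paragraph (the dictionary keys and variable names are our own labels):

```python
# Cost model estimates for the average 2002 U.S. science journal (USD), from the text.
function_costs = {
    "article_processing": 255_897,
    "non_article_processing": 22_957,
    "reproduction_and_distribution": 215_392,
    "support": 197_908,
}
ARTICLES = 154
SUBSCRIPTIONS = 4_800

total_cost = sum(function_costs.values())

# Unit costs quoted in the text (support costs are not allocated to either).
article_cost_per_article = function_costs["article_processing"] / ARTICLES
rd_cost_per_subscription = (
    function_costs["reproduction_and_distribution"] / SUBSCRIPTIONS
)

# Derived here for orientation only: all costs spread over all subscriptions.
total_cost_per_subscription = total_cost / SUBSCRIPTIONS

print(f"total cost: ${total_cost:,}")
print(f"article processing per article: ${article_cost_per_article:,.0f}")
print(f"reproduction and distribution per subscription: ${rd_cost_per_subscription:.2f}")
print(f"total cost per subscription: ${total_cost_per_subscription:.2f}")
```

The per-article figure comes out within a few dollars of the rounded $1,660 quoted above.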
By holding all other journal characteristics and cost parameters constant, we can assess the effects of journal characteristics on the total and unit cost. For example, we find that the cost per hypothetical subscription varies substantially by number of subscribers (see Table 8.1).
Subscribers | Cost per Subscription |
500 | $993 |
1,000 | $519 |
2,500 | $235 |
5,000 | $140 |
10,000 | $93 |
The price necessary to recover costs at 500 subscribers is at least $993 per subscriber, but it decreases sharply up to the 2,500-5,000 subscription range, beyond which the unit cost decreases slowly, approaching an asymptote (which is the incremental reproduction and distribution cost). At 500,000 subscribers the cost is $2 above these costs. Of course, in reality the journal characteristics and cost parameters among journals vary. For example, large circulation journals tend to publish more issues, have expensive photos and graphics, reject more manuscripts, and use more expensive covers and paper. Spinella (this volume) makes this point in discussing publication of large circulation journals, such as Science. However, by holding non-circulation characteristics and cost parameters constant we get a good picture of the effect of size of circulation. Halliday and Oppenheim (this volume) present results similar to those above, but expand on them by showing effects of varying overhead and profit levels (which we call support above).
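The shape of Table 8.1 can be illustrated with a two-parameter simplification in which cost per subscription is a fixed cost spread over subscribers plus a constant incremental cost. This form is our assumption for illustration, not the chapter's full cost model, and the function names are hypothetical. The sketch fits the two parameters to the five table rows by ordinary least squares on 1/n.

```python
# Table 8.1 from the text: number of subscribers -> cost per subscription (USD).
TABLE_8_1 = {500: 993, 1_000: 519, 2_500: 235, 5_000: 140, 10_000: 93}


def fit_fixed_plus_variable(table):
    """Least-squares fit of cost(n) = fixed / n + variable.

    The two-parameter form is an assumption made for illustration;
    the chapter's actual cost model has many more parameters.
    """
    xs = [1.0 / n for n in table]            # regress cost on 1/n
    ys = [float(cost) for cost in table.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    fixed = sxy / sxx                        # slope: cost shared by all subscribers
    variable = mean_y - fixed * mean_x       # intercept: incremental cost per subscription
    return fixed, variable


def cost_per_subscription(n, fixed, variable):
    """Cost per subscription at circulation n under the simplified form."""
    return fixed / n + variable


fixed, variable = fit_fixed_plus_variable(TABLE_8_1)
print(f"fitted fixed cost: ${fixed:,.0f}; fitted incremental cost: ${variable:.2f}")
for n, quoted in TABLE_8_1.items():
    print(f"{n:>6,} subscribers: table ${quoted}, fitted ${cost_per_subscription(n, fixed, variable):.0f}")
```

Under this simplification the fitted intercept plays the role of the asymptote described above, and it lands close to the roughly $45 incremental reproduction and distribution cost reported earlier. The fit is descriptive only; it should not be expected to reproduce every figure derived from the full model.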
Similarly, by varying the number of articles published from, say, 50 to 200, we find that cost per subscriber increases from $77 to $172 (at 4,800 subscribers). The direct article processing costs per article do not vary much—$1,747 per article with 50 articles and $1,651 per article with 200 articles in a journal—but the difference in cost per article is substantial when non-article processing, reproduction and distribution, and support functions are included ($7,375 vs. $4,130). Similarly, the cost per article received by subscribers decreases from $1.54 per article with a 50 article journal to $0.86 per article for a journal with 200 articles. That cost per article decreases as journal size increases may be the reason that publishers have steadily increased the size of journals over the years (from an estimated average of 85 articles per title in 1975 to 154 in 2002).
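The per-article figures in this paragraph can be derived from the two costs per subscriber. The sketch below uses only the quoted values; because those inputs are rounded, the recomputed total cost per article differs slightly from the $7,375 and $4,130 given in the text.

```python
SUBSCRIBERS = 4_800

# Number of articles in the journal -> cost per subscriber (USD), as quoted in the text.
cost_per_subscriber = {50: 77, 200: 172}

results = {}
for articles, cost in cost_per_subscriber.items():
    results[articles] = {
        # What one subscriber effectively pays for each article received.
        "per_article_received": cost / articles,
        # Full publishing cost attributable to each article published.
        "total_per_article": cost * SUBSCRIBERS / articles,
    }

for articles, r in results.items():
    print(
        f"{articles} articles: ${r['per_article_received']:.2f} per article received, "
        f"about ${r['total_per_article']:,.0f} total cost per article"
    )
```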
8.5 What do Average Prices Mean?
Prior to discussing reasons why journal prices have increased so much, it is worth noting that there are several ways in which one can measure average price. In the literature, average price is nearly always calculated as the average price per title. That is, the prices of a set of journals are summed and divided by the total number of journal titles in the set. This average has specific meaning. For example, it makes sense for an individual library to estimate the average price for their collection in this way, particularly for comparison over time. However, from a total systems perspective it makes more sense to measure average price by the price per subscription. That is, one takes the total price of all journals circulated and divides by the total circulation. This average price is much lower than the average price per title and has a much different meaning. The point can be made through a simple arithmetic example, taking into account that low circulation journals have higher prices due to relatively higher fixed costs. In 1995 we observed the following equal number of journals in four ranges of circulation (i.e., quartiles) and the average circulation observed in each quartile as shown in Table 8.2.[9] In the table we also present the price necessary to recover publishing costs at the average circulation and with the other characteristics and cost parameters mentioned in the previous section held constant.
Circulation | No. of Journals | Avg. Circulation | Price |
< 900 | 1,693 | 520 | $747 |
901 - 1,900 | 1,693 | 1,310 | $316 |
1,901 - 5,700 | 1,693 | 3,290 | $145 |
> 5,700 | 1,693 | 18,100 | $53 |
ALL | 6,772 | 5,805 | $315 |
Average price per journal can be roughly estimated by summing the four sets of prices of all journals in each quartile (e.g., $747 x 1,693) and dividing the total of the four quartiles by 6,772 journal titles (recognizing that this estimate is below the real average). As shown, the average cost/price per journal title is $315.
The average price per subscription is estimated by summing the four sets of prices of all subscriptions in each quartile (e.g., $747 x 1,693 x 520) and dividing the total of the four quartiles by the total number of subscriptions in 1995, which is about 39.3 million (i.e., 6,772 journal titles x 5,805 subscriptions per title). The average price per subscription is $96—far less than the price per journal title ($315). Thus, it is clear that the highly skewed distribution of journal circulation means that large circulation journals dominate average price calculated in this way. Yet this measure of average price is more meaningful when considering the impact of price on the U.S. economy or in terms of examining price trends to the entire scientific community, not just to individual libraries.
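Both averages can be recomputed from Table 8.2. The following Python sketch is a minimal illustration using the table's values; the function names are ours.

```python
# Table 8.2 (1995): (number of journals, average circulation, cost-recovery price in USD).
QUARTILES_1995 = [
    (1_693, 520, 747),
    (1_693, 1_310, 316),
    (1_693, 3_290, 145),
    (1_693, 18_100, 53),
]


def average_price_per_title(rows):
    """Each journal title counts once, whatever its circulation."""
    titles = sum(n for n, _, _ in rows)
    return sum(n * price for n, _, price in rows) / titles


def average_price_per_subscription(rows):
    """Each subscription counts once, so large-circulation journals dominate."""
    subscriptions = sum(n * circ for n, circ, _ in rows)
    return sum(n * circ * price for n, circ, price in rows) / subscriptions


total_subscriptions = sum(n * circ for n, circ, _ in QUARTILES_1995)
per_title = average_price_per_title(QUARTILES_1995)
per_subscription = average_price_per_subscription(QUARTILES_1995)

print(f"total subscriptions: {total_subscriptions:,}")
print(f"average price per title: ${per_title:.0f}")
print(f"average price per subscription: ${per_subscription:.0f}")
```

The only difference between the two functions is the weight: titles in the first, subscriptions in the second. That weighting is what pulls the per-subscription average toward the low prices of the largest-circulation quartile.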
8.6 Reasons Why Journal Subscription Prices Spiraled Upward
There is overwhelming evidence that individual scholarly journal prices increased dramatically from 1960 to 1995. For example, we sampled 430 U.S. scientific scholarly journals and tracked them from 1960 to 1995.[10] In this sample, prices rose from an average of $8.51 per title in 1960 to $284 in 1995. One particular concern is that the rate of increase accelerated, even in constant dollars. There are many reasons that prices increased in this manner. Okerson (1989) provides an excellent discussion of some reasons for this phenomenon, and below we present some numeric examples as to why prices per title increased so much.
Some of the high increases in price over these two decades can be explained by inflation and increase in the size of journals. Referring back to the publishing cost model above, one can establish an indication of how much increasing journal size has affected prices over time. As mentioned earlier, the average number of articles published in science journals has increased from 85 to 154 articles per title from 1975 to 2002. Other journal characteristics (e.g., number of issues, pages, special graphics) increased in size as well. By substituting 1975 and 2002 characteristics in the cost model and keeping number of subscriptions and cost parameters at 2002 levels we estimate that the cost per subscription for the 1975 size journal is about one-half that of the 2002 journal. Thus there is evidence that the increased size of journals has resulted in a substantial increase in journal publishing cost and, therefore, the necessity to increase prices accordingly.
A more subtle factor is that the estimated number of scientific scholarly journals increased from 4,447 in 1975 to 6,772 in 1995. Most of the new journals had a small circulation and, therefore, must have a higher-than-average price per title. Consequently, the continued addition of new journals had the effect of increasing average price both per title and per subscription. In fact, journal prices increased at a rate greater than inflation since at least 1960, when there were only 2,815 scientific journals provided by U.S. publishers (Tenopir and King, 2000).
This phenomenon can be documented by examining the 1975 number of journals in the quartile ranges shown for 1995 above and applying the same calculation of average price per journal title and per subscription as shown in Table 8.3. One can see that in 1995 there were more of the smaller-circulation journals and fewer larger ones.
Circulation | No. of Journals, 1975 | No. of Journals, 1995 | Proportion of Journals (%), 1975 | Proportion of Journals (%), 1995 |
< 900 | 880 | 1,693 | 19.8 | 25 |
901 - 1,900 | 805 | 1,693 | 18.1 | 25 |
1,901 - 5,700 | 1,579 | 1,693 | 35.5 | 25 |
>5,700 | 1,183 | 1,693 | 26.6 | 25 |
ALL | 4,447 | 6,772 | 100.0 | 100 |
In order to make unbiased comparisons, we again assume that all cost parameters remain the same and that average prices in the four ranges do as well. We find that the average price per journal title of 1975 journals with their circulation would be about $270 per title compared with $315 in 1995. Thus, this average price per journal would have increased about 17 percent due only to the change in distribution of circulation. A much smaller increase is observed in the average price per subscription, from $91 per subscription for 1975 circulation to $96 in 1995.[11] Note that the average circulation per title did not decrease much from 1975 to 1995, from 6,100 to 5,800 subscriptions, but the median dropped from about 2,900 to 1,900 subscriptions.
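The arithmetic behind these comparisons can be checked with a short Python sketch. The journal counts and average prices are the figures quoted in the text; the variable and function names are illustrative only.

```python
# Number of 1975 journals falling in each 1995 circulation quartile range.
journals_1975 = {
    "< 900": 880,
    "901 - 1,900": 805,
    "1,901 - 5,700": 1579,
    "> 5,700": 1183,
}

total_1975 = sum(journals_1975.values())
shares_1975 = {
    band: round(100 * count / total_1975, 1)
    for band, count in journals_1975.items()
}


def percent_increase(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old


# Average prices under the 1975 and 1995 circulation distributions.
title_increase = percent_increase(270, 315)        # per journal title
subscription_increase = percent_increase(91, 96)   # per subscription
```

The per-title average rises by roughly 17 percent, while the per-subscription average rises by only about 5 percent, because the shift toward small-circulation titles adds few subscriptions.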
The shifts in the distribution of circulation are attributable to more than the influx of new, small-circulation journals. Increased prices had a spiraling effect. As mentioned above, the average number of personal subscriptions per scientist dropped more than 50 percent over a twenty-year period. Had the average remained constant, there would be about 19 million more personal subscriptions than there actually were in 1995. Even at modest personal subscription prices, publishers undoubtedly lost billions in annual revenue from cancelled personal subscriptions, in which case they probably tried to recover the lost revenue through exceptionally high price increases to libraries. They would have been able to do this because library demand is much less sensitive to price changes than personal subscription demand.[12] Both personal and institutional (library) prices jumped dramatically in the late 1970s due to high inflation, fluctuating international exchange rates, and other factors. When this happens, subscriptions can decrease even though the number of scientists interested in a discipline continues to increase. With small-circulation journals, decreases in circulation result in an accelerated increase in cost per subscription. For example, if circulation decreases by 100 subscribers from a 2,500 level, the cost to publishers at 2,400 subscribers would be $6 more per subscriber. However, a 100-subscriber decrease from 500 to 400 subscribers would require a cost increase of $186 per subscriber in order to recover costs. Examples of required cost increases are outlined in Table 8.4:
Circulation Decrease | Required Cost Increase |
2,500 to 2,400 | $6 |
2,000 to 1,900 | $8 |
1,500 to 1,400 | $18 |
1,000 to 900 | $41 |
500 to 400 | $186 |
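The accelerating pattern in Table 8.4 is what one would expect when a largely fixed cost is spread over fewer subscribers. The Python sketch below illustrates that mechanism only; it is not the chapter's cost model. The fixed cost of $370,000 is an assumption chosen for illustration, as is the simplification that the entire increase comes from fixed-cost spreading. The results are of the same order as the table without matching every row.

```python
# Hypothetical fixed (circulation-independent) cost per journal per year.
ASSUMED_FIXED_COST = 370_000.0


def required_increase(old_circulation, new_circulation,
                      fixed_cost=ASSUMED_FIXED_COST):
    """Extra cost per subscriber needed to recover the same fixed cost
    after circulation falls from old_circulation to new_circulation."""
    return fixed_cost / new_circulation - fixed_cost / old_circulation


# The same 100-subscriber drops listed in Table 8.4.
steps = [(2500, 2400), (2000, 1900), (1500, 1400), (1000, 900), (500, 400)]
increases = [required_increase(old, new) for old, new in steps]

for (old, new), extra in zip(steps, increases):
    print(f"{old:>5,} to {new:>5,}: ${extra:,.0f} more per subscriber")
```

Because the per-subscriber share of a fixed cost varies with the reciprocal of circulation, the same loss of 100 subscribers costs far more at 500 subscribers than at 2,500.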
Thus, the accelerated publishing cost increases can result in corresponding price increases and further decreases in circulation, leading to higher costs and in turn, by necessity, to spiraling prices. Since personal subscriptions are much more sensitive to price changes than library subscriptions, the spiraling effect was initially observed with personal subscriptions.
Even with these reasons for the price increases of the past few decades, other factors must contribute as well. One explanation is that publishers have grown substantially in terms of the number of journals published. Some of this is due to publishers starting new journals and "splitting" journals into two or more when they increase in size, although the trend in recent decades has been to let them grow in size. Another factor has been growth through mergers. McCabe (this volume) provides evidence that such growth results in higher prices of journals due to market power. We believe that labor-intensive organizations such as publishers tend to have relatively higher support costs as they grow in size. In our cost model for 2002, we estimated support costs to be about $198,000 or 29 percent of all costs. Others have speculated that commercial journal publishers are making an exorbitant profit by increasing prices, although this has yet to be proven for all commercial publishers. Furthermore, net revenue may also be positive for some society and other non-profit publishers. Case (this volume) emphasizes the importance of competition among publishers in order to minimize the potential for monopolization of the system.
8.7 Factors That Affect Demand
Clearly, demand for scientific journals is affected by price, but other factors affect demand as well. Scientists are willing to pay more for better journal attributes such as special electronic journal features, quality, speed of publishing, comprehensiveness and relevance of articles, and reputation of authors. In fact, studies in the 1970s suggest that such attributes were more important at that time than price. Our studies have shown that availability and relative cost of alternative sources of information determine to a large degree whether or not scientists and libraries will purchase journals. For scientists there are three types of alternative information sources. One alternative, discussed in Odlyzko (this volume), involves information from other research that has led to the research reported in an article or from near equivalent research done by others. A second alternative source exists because research results are often reported via a number of different channels, such as discussions, presentations, conference proceedings, technical reports, patents, and books, in addition to journal articles. A third alternative source involves the many distribution means and media in which journal articles are found. Alternative distribution means from which scientists can choose include personal subscriptions, library subscriptions, and separate copies of articles such as preprints, reprints, interlibrary loans and document delivery, and copies provided by colleagues, authors, and others. These distribution means can be in paper, electronic, or microform. The point is that numerous combinations of distribution means and media are used by scientists based on their assessment of availability and relative access costs.
Sources of articles that are read have changed dramatically over the years as shown by the proportion of readings from three sources in Table 8.5:
Source of Article | Proportion of Readings, 1977 | Proportion of Readings, 1993-1998 | Proportion of Readings, 2000-2003 |
Personal Subscriptions | 68.4% | 27.5% | 31.7% |
Library-provided | 14.7% | 55.0% | 52.7% |
Other | 16.9% | 17.6% | 15.6% |
Total | 100.0% | 100.0% | 100.0% |
Clearly, scientists are reading less from their personal subscriptions, which undoubtedly is due to their subscribing to fewer journals (5.8 per scientist in 1977 to 2.4 in 2000-2003). Library-provided articles have been the alternative source of choice. The proportion of readings from other sources (e.g., shared department collections, colleagues, and authors) has remained consistent over the years. Few of these readings are currently from author web sites or preprint archives.
Our cost studies show that there is a break-even point in the amount of reading over which it is less costly to subscribe to a journal and below which going to the library or author source is less expensive. The break-even point, of course, is higher with higher prices. By knowing the distribution of sources among journals, we have determined the sensitivity of demand to personal subscription prices. We have also shown that scientists' time is an important component in the cost equations, and that scientists generally behave in an economically rational manner in deciding whether or not to purchase a journal. For example, distance to the library also affects the break-even point and the purchase of journals. As corroborating evidence, we have observed that
Scientists close to libraries purchase fewer personal subscriptions than those further away (e.g., 1.8 subscriptions per person for those less than ten minutes away versus 2.6 for those further away).
Scientists close to libraries and shared department print collections read more from these sources than from personal subscriptions (e.g., 91 percent of readings by those less than 5 minutes away; 65 percent for those 5 to 10 minutes away; 43 percent for those more than 10 minutes away).
Even with availability of electronic personal subscriptions, most scientists prefer to subscribe to print versions. This may be because, as we have observed, it takes them less time to browse current print journals than electronic versions. However, when library journals are available online, scientists prefer to browse these journals online because it saves nearly 15 minutes per reading by not having to go to the library to browse or obtain older articles.
It is clear that the relative cost of alternative sources is important and that scientists' time is an essential component of cost that must be kept in mind. Now that scientists can obtain some copies of articles online, the choice is complicated somewhat. However, as will be discussed later, amount of reading from a journal and scientists' time both remain dominant factors in the decision.
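The break-even reasoning above can be made concrete with a small Python sketch. This is a simplified illustration, not the cost equations used in our studies: it assumes the only per-reading cost is the scientist's time, and every number in it (the hourly rate, the minutes per reading, and the subscription prices) is hypothetical.

```python
def reading_cost(minutes, hourly_rate):
    """Cost of a scientist's time for obtaining one reading."""
    return minutes * hourly_rate / 60


def break_even_readings(subscription_price, own_copy_cost, library_cost):
    """Annual readings above which a personal subscription is cheaper
    than obtaining each reading from the library."""
    saving_per_reading = library_cost - own_copy_cost
    if saving_per_reading <= 0:
        return float("inf")  # the library is never more costly per reading
    return subscription_price / saving_per_reading


HOURLY_RATE = 60.0                       # hypothetical, dollars per hour
own_copy = reading_cost(2, HOURLY_RATE)  # reading from a copy at hand
near = reading_cost(7, HOURLY_RATE)      # library a few minutes away
far = reading_cost(17, HOURLY_RATE)      # library further away

break_even_near = break_even_readings(100.0, own_copy, near)
break_even_far = break_even_readings(100.0, own_copy, far)
break_even_near_pricier = break_even_readings(200.0, own_copy, near)
```

Under these assumptions a scientist near the library needs 20 readings a year to justify a $100 subscription, and 40 readings at $200, while one further away needs fewer than 7. This matches the direction of the observations above: a higher price raises the break-even point, and greater distance lowers it.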
Libraries are faced with similar choices between purchasing (in paper or electronic media) or relying on obtaining separate copies of articles. The amount of reading of specific journals, their price, and the cost of obtaining separate copies are all important factors which should play a role in decision-making.[13] Over time, scientists generally know how much they will read a journal, but it is more difficult for libraries to establish the extent to which individual journals are used, particularly with electronic journals. With print versions, common practice is to ask library users to leave journal issues and bound volumes on the table to be counted when re-shelved (or to use circulation bar codes). A weakness in this method of observation is that use of an issue (or bound volume) may involve reading of several articles, and all readings should be counted when deciding between purchasing and obtaining separate copies of articles. However, reasonable adjustments can be made to the use data.
8.8 What Are We Really Buying?
We mentioned earlier that scientists consider journal attributes to be important in their decision-making process and that availability and relative costs of alternative sources of information are important as well. Another perspective is that scientists are buying two product components: (1) the information contents and their attributes and (2) combinations of distribution means and media. With traditional scientific scholarly journals (and articles) the information contents and attributes remain the same regardless of combination of distribution means and media used.[14] Furthermore, article processing cost required to provide the information contents is essentially the same regardless of distribution means and media. That is, regardless of where scientists obtain articles (from personal subscriptions in paper or electronic medium, library-provided articles in paper or electronic medium, or in separate copies from a database, colleague, or author), the article processing cost is about the same for all distribution alternatives. Thus, one can ignore the article processing costs and focus on the costs of the alternative distribution means and media.
First, just a note of clarification concerning the article processing costs. In the literature one finds widely varying estimates of these costs, say, from $400 per 20-page article (Harnad, quoted in Halliday and Oppenheim, this volume) to $8,000 per article in mathematics journals (Odlyzko, 1995). The lower estimates tend to be made by those publishing exclusively electronic journals and who are strong advocates for doing away with the paper medium. Yet, in a sense, making such cost comparison is a moot point because journals in which costs are as low as $400 per article could just as easily be distributed in paper issues at the additional cost of reproduction and distribution (i.e., about $40 to $50 per subscription). Thus, in order to at least break even, the price that publishers charge would have to recover two components of their cost: (1) article processing to provide information content (i.e., anywhere between $400 and $8,000 per article) and (2) the cost of distribution means/media of the version preferred by users. Obviously, distribution cost by electronic media is negligible, whether through access by subscription or by separate copy of articles. Paper distribution of subscriptions tends to be in the $40 to $50 per subscription range, compared with paper distribution by interlibrary loan or document delivery, which tends to be in the $15 to $30 per article range (see also Spinella, this volume).
Thus, based on the added $40 to $50 per subscription of paper distribution, it might seem that electronic distribution would always be the alternative of choice. However, when amount of reading and costs to users other than the price paid are taken into account, the choices are not so clear. For example, most of the readings of current articles are identified through browsing for the purpose of keeping up with the literature. Assuming the $50 paper distribution cost and that a scientist reads 50 articles from a year's subscription, the distribution component of the price would cost the scientist only $1 per reading versus near zero cost for electronic access. Yet when the cost of scientists' time for browsing and equipment are included, it appears that the paper version costs less per reading or is very close to that of the electronic version. Other aspects, then, of the two versions could prevail in decision-making which may explain why scientists overwhelmingly choose print over electronic personal subscriptions, but electronic over print for use of library collections.
Similar arguments can be made for library decisions concerning purchase of paper or electronic subscriptions or access to separate copies of articles. Here the unit cost per reading of paper distribution can also be negligible because reading is in the hundreds for some journals. Thus, again, libraries can choose one or both versions depending on factors other than the price paid and the cost of processing electronic or print issues.
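The per-reading arithmetic in the last two paragraphs is simple enough to sketch in Python. The $50 distribution cost and the 50 readings per personal subscription come from the example above; the figure of 300 readings for a library subscription is a hypothetical stand-in for "in the hundreds," and the function name is illustrative.

```python
def distribution_cost_per_reading(distribution_cost, readings):
    """Paper distribution cost of one subscription spread over its readings."""
    if readings <= 0:
        raise ValueError("readings must be positive")
    return distribution_cost / readings


# Personal subscription: $50 distribution cost, 50 readings a year.
personal = distribution_cost_per_reading(50.0, 50)

# Library subscription: same cost, a hypothetical 300 readings a year.
library = distribution_cost_per_reading(50.0, 300)
```

The distribution component is $1 per reading for the personal subscription and well under 25 cents for the library copy, which is why time and other user costs can outweigh it in the choice of medium.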
Of course, publishers do not distinguish between the information content and distribution components of price. However, Harnad and others have suggested that authors or their funders pay for the information content (i.e., article processing) and then journals would be "free," since articles would be distributed electronically (Halliday and Oppenheim, this volume). This suggestion ignores the potential desirability of the paper distribution medium that might be less expensive to some users and/or preferred for some other reason.
The point is that there is some merit in distinguishing between the information content and distribution components of costs/prices. The article processing costs have remained relatively stable or perhaps decreased some over the years, and these costs are now recovered primarily by library budgets versus an earlier combination of lower library payment and payment by scientists through subscription albeit often from discretionary funds provided by their employers. This transfer of cost recovery from scientists to libraries resulted in publishers being publicly criticized or blamed for spiraling prices, libraries paying more for less information, and scientists paying more in the scarce resource of their time. Funders of the scientists and libraries are questioning the whole process, even though in fact they may be paying less in cost per reading considering all resources expended.
One can make a strong argument for author funders paying the information content costs since they already pay for authors' time. The 2003 survey at the University of Pittsburgh yielded an estimate of 95 hours of scientists' time per article authored. Thus, their funders appear to pay far more than the cost to publishers in processing articles. At least two initiatives are trying this approach to publishing. The Public Library of Science proposes to charge $1,500 per article (not too different from our $1,660 article processing model cost above) and BioMedCentral proposes a $500 per article fee (with some institutional membership alternatives).
However, for such initiatives to be widely accepted by all system participants, they must be convinced of the economic incentives involved. We believe that this can be achieved by understanding the flow of funds among the participants. For example, preliminary analysis of the flow of funds from sources (e.g., government, universities, industry, etc.) to organizational R&D performers and then to authors and readers suggests that author funds come roughly from the following sources: industry (25%), government (33%), foundations (7%), and solely universities (35%). However, readership sources of funds are not nearly equally allocated (i.e., solely university [20%]; universities funded through external sources [4%]; and solely non-university [76%]).
There are other important aspects of the flow of funds as well. For example, where do publishing R&D funds come from: government, foundations, commercial investors, and so on? What is the international "balance of information" determined by authorship and reading? For example, the 2003 University of Pittsburgh survey shows that 9 percent of articles read by these scientists are authored by US scientists, 24 percent by non-US scientists, and 7 percent are collaborations between US and non-US scientists.
8.9 Era of Site Licensing and Package Arrangements
The previous sections have dealt largely with the traditional journal system and pricing policies. We have tried to describe the journal system environment and what led to spiraling journal prices in this environment. Recently, however, with the growth of electronic publishing, publishers and libraries have taken a new approach to their participation in the system through site licenses involving multiple journal packages over an extended period of time, say up to five years. This is a form of economic bundling. The multiple journal packages are sometimes negotiated directly between a library and a publisher, but more often libraries have formed or used existing library consortia to negotiate arrangements between groups of libraries and publishers, or libraries have made arrangements through aggregators. Such arrangements have proven to be beneficial in many ways to both libraries and publishers, not the least of which is that libraries can plan their budgets more accurately (often with lower prices) and publishers can build a steadier revenue flow (King and Xu, 2003).
An example is given below concerning the many sources of journals used by libraries in electronic journal acquisition. In 1998, the medium-sized W.W. Hagerty Library (Drexel University) had gone through a phase in which many of its high-price core journals had been cancelled and its acquisition was down to about 1,700 titles averaging a price of $120 per title. The new Library Dean, Carol Montgomery, and the university administrators decided to migrate to a nearly all-electronic journal collection. In fact, by 2002 the Library acquired only 370 print journals and 8,600 unique electronic journal titles (Montgomery and King, 2002; Montgomery, this volume). The Library made several different arrangements that are categorized as follows:
Individual subscriptions. Almost always purchased from a subscription agent (e.g., Wiley titles, specialty design arts titles).
Publishers' packages. May or may not be a part of a consortium or from the publisher directly (e.g., Science Direct, Kluwer titles).
Aggregator journals. From vendors that provide access to different publishers' journals. The aggregators do not drop content, only add (so far). The collections started as full-text content and added searching (e.g., JSTOR, MUSE).
Full-text database journals. Provide access to electronic journals from different publishers but do not make title or issue level access available (except ProQuest). Examples are WilsonSelect and Lexis/Nexis. Titles are added or removed regularly according to the database vendor's contracts with publishers. They often have an embargo on current issues of six months or more. There is considerable overlap among the journals in these collections, and between the full-text database journals and the other two types.
This example demonstrates the complexity resulting from site licensing and the various kinds of arrangements that can be made.
These arrangements meant that there was an overlap in electronic titles acquired (e.g., 13,500 total titles, but only 8,600 unique titles in 2002). As a result, many acquired electronic journals are not used and some of the cancelled high price journals have very high use (for other observations see Davis, 2002; Nicholas and Huntington, 2003; Sanville, 2000). The price per title varied among the four types of arrangements made: $432 per title for individual electronic subscriptions; $134 per title for publishers' packages; $60 per title for aggregator journals; and $6 per title for full text database journals. However, the migration to electronic journals has affected library costs in many more ways than the price paid for the journals (King et al., 2003; Montgomery and King, 2002). The library operational costs and staffing patterns have shifted. For example, the electronic journal collection has required higher costs for collection development for negotiation, training of staff and users, reference support, and equipment and systems. On the other hand, print input processing and space costs are down, as are reshelving, photocopying, and directional reference costs. On balance, overall operational costs are less for the electronic journal collection than for the print collection.
A particularly revealing way to examine the effects of an electronic journal collection is to compare the cost per use of the alternative collection services; that is, access to the electronic, current periodicals and bound volume collections. Drexel obtained publisher and vendor online use statistics and maintained its own server use counts by journal title. Drexel also observed reshelving counts for the current periodicals and bound volume collections. However, there are well documented flaws with such methods, and thus measured electronic use is not fully comparable to measured print use. We also obtained estimates of the amount of actual reading from user surveys that, while flawed as well, at least provide a common measure of use for the three access services (King and Montgomery, 2002). The costs per reading (including price paid and operations) are: $2.00 per reading of the electronic collection; $3.90 per reading of current periodicals; and $23.50 per reading of bound volumes. One particularly important cost is to users who may save as much as 24 hours per year per person by having external access to library journals.
The Drexel situation is unique in that the library migrated to a nearly all-electronic collection. Most libraries are not doing this, but rather depend on some duplication of print and electronic collections due to concern about the viability of long-term archives. The problem with large duplication is that electronic collection use dominates (i.e., over 80% of library use in Pittsburgh and Drexel). Thus, the cost per use increases substantially for print collections. In fact, if Drexel had also continued its core print collection it probably would cost about $7.80 per reading versus $3.10 under the strategy chosen by Drexel. While there are journals in the large electronic collection that are infrequently read, the overall subscription and processing costs of the electronic collection are less than the costs would have been had Drexel continued its core print collection and not acquired an electronic collection.
Thus, the impact of site licensing and multiple journal arrangements appears to be highly advantageous to libraries and their users. However, the long-term advantages to publishers are not as clear. Ultimately, as with single subscriptions, publishers must recover the high cost of processing articles and any other related activities. Generally, declines in reproduction and distribution costs of print journals have been counterbalanced by extensive computer and systems costs, so that large costs still must be recovered. The question then becomes whether the many, varied license arrangements can produce sufficient revenue over time to cover these costs. While long-term licenses help reduce revenue volatility, there is no guarantee that the license policies provide the solution to the library and publisher problems.
8.10 Some Alternative Pricing Policies
One way in which the two cost/price component approach can be addressed is with site licenses. We have suggested one possible scheme to achieve this type of site license, as detailed below:
The license would cover the price paid for all journals provided to the organization by the publisher, regardless of whether the organization's library, department, or any employee subscribes to the journal.
The library and publisher would establish the current subscription cost of all print subscriptions to the publisher's journals in the organization.
The library would estimate the total readership in the organization of the currently purchased journals and estimate the subscription cost per reading (i.e., current revenue divided by total readings).
The first annual access cost would be this current total subscription amount.
Any electronic access to currently purchased print journals would be free. Electronic access to any other journals available from the publisher would be at the calculated cost per reading plus some allocated support costs.[15] Distribution of paper issues from any of the journals would be at the reproduction, distribution, and allocated support costs.[16]
During the first year, each access to the articles would be counted electronically and used as a basis for future charges on a cost per reading basis.[17]
The publisher must agree to ensure future access to all the journals covered by the term of the agreement, thus permitting the library to discard all relevant paper issues.[18]
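The pricing steps in this scheme can be sketched in a few lines of Python. This is an illustrative reading of the scheme rather than a specification: the class and field names are hypothetical, all dollar amounts and reading counts are invented for the example, and the allocated support cost is assumed to be charged per reading. Charges for distributing paper issues are left out.

```python
from dataclasses import dataclass


@dataclass
class SiteLicense:
    current_subscription_total: float  # current spend on the publisher's print journals
    readings_of_subscribed: int        # estimated readings of those journals
    support_cost_per_reading: float    # assumed per-reading support allocation

    @property
    def cost_per_reading(self) -> float:
        """Current revenue divided by total readings."""
        return self.current_subscription_total / self.readings_of_subscribed

    def first_year_access_cost(self) -> float:
        """The first annual access cost equals the current subscription total."""
        return self.current_subscription_total

    def charge_for_other_journals(self, readings: int) -> float:
        """Electronic readings from journals not currently purchased."""
        return readings * (self.cost_per_reading + self.support_cost_per_reading)


# Hypothetical organization: $200,000 in subscriptions, 100,000 readings a year.
example = SiteLicense(200_000.0, 100_000, 0.25)
base_cost = example.first_year_access_cost()
extra_charge = example.charge_for_other_journals(1_000)
```

With these invented figures the subscription cost per reading is $2.00, so 1,000 readings from journals outside the current subscriptions would add $2,250 to the $200,000 first-year access cost.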
This type of site license provides advantages to every participant. While libraries and their constituents pay the same amount to publishers as they do now, they achieve considerable savings in input processing, storage and maintenance (e.g., approximately $90 per subscription for a large library and $125 for a small one). They also save an estimated $1.43 per reading by avoiding current reshelving, directional reference and photocopying costs which, for a frequently-read journal, can be as much as the subscription price. Libraries also save on interlibrary borrowing or document delivery costs from journals in the publisher's database that they did not purchase. Finally, the library has the option to retain certain current periodicals or department collections in paper. These savings exceed any advantages that might have been achieved from reduced electronic journal prices.
Publishers have the advantage of retaining any cost savings they might obtain from electronic publishing, plus they receive additional revenue from distribution of electronic separates, either from their digital databases or royalties from document delivery services, that previously took place outside of their control.
Readers benefit by having the choice of obtaining articles in paper or electronic versions, both at substantial savings in their time and to their parent organizations. In other words, by this kind of negotiation, publishers win, libraries win, readers win, and funding sources win.
This kind of agreement, of course, may have downsides, but it is given to demonstrate the need to arrive at arrangements that can be beneficial to all participants in order to end the adverse effects of traditional pricing strategies.
Another pricing approach is to extend current price differentiation to reflect potential readership by purchasers. Varian (1996) argues that small niche markets, which accurately describe most scholarly publishing, are generally not well served if the producer is required to charge a uniform, single price. As mentioned earlier, purchasers/users always have alternative sources available to them if cost per reading is too high. Thus, amount of reading serves as a useful means for identifying classes of purchasers for differentiation. In fact, negotiating "bundles" of journals can achieve this objective. Furthermore, electronic journals provide a useful vehicle for charging on a transaction or potential transaction basis.
In another vein, Getz (1999) has suggested that readers be given personal debit accounts with libraries to access separate copies of articles. This would permit scientists to order separate copies from services depending on attributes of speed, image, quality, and accessibility that are provided at appropriate prices. This interesting notion, of course, can be extended to subscriptions in print or electronic media and other related services as well. Getz feels that such an account would end up serving users more effectively and relieve libraries of some clerical activities. The examples given involve academic libraries but are even more feasible in a special library environment.
Several alternative approaches to distribution that will require careful pricing policies are presented by others in this book. For example, Halliday and Oppenheim (this volume) discuss three alternative models: one that follows traditional print, without the reproduction and distribution of print; one suggested by Harnad in which authors bear the article processing costs by producing and archiving the articles and providing them free of charge on the web (although recently he advocates institutional archiving); and a free-market model suggested by Fishwich and colleagues. Hunter (this volume) and Gazzale and MacKie-Mason (this volume) explore results of the PEAK experimentation. Hunter presents some innovative approaches to pricing and their advantages and disadvantages. Gazzale and MacKie-Mason examine three access products, how they are used, and what they cost users. Case (this volume) discusses the SPARC initiative and argues its merits.
All of these and other approaches warrant detailed examination, but one must keep in mind that the scholarly journal system has been successful because it has achieved certain minimal objectives, including
serving as a means of communication of new, peer-reviewed, and edited information. Thus, the information should be trustworthy and, to the degree possible, supported by other research findings;
being readily available to readers and accessible to an unlimited audience beyond the author's primary or immediate community;
providing permanent, locatable, and retrievable archives for the information, since many articles are read years after they are published;
continuing to provide alternative distribution means and media so that authors and readers can choose from alternatives that satisfy their specific needs and requirements, particularly to minimize their time and effort;
protecting against plagiarism, copyright ownership violation, and unauthorized modification or altering of the record of ideas, discoveries, and hypotheses tested;
properly conveying the concept of prestige and recognition for authors, their research, and their institutions.
Any proposals for changes in pricing policies or other modifications in the scholarly journal system should take such desirable objectives into account. Then the system use, usefulness, and value will be maintained and future pricing can be an opportunity and not a threat.
Notes
1. Surveys involved national probability samples of scientists (1977, 1984), audiences of Science and the Journal of the National Cancer Institute, and samples of scientists in organizations such as the National Institutes of Health, AT&T Bell Labs, Oak Ridge National Lab, The Johns Hopkins University, University of Tennessee, Drexel University, University of Pittsburgh and American Astronomical Society members. There may be some bias in organization surveys because the organizations are self-selected.
2. Estimates of readership of articles by this survey method are in fact biased on the low side because they miss readings that take place after the survey responses, they do not include readings of separate copies of articles (over 100 million currently), and they miss other article distribution means.
3. All uncited data come from Tenopir and King (2000) or are new, unpublished results.
4. Of course, it may be that intelligent professionals both read more and receive more recognition for their work, with the recognition owing to their intelligence and not necessarily to how much they read. Regardless, the finding shows that this resource is important to them.
5. For example: King, D.W., D.D. McDonald, N.K. Roderer, and B. Wood. 1976. Statistical Indicators of Scientific and Technical Communication (1960-1980): Vol. 1, A Summary Report. GPO 083-000-00295-3; and King, D.W. and N.K. Roderer. 1978. Systems Analysis of Scientific and Technical Communications in the U.S.: The Electronic Alternative to Communication Through Paper-Based Journals. NTIS: PB281-847.
6. The increase in total cost is largely attributable to an increase in estimated number of scientists who are active in research, teaching, and/or other endeavors that involve reading scholarly journals; i.e., 2.23 and 6.38 million scientists in 1975 and 1998 respectively. Estimates of number of scientists are inexact (see Science & Engineering Indicators-2000, p.3-3 to 3-5).
7. (e.g., Halliday and Oppenheim this volume, Holmes 1997, Marks 1995, and Shaw and Price 1998)
8. The cost model also included 20 fixed and variable cost parameters such as setup costs associated with each issue, cost per page of editing and proofing.
9. We use 1995 data in Tables 8.2, 8.3 and 8.4 because we had better data on circulation in 1995. Also, the introduction of licenses and negotiated packages of journals has diminished the meaning and count of circulation. In 2002 we estimate average circulation to be about 4,800 subscriptions.
10. The tracking process took into account births, deaths, and splitting of journals into two or more journals.
11. There is a small distortion in the 1975 average circulation in that calculation from the data gives 6,300 subscriptions per title, but the average calculated from the sampled journals was 6,100.
12. See Tenopir and King (2000) for detailed evidence of this phenomenon.
13. Detailed examples of economic break-even points are given for decisions with personal subscriptions vs. use of the library and library subscriptions vs. obtaining separate copies in Tenopir and King 2000.
14. Of course, there are some attributes achievable through technology, such as links to back and forward citations, searchable databases, numeric data sets, moving graphics, and so on (Halliday and Oppenheim, this volume; Tenopir et al., 2003).
15. Support costs vary greatly among publishers. Our average is 29% above direct article processing costs. Halliday and Oppenheim (this volume) present other amounts.
16. We have observed allocated support costs of about 15% on direct reproduction and distribution costs.
17. Of course, one must establish what constitutes a "reading" based on electronic use as pointed out in Odlyzko (this volume).
18. The question of archiving journal articles is a contentious one between libraries and publishers, but it must ultimately be resolved. Some are proposing institutional archiving (see, for example, Harnad's September Forum).
9. Economic Models of Digital-Only Journals
We are exploring economic aspects of digital-only journals using Ithink Analyst, a modelling software package. We have produced three models and have used simulations to test model sensitivities. We will first describe some background to the models. We will then describe the software tool and how we used it. We will then describe each model in turn and, finally, describe our plans to further develop models of digital journal production and delivery.
9.1 Background
Much development of digital journals, especially digital parallels of print journals, has been conducted by commercial publishers. Their pricing models do nothing to address the serials crisis. More innovative pricing models have been developed by stakeholders from within the higher education (HE) community (e.g., Harnad and Hemus, 1997; Harnad, 1995b; Fishwick et al., 1998; Harnad, 1996). They seek an effective and affordable system for disseminating peer-reviewed scholarly articles. Their models often bypass commercial publishers; in other words, the journals are produced by the HE community. Proponents of these models claim that digital publishing can be significantly cheaper than print publication. They argue that as much as 70% of the total cost of journal production and distribution is incurred by printing and distributing print copy and that this is saved in a digital environment (Harnad, 1995; Duranceau, 1995; Harnad and Hemus, 1997; Harnad, 1996). This is contested by publishers, who claim that variable costs, including print and distribution, account for only 20-30% of the total (Garson, 1996; Arjoon, 1999; Noll, 1993; Rowland et al., 1995; Fisher, 1997). Some of the difference between these positions is related to the level of functionality that writers assume is necessary.
Proponents of alternative models argue that many publisher functions are unnecessary. Their models are often based on unsophisticated text articles produced at significantly lower cost. This approach can be criticised for two reasons. First, journal users expect additional functionality (Elsevier Science, 1996; SuperJournal, 1999a,b). They anticipate that digital journals will allow them to work more efficiently. Users consider core features to include the ability to browse, search and print, good system performance, critical mass and currency, and the facility for seamless discovery and access (SuperJournal, 1999a; McKnight, 1997; Electronic Publishing Services Ltd, EPS Ltd; Armstrong and Lonsdale, 1998; Butterworth, 1998; Jenkins, 1997; Rowland et al., 1997; Fletcher, 1999; Prior, 1997; Rusch-Feja and Siebeky, 1999; Petersen Bishop, 1998; SuperJournal, 1999b). User acceptance is essential if digital journals are to succeed.
The second criticism is that the elimination of some of the filtering and organisation that is traditionally done by publishers increases the work of librarians and end users. The net effect on the academic community may be increased cost. For these reasons, we did not study an end product consisting of unsophisticated text. Our models assume the core level of functionality that users demand. The development and inclusion of this enhanced functionality requires technical skill that is expensive. Publishers claim that the additional costs more than compensate for any savings from print and distribution. They argue that digital journals cost at least as much to produce and distribute as print journals.
It is difficult to compare the cost of digital and print journal production and distribution. Publishers are reluctant to disclose costs. Even if they did so, it would be difficult to compare journal costs across companies because different accounting practices are employed. The publishing industry does not employ activity-based costing. There has been academic work on activity-based costing of print journals, notably that of Carol Tenopir and Don King (see Chapter 8 and Tenopir and King 2000). The costs associated with digital publication are, as yet, unknown. The activities involved in digital publishing have yet to stabilise, making it difficult to determine costs.
We are building activity-based models so that we can develop a better understanding of the production and delivery of digital-only journals and of the different roles and costs involved in that process. These models also allow us to explore alternative cost-recovery and pricing mechanisms.
To date, we have built and tested three models of digital-only journal production and delivery. These models were based on a review of the literature supplemented by personal communication with practitioners. The models were built as part of a project which evaluated economic models of a number of aspects of the digital library within a four-month period.[1] In 2000, Leah Halliday conducted interviews with several stakeholder groups and revised the models in line with the data that she collected. The results suggest that publication is most efficiently undertaken by professional publishers within an organisation that is dedicated to journal publishing (Halliday and Oppenheim, 2001a,b,2000b,a).
9.2 The models
We will now describe the three models that we have developed. Journal production and delivery is an international business, but these models were built from a UK perspective. Thus, for example, staff costs are based on UK figures and where value added tax is applicable in the UK it is applied at the rate of 17.5%. Where we quote figures, however, we have converted them to US$ at the exchange rate in February 2000.[2]
We refer to the first model as "traditional". It models a process similar to that of print journals. This model is included for comparison with current practice but does not include production of print. In this traditional model, authors, referees and editors are unpaid. Editors receive from the publisher only a contribution towards editorial office costs. Production and delivery costs are recovered through sales of subscriptions and individual articles. The model differs from print production in that the entire editorial process is conducted electronically and the product is delivered to libraries in electronic form.
The second model is of a non-commercial journal that is available for use free of charge on the Internet. This model is based on the work of Stevan Harnad (Harnad and Hemus, 1997; Harnad, 1995b,1996). His model is based on the premise that academics submitting papers to journals for publication seek to disseminate their findings widely and would contribute to costs to facilitate widespread dissemination. In a print environment, it was necessary to accept access restrictions because print publication is expensive and publishers had to recoup their costs. In a digital environment, Harnad argued, costs can be reduced by as much as 70%, bringing them to a level that can be recovered from authors rather than subscribers. Harnad proposed that authors pay page charges and that journals be available to all users free of charge on the Internet. He suggested that the author fee should be around $400 for a 20-page article. Recovering costs from authors would actually contribute to cost reduction as subscription administration would be unnecessary.
The third model is a free-market model. It is based on a supporting study commissioned by the UK Electronic Libraries Programme (eLib) and conducted by Fishwick et al. (1998). Fishwick et al. compared a number of different models for pricing electronic scholarly journal articles. Their report suggested that the current academic information delivery chain is inefficient due to a number of distortions in the supply-demand chain. Among these are that: (1) authors represent a principal source of demand for publication but make no contribution to publication costs; (2) those consuming the information, i.e. the readers, seldom pay for it, preferring instead to obtain it from libraries; and (3) much of the journal publication work is undertaken by editors and referees without payment, or with minimal honoraria.
Fishwick et al. proposed an alternative model which introduced `normal' market feedback mechanisms into the academic information delivery chain with a view to developing an efficient market for scholarly articles. Publication would be funded by a combination of author submission fees and sales of subscriptions and/or individual articles. Thus, both authors and users would contribute to costs, reflecting the fact that both contribute to demand. Editors and referees would be paid to encourage efficiency, and authors would receive royalties. Fishwick et al. argued that if authors paid to have their work published and received royalties based on the number of copies sold, they would submit for publication only their best work. Rather than publishing as many papers as possible in `minimum publishable units' to maximise their perceived research output, they would concentrate their best work in fewer, high-quality papers. The system includes a mechanism to support authors who cannot afford to pay a submission fee: the editorial office would apply to charitable foundations to fund these papers. Papers would then be available individually or in customised bundles from the publisher database.
Fishwick et al. also suggested that the facility to print from digital journals be rationed even when the library obtains a journal or database of articles by site license and thus has paid in advance for unlimited access by end users. They argued that this would force end users to identify and select only journal articles that they genuinely read rather than filtering after printing. According to Fishwick et al., this would generate usage data for librarians (and possibly publishers) that would reflect real need.
This recommendation suggests that end-users currently waste resources by gathering information that they do not need. Given that researchers' time is scarce, this seems unlikely. Rather than making the system more efficient, rationing might prejudice researchers' ability to do their jobs. This is a potential practical problem. There are also cultural barriers to the market model. It is important to some academics in their roles as authors, editors and referees, that scholarly publishing operate independently of market forces. They believe that direct financial remuneration introduces motives that have no place in the system (L. Halliday, unpublished data 2001).
All three of our models represent the full publication cycle from receipt of manuscripts by the editor to delivery to end users. The resources required to produce and deliver journals are similar in each model. Staff costs are most significant. All of the models include two half-time staff responsible for production and systems. In the market model, where editors and referees are paid, the total financial cost is substantially higher than in the other models. We included an overhead on staff costs which represents, for example, buildings and support such as personnel and training, i.e. resources that are not related directly to products such as journals. We pitched the overhead rate at 120%. This reflects true costs in a large organisation such as a university. As these alternative models are proposed as HE-based operations, we think it realistic that they be costed as if housed in universities. It is important to recognise that just because work is undertaken without charge does not mean that it is cost-free. In economic terms, production that distracts an academic from her/his core tasks, i.e. research and teaching, may be more expensive than production that is undertaken by someone with the required skills who is dedicated to journal production. Nevertheless, we recognise that it may be possible to produce journals in a leaner organisation so we applied the overhead at 60% and re-ran model simulations for comparison. We also varied the surplus applied from zero to 20% in two of the models. We assumed that some surplus would be required for development of the journal. The free-access model is a much leaner model and does not include a surplus. Development would have to be funded through grants or other sources of funding.
9.3 Modelling software and simulations
The software package we used is called Ithink Analyst.[3] Four key element types are used to build Ithink models.
A stock represents an accumulation. The items accumulate by flowing into and/or out of the stock (see description of the `flow' below). The total content amounts to the inflow minus the outflow at each time period in a model simulation. In many of the stocks represented in our models the inflow and outflow are equal. For example, a journal editor receives a number of manuscripts every year. Of those, he or she rejects a very small percentage and the remainder are sent for peer review. The same number of manuscripts enter and leave the editorial office.
A flow either fills or drains a stock in the direction of the flow arrow. A cloud at either end of a flow indicates an infinite source of or destination of the material flowing to or from a stock. Basically this indicates that the source of material passing through the flow is beyond the scope of the model.
A converter informs other elements in the model. It may contain a constant value, e.g. tax at 17.5%; an incremental value, e.g. one that is 1 in year 1 and rises by 1 in each subsequent year; a variable which can be manipulated by a model user; or an algebraic relationship between different elements in the model.
A connector is like a wire which transmits information between elements in a model, e.g. in Figure 9.1 the flow labelled `xfer to ref' represents the number of manuscripts that are sent to referees to be reviewed. The value of this flow is determined by the number of manuscripts received by the editor (MS received), and the number rejected immediately, e.g. because the subject is unsuitable. The value of the converters is conveyed to the flow by connectors.
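The four element types can be illustrated outside Ithink with a minimal Python sketch. This is not Ithink syntax and not the authors' model: the class name, function name, and the input values are hypothetical, chosen only to mirror the editorial-office example and the `xfer to ref' flow described above.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """An accumulation whose level changes by inflow minus outflow each period."""
    name: str
    level: float = 0.0

    def step(self, inflow: float, outflow: float) -> None:
        self.level += inflow - outflow


def simulate_editorial_office(ms_received: float,
                              immediate_rejection_rate: float,
                              years: int):
    """Simulate manuscripts passing through an editorial office, one step per year.

    ms_received and immediate_rejection_rate act as converters (hypothetical
    values supplied by the model user). Passing them into the flow
    calculations below plays the role of the connectors.
    """
    office = Stock("editorial office")
    with_referees = Stock("with referees")
    for _ in range(years):
        rejected = ms_received * immediate_rejection_rate  # flow draining to a cloud
        xfer_to_ref = ms_received - rejected               # flow into the referee stock
        # Inflow equals total outflow, so the office stock does not accumulate.
        office.step(inflow=ms_received, outflow=rejected + xfer_to_ref)
        with_referees.step(inflow=xfer_to_ref, outflow=0.0)
    return office.level, with_referees.level
```

Calling `simulate_editorial_office(200, 0.05, 3)` leaves the office stock empty, because the same number of manuscripts enter and leave it each year, while the referee stock accumulates the transferred manuscripts.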
Each of our models consists of four interconnected sectors: content origination, publication, information brokerage, and the library function. The models all simulate production of a small journal which publishes 120 10-page papers per annum. We used Ithink to represent graphically the interrelationships that characterise each system. We then defined numerically each element in the model. Some of these definitions are equations which describe the relationship between two or more elements in the model. The bases of the equations and the assumptions in each model element are described within the model in element `documents.'[4] These can be viewed by a model user. The models are designed to be used rather than viewed. Although we deliberately kept them as simple as possible, the systems modelled are fairly complex. Pictures of whole models cannot be captured in a page.
Figure 9.2 is an example of the publisher section from the market model. It includes two stocks: "publication" and "publ budget". "Publication" is the accumulation of articles published in the journal. The flow from the "origination" sector into this stock is not shown. "Publ budget" is the publisher's budget. Costs ("publ spend") and profit ("publ profit") flow out of it and revenues flow in. The flows representing revenues from other sectors are not shown in Figure 9.2. Converters are used to calculate various values in the model. For example, total publishing costs are calculated with reference to publication costs, editorial costs and the overhead applied to those. The total publication cost informs the authors' contribution, i.e., the value of the author fee. Converters with rectangular buttons in them (e.g. "overhead" and "profit margin") represent those whose values are determined by the model user.
9.4 Results
We varied the value of elements in each of the three models and ran a series of simulations to establish the costs and benefits for different stakeholders in manipulating elements in this way and also to identify model sensitivities. As is evident from Figure 9.2, which shows only one quarter of a model, each model has a large number of elements that could be varied. The time period of the project severely limited the number of simulations that we were able to run.[5] We will now report the results of some of those simulations.
9.5 Traditional model
First, we ran a series of simulations to determine the subscription price of a traditional-model journal if the following elements were varied: the overhead rate, the profit margin, and the size of the subscription base. We display the results in Table 9.1.
Table 9.1. Subscription fees ($) for the traditional model
Overhead rate | 120% | 120% | 120% | 60% | 60% | 60% |
Profit margin | 0% | 10% | 20% | 0% | 10% | 20% |
No. of subscribers | Subscription fee ($) | | | | | |
200 | 1,062 | 1,167 | 1,274 | 772 | 849 | 927 |
500 | 425 | 467 | 510 | 308 | 340 | 371 |
1,000 | 212 | 233 | 255 | 154 | 170 | 186 |
2,000 | 105 | 116 | 127 | 77 | 85 | 93 |
20,000 | 11 | 11 | 13 | 8 | 8 | 9 |
It is clear from these figures that a journal making a modest profit and recovering full costs can be supplied to users for a modest fee as long as the subscription base consists of at least 500 subscribers. This gives an idea of how inexpensive journals can be without adopting an alternative cost-recovery model. We acknowledge, however, that the journal modelled is slightly smaller than the average scientific journal. Our modelled journal publishes 1,200 article pages per annum whereas an average journal publishes 1,434 article pages per annum (Tenopir and King, 2000, p.237). The effect of this is likely to be negligible.
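The pattern in Table 9.1 is consistent with simple cost-recovery arithmetic, sketched below in Python. The sketch is an inference from the published figures, not the authors' Ithink model: it assumes that the overhead applies to the whole cost base and that the fee is the annual cost plus surplus divided evenly among subscribers. The function names and the implied base cost (back-calculated from the 200-subscriber, 120% overhead, 0% margin cell) are assumptions introduced here.

```python
def annual_cost(base_cost: float, overhead_rate: float) -> float:
    """Annual cost after applying an overhead to the whole cost base (assumption)."""
    return base_cost * (1 + overhead_rate)


def subscription_fee(base_cost: float, overhead_rate: float,
                     profit_margin: float, subscribers: int) -> float:
    """Fee per subscriber that recovers the annual cost plus a surplus."""
    return annual_cost(base_cost, overhead_rate) * (1 + profit_margin) / subscribers


# Base cost implied by one cell of Table 9.1: 200 subscribers paying $1,062
# at 120% overhead and 0% margin. This is back-calculated, not a reported figure.
IMPLIED_BASE_COST = 200 * 1_062 / (1 + 1.20)

if __name__ == "__main__":
    for overhead in (1.20, 0.60):
        for n in (200, 500, 1_000, 2_000, 20_000):
            row = [round(subscription_fee(IMPLIED_BASE_COST, overhead, m, n))
                   for m in (0.0, 0.10, 0.20)]
            print(f"overhead {overhead:.0%}, {n} subscribers: {row}")
```

Under these assumptions the computed fees land within about a dollar of the tabulated values, which suggests that the fee scales inversely with the subscriber base and that staff costs carrying the overhead dominate the total.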
9.6 Free-access model
We also ran simulations to determine the level of author fees that would be required to fund the free-access model. We present the results in Table 9.2. The fee varies depending on the overhead rate and the rejection rate. These fees are submission fees, i.e. they are based on the assumption that all authors whose papers are refereed contribute to costs. It has been argued that all authors should contribute to journal costs as some costs are related to administration and refereeing of papers regardless of whether they are accepted. It may be unrealistic, however, to expect UK authors whose papers are rejected to contribute. Some journals in the USA charge non-returnable submission fees, but UK authors are less willing to pay fees for submission or publication (L. Halliday, unpublished data, 2001).
Table 9.2. Author submission fees ($) for the free-access model
Overhead rate | 30% | 30% | 60% | 60% | 120% | 120% |
Rejection rate | 10% | 90% | 10% | 90% | 10% | 90% |
Submission fee ($) | 816 | 58 | 1,005 | 112 | 1,383 | 154 |
Per page ($) | 81 | 6 | 101 | 11 | 138 | 15 |
Harnad suggested that fees of tens of dollars a page rather than hundreds of dollars a page would be acceptable and estimated that it would cost approximately $400 to produce a 20-page article. This gives a page charge of $20, which is insufficient to support our model. That is not surprising considering that our model involves the employment of paid professionals to produce a journal with what we consider to be core functionality. Harnad suggested that professional publishing staff are unnecessary, and his model relied largely on unpaid contributions. Nevertheless, the fees generated by our model fall within a range that some authors consider acceptable. Acceptance of the free-access model requires authors to take a system-wide view of the costs and benefits of scholarly publishing as it affects the whole organisation including the library (see Tenopir and King 2000). The main barrier to implementation of this model is cultural: that is, getting authors to accept the principle of page charges. Few journals have tested this model. One example is the New Journal of Physics (NJP), published by the Institute of Physics Publishing. Authors pay $500 per accepted paper. Submissions to this journal have been slow, but this is the case for any new journal. Authors' reluctance to publish in NJP may be related to concerns about digital publication per se rather than to the pricing model.
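The dependence of a submission fee on the rejection rate, and the comparison with Harnad's figure, can be sketched in a few lines of Python. This is an illustration under a stated assumption, namely that a fixed annual cost is spread evenly over every refereed submission; the function names are hypothetical and the sketch is not taken from the authors' model.

```python
PAPERS_PER_YEAR = 120          # size of the journal modelled in this chapter
HARNAD_PER_PAGE = 400 / 20     # Harnad's suggested $400 for a 20-page article


def submissions_needed(published_papers: int, rejection_rate: float) -> float:
    """Manuscripts that must be refereed to publish a given number of papers."""
    return published_papers / (1 - rejection_rate)


def fee_per_submission(total_cost: float, published_papers: int,
                       rejection_rate: float) -> float:
    """Spread a fixed annual cost evenly over every refereed submission (assumption)."""
    return total_cost / submissions_needed(published_papers, rejection_rate)


def fee_ratio(low_rejection: float, high_rejection: float) -> float:
    """How many times larger the fee is at the lower rejection rate.

    The total cost cancels out, so the ratio depends only on the two rates.
    """
    return (1 - low_rejection) / (1 - high_rejection)
```

With a 90% rejection rate there are nine times as many paying submissions as with a 10% rejection rate, so under this assumption the fee is about nine times lower. The 60% and 120% overhead columns of Table 9.2 show roughly that ratio. Harnad's suggested charge works out to $20 per page, well below most of the per-page figures in the table.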
9.7 The market model
Clearly, the financial cost of producing a market-model journal is high because editors and referees are paid and authors receive royalties on their papers. Again, we will report on subscription fees and author fees. Fishwick et al. suggested that published papers be sold to users either by subscription to the publisher's whole list, by subscription to specific parts (e.g., within a specific subject area), by a two-part tariff which consists of a reduced subscription price combined with a reduced transaction cost per individual article, or simply on a pay-per-use basis. We were unable to explore the likely proportion of subscriptions to sales of individual articles but we did consider the effect of sales of individual articles on author royalties. The author fee pays for editorial and refereeing work and contributes 10% of production costs. The author receives a royalty of 5% on subscription income and sales of articles. The administration of royalty fees adds to costs in this model, as do additional tasks associated with unfunded papers: Fishwick et al. suggested that the editorial office should seek funding for these from appropriate charitable foundations. In the model, this administration is undertaken by a half-time secretary who, we estimated, would be capable of processing 600 manuscripts per annum (a journal with an 80% rejection rate that publishes 120 papers per annum would process 600). Table 9.3 presents the submission fee if the overhead rate and rejection rate are varied.
Table 9.3. Author submission fees ($) for the market model
Overhead rate | 60% | 60% | 120% | 120% |
Rejection rate | 10% | 90% | 10% | 90% |
Submission fee ($) | 651 | 562 | 703 | 579 |
Per page ($) | 65 | 56 | 70 | 58 |
Obviously, the rejection rate has little impact on submission fees in this model because the author fee contributes only 10% of production costs. The fee is collected primarily to pay editors and referees, who are unpaid in the other models. The author pays for the peer-review function while subscribers pay for publication of journal articles.
In this model, we varied the value of the following elements to determine the effect on subscription price: rate of overhead, profit margin, and size of subscription base. The results, reported in Table 9.4, show that the subscription price of a market-model journal is generally 10-12% less than that for the traditional-model journal and the latter does not include a submission fee.
Table 9.4. Subscription fees ($) for the market model
Overhead rate | 120% | 120% | 120% | 60% | 60% | 60% |
Profit margin | 0% | 10% | 20% | 0% | 10% | 20% |
No. of subscribers | Subscription fee ($) | | | | | |
200 | 955 | 1,051 | 1,147 | 695 | 765 | 834 |
500 | 382 | 420 | 458 | 278 | 305 | 333 |
1,000 | 190 | 211 | 230 | 138 | 153 | 167 |
2,000 | 96 | 105 | 115 | 69 | 77 | 83 |
20,000 | 9 | 11 | 11 | 6 | 8 | 8 |
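The size of the price difference between the two models can be checked directly from the tabulated fees. The short Python sketch below copies the 200- and 500-subscriber rows from Tables 9.1 and 9.4 and computes the percentage by which the market-model fee is lower. The variable and function names are introduced here for illustration only.

```python
# Subscription fees ($) copied from Tables 9.1 (traditional) and 9.4 (market).
# Column order: 120% overhead at 0/10/20% margin, then 60% overhead at 0/10/20% margin.
TRADITIONAL = {
    200: [1062, 1167, 1274, 772, 849, 927],
    500: [425, 467, 510, 308, 340, 371],
}
MARKET = {
    200: [955, 1051, 1147, 695, 765, 834],
    500: [382, 420, 458, 278, 305, 333],
}


def percent_cheaper(traditional_fee: float, market_fee: float) -> float:
    """Percentage by which the market-model fee undercuts the traditional fee."""
    return 100 * (traditional_fee - market_fee) / traditional_fee


def discounts(subscribers: int) -> list:
    """Per-column discounts, rounded to one decimal place, for one table row."""
    pairs = zip(TRADITIONAL[subscribers], MARKET[subscribers])
    return [round(percent_cheaper(t, m), 1) for t, m in pairs]
```

For these two rows every discount comes out close to 10%. In the rows for larger subscriber bases the fees are small whole-dollar amounts, so rounding makes the percentage differences less stable.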
Royalty income is related to the sale of subscriptions and individual articles. The royalty is included in the market model as an incentive to publish only high-quality material. The royalty rate is related to journal income. Income is static as any increase in subscriber numbers is used to reduce the price of subscriptions and articles. Thus, author royalties increase only in relation to those of other authors published in the same journal, i.e. a relatively popular paper will generate more income for its author than one that is not frequently read.
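A fixed royalty pool shared among authors can be sketched as follows. The 5% rate comes from the market model described above; the proportional allocation rule, the function name, and the example read counts are assumptions for illustration, since the chapter does not specify how the royalty is divided among papers.

```python
ROYALTY_RATE = 0.05  # royalty on subscription and article-sales income in the market model


def allocate_royalties(journal_income: float, reads_by_paper: dict) -> dict:
    """Split a fixed royalty pool among papers in proportion to their use.

    The proportional rule is an assumption made for this sketch.
    """
    pool = ROYALTY_RATE * journal_income
    total_reads = sum(reads_by_paper.values())
    if total_reads == 0:
        return {paper: 0.0 for paper in reads_by_paper}
    return {paper: pool * reads / total_reads
            for paper, reads in reads_by_paper.items()}
```

Because journal income is held static, the pool is fixed, and one author's royalty can rise only if another's share falls. For example, with a hypothetical income of $100,000 and two papers read 300 and 100 times, the pool of $5,000 would be split three to one.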
Finally, we revised the traditional model to explore the figures generated if both authors and subscribers contributed to costs. This would effectively distribute costs across two groups both of which contribute to demand. The subscription fees generated by the traditional model are modest without author contributions. Author fees reduce them further. However, administration of both sets of fees would add to costs. It is often argued that authors and end users are drawn from the same group so the distinction is not necessary. This is not entirely true, however; many journal readers never write papers. Readers from industrial, professional and clinical settings often are not part of the academic research community. Thus, journals funded only by author fees would subsidise these users. The question to be asked is whether or not this matters as long as scholarly publication is as efficient as possible for the academic community.
9.8 Discussion
These models are first drafts. They contain flaws and omissions, some of which we have discovered although some may remain to be discovered. One example is that we were unable to separate subscriptions administration and maintenance from other publisher costs. We would have liked to represent costs associated with subscriptions by calculating part of the overhead as a percentage of sales income. This would reflect the fact that costs vary with the number of subscriptions. However, calculating the overhead in that way would have required a circular connection between model elements which is prohibited by the software package. It is important that we isolate subscriptions-related costs because they are eliminated when costs are recovered from authors. A fair comparison between models that recover costs only from authors and those charging subscription fees is impossible unless we can do so.
During 2001, L. Halliday built two models based on data from interviews with established commercial and learned society publishers, and with alternative publishers who publish from within universities. Subscriptions administration costs are isolated in these models.
Another important factor is the staffing level required to produce a digital journal. The models documented here were criticised as overstaffed. Halliday's work during 2001, however, suggested that the models described here are understaffed. All of the activities associated with publishing the journal including production, marketing, and development are undertaken by these staff. Interview data suggest that a journal publishing 120 papers per annum on this basis would require two full-time employees. As staff costs and the overheads on them are the most substantial costs, alteration to staff levels would impact significantly on total costs.
Despite their flaws, these models have been useful for developing our understanding of the digital-journal production and delivery process. They allowed us to explore journal publishing and elicited feedback that informed the design of a project, conducted during 2000 and 2001, during which Halliday built models that break journal publishing costs into discrete functions. The costs incurred by libraries in providing end users with access to digital journals were not modelled, as none of the librarians interviewed had a clear idea of the activities involved, let alone the costs of those activities. The model building and simulation were supplemented with qualitative exploration of digital journal publishing and use. Many of the barriers to implementing `alternative' models, or to the success of digital-only journals, are cultural, such as getting authors to accept new charging mechanisms. Full details of this work have yet to be published.
Notes
1. The report from this project is available at the following URL: http://www.ukoln.ac.uk/services/elib/papers/supporting/#ukoln.
3. Information about Ithink can be found at the following URL: http://www.iseesystems.com/Softwares/Business/ithinkSoftware.aspx.
4. They are also documented in the report of our project which is available at the following URL: http://www.ukoln.ac.uk/services/elib/papers/supporting/#ukoln.
5. Copies of the complete models are available to anyone who would like to manipulate them. They can be opened and simulations run using a free runtime version of the Ithink software which is available from the Ithink Web site.
10. Electronic Publishing Models and The Pricing Challenge[†]
How will the Internet change scholarly publishing, and how should it? Will print publishing become obsolete, or only be supplemented by online, searchable articles? Will the Internet somehow lead to a whole new system where raw `self-published' material is commented on widely by an expert community, supplanting the traditional notion of peer review by only a few, often anonymous, experts?
Amid these long-term, philosophical questions about the very nature and purpose of publishing, there is more immediate interest in understanding the economics of online publishing. Publishers are under great pressure to supply new capabilities available through online publishing, and to develop business models to ensure the future viability of scientific publishing in the new medium. Those who fall behind, or so it is feared, may ultimately fail as publishers, or face a kind of publishing oblivion as more and more readers rely on the Internet for locating information resources.
At the same time, librarians, facing the ongoing explosion of new information sources and consequent rising costs, hope that Internet publishing will somehow lead to radical declines in publication prices. Lower prices are sought partly as a response to institutional budget pressures, and partly out of frustration with the perceived unreasonable pricing practices of some publishers. But a longer term and more important pressure exists as well: amidst the explosion of new information products, librarians are properly searching for a means to ensure that libraries continue to fulfill their traditional role providing broad access to comprehensive collections of information.
In this chapter, I will describe the current state of the transition underway in scholarly publishing from the publisher's point of view. I will also review a number of the electronic publishing models currently in use, focusing on Science and contrasting it with the typical scholarly publication. Finally, I will discuss the challenges faced by publishers in setting online prices, noting the critical importance of sales volume as a driver of online prices.
10.1 Scholarly Publishing in Transition
Scholarly publishing used to be a quiet, respectable backwater of the much larger, and more volatile, publishing industry. Where many consumer publications rise each year and fail quickly, scientific journals seem to prosper over time, once they overcome the considerable entry barriers. Cross-platform searchability and other features of the Internet are ideal tools for improving the utility of scholarly journals. This characteristic has cast scholarly publishers on the leading edge of the publishing industry, and has forced them to explore both the technical and business aspects of online publishing. Many scholarly publications are managed by non-profit associations, or by relatively conservative commercial publishers, who have little experience in managing such a complex transition.
Besides the reader benefits, publishers are attracted to Internet publishing for a number of reasons. One key attraction is the promise of wider readership and recognition for the publication. Publishers see an opportunity for brand preservation and extension in this new medium which has rapidly become widely available. These opportunities for expanded readership and branding, of course, spell a possible business opportunity in selling subscriptions and advertising online. Some see new revenue possibilities that never existed in print, such as the chance for widespread pay-per-article, an extended economic life for older content, and e-commerce in conjunction with advertisers or other new partners. Finally, many publishers also view their online presence in a defensive mode: bringing the traditional printed work online may be necessary in order to maintain the interest of current readers and protect the revenue base that already exists in print.
The Internet is regarded as a 'disruptive' technology, one that changes the service norms and economics of publishing (and other industries) in unpredictable ways. Therefore, online publishing is seen as a threat. To understand the current milieu of scholarly publishing, it may be useful to consider how online publishing has changed four of the forces that act on publishers: competitors, suppliers, buyers, and the market environment.
The Internet reduces fundamentally the cost of publication distribution. Thus, it lowers barriers to entry, inviting many new players into the field, along with the traditional, known competitors. Start-ups that seek to exploit a new technology are generally more nimble than traditional players, partly because they have less existing revenue at risk, and tend to eschew ingrained ideas about the new business.
Furthermore, a new technology with uncertain parameters leaves incumbent competitors unsure of how to manage the new situation. As leading companies attempt different models for creating a viable business, they send confusing or unreadable signals into the market. Competitors do not know how to assess these moves, or how to respond to them.
Suppliers to the scholarly publishing industry are also undergoing realignment as the Internet comes into wide use. Key suppliers for scientific publishers are the researchers themselves, who submit papers describing their latest findings for review and publication. As authors find they have more outlets for their work and discover new ways to communicate findings to their colleagues, they are empowered in their dealings with publishers. Though they still seek the imprimatur of independent refereed journals, they have more leverage to demand services and concessions. Technology workers have become similarly empowered as publishers (and most other industries as well) increasingly depend on their capabilities. These influences will tend to increase costs as publishers must compete for the raw materials and resources of their business.
On the flip side, suppliers on the distribution side of publishing have lost ground. Postal services and other distribution outlets (such as international carriers) face significant threats if many publications ultimately choose to publish online only. But this phenomenon doesn't benefit publishers by bringing them cost savings unless they can entirely abandon the distribution chain. In the short run costs will, perversely, tend to rise. Since distribution costs are to some extent volume based, publishers may face rising delivery costs per unit as their volumes drop. The same problem applies to the printers of scholarly publications. Savings are only available to the publisher by abandoning the medium altogether. Reductions in print volume will only tend to increase costs per unit.
Demand is another force operating on publishers. Publishers are affected by both buyers and readers, who for scholarly journals are often not the same persons. They include individual subscribers and the libraries that support many scholarly publications. Consumers of information gain from the Internet by obtaining much better access to alternative information sources. As competition increases among information sources trying to bring value online, consumers' expectations will tend to increase for quality benefits and features from the publishers they support. Meanwhile, the sharing and copying of digitized information is far easier and less expensive for users than any previous copying method or device. This will tend to lower demand for subscriptions, licensing or pay-per-view services through which publishers may have expected to generate revenue. In addition, the interactivity enabled by chat groups and listservs gives consumers access to more accurate pricing and term information, which in turn makes them better negotiators for the services of publishers. Publishers will find that, with the easy access provided through convenient and inexpensive e-mail, their customer service costs will rise.
The general market environment for publishers is also undergoing transitions with unforeseeable consequences. Electronic distribution of information and the ability of users to duplicate and redistribute materials at very low cost hold unclear legal implications for copyright enforceability. In any case, assuming a constant level of enforcement, we should expect that lower copying costs will result in more copying, even if it is illegal. This would tend to suggest that publisher revenues are more vulnerable. Meanwhile, increased consumer awareness of privacy issues places pressures and restrictions on publishers' use of the data they gather about their readers, and has a dampening effect on marketing activities.
At the same time, some aspects of the market environment may be changing in favor of publishers. Some book publishers, for example the National Academy Press, are said to have seen an increase in sales as a result of making their content available for free on the Internet. This seemingly paradoxical result may be due to the general preference for reading things in print, combined with the wider resource location possibilities presented by the Internet. The availability of the products online for free may have served as an effective marketing tool. Where traditional marketing may have been too expensive or inaccessible to the small non-profit publisher, online the credibility of the publisher and the ability of the reader to `sample' the product by reading a chapter or two may have stimulated sales.
Might similar effects apply to scholarly periodicals? The remarkable usage figures of JSTOR articles reported by Guthrie (this volume) suggest that articles may enjoy a longer shelf-life by being digitized and made easily searchable online as part of a collection of refereed works. Nevertheless, it is not clear how or even whether increased usage of archived articles might lead to enhanced sales for the periodicals themselves.
Amidst downward pricing pressures from their buyers, publishers face increasing demands for sophisticated online services, which adds to cost. At the same time, it appears that a `centralization' of buyers is occurring. Even though a publisher may locate more readers online, it is less clear that these readers will become paying subscribers. Library services have, in effect, broken out of the library, as site-wide subscriptions bring the information to users' desktops. This is, of course, a good thing for the availability of information, but it does not necessarily lead to lower prices, since it may discourage personal subscriptions.
One reaction of publishers to all these influences and conflicting goals has been to develop a proliferation of access and pricing models in search of viable solutions to the business challenges they face. Next, I will examine some of the main access systems and pricing models in place among scholarly publishers, with an emphasis on the models in use by Science Online.
10.2 Access and Pricing Models
A variety of revenue strategies exist in periodical publishing. Controlled-circulation magazines rely almost exclusively on advertisers to pay the cost of producing a journal for a targeted market. The key to success in this type of publishing is to achieve very high coverage of a targeted market that also is a focal point for advertisers. This is an uncommon strategy for peer-reviewed journals, however. The scholarly publishing industry primarily relies either on library sales or on personal subscriptions or memberships in a non-profit society for the ongoing revenue to produce the journal. Some rely on a mixture of these circulation revenue streams, along with advertising sales.
In these early days, it is not surprising that online strategies tend to reflect the print strategy of the publisher. Table 10.1 provides a summary of the main print business models, and their online corollaries.
Print Model | Revenue Stream | Market Features | Online Corollary |
Controlled circulation | Advertisers or external funds | Requires high market coverage; Seldom used for refereed journals | Free, ad-supported site; or free with print and registration |
Personal or Member subscriptions | Individuals | Refereed; tends to large audience; may be more general | Free or small fee with print; may allow online only |
Institutional subscriptions | Libraries | Small, specialist audience; tend to higher prices | Site-wide for fee or free with print |
Mix | Members and libraries (+ads?) | Complex interplay of markets | Unsettled, but usually fee + print required |
Controlled magazines will tend to put their contents online for free to users, attempting to attract advertising revenue to the site. Those journals relying on library subscriptions will most likely develop a library site-wide access model, while those relying on personal subscriptions or memberships may treat the online product as an added-value benefit of the print subscription. Mixed revenue models may include a mixture of advertising, individual subscriptions, and library subscription, not to mention licensing and authors fees, to provide revenue. Of course, nearly every magazine may have some mixture of these various types of revenue, but what is important in devising an online strategy is to fully understand which revenue stream or streams are the main drivers of the business.
Science receives some revenue from every one of the sources named above, and so is a rather complex case. However, there is no question that the economic drivers of the journal are membership dues and advertising. Though subscription sales and membership dues represent less direct revenue than advertising, they may nevertheless be seen as the underlying driver for the publication, since advertising sales are also premised on the journal's relatively large circulation. Library subscription sales represent a significant third revenue stream. Though the smallest revenue stream of the three, they represent a critical part of the mix. If there were no library sales, personal subscription rates would be appreciably higher. The corollary is that if there were no personal subscription sales, library rates would be significantly higher. The other revenue sources named, such as licensing, are modest in comparison. There is little reason to think the journal could be sustained on these revenue sources alone.
I have jumped directly to the complex case of Science, but it is worth mentioning the other paid subscription models. Essentially, there are two: library driven, and membership or personal-subscription driven. A rule of thumb, for which there may be many exceptions, is that if the circulation of a magazine is under 5,000, its underlying economics are probably library-subscription driven. Many scholarly journals fall into this category, even those published by associations. Of course, the economic underpinnings of a given journal may be as much a reflection of publisher's choices as of market conditions. Some journals may have low circulation because they serve a small, highly specialized audience and are only sustainable through library subscriptions. Others may have restricted their personal subscription support by setting prices at too high a level, or by deliberately choosing to focus on the library market.
What are the online corollaries to these print subscription models? Many scholarly journals, whether commercial or non-profit, are struggling with this question, and many different experiments with access types are being conducted. Among the most widely in use are
free public access after an embargo period,
free personal online access with membership or paid print subscription,
institutional site-wide access free with print subscription,
institutional site-wide subscription,
institutional access by subscription and restricted in some way, such as via
embargo on when the content becomes available online,
incomplete content,
limited geographic access, for instance to a library or a portion of campus or a single `site' (which may be a building, a campus, or a city),
limited virtual access to certain workstations or a subnet.
The online strategy a given publisher will pursue is closely related to the publisher's view of the likely interaction between print and online publishing in the short run. Since there are many difficulties and unknowns with the revenue model for online access, publishers will often hedge their bets by pursuing what may be thought of as a forced print model. In these models, online access may or may not be charged, but is conditioned on the retention of a print subscription. In most cases, the online product will be treated as a supplement to the print, and charged (if at all) as an ancillary service. Though this is a conservative approach, the forced print model cannot be written off as merely a reactionary and futile attempt to preserve print. Science has substantial user feedback suggesting print is still highly valued among readers. Online values such as immediacy and searchability are highly desirable as complements to the print, but not as substitutes. Offering the two media together currently seems to be the best way for many journals to provide the best of both worlds and, coincidentally, to protect the principal revenue streams that emanate from the print product.
At the other extreme, some publishers will seek to capture the cost-saving potential of online publishing and, perhaps, to steal a march on competitors in the transition to electronic-only publishing. This approach will be particularly appealing to start-ups, although it is certainly not unheard of among traditional publishers. The strategy is reflected in business models that encourage the buyer to purchase online access only, and provide pricing incentives for doing so. In the case of many start-ups, the strategy could be called a forced online strategy, i.e., there is no print product at all. Other publishers will continue to provide print in response to reader demand, while setting discounted prices for online only, usually at 80% to 90% of the print price, to give buyers incentives to make the switch. Forced online strategies are relatively risky because of the many unknowns about user acceptance, ability to generate revenue from subscriptions or advertising, and sustainability of the system at reasonable costs. But they do make sense for smaller circulation publishers, especially of high frequency or high page-count journals, where substantial savings can be gained by pushing toward online delivery.
Forced print models attempt to preserve a journal's established revenue base of print subscriptions by offering online access as a free or low-priced added-value service. Generally, this will be a less risky approach than a forced online model. However, forced print models carry their own set of risks, and can be difficult to administer if the publication relies on both personal and institutional subscriptions. This is because institutional site-wide online subscriptions impinge on the personal subscription market, both online and in print, far more severely than institutional print subscriptions do. In print, accessibility to library copies is limited (one-at-a-time usage) and inconvenient (the reader must go to the library), so most frequent readers and many occasional readers will be strongly motivated to acquire personal subscriptions to the journals they find most important or useful. With the advent of site-wide access, however, the contents of journals are far more readily available to all users, thus presenting a temptation, especially among the marginal readers, to forego personal subscriptions. With site-wide subscriptions, there is no ability to reserve online access exclusively for paying individual subscribers. Further complications arise if an advertising revenue stream in print needs to be either preserved or migrated to online. Responses to this situation are the most complex and wide-ranging, in part because no one knows which will be most effective. Thus, many publishers are pursuing a variety of access models simultaneously. Table 10.2 shows the main access models currently in use by Science Online. Note that some of the access models provide only partial content in order to approach certain market segments, or achieve different business goals.
Access Model | Target Audience | Business Goal |
Personal access | ||
Free Samples/Searching | All potential users | Attract prospects |
Abstracts with registration | Moderate user | Readership for advertising; attract subscription prospects |
Pay-per-View | Infrequent user | Attract subscription prospects |
Full text access with fee | Members only | Subscription revenue; readership for advertising |
Institutional access | ||
Workstation access | Libraries, mainly public or high school, or colleges with minimal science focus | Economy access for broad range of primary ed. institutions |
Site-wide full text access | Universities/Research Institutes/Corporations | Subscription revenue; readership for advertising |
Consortial access | Universities, 2-year, and HE institutions | Expand site-wide subscription market; readership for advertising |
Licensed content with embargoes or usage limits | Library segments with specialized needs | Ancillary revenue |
If the proliferation of access models appears confusing, the price structures in use for these many different models are all the more so. Some principles from print subscription pricing do seem to carry over into the online world so far, although not always with the same results.
There are several regularities in traditional scholarly journal pricing. In general, institutional subscription prices are well above the price for personal subscriptions. Higher circulation journals tend to have relatively lower prices than small circulation specialty journals. And lastly, the narrower titles serving very small populations tend to rely on library sales much more heavily than personal subscription sales. All these rough principles seem to hold true, at least so far, in online pricing. However, as we shall see, publishers face a series of perplexing problems and risks in setting online subscription prices. These issues are far from settled at this time.
10.3 Pricing Challenges
It is widely understood that the Internet presents an opportunity for substantial decreases in the costs of scholarly publishing. Because paper, printing and postage (the principal variable manufacturing costs of publishing) are quite substantial for nearly all publications, there is an opportunity for both publishers and buyers to capture some cost savings through online delivery. But there is also a good deal of misunderstanding about the economics underlying print publication costs and pricing.
There is more to publishing than covering these variable manufacturing costs. In accounting terms, any price must cover variable costs, fixed costs and margin. When a journal has other sources of revenue than subscription sales, of course, the costs may be spread out over different sources; thus, the subscription price will reflect a contribution to the total fixed and variable costs, but not necessarily full coverage. Even so, many scholarly journals are largely dependent on a single revenue stream for their existence, and more often than not, that single revenue source is library subscriptions.
Not all journals experience the same level of variable costs. The cost of serving each new subscriber can vary greatly, depending on factors such as the frequency of publication, whether the journal is distributed globally, the size of the circulation, and the number of pages per issue. In general, we can expect online distribution to significantly improve these costs, thus, in theory, making it possible for low circulation journals to publish many pages, circulate them worldwide, and publish as frequently as needed. But, of course, these savings will only be realized if and when publishers can abandon print publication altogether.
Another misunderstanding that may need clarification is the idea that manufacturing costs are the only variable costs a journal faces. They are not. For instance, the cost of maintaining subscriber records, sending renewal notices and bills, and providing customer service are all variable costs that rise as circulation of the journal increases. Some of these items may also be improved, but not eliminated, by use of the Internet. Science, for example, has begun to accept orders and renewals online, and this source of orders has increased rapidly, relative to more traditional sources such as direct mail.
The fixed costs of publishing cover things like overhead and the cost of all the staff (not just editors) needed to run a professionally produced journal. Fixed costs are not uniform across all types of print journals. They may vary based on the depth of peer review undertaken, the breadth of disciplines and issues covered, and the extent of the marketing and other support efforts needed to produce the journal. Staffing costs are not likely to be reduced by online delivery, and in fact may increase substantially. Increased reader expectations can drive demand for more editors, more technical staff, and more sophisticated customer services.
Besides staff costs, fixed costs include major technical systems required to maintain the publication. One reason for the pricing disarray that exists in scholarly publishing right now is the uncertainty about what the steady-state cost structure of online publishing will be. Everyone, by now, has come to realize that merely throwing a few files onto a server will not constitute a viable publishing operation. Publishers are expected to provide value-added services that exploit the special features of the Internet to improve searchability, linking to outside resources, and other aspects of the readers' experience; to maintain a number of back issues indefinitely; and to provide for a more permanent archive. Quality control is also a much larger problem online than in print. With the expectation of retaining back issues online indefinitely and integrating them with new material for searchability, quality control is a job that is, in a very real sense, never completed. All these activities represent new costs associated only with online publishing, and until a more settled view of expectations is reached, it will be difficult for publishers to assess accurately what their fixed costs will be.
Further complicating the situation is the centralization of buyers. As mentioned earlier, there is some reason to think that, even though the Internet may bring more readers than ever to a journal, there will be fewer paying subscribers. Libraries used to maintain multiple subscriptions to the most popular journals, but will purchase only one site license to online publications, no matter how popular they become.
More important—if the publisher relies on individual subscriptions—is the problem of library subscriptions cannibalizing the publication's personal subscription base. In print, this phenomenon is a minor factor, because many people will still decide to purchase their own copies for convenience and portability. Some individuals also like to retain their own personal collection of key journals. All this is swept away by institutional site licenses to journals. Many of the compelling benefits of personal print subscriptions are lost if the very same product is available online at one's desktop through the university. Though most readers still report a preference for the look and feel of print, and for its portability, these benefits are strained against the economic incentive to drop print and save the subscription cost.
If buying centralization continues to grow, it means that the fixed publication costs will be spread over a smaller number of payers, and thus will rise as a portion of total price. Depending on how much the size of the buying market declines, the effects on price can be surprisingly steep.
Margin, the third component of pricing, is usually expressed as a percentage of the gross cost of production. There may be endless arguments about how much margin (profit) is appropriate for a scholarly journal, or even whether any margin should be charged by non-profit entities. The fact of the matter is that nearly every important and vibrant publication will charge some sort of margin. A publisher cannot produce cash for improvements, fund startup projects (whether charitable or commercial), or even merely ensure that the journal has enough financial flexibility to weather an unforeseen crisis or to pursue an unexpected opportunity without generating some revenue in excess of the precise costs of producing the journal.
In a durable business, margin is expected to increase with risk. Among the risks faced by publishers navigating the transition from print to online publishing are
new competitive challenges,
increased demand for technically sophisticated information products,
potentially diminished print revenue base,
unclear cost basis,
centralization of buyers.
All these risk factors have been mentioned in other contexts in this paper. Given the number of unknowns and their financial implications, it may be predicted that publishers will price their new online products to compensate for substantial risk.
This review of publishing costs should provide a more nuanced understanding of the complexity of moving from print to online publication of scholarly journals. Although some publication costs will decrease in the transition to online, others will increase. Further, the total cost may be borne by a smaller number of paying subscribers. The net effect on subscription pricing is uncertain.
A simple example, summarized in Table 10.3, will illustrate the point. See the King and Tenopir (this volume) chapter for a substantive discussion of these effects, using actual industry cost and price averages. For this example, assume no other major revenue stream that will share costs or be affected by a transition to online, and no price differentiation among target market segments. The purpose is only to illustrate the effects of changes to the paying base on the pricing for a journal. Imagine a print periodical with 10,000 subscribers and a frequency of 12 issues per year. Suppose the fixed costs for producing the journal are $1 million, the manufacturing and distribution costs are $2 per issue per subscriber, and other variable costs are $.50 per issue per subscriber. Then the cost base per subscriber, exclusive of margin, would be $130: $24 in manufacturing and distribution costs, $6 in other variable costs, and $100 for fixed cost contribution. The publisher would likely add between $13 and $26 of margin (10% to 20% of the cost base) to produce a price per subscriber of, say, $149.
| Print Scenario | Online, no centralization | Online, w/ centralization |
Circulation | 10,000 | 10,000 | 7,000 |
Total Fixed Cost ($) | $1,000,000 | $1,000,000 | $1,000,000 |
Fixed contribution/subscriber/year | $100 | $100 | $143 |
Variable cost/subscriber/year | $30 | $11 | $11 |
Straight margin | $19 | $19 | $19 |
Percent margin | 15% | 17% | 12% |
Total subscription price | $149 | $130 | $173 |
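The print-scenario arithmetic can be retraced in a few lines of Python. This is only a sketch of the example's own figures; the function and variable names are illustrative and do not come from the chapter, and the 10% to 20% margin range is simply what $13 and $26 amount to on a $130 cost base.

```python
# Figures are those of the print scenario in the worked example;
# all names here are illustrative assumptions.
def cost_base_per_subscriber(fixed_cost, subscribers, issues_per_year,
                             manufacturing_per_issue, other_variable_per_issue):
    """Annual cost per subscriber, exclusive of margin."""
    fixed_share = fixed_cost / subscribers
    variable = issues_per_year * (manufacturing_per_issue + other_variable_per_issue)
    return fixed_share + variable

base = cost_base_per_subscriber(
    fixed_cost=1_000_000,
    subscribers=10_000,
    issues_per_year=12,
    manufacturing_per_issue=2.00,
    other_variable_per_issue=0.50,
)
margin_low, margin_high = 0.10 * base, 0.20 * base  # the $13 to $26 range
price = base + 19  # the $19 straight margin shown in Table 10.3

print(f"cost base per subscriber: ${base:.2f}")
print(f"margin range: ${margin_low:.2f} to ${margin_high:.2f}")
print(f"subscription price: ${price:.2f}")
```

Changing `subscribers` or the per-issue costs shows how sensitive the cost base is to each input.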
Now suppose this journal switches to online publication, entirely abandoning print as a medium. Again, this scenario is simplistic in order to underscore what the economics of a fully online journal might look like after a transition is completed. I am ignoring, for now, the effects on pricing from producing the journal in two media simultaneously, although this is the reality facing many scholarly publishers today.
The middle scenario in Table 10.3 demonstrates the ideal circumstances for a journal moving online. Assuming that fixed costs remain the same, variable costs decline, and circulation sales hold steady when the journal moves online, there is reason to expect that both buyers and the publisher will gain by making the transition. With online production, the variable costs will decrease significantly. Suppose the manufacturing cost decreases by 75% to only $0.50 per issue ($6 per year), while the other variable costs fall to $5 per year. Costs that had represented $30 of the print price now represent only $11. Of course, if all other factors remained equal, this should be a boon to all parties. The publisher could lower the price and still maintain the same gross profit as before.
But all other factors do not remain the same. In all likelihood the fixed costs will be higher, as reader expectations increase. Even if the fixed costs do remain the same, a decrease in the number of buyers means the fixed costs will have to be spread across a smaller group. The percentage increase in the fixed-cost portion of the price can be greater than the percentage decrease in subscriptions. If, for example, the number of buyers falls by 30%, dropping from 10,000 to 7,000, the fixed cost portion of the price will rise by nearly 43%, from $100 to $142.86. Overall, this would result in a higher base cost of $142.86 + $11 = $153.86. Even if the publisher accepts the same gross margin of $19 (which would be a thinner percentage), the final price would rise to about $173, a price increase of more than 15% for subscribers, despite the substantial decrease in variable costs.
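The arithmetic behind the three columns of Table 10.3 can be reproduced with a short sketch. The function and variable names are illustrative, not part of the original chapter; the dollar figures are the ones given in the text, and the percent margin is assumed to be the straight margin divided by the cost base, which is the reading that matches the table.

```python
def subscription_price(subscribers, fixed_cost, variable_cost, margin):
    """Return (fixed share, cost base, price, percent margin) per subscriber per year."""
    fixed_share = fixed_cost / subscribers      # fixed cost spread over the paying base
    cost_base = fixed_share + variable_cost     # cost per subscriber before margin
    price = cost_base + margin                  # straight dollar margin added on top
    return fixed_share, cost_base, price, margin / cost_base


FIXED_COST = 1_000_000   # dollars per year, from the example
MARGIN = 19.0            # dollars per subscriber, from the example

# (subscribers, variable cost per subscriber per year)
scenarios = {
    "print": (10_000, 12 * (2.00 + 0.50)),
    "online, no centralization": (10_000, 12 * 0.50 + 5.00),
    "online, w/ centralization": (7_000, 12 * 0.50 + 5.00),
}

results = {
    name: subscription_price(n, FIXED_COST, variable, MARGIN)
    for name, (n, variable) in scenarios.items()
}

for name, (fixed_share, cost_base, price, pct) in results.items():
    print(f"{name}: fixed ${fixed_share:.2f}, base ${cost_base:.2f}, "
          f"price ${price:.2f}, margin {pct:.0%}")
```

Working with unrounded values, the third scenario comes to a little under $173, which is why rounding the intermediate figures down gives a slightly lower price.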
The last scenario in Table 10.3 summarizes the pricing effects of a 30% decline in circulation sales due to cannibalization from moving the contents of the journal online. Among the distortions caused by this scenario are that the price for buyers increases by more than 15% over the print price, and the publisher receives a lower percentage margin for a riskier model, which defies normal business practice. If the publisher decided to maintain the same percentage margin as print, the end price would rise even further, to $177, nearly a 19% increase for subscribers. The greater the price increase, the more likely it is that circulation would decline further. If circulation declines even more sharply than the projected figures, it could set off a "death spiral" reaction in the journal, also described in the King and Tenopir (this volume) chapter, where prices keep rising to cover the circulation shortfalls, thereby dampening demand even further.
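The feedback loop behind such a spiral can be sketched as a simple iteration. This is only an illustration under loud assumptions: the chapter does not specify how subscribers respond to price, so the constant-elasticity response below, the elasticity value of 0.5, and the choice of 7,000 subscribers at the $149 print price as the reference point are all hypothetical.

```python
def death_spiral(fixed_cost, variable_cost, margin,
                 ref_subscribers, ref_price, elasticity, rounds=5):
    """Alternate price setting and subscriber response; return [(subscribers, price), ...]."""
    history = []
    subscribers = ref_subscribers
    for _ in range(rounds):
        # Publisher reprices to recover fixed costs from the current paying base.
        price = fixed_cost / subscribers + variable_cost + margin
        history.append((subscribers, price))
        # Hypothetical demand response: constant elasticity around the reference point.
        subscribers = ref_subscribers * (price / ref_price) ** (-elasticity)
    return history


history = death_spiral(
    fixed_cost=1_000_000, variable_cost=11.0, margin=19.0,
    ref_subscribers=7_000, ref_price=149.0, elasticity=0.5,
)
for subscribers, price in history:
    print(f"{subscribers:8.0f} subscribers -> price ${price:.2f}")
```

Under these assumptions each round raises the price and lowers the subscriber count; whether the sequence settles or keeps climbing depends on the assumed elasticity, which is exactly the uncertainty the text describes.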
Of course, this is just an illustration. For many reasons the end result for a particular journal may be different. For instance, a creative publisher could turn an online presence into other revenue opportunities, such as advertising. But these opportunities may be wishful thinking. There is little reason to believe that a publisher who cannot sell ads in the print journal would succeed much better merely for having the journal online. Indeed, if the publisher does sell print advertising, there may be a loss of revenue, since many advertisers remain skeptical of the online medium, and highly resistant to paying prices similar to print advertising rates.
Another possibility is that subscription losses of this magnitude may not occur. It is certainly true that the scenario described above allows some flexibility for the publisher to lose subscriptions. The break-even amount of subscription loss in the case above is around 16%. That is, assuming a drop in subscriptions to 8,400, and assuming the fixed costs for producing the journal stay the same, then the variable cost savings are enough to offset the increased portion of the price dedicated to fixed costs. So the problem for publishers isn't whether they will lose any subscriptions, but a more complicated problem: how much will fixed costs increase due to online publishing, how much will subscriptions decline, and how much variable-cost savings will there really be? It is the complicated interplay of these uncertain effects, along with the enticing but uncertain prospect of developing other revenue streams, that leaves the pricing of online journals a very tricky matter.
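The break-even figure can be checked directly. A minimal sketch, assuming fixed costs stay at $1 million and variable costs fall from $30 to $11 per subscriber as in the example; the function name is illustrative. It solves for the subscriber count at which the online cost base per subscriber equals the print cost base.

```python
def break_even_subscribers(fixed_cost, print_subscribers, print_variable, online_variable):
    """Subscriber count at which the online cost base equals the print cost base."""
    print_cost_base = fixed_cost / print_subscribers + print_variable
    # Solve fixed_cost / n + online_variable == print_cost_base for n.
    return fixed_cost / (print_cost_base - online_variable)


n_star = break_even_subscribers(1_000_000, 10_000, 30.0, 11.0)
loss = 1 - n_star / 10_000
print(f"break-even at {n_star:.0f} subscribers, a loss of {loss:.1%}")
```

Any fixed-cost increase from online publishing would shrink this cushion, since the same calculation would then start from a larger numerator.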
10.4 Conclusions
From the above example, we can ascertain a few principles to help guide publishers in assessing the risks and costs of a transition to electronic publishing. First, when the publication's variable costs are a larger portion of the total cost than the fixed, moving online will likely be less risky. This is because the greater the savings that can be accomplished from electronic publication, the deeper the subscription losses would have to be before they caused the fixed-cost distribution to rise more than the variable-cost savings. This enables us to create a profile of the type of journal that would be the best candidate for moving to online publication rapidly:
low circulation,
high page counts and/or high frequency,
narrow, focused editorial scope,
small, if any, reliance on advertising revenue.
Low circulation and high page counts would tend to lead to poor economies of scale; thus one would expect substantial variable costs. In addition, if the circulation were mainly library subscriptions, not personal, and were mainly purchased at one copy per institution, the likelihood of revenue cannibalization or centralization of buyers impacting the journal would be smaller. Narrow editorial scope would contribute to maintaining relatively lower fixed costs. Lack of advertising revenue would simplify the risk assessment for moving online and reduce the chances of revenue cannibalization. These circumstances describe a very significant number of scholarly journals, particularly those published by non-profit discipline-focused societies.
Should we, therefore, not expect to see journals moving online that do not meet these criteria? In some ways, high-circulation journals with a variety of revenue streams might seem to have everything to lose and nothing to gain by undertaking the transition. However, recall from the discussion at the beginning of the chapter that the attractions of online publishing for larger scholarly journals are many:
greatly enhanced reader benefits,
broader and more convenient accessibility,
brand extension and preservation,
possibility of substantial variable-cost savings combined with the promise of novel revenue streams,
ability to remain up-to-date and relevant to readers, to defend against obsolescence.
With both reader demand and library demand for enhanced service so high, it is inevitable that journals of all stripes will begin moving online. One important factor for the larger publications will be trying to create some way to either preserve the print subscriptions or translate the broad print audience of buyers to an equally broad audience of buyers online. The more widely fixed costs can be distributed the lower will be the price for all parties. In the print world, of course, this would be a commonplace understanding. Ironically, however, in the current context of online publishing, this modest insight seems like heterodoxy, since it follows from the counterintuitive assertion that, in some circumstances, online publishing could actually result in higher priced subscriptions than print.
Publishers, librarians and readers of scientific journals are all rightly inspired and intrigued by the great possibilities for electronic publishing to revolutionize and democratize scholarly communication. But if the publishing and peer-reviewing processes add value to those communications—and most observers continue to agree that they do—then these cooperating parties will need to come to a fuller understanding of the economics that underlie the process. Care must be taken that in the rush to implement new technologies to benefit readers, we do not undermine the fundamentals that make publishing a useful, as well as a financially viable, enterprise.
11. A Portfolio Approach to Journal Pricing[†]
11.1 Introduction
In recent years access to print journals has been threatened.[1] Beset by persistent journal price inflation (especially in the so-called STM fields, or science, technology and medicine) and stagnant budgets, many university libraries have been forced to re-allocate dollars from monographs to journals, to postpone the purchase of new journal titles, and in some cases, to cancel titles. As a consequence, libraries have often relied on interlibrary loans to satisfy faculty demands. This situation and its possible causes have been studied at great length in the library science literature. With few exceptions, a consensus has evolved which focuses on the growing importance of commercial publishers in the market for scholarly journals: Over the past decade or more, commercial firms have aggressively raised prices at a rate disproportionate to any increase in costs or quality. This appears to be especially true for the largest commercial firms.[2]
The research discussed in this paper is the first to assess the merits of this consensus from an economic perspective.[3] Have changes in journal costs and quality accounted for most of the price inflation or has the exercise of market power by publishers played an important role? In addressing this question, I offer both theoretical and empirical support for the latter alternative. A model of journal pricing is proposed that reflects the underlying demand behavior of libraries. Although individual users are interested in just a handful of STM journals, libraries maximize the usage of broadly-defined collections, e.g. all biomedical journals, subject to a budget constraint. The result is demand for a portfolio of titles. In practice this means that libraries rank titles according to cost/use from lowest to highest and then select the largest set of low-ranked titles that they can afford. In other words, unlike most markets involving differentiated products, it is not appropriate to model demand as a discrete choice process. Rather, the typical library attempts to provide access to as many STM journals as possible through a combination of subscriptions and interlibrary exchanges.
Given this portfolio demand, publisher pricing strategies are determined by the distribution of budgets and a title's relative quality. Since all journals in a particular demand portfolio compete for the same budget dollars, relative quality determines demand for individual titles (if prices are equal, higher quality journals experience greater demand). In turn, the budget distribution influences whether, for example, high quality titles choose low prices and sell to most libraries or set high prices and sell only to the largest-budget institutions. Furthermore, the pricing model predicts that in some cases firms controlling larger portfolios of journals have an incentive to charge higher prices, all else equal. Thus, past publishing mergers may account for some of the observed price increases.
To evaluate this and other conjectures, a unique data set was assembled that includes cost, US price, and quality information for 900 biomedical titles as well as holdings information for these same journals at 194 biomedical libraries.[4] These data are used to estimate a structural model to identify the separate impacts of journal costs, quality, and publisher market power. The results indicate that the firm-level demand for journals is highly inelastic, that quality- and cost-adjusted price increases have been substantial over the past decade, and that past mergers have contributed to these price increases. The fact that firm-level journal demand is inelastic, e.g. demand for a firm's titles decreases less than 1% when its prices increase 1%, is a sufficient condition for the exercise of market power. But the econometric estimates suggest that firms are not profit-maximizing, at least not in a short-term sense. One possible explanation is that, in anticipation of future growth in library budgets, publishers preserve future sales by pricing less aggressively today. This story can also account for the estimated annual price increases. The third result is that merger-related price increases for the acquired firms' titles were substantial, about 25%. Yet US antitrust authorities expressed no concerns about the respective mergers.
These results raise a number of policy questions: (1) Since STM journal content is a public good (funded in most cases by tax dollars), does the performance of commercial STM publishing constitute a market failure? If so, do better alternatives exist? (2) Do antitrust authorities need a new paradigm for academic publishing and other portfolio-type markets? (3) How will the growing transition to electronic distribution affect the status quo? I briefly address these questions at the conclusion of the paper.
The chapter is organized as follows. I first discuss journal demand. Next, I describe the journal pricing model. I then discuss the empirical model, describe the data, and present the estimation results. Finally, I conclude by discussing the policy issues mentioned above.
11.2 Journal Demand
From the perspective of a journal user, it might seem that demand for each unique journal title should be treated separately. For example, articles in Brain Research are distinct from and cannot easily substitute for articles in the New England Journal of Medicine, much less those in the American Economic Review. If demand for each unique title is independent, then the publishers of individual titles have the capacity to obtain monopoly returns. Mergers won't matter.
The notion that the demands for individual titles are unrelated is incorrect because it misidentifies the purchaser. Libraries, not readers, buy most of the subscriptions, especially for expensive STM journals. Thus it is the demands by libraries for different titles that determine whether mergers will create additional market power. Discussions with dozens of librarians revealed the following: their purchase of academic journals is generally based on two factors: annual subscription price and expected usage. To assemble and maintain their collections, most libraries appear to construct a cost per use ratio for each title.[5] Given a budget for a relevant academic field, e.g., biomedicine, they then proceed to rank journals from lowest to highest in that field according to this ratio, and identify a cutoff above which titles are not subscribed.
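The ranking procedure the librarians describe can be sketched in a few lines. The titles, prices, usage figures, and budgets below are hypothetical, and the strict cutoff (stop at the first title that no longer fits the budget) is one reading of "identify a cutoff above which titles are not subscribed", not a documented library rule.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    price: float         # annual subscription price, dollars
    expected_use: float  # expected annual uses

    @property
    def cost_per_use(self):
        return self.price / self.expected_use


def select_portfolio(titles, budget):
    """Rank titles by cost per use (lowest first) and subscribe down to a cutoff."""
    ranked = sorted(titles, key=lambda t: t.cost_per_use)
    chosen, spent = [], 0.0
    for title in ranked:
        if spent + title.price > budget:
            break  # cutoff: this title and all higher cost/use titles are not subscribed
        chosen.append(title)
        spent += title.price
    return chosen, spent


# Hypothetical catalog for one field.
catalog = [
    Title("Journal A", 400, 2000),   # cost/use 0.20
    Title("Journal B", 1500, 3000),  # cost/use 0.50
    Title("Journal C", 300, 500),    # cost/use 0.60
    Title("Journal D", 900, 1000),   # cost/use 0.90
]

small, _ = select_portfolio(catalog, budget=2000)
large, _ = select_portfolio(catalog, budget=3500)

# Raising one title's price changes its rank, so a different title takes its place.
repriced = [Title("Journal B", 2400, 3000) if t.name == "Journal B" else t for t in catalog]
small_after, _ = select_portfolio(repriced, budget=2000)

print([t.name for t in small], [t.name for t in large], [t.name for t in small_after])
```

In this toy catalog the small-budget library replaces Journal B with Journal C once B's price rises, even though the two journals' content is unrelated, which is the sense in which titles act as substitutes on a cost per use basis.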
From year to year, as budgets and titles' usage change, collections are adjusted accordingly.[6] Over the past decade or so the general trend is for increases in library budgets to lag journal price inflation; a consequence is that many libraries have been forced to re-allocate dollars from monographs to journals, to postpone the purchase of new journal titles, and, in some cases, to cancel titles.
The most interesting aspect of library demand for journals is that individual titles within a given field are considered simultaneously. That is, the content may be unique, but on a cost per use basis different titles are substitutes. Titles compete with each other for budget dollars across an entire field of users served by the library, rather than demand for each title being independent as the user perspective suggests. Demand by libraries, the actual purchasers of subscriptions, is for a portfolio of titles drawn from a rather broad set.
11.3 For-Profit Journal Pricing
Given this demand structure, how do for-profit publishers price their journals?[7] Commercial journal publishers, like firms in any industry, will take into account the structure of demand and the likely strategies of competitors when setting prices. As described earlier, libraries — which constitute the bulk of demand for STM journals — attempt to purchase the most usage given their serials budgets.
To model how prices are set in this demand environment I assume that there are two types of library budgets, small and large.[8] I assume that each journal title is sold by a separate publisher. No price discrimination is allowed, i.e. annual subscriptions are sold for a unique price. Journal production includes two components: fixed, first copy costs, and a marginal cost. I assume the latter equals zero.
I consider a two-stage game. In the first stage, each of the firms considers whether to target (through choice of content) all libraries or just those with large budgets.[9] Once these sunk investments have been made, each firm takes into account the pricing strategies of firms that have made a similar marketing choice.
Given these and some additional assumptions, we can show that firms owning high-use titles will target all libraries, and that the remaining firms will focus on the large-budget customers (an ordered equilibrium). The intuition for the ordered equilibrium is that differences in journal use offer a competitive advantage to higher-use titles and a disadvantage to lower-use titles. All else equal, libraries will purchase higher-use titles. And if we assume that there is a sufficient number of small-budget libraries, firms owning the high-use titles will find it profit maximizing to sell to all libraries, while the remaining firms sell only to the large-budget customers. Although the latter could set a price low enough to attract large- and small-budget customers, it is not optimal for them to do so.
Furthermore, journal pricing for each target population is similar: owners of high-use titles charge higher prices. On the other hand, journal prices decrease as the aggregate usage of competing titles increases. The explanation for the first result is straightforward: since libraries rank titles according to cost per use, firms that own high-use titles have an incentive to set prices that exceed those of lower-use titles. Aggregate use matters since budgets are finite in size, i.e. as total usage increases, the competition for a fixed number of budget dollars intensifies, forcing a title (whose usage is fixed) to lower its price.
How do mergers affect outcomes in this simple model? There are a number of potential scenarios: mergers within budget classes, those across budget classes, and some combination of these first two cases. Consider the case of a within-class merger involving two high-use titles. What pricing strategy does the merged firm adopt? As we noted earlier, a journal's profitability decreases in aggregate class usage. This suggests that the merged firm might benefit from raising the price of one of its titles enough to cause the small-budget libraries to drop it and replace it with a lower-use title. This "jumping" between budget classes lowers the aggregate usage of titles sold to all libraries, and thus enhances the profitability of the merged firm's remaining general circulation title. The profitability of the "dropped" title may go up or down, depending on the model's parameters.[10] The sum of these two components will determine the post-merger pricing strategy. If the net effect is positive, then the merger is harmful: the average quality of library collections decreases.
11.4 Testing the Portfolio Theory
The Institute for Scientific Information (ISI) tracks citations in peer-reviewed titles for over 8,000 STM journals in various fields. Not surprisingly, the number of publishers, both commercial and non-profit, is large as well. With respect to biomedical journals, ISI tracks titles published by at least 70 companies. Over the past decade a flurry of merger activity has been observed in the STM publishing market, particularly in the past two years. Since the latter half of 1997 alone, at least six major commercial publishers have been purchased by competitors. In addition, numerous small-scale transactions involving one or two journal titles occur every year.
Although these recent natural experiments will provide a rich empirical opportunity in the near future (once several years of post-merger data are available), two mergers that occurred in the early 1990s should shed some light on the likely impact of this ongoing merger wave. In 1991, Reed-Elsevier purchased Pergamon and its large portfolio of STM titles, including some 57 ISI-ranked biomedical journals. At the time, Elsevier's biomedical portfolio numbered 190 ranked titles. During the same period, Wolters-Kluwer added Lippincott's 15 ISI-ranked biomedical titles to its collection of 75 ranked biomedical journals. Since that time both companies' portfolios have grown further. In 1998, according to ISI data, Elsevier's portfolio stood at 262 ranked titles; Kluwer controlled 112 ranked journals.
Empirical Models
Previous empirical studies of journal pricing have not attempted to assess the extent of market power in the academic publishing market. Chressanthis and Chressanthis (1994) specified a reduced form hedonic model to study the determinants of pricing for economics journals. Their results suggest that prices are related to journal characteristics (e.g., as journal quality and size increase, so does price). Lieberman et al. (1992) estimated a supply and demand system using data for 225 ISI-ranked science journals. They find that supply is downward-sloping, consistent with the notion that publishing is characterized by scale economies at the individual title level. Based on this evidence they indirectly argue that entry by new titles has lowered circulation for existing journals, forcing the latter to raise prices to cover fixed costs. However, their model is unable to explain a significant portion of the observed price increases.
Results for two empirical models are reported here. First, to test whether libraries' acquisition strategies reflect a ranking of journals according to cost/use values, I estimated an exponential cumulative distribution function (cdf).[11] The expectation is that cost per use and journal demand are inversely related. Confirmation of this hypothesis provides support for the portfolio approach to demand.
Second, I estimated a structural two-equation model of supply and demand that measures firm-specific demand elasticities and explicitly accounts for the possibility of increased market power due to past mergers. Recall that inelastic demand is a necessary condition for the exercise of market power by publishers. Evidence of merger-related price increases is consistent with a portfolio market definition as well as the type of strategic behavior implied by the pricing model.
Data
For the period 1988-98, the U.S. Department of Justice collected publisher and price data for some 3000 journals, and holdings information from various libraries. I supplement these data with additional information extracted from the ISI's Journal Performance Indicators database (JPIOD). This database allows me to calculate annual citation rates for individual journals;[12] JPIOD also includes the number of papers published annually by each journal during the sample period.
My empirical analysis is focused on a subset of these journals, namely, biomedical titles. The reasons for this choice are several. First, based on my discussions with various librarians, biomedical libraries are most likely to evaluate their purchases using the portfolio approach described earlier; furthermore, these libraries typically make no distinctions among various biomedical disciplines, permitting us to consider all biomedical titles as part of a single, large portfolio.[13] Finally, practical considerations, including the fact that biomedical holdings data are reported in a relatively standard fashion, supported an initial focus on this subset of titles.[14]
During the sample period, almost two thousand ISI-ranked biomedical journals were published; complete time series were available for about 1800 of these titles. Of this latter group, almost 1400 were published by organizations with at least three ISI-ranked titles. For the analysis presented here, only journals sold by commercial firms with portfolios consisting of ten or more titles were considered (thus excluding journals distributed by small private publishers as well as the non-profits), or about 900 titles. Complete holdings data for 194 U.S. medical libraries were collected, representing in aggregate some 60,000 subscriptions to ISI-ranked journals; the libraries were randomly selected from the approximately 1500 Medical Library Association members. Libraries of all sizes are represented in the sample, some holding fewer than ten subscriptions, while others report collections exceeding 1,300 titles.
The sample period, 1988-1998, is useful in at least two respects. First, it is sufficiently long to assess whether price increases continue in the journal market. Second, as described above, the period contains a number of natural experiments, i.e., publishing mergers, that enable me to identify the impact of mergers on pricing. Growth via merger should be distinguished from internal growth arising from the introduction of new titles. The latter may produce benefits (such as coverage of emerging fields of study) that help to offset any intentional competitive harm. Harm associated with acquisitions, on the other hand, is less likely to be balanced by substantial benefits. Journals are simply reshuffled and, based on the public statements made by merging firms, the fixed cost savings seem to be small.[15]
Descriptive Statistics
Using the ISI-defined biomedical portfolio and the corresponding library holdings, I calculate the actual size of various commercial publishers' journal portfolios as well as the number of titles subscribed to by the libraries in the sample (see Table 11.1).
Publisher | # of titles published | # of subscribed ISI titles** | Share subscribed |
Blackwell | 112 | 99 | 0.88 |
Churchill-Livingstone | 17 | 12 | 0.71 |
Elsevier | 262 | 225 | 0.86 |
Harcourt | 118 | 109 | 0.92 |
Karger | 45 | 39 | 0.87 |
Mosby | 27 | 25 | 0.93 |
Plenum | 22 | 20 | 0.91 |
Springer | 99 | 87 | 0.88 |
Taylor | 19 | 16 | 0.84 |
Thomson | 41 | 36 | 0.88 |
Waverly | 37 | 35 | 0.95 |
Wiley | 78 | 70 | 0.90 |
Wolters-Kluwer | 112 | 98 | 0.88 |
Totals | 989 | 871 | 0.88 |
It is clear from this table that significant variation in portfolio size exists in the industry. Note that, based on the ISI numbers, the proposed 1998 merger between Reed/Elsevier, Wolters/Kluwer and Thomson would have affected about 42% of the biomedical titles owned by large commercial publishers.
In Table 11.2, I present information on average price, citations, cost per use (price/citation), and number of papers published for each publisher in the years 1988 and 1998.
Though prices, citations and paper counts generally increased during the period, the rate of change for prices was far more striking, resulting in higher cost/use numbers by the end of the period. For example, Elsevier's average journal price more than tripled during the period, while the corresponding citation and paper counts increased less than 25%.
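The Elsevier comparison can be verified from the Table 11.2 averages. A small sketch: the figures are the 1988 and 1998 Elsevier values reported in the table, while the dictionary layout and the ten-year annualization are illustrative choices rather than part of the original analysis.

```python
# Elsevier averages from Table 11.2: (1988 value, 1998 value)
elsevier = {
    "price": (482, 1548),
    "cites": (3477, 4222),
    "papers": (179, 204),
}

YEARS = 10  # 1988 to 1998

growth = {name: end / start - 1 for name, (start, end) in elsevier.items()}
annualized = {name: (end / start) ** (1 / YEARS) - 1 for name, (start, end) in elsevier.items()}

for name in elsevier:
    print(f"{name}: total change {growth[name]:.0%}, annualized {annualized[name]:.1%}")
```

The price ratio exceeds three while citations and papers each grow by less than a quarter, consistent with the statement above.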
I provide average circulation rates for titles by publisher in 1988 and 1998 in Table 11.3.[16] Given that nominal prices increased dramatically over the sample period, the apparent inelasticity of demand indicated by these numbers is notable. It suggests that library serials budgets increased sufficiently during the period to absorb most of the price increases.
Publisher | 1988 Price ($) | 1988 Cites | 1988 Cost per Use | 1988 Papers | 1998 Price ($) | 1998 Cites | 1998 Cost per Use | 1998 Papers |
Blackwell | 193 | 1575 | 0.40 | 123 | 508 | 2652 | 0.55 | 156 |
Churchill-Livingstone | 183 | 1726 | 0.26 | 103 | 721 | 2821 | 0.62 | 146 |
Elsevier | 482 | 3477 | 0.36 | 179 | 1548 | 4222 | 0.78 | 204 |
Harcourt | 209 | 3713 | 0.18 | 164 | 518 | 5294 | 0.34 | 171 |
Karger | 321 | 893 | 0.59 | 86 | 711 | 935 | 1.01 | 79 |
Mosby | 100 | 4071 | 0.07 | 248 | 241 | 5369 | 0.15 | 269 |
Plenum | 233 | 1352 | 0.25 | 92 | 759 | 1733 | 1.86 | 121 |
Springer | 481 | 2268 | 0.44 | 141 | 1057 | 2386 | 0.84 | 153 |
Taylor | 259 | 759 | 0.48 | 74 | 658 | 572 | 1.67 | 55 |
Thomson | 207 | 1210 | 0.46 | 92 | 733 | 2788 | 0.45 | 140 |
Waverly | 119 | 3171 | 0.10 | 188 | 277 | 5770 | 0.16 | 237 |
Wiley | 333 | 2205 | 0.38 | 128 | 1409 | 3338 | 1.10 | 145 |
Wolters-Kluwer | 176 | 2535 | 0.19 | 154 | 504 | 3519 | 0.52 | 153 |
Unweighted Averages | 253 | 2227 | 0.32 | 136 | 742 | 3184 | 0.77 | 156 |
Publisher | 1988 (# subscribers) | 1998 (# subscribers) |
Blackwell | 31.72 | 30.16 |
Churchill-Livingstone | 34.00 | 31.20 |
Elsevier | 30.08 | 27.92 |
Harcourt | 50.51 | 53.23 |
Karger | 28.81 | 22.77 |
Mosby | 94.50 | 96.55 |
Plenum | 27.61 | 22.89 |
Springer | 21.60 | 19.03 |
Taylor | 11.67 | 12.08 |
Thomson | 13.50 | 19.42 |
Waverly | 61.67 | 63.41 |
Wiley | 24.41 | 23.51 |
Wolters-Kluwer | 41.62 | 42.28 |
Unweighted Avgs | 36.28 | 35.73 |
Estimation Results
The results for the exponential cdf model are consistent with expectations. They suggest that higher cost/use journals are purchased by fewer libraries. For example, in 1991, the marginal journal for a $100,000 budget library has a cost/use value equal to about 0.22 and, using the parameter estimates, is held by about 30 of the 194 libraries in the sample. The marginal journal for a $200,000 budget library has a cost/use value equal to about 0.59, and is held by some 17 libraries.
Turning to the structural model results, the estimates imply that, after controlling for changes in citation rates and costs, publishers increased annual journal prices some 140% over the 1988-98 period (over the same period the Consumer Price Index increased by 37%). In addition, as a journal's citation rate improves relative to the average value in the sample, demand increases. Demand is apparently very inelastic. No firm-specific demand elasticity was more than 0.50 in absolute value. These small elasticities imply that publishers have an incentive to more than exhaust existing library serial budgets and any anticipated increases. This observation is consistent with numerous librarians' experiences and with what some publishers have privately acknowledged.[17] However, these estimates suggest that firms are not profit-maximizing, at least not in a short-term sense. One possible explanation is that, in anticipation of future growth in serials budgets, publishers preserve future sales by pricing less aggressively today. Under such circumstances, the estimated, firm-specific demand elasticities should lie somewhere between zero and one, in absolute terms.[18] Note that this story can also account for the estimated annual price increases.
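Why inelastic demand gives publishers an incentive to keep raising prices can be illustrated with a short calculation. This is a sketch under a stated assumption: demand is taken to have constant elasticity, a textbook functional form chosen for illustration and not the structural model estimated in this chapter. The 0.50 elasticity is the upper bound reported above; the 1.50 comparison value is hypothetical.

```python
def revenue_change(price_increase, elasticity):
    """Fractional revenue change when quantity is proportional to price ** (-elasticity)."""
    price_ratio = 1 + price_increase
    quantity_ratio = price_ratio ** (-elasticity)  # subscriptions lost to the higher price
    return price_ratio * quantity_ratio - 1


inelastic = revenue_change(0.10, elasticity=0.50)  # demand as estimated for these firms
elastic = revenue_change(0.10, elasticity=1.50)    # hypothetical elastic comparison

print(f"10% price increase, elasticity 0.50: revenue changes by {inelastic:+.1%}")
print(f"10% price increase, elasticity 1.50: revenue changes by {elastic:+.1%}")
```

With elasticity below one, revenue rises with every price increase, so a firm maximizing short-term profit would keep raising prices until demand became elastic. That is why the estimated elasticities suggest firms are not doing so.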
Did the two publishing mergers earlier in the decade enhance the participating firms' market power? With respect to the Reed-Elsevier/Pergamon transaction the answer seems clear. Post-merger (1992-1998), Elsevier journal prices were unchanged but the former Pergamon titles experienced a 27% increase. This asymmetry is observed in the Kluwer-Lippincott merger as well. Post-merger, the former Lippincott titles experienced a 30% price increase while the Kluwer prices were unchanged. However, in this case the Lippincott price increase is not solely a consequence of enhanced market power. The results suggest that demand for Lippincott titles became slightly more inelastic in the post-merger period, contributing at least partially to the observed 30% price increase.
11.5 Policy Implications and Future Directions
Market Failure?
Efficient pricing is not sustainable in the declining average cost environment of academic publishing. This raises the question of how the performance of commercial publishers compares to a second-best break-even standard. Our analysis suggests that prices far exceed marginal costs, but do they exceed average costs? One way to assess this question is to examine the pricing of comparable non-profit titles; presumably non-profit publishers set prices closer to if not equal to average costs. If the latter prove to be cheaper, then scholars have a real alternative for disseminating scholarly information in a more efficient fashion.
Though a comprehensive analysis of non-profit journals is beyond the scope of the present paper, it is useful to report some initial qualitative results.[19] In Table 11.4, I calculate average prices and citation rates for both commercial and non-profit ISI-ranked biomedical journals.
Decade of first publication | Non-Profit N | Non-Profit Mean PRICE($) | Non-Profit Mean CITES | Commercial N | Commercial Mean PRICE($) | Commercial Mean CITES |
1978-1987 | 27 | 287 | 10304 | 343 | 736 | 2159 |
1968-1977 | 26 | 306 | 13907 | 221 | 919 | 2720 |
1958-1967 | 17 | 446 | 14163 | 101 | 1316 | 5067 |
1948-1957 | 11 | 289 | 13445 | 58 | 625 | 3774 |
1938-1947 | 17 | 379 | 8946 | 28 | 838 | 7913 |
1928-1937 | 4 | 139 | 3547 | 19 | 591 | 3695 |
1918-1927 | 9 | 294 | 6402 | 22 | 483 | 2949 |
Before 1918 | 16 | 292 | 12593 | 69 | 702 | 3365 |
Titles are aggregated according to the decade of initial publication, going backward from 1987.[20] The discrepancy in average prices and citations for the two groups is striking. For example, if we compare titles that originated at similar points in time, we find that the average non-profit subscription price is between fifty and seventy-five percent less than the commercial rates for titles of similar vintage. At the same time average citation rates for the non-profit journals greatly exceed those of the commercial publishers in most instances, sometimes by a factor of five. Among commercial journals, prices and citations are positively correlated. Thus, the substantially lower prices of comparable non-profit titles suggest that commercial publishers are setting prices well in excess of average costs.[21] Despite this apparent superiority,[22] the population of ranked non-profit titles is far smaller than that of the commercial journals, 148 versus 1032. Has the lucrative journals market induced too much entry or have the non-profits been too slow to exploit emerging research areas? Although this question deserves further attention it seems clear that two distinct publishing models exist, each successful in its own way.
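The price and citation comparisons in this paragraph can be checked directly against the means reported in Table 11.4. The following is a minimal sketch; the dictionary layout and function names are hypothetical, and the figures are simply transcribed from the table.

```python
# Means transcribed from Table 11.4, keyed by decade of first publication.
# Each value: (non-profit price, commercial price, non-profit cites, commercial cites).
TABLE_11_4 = {
    "1978-1987": (287, 736, 10304, 2159),
    "1968-1977": (306, 919, 13907, 2720),
    "1958-1967": (446, 1316, 14163, 5067),
    "1948-1957": (289, 625, 13445, 3774),
    "1938-1947": (379, 838, 8946, 7913),
    "1928-1937": (139, 591, 3547, 3695),
    "1918-1927": (294, 483, 6402, 2949),
    "Before 1918": (292, 702, 12593, 3365),
}


def price_discount(nonprofit_price, commercial_price):
    """Fraction by which the non-profit mean price falls below the commercial mean."""
    return 1.0 - nonprofit_price / commercial_price


def citation_ratio(nonprofit_cites, commercial_cites):
    """Non-profit mean citations as a multiple of commercial mean citations."""
    return nonprofit_cites / commercial_cites


discounts = {}
ratios = {}
for decade, (np_price, c_price, np_cites, c_cites) in TABLE_11_4.items():
    discounts[decade] = price_discount(np_price, c_price)
    ratios[decade] = citation_ratio(np_cites, c_cites)
    print(f"{decade}: price {discounts[decade]:.0%} lower, "
          f"citations {ratios[decade]:.2f}x commercial")
```

On this transcription, six of the eight decade groups show a non-profit discount between 50% and 75%; the 1918-1927 group falls below that range and the 1928-1937 group slightly above it. These are comparisons of unweighted group means and do not control for subject matter or circulation.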
Antitrust Paradigms
When the proposed 1998 merger between Reed Elsevier and Wolters Kluwer collapsed, opposition from antitrust authorities in Europe and the U.S. was cited as a primary cause. Although no formal complaints were filed by agencies on either side of the Atlantic, regulators had sent a variety of signals indicating their serious concerns. Negotiations with the European Union had progressed the farthest and it appeared that the proposed deal would proceed only if the parties agreed to significant divestitures. It was widely reported at the time that the EU's preferred set of divestitures upset the financial logic of the merger and resulted in its demise.
What is interesting here is that the EU's main focus was not on academic journals, but rather legal publishing (in Europe), and that its theory of anti-competitive harm was based on a user-based approach to publishing mergers: excessive overlap in content (and therefore similar to the DOJ's approach to the 1996 merger of legal publishers Thomson and West). The U.S. focus was far different, in part because European legal publishing was not germane and because the model of harm relied upon was novel.
Though one can only speculate on how a U.S. antitrust case might have proceeded, it is clear that the combined Reed-Elsevier/Wolters-Kluwer entity would have controlled large journal portfolios in a number of broad fields, including biomedicine. Assuming that these broad fields constituted antitrust markets, some of these portfolios would have crossed the U.S. government's concentration threshold (based on the Antitrust Guidelines) with shares in excess of 30-35%. Based on the results discussed here, such a merger may have resulted in substantial price increases over time. If the U.S. had filed a complaint and had been successful with this market definition, an important legal precedent would have been set, one that would have made it easier to employ a portfolio theory in mergers involving combined market shares less than the threshold, e.g. the subsequent merger of Wolters-Kluwer and Waverly, and/or a large firm buying a relatively small portfolio of journals. The recent reluctance of the Antitrust Division to oppose several mergers in the publishing industry can be partially attributed to insufficient market shares. However, since many future deals are likely to be relatively small in scope, opposition to journal mergers will need to adopt novel approaches in the definition of both markets and concentration thresholds.[23]
A Digital Future
Scholarly journals render at least three services: research communication, archiving, and quality certification. Digital technology offers the potential to transform the first two by providing instantaneous access to current and past research. With modest investments in computer hardware and software, global scientific communities can dramatically lower the costs of exchanging information.[24] Though these innovations might seem to threaten the future of the traditional journal, the latter's role as a quality filter may be sufficient to preserve its existence, albeit in modified form. Although it is possible to conceive of new mechanisms for evaluating journal quality, e.g. measuring the number of hits generated by a journal website, it seems likely that the existing expert-based system for assessing new research will survive.[25]
Commercial publishers have begun to exploit these new opportunities by bundling their individual journal titles and providing libraries with electronic access to article databases.[26] In doing so, the economics of commercial publishing may change in subtle ways. Portfolio size will still matter, but the number of journals may matter less than the total article population. Digital technology will make it feasible to control, monitor and price access in new and myriad ways, suggesting that sophisticated price discrimination schemes could be observed someday. The prospect of bundling and price discrimination, of course, will inevitably raise antitrust issues. A few large portfolios might reduce transactions costs for libraries yet have the potential for influencing new entry as well as pricing.[27]
11.6 Conclusions
This chapter offers a new framework for understanding the interaction between libraries and commercial publishers. A portfolio approach to journal demand is proposed that is consistent with the observed pattern of journal purchases. This approach to demand can be used to explain for-profit publisher pricing as well as the incentives for mergers in this market. Estimation of a structural model of supply and demand reveals that the firm-level demand for journals is highly inelastic, that quality- and cost-adjusted price increases have been substantial over the past decade, and that past mergers have contributed to these price increases. Together these theoretical and empirical results raise a number of policy questions regarding (1) the performance of commercial publishers, (2) the efficacy of current antitrust paradigms and (3) the possibility that electronic distribution may mitigate existing problems in the market for scholarly journals.
Notes
† I would like to thank many of my former colleagues at the Department of Justice, including Craig Conrath, Renata Hesse, Aaron Hoag, Russ Pittman, David Reitman, Dan Rubinfeld, and Greg Werden, as well as Jonathan Baker, Cory Capps, George Deltas, Luke Froeb, Jeffrey MacKie-Mason, Roger Noll, Dan O'Brien, Richard Quandt, Lars-Hendrik Röller, Steve Salop and Margaret Slad; seminar participants at the Federal Trade Commission, Georgia Tech, SUNY Stony Brook, and Wissenschaft Zentrum Berlin; and participants at the meetings of the American Economic Association, the European Association of Research in Industrial Economics, the Southern Economics Association, and Western Economics Association. The Association of Research Libraries and its members, the National Library of Medicine, the Georgia Tech Library, and the Georgia Tech Foundation have provided invaluable assistance. Expert data support was provided by a large group of individuals, including Deena Bernstein, Claude Briggs, Pat Finn, Doug Heslep and Steve Stiglitz. Finally, I would like to thank the dozens of librarians and publishers who have provided me with important insights.
1. Increasingly, journals are available in both print and electronic versions, and for some new titles only an electronic format is available. The advent of electronic journals is very recent, however, and is unlikely to have influenced behavior during the sample period analyzed in this paper. See Tenopir and King (2000), chapter 15, for a discussion of these changes.
2. See Tenopir and King (2000), Chapter 13, for a review of this literature. An alternative explanation for journal price inflation has been offered by Lieberman, Noll, and Steinmuller in their 1992 working paper, The Sources of Scientific Journal Price Increase, Center for Economic Policy Research, Stanford University. They argue that entry by new titles over time has lowered circulation for existing journals, forcing the latter to raise prices to cover fixed costs. They estimate a supply and demand system for a set of journals and find that supply is downward sloping, consistent with this notion that individual titles exhibit scale economies. However, after controlling for this and other factors there remains a significant inflation residual that is unexplained by the model.
3. See my working paper, Academic Journal Pricing and Market Power: A Portfolio Approach, November, 2000 for a complete exposition.
4. This data collection effort began in 1998 while I was still employed by the U.S. Justice Department's Antitrust Division. At that time, the Division was reviewing a number of proposed mergers between commercial publishers of STM journals, including (1) Reed-Elsevier, Wolters-Kluwer and Thomson; (2) Wolters-Kluwer and Waverly; and (3) Harcourt and Mosby.
5. This claim is generally true for medical libraries; though other types of academic libraries may not be as precise in their processes, they appear to behave in similar fashion. In any case, this is an empirical question that is tested using the holdings data.
6. Of course, this begs the question of how libraries measure usage for titles to which they do not currently subscribe. Presumably, evidence from interlibrary loans and citation data provide the basis for these measurements.
7. Commercial and non-profit journal publishers have different objectives. The latter are intent generally on disseminating knowledge, whereas the former are interested primarily in profits. I assume that the non-profit firms set prices to cover average costs and I ignore them in the analysis that follows.
8. I analyze this case, and also the case in which each library budget is unique (McCabe, 2000).
9. The choice of content will influence a journal's use in a library. So, for example, a general journal is likely to be used far more at a particular institution than a narrower, niche-oriented title.
10. Of course, a journal jump between budget classes influences the prices charged by other firms. In simulations of the merger scenario described above, the non-jumping journals experience modest price changes compared to the jump journals. The merged firm's high-use, jump journal exhibits large price increases; the non-merger, low-use, jump journal shows relatively large price decreases. This pattern persists as one increases the number of titles and budget classes. However, if the journal populations of particular budget classes are unchanged after a merger then the prices for those titles remain unchanged. Since it is likely that any observed merger will involve titles in different budget classes, it is possible that the merging firms' titles will jump in both directions, i.e. higher-use titles will jump "up" by increasing prices while lower-use titles will jump "down" by lowering their prices.
11. For those not familiar with the general concept of a cdf, consult any introductory probability textbook. Note that by specifying an exponential cdf I am assuming that cost per use and journal demand are inversely related. If this particular model "fits" the data, then there is support for the hypothesis.
12. The University of Wisconsin Libraries "Cost Per Use Statistics", previously available from http://www.wisc.edu/wendt/journals/costben.html (archived at http://wendt.library.wisc.edu/archive/journals/costben.html).
13. Note that confirmation of my portfolio hypothesis in the current context does not necessarily generalize to other acquisition environments in which cost per use is relied upon less or is more difficult to measure.
14. Unlike most fields, biomedical scholars enjoy the use of the National Library of Medicine's central database that contains information on several thousand medical collections. Although this data source offered substantial benefits with respect to the initial phase of data collection, the data were not ideally organized for analysis. One of the major difficulties was that many of the data — some 25% — were too idiosyncratic for data processing; as a consequence several hundred additional hours of manual effort were required to transform the data into usable form.
15. Furthermore, if publishing mergers do result in cost savings, economic theory implies that post-merger prices should decline, everything else equal.
16. These numbers exclude titles that commenced publication after 1988. Including these newer titles would tend to lower the reported 1988 figures relative to the later 1998 numbers.
17. According to one former publishing executive, "If we didn't raise our prices each year, our competitors would grab the surplus dollars available from our customers."
18. In a single period game, each publisher would attempt to forecast the size of journal budgets, and set prices so that its average absolute demand elasticity was close to one. However, in a multi-period context, with budgets increasing each period, a firm's pricing strategy changes. It is possible to show that firms will set prices so that absolute elasticities in each period lie between zero and one. The intuition is that lowering the price (and thus the absolute elasticity) in each period preserves future sales and, combined with budget growth, raises total profits.
19. The results described here are based on data for journals first published no later than 1988 and sold by publishers with at least three ISI-ranked titles.
20. Pre-1918 titles are considered together, with the oldest titles dating from the 1820s. For younger titles, the average prices for the two groups are similar but the non-profit citation rates are about five times larger.
21. It is possible that smaller subscription bases account for the commercial titles' higher prices, since, all else equal, a smaller subscription base raises average costs. To check this, I also calculated the average number of subscriptions for both types of publishers by decade. For some of these decade groupings, i.e. 1928-1937 and 1938-1947, the commercial titles actually exhibited larger subscription bases. In the other instances, commercial subscription bases were smaller than those for non-profit titles, but not enough, it would seem, to account for the observed price discrepancies. Except for the pre-1918 titles, commercial subscription bases were only 20 to 40% smaller than for the corresponding non-profit titles. In each of these cases, calculated revenues for commercial titles exceeded those of the non-profit titles. The average revenue level for commercial titles exceeded the corresponding non-profit values by 145%.
22. Some of this citation gap may be due to the more general subject matter of many non-profit titles, compared to the niche strategy of some commercial journals.
23. To avoid future antitrust scrutiny the large firms of the journal publishing world are likely to grow by adding relatively small numbers of journals at frequent intervals. If pursued diligently, this stealth strategy can be just as successful as any blockbuster merger.
24. The Los Alamos physics server is perhaps the best example to date of this digital future (http://xxx.lanl.gov/). This website, funded by US government sources, has become the standard method of exchange for physics working papers.
25. One important justification for this claim is that professional advancement within (academic) institutions relies on and supports the existing approach to quality assessment.
26. See the chapters by Gazzale and MacKie-Mason, and Hunter. For example, Elsevier's database product, ScienceDirect, www.sciencedirect.com, contains articles from its more than 1100 peer-reviewed journals in various disciplines. To gain access to the entire database or some customized subset, a library is required to maintain its Elsevier paper subscriptions. The access price is typically calculated as a percentage markup on the library's Elsevier "paper budget." Recently, Elsevier has begun to offer smaller bundles of titles that correspond to broad disciplinary markets, such as biomedicine.
27. For example, in the print context, cancellation of expensive journals has provided libraries with some modest ability to influence publisher pricing. If and when libraries begin to rely primarily on large, digital bundles for providing access to peer-reviewed research, the credibility of a threat to cancel an entire bundle will be far lower, reducing the effectiveness of this strategy. The impact of this change becomes particularly acute once a bundle grows beyond 50% of a specific market. It is easy to show that once this threshold is passed, perhaps due to a merger, the profitability of the bundle is greatly enhanced.
12. Capitalizing on Competition: The Economic Underpinnings of SPARC
Over the last 15 years the library community has been faced with high and ever-rising prices for scholarly resources. A number of factors have contributed to this situation, most fundamentally, the commercialization of scholarly publishing. While libraries have tried a number of strategies to ameliorate the effects of high prices, the development of SPARC, the Scholarly Publishing and Academic Resources Coalition, finally seems to be having some positive effects.
This paper will review the current library environment, outline the elements that contribute to the marketplace for science, technology, and medical publishing, and briefly discuss the various calls for more competition in the scholarly publishing market. I will then discuss SPARC, a major initiative intended to introduce low-priced, high-value alternatives to compete with high-priced commercial publications for authors and subscribers.
12.1 The Environment
Over the past 15 years, libraries have struggled with the growing gap between the price of scholarly resources and their ability to pay. Data collected by the Association of Research Libraries (ARL), a membership organization of over 120 of the largest research libraries in North America, reveal that the unit cost paid by research libraries for serials increased by 207% between 1986 and 1999 (Association of Research Libraries, 1999). While serial costs increased at 9% a year, library materials budgets increased at only 6.7% a year. Libraries simply could not sustain their purchasing power with such a significant gap. Even though the typical research library spent 170% more on serials in 1999 than in 1986, the number of serial titles purchased declined by 6%. More dramatically, book purchases declined by 26%. With such a drastic erosion in the market for books, publishers had no choice but to raise prices (although not nearly as high as did journal publishers). In 1999, the unit cost of books had increased 65% over 1986 costs. As points of comparison, over the same time period, the consumer price index increased 52%, faculty salaries increased 68%, and health care costs increased 107% (Association of Research Libraries, 1999; Bureau of Labor Statistics, 2000; American Association of University Professors, 1986, 1999).
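The cumulative and annual figures in this paragraph are related by compound growth. As a rough check, the sketch below converts each cumulative 1986-1999 increase into an implied annual rate; the function name is hypothetical, and treating the period as exactly 13 annual steps is an assumption.

```python
def annual_growth_rate(total_increase, years):
    """Compound annual rate implied by a cumulative fractional increase.

    total_increase is expressed as a fraction, so 2.07 means +207%.
    """
    return (1.0 + total_increase) ** (1.0 / years) - 1.0


YEARS = 1999 - 1986  # assumed: 13 annual compounding steps

# Cumulative increases quoted in the text for 1986-1999.
CUMULATIVE = {
    "serial unit cost": 2.07,
    "book unit cost": 0.65,
    "consumer price index": 0.52,
    "faculty salaries": 0.68,
    "health care costs": 1.07,
}

implied = {name: annual_growth_rate(total, YEARS) for name, total in CUMULATIVE.items()}
for name, rate in implied.items():
    print(f"{name}: about {rate:.1%} a year")
```

The 207% increase in serial unit costs corresponds to roughly 9% a year, in line with the annual rate cited above, while the consumer price index figure implies a little over 3% a year.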
At the same time price increases were straining library budgets, an explosion in the volume of new knowledge and new formats was adding yet more stress. According to Ulrich's International Periodicals Directory, the number of serials published increased over 54% between 1986 and 2000 from 103,700 to over 160,000 titles (Ulrich's, 2000; Okerson, 1989). While the majority of these titles are not scholarly journals and would not be collected by research libraries, the data does give some indication of the health of the serials publishing industry. According to figures from UNESCO, over 850,000 books were published worldwide in 1996 (Greco, 1999). Data from the top 15 producing countries reveals that book production increased 50% between 1985 and 1996 (Greco, 1999; Grannis, 1991). In the meantime, electronic publishing is booming with the number of peer-reviewed electronic journals increasing well over 570 times between 1991 and 2000 (Association of Research Libraries, 2000a). While worldwide output of information resources increases dramatically, the research library is purchasing a smaller and smaller proportion of what is available. The typical library that subscribed to 16,312 serial titles and 32,679 monographs in 1986 is now able to afford only 15,259 serials and 24,294 monographs (Association of Research Libraries, 1999).
The overall high prices and significant price increases of journals have been traced to titles in science, technology, and medicine (STM). Price increases in these areas have averaged from 9 to 13% a year over at least the past decade (Albee and Dingley, 2000). Many librarians believe that the dominance of commercial publishers in STM journals publishing is one of the underlying causes of the high prices. In addition, the consolidation going on in the publishing industry raises even more concerns that fewer companies with greater market power will exacerbate current trends.
The growth of the commercial presence in scholarly publishing has introduced a market economy to an enterprise that had been considered by scholars as a "circle of gifts." Scholars have always been interested in the widest possible dissemination of their work and the ability to build on the work of others. To share their findings with colleagues and claim precedent for their ideas they have been willing to give away their intellectual effort for no direct financial remuneration. Their rewards come in the form of reputation within their fields and promotion and tenure from their institutions. Scholars trusted that the publishers to whom they gave their work were operating in their best interests, intent on furthering the scholarly enterprise through wide distribution of research results.
For a long time, this arrangement worked well. Publishers helped shape the disciplines by collecting manuscripts in specific fields, managing the peer review process, and marketing and selling subscriptions. But as a few large publishers recognized the earnings potential of a constant supply of free content that must be purchased by libraries, they raised prices higher and higher. As noted by King and Tenopir in Chapter 8, individuals who are more sensitive to price changes were the first to cancel their subscriptions. Eventually, however, even libraries were forced to launch major cancellation projects, decreasing access and resulting in additional price increases for the remaining subscribers. While it may seem counter-intuitive that selling fewer copies could increase revenues, there is some evidence to suggest that particularly in the case of mergers this is in fact the case (see McCabe, Chapter 11). So the wide distribution desired by authors can be directly opposed to the strategy used by publishers to maximize their profits.
It is important to acknowledge that commercial publishers are doing exactly what their stockholders would expect them to do. They are behaving responsibly toward their shareholders, their highest priority. This need to protect shareholder value, however, sometimes conflicts directly with the need for scholars to both distribute their own work widely and have ready access to the work of others.
12.2 The Noncompetitive STM Marketplace
Publisher profits begin to reveal the nature of the scholarly publishing enterprise, particularly that of journals publishing in science, technology, and medicine, and increasingly in business and economics. Some of the world's largest journal publishers are companies owned by Wolters Kluwer and Reed Elsevier. Data from these companies compared with that from the periodicals publishing industry as a whole show margins 2-4 times as high and return on equity almost twice as high (Wyly, 1998). In his analysis of these companies' financial data, Wyly notes that in conjunction with other evidence, "a high return on equity is at least a potential indicator that equity holders are benefiting from investing in activities not subject to competitive forces" (p. 9).
Other evidence consistent with high margins is noted by McCabe in Chapter 11 in his section on Data: Descriptive Statistics. McCabe's data show that while the prices of 1000 biomedical titles increased 3 times between 1988 and 1998, the average number of subscriptions held by 194 medical libraries decreased by only 1.5%. McCabe concludes that demand is very price inelastic at the firm level, a sufficient condition for the exercise of market power, and thus the existence of high margins. Can we conclude from this information that the market for scholarly journals is non-competitive? Of course high margins are not necessarily inconsistent with competition. Profits may be low due to large fixed costs. However, McCabe's econometric estimates reveal that quality- and cost-adjusted price increases have been substantial over the past decade. This evidence suggests that profits are high and that the market for scholarly journals is not competitive.
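A back-of-the-envelope calculation shows why these descriptive figures point toward inelastic demand. The sketch below computes a crude arc (midpoint) elasticity. It assumes that "increased 3 times" means prices tripled, it ignores changes in quality, costs and library budgets, and it is not McCabe's structural estimate; the function name is hypothetical.

```python
def arc_elasticity(q0, q1, p0, p1):
    """Midpoint (arc) price elasticity of demand between two observations."""
    pct_change_q = (q1 - q0) / ((q0 + q1) / 2.0)
    pct_change_p = (p1 - p0) / ((p0 + p1) / 2.0)
    return pct_change_q / pct_change_p


# Normalise 1988 values to 1.0.
# Assumed reading: prices tripled while subscriptions fell by 1.5%.
crude = arc_elasticity(q0=1.0, q1=0.985, p0=1.0, p1=3.0)
print(f"crude arc elasticity: {crude:.4f}")
```

The result is far below one in absolute value. Because nothing else is held constant, it should be read only as an indication of the direction of the evidence, not as a measurement.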
A number of factors contribute to this environment in which journal publishers operate. While faculty may desire to publish to share their work with colleagues, they are also driven to publish by the promotion and tenure system and the need to obtain grants. Faculty will submit their articles to the most prestigious journals in their fields or to the title that will most likely accept their work. As a title gains in prestige as measured by its impact factor (the frequency with which the average article in a journal has been cited in a particular year), it becomes more and more attractive to faculty, both as authors and readers, and takes on a dominance in the field. Faculty expect their libraries to subscribe to both the prestigious titles and to the second tier titles in which they publish.
Libraries, whose mission is to serve current and future scholars, purchase as many titles as their budgets will allow. With journals, their practice had been to set up a subscription which was generally not reviewed unless an unusual event, such as a dramatic price increase, drew attention to it. Once in a library's journals collection, there was little chance of a title being canceled. As publishers raised prices, libraries did everything they could to protect their journals budgets. But they had also inadvertently protected faculty from the reality of journal prices. In the professionalization of collection development in the 1960's and 70's, responsibility for selection had migrated from faculty to librarians. Faculty were no longer aware of the institutional prices that were being charged for the titles in which they published and to which they expected the library to subscribe. It is more typical, now, for libraries to review all of their journal titles on a routine basis to cull out little used, low value, or no longer relevant titles.
Yet another factor contributing to the market dynamic was the hesitance of scholarly societies to compete with well-established titles or to launch new titles in developing fields. Launching titles in new fields or niche areas could draw papers away from established society titles jeopardizing both their clout and financial stability. Thus, authors turned to commercial publishers to support their interests. As these new journals grew in size and prominence, it was less and less likely that societies would take the financial risk to compete. They could not afford to carry the loss for the approximately 5-7 years needed for a new journal to break even (Tenopir and King, 2000).
Libraries undertook a number of strategies to cope with the continuing increase in journal prices. They reduced dramatically the purchase of monographs, asked their administrations for special budget increases, and when these were not enough, canceled millions of dollars worth of serials. Libraries also turned to document delivery services and developed strategies to improve interlibrary lending performance. They sought to re-invigorate cooperative collection development programs. More recently, site licensing of electronic resources has helped eliminate the need for duplicate print subscriptions, and consortial arrangements are reducing unit costs at individual institutions by spreading costs across a wider range of libraries. While all of these strategies can help local institutions better manage their budgets, none of them has changed the underlying dynamics of a system where the publisher, operating in a non-competitive environment, can unilaterally set prices without a countervailing pressure from competitive forces.
12.3 Calls for Competition
In 1988, at the request of ARL, the Economic Consulting Services Inc. (ECS) undertook a study of trends in average subscription prices and publisher costs from 1973 to 1987. The study compared the price per page over time of a statistically valid sample of approximately 160 journal titles published by four major commercial publishers (Elsevier, Pergamon, Plenum, and Springer-Verlag) with an estimated index of publishing costs over the same time period. The study concluded that growth in the price per page of the journals exceeded the growth in costs by 2.6 to 6.7% a year. This meant that these companies could be enjoying operating profits of 33 to 120% a year. The ECS concluded that "If such estimated rates of growth are reasonably accurate, then the library community would benefit greatly from such measures as the encouragement of new entrants into the business of serials publishing, and the introduction of a program to stimulate greater competition among publishers by injecting a routine of competitive bidding for publishing contracts of titles whose ownership is not controlled by the publishers" (Economic Consulting Services Inc., 1989).
In a companion piece to the ECS study, a contract report by Ann Okerson defined the causes of the "serials crisis" and proposed a set of actions to confront the problems. The report concluded that "the distribution of a substantial portion of academic research results through commercial publishers at prices several times those charged by the not-for-profit sector is at the heart of the serials crisis" (Okerson, 1989). The report went on to note that:
Satisfactory, affordable channels for traditional serials publication already exist. For example, there are reasonably priced commercial serials publishers. Many of the non-profit learned societies are already substantial publishers. University presses could substantially expand their role in serials publishing.... The serials currently produced by these organizations are significantly less expensive than those from the commercial publishers, even though they may increase in price at similar rates. Several analyses of the "impact" of serials, in terms of the readership achieved per dollar, show that those produced by non-commercial sources have a higher impact than commercial titles. (p.43)
Among the recommendations in the report was one centered on introducing competition: "ARL should strongly advocate the transfer of publication of research results from serials produced by commercial publishers to existing non-commercial channels. ARL should specifically encourage the creation of innovative non-profit alternatives to traditional commercial publishers." (p.42)
Over the next several years, ARL directed great energy at engaging stakeholders beyond the library community, such as societies, university presses, and university administrators, in the discussions of the scholarly communication crisis. The Association of American Universities (AAU) formed a series of task forces to address key issues related to research libraries. The Task Force on a National Strategy for Managing Scientific and Technological Information took up, as its name suggests, the issues related to scholarly journals publishing. In its report of May 1994, the Task Force called for competition, but this time competition facilitated through electronic publishing. The recommendation stated that the community should "introduce more competition and cost-based pricing into the marketplace for STI by encouraging a mix of commercial and not-for-profit organizations to engage in electronic publication of the results of scientific research (Association of American Universities, 1994)."
As a result of the work of the task force, ARL proposed several projects intended to address the crisis in scholarly publishing. These were rejected as too narrow or too broad or not directed at the appropriate leverage point in the system. Reaching consensus among the membership on a way forward seemed less and less likely. In the meantime, prices continued to climb. Finally, in May of 1997, at an ARL membership meeting, Ken Frazier, Director of Libraries at the University of Wisconsin, Madison, proposed that "If 100 institutions would put up $10,000 each to fund 10 start-up electronic journals that would compete head to head with the most expensive scientific and technical journals to which we subscribe, we would have $1 million annually.... I don't see any way around the reality that we have to put the money out in order to make this start to happen (Michalak, 2000)."
Within six months, Frazier's proposal had a name (SPARC, the Scholarly Publishing and Academic Resources Coalition), a business plan was in development, and potential partnerships were under discussion. In June 1998, Richard Johnson was hired as SPARC Enterprise Director and the first partnership (with the American Chemical Society) was announced.
12.4 SPARC
SPARC is a membership organization whose mission is to restore a competitive balance to the STM journals publishing market by encouraging publishing partners (for example, societies, academic institutions, small private companies) to launch new titles that directly compete with the highest-priced STM journals or that offer new models that better serve authors, users and buyers. In return, libraries agree to purchase those titles that fall within their collections parameters. By leveraging their subscription dollars, libraries reduce the financial risk for publisher-partners, allowing them the time to build the prestige needed to attract both authors and readers.
Over 200 libraries and library organizations from Hong Kong, Australia, Belgium, Denmark, Germany, England, Canada, and the United States now belong to SPARC. Members pay a modest annual membership fee and agree to the purchase commitment.
A number of strategies must be pursued for SPARC to be successful. First, it must be able to deliver on library subscriptions to partners. This includes marketing support that reaches both SPARC members and the broader library community. SPARC is also working with prestigious societies and editorial boards. This is essential to build name recognition for SPARC and early interest in new titles. Raising faculty awareness of the issues in scholarly publishing is also a critical component of the SPARC program. Faculty who understand the context and are reconnected with the reality of journal prices are more likely to change their submission habits if there is a reasonably priced prestigious or promisingly prestigious alternative. This educational effort is also intended to encourage editors to become more engaged in the business aspects of the titles for which they work. Editors (or societies) can renegotiate contracts, move their titles, or start up competitors. SPARC must also catalyze the development of capacity and scale within the not-for-profit sector. Numerous studies have consistently demonstrated that journals published by societies or other non-profit publishers are significantly lower in price and higher in quality than commercial journals (See for example: Cornell, 1998; McCabe, 1999; Wisconsin, 1999; Bergstrom, 2001). However, STM publishing is clearly dominated by commercial companies. A recent market analysis by Outsell, Inc., estimates that commercial companies account for 68% of the worldwide revenue for STM primary and secondary publishers (Outsell, Inc., 2000). For a true competitive environment to exist, much greater capacity in the non-profit sector is essential.
As it has developed over the past two years, SPARC has categorized its efforts into three programmatic areas: SPARC Alternatives, SPARC Leading Edge, and SPARC Scientific Communities. In addition, SPARC is supporting the Open Archives Initiative, an effort to develop standards to link distributed electronic archives. SPARC views the development of institutional and disciplinary e-archives as an important strategic direction for the future of scholarly communication.
SPARC Alternatives
The first and most directly competitive of SPARC's programs is SPARC Alternatives. SPARC Alternatives are the titles that compete directly with high-priced STM journals. The first partnership in this category was that with the American Chemical Society (ACS), which agreed to introduce three new competitive titles over three years. Organic Letters, the first of these, began publication in July 1999. Organic Letters competes with Tetrahedron Letters, a $9,036 title (the subscription price in 2001) published by Elsevier Science. ACS, one of the largest professional societies in the world and highly respected for its quality publications program, was able to attract three Nobel laureates and 21 members of the National Academy of Sciences to its new editorial board. Two hundred and fifty articles were posted on the Organic Letters website and more than 500 manuscripts were submitted in its first 100 days (Ochs, 2001).
A 2001 subscription to Organic Letters costs $2,438. The business plan calls for a fully competitive journal offering 65-70% of the content at 25% of the price. The effects of this new offering have already been felt. The average price increase for Tetrahedron Letters for several years had been about 15%. For 2000, just after Organic Letters was introduced, the price increase of Tetrahedron Letters was only 3%; in 2001 it was 2%. For 2000, the average price increase across all of the Elsevier Science titles was 7.5% and for 2001 it was 6.5%. If the price of Tetrahedron Letters had continued to increase at the rate of 15%, it would cost $12,070 in 2001. Subscribers have saved over $3,000 as a result of competition. Even if the title had increased at the more modest average rate of the Elsevier Science titles for 2000 (7.5%) and 2001 (6.5%), subscribers would be paying over $800 more for Tetrahedron Letters in 2001 than they are currently paying.
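The last comparison above can be reproduced with a short calculation. The following is a minimal sketch, assuming (this is not stated in the text) that the 1999 price of Tetrahedron Letters can be backed out from the 2001 price of $9,036 and the 3% and 2% increases reported for 2000 and 2001. The function and variable names are illustrative only.

```python
def implied_base_price(final_price, yearly_increases):
    """Back out the starting price from a final price and the yearly increases applied to it."""
    base = final_price
    for rate in yearly_increases:
        base /= 1.0 + rate
    return base


def projected_price(base_price, yearly_increases):
    """Compound a base price forward through a sequence of yearly increases."""
    price = base_price
    for rate in yearly_increases:
        price *= 1.0 + rate
    return price


# Figures from the text: the 2001 price and the actual 2000 and 2001 increases.
PRICE_2001 = 9036.0
ACTUAL_INCREASES = [0.03, 0.02]
# Average Elsevier Science increases for 2000 and 2001, also from the text.
ELSEVIER_AVERAGE_INCREASES = [0.075, 0.065]

# Assumption: the 1999 price is whatever base yields $9,036 after +3% and +2%.
price_1999 = implied_base_price(PRICE_2001, ACTUAL_INCREASES)
counterfactual_2001 = projected_price(price_1999, ELSEVIER_AVERAGE_INCREASES)
extra_cost = counterfactual_2001 - PRICE_2001

print(f"Implied 1999 price: ${price_1999:,.0f}")
print(f"2001 price at Elsevier average increases: ${counterfactual_2001:,.0f}")
print(f"Difference from actual 2001 price: ${extra_cost:,.0f}")
```

Under these assumptions the difference comes out a little above $800, in line with the figure quoted above. Other growth scenarios can be explored by passing a different list of yearly increases to `projected_price`.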
Even more importantly, the introduction of Organic Letters has had a significant impact on the number of pages and articles published by Tetrahedron Letters.[1] During the second half of 1999, the number of articles in Tetrahedron Letters declined by 21% compared to the same period in 1998 and the number of pages declined by 12%. In the first half of 2000, the number of articles decreased 16% compared to the first half of 1999 while the number of pages actually increased 5%. The loss in articles has been compensated for by increasing the number of pages per article, in the second half of 1999 by 11% and the first half of 2000 by 24%. Organic Letters, in the meantime, surpassed its projected pages and articles and has clearly demonstrated that quality, low-cost alternatives can attract authors. The second ACS SPARC Alternative, Crystal Growth and Design, will be introduced in 2001.
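The three sets of percentages above are linked by a simple identity: pages per article equals pages divided by articles. The following is a minimal sketch of that relationship, using only the rounded percentages given in the text; the function name is illustrative. Because the inputs are rounded, the results may differ by a point or so from figures computed on the underlying counts.

```python
def pages_per_article_change(articles_change, pages_change):
    """Return the fractional change in pages per article implied by
    fractional changes in the article count and the page count."""
    return (1.0 + pages_change) / (1.0 + articles_change) - 1.0


# Second half of 1999 vs. second half of 1998: articles -21%, pages -12%.
h2_1999 = pages_per_article_change(-0.21, -0.12)
# First half of 2000 vs. first half of 1999: articles -16%, pages +5%.
h1_2000 = pages_per_article_change(-0.16, 0.05)

print(f"Second half 1999: {h2_1999:+.1%} pages per article")
print(f"First half 2000:  {h1_2000:+.1%} pages per article")
```

These work out to roughly 11% and 25%, close to the 11% and 24% reported above.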
Another high-profile SPARC Alternative is Evolutionary Ecology Research (EER), a title founded by Michael Rosenzweig, a Professor of Ecology and Evolutionary Biology at the University of Arizona. In the mid-1980s, Rosenzweig founded and edited Evolutionary Ecology with Chapman & Hall. The title was subsequently bought and sold, most recently in 1998 to Wolters Kluwer. During these years, the journal's price increased by an average of 19% a year. Fed up with the price increases and the refusal of the publishers to take their concerns seriously, the entire editorial board resigned. In January 1999, they launched their own independent journal published by a new corporation created by Rosenzweig. A subscription to EER was priced at $305, a fraction of the cost of the original title ($800).[2]
As of the end of 2000, EER had published 16 issues while the original title published only 6. Authors had no qualms submitting their papers to this new journal edited by respected scholars in the field. In fact, 90% of the authors withdrew their papers from Evolutionary Ecology when the editorial board resigned. EER was quickly picked up by the major indexes, surmounting yet another hurdle that faces new publications. And, most significantly, EER broke even in its first year. SPARC played a significant role in generating publicity about and, more importantly, subscriptions to EER. EER is another example of how a new title can quickly become a true competitor.
SPARC has a number of other titles in the Alternatives program. These include PhysChemComm, an electronic-only physical chemistry letters journal published by the Royal Society of Chemistry; Geometry & Topology, a title that is free of charge on the web with print archival versions available for a fee; the IEEE Sensors Journal, to be published by the Institute of Electrical and Electronics Engineers in 2001; and Theory & Practice of Logic Programming, a journal founded by an entire editorial board who resigned from another title after unsuccessful negotiations with the publisher about library subscription prices. New titles added recently include Algebraic & Geometric Topology, a free online journal hosted at the University of Warwick Math Department, and the Journal of Machine Learning Research, a computer science publication offered in a free web version. A number of other partnerships are under negotiation.
SPARC Leading Edge Partnerships
To support the development of new models in scholarly publishing, SPARC has created a "Leading Edge" program to publicize the efforts of discipline-based communities that use technology to obtain competitive advantage or introduce innovative business models. Titles in this program include the New Journal of Physics, the Internet Journal of Chemistry and Documenta Mathematica.
The New Journal of Physics, jointly sponsored by the Institute of Physics (U.K.) and the German Physical Society, is experimenting with making articles available for free on the web and financing production through the charging of fees to authors whose articles are accepted for publication. That fee is currently $500.
The Internet Journal of Chemistry is experimenting with attracting authors by offering them the opportunity to exploit the power of the Internet. This electronic-only journal was created by an independent group of chemists in the U.S., the U.K., and Germany. It offers the ability to include full 3-D structures of molecules, color images, movies and animation, and large data sets. It also allows readers to manipulate spectra. Institutional subscriptions to the journal cost $289.
Documenta Mathematica is a free web-based journal published by faculty at the University of Bielefeld in Germany since 1996. A printed volume is published at the end of each year. Authors retain copyright to articles published in the journal and institutional users are authorized to download the articles for local access and storage.
SPARC Scientific Communities
Another important program area for SPARC is the Scientific Communities. These projects are intended to support broad-scale aggregations of scientific content around the needs of specific communities of interest. Through these projects, SPARC encourages collaboration among scientists, their societies, and academic institutions. The Scientific Communities program helps to build capacity within the not-for-profit sector by encouraging academic institutions to develop electronic publishing skills and infrastructure. It also seeks to reduce the sale of journal titles to commercial publishers by providing small societies and independent journals with alternative academic partners for moving into the electronic environment.
One of the most ambitious projects in the Scientific Communities is BioOne, a non-profit, web-based aggregation of peer-reviewed articles from dozens of leading journals in adjacent areas of biological, environmental, and ecological sciences. Most of these journals are available currently only in print. While there is a risk to societies of offering electronic versions of their titles through institutional site licenses, i.e., the loss of personal member subscriptions, there is a greater danger that scholarship not in electronic form will be overlooked and marginalized. But many of the societies do not have the resources or expertise to create web editions on their own. BioOne provides that opportunity.
BioOne, to be launched in early 2001 with 40 titles out of an eventual 150 or more, is a partnership among SPARC, the American Institute of Biological Sciences, the University of Kansas, the Big 12 Plus Library Consortium, and Allen Press. In an unprecedented commitment to ensuring that the societies not only survive but play an expanding role in a more competitive and cost-effective marketplace, SPARC and Big 12 Plus Library Consortium members have contributed significant funds to the development of BioOne. These funds will be returned over a five year period as credits against their subscriptions. BioOne offers participating societies a share in the revenues, protection against accelerated erosion of print subscriptions, and no out-of-pocket costs for text conversion and coding.
Several other Scientific Communities projects have received support from SPARC. These include eScholarship from the California Digital Library, Columbia Earthscape, and MIT CogNet. The goal of California's eScholarship project is to create an infrastructure for the management of digitally-based scholarly information. eScholarship will include archives of e-prints, tools that support submission, peer-review, discovery and access, and use of scholarship, and a commitment to preservation and archiving. Columbia's Earthscape is a collaboration among Columbia University's press, libraries, and academic computing services. The project integrates earth sciences research, teaching, and public policy resources. MIT CogNet is an electronic community for researchers in cognitive and brain sciences that includes a searchable, full-text library of major reference works, monographs, journals, and conference proceedings, virtual poster sessions, job postings, and threaded discussion groups. All three of these projects received funding from SPARC in a competitive awards process.
12.5 Evaluating the SPARC Model
The SPARC Purchasing Commitment
As SPARC was being developed, several key decisions had to be made to determine its scope of action. It was clear that the main goal of SPARC was to reduce the price of STM journals. Based on the several analyses of the journals crisis mentioned above, the SPARC founders believed that introducing direct head-to-head competition with high priced titles would be the most effective strategy for achieving this goal. But would SPARC itself be the publisher and actually fund and distribute the competing journals? Or would it provide development funds to established publishers who would launch the new titles? Or would the promise of library subscriptions be enough to encourage publishers to participate?
The SPARC working group quickly rejected the notion of SPARC becoming a publisher. Many able and sympathetic publishers already existed. Moreover, SPARC did not yet have an established name. SPARC-supported titles would need to develop prestige quickly to attract editors, authors, and readers, as well as subscribers. While prestige necessarily takes time to establish, the working group members believed that partnering with traditional scholarly societies and university presses known for their high-quality publications could help speed the process along. In addition, working with prestigious partners would help SPARC establish its own reputation.
SPARC, then, saw its role as a catalyst to encourage primarily not-for-profit scholarly publishers to create the new titles. Many working group members indicated their willingness to contribute substantial amounts of money to SPARC to allow it to provide incentives to publishers in the form of development funds. But early conversations with some potential partners revealed that, at least for traditional publishers, what was needed most was libraries' subscription dollars. The publishers were willing to absorb the up-front development costs if they could be assured that libraries would subscribe early on to the new titles. This would ensure wide visibility from the beginning, reduce the amount of time publishers would need to recover their investments, and avoid possible legal entanglements that could result from external funding arrangements. Hence the evolution of SPARC's incentive plan for publishers: a commitment by SPARC member libraries that they would subscribe to SPARC partner journals as long as the titles fit into their collections profile.
While the purchase commitment is one of the greatest attractions of SPARC for publishers, it is one of the most controversial parts of SPARC's program for some of its members. In essence, SPARC's alternatives program is creating new titles that members are expected to buy (or is contributing to journal proliferation, as some would say). The founders of SPARC recognized that changing the system would require investment by libraries. While they hoped that university administrators would provide special allocations to support SPARC fees and purchase commitments, it is more likely the case that funds are coming from already over-stretched collections budgets. Purchase of a new SPARC title likely requires the cancellation of another title. In theory, that other title should be the existing high-priced journal. But these are often established journals and cannot easily be cancelled. Over time, as competition works, the high-priced titles should lose authors to the new titles and should ultimately be forced to lower their prices or at least curtail their price increases. As valuable content is lost, the titles will become easier to cancel. But this takes time, and, in the meantime, some publishers have started to bundle their products, eliminating the opportunity to cancel.
Nevertheless, as the number of new SPARC alternatives grows, it may be possible for libraries to cancel only a few of the competitors to be able to recoup their investment in SPARC titles. In early 2001, the 10 commercial titles with which SPARC alternatives compete head-to-head cost a total of over $40,000. The 10 SPARC titles cost a total of just over $5,200. The cancellation of only a few of the established titles would easily pay for the SPARC titles.
In the meantime, SPARC has launched a program intended to make the cancellation of the original title easier. Called Declaring Independence, this effort is directed at journal editors and encourages them to evaluate the effectiveness of their current journals in meeting the needs of the researchers in their community. If the findings are unsatisfactory and they are unable to negotiate improvements with their current publishers, Declaring Independence gives the editorial board members suggestions for moving their journals elsewhere. As demonstrated by Evolutionary Ecology Research, prestigious boards will take authors with them, creating a vulnerable time for the original journal as it struggles to find new editors and rebuild its author base. This is an opportune moment for libraries to cancel.
The Emergence of New Pricing Models
The founders of SPARC understood that publishing, however streamlined, costs money. The pledge of member subscriptions was a recognition of this reality. Most of the SPARC partners, particularly the traditional publishers, have maintained the typical subscription model for their new titles. A few community-based titles, however, are experimenting with alternative models. Three journals hosted by university mathematics departments are taking advantage of the ease of web-based publishing to offer their products online for free. Geometry & Topology and Algebraic & Geometric Topology are both hosted by the University of Warwick (U.K.). Documenta Mathematica is published at the University of Bielefeld in Germany. All three journals are run by faculty who are committed to "open-access e-journals [that] provide to authors and readers ... broad dissemination and rapid publication of research (SPARC, 2001)." All three produce a printed volume at the end of the year which is available at a minimal cost. According to the editors of Geometry & Topology, the most time-consuming part of the publishing process is the formatting of papers (Buckholtz, 2001). This work is being subsidized in part through the sale of the paper editions. This model may work while some libraries still feel compelled to purchase paper, but it is not clear what will happen when archiving and cultural issues are resolved.
Another model used by a SPARC partner is the charging of a fee to authors whose papers are accepted for publication. The New Journal of Physics (NJP), published by the Institute of Physics and the German Physical Society, is an electronic-only journal and available to the reader for free. A fee of $500 is charged to authors whose works are published. In order to encourage faculty to consider publishing in the NJP, a few libraries have offered to pay the fee for their faculty members. Approximately 60 papers have been published by the NJP in the last two years. As faculty have gotten less and less used to paying page charges, however, such fees may prove difficult to sustain.
Yet a third model is being explored by one of SPARC's newest partners, the Journal of Machine Learning Research (JMLR). JMLR is published by JMLR, Inc. in partnership with the MIT Press. Two electronic versions are offered: a free site maintained by JMLR, Inc., and a paid electronic edition available on the CatchWord Service. The paid version provides additional features including linking to abstracting and indexing services, archiving, and mirror sites around the world. Quarterly paid print editions are also available from MIT Press. It will be interesting to see whether the community will pay for enhanced features when a free edition is available and whether that choice may vary by "subscriber" type, i.e., a library or an individual.
12.6 Measuring Success
When SPARC was being designed, the developers set out a number of measures by which its success could be determined. These included:
SPARC-supported projects are financially viable and significantly less expensive;
SPARC-supported products are attracting quality authors and editors;
New players have entered the STM marketplace;
An environment where editorial boards have been emboldened to take action has been created; and
STM journal price increases have moderated significantly.
At this point, SPARC has been in existence for only three years. But there are already signs that it is having the desired impact. Evolutionary Ecology Research is financially viable and is offering quality content for under 40% of the price of the alternative. Organic Letters is on track to meet its financial goals and has been able to attract high-quality editors and editorial board members. In addition, it has quickly attracted authors away from its competitor, as has EER. Others report strong starts and encouraging prospects.
Through the Scientific Communities program, SPARC is supporting new players in the market—partnerships have included libraries, library consortia, and academic computing centers working with societies, university presses, independent journal boards, and individual faculty. These projects are in their very early development but give a clear indication of the long term possibilities for expanding not-for-profit publishing capacity.
SPARC has also been very successful to date in focusing attention on issues through its advocacy and public communications efforts. This in turn has created an environment where editorial boards and societies are beginning to question their publishers about pricing and other policies. Some of these negotiations are successful, leading to the lowering of prices, as happened recently in the case of the American Journal of Physical Anthropology. The American Association of Physical Anthropologists was concerned over the many cancellations of its journal that had resulted from high prices. The Association and the Publications Committee informed the publisher of its title that they were considering options, including the possible launch of a competitive journal. After extensive negotiations, the publisher and the Association were able to come to terms, which resulted in a reduction in the subscription price of more than 30% (Albanese, 2000).
Other negotiations between editorial boards and commercial publishers have not been as successful. In the case of the Journal of Logic Programming, the entire editorial board resigned after 16 months of unsuccessful negotiations about the price of library subscriptions. They have founded a new journal, Theory and Practice of Logic Programming, which began publication in January 2001 (Birman, 2000).
The ultimate aim of SPARC is to make scientific research more accessible by lowering prices for STM journals across the board. In 2000, the overall average increase in STM journal subscriptions fell below 9% for the first time since 1993 (Albee and Dingley, 2000). Elsevier Science, the largest STM journals publisher in the world, announced in 1999 that it was ending the days of double-digit price increases and set increases for 2000 at 7.5% and 2001 at 6.5% (Elsevier Science, 2000). These changes are significant.
For most SPARC member libraries, the savings represented by this decline are far more than their investment in SPARC and the creation of a more competitive market environment.
While SPARC may not be the only cause of these changes, it does seem clear that by raising the profile of the issues and achieving some early "proof of concept" success, SPARC has emboldened librarians, scholars, and societies to take action. Competition can work.
Notes
1. The following analysis is based on data collected by SPARC, July 2000.
2. An account of the development of Evolutionary Ecology Research can be found in Rosenzweig (2000).
13. RePEc, an Open Library for Economics[†]
arXiv.org, the eprints archive founded by Paul Ginsparg at Los Alamos National Laboratory, continues to be the leading provider of free eprints in the world. Its subject focus is Physics, Mathematics and Computer Science. There is no evidence supporting the idea that similar collections can be built for other subject areas. This chapter is concerned with an alternative approach as exemplified by the RePEc digital library for economics. RePEc has a different business model and a different content coverage than arXiv.org. This chapter addresses both differences.
As far as the business model is concerned, RePEc is an instance of an "Open Library". Such a library is open in two ways. It is open for contribution (third parties can add to it), and it is open for implementation (many user services may be created). Conventional libraries, including most digital libraries, are closed in both senses.
As far as the content coverage is concerned, the term RePEc stands for Research Papers in Economics. However, RePEc has a broader mission. It seeks to build a relational dataset about scholarly resources and other content relating to these resources. The dataset would identify all authors, papers and institutions that work in Economics. Such an ambitious project can only be achieved if the cost to collect data is decentralized and low, and if the benefits of supplying data are large. The Open Library provides a framework where these conditions are fulfilled.
13.1 Introduction
In this chapter I am not concerned with the demand for documents, nor am I concerned with the supply of documents.[1] Instead, I focus on the supply of information about documents. For some documents, holding detailed information about the document is as good as holding the document itself. This is typically the case when the document can be accessed on the Internet without any access restriction. Such a document will be called a public access document. Collecting data about documents is therefore particularly relevant for public access documents.
The main idea brought forward in this paper is the "Open Library". Basically, an open library is a collaborative framework for the supply and usage of document data. Stated in this way the idea of the open library seems quite trivial. To fully appreciate the concept, it is useful to study one open library in more detail. My example is the RePEc dataset about Economics. In Section 13.2, I introduce RePEc as a document data collection. In Section 13.3, I push the RePEc idea further. I discuss the extension of RePEc that allows one to describe the discipline, rather than simply the documents that are produced by the members of the discipline. In Section 13.4, I make an attempt to define the open library more precisely. The example of RePEc demonstrates the relevance of the open library concept. I conclude the paper in Section 13.5.
The efforts of which RePEc is the result go back to 1992. I deliberately stayed away from a description of the history of the work to concentrate on the current status. Therefore, insufficient attribution is given to the people who have contributed to the RePEc effort. See Krichel (1997) for an account of the early history of the NetEc projects. These can be regarded as precursors of RePEc.
13.2 The RePEc document dataset
Origin and motivation of RePEc
A scholarly communication system brings together producers and consumers of documents. For the majority of the documents, the producers do not receive a monetary reward. Their effort is compensated through a wide circulation of the document and peer approval of it. Dissemination and peer approval are the key functions of scholarly communication.
Scholarly communication in Economics has largely been journal-based. Peer review plays a crucial role. Thorough peer review is expensive in time. According to Trivedi (1993), a paper commonly takes over three years from submission to publication in an academic journal, not counting rejections. From informal evidence, slowly rising publication delays have been curbed in the past few years as journal editors have fought hard to cut down on what have been perceived to be intolerable delays.
Researchers at the cutting edge cannot rely solely on journals to keep abreast of the frontiers of research. Prepublication through discussion papers or conference proceedings is now commonplace. Access to this informally-disseminated research is often limited to a small number of readers. It relies on the good will of active researchers to disseminate their work. Since good will is in short supply, insider circles are common.
This time gap between informal distribution and formal publication can only fundamentally be resolved by reforming the quality control process. The inconvenience resulting from the delay can, however, be reduced by improving the efficiency of the informal communication system. This is the initial motivation behind the RePEc project. Its traditional emphasis has been on documents that have not gone through peer review channels. Thus RePEc is essentially a scholarly dissemination system, independent of the quality review process, on the Internet.
Towards an Internet-based scholarly dissemination system
The Internet is a cost-effective means for scholarly dissemination. Many economics researchers and their institutions have established web sites. However, they are not alone in offering pages on the Web. The Web has grown to an extent that the standard Internet search engines only cover a fraction of the Web, and that fraction is decreasing over time (Lawrence and Giles, 1999). Since much of economics research uses common terms such as "growth", "investment" or "money", a subject search on the entire Web is likely to yield an enormous number of hits. There is no practical way to find which pages contain economics research. Due to this low signal-to-noise ratio, the Web per se does not provide an efficient mechanism for scholarly dissemination. An additional classifying scheme is required to segregate references to materials of interest to the economics profession.
The most important type of material relevant to scholarly dissemination is the research paper. One way to organize this type of material has been demonstrated by the arXiv.org preprint archive, founded in 1991 by Paul Ginsparg of the Los Alamos National Laboratory, with an initial subject area in high energy physics. Authors use that archive to upload papers that are stored there. ArXiv.org has now assembled over 150,000 papers, covering a broad subject range of mathematics, physics and computer science, but concentrating on the original subject area. An attempt has been made to emulate the arXiv.org system in economics with the "Economics Working Paper Archive" (EconWPA) based at Washington University in St. Louis, but success has been limited. There are a number of potential reasons:
Economists do not issue preprints as individuals; rather, economics departments and research organizations issue working papers.
Economists use a wider variety of document formatting tools than physicists. This reduces the functionality of online archiving and makes it more difficult to construct a good archive.
Generally, economists are not known for sophisticated practices in computer literacy and are more likely to encounter significant problems with uploading procedures.
There is considerable confusion as to the implications of networked pre-publication on a centralized, high-visibility system for publication in journals.
Economics research is not confined to university departments and research institutes. There are a number of government bodies—central banks, statistical institutes, and others—which contribute a significant amount of research in the field. These bodies, by virtue of their size, have more rigid organizational structures. This makes the coordination required for the central dissemination of research more difficult.
An ideal system should combine the decentralized nature of the Web, the centralized nature of the arXiv.org archive, and a zero price to end users. I discuss these three requirements in turn.
The system must have decentralized storage of documents. To illustrate, let us consider the alternative scenario. This would be one where all documents within a certain scope, say within a discipline, would be held on one centralized system. Such a system would not be ideal for two reasons. First, those authors who are rejected by that system would have no alternative publication venue. Since Economics is a contested discipline, this is not ideal. Second, the storage and description of documents is costly. The centralized system may levy a charge on contributors to cover its cost. However, since it enjoys a monopoly, it is likely to use this position to extract rent from authors. This would not be ideal.
On the other hand, we need access points to the documents, both for usage of the documents by end users and for the monitoring of this usage. These activities are best conducted when a centralized document storage is available, such as the one that arXiv.org affords. Otherwise the economics papers become lost in the complete contents of the Web and their usage is recorded in the web logs of many servers. Such usage logs are private to the management of the web servers. They cannot be used to monitor usage.
To explain why the end-user access to the dissemination system should be free, it is useful to refer to Harnad's distinction between trade authors and esoteric authors (1995a). Authors of academic documents are esoteric authors rather than trade authors. They do not expect payments for the written work; instead, they are chiefly interested in reaching an audience of other esoteric authors, and to a lesser extent, the public at large. Therefore the authors are interested in wide dissemination. If a tollgate to the dissemination system is established, then the system will fall short of ideal.
Having established the three criteria for an ideal system, let me turn to the problem of implementing it. The first and third objectives could be accomplished if departments and research centers allow public access to their documents on the Internet. But for the second, we need a library to hold an organized catalog. The library would collect what is known as "metadata": data about documents that are available using Internet protocols. There is no incentive for any single institution to bear the cost of establishing a comprehensive metadata collection, without external subsidy. However, since every institution will benefit from participation in such an effort, we may solve this incentive problem by creating a virtual collection via a network of linked metadata archives. This network is open in the sense that persons and organizations can join by contributing data about their work. It is also open in the sense that user services can be created from it. This double openness promotes a positive feedback effect. The larger the collection's usage, the more effective it is as a dissemination tool, thus encouraging more authors and their institutions to join, as participation is open. The larger the collection, the more useful it becomes for researchers, which leads to even more usage.
Bringing a system to such a scale is a difficult challenge. Change in the area of scholarly communication has been slow, because academic careers are directly dependent on its results. Change is most likely to be driven from within. Therefore, a scholarly dissemination system on the Internet is more likely to succeed if it enhances current practice, without a threat to replace it. In the past, the distribution of informal research papers has been based on institutions issuing working papers. These are circulated through exchange arrangements. RePEc is a way to organize this process on the Internet.
The architecture of RePEc
RePEc can be understood as a decentralized academic publishing system for the economics discipline. RePEc allows researchers' departments and research institutes to participate in a decentralized archival scheme which makes information about the documents that they publish accessible via the Internet. Individual researchers may also openly contribute, but they are encouraged to use EconWPA.
Each contributor needs to maintain a separate collection of data using a set of standardized templates. Such a collection of templates is called an "archive". An archive operates on an anonymous ftp server or a Web server controlled by the archive provider. Each archive provider has total control over the contents of its archive. There is no need to transmit documents elsewhere. The archive provider retains the liberty to post revisions or to withdraw a document.
An example archive. Let us look at an example. The archive of the OECD is at http://web.archive.org/web/20010829193045/http://www.oecd.org/eco/RePEc/oed/. In that directory we find two files. The first is oedarch.rdf:
This file gives basic characteristics about the archive. It associates a handle with it, gives an email address for the maintainer, and most importantly, provides the URL where the archive is located. This archive file gives no indication about the contents of the archive. The contents list is in a second file, oedseri.rdf:
This file lists the content as a series of papers. It associates some provider and maintainer data with the series, and it associates a handle with the series. The format that both files follow is called ReDIF. It is a purpose-built metadata format. Appendix B discusses technical aspects of the ReDIF metadata format that is used by RePEc. See Krichel (2000) for the complete documentation of ReDIF.
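To make the idea of a template file concrete, here is a minimal sketch of how a user service might read such files. It assumes a simplified syntax that is not taken from the ReDIF documentation: one "Field-Name: value" pair per line, indented lines continuing the previous value, and blank lines separating templates. The field names and the sample values below are purely illustrative; Krichel (2000) remains the authority on the real format.

```python
def parse_templates(text):
    """Split text into templates, each a list of (field, value) pairs.

    Assumed syntax (illustrative only): 'Field-Name: value' lines,
    indented continuation lines, blank lines between templates.
    """
    templates = []
    current = []
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line.strip():
            # A blank line closes the template being read, if any.
            if current:
                templates.append(current)
                current = []
            continue
        if line[0].isspace() and current:
            # An indented line continues the value of the previous field.
            field, value = current[-1]
            current[-1] = (field, value + " " + line.strip())
            continue
        # Split at the first colon only, so values may contain colons.
        field, _, value = line.partition(":")
        current.append((field.strip(), value.strip()))
    if current:
        templates.append(current)
    return templates


# Hypothetical archive and series templates, not the OECD files.
SAMPLE = """\
Handle: RePEc:xxx
Maintainer-Email: maintainer@example.org
URL: http://example.org/RePEc/xxx/

Handle: RePEc:xxx:wpaper
Name: Example Working
  Paper Series
"""

records = parse_templates(SAMPLE)
archive = dict(records[0])
series = dict(records[1])
print(archive["URL"])
print(series["Name"])
```

Each template is kept as a list of pairs rather than a dictionary because a field such as an author name may legitimately occur more than once in a record.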
The documents themselves are also described in ReDIF. The location of the paper description is found through appending the handle to the URL of the archive, i.e. at http://web.archive.org/web/20010627025821/www.oecd.org/eco/RePEc/oed/oecdec/. This directory contains ReDIF descriptions of documents. It may also contain the full text of documents. It is up to the archive to decide whether to store the full text of documents inside or outside the archive. If the document is available online, inside or outside the archive, a link may be provided to the place where the paper may be downloaded. Note that the document need not be the full text of an academic paper; it may also be an ancillary file, e.g. a dataset or a computer program.
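The rule for locating document descriptions can be written down in a few lines. This is a sketch under one assumption that the text does not state: that the series handle has the colon-separated form "RePEc:oed:oecdec" and that its last component names the directory. The function name is hypothetical.

```python
def series_directory_url(archive_url, series_handle):
    """Return the directory URL that holds a series' document descriptions."""
    # Assumption: the last colon-separated part of the handle is the
    # directory name below the archive URL.
    series_code = series_handle.rsplit(":", 1)[-1]
    return archive_url.rstrip("/") + "/" + series_code + "/"


location = series_directory_url(
    "http://www.oecd.org/eco/RePEc/oed/",  # archive URL from the example
    "RePEc:oed:oecdec",                    # assumed form of the series handle
)
print(location)
```

With these inputs the function reproduces the directory path quoted above for the OECD series.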
Participation does not imply that the documents are freely available. Thus, a number of journals have also permitted their contents to be listed in RePEc. If the user's institution has made the requisite arrangements with publishers (e.g. JSTOR for back issues of Econometrica or Journal of Applied Econometrics), RePEc will contain links to directly access the documents.
Using the data on archives. One way to make use of the data would be to have a web page that lists all the available archives, and allow users to navigate the archives searching for documents of interest. However, that would be a primitive way to access the data. First, the data as shown in the ReDIF form is not itself hyperlinked. Second, there is no search facility nor filtering of contents.
Providing services that allow for convenient access is not a concern for the archives, but for user services. User services render the RePEc data in a form that makes it convenient for a user. User services are operated by members of the RePEc community, libraries, research projects, etc. Each service has its own name. There is no "official" RePEc user service. A list of services available at the time of writing may be found in Appendix A.
User services are free to use RePEc data in whatever way they see fit, as long as they observe the copyright statement for RePEc. This statement places some constraints on the usage of RePEc data:
Within the constraints of that copyright statement, user services are free to provide all or any portion of the RePEc data. Individual user services may place further constraints on the data, such as quality or availability filters.
Because all RePEc services must be free, user services compete through quality rather than price. All RePEc archives benefit from simultaneous inclusion in all services. This leads to an efficient dissemination that a proprietary system cannot afford.
Building user services. The provision of a user service usually starts with putting frequently updated copies of RePEc archives on a single computer system. This maintenance of a frequently updated copy of archives is called "mirroring". Everything contained in an archive may be mirrored. For example, if a document is in the archive, it may be mirrored. If the archive management does not wish the document to be mirrored, it can store it outside the archive. The advantage of this remote storage is that the archive maintainer will get a complete set of access logs to the file. The disadvantage is that every request for the file will have to be served from the local archive rather than from the RePEc site that the user is accessing.
An obvious way to organize the mirroring process overall would be to mirror the data of all archives to a central location. This central location would in turn be mirrored to the other RePEc sites. The founders of RePEc did not adopt that solution because it would be quite vulnerable to mistakes at the central site. Instead, each site installs the mirroring software and mirrors its own data. Not all sites adopt the same frequency of updating. Some may update daily, while some may only update weekly. A disadvantage of this system is that it is not known how long it takes for a new item to be propagated through the system.
The documents available through RePEc
Over 160 archives, some of them representing several institutions, in 25 countries currently participate in RePEc. Over 100 universities contribute their working papers, including U.S. institutions such as Berkeley, Boston College, Brown, Maryland, MIT, Iowa, Iowa State, Ohio State, UCLA, and Virginia. The RePEc collection also contains information on all NBER Working Papers, the CEPR Discussion Papers, the contents of the Fed in Print database of the US Federal Reserve, and complete paper series from the IMF, World Bank and OECD, as well as the contributions of many other research centers worldwide. RePEc also includes the holdings of EconWPA. In total, at the time of writing in March 2001, over 37,000 items are downloadable.
The bibliographic templates describing each item currently provide for papers, articles, and software components. The article templates are used to fully describe published articles. They are currently in use by the Canadian Journal of Economics, Econometrica, the Federal Reserve Bulletin, IMF Staff Papers, the Journal of Applied Econometrics, and the RAND Journal of Economics. These are only a few of the participating journals.
The RePEc collection of metadata also contains links to several hundred "software components"—functions, procedures, or code fragments in the Stata, Mathematica, MATLAB, Octave, GAUSS, Ox, and RATS languages, as well as code in FORTRAN, C and Perl. The ability to catalog and describe software components affords users of these languages the ability to search for code applicable to their problem—even if it is written in a different language. Software archives that are restricted to one language, such as those maintained by individual software vendors or volunteers, do not share that breadth. Since many programs in high-level languages may be readily translated from, say, GAUSS to MATLAB, this breadth may be very welcome to the user.
13.3 The ReDIF metadata
From the material that we have covered in the previous section, we can draw a simple organizational model of RePEc as:
Many archives ⇒ One dataset ⇒ Many services
Let us turn from the organization of RePEc to its contents. RePEc is about more than the description of resources. It is probably best to say that RePEc is a relational database about economics as a discipline.
One possible interpretation of the term "discipline" is given by Karlsson and Krichel (1999). They have come up with a model of a discipline as consisting of four elements arranged in a table:
resource | collection |
person | institution |
A few words may help to understand that table. A "resource" is any output of academic activity: a research document, a dataset, a computer program, or anything else that an academic person would claim authorship for. A "collection" is a logical grouping of resources. For example, one collection might be comprised of all articles that have undergone the peer review process. A "person" is a physical person; a person may also be a corporate body acting as a physical person in the context of RePEc.
These data collectively form a relational database describing not only the papers, but also the authors who write them, the institutions where the authors work, and so on. All this data is encoded in the ReDIF metadata format, as illustrated in the following examples.
A closer look at the contents
To understand the basics of ReDIF it is best to start with an example. Here is a piece of ReDIF data at http://www.econ.surrey.ac.uk/discussion_papers/RePEC/sur/surrec/surrec9601.pdf:[2]
When we look at this record, the ReDIF data resembles a standard bibliographical format, with authors, title, etc. The only thing that appears a bit mysterious here is the "Author-Person" field. This field quotes a handle that is known to RePEc. This handle leads to a record maintained at a RePEc handle server.[3]
In this record, we have the handles of documents that the person has written. This record will allow user services to list the complete papers by a given author. This is obviously useful when we want to find papers that one particular author has written. It is also useful to have a central record of the person's contact details. This eliminates the need to update the relevant data elements on every document record. In fact the record on the paper template may be considered as the historical record that is valid at the time when the paper was written, but the address in the person template is the one that is currently valid.
In the person template, we find another RePEc identifier in the "Workplace-Institution" field. This points to a record that describes the institution, stored at another RePEc handle server.
The information in this record is self-explanatory. Less apparent is the origin of these records.
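The chain of handles described in this subsection, from paper to person to institution, can be sketched as a small in-memory lookup. Only the field names "Author-Person" and "Workplace-Institution" come from the text; every other field name, every handle and every value below is a hypothetical stand-in, and a real service would read these records from the mirrored ReDIF data.

```python
# Hypothetical records keyed by handle; not actual RePEc data.
papers = {
    "RePEc:xxx:wpaper:0001": {
        "Title": "An Example Paper",
        "Author-Name": "A. Author",
        "Author-Person": "pau1",
        "Author-Email": "old.address@example.org",
    },
}
persons = {
    "pau1": {
        "Name": "A. Author",
        "Email": "current.address@example.org",
        "Workplace-Institution": "RePEc:edi:exampl",
        "Author-Paper": ["RePEc:xxx:wpaper:0001"],
    },
}
institutions = {
    "RePEc:edi:exampl": {"Name": "Example Department of Economics"},
}


def describe_paper(paper_handle):
    """Follow paper -> person -> institution and return a flat summary."""
    paper = papers[paper_handle]
    person = persons.get(paper.get("Author-Person"), {})
    institution = institutions.get(person.get("Workplace-Institution"), {})
    return {
        "title": paper["Title"],
        # The person record holds the currently valid address; the paper
        # record keeps the address as it was when the paper was written.
        "contact": person.get("Email", paper.get("Author-Email")),
        "workplace": institution.get("Name"),
    }


def papers_by(person_handle):
    """List the titles of all known papers claimed in a person record."""
    claimed = persons[person_handle]["Author-Paper"]
    return [papers[h]["Title"] for h in claimed if h in papers]


summary = describe_paper("RePEc:xxx:wpaper:0001")
print(summary)
print(papers_by("pau1"))
```

The point of the design is visible in `describe_paper`: when an author moves, only the person record changes, and every paper description picks up the new contact details without being edited.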
Institutional registration
The registration of institutions is accomplished through the Economics Departments, Institutions and Research Centers (EDIRC) project, compiled by Christian Zimmermann, an Associate Professor of Economics at Université du Québec à Montréal, on his own account, as a public service to the economics profession. The initial intention was to compile a directory of all economics departments that have a web presence. Many departments have a web presence now; about 5,000 of them are registered at the time of this writing. All these records are included in RePEc. For each institution, data on its homepage is available, as well as postal and telephone information. For some, there is even data on the main area of work. Thus it is possible to find a list of institutions where, for example, a lot of work in labor economics is being done. At the moment, EDIRC is mainly linked to the rest of the RePEc data through the HoPEc[4] personal registration service. Other links are possible, but are rarely used.
Personal registration
HoPEc has a different organization from EDIRC. It is impossible for a single academic to register all persons who are active in Economics. One possible approach would be to ask archives to register people who work at the related institution. This will make archive maintainers' work more complicated, but the overall maintenance effort will be smaller once all current authors are registered. However, authors move between archives, and many have work that appears in different archives. To date, there is no satisfactory way to deal with moving authors. For this reason, the author registration is carried out using a centralized system.
A person who is registered with HoPEc is identified by a string that is usually close to the person's name and by a date that is significant to the registrant. HoPEc suggests the birth date but any other date will do as long as the person can remember it. When registrants work with the service, they first supply such personal information as the name, the URL of the registrant's homepage, and the email address. Registrants are free to enter data about their academic interests—using the Journal of Economic Literature Classification Scheme—and the EDIRC handle of their primary affiliation.
When the registrant has entered this data, the second step is to create associations between the record of the registrant and the document data that is contained in RePEc. The most common association is the authorship of a paper; however, other associations are possible, for example the editorship of a series. The registration service then looks up the name of the registrant in the RePEc document database. The registrant can then decide which potential associations are relevant. Because authentication methods are weak, HoPEc relies on honesty.
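The lookup step can be illustrated with a simple name match. This is only a sketch of one plausible approach, not a description of how HoPEc actually matches names: it treats two names as equal when they contain the same words after removing accents, punctuation and case. The function names, handles and author names are all hypothetical.

```python
import unicodedata


def normalize(name):
    """Reduce a name to a sorted tuple of lower-case, accent-free words."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    # Replace punctuation by spaces so 'Doe, Jane' splits into two words.
    cleaned = "".join(
        c if c.isalnum() else " " for c in without_accents.lower()
    )
    return tuple(sorted(cleaned.split()))


def candidate_associations(registrant_name, documents):
    """Return handles of documents with an author name matching the registrant.

    `documents` maps a document handle to its list of author name strings.
    The result is only a list of suggestions; the registrant confirms or
    rejects each one.
    """
    target = normalize(registrant_name)
    return [
        handle
        for handle, authors in documents.items()
        if any(normalize(author) == target for author in authors)
    ]


# Hypothetical document data.
documents = {
    "RePEc:xxx:wpaper:0001": ["Doe, Jane", "Richard Roe"],
    "RePEc:xxx:wpaper:0002": ["John Doe"],
    "RePEc:yyy:wpaper:0007": ["Jané Doe"],
}

suggestions = candidate_associations("Jane Doe", documents)
print(suggestions)
```

A match of this kind will both miss papers (initials instead of first names) and suggest papers by namesakes, which is why the final decision is left to the registrant.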
There are several significant problems that a service like HoPEc faces. First, since there is no historical precedent for such a service, it is not easy to communicate the raison d'être of the service to a potential registrant. Second, some people think that they need to register in order to use RePEc services. While this delivers data about who is interested in using RePEc services (and to whom we have been unsuccessful in communicating that these services are free), it clutters the database with records of limited usefulness. Last but by no means least, there are all kinds of privacy issues involved in the composition of such a dataset.
To summarize, HoPEc provides information about a person's identity, affiliation and research interests, and links these data with resource descriptions in RePEc. This allows the identification of a person and the maintenance of related metadata in a timely and cost-efficient way. These data could fruitfully be employed for other purposes, such as maintaining membership data for scholarly societies or lists of conference participants.
13.4 The open library
This section attempts to find a general theory applicable to a wide set of circumstances in which systems similar to RePEc are desirable. I call this general concept the open library. The parallel to the open source concept is intentional. It is therefore useful to review the open source concept first.
The open source concept
There is no official, formal definition of what the term "open source" means. The Open Source Initiative at http://opensource.org/ offers an elegant introduction to the idea:
The basic idea behind open source is very simple. When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing.
We in the open source community have learned that this rapid evolutionary process produces better software than the traditional closed model, in which only a very few programmers can see the source and everybody else must blindly use an opaque block of bits.
Open source software imposes no restrictions on the distribution of the source code required to build a running version of the software. As long as users have no access to the source code, they may be able to use a running version of the software, but they can not change the way that the software behaves. The latter involves changing the source code and rebuilding the running version of the software from the source code. Since building the software out of the source code is quite straightforward, software that has a freely available source code is essentially free.
Open Source and open library
The open source movement claims that the building of software in an open, collaborative way—enabled by the sharing of the source code—allows software to be built better and faster. The open library concept is an attempt to apply the concept of the open source to a library setting. We start off with the RePEc experience.
Within the confines of RePEc as a document collection, it is unrealistic to expect free distribution of a document's source code. Such a source code is, for example, the word processor file of an academic paper. If such a source code were available for others to change, then the ownership of the intellectual property in the document would be dissolved. Since intellectual ownership over scientific ideas is crucial in the academic reward system, it is unlikely that such source code distribution will take place. Within the confines of RePEc's institutional and personal collection, there is no such source code that could be freely shared.
To apply the open source principle to RePEc we must conceptualize RePEc as a collection of data. In terms of the language adopted by the open source concept, the individual data record is the "source code". The way the data record is rendered in the user interface is the "software" as used by the end user. We can then define the open library as a collection of data records that has a few special properties.
The definition of the open library
An open library is a collection of data records that has the following characteristics:
Every record is identified by a unique handle. This requirement distinguishes the library from an archive. It allows for every record to be addressed in an unambiguous way. This is important if links between records are to be established.
The syntax in all records of field names and field values is homogeneous. This constraint causes the open library to appear like a database. If this requirement were not present, all public access pages on the Web would form an open library. Note that this requirement does not constrain the open library to contain a homogeneous record format.
The documentation of the record format is available for online public access. For example, a collection encoded in MARC format would not qualify as an open library because access to the documentation of MARC is restricted. Without this requirement the cost of acquiring the documentation would be an obstacle to participation.
The collection is accessible on a public access computer system. This is the precondition to allow for the construction of user services. Note that user services may not necessarily be open to public access.
Contributing to the collection is without monetary cost. There are of course non-monetary costs to contribute to the open library. However the general principle is that there is no need to pay for either contributing or using the library. The copyright status of data in an open library should be subject to further research.
The open library and the Open Archive
Stimulated by work of Van de Sompel, Krichel, Nelson, et al. (2000), there have been recent moves towards improving the interoperability of e-print archives such as arXiv.org, NCSTRL, and RePEc. This work is now called the Open Archives Initiative (OAI); see http://www.OpenArchives.org. The basic business model proposed by the OAI is very close to that of the RePEc project. In particular, the open archive technical protocols allow data provision to be separated from service provision, a key feature of the open library model as pioneered by RePEc since 1997. In addition, because of their ability to transport multiple data sets, the open archive protocols allow for several open libraries to be established on one physical system.
The conceptual challenge raised by the open library
The open library as defined in Subsection 13.4 may be a relatively obvious concept. It certainly is not an elaborate intellectual edifice. Nevertheless, the open library idea raises some interesting conceptual challenges.
Supply of information. To me as a newcomer to the Library and Information Studies (LIS) discipline, there appears to be a tradition of emphasizing the behavior of the user who demands information rather than the publisher—I use the word here in its widest sense—who supplies it. I presume this orientation comes from the tradition that almost all bibliographic data were sold by commercial or not-for-profit vendors, just as the documents that they describe. Libraries then see their role as intermediaries between the commercial supply and the general public. In that scenario, libraries take the supply of documents and data as given.
The open library proposes to build new supply chains for data. If all libraries contribute metadata—data about data—about objects that are local to them—what that means would have to be defined—then a large open library can be built.
An open library will only be as good as the data that contributors give to it. It is therefore important that research be conducted on what data contributors are able to contribute; on how to provide documentation that the contributor can understand; and on understanding a contributor's motivation.
Digital updatability. For a long time, libraries could only purchase material that is essentially static. It might decay physically, but the content is immutable. The advent of digital resources provoked a debate. Because they may be changed at any time, digital resources may be used for more than the preservation of ideas. Traditionally inclined libraries have demanded that digital resources be like non-digital resources in all but appearance, and view the mutability of digital data more as a threat than as an opportunity. The open library, however, is more concerned with digital updatability than preservation. Clearly, this transition from static to dynamic resources poses a major challenge to the LIS profession.
Metadata quality control. In the case of a decentralized dataset, an important problem is to maintain metadata quality. Some elements of metadata quality can be controlled by a computer. For example, each record must utilize a structure of fields and values associated with these fields to be interoperable with other records. In some cases the field value only makes sense if it has a certain syntax. This is the case, for example, with an email address. One way to achieve quality control is through the use of relational metadata. Each record has an identifier. Records can use the identifiers of other records. It is then possible to update elements of the dataset in an independent way. It is also simple to check if the handle referenced in one record corresponds to a valid handle in the dataset. Highly controllable metadata systems are an important research concern related to the open library concept.
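The two machine checks mentioned in this paragraph, a syntax check on field values and a check that every referenced handle exists, can be sketched as follows. The sketch assumes records held as dictionaries keyed by handle; the email pattern is deliberately crude, and the list of reference fields and all sample data are illustrative assumptions rather than part of any RePEc tool.

```python
import re

# A deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Fields whose value is expected to be the handle of another record.
REFERENCE_FIELDS = ("Author-Person", "Workplace-Institution")


def check_dataset(records):
    """Return (handle, problem) pairs for a dict mapping handle -> record."""
    problems = []
    for handle, record in records.items():
        for field, value in record.items():
            # Syntax check: any field ending in 'Email' must look like one.
            if field.endswith("Email") and not EMAIL_PATTERN.match(value):
                problems.append((handle, f"malformed {field}: {value!r}"))
            # Relational check: a referenced handle must exist in the dataset.
            if field in REFERENCE_FIELDS and value not in records:
                problems.append(
                    (handle, f"{field} points to unknown handle {value!r}")
                )
    return problems


# Hypothetical dataset with one bad email and one dangling reference.
dataset = {
    "paper1": {"Title": "An Example Paper", "Author-Person": "person1"},
    "person1": {
        "Name": "A. Author",
        "Email": "a.author@example",
        "Workplace-Institution": "inst9",
    },
    "inst1": {"Name": "Example Department of Economics"},
}

problems = check_dataset(dataset)
for handle, problem in problems:
    print(handle, "-", problem)
```

Checks of this kind can run at every mirroring pass, so that a faulty record is reported to the contributing archive without any central editor having to read it.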
13.5 Conclusions
To my knowledge, Richard Stallman was the pioneer of open source software. In 1984, when he founded the GNU ("GNU is not UNIX") project to write a free operating system to replace Unix, few people believed that such an operating system would come about. Building GNU took a long time, but in the late 1990s, the open source movement basically realized Stallman's dream. My call for an open library may face similar skepticism, but the obstacles it faces are fewer and less daunting than those faced by the open source movement:
The operating system of a modern computer is far more complex than a metadata collection.
Computer programming is a highly profitable activity for the individual capable of doing it; therefore the opportunity cost of participating in what is essentially an unpaid activity is much higher. These costs are much lower for the academic or the academic librarian who would participate in an open library construction.
A network effect arises when the open library has reached a critical mass. At some stage the cost of providing data is much smaller than the benefit—in terms of more efficient dissemination—of contributing data. When that stage is reached, the open library can grow without external public or private subsidy.
It remains to be seen how great an inroad the open library concept will make into the library community.
Appendix A: The main user services [5]
BibEc at http://netec.mcc.ac.uk/bibec.html & WoPEc at http://netec.mcc.ac.uk/wopec.html provide static html pages for all working papers that are only available in print (BibEc) and all papers that are available electronically (WoPEc). Both datasets use the same search engines. There are three search engines: a full-text WAIS engine, a fielded search engine based on the MySQL relational database and a ROADS fielded search engine. The MySQL database is also used for the control of the relational components in the RePEc dataset. BibEc and WoPEc are based at Manchester Computing, with mirror sites in Japan and the United States.
EDIRC at http://edirc.repec.org/ provides web pages that represent the complete institutional information in RePEc.
IDEAS at http://ideas.repec.org/templates.html provides an Excite index of static html pages that represent all paper, article and software templates. This is by far the most popular RePEc user interface.
NEP: New Economics Papers at http://nep.repec.org/ is a set of reports on new additions of papers to RePEc. Each report is edited by subject specialists who receive information on all new additions and then select the papers that are relevant to the subject of the report. These subject specialists are PhD students and junior researchers who work as volunteers. As of 14 March 2000, 2753 different email addresses were subscribed to at least one list.
The Tilburg University Working Papers & Research Memoranda service was at http://www.kub.nl/~dbi/demomate/repref.htm, but is now closed. The interface is archived at http://web.archive.org/web/20010305214804/cwis.kub.nl/~dbi/demomate/repref.htm
socionet at http://socionet.ru is a server in Russian. Its maintainers also provide archival facilities for Russian contributors.
INOMICS at http://www.inomics.com/ not only provides an index of RePEc data but also allows simultaneous searches in indices of other Web pages related to Economics.
HoPEc at http://authors.repec.org/ provides a personal registration service for authors and allows searches for personal data.
Appendix B: The ReDIF metadata format
The ReDIF metadata format is inspired by Deutsch et al. (1994), commonly known as the IAFA templates. In particular, it borrows the idea of clusters from the draft:
There are certain classes of data elements, such as contact information, which occur every time an individual, group or organization needs to be described. Such data as names, telephone numbers, postal and email addresses etc. fall into this category. To avoid repeating these common elements explicitly in every template below, we define "clusters" which can then be referred to in a shorthand manner in the actual template definitions.
ReDIF takes a slightly different approach to clusters. A cluster is a group of fields that jointly describe a repeatable attribute of the resource. This is best understood by an example. A paper may have several authors. For each author we may have several fields of interest: name, email address, homepage, etc. If we have several authors then we have several such groups of attributes. In addition, each author may be affiliated with several institutions. Here each institution may be described by several attributes for its name, homepage, etc. Thus, a nested data structure is required. It is evident that this requirement is best served by a syntax that explicitly allows for it, such as XML. However, when ReDIF was designed in 1997, XML was not available. While the template syntax is more human-readable and easier to understand, the computer cannot tell which attributes belong to the same cluster unless some ordering is introduced. Therefore we proceed as follows. For each group of attributes that make up a cluster, we specify one attribute as the "key" attribute. Whenever the key attribute appears, a new cluster begins. For example, if the cluster describes a person then the name is the key. If an "author-email" appears without an "author-name" preceding it, the parsing software aborts the processing of the template.
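The key-attribute rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual ReDIF parsing software: the function name group_clusters, the list-of-pairs input and the sample values are assumptions made for this example. Only the "author-name" and "author-email" field names are taken from the text above.

```python
def group_clusters(fields, key="author-name", prefix="author-"):
    """Group (attribute, value) pairs into clusters opened by the key attribute.

    Hypothetical helper for illustration; not part of any ReDIF software.
    """
    clusters = []
    for attribute, value in fields:
        if attribute == key:
            # The key attribute always opens a new cluster.
            clusters.append({attribute: value})
        elif attribute.startswith(prefix):
            if not clusters:
                # A cluster attribute with no preceding key aborts processing.
                raise ValueError(f"{attribute!r} appears before {key!r}")
            # Any other cluster attribute joins the most recently opened cluster.
            clusters[-1][attribute] = value
        # Attributes outside the cluster (e.g. a title) are ignored in this sketch.
    return clusters


# Made-up template content: two authors, the first with an email address.
template = [
    ("title", "An example paper"),
    ("author-name", "A. Author"),
    ("author-email", "a.author@example.org"),
    ("author-name", "B. Author"),
]
print(group_clusters(template))
```

Because the order of the lines carries the grouping, moving the "author-email" pair in front of the first "author-name" makes the sketch raise an error, which mirrors the aborting behaviour described above.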
Note that the designation of key attributes is not a feature of ReDIF. It is a feature of the template syntax of ReDIF. It is only the syntax that makes nesting more involved. I do not think that this is an important shortcoming. I believe that the nested structure involving the persons and organizations should not be included in the document templates. What should be done instead is to separate the personal information out of the document templates into separate person templates. This approach is discussed extensively in the main body of the paper.
ReDIF is a metadata format that comes with tools to make it easy to use in a framework where the metadata is harvested. A file that is simply harvested from a computer system could contain any type of digital content. Therefore the harvested data must be parsed by special software that filters the data. This task is accomplished by the rr.pm module written by Ivan V. Kurmanov. It parses ReDIF data and validates its syntax. For example, any date within ReDIF has to be of the ISO 8601 form yyyy-mm-dd. A date like "14 Juillet 1789" would not be recognized by the ReDIF reading software and would not be passed on to the application software that a service provider would use.
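A date check of this kind might look as follows in Python. This is a minimal sketch under stated assumptions and does not reproduce rr.pm: the name is_valid_date is hypothetical, and the extra test that the date exists in the calendar is an assumption of this example, since the text only states that the yyyy-mm-dd form is required.

```python
import re
from datetime import date

# Four digits, hyphen, two digits, hyphen, two digits: the yyyy-mm-dd form.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_date(text):
    """Return True if text has the form yyyy-mm-dd and names a real date.

    Hypothetical helper for illustration; not taken from rr.pm.
    """
    if not ISO_DATE.fullmatch(text):
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        # Assumption of this sketch: also reject impossible dates.
        date(year, month, day)
    except ValueError:
        return False
    return True


for candidate in ("1789-07-14", "14 Juillet 1789", "2000-02-30"):
    print(candidate, is_valid_date(candidate))
```

A filter built on such a check would drop a record carrying the second or third value before it reaches any user service.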
The rr.pm software uses a formal syntax specification, redif.spec. This formal specification is itself encoded in a purpose-built format code-named spefor. Therefore, it is possible for ReDIF-using communities to change the syntax restrictions or even design a whole new metadata vocabulary from scratch.
Notes
† The work discussed here has received financial support from the Joint Information Systems Committee of the UK Higher Education Funding Councils through its Electronic Library Programme. A version of this paper was presented at the PEAK conference at the University of Michigan on 2000-03-24. I am grateful to Ivan V. Kurmanov for comments on that version. In March 2001, I revised and updated the paper following suggestions by Jeffrey K. MacKie-Mason and Emily Walshe. Neither of them bears responsibility for any remaining errors. This paper is available online at http://openlib.org/home/krichel/salibury.html.
1. Reports of research results in research "papers" form the bulk of academic digital or digitisable data. I refer to these as documents.
2. I suppress the Abstract: field to conserve space.
3. I leave out a few fields to conserve space.
4. HoPEc stood initially for Home Page Papers in Economics, but this would be totally misleading now.
5. I list them by order of historical appearance. The "Tilburg University working papers & research memoranda" service is operated by a library-based group that has received funding from the European Union. INOMICS is operated by the Economics consultancy Berlecon Research. All the other user services are operated by junior academics.