Economics and Usage of Digital Libraries: Byting the Bullet
IV. Building and Using Digital Libraries
14. Building and Using Digital Libraries
The forces that converged in the 1990s were extraordinary in loosening the physical and temporal constraints on information. Technology tools and capacity grew rapidly. Network infrastructure became more robust. Information in digital form, both converted and born digital, came online in unprecedented volume and with new functionality. The impact on library organizations and their users was significant, yet far from straightforward. The volatile mix of digital resources, organizations, and individual behaviors set in motion shifts in expectations, in roles of stakeholders, and in the distribution of costs. The authors in this section explore, through project and program descriptions, the experiences of exemplary digital library development.
In his paper, JSTOR's Kevin Guthrie notes that old metrics, methods, and intuition are not reliable guides for our sense of value in evaluating digital libraries. The featured projects, such as JSTOR, Early Canadiana Online (Kingma), and Columbia University's Online Books (Kantor, Summerfield, and Mandel), drive home the changing notions of value. These studies document the increased access to information that is realized through digital content, and also shed light on the potential that is created for reduced costs, new research capability, and innovation. As Kantor points out, there is also a symbolic utility. Digital libraries may not always perform as hoped; nonetheless, they have significant utility. Digital libraries, despite early clumsiness, have stimulated considerable user interest and exploration.
The realization of digital potential within library contexts is captured here in the organizational descriptions of Drexel University (Montgomery), University of Louisville (Rader), and British Telecommunications (Alsmeyer). Notably, the capabilities of digital libraries prompted these organizations to rethink the existing configuration of resources and shift focus to new user services. Although libraries may realize cost savings, there are also new costs associated with technology infrastructure and more expert staff. Mission-based questions also arise as digital libraries take shape. Do libraries still have an archival role if publishers manage digital collections? Can or should libraries add value to the provision of intellectual access that publishers offer to their digital content? In the disintermediated context of online resources, how (and by whom) are users supported? As several authors note, users are not uniformly able or ready to exploit the capabilities of digital media, so the library's instructional and outreach roles may become far more critical.
The descriptions presented also suggest unfolding tensions between stakeholders. Libraries and publishers have yet to agree on policies governing digital content related to resource sharing, course environments, and archives. While users and libraries may derive new value from digital content, publisher arguments for increased costs are seldom acceptable to the library community. Tensions are also evident between libraries and their users. At a time of constrained or modest growth in resources, libraries are often unable to meet user expectations for both the traditional resources and new digital titles. And as the complexity of the digital environment grows, demands for greater integration and interoperability between and among library systems, publisher resources, and user tools will also grow, further entangling responsibilities and interests of each stakeholder.
Clearly, agreement on the differentiation of roles and allocation of costs in a complex, interdependent environment will be increasingly central to the evolution of digital libraries. Digital library developers and directors face several questions: Who is responsible for what functions? Who adds value and at what cost? How flexibly can resources be shifted to accomplish new roles? How will roles and responsibilities be sustained over time as the environment takes shape? The chapters that follow capture some answers found in early instances of digital programs. While the digital library environment has matured as more recent developments have taken hold, these fundamental questions remain.
15. The Economics of Digital Access: The Early Canadiana Online Project[†]
This project examined the economics of the production, storage, and distribution of information in print, microfiche, and digital format for the Early Canadiana Online Project. The Early Canadiana Online Project digitized over 3,300 titles and over 650,000 images of the Canadian Institute for Historical Microreproductions collection of pre-1900 print materials published in Canada. An economic model was developed of the stakeholders (publishers, libraries, and patrons) and the costs to stakeholders of the three formats (print, microfiche, and digital). A detailed cost analysis was performed to estimate the costs of each of the three formats. The results of this cost analysis can be used as benchmarks for estimating the costs of other digitization projects. The analysis shows that digital access can be cost-efficient so long as there are a number of libraries that receive sufficient benefit such that they are willing to share the costs of digitization and access.
15.1 Introduction
Digital texts in a networked environment hold the promise of lower-cost access to information by a greater number of users than print texts. Projects such as The Making of America, Project MUSE, JSTOR, and the Early Canadiana Online project investigated in this study offer access to digital texts over the Internet to millions of potential users. These digital projects also offer the promise of lower costs by avoiding the cost of printing and shipping multiple copies of a text for patrons. In theory, once the fixed costs of digitization are incurred, the marginal cost of providing an additional electronic copy is zero.
The potential benefits of digital access are considerable. Patrons who previously traveled to a repository of rare books or a microfiche room at a research library can instead access historical information from their desktops. This dramatically decreases the time and effort patrons spend traveling to the source of the information. This also increases the potential benefits to new patrons who can now access historical texts that previously were only available at sites too distant for them to consider. The economic question is whether the cost of digitization is lower than this stream of future benefits.
This study examines the economics of digital, microfiche, and print access for the Early Canadiana Online project. The costs for these three methods of access include the costs of archiving and providing access to original print materials, microfiche copies of these materials, and digital copies accessible over the Internet. This study examines the production and storage costs and opportunity costs to patrons for digital access to the Early Canadiana Online collection.
Data collected and analyzed for this study will be important in determining the level of investment for future digitization projects of historical materials. Other studies at Cornell University (Kenney, 1997), Yale University (Conway, 1996), and Columbia University (Kantor et al., this volume) investigated the costs of online texts. The study at Columbia University described in this text examines the cost of using publisher-provided electronic files to produce text in HTML format. The studies at Cornell University and Yale University, like this study, examine the cost of digitizing print or microfiche. The Cornell and Yale studies measure the marginal costs per image of primarily in-house scanning. This study includes all costs associated with the production, cataloging, and sales of texts in microfiche or digital format. The cost estimates in this study are considerably higher than the marginal cost estimates in previous studies but are a more accurate estimate of the full costs of the production of microfiche or digital projects from start to finish.
This study also investigates the benefits of digitization. The primary benefit of these digital projects is the return to patrons from accessing these materials. Once digitized, stored, and made accessible over a campus network or the Internet, the materials are more easily accessible to more patrons. Patrons who previously had to travel to a library with the original or microfiche copies of the materials can now view them online from home or the office. Analysis of the data collected on use of the digital images, microfiche, and original texts will be helpful in predicting the use and benefits of other digital projects of historical materials. This will enable researchers to determine the return to investment of future digital projects.
The remainder of the paper is organized as follows. First, an economic model of digital access, which includes a stakeholder analysis, is examined. This provides a general framework for analyzing the costs of print, microfiche, and digital access. Second, a cost analysis of the Early Canadiana Online Project is presented. This cost analysis includes estimates of the cost of print, microfiche, and digital access; an analysis of the economies of scale for digitization studies; and an analysis of the institution and user cost of access to digital information.
15.2 The Costs and Benefits of Digital Access to Information
Digitization of information provides lower costs than do print products for the production, distribution, and access to information for producers, consumers, and intermediaries. Digital access results in on-demand access to information for patrons or consumers, lowering the opportunity cost of access. Consumers can more easily view digital information over networks without having to spend time traveling to the library. The low cost of web development and word processing lowers the cost of producing information in digital form. Producers do not have to print and distribute copies but can instead mount digital products on a local server enabling network distribution. Likewise, digital access provides lower cost distribution by intermediaries such as libraries, saving the costs of storing and circulating printed materials.
The Early Canadiana Online project includes all three of the stakeholders in the production and consumption of information. Libraries with rare book collections such as the University of Toronto Library and the Laval University Library provide access to original print texts. These libraries also provide access to the Canadian Institute for Historical Microreproductions' (CIHM) microfiche copies of these print materials. In this instance, CIHM is the producer of the information while the libraries are intermediaries in providing access to patrons. Patrons of this information include students and faculty accessing the CIHM collection whether in print, microfiche, or digital form.
Patrons of Early Canadiana Online
With the creation of Early Canadiana Online, patrons have three possible methods for accessing this information: digital, fiche, or original copy. Patrons incur a cost of access depending on their choice of method of access. These costs can be divided into fixed and variable costs. Variable costs are costs incurred each time information is reproduced or retrieved. Fixed costs are costs incurred regardless of the number of items retrieved.
A patron's choice of access will depend on which method provides lower total costs. A patron viewing images from a single text may have lower fixed and marginal costs in using the print than in using the fiche or digital formats. Accessing the print may require only travel to the library, selection of the text, and turning the pages. There are no learning costs or costs of expensive machines or network connections specifically related to using a print item. Accessing the fiche may require travel, selection, and determining how to use the fiche. Accessing the digital format requires the use or purchase of a computer with a network connection as well as determining how to search and use the digital collection.
Microfiche or digital access is more likely to have a lower total cost when more than a single text is used. While multiple texts may be found in the same library, the fiche collection may contain items not found in a library's print collection. Accessing print items from another library would require the patron to incur an additional cost of traveling to a second library. Learning how to use the microfiche collection is likely to have a lower total cost than traveling to more than one library in order to use the needed items in print form. Likewise, digital access may provide access to more images at a lower total cost than fiche or print.
Figure 15.1 illustrates total patron costs for access to information in the three formats. Figure 15.1 assumes that the fixed cost of digital access is greater than the fixed cost of fiche, which is greater than the fixed cost of print. Figure 15.1 also assumes the number of images available in digital form is greater than the number available in fiche at one library, which is greater than the number of print images available at one library.
The break-even points represent the levels of use at which two methods of access have the same total cost. Initially a patron will have a lower total cost from print texts. As use of images increases and the library's print collection is exhausted, a patron must incur the additional fixed costs of traveling to another library. At this point the total cost of fiche access is lower than the total cost of print access. As use continues to increase, the total cost of digital access becomes lower than the total cost of fiche and print access.
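The break-even logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dollar figures and per-site image counts below are hypothetical and are not taken from Figure 15.1 or from the study's data. The sketch assumes, as the figure does, that marginal cost per image is the same in every format, that fixed costs rise from print to fiche to digital, and that a patron pays the fixed cost again each time one site's holdings are exhausted.

```python
import math

# Hypothetical parameters (NOT from the study): fixed cost per site visited
# or per setup, and the number of images reachable for one fixed cost.
FORMATS = {
    "print":   {"fixed_cost": 10.0, "capacity": 200},
    "fiche":   {"fixed_cost": 25.0, "capacity": 2_000},
    "digital": {"fixed_cost": 60.0, "capacity": 650_000},
}
MARGINAL_COST = 0.05  # hypothetical cost per image, equal across formats


def total_cost(n_images, fixed_cost, capacity, marginal_cost=MARGINAL_COST):
    """Patron's total cost of viewing n_images in one format.

    The fixed cost is incurred once per block of `capacity` images,
    which models travel to another library when a collection runs out.
    """
    if n_images <= 0:
        return 0.0
    sites_needed = math.ceil(n_images / capacity)
    return fixed_cost * sites_needed + marginal_cost * n_images


def cheapest_format(n_images):
    """Return the format with the lowest total cost at this level of use."""
    costs = {
        name: total_cost(n_images, p["fixed_cost"], p["capacity"])
        for name, p in FORMATS.items()
    }
    return min(costs, key=costs.get)


for n in (100, 500, 5_000):
    print(n, cheapest_format(n))
```

With these made-up numbers, light use favors print, moderate use favors fiche, and heavy use favors digital, which is the ordering the break-even discussion describes. Different parameter values would move or remove the break-even points.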
If Figure 15.1 accurately reflects the fixed and variable costs of access, then high-frequency users who require access to more images are more likely to use the digital format. These users incur a high total cost of access to the digital copy but gain greater access to more information. Patrons desiring only a few images from a single text are more likely to look at the original print copy if it is available in their library. Mid-level users are more likely to use the fiche.
However, what may be an inaccurate assumption in Figure 15.1 is that digital access has higher fixed costs and equal marginal costs to fiche or print. The fixed costs of digital access include the learning costs patrons unfamiliar with digital copy must spend, as well as the costs of having access including a personal computer with network access. Once these costs are incurred patrons may have a fixed cost of digital access less than the fixed costs of fiche or print access. Patrons can also avoid the fixed costs of traveling to the library if they have at-home or office access to the network. The marginal costs of digital access may also be less than print or fiche. Patrons familiar with digital access are less likely to print materials, instead saving electronic copy on a disk or drive. If the marginal cost and fixed costs for digital access are lower than for print and fiche then digital access will have a lower cost at all levels of access and patrons will only access the information online.
Producers
The initial promise of digital information was that production costs would decrease when the costs of printing and distribution were replaced in the networked environment. These lower costs have led to an increasing number of free electronic journals published by faculty at colleges and universities. However, digital production also has costs. HTML programming costs, patron service costs, and production in both print and digital formats can increase the costs of production. Traditional print publishers have found that additional costs of production are necessary to publish a journal in both print and digital format, increasing subscription costs to libraries that require access to both forms.
Digital costs are lumpy, with a large fixed cost of production, and zero marginal cost to produce an additional digital copy over the Internet. However, digital copies have the same, if not greater, patron service costs as print and microfiche. Patrons need service in any environment. In the digital environment patron services include the cost of server maintenance, the cost of updating web pages, and the cost of answering electronic mail from patrons who are having difficulties with access. Since the networked environment allows more patrons to access the information than at the library, the cost of patron service may be greater for the digital information producer. Unlike print publications that are produced and then sent to information intermediaries, customer service in the digital environment requires the information producer to provide direct service to patrons.
Pricing in the networked environment can also be a difficult problem for information producers. Classic economic theory would indicate that the price of access should be set equal to the marginal cost of zero to achieve economic efficiency. However, a zero price does not allow for information producers to recover the costs of production. Access to digital products will be sold above the marginal cost of reproduction in the same way that books, journals, and other print products are sold above the marginal cost of an additional copy. This pricing, based on the value of the information good to consumers rather than the cost of providing an additional copy, is necessary in the networked environment to recover the costs of production.
The role of the library as intermediary is critical in the pricing of information. Libraries purchase information materials and provide access to patrons typically without an access fee. Patrons efficiently use the information since, in the networked digital environment, providing the information has no marginal costs and patrons are not charged for access. The charge to libraries covers the cost of production of the information while the absence of a charge for patrons ensures economic efficiency.
Intermediaries
Libraries serve a crucial economic role as intermediaries in the distribution of and access to information. Libraries serve as a point of collective demand for information products, providing access to information as a public good to patrons.
The economic role of the library as an information intermediary is to estimate the collective demand of patrons and purchase and provide access to information goods. The collective value of any information product in a library is the sum of the value or benefit all patrons receive from it. This can be estimated by the number of times the information product is used multiplied by an estimate of the benefit from each use. If this collective value exceeds the purchase price then it is economically efficient for the library to purchase it and provide access to patrons. Additional access should be priced at zero to ensure economic efficiency.
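The purchase rule in the preceding paragraph can be written out as a small Python sketch. The function names and the example figures are hypothetical and are used only to illustrate the rule; the chapter does not report use counts or per-use benefit values for any specific product.

```python
def collective_value(uses, benefit_per_use):
    """Estimated collective value: number of uses times benefit per use."""
    return uses * benefit_per_use


def should_purchase(uses, benefit_per_use, price):
    """Purchase is economically efficient when collective value exceeds price."""
    return collective_value(uses, benefit_per_use) > price


# Hypothetical example: 400 expected uses, each worth $5.00 to the patron,
# weighed against a $1,500 subscription price.
print(should_purchase(uses=400, benefit_per_use=5.00, price=1500.00))
```

In practice the difficult part is the estimate of benefit per use, which the rule takes as given.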
For digital products there are two possible benefits to library patrons. If the library does not subscribe to the print or fiche copy of the information, then patrons benefit by accessing information previously not available. If patrons have access to the fiche or print original, and access to the digital copy is available over the Internet or campus network, then the benefit of digital access is equal to the value of time saved from using the digital copy from the home or office instead of the fiche or original at the library.
Individual and Shared Costs and the Role of Information Intermediaries
The costs for information products and services can be categorized into private and shared costs or the costs of individual demand and public demand for a good. Private costs are the costs to an individual or consumer of his purchase of a good or service. Private costs include the costs of a personal subscription, personal home computer, photocopying papers, and downloading and printing information from the Internet. Shared costs are the costs of information products purchased for public use. Shared costs include the costs of library goods and services. The costs of library goods and services are shared among patrons through tuition payments, tax revenue, membership fees or other sources of revenue used to support the library.
Information intermediaries also have what can be considered private and shared costs. A subscription to a print or electronic database can be considered a private cost to the library, paid from the library's budget, although it is a shared cost to the library's patrons. The fixed cost of producing the database or print journal that is purchased by several libraries is a shared cost among the subscribing libraries. Each subscribing library pays for a share of the fixed costs of production.
The costs of digital information in a networked environment are shared. On the Internet, the costs of reproduction and distribution are zero. The fixed costs of production and storage are, by definition, shared among the patrons or information intermediaries that purchase access.
Market Forces: Demand and Supply
Digital information in a networked environment results in lower costs of reproduction and distribution for producers and intermediaries and lower opportunity costs for users. The lower costs contribute to an increase in the supply of information. Lower costs also result in more producers providing more methods of access to more information.
Lower costs mean new information products are produced. New publishers including universities, libraries, faculty and students find that they can produce web-based journals using low cost desktop publishing tools. This has led to an explosion in the supply of new electronic journals.
Paradoxically, this explosion has raised some costs while lowering others. Although the costs of many new electronic journals are relatively lower than those of print journals, for libraries this ever-increasing supply of digital information can dramatically increase the total costs. As the number of subscriptions purchased rises so will the staff costs involved in cataloging these journals, both elements impacting library budgets. Patrons find that the opportunity cost of accessing any given source of information has declined, while the overwhelming increase in the number of information products results in more time being spent on digital information than was spent consuming print products. The digitization of an information product previously made available only in print or microfiche can lower the cost per unit of production, the cost per unit for subscription by libraries, and the cost of access to the information by patrons, while at the same time dramatically increasing the supply of information products. This increase in the number of information products results in substantially higher total costs of production, subscription and access to information.
15.3 Cost Estimates of Early Canadiana Online
Estimating the costs of digital projects is necessary to determine efficient investments in digitization of print or microfiche information products. The primary goals of this project are to estimate and compare the costs of three methods of information delivery: print, microfiche, and digital. Data from the University of Toronto, Laval University, and the Canadian Institute for Historical Microreproductions were collected to estimate these costs. Data on the cost of construction of a new electronic library at the University at Albany were also collected for current library construction and maintenance costs.
One significant contribution of the cost estimates in this paper is that average costs are estimated for the production, storage, and use of information in print, microfiche, and digital formats. Previous estimates have either focused on one type of cost—production, storage, or use; one format—print, fiche, or digital; or have focused on the marginal costs rather than the full costs of production. In this paper all costs for each format are included.
The Cost of Print
Table 15.1 shows the cost estimates for book storage and access. These costs are based on the cost of the Thomas Fisher Rare Book Library at the University of Toronto. Construction costs are based on the 1999 library construction project at the University at Albany. Special environmental controls used in a rare book library imply that the construction costs in Table 15.1 may underestimate the actual construction costs. All costs are shown in Canadian dollars (CD).
Category | Cost | Cost/volume | Cost/use |
Construction, utilities and maintenance | $1,586,056 | $3.17 | $72.51 |
Salaries | $1,105,031 | $2.21 | $50.52 |
Equipment and supplies | $255,799 | $0.51 | $11.69 |
TOTAL | $2,946,885 | $5.89 | $134.72 |
The cost of construction, utilities, and maintenance is comparable to an estimate of $4.68CD (Bowen, 1998) and a 30-year amortization of $6.33CD reported in this book (Kantor et al., this volume). However, the cost per use of $134.72CD is significantly higher than the $1.50CD cost of retrieval previously reported (Bowen, 1998), the $3CD cost of retrieval for the New York Public Library and $6CD for the Harvard Depository Library (Lesk, 1998), or the $9CD maximum retrieval cost reported elsewhere (Getz, 1997). In Table 15.1, the cost per use is derived by dividing total cost by the number of annual requests for books. This inflates the cost per retrieval by adding the costs of storage into the equation. However, it is important to note that the "service" of a library is the use of its materials. All costs, when divided by the use of those materials, give an average cost for service which will be higher than separating out only part of these costs for retrieval.
For a comparison with other estimates of retrieval costs, an estimated 80 percent of salaries at the Thomas Fisher Rare Book Library are for access. Taking 80 percent of salary costs yields an estimate of $40CD per transaction for labor, still significantly higher than other estimates. However, a rare book library has concerns of preservation that require additional staff care and monitoring for patron access. In addition, this estimate includes the total cost of administration, vacations, and benefits for employees rather than the marginal cost of retrieval based on a staff member's time spent multiplied by his salary.
Table 15.1 does not include the cost of purchasing a book. This is important although it will be a small percentage of total costs once the purchase price is amortized over the expected life of storage and use of the book. For example a rare book that costs $500 but is expected to last 100 years in storage has an annual cost, when amortized, of $4.80. Table 15.1 also does not include the value of the land. This can be significant but is different depending on the location of the library.
Figure 15.2 shows the categories of rare book annual storage and access costs as a percentage of total costs. As expected, the largest component of total costs for books is the cost of space to store them.
The Cost of Microfiche
The annual costs of microfiche storage and access at the University of Toronto are shown in Table 15.2. Cost per volume is based on a 216-page text, the average size of a text digitized in the Yale Open Book Project.[1] As with Table 15.1 these costs represent the average cost per unit for storage or access. Just as the cost of purchasing a book is not included in Table 15.1, the cost of purchasing the microfiche is not included in Table 15.2.
Both the cost of storage per volume and the cost per use are significantly lower for microfiche than for rare books. This is not surprising since microfiche is intended to provide access to and storage of information at a lower cost than print.
The cost per use is derived by dividing the total costs of microfiche storage and access by total use. As with Table 15.1, this assumes that the value of microfiche storage is for access to patrons. If salaries and equipment are the only costs for access, and 80 percent of salaries are for access, then the cost per transaction can be estimated as $3.75CD, which is comparable to estimates of the costs of book retrieval. Both retrieval functions are similar in that staff must locate, check out, and re-shelve the requested materials.
Category | Cost | Cost/volume | Cost/use |
Construction, utilities and maintenance | $170,527 | $0.06 | $2.71 |
Salaries | $251,602 | $0.09 | $4.00 |
Equipment and supplies | $34,423 | $0.01 | $0.55 |
TOTAL | $456,552 | $0.16 | $7.26 |
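The access-only estimates quoted in the text can be reproduced from the per-use figures in Tables 15.1 and 15.2. The following Python sketch is illustrative only: the function names are hypothetical, and it simply applies the 80 percent access share stated in the text to the published cost-per-use values.

```python
ACCESS_SHARE_OF_SALARIES = 0.80  # share of salaries attributed to access (from the text)


def labor_cost_per_use(salary_cost_per_use, access_share=ACCESS_SHARE_OF_SALARIES):
    """Portion of salary cost per use that is attributed to access."""
    return access_share * salary_cost_per_use


def access_cost_per_use(salary_cost_per_use, equipment_cost_per_use,
                        access_share=ACCESS_SHARE_OF_SALARIES):
    """Access labor plus equipment and supplies, per use."""
    return labor_cost_per_use(salary_cost_per_use, access_share) + equipment_cost_per_use


# Rare books (Table 15.1): labor only, as in the text's estimate of about $40CD.
rare_book_labor = labor_cost_per_use(50.52)

# Microfiche (Table 15.2): labor plus equipment, as in the text's $3.75CD.
fiche_access = access_cost_per_use(4.00, 0.55)

print(round(rare_book_labor, 2), round(fiche_access, 2))
```

Note that the two quoted figures are not built the same way: the rare book estimate covers labor alone, while the microfiche estimate also includes equipment and supplies.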
Figure 15.3 illustrates that microfiche costs are more salary intensive. Salaries constitute a larger percentage of the costs of microfiche than in the case of rare books. Rare books take up more space and therefore have a higher percentage of costs in construction, utilities and maintenance.
Table 15.2 does not include the subscription price of the microfiche to the library. These costs are part of the economic cost of producing microfiche and are shown in Table 15.3: The Costs of Microfiche. By counting these costs in the production but not the purchase of the microfiche, we avoid double-counting these costs. The costs of microfiche production are shared costs. Library subscription fees, grants and donations are used to jointly finance the production of the microfiche as a public good.
Table 15.3 includes all economic costs of microfiche production, including the value of the space that the Canadian Institute for Historical Microreproductions uses at the National Library of Canada. While this space is donated to CIHM, it still represents an economic cost of producing microfiche. As with previous tables, the average cost of production is calculated by dividing total cost by the number of units. All costs are in Canadian dollars.
Category | Cost | Cost/fiche | Cost/image | Cost/volume |
Master Copies | $150,000 | $16.07 | $0.22 | $46.88 |
Salaries | $602,932 | $64.58 | $0.87 | $188.43 |
Equipment and supplies | $125,880 | $13.48 | $0.18 | $39.34 |
Construction, utilities and maintenance | $187,066 | $20.04 | $0.27 | $58.46 |
TOTAL (shared costs) | $1,065,878 | $114.17 | $1.54 | $333.11 |
Cost of microfiche reproduction and sales | $236,092 | | | |
TOTAL COST | $1,301,970 | | | |
Total cost per library (30-42 copies) | $43,399-$30,999 | $4.65-$3.32 | $0.06-$0.04 | $13.56-$9.69 |
Annual cost per library (30-42 copies) | $2,254-$1,487 | $0.22-$0.16 | $0.01-$0.00 | $0.65-$0.46 |
The first four rows of Table 15.3 show the cost of producing master copies of microfiche. The cost of producing master copies of microfiche is $114.17CD per fiche, $1.54CD per image, or $333.11CD per 216-page volume. This is the cost of producing a set of master copies that are then used to produce additional microfiche copies for distribution to subscribing libraries. The cost of the master copies is a shared cost for all subscribing libraries.
If we compare the cost per volume of creating and storing a master microfiche copy relative to creating and storing a print copy, microfiche is expensive to create but has significant savings in storage ($0.16CD per volume per year) relative to print ($5.89CD). However, at a savings of $5.73CD per volume per year, it would take over 50 years to cover the cost of creation ($333.11CD) if the master copies were created solely for the use of one library.
Microfiche is produced by CIHM, not to have a single copy, but to provide multiple copies to libraries that would not otherwise have access to early Canadian literature. With a limited number of print copies, microfiche becomes a cost-effective alternative for providing access. CIHM produces several copies of each microfiche to sell as subscriptions for libraries throughout Canada, the United States, and the rest of the world. By purchasing a subscription, these libraries share the costs of the original microfiche production.
CIHM produces about 30 copies each year for library subscriptions and additional copies of individual microfiche at an additional cost of $236,092CD. The last two rows in Table 15.3 show how these costs are shared among the subscribing libraries. If the full cost of microfiche production is averaged over the 30 copies, the cost of annual production is $43,399CD per library. This includes the shared costs of production plus the costs of making copies. If an additional 12 copies of each fiche are sold, the average cost is $30,999CD per library.
The average costs per fiche, per image, and per volume for 30-42 copies are shown in the final three columns of Table 15.3. Sharing the full costs of production among subscribing libraries reduces the cost to $0.04CD-$0.06CD per image or $9.69CD-$13.56CD per volume. This compares favorably to the cost of each library acquiring a printed manuscript. At an annual savings of $5.73CD per volume for storage and access relative to print, it takes each library 1.7-2.4 years for the microfiche to cover the costs of creation ($9.69CD-$13.56CD).
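The effect of sharing can be reproduced from the figures in Table 15.3. This is a rough sketch, assuming an equal split of total cost across copies sold and an undiscounted payback; the names `cost_per_library` and `COST_PER_VOLUME` are illustrative and do not come from the study.

```python
TOTAL_COST = 1_301_970   # CD, shared costs plus reproduction and sales (Table 15.3)
ANNUAL_SAVING = 5.73     # CD per volume per year relative to print (from the text)
# Per-volume cost per library at 30 and 42 copies, as reported in Table 15.3.
COST_PER_VOLUME = {30: 13.56, 42: 9.69}


def cost_per_library(total_cost, copies):
    """Equal share of the full production cost for each subscribing library."""
    return total_cost / copies


for copies, per_volume in COST_PER_VOLUME.items():
    share = cost_per_library(TOTAL_COST, copies)
    payback = per_volume / ANNUAL_SAVING  # years, undiscounted (assumption)
    print(f"{copies} copies: {share:,.0f} CD per library, "
          f"payback {payback:.1f} years per volume")
```

With 30 copies the share is about $43,399CD and the payback about 2.4 years; with 42 copies the share falls to about $30,999CD and the payback to about 1.7 years.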
Once produced, it is anticipated that a microfiche copy of a text will last for 100 years. The purchase of microfiche is an investment in an archival copy of materials that is expected to provide access to patrons to the information for many years. If the cost of the microfiche is spread out or amortized over a 100-year period, then the annual cost of microfiche production is only $0.65CD-$0.46CD per 216-page volume per year. When this is added to the cost of storage from Table 15.2, the annual cost comes to $0.81CD-$0.62CD per volume per year for producing, storing, and providing access to a text in microfiche format.
These costs indicate that when microfiche is produced in large numbers to accommodate several libraries, it costs significantly less to produce, store, and provide access to microfiche than to books. This shared cost per library declines further if the number of libraries acquiring subscriptions increases. In addition, the CIHM microfiche subscription provides access to a larger collection of texts than is likely to exist in any single library of rare books. These cost estimates show that microfiche is the more cost-effective alternative to library storage of print to provide patron access to out-of-print texts.
The Cost of Digital
The previous section showed that microfiche is a cost-effective alternative to print. Digitization of texts may be able to provide even greater savings relative to microfiche and print. Unlike print and microfiche, which must be produced and delivered to a library, digital texts have the advantage of being stored remotely but accessed globally via the Internet. The cost of reproduction and distribution of digital information in a networked environment is zero. The only costs are the one-time fixed costs of producing the data and the annual fixed costs of storing and maintaining it. These fixed costs can be shared by subscribing libraries, which in theory could drive the cost per library to a significantly lower level than the cost of microfiche.
In the Early Canadiana Online Project microfiche was converted to digital format. Microfiche was sent to Preservation Resources for scanning and the University of Michigan for optical character recognition (OCR). Cost estimates shown in Table 15.4 are based on contractual costs for scanning and OCR.
The total costs for production are $236.08CD per title or $1.20CD per image. Costs in the second and future years for digital storage and access are $35.76CD per title or $0.18CD per image. This includes the cost of salaries for maintaining the ECO Project database (1.5 full-time equivalents for administration, server and database maintenance) and annual costs of hardware storage. Although the cost of producing digital copy from fiche is less than the cost of microfiche, the cost of storage and access for digital, in this project, is more expensive. This is because the project's storage and access costs are averaged over a relatively small number of digital images, whereas a university micro-text room spreads its costs over hundreds of thousands of microfiche.
Item | Cost | Cost/title | Cost/image | Cost/volume |
Digitization | $439,548 | $132.87 | $0.67 | $145.67 |
OCR | $159,098 | $48.09 | $0.24 | $52.73 |
Salaries | $153,264 | $46.33 | $0.24 | $50.79 |
Equipment & supplies | $7,975 | $2.41 | $0.01 | $2.64 |
Construction, utilities & maintenance | $21,053 | $6.36 | $0.03 | $6.98 |
TOTAL | $780,938 | $236.08 | $1.20 | $258.82 |
Annual costs of storage & access | $118,290 | $35.76 | $0.18 | $39.20 |
There are two factors that significantly lower the average cost per image of digital production and storage: the number of libraries subscribing to the database and the number of images stored. The production costs of the digital images are fixed costs that are constant regardless of the number of libraries that subscribe to the database. If 30 libraries subscribe to the database, the cost per library is $8.63CD per volume. An increase in the number of libraries or other organizations that subscribe to the database will decrease the "cost-share" for each organization. In addition, the annual cost of storage and access to the database is also a "shared" cost. If this cost is shared among 30 libraries it decreases to $1.31CD per volume per library per year.
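The cost-share arithmetic in this paragraph can be sketched as follows. The per-volume figures come from Table 15.4 and the 30-library case is the one used in the text; the subscriber counts of 60 and 120 are hypothetical values added only to show how the share falls as more organizations subscribe.

```python
PRODUCTION_PER_VOLUME = 258.82  # CD, one-time production cost (Table 15.4)
ANNUAL_PER_VOLUME = 39.20       # CD per year, storage and access (Table 15.4)


def cost_share(cost, subscribers):
    """Split a fixed cost equally among subscribing libraries."""
    if subscribers < 1:
        raise ValueError("need at least one subscriber")
    return cost / subscribers


# 30 is the case discussed in the text; 60 and 120 are hypothetical.
for n in (30, 60, 120):
    production = cost_share(PRODUCTION_PER_VOLUME, n)
    annual = cost_share(ANNUAL_PER_VOLUME, n)
    print(f"{n:>3} libraries: production {production:.2f} CD, "
          f"annual {annual:.2f} CD per volume")
```

Because both costs are fixed, the share per library is inversely proportional to the number of subscribers.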
As the number of images available in the ECO Project increases, the cost per volume will also decline. Space costs (utilities, construction, etc.) and salaries for maintaining and updating the database and server constitute 97 percent of the costs of storage and access. These costs are incurred regardless of the number of images. Storage itself accounts for $0.90CD per volume of annual costs. As the number of images in the database increases, total storage costs will increase, but the average cost will continue to decline.
The cost estimates from Table 15.4 can be compared to similar recent studies estimating the cost of digital production. Estimates from studies at Cornell University and Yale University are shown in Table 15.5. (Cost estimates from Cornell and Yale are shown in Canadian dollars for comparison. Cost per volume is based on a 216-page text.)
Project | Cost/Image | Cost/Volume | Type of estimate |
Early Canadiana Online | $1.20 | $258.82 | Average cost estimate |
Yale | $0.40 | $83.96 | Marginal cost estimate |
Cornell | $0.43 | $91.37 | Marginal cost estimate |
These earlier studies show a significantly lower cost of digitization. The Cornell study created digital copies from paper while the study at Yale created digital copy from microfiche. The major difference between the Early Canadiana Online Project and these earlier studies is the method used for estimating costs. Both the Yale and Cornell studies estimated costs by timing staff scanning pages of print or microfiche. These studies are based on the marginal cost of scanning images and producing digital copy. The cost estimates for the ECO project are average costs based on dividing total project costs by the number of images, titles, or volumes. The ECO Project cost analysis includes the full cost of producing digital copies and mounting the database on a server for access over the Internet. The ECO project is larger in scope, number of titles, and number of images. ECO costs include all salaries, space costs, and outsourcing of digitization and OCR. Therefore this cost analysis should be viewed as a liberal cost estimate of a large digitization project with Internet access to the database.
Economies of Scale
The ECO project scanned a larger number of titles and images than the projects at Yale University and Cornell University. The ECO project scanned 3,308 titles compared to the 1,270 titles scanned at Cornell or the 2,000 titles scanned at Yale. Table 15.6 compares fixed, variable, and total cost estimates for the three projects.
Project | Annual fixed project costs (equipment & salaries) | Per-image variable cost estimates | Total cost estimate | Extras | Project size |
Yale University Project Open Book (film to digital, 1994/95, marginal cost estimate) | $142,420 | $0.182 (based on 600 volumes timed) | $221,177 | | 432,000 images; 2,000 volumes |
Cornell University (paper to digital to COM, 1994/96, marginal cost estimate) | $27,931 | $0.319 manual; $0.288 auto (150 volumes timed) | $197,275 manual; $183,353 auto | $80,417 (COM costs) | 450,000 images; 1,270 volumes |
Early Canadiana Online (fiche to digital, 1998/99, average cost estimate) | $161,239 | $0.674 (digitization contract) | $600,787 | $159,098 (OCR); $21,053 (space) | 651,742 images; 3,308 titles |
The variable cost estimates in Table 15.6 for the ECO project include only the cost of scanning the images. OCR, space, and other salary costs contribute to total costs. For comparison with the Yale and Cornell studies, however, the vendor's cost of providing digital access may be more relevant. If texts were digitized without OCR then the additional cost would be $0.674CD per page. The relative costs and size of the three projects are shown in Table 15.7 and Figure 15.4.
Figure 15.4 illustrates the increase in cost per image and cost per title across the three projects. This may show diseconomies of scale, i.e. an increasing average cost as output increases. Larger projects may require more staff or have a greater complexity of task that results in higher costs per unit. However, much of the difference shown may simply be the result of different methods of estimating costs.
Project | Number of images | Cost/image | Number of titles | Cost/title |
Yale University (fiche to digital, 1994/95, marginal cost est.) | 432,000 | $0.51 | 2,000 | $111 |
Cornell University (paper to digital to COM, 1994/96, marginal cost est.) | 450,000 | $0.42 | 1,270 | $155 |
CIHM Early Canadiana Online (fiche to digital, 1998/99, average cost est.) | 651,742 | $0.92 | 3,308 | $182 |
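The per-image and per-title figures in Table 15.7 appear to be the total cost estimates of Table 15.6 divided by project size. The sketch below checks this for Yale and ECO, on the assumption that this is how the table was derived. Cornell is left out because Table 15.6 reports two totals for it (manual and auto). The dictionary layout and the function name `unit_costs` are illustrative.

```python
# Totals and project sizes as reported in Table 15.6.
# Yale's 2,000 "volumes" are treated as titles, as in Table 15.7.
projects = {
    "Yale": {"total": 221_177, "images": 432_000, "titles": 2_000},
    "ECO": {"total": 600_787, "images": 651_742, "titles": 3_308},
}


def unit_costs(project):
    """Return (cost per image, cost per title) as simple averages."""
    return (project["total"] / project["images"],
            project["total"] / project["titles"])


for name, project in projects.items():
    per_image, per_title = unit_costs(project)
    print(f"{name}: ${per_image:.2f} per image, ${per_title:.0f} per title")

# The ECO total equals annual fixed costs plus the digitization contract
# (Table 15.4), which is why it is an average rather than a marginal figure.
eco_fixed, eco_digitization = 161_239, 439_548
print(eco_fixed + eco_digitization == projects["ECO"]["total"])
```

The ECO figure includes fixed salary and equipment costs in the numerator, which partly explains why it exceeds the timing-based estimates from Yale and Cornell.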
Cost of Access to Digital Information
The cost of access to digital information is very difficult to quantify. Access to digital information includes the personal computer, network connection, and space used by the patron. Since these are all fixed costs of access that a patron or library must incur regardless of what information is accessed, the marginal cost of accessing any image or database is zero.
We can attempt to quantify the average cost per use to the library of providing access to digital information. This is shown in Table 15.8.
Item | Cost | Cost per internal use | Cost per use |
construction, utilities & maintenance | $186,274.25 | $0.01 | $0.00 |
Salaries | $114,000.00 | $0.01 | $0.00 |
equipment & salaries | $398,000.00 | $0.03 | $0.01 |
TOTAL | $698,274.25 | $0.04 | $0.01 |
Table 15.8 includes the cost of computers within the library, staff to maintain the server and network, and the cost of space for each computer. Cost per use is shown in terms of internal use and all uses of library databases regardless of the source. Internal use is defined as the number of unique and significant hits to the library server which originate from within the library (0.3 million per week). Use is the number of hits regardless of source (1.2 million per week). Regardless of which definition of use is applied, access to digital documents comes at a very low average cost per use. This is significantly lower than the average cost per use for microfiche or rare books.
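The average cost per use in Table 15.8 can be approximated from the weekly hit counts given above. This sketch assumes that the total in Table 15.8 is an annual figure and that weekly hits are annualized over 52 weeks; neither assumption is stated explicitly in the text. The function name `cost_per_use` is illustrative.

```python
TOTAL_COST = 698_274.25             # CD, Table 15.8 total (assumed annual)
INTERNAL_HITS_PER_WEEK = 300_000    # hits originating within the library (text)
ALL_HITS_PER_WEEK = 1_200_000       # hits regardless of source (text)
WEEKS_PER_YEAR = 52                 # annualization factor (assumption)


def cost_per_use(total_cost, hits_per_week, weeks=WEEKS_PER_YEAR):
    """Average cost per use: total cost divided by annualized hits."""
    return total_cost / (hits_per_week * weeks)


internal = cost_per_use(TOTAL_COST, INTERNAL_HITS_PER_WEEK)
overall = cost_per_use(TOTAL_COST, ALL_HITS_PER_WEEK)
print(f"per internal use: {internal:.2f} CD; per use: {overall:.2f} CD")
```

Under these assumptions the results round to the $0.04 and $0.01 shown in the TOTAL row of Table 15.8.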
Table 15.8 also illustrates the importance of understanding the difference between total, average and marginal costs. Table 15.8, like previous tables, shows the total and average costs per use. The total cost of providing electronic access within a university library is significant, but the high level of use of terminals within the library results in a very low average cost per use. The marginal or additional cost for each patron's use is zero. All costs in Table 15.8 are fixed costs, incurred regardless of whether a patron uses a terminal or not. Investments in information technology within university libraries can be expensive although digital documents in a networked environment come at a zero marginal cost of distribution.
User Costs of Access
The final economic cost of access is the cost to the user. With print and microfiche the user must travel to the library to use the information. Any library will have only a limited collection of print titles. To read other titles in print from the collection a patron may have to travel to another research library. With the CIHM microfiche collection a research library can offer patrons access to a greater number of titles than are typically available in print, although the patron must still travel to the library to access the microfiche.
Digital copies are accessible to all patrons of subscribing libraries with a network connection. This increased accessibility of the collection to patrons may result in a greater number of subscribing libraries and greater access to the CIHM collection of materials.
The cost to patrons of using information is the opportunity cost of their time spent in acquiring and consuming it. The value of access to information by patrons is reflected in the demand for using the database. Hypothetical demand for use of Early Canadiana Online is illustrated in Figure 15.5.
In theory, if the user has a cost of time of $10 per use of a manuscript in a rare books library, he may only use the manuscript 5 times a month. If the patron's opportunity cost of time spent consuming the information decreases then use will increase.
Microfiche is easier and takes less effort to use than books in a rare book library. Microfiche delivery by library staff takes less time than retrieval of a rare book. Once a patron understands how to use a microfiche reader he can view several books with relative ease. In addition, patrons do not have to travel to another library to view early Canadiana texts if their library holds the entire CIHM collection on microfiche. If we assume that the cost to a patron of accessing an Early Canadiana text on microfiche is $5, then patron use of the microfiche will increase to 30 times a month.
Digital access lowers the opportunity cost of access to the information even further. It enables patrons to view the information from their personal computer in their home or office or from a computer terminal in the library. Instant access to a large collection of images from the CIHM collection means faster, searchable access to the images. Using Figure 15.5, if we assume that the opportunity cost of patron access is only $2 per access for digital images, patron use of digital access will increase to 50 uses a month.
To patrons the time savings from digital access has two parts. First, there is the value to patrons of lower cost access to images they would have traveled to the library to view on microfiche. If a patron would have used microfiche 30 times a month at a cost of $5 per use, and this cost declines to $2 per use in digital form, then this patron has a $3 lower cost of access for 30 uses, or has decreased his cost by $90 a month. Second, there are additional uses of digital access that provide additional benefits to patrons. These additional uses can be assigned an average value of $1.50 each, or one-half of the value of lower cost access to the first 30 uses a month. If use increases to 50, the additional 20 uses per month would provide a benefit to this patron of roughly $30. The total value to this patron would be $120, the $90 in lower costs plus the additional $30 in benefit from an increase in access.
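The two-part benefit described above can be written out directly. This is a minimal sketch of the paragraph's own arithmetic, using its hypothetical figures from Figure 15.5; valuing the additional uses at one-half of the cost reduction follows the text, and the function name `patron_benefit` is introduced only for illustration.

```python
def patron_benefit(old_cost, new_cost, old_uses, new_uses):
    """Return (saving on existing uses, benefit from additional uses).

    Additional uses are valued at half the per-use cost reduction,
    as in the text.
    """
    reduction = old_cost - new_cost
    saving_on_existing = reduction * old_uses
    gain_on_new = 0.5 * reduction * (new_uses - old_uses)
    return saving_on_existing, gain_on_new


# Hypothetical monthly figures from the text: microfiche at $5 and 30 uses,
# digital at $2 and 50 uses.
saving, gain = patron_benefit(old_cost=5, new_cost=2, old_uses=30, new_uses=50)
print(f"saving: ${saving}, added benefit: ${gain:.0f}, total: ${saving + gain:.0f}")
```

The half-value rule corresponds to the triangular area under a straight-line demand curve between the old and new cost levels.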
During this study, patron use of the print, fiche, and digital collection was observed. Patrons were also asked questions about their use and travel time to the library. Annual use of the collection at the University of Toronto and Laval University increased from 2,984 for print and microfiche to an estimated 7,030 uses of the digital texts. Travel time to the library for print and microfiche patrons varied from less than 30 minutes to more than one day, with 90 percent of patrons needing one hour or less.
If we assume that digital access saves print and microfiche patrons 30 minutes of travel time and that the value of this time is $10 per hour, then the annual savings of 7,030 uses equals $25,035.[2] This represents a lower-end estimate of the savings from accessing the CIHM collection online versus traveling to the library to use the microfiche or print. Some patrons are likely to save more than 30 minutes of travel time. Other patrons are likely to have an opportunity cost of time greater than $10 per hour. Most significantly, use of the Early Canadiana Online collection is likely to increase as more scholars and students are made aware of it.
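The annual savings figure can be reproduced using the method described in note 2. The sketch below takes the 30 minutes saved and the $10 hourly value from the paragraph above; the variable names are illustrative.

```python
HOURS_SAVED_PER_USE = 0.5   # travel time saved per use (assumption in the text)
VALUE_OF_TIME = 10.0        # dollars per hour (assumption in the text)
PRIOR_USES = 2_984          # annual print and microfiche uses (observed)
DIGITAL_USES = 7_030        # estimated annual digital uses

saving_per_use = HOURS_SAVED_PER_USE * VALUE_OF_TIME

# Existing uses receive the full saving; additional uses are valued at
# half of it on average, following note 2.
savings_existing = PRIOR_USES * saving_per_use
savings_additional = (DIGITAL_USES - PRIOR_USES) * saving_per_use / 2

total_savings = savings_existing + savings_additional
print(f"estimated annual savings: ${total_savings:,.0f}")
```

Raising either the time saved or the value of time scales the result proportionally, which is why the text treats $25,035 as a lower-end estimate.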
15.4 Conclusions
This project estimated the cost of access to information in print, microfiche, and digital format. The results include:
The average cost of producing a 216-page book in digital format from microfiche is $258.82 (C$) plus any copyright fees. The annual cost of storage and access is $39.20 per book. In theory, these costs can be shared by the number of libraries and patrons that access the digital copy over the Internet significantly lowering the costs per library.
The average cost of producing a book on microfiche is $333.11. Given the number of copies sold by CIHM, the cost per library is between $9.69 and $13.56 per book. The annual cost of storage and access in a university library microtext room is $0.16 per book.
The cost of a book in print format is the purchase price of the book. The annual cost of storage and access is $5.89 in a rare book library.
These cost estimates show that digitization of texts can provide significant savings if shared by a sufficient number of subscribing organizations. Networked access to digitized texts also provides several economic benefits to users including (1) increasing the availability of these texts to patrons of organizations with access to the Internet, (2) decreasing the opportunity cost of patrons' time spent accessing digital copies rather than traveling to libraries to use print or microfiche copies, and (3) providing electronically searchable texts making it easier for users to find items of interest. These increased benefits should result in a significant increase in use of the digital information relative to use of the print or microfiche copies.
Previous studies have estimated the marginal costs of production, acquisition, and storage of books, microfiche, and digital copies of texts. This study included all costs associated with the production, cataloging, and sales of texts in microfiche or digital format. Therefore, the estimates of the cost per book in print, microfiche, or digital format are average cost estimates based on digitizing over 3,300 titles and 650,000 images. Individual libraries engaging in small digitization or microfiche projects may have lower costs per text but the final product may not be of a quality needed for national or international sales. Large-scale projects that include cataloging and sales of several thousand texts are likely to experience a similar cost structure as estimated in this study.
The remaining paradox of digital information is finding the correct financial strategy to collect sufficient revenues to pay for the benefits of digitization. Digital information provides greater access to information at a lower cost. However, funding the production, archiving, and access to the information requires creative financing, including value-based pricing of information as well as the solicitation of grants and donations.
Information production and access comes at a cost. An accurate measurement of the full economic costs of different methods of information delivery is essential in determining the most cost-effective method. This study has shown the costs of three methods of access: print, microfiche, and digitization of microfiche. The cost of digital information is lower on a cost-per-library or per-patron basis so long as a sufficient number of libraries are interested in subscribing to the database.
In general, the lower cost of digital production will continue to result in more information products appearing in digital format on the Internet. The increase in the number of digital products will further contribute to the information overload of patrons and librarians. Information consumers are confronted with too many journals, databases, and research sources for the limited amount of time and attention they can give to any one source. Given a limited amount of time for information consumption, patrons will search for information of higher quality for use of their time. Any new digital product must have an assurance of quality in order to convince patrons and librarians that there is value in spending time consuming it. Manuscripts of historical significance, such as the ECO Project, produced by trusted organizations, such as CIHM, provide libraries and patrons with an assurance of quality.
Notes
† I would like to thank Pam Bjornson, Meredith Butler, Marshall Clinton, Malcolm Getz, Wendy Lougee, Tim Nef, Guy Teasdale, and Karen Turko for their comments and assistance. Funding for the Early Canadiana Online Project and this economic analysis was provided by the Andrew Mellon Foundation.
1. Volumes are considered to contain 216 images. Images are page images. Each microfiche image has two page images.
2. This figure comes from the original 2,984 uses multiplied by $5 saved per use, plus the additional 4,046 uses (7,030 minus 2,984) multiplied by an average savings of $2.50 per use.
16. The Columbia University Evaluation Study of Online Book Use[†]
16.1 Introduction
This paper reports some observations about cost, use, and users of online books during the Columbia experiment. From winter 1995 to autumn 1999, the Online Books Evaluation Project at Columbia University explored the potential for online books to become significant resources in the academic world. The project prepared books in HTML format, a choice that seemed reasonable at the time it was made (1995). Later observation of user behavior makes us less certain of that choice. The evaluation component of the project included monitoring of the national technological environment. The project analyzed (1) the Columbia community's adoption of and reaction to online books, (2) relative life cycle costs of producing and owning online books and their print counterparts and (3) the implications of traditions in scholarly communications and publishing. The experience involved integration of two very diverse cultures, and has taught us the relevance of the following joke.
A manager, an engineer and a computer scientist are all traveling in a car in the mountains when the brakes fail and the car careens down the road and eventually stops just hanging over the edge of a cliff. They carefully climb out of the car and the manager says, "Well, now we'll have to form a focus team for a matrix review of vision and objectives." The engineer says, "Let me have a screw driver; I may be able to fix this in 10 minutes." And the computer scientist says, "Let's push this back up to the top of the hill and see if the brakes fail again."
Our approach to online books at Columbia was like that of the engineer, but "10 minutes" has been more like four years. One of the lessons learned is that, as libraries become more interdependent with online information services, we must become more accustomed to the kind of trial-and-error approach exhibited in the joke.
We started from an abstract formulation of the relation between users, libraries and constraints upon each of them; see Figure 16.1. Our goal is to understand the behavior of the user of the system, shown in the middle of the diagram. The capabilities of the individual users obviously influence their behavior. Their disciplines also probably influence it. The overall environment including technology and attitudes toward computers influences it. The resources available in the library also influence behavior. In turn, library management controls those resources. Our study is an effort to insert the dotted line shown in Figure 16.1, to provide management with feedback about the behavior of users which it can use to better manage the library resources.
16.2 Economic Perspective
We take an economic perspective on the complex problem of establishing an online books service at an existing major research library. We presume that the actors involved weigh the costs and benefits of various alternatives available to them. Each applies some kind of personal utility function to those costs and benefits, and chooses the action with the largest personal utility. In this complex setting there are many different kinds of economic actors: students, faculty, and staff. Indeed, the library itself, and even the entire university, can be thought of as "actors."
Individual economic factors
User Costs. What are some of the forces affecting individuals? First, there are costs of two kinds, capital and continuing. The capital costs are the cost of equipment needed to be able to use the digital library or online books, and the cost of acquiring the needed skills. Since, in the setting of our project, there is no transfer of funds from users to the library associated with use events, the continuing costs are (a) the cost of connecting to the library and (b) the mental costs or efforts associated with use. Not a lot is known about these costs to the user, at this point. However, in the transition from page-based books to the HTML format that we chose, we sense that certain kinds of mental landmarks that readers have developed over years of working with print on paper are removed. It seems likely that this results in additional mental cost to the users.
User Benefits. There are also benefits to the users. First among these, of course, is ubiquity of access. Also, our system provided a search capability. In addition, the book-marking system (supported through the browser) permits users to store pointers to important locations within an extended text. Our system did not directly support annotations, but obviously annotations can be established in the users' own computers. Finally, using a system like this provides the intangible benefit of being up-to-date relative to one's peers.
Beyond all this, having and using a digital library provides symbolic utility. Symbolic utility is a concept introduced by the philosopher Robert Nozick to represent the utility assigned to something good to have or to do, even if it doesn't necessarily "work" (Nozick, 1993, p. 226). In this case, members of the Columbia community may have felt proud about contributing early to the development of digital library systems.
Given the nature and variety of benefits, it seems probable that they outweigh the costs to individual users.
Staff Economic Factors
Staff Costs. Economic forces also affect the staff of any library that introduces a substantial digital component. One important cost is the learning curve, representing costs that must be incurred in order to get the system to work. The other, which is becoming a pervasive feature of the library world today, is the cost of continuous change, which involves not only learning but psychological stress as well. Because some older librarians did not foresee and do not enjoy those stresses, we will probably see a gradual change in the psychological profile of the profession.
Staff benefits. Among benefits to the staff, the first and most important is the ability to provide better service to patrons. Another important benefit is the ability, in the online books or digital library situation, to adapt materials developed by others. In focus groups conducted at New York University, we heard for the first time librarians reporting that they were pleased to be able to develop web resources in which pointers to resources developed by librarians at other institutions played a major role. A final benefit to staff is the fact that by working in the digital environment they are developing skills that are much more portable than traditional library skills.
Library Economic Factors
Library Costs. There are several incremental costs to the library in implementing this project. The most notable are costs of equipment, materials development, and training.
Library Benefits. Among the important library benefits are the contribution to the competitiveness of the university and the contribution that digitizing the library makes to the shared professional goals of growth and service.
Publisher Economic Factors
Publishers must consider the potential of electronic books in terms of their business plans and goals. While publishers share some objectives with libraries, authors, and readers, the relationship is sometimes antagonistic, because some portion of the price to readers and libraries goes to publishers rather than authors. We presume that for-profit publishers seek to maximize profit, while non-profit publishers seek to maximize the net of income over expenses attributable to each book.
16.3 Issues affecting the design of the studies
Based on the economic framework above, we studied the environment, publishing costs, and library costs. We explored various views on the function and design of online books. We conducted numerous and diverse studies of use and of user preferences. In this chapter we summarize some of what we've learned and discuss implications for the future.
In concrete terms, the Columbia University Online Books Evaluation Project repackaged books for online delivery, studied the use of those books, and estimated the costs for publishers and libraries of providing print and online books. There were four publishing partners in the project: Columbia University Press, Oxford University Press, Garland Publishing, and Simon and Schuster Higher Education. We analyzed the costs of development and delivery, and the use of digital texts. We sought to relate those costs to that use, within the context of university library service, and to the potential for service. Our analysis of the potential for service is by no means complete.
In the remainder of this section we review several considerations that framed our studies and motivated particular choices we made in the design of the studies.
Why put books online?
Online books have several advantages. First, we anticipate that online books will be cheaper to produce, to purchase, to acquire, and to maintain. We also expect that online books will provide increased functionality such as searching and linking. They offer obvious potential for enriched content through the addition of links to multimedia, computer simulations, and other features. There is also potential for developing expanded products, rather like a collection of books linked through a web site. Not least in importance, online books can provide availability around the clock and calendar.
Why not put books online?
The issue of whether to put books online at all was a serious one in 1994 and 1995, when the project was planned and launched. At that time, it seemed that the most important negative point was users' objections to reading books online. We do not know how true this is in the year 2000. There are definitely many who do not want to read books online, but we must entertain the possibility that most of those users are of an older generation, and will eventually be replaced by people who do want to read online.
Usability. When the project began, it was anticipated that online books would be difficult to use. At that time (1995), it was not even apparent that Web technology would be easy to use. We were also concerned that there was no feasible market model for the development of online books. We cannot say today that there is a clearly defined market, but the activities of netLibrary, Questia and other commercial online academic libraries show that there are multiple possible paths into the market for scholarly books, aimed at libraries and students, respectively.
Accessibility. We were concerned about the adequacy of access and connectivity. We had in mind, primarily, people working at home, connecting over telephone lines with top speed of 14.4 kilobits per second, which seemed likely to be inadequate. However, about halfway through the project the typical home-access speed had moved up to about 56 kbps, and increases in access speed continue to occur.
Production Cost. We were concerned that online books would be too costly to produce. In fact, we shall see that the production method we employed was relatively costly. Nonetheless, when compared with the total life cycle cost of paper, online production is something of a bargain.[1]
Author Interests. We also believed that authors might oppose the presentation of their books in online form. Authors might fear loss of royalties, and object to aesthetic compromises. HTML is a limited rendering language, and the conversion to HTML might remove some important aspect of layout. Also important is a fear, on the part of young scholars, that exclusive publication in an online form might become common for first-time authors and that this would demean their works and lessen their chances for career advancement. On the other hand, many academic authors are concerned with documenting the impact of their works and the extent to which they are being read. The online environment is ideal for this.
National Environment: Access
Reviewing the environment for online books from 1995 to 1999, we see a number of changes. First is the improved price-to-power ratio for personal computers, discussed further below. We saw penetration of Internet use to more than 50% of all USA households by 1999. In addition, by 1999 half of all adults in the USA were Internet users. There was little improvement in Internet service provider pricing between 1997 and 1999. Hand-held book readers emerged in 1998, and some of our focus group work suggests that this will be important in the future growth of the online book market.
National Environment: Computer pricing
Moore's famous law is that computing power at a given price doubles every 18 months. The inverse formulation is that the cost of a given amount of computing power falls by half every 18 months. However, the corollary that consumer prices for computers fall at the same rate does not hold. Starting from when the base price of an adequate computer was about $4000, we would have expected that by the end of our study this price would have dropped to well below $1000. What we actually saw, through a program of tracking ads for entry-level computers, is that prices dropped fairly rapidly to around $2000, and held there for some time. Towards the end of the study period there was a new break down to $1000. Apparently the strategy of manufacturers was to identify market price points that are acceptable to consumers and to improve the configuration of the computers rather than drop the price past those points. If Moore's law held strictly, a general-purpose computer adequate for the use of online books over 56 kbps lines should now cost only $300.
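The arithmetic behind this expectation can be sketched in a few lines. This is an illustration only: the $4000 starting price and the 18-month halving period come from the text, while the elapsed times tried below are assumptions, because the text does not date the $4000 price exactly. The function names are hypothetical.

```python
import math

def moore_price(start_price, years_elapsed, halving_years=1.5):
    """Price of a fixed amount of computing power if its cost
    halves every `halving_years` (18 months by default)."""
    return start_price * 0.5 ** (years_elapsed / halving_years)

def years_to_reach(start_price, target_price, halving_years=1.5):
    """Years needed for the price to fall from start_price to target_price."""
    return halving_years * math.log2(start_price / target_price)

START_PRICE = 4000.0  # from the text

# Assumed elapsed times; the text does not state them.
for years in (3, 4, 5):
    print(years, "years:", round(moore_price(START_PRICE, years)))
# 3 years: 1000
# 4 years: 630
# 5 years: 397

print(round(years_to_reach(START_PRICE, 300.0), 1))  # 5.6
```

Under strict halving, the $300 figure corresponds to about five and a half years after the $4000 starting point, whereas the observed prices paused at $2000 and then at $1000.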
Local Columbia Environment
The local environment at Columbia for online books changed substantially during the period 1995 to 1999. By the end of this period there was Ethernet connectivity to every building and dormitory. By 1997, which is the last time that we could justify the costs of surveying to ask the question, 80% of students and faculty had adequate access to a network computer. By 1997, most library users reported an average of six hours per week of online activity of all kinds. That works out to about an hour a day and we estimate that by now this has probably at least doubled if averaged over the entire community.
By spring 1999, online use of complete texts had become common at Columbia. For example, the level of JSTOR use was equal to one use per month per potential user, on average. We found that most online book use was from on-campus computers. This is consistent with a concern that access from home might not be adequate. It is quite possible that, as bandwidth to the home increases, the usage of online books will increase further.
16.4 Cost data
We developed a variety of sources of data in the online books evaluation project. We conducted surveys online, by mail, by telephone and in class. We also conducted individual in-person and telephone interviews of scholars and a number of focus groups involving users, potential users, and librarians. In this report, we focus on cost analyses and on Web data.[2]
In a traditional print production environment, preparing texts for online access incurs an additional cost. We found an amazing range of estimates for this cost, from four cents per page to more than $2.00 per page, which works out to approximately $100 to $1000 per title. The range of cost is due to the enormous variation in the format and quality of source files from the publisher at the time, and in the conversion processes employed by various projects. Achieving the low-end cost requires a very standard and well-behaved PostScript source file. In addition, these figures include some unknown component of experimentation cost, as this project and others adapted to variations in input, and in desired presentation format.
Cost | Item |
$1.51/pg. | Conversion: OCR or SGML to HTML |
$1.00/pg. | Conversion: ASCII to HTML |
$0.04/pg. | Conversion: PostScript to PDF |
$20.00/title | Conversion management |
$1.00/MB/yr. | Server maintenance |
In Table 16.1 we present some sample electronic book production costs. One conversion route is from OCR (or from SGML) to HTML, and the other, somewhat less expensive route begins with ASCII and goes to HTML. Conversion from PostScript to PDF is done using software from Adobe and yields a cost of about four cents per page. Note that this process, which has been tested at the University of Pennsylvania, does not yield fully navigable HTML files, but yields PDF output only. Management of conversion is estimated to have cost about $20/title at Columbia. Maintaining books on the server is steadily less expensive, estimated by the end of 1999 to cost about $1/megabyte per year.
A fully electronic production process (bypassing print) would be less expensive. Through conversations with scholarly publishers, we have been able to estimate that the potential savings from moving to online format, without paper, would be about 10% at the plant (that is, changes in typesetting costs) and perhaps an additional 15% in costs avoided for paper, printing, and binding. Also eliminated would be costs associated with warehousing and shipping, which we did not attempt to estimate.
On the other hand, there are offsets to these savings for online production. They include costs of customer service, continuing file maintenance, and migration. These latter, archival, functions are very important. A rational economic publisher will only maintain the file for a book as long as the discounted total expected future revenue from sales exceeds the total discounted projected cost of keeping the file. Thus, libraries cannot rely on publishers to maintain the files of books with very low demand, unless they are willing to pay service fees that cover the publishers' expenses.
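The retention rule in the preceding paragraph can be written as a comparison of two discounted sums. The sketch below is a minimal illustration, not the authors' model: the function names and all revenue and cost figures are hypothetical, and the 5 percent rate is borrowed from the library life cycle estimate in this section rather than from any publisher data.

```python
def present_value(yearly_amounts, rate):
    """Discount a list of yearly amounts (year 1, year 2, ...) to today."""
    return sum(amount / (1 + rate) ** year
               for year, amount in enumerate(yearly_amounts, start=1))

def publisher_keeps_file(expected_revenue, upkeep_cost, rate=0.05):
    """True while discounted expected revenue exceeds discounted upkeep cost."""
    return present_value(expected_revenue, rate) > present_value(upkeep_cost, rate)

# Hypothetical figures, in dollars per year, for two titles.
upkeep = [6.0] * 5                      # maintenance and migration
steady_seller = [40.0, 20.0, 10.0, 5.0, 2.0]
low_demand = [3.0, 2.0, 1.0, 1.0, 0.0]

print(publisher_keeps_file(steady_seller, upkeep))  # True
print(publisher_keeps_file(low_demand, upkeep))     # False
```

In the second case a service fee paid by libraries would have to close the gap between the two discounted sums before the publisher had a reason to keep the file.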
From our review of the literature we have prepared an estimate of life cycle costs to the library for online and paper books. These, projected over a thirty-year life cycle and discounted at a 5 percent real cost of money, are lower for online books. Our summary is shown in Table 16.2. The difference is essentially equal to the avoidance of the costs of managing circulation. In addition, long run costs for online books would likely be quite a bit lower as copy cataloging would prevail rather than the original cataloging experienced, and included in the costs, for this project. Original cataloging costs about $25 per title while copy cataloging would cost significantly less per title.
 | Paper | Online |
Acquisition/Processing | 47.00 | 39.00 |
Storage/Maintenance | 14.00 | 38.00 |
Circulation | 44.00 | (included above) |
TOTAL | 105.00 | 77.00 |
Design Considerations: Librarians
We conducted focus groups with librarians to identify market and design features that they consider important in building a collection of online books. The first feature emerging is the ability to search across selected groups of titles. A second, rather technical issue is the existence of "stable, granular" URLs. Stable means that the URLs remain the same over time, or at least that the system does not have to be manually updated. Granular has to do with the level of specificity with which a user can access a book. In the Columbia approach to online books, an individual file corresponds to a chapter within a book. We found that librarians want good bibliographic control of online books, with direct linking from the catalog into the book. But they would also like to see usage data on individual titles in some standard form. This usage data can feed back to rationalize online book acquisition policies. Finally, librarians want to be assured that an online book system will support reliable migration to new platforms.
Design Considerations: Scholars
Both in-depth interviews and focus groups with scholars generated a somewhat different list of desired design features. Scholars would like to be able to move directly into the online book via a direct link from the online catalog. They would like to be able to define groupings of texts on the fly, and search across that collection of texts. They would like a comprehensive and detailed table of contents, with direct linking into the book (providing, in effect, analytic indexing). When images are a significant part of the text they would like to see browsable, linked, thumbnail images. They would like screens and displays supporting the ability to show two nonconsecutive pages at once, permitting comparisons. They would like to be able to see footnotes and text in parallel displayed on the same screen, even if the "footnotes" are actually endnotes. They would also like to see pagination matching the print version, not only for navigational bearings, but also because, frequently, the citation that led them to a book specified a particular page.
Scholars would prefer that, whenever the collection contains the relevant material, references be hyperlinked directly into the cited material. They would also like to be able to link to a dictionary. They would like to be able to adjust fonts and formats for easier reading on screen. They would like to have annotation and highlighting capability that they could store with the book. They also expressed an interest in having the ability to share annotations on a single text.
16.5 Study of Users
The remainder of this paper discusses some of what we have learned about the users. The first interesting point is a relation among technology, behavior, and attitudes. We expected that the technology, as it grew, would influence the attitudes of scholars, both faculty and students, which in turn would influence their behavior. However, we tracked attitudes carefully over the entire study and saw only the smallest movement towards believing that online books are a better way to do one's scholarly work. This forces us to conclude that, in fact, technology effectively influences behavior and that attitudes simply have to catch up. This may mean that scholars are moved to technology by a subliminal perception of benefits, which they cannot articulate. On the other hand, it may mean that fashions in scholarly behavior are simply no more rational than any other kinds of fashion.
Analysis of individual use
A key innovation in the Columbia online books project was the introduction, in 1997, of the ability to identify the activity of unique users. This was a fortunate byproduct of the security system, developed to permit people to read online books from home. To maintain confidentiality of the users, system analysts replaced the identities of individual users with uninformative labels.
Status | Frequency | Percent |
Undergraduate Student | 2088 | 58.0 |
Other | 607 | 16.9 |
Missing | 328 | 9.1 |
Graduate Student | 295 | 8.2 |
Other Student | 145 | 4.0 |
Faculty | 136 | 3.8 |
TOTAL | 3599 | 100.0 |
With anonymity ensured, we were permitted to link usage to administrative files containing demographic information about the users. Typical results are those shown in Table 16.3, reporting the distribution of the status of individual users at the time they first used a particular resource. The resource in this case was the online version of the Oxford English Dictionary. While we had a number of reference works available online and, by the close of the project, close to 200 books in online form, the total usage of the OED represented approximately 50 percent of all online usage, and so it is used here to illustrate the types of analyses that we performed.
N | Stem | Leaves |
1049.00 | 0 | 00000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111 |
491.00 | 0 | 22222222222222222222222222333333333333333333 |
265.00 | 0 | 444444444444455555555555 |
202.00 | 0 | 666666666667777777 |
156.00 | 0 | 88888889999999 |
140.00 | 1 | 0000000111111 |
92.00 | 1 | 22223333 |
102.00 | 1 | 4444455555 |
86.00 | 1 | 66667777 |
62.00 | 1 | 888999 |
68.00 | 2 | 0000111 |
49.00 | 2 | 22233 |
57.00 | 2 | 44455 |
48.00 | 2 | 6677 |
38.00 | 2 | 899 |
38.00 | 3 | 001 |
41.00 | 3 | 2233 |
29.00 | 3 | 45 |
35.00 | 3 | 667 |
32.00 | 3 | 899 |
34.00 | 4 | 011 |
25.00 | 4 | 23 |
18.00 | 4 | 45 |
20.00 | 4 | 67 |
16.00 | 4 | 89 |
11.00 | 5 | 0& |
There were 3,600 individuals who used the OED during the study period. Just over 2,000 of these were undergraduate students at the time of first use. Nearly 300 were graduate students and close to 140 were faculty members.
We analyzed the ways in which individual users used the resource. To do this we introduced the rule that an inactive period of 15 minutes or more was considered to mark the end of a session. This is a reasonable rule based on detailed analysis, which showed that there was a natural break in the distribution (over all users) of the interval between "clicks" at somewhere around 10-15 minutes. We interpret this as meaning that continuation of a session over a break of this duration will be a rare event, which we can safely ignore. We also studied the total amount of use that individuals made of specific resources. This is illustrated by data on the OED. The mode (that is, most common) number of clicks that an individual user made on the OED is somewhere between 2 and 3. Above that number the number of clicks that a person made on the OED drops exponentially. The rate of the drop is such that the chance to go on to two more clicks is about 2/3 at any time. (The chance to add one more click is the square root of this number, or about 83%.)
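The 15-minute session rule can be sketched as follows. This is a minimal illustration, not the project's actual log-processing code: it assumes that each user's clicks are available as a list of timestamps, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=15)  # threshold stated in the text

def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one user's click times into sessions.
    A pause of `gap` or more ends the current session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # inactivity of 15 minutes or more
    return sessions

# Hypothetical clicks by one anonymized user on a single day.
day = datetime(1999, 3, 1)
clicks = [day.replace(hour=10, minute=m) for m in (0, 5, 25, 30)]
clicks.append(day.replace(hour=11, minute=0))

print([len(s) for s in split_sessions(clicks)])  # [2, 2, 1]
```

A pause of exactly 15 minutes starts a new session here, matching the "15 minutes or more" wording.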
As shown in Figure 16.3, the time spent using the OED online follows an exponential distribution. This indicates that at any time in the course of using the OED an individual has a constant probability of just quitting and deciding never to use it again (roughly 100%-83%=17%).
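The constant-probability reading of these figures can be checked with a short calculation. The sketch treats each click as an independent decision to continue or quit (a geometric model), which is our simplification of the description above, and the function names are hypothetical. Note that the square root of exactly 2/3 is about 0.82. The 83% quoted in the text corresponds to a two-click continuation rate of about 0.69, which is consistent with "about 2/3".

```python
import math

def per_click_continuation(two_click_continuation):
    """If the chance of making two more clicks is q, and each click is an
    independent decision, the chance of making one more is sqrt(q)."""
    return math.sqrt(two_click_continuation)

def still_active_after(n_clicks, p):
    """Fraction of users expected to make at least n further clicks."""
    return p ** n_clicks

def expected_further_clicks(p):
    """Mean number of further clicks under a constant continuation chance p."""
    return p / (1 - p)

p_exact = per_click_continuation(2 / 3)
print(round(p_exact, 3))                         # 0.816
print(round(0.83 ** 2, 2))                       # 0.69
print(round(still_active_after(10, 0.83), 2))
print(round(expected_further_clicks(0.83), 1))
```

Under this model the quit probability at each step is one minus the continuation probability, which is how the text arrives at roughly 17%.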
This apparently exponential behavior is intriguing and we pursued it in another way. Since we could anonymously track individual users, we could plot how much an individual used the resource against how long it was since the first time that the individual used it. With 100% adoption this graph would be roughly linear. We show the actual data for the OED (which had heavy use) in Figure 16.4.
Figure 16.4 is a scatter plot. Each point represents one individual user. The y-coordinate of the point represents the number of sessions that an individual had with the OED and the x-coordinate represents the number of days since that individual first used the OED. The steep line represents the expected usage relationship if adopters continued to use the resource at a steady rate.[3] In fact, a regression analysis shows that the best fit is nearly horizontal, which indicates there is little ongoing use by individuals. It is apparent that many observations are not well-predicted by this model, and indeed, that some usage did persist.
We can plot this data in a more familiar form by showing the distribution of time since first use, without paying attention to how much use there has been. We do so by projecting the preceding figure onto a horizontal axis; see Figure 16.5. We see, as have most researchers in the academic setting before us, that it is very easy to discover the existence of the semester. Each of the five peaks in this graph corresponds to an academic semester. There might be some cause for optimism in the fact that the leftmost peak, which represents the most recent surge in use, spring 1999, seems to rise higher than any of the earlier ones. However, we don't know quite what to make of the fact that the one before it (fall 1998) represents a drop from the preceding fall.
Online Versus Paper: Usage Data
Our data (based on comparison between the online book usage figures and data collected through circulation statistics and slips placed in corresponding reference titles in the library) suggest that online books were used more than their print counterparts. If we count circulation alone we find that there were about three times as many accesses per book online as for the paper version. After consultation with librarians we believe that a reasonable correction for in-house use is to increase circulation by 50%. This would reduce the ratio to twice as many online uses per book (3 / 1.5 = 2).

We conjecture that higher usage for online books is due to lower convenience costs than for other access options. Having purchased a paper copy for the library does not ensure that the book is available. The book might be in circulation, or missing from the shelf. If the library is closed, the paper copy of the book is not available to a user. A common access option is an online public access catalog (OPAC). However, an OPAC does not support even the roughest form of browsing into the book until the book itself is put online. An OPAC provides so little information about a book that a scholar might not be aware that it contains material relevant to his work. If so, the mere ownership of that book by his library does not make it truly available to him. Catalog records enhanced with tables of contents and book indexes are a relatively new offering and a major asset to the scholar in locating books relevant to his or her research, but do not eliminate the higher convenience costs of accessing the physical book at the library.
Hence, the online access to a full book represents a quantum leap in the availability of the contents of that book, and, we believe, lowers the barriers to access for many modalities. Perhaps the only modality for which it is not clear that online access is preferable is "plain old reading at length."
We were also interested in studying patterns of access when readers use online books. We have approached this in two different ways. One is essentially qualitative, in which we asked people in surveys and in interviews how they used online books. In doing that we were able to identify at least the following kinds of activity: browsing, grazing (that is, reading portions of text scattered through the book, punctuated by visits to the index or table of contents), citation checking, the finding of individual facts or quotations, reading on reserve for a course, determining the need for a paper copy, printing (that is, turning the online book into paper), and directly reading online.
We have also, because we can track individual users, been able to break some new ground in quantitative analysis of how people use books online. Generally, each chapter is a separate file, and hence a separate entry in the web server log. Thus, by analyzing the sequence of clicks on chapters, we are able to distinguish a number of different ways in which individuals use online books. The first style we characterize as linear use: an individual reads chapters of a book in exactly the same order in which they appear in the printed volume. The second pattern of use is quasi-linear, in which the sections of the book are visited in some personalized order but each section is read once and only once. We also observe a pattern we call hyper-linear, in which sections are visited in an arbitrary order and some sections are visited more than once. Hyper-linear usage occurs about 12% of the time. See Figure 16.6.
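The three patterns can be expressed as a small classifier over the sequence of chapter files a user requested. This is a sketch under stated assumptions, not the analysis code used in the project: chapters are assumed to be numbered by their position in the printed volume, a reader who skips chapters but keeps the printed order is counted as linear, and the function name is hypothetical.

```python
def classify_reading(visits):
    """Classify one user's sequence of chapter visits.
    `visits` lists chapter numbers in the order they were clicked."""
    if not visits:
        raise ValueError("no chapter visits to classify")
    if len(set(visits)) < len(visits):
        return "hyper-linear"   # at least one chapter visited more than once
    if visits == sorted(visits):
        return "linear"         # each chapter once, in printed order
    return "quasi-linear"       # each chapter once, in a personal order

print(classify_reading([1, 2, 3, 4]))   # linear
print(classify_reading([3, 1, 4, 2]))   # quasi-linear
print(classify_reading([2, 5, 2, 1]))   # hyper-linear
```

Testing for repeats first matters: a sequence with a repeated chapter is hyper-linear regardless of its order.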
There are several ways that a use pattern may involve use of the index (or, more generally, search tools); see Figure 16.7. The first format is to use a search tool once, at the outset, and then to view portions of the book in some linear or quasi-linear order. Another possibility involves using the index, going to a section, and then going back to the index and out to another section and continuing in this pattern. Whether this is a natural behavior evolving in the presence of online books or an artifact introduced by the fact that returning to some index or search tool may be the easiest way to get to the next section is something we don't know at this point. In thinking about these patterns of use, we may compare them to what a person might do with the book in hand, at the library shelf, or with access to the catalog, in some online format.
16.6 Economic Behavior
Economic Behavior of Scholars
Given our original framework, we would like to bring together everything that we have learned, to formulate some economic model about scholars' preferences for modalities of book access. We believe that, for this issue, one key variable is cost, which we characterize simply as low or high. (For the moment let us imagine that this is the purchase price of the book, as far as the scholar is concerned.) We propose that the other key variable is whether the scholar intends to read much or read little. We believe that whether the book is cheap or expensive, if only a little of it is to be read, the scholar will prefer to get it online. Based on data available to us during the span of this project, we believe that if much of the book is to be read, the scholar will prefer to get it in paper form. If the cost is low, the scholar will buy it; and if the cost is high, the scholar would like the library to buy it so that he or she can borrow it.
In short, what we seem to find is that users want online books for convenient access and for assured availability. They also want online books for many of the purposes discussed above. They are particularly attracted by the added functionality of annotating and hyperlinking. Nonetheless, our results indicate that when scholars want to read books at length, they still want them in paper form.
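The two-variable preference model described above amounts to a small decision table. The sketch below simply restates it; the function name and the string labels are hypothetical, and the model itself is the authors' conjecture rather than a fitted result.

```python
def preferred_access(cost, reading):
    """Return the access mode a scholar is conjectured to prefer.
    cost: "low" or "high" (purchase price as seen by the scholar)
    reading: "little" or "much" (how much of the book will be read)"""
    if cost not in ("low", "high") or reading not in ("little", "much"):
        raise ValueError("cost must be low/high; reading must be little/much")
    if reading == "little":
        return "online"                      # regardless of price
    if cost == "low":
        return "buy paper copy"
    return "borrow paper copy from library"

for cost in ("low", "high"):
    for reading in ("little", "much"):
        print(cost, reading, "->", preferred_access(cost, reading))
```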
Economic Perspective of Librarians
Complementary to this analysis of when scholars will prefer online books, our focus group studies with librarians indicate that librarians want online books for high demand books (for example instead of buying a second copy). Librarians also want online books to meet transient demand, rather than having to purchase additional copies which will be unused later. And, of course, librarians want online books for the anticipated cost savings.
On the other hand librarians are concerned about having to pay separately for the online version of a book that they hold in paper. They are concerned about the uncertainty of preservation and migration of digital forms. They also are concerned about the appearance of unwanted and unused material in bundled packages. While bundling in general can increase both consumer and producer benefit (Shapiro and Varian, 1999), librarians are particularly concerned with the flow of cash from the institution to the publishers, and would like to have the finest possible detailed control to optimize the allocation of those funds, by avoiding materials that are less in demand.
Speculations on Marketing Strategies
We have tried to speculate on options for library-oriented strategies for the introduction of online books. For example, one might imagine that online versions are made available for little or no additional cost to purchasers of paper copies. One might hope to see entire collections of online material priced very attractively. On the other end of the bundling spectrum, one might see some kind of on-demand licensing, or on-demand print ordering. netLibrary is offering yet another alternative by mimicking the circulation system for print books. It provides online books to individual libraries or library consortia and allows just one user at a time for each book. Other marketing strategies are more reader-oriented and less tailored to the concerns of a library. These include Questia's effort to build an online book collection the size of a college library (250,000 volumes) and to sell subscriptions to students. Another path is the hand-held device and downloadable book that is now coming to market. Generally speaking, in reader-oriented strategies, pricing of the electronic form will be unrelated to print purchase, as there is little chance that consumers can be persuaded to buy the same "book" twice.
We speculate, but at this point can only ask, whether different strategies will emerge for different classes of print materials such as text books, scholarly books, and narrow interest (sometimes called endangered) scholarly books.
At the end of our study it appears that a number of transitional strategies are available or being developed. The leading one is the dual provision by publishers of publications in print and online. Among other virtues of the strategy, there is the possibility of electronic publication of both a backlist (the books that have been available for over a year) and a front list (the books newly published). Since publishers still need to protect ultimate paper sales, some limits may be placed on the accessibility or functionality of new titles that are presented in front-list form.
16.7 Concluding Unscientific Postscript
Use of online books can be tracked at a micro level, providing valuable information for authors and publishers. In fact, scholarly authors must become concerned about these data since their advancement may depend on being able to document the degree to which their works are used, as well as the degree to which they are cited.
Having studied the provision and usage of online books for four years, we feel emboldened to make a few predictions. Due to cost, complex functionality will be reserved for books that have large sales or are developed in subsidized projects. We anticipate that endangered monographs will be available from academic or society servers, from sites like the Los Alamos Preprint site, or from the individual authors themselves. In other words, they won't be "published" as we understand it today. Many books will appear in both electronic and print versions. Commercial enterprises or academic organizations, and not library experiments, will define the product that eventually comes to dominate.
Notes
† This research has been supported by the Andrew W. Mellon Foundation and Columbia University. The views expressed herein are not necessarily those of that Foundation or the University. The first author acknowledges support from Columbia University through a contract with Tantalus Inc., SCILS, Rutgers University, the Fulbright Foundation, and Ragnar Nordlie and the Journalism, Library and Information Science Department of the Oslo University College, Norway. At Columbia the authors are indebted to many individuals in the Libraries, in Academic Information Systems, and in the academic departments for their participation, encouragement, and cooperation. Elaine Sloan, University Librarian, was critical to the formulation of the project and an insightful supporter. Walter Bourne, David Millman and Gordon Dahlquist were particularly important to the process of creating the online books and various online questionnaires. Lynn Jacobsen Rohrs was a key project participant as the analyst of the web server data. Kate Wittenberg of Columbia University Press, Leo Balk of Garland Press, and Ursula Bollini of Oxford University Press provided books from their presses and shared their insights into the publishing business and critical issues for our research. The authors thank the editors for careful revision of the manuscript.
1. As we will hear during this conference, publishers are working hard to reduce those prices to make online books very competitive.
2. The reader may visit the project web site: to review other studies and reports.
3. This is a qualitative relationship: our prediction is merely that the relationship would be linear and rising, but the slope in the figure is arbitrary.
17. Revitalizing Older Published Literature: Preliminary Lessons from the Use of JSTOR[†]
Perhaps it would be best to begin this chapter by stating explicitly what it is not. This chapter does not present a scientific study. It does not purport to present evidence that will lead the reader to a carefully argued conclusion. Rather, it is an attempt to highlight some of the questions that usage of the JSTOR database is enabling us to ask and to begin to assess whether there are answers that will prove interesting or valuable to the scholarly community. At this stage, and with the relatively small amount of data and minimal degree of analysis that has been conducted, this report should be regarded as highly preliminary.[1]
JSTOR began as a research project sponsored by The Andrew W. Mellon Foundation at the University of Michigan. Its original objective was to test whether the digitized versions of older research journals might serve as a substitute for the paper versions, thereby offering libraries the possibility of long-term savings in shelving and archiving costs while simultaneously improving their usability. A pilot database was created that included the back runs of ten journals — five in history and five in economics — and access was made available at five liberal arts colleges and the University of Michigan.[2] By the summer of 1995, it was apparent that the concept held great promise, and JSTOR was established as an independent not-for-profit organization. JSTOR was founded to carry on the original objective stated above, but with the added charge that it develop an economic model that would allow it to become self-sustaining.
The JSTOR Phase I database now includes the backfiles[3] of 117 journal titles (see Table 17.1) from 15 academic disciplines, a collection numbering nearly 5,000,000 pages. As of March 2000, more than 650 academic institutions from 30 countries were participants in this collaborative enterprise, with approximately 100 colleges and universities having had access to the database since early 1997. The amount of usage of the resource and its growth rate have been surprising. In 1999, over 1.4 million articles were printed from the JSTOR database, over 4 million searches were performed, and users accessed the database more than 17 million times.[4]
Figure 17.1 illustrates the growth in the total number of accesses since the database was first made available.
When JSTOR was established, many people questioned the wisdom of converting journal backfiles. With comparatively little use of these materials in paper form, one could not help but wonder whether there would be sufficient interest in gaining access to the resource to warrant the substantial investments that would have to be made to create it. It is clear that it would not have been possible even to conceive of pursuing a project like JSTOR without the interest of the Mellon Foundation. Through its grant-making, the Foundation provided the financial resources necessary to establish the technological infrastructure required to create the database. Perhaps more importantly, however, the Mellon Foundation contributed staff time, most notably that of its President, William G. Bowen, to launch the enterprise.
The investments of the Mellon Foundation have made it possible for JSTOR to pursue and begin to fulfill its important not-for-profit mission, one component of which is to enhance the accessibility of little-used and inconvenient-to-retrieve journal literature. Another primary component of JSTOR's mission is to act as a trusted archive for the material under its care. This part of JSTOR's mission is reflected in the number of articles in the database that are not being heavily used today, but which may someday be a critical component of a new line of argument for an important paper or research article.
Early analysis of JSTOR's usage data allows us to begin to ask questions about how scholars and students use older literature in electronic form. Do scholars and students make use of the older articles? Are the materials being used more now than they were in paper format only? Can these data provide guidance about what material should be digitized? Does the usefulness of the older literature vary by academic discipline? These are some of the questions that we hope JSTOR will answer over the long run.
17.1 Comparing JSTOR Use to the Usage of the Journals in Paper Format
As part of the original JSTOR pilot project, an effort was made to collect circulation and usage information for the ten pilot journals. The hope was that the data would serve as a benchmark for comparison purposes. Unfortunately, it was not easy to collect reliable data. Since many of the journals were available in open stacks, it was not possible to obtain accurate circulation figures (although some circulation data were obtained from the University Reserves office at the University of Michigan Library). Instead of regular circulation data, two counting methods were employed to obtain information about use of these journals. First, slips of paper were placed in each journal volume with a request that users mark the slip each time they used the volume. Signs were also placed in the area of the journals to inform users of the survey being conducted. Second, staff at the library pilot sites were instructed to check the shelves each business day for several months and make note of which volumes were not on the shelves. The volumes not on the shelves were counted as having been used.
Also, only the journal volumes housed on the main library shelves at the participating pilot libraries were included in this work. Usage of the paper volumes in faculty offices or in departmental libraries was not captured. Because of the lack of a controlled environment and the relatively narrow scope of this study, one must be careful about conclusions drawn when comparing these data to site license access to JSTOR at the institutions.
It does appear, however, that the electronic articles in JSTOR are being used much more frequently than they were used in paper form. The paper usage data were collected over varying lengths of time at the five institutions that returned data, but at least three months of information was collected. There were a total of 692 uses of the ten journals at the five test sites over the course of the entire survey period. Usage of the same journals in JSTOR at the same five sites for the months of September, October and November of 1999 yields a total of 7,696 article views. In addition, although there is presumably substantial overlap between the articles viewed and those printed, 4,885 articles were printed, for a total of 12,581 views and prints during the three-month time period. When compared to the 692 uses in the benchmarking survey, it would seem that the convenience of having electronic access is facilitating greatly increased use of the material.
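The arithmetic behind this comparison can be laid out in a short Python sketch. It uses only the counts quoted above; the function name `usage_ratio` is ours, chosen for illustration. Because the paper and electronic counts were gathered by different methods over different periods, the resulting ratios are rough indications rather than controlled measurements.

```python
# Counts quoted in the text above.
paper_uses = 692        # uses recorded in the paper benchmarking survey
jstor_views = 7_696     # article views, September-November 1999
jstor_prints = 4_885    # articles printed in the same three months


def usage_ratio(electronic: int, paper: int) -> float:
    """Return how many times larger the electronic count is than the paper count."""
    return electronic / paper


views_and_prints = jstor_views + jstor_prints

print(f"views + prints: {views_and_prints:,}")
print(f"views vs. paper uses: {usage_ratio(jstor_views, paper_uses):.1f}x")
print(f"views + prints vs. paper uses: {usage_ratio(views_and_prints, paper_uses):.1f}x")
```

Counting views alone gives a ratio of roughly 11 to 1; adding prints, which overlap with views to an unknown degree, raises it to roughly 18 to 1.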
Another way to assess whether usage of the older journals in electronic form is greater than in paper is by evaluating the growth in usage. As Andrew Odlyzko points out in Chapter 2, growth rates may matter more than absolute numbers. It is rather unlikely that the usage of older articles in paper form was growing at measurable rates. That contrasts markedly with usage of JSTOR (as well as other resources discussed in this book). Aggregate use of the JSTOR database has grown dramatically since 1997, when it first became available. Table 17.2 below shows the total accesses to the database by institution type.[5] Total accesses to all content in the database increased by a factor of 4.4 from 1997 to 1998 and by a factor of about 3 from 1998 to 1999.
JSTOR Class | Accesses 1997 | Accesses 1998 | 1997-1998 Growth Factor | Accesses 1999 | 1998-1999 Growth Factor |
Very Large | 817,893 | 3,291,648 | 4.0 | 8,550,945 | 2.6 |
Large | 160,700 | 785,244 | 4.9 | 2,766,100 | 3.5 |
Medium | 110,254 | 637,950 | 5.8 | 2,468,666 | 3.9 |
Small | 110,312 | 490,854 | 4.4 | 1,323,894 | 2.7 |
Very Small | 43,754 | 207,170 | 4.7 | 73,823 | 3.4 |
Totals | 1,242,913 | 5,412,846 | 4.4 | 15,814,475 | 2.9 |
Because some of the growth in aggregate usage of JSTOR is a result of new institutions signing up for the database during this time period, we have compiled usage figures at institutions that had JSTOR installed prior to April 1, 1997. Aggregate accesses at these institutions increased by a factor of 3.4 from 1997 to 1998 and by a factor of 2.5 from 1998 to 1999. The cumulative growth of usage over the three-year time period at existing sites is 740%!
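For readers who want to retrace the growth figures, the following Python sketch recomputes the year-over-year factors from the Totals row of Table 17.2 and compounds the two rounded factors quoted for existing sites. The helper name `growth_factor` is ours. The unrounded figures for existing sites are not given here, so the compounded result is only an approximation of the cumulative figure quoted above.

```python
import math


def growth_factor(earlier: float, later: float) -> float:
    """Return the ratio of a later count to an earlier count."""
    return later / earlier


# Totals row of Table 17.2 (accesses per calendar year).
total_accesses = {1997: 1_242_913, 1998: 5_412_846, 1999: 15_814_475}

factor_1997_1998 = growth_factor(total_accesses[1997], total_accesses[1998])
factor_1998_1999 = growth_factor(total_accesses[1998], total_accesses[1999])
print(f"all sites, 1997-1998: {factor_1997_1998:.1f}x")
print(f"all sites, 1998-1999: {factor_1998_1999:.1f}x")

# Existing sites: only the rounded factors quoted in the text are available.
existing_site_factors = [3.4, 2.5]
cumulative_factor = math.prod(existing_site_factors)   # factors compound by multiplication
percent_increase = (cumulative_factor - 1) * 100       # an N-fold rise is an (N - 1) * 100% increase
print(f"existing sites, cumulative: {cumulative_factor:.1f}x "
      f"(about {percent_increase:.0f}% increase)")
```

Compounding the rounded factors gives about an 8.5-fold rise, or roughly a 750% increase, which is in line with the 740% reported; the small gap presumably reflects rounding in the published factors.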
As one contemplates this impressive growth in JSTOR usage, it is perhaps valuable to note that JSTOR is available "for free" to end users. Libraries have paid participation site license fees that allow authorized users (faculty, staff, and students) to make unlimited use of the resource. For the most part, authentication is handled by IP address, thereby making the authentication process virtually invisible. This unfettered access contributes to the rapid growth in use of the resource; it is consistent with the kind of growth one is seeing in other resources available on the World Wide Web. This picture might be very different indeed if JSTOR were charging either users or libraries based on usage.
17.2 The Interdisciplinary Appeal of JSTOR
An additional variable that is likely to be a contributing factor to the increasing use of JSTOR is the addition of new content. Since 1997 JSTOR has been digitizing new journals and making them available to participating institutions. Content in new academic disciplines introduces new scholars and students to the resource. Additional content in existing fields broadens the appeal of the resource within that discipline.
As the resource has grown, it is evident that the cross-title and interdisciplinary appeal of the resource has grown as well. Pulling from the search logs of a recent week of JSTOR use reveals that approximately 68,000 searches were conducted. Of these, just under 62,000 (91%) specified more than one title. Because JSTOR offers the option to search by cluster (pre-defined discipline-specific collections[6]), it is convenient for users to search across journals in a single discipline. Approximately 58,000 searches specified clusters. Of those cluster searches, 69% specified more than one cluster. This is quite significant because the JSTOR interface does not offer an option to select all clusters. Judging from this behavior, the ability to search across disciplines is important to users.
17.3 Nature and Distribution of Use
There are a total of 831,087 articles in the JSTOR database. Our use of the term "article" may be a bit misleading in that it refers to all items that are indexed as an item for retrieval. Full-length articles are a sub-set of this total, of which there are presently 356,978. Other "articles" are items like book reviews, letters to the editor, membership lists, and the like.
The distribution of the use of JSTOR is interesting because it speaks to the extent to which JSTOR functions as an archive. Many libraries, particularly research and academic libraries, have a mission to collect not only material that is likely to be used today, but also to collect and care for information that may be valuable in the future. JSTOR has surprised us in the extent to which it has been used, but there is something to be learned also from what has not been used.
After three years, 430,429 different articles have been viewed, representing 51.8% of all articles in the database. (Many of these articles have been viewed multiple times; the figure above relates to whether the article has ever been viewed.) 248,683 articles have been printed, representing 29.9% of all articles.
The complement to the statement above is that nearly half of the articles in the JSTOR database have never been viewed or printed. Will they ever be used? We do not know. Further, we find the distribution of use among the articles to be rather concentrated. Figure 17.2 presents the number of article views accounted for by the top n articles. For example, the top 100 articles viewed represent 112,072 views, or 2% of the total article views. The top 10,000 most viewed articles were viewed 1,987,982 times, or 36% of the total. And the top 100,000 most viewed articles were viewed 4,613,610 times, or 82% of the total. This last figure means that 12% of the articles accounted for 82% of the views. This high concentration may be somewhat misleading because our count of total "articles", as mentioned before, includes all items in the database, such as reviews, front matter, and back matter, not just full-length articles. Since it is natural that many of these items may never be viewed or cited, but are included in JSTOR to present the complete and comprehensive digital version of the originally published journal, this level of concentration probably should be expected. In any event, it is not a concern to JSTOR, since its mission is to serve as an archive and not to base its decisions on preservation of content on the amount of use of the various articles contained in the database.
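The concentration measure used here, the share of all views accounted for by the n most-viewed items, can be sketched in a few lines of Python. The function name `top_n_share` and the sample view counts are hypothetical and serve only to illustrate the calculation; they are not JSTOR data. The final lines check the "12% of the articles" figure against the article total quoted earlier.

```python
def top_n_share(view_counts, n):
    """Return the fraction of all views accounted for by the n most-viewed items."""
    ranked = sorted(view_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0  # nothing has been viewed, so no share can be attributed
    return sum(ranked[:n]) / total


# Hypothetical per-item view counts; zeros are items never viewed.
sample_views = [50, 30, 10, 5, 3, 2, 0, 0, 0, 0]
print(f"top 2 of {len(sample_views)} items: {top_n_share(sample_views, 2):.0%} of views")

# Figures quoted in the text: 100,000 items out of 831,087 in the database.
total_items = 831_087
print(f"100,000 items are {100_000 / total_items:.0%} of the database")
```

Applied to a full list of per-item view counts, the same function would yield the points plotted in a cumulative-use chart of the kind described above.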
17.4 Selection Criteria
Since it is generally accepted that it will not be possible to digitize all journals that have ever been published, an important question for any digitization project is how to select the retrospective content to be made available electronically. In JSTOR a variety of factors are taken into consideration in the selection process, including surveys of faculty and library professionals in the field in question, library subscription levels, citation impact factor measures, and length of the run, among other things.
Looking at JSTOR usage at the article level, it is evident that citations should not be used as the sole factor in determining what content should be digitized. To test whether citation frequency correlates with database usage, we conducted a preliminary analysis of the use of particular articles in JSTOR. First, we identified the top ten most frequently used articles for each of the 117 journals in the database. We then looked up their citation data using ISI Social Science Citations. What we found was that usage and citation data were not correlated. For the purpose of illustrating the point, Table 17.3 displays an abbreviated version of the data we collected. Shown below are the top three articles in terms of JSTOR use since 1997 (through March 20, 2000) for three Economics titles. The number of citations to each article in the period from 1997 to 1999 is displayed,[7] as is the average number of citations to each article per year for the period from 1972 through 1999.
Journal Title / Article | Times Cited, 1997-1999 | Average Cites/Year, 1972-1999 | JSTOR Views | Year of Publication |
American Economic Review | ||||
Article 1 | 79 | 24.1 | 1,670 | 1968 |
Article 2 | 77 | 15.7 | 1,232 | 1945 |
Article 3 | 181 | 35.9 | 1,316 | 1981 |
Quarterly Journal of Economics | ||||
Article 1 | 175 | 32.4 | 2,426 | 1970 |
Article 2 | 104 | 26.6 | 2,400 | 1992 |
Article 3 | 216 | 50.9 | 1,583 | 1991 |
Journal of Political Economy | ||||
Article 1 | 4 | 0.5 | 1,895 | 1973 |
Article 2 | 8 | 21.1 | 1,480 | 1990 |
Article 3 | 93 | 17.2 | 1,258 | 1983 |
Citations do not appear to provide anything like a complete picture of the potential usefulness of a journal article. The most notable example of this point is the number one article for the Journal of Political Economy. Even though this 1973 article has rarely been cited (4 times between 1997 and 1999, and an average of only 0.5 times per year between 1972 and 1999), it has emerged as the most often-used article from that journal. This article has been viewed 1,895 times and printed 1,402 times during the period that it has been accessible in JSTOR. What this example reveals is not only that citation data may not be the most useful measure for determining what should be digitized, but also that citations focus on what might be called the "reference" or "documentation" value of an article, not its usefulness defined more broadly. Articles with four citations may end up, for a variety of reasons, being the most used. Alternatively, highly cited articles may not be used very often at all. This is a factor to keep in mind when selecting content for digitization initiatives.
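One simple way to examine the relationship between citations and use is a rank correlation. The Python sketch below computes Spearman's rho for the nine articles in Table 17.3, pairing citations in 1997-1999 with JSTOR views. This is an illustration on the abbreviated table only, not a reconstruction of the analysis described above; the choice of Spearman's rho is ours, and the nine articles were selected because they are heavily used, so the result should not be generalized.

```python
# (citations 1997-1999, JSTOR views) for the nine articles in Table 17.3.
table_17_3 = [
    (79, 1670), (77, 1232), (181, 1316),    # American Economic Review
    (175, 2426), (104, 2400), (216, 1583),  # Quarterly Journal of Economics
    (4, 1895), (8, 1480), (93, 1258),       # Journal of Political Economy
]


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


def spearman_rho(pairs):
    """Spearman rank correlation via the sum of squared rank differences (no ties)."""
    xs, ys = zip(*pairs)
    n = len(pairs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(table_17_3)
print(f"Spearman rho = {rho:.2f}")
```

For these nine articles rho comes out at about 0.12, a weak association that is consistent with the observation that citation counts say little about how heavily an article is used.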
17.5 Age of Useful Articles
Table 17.4 shows calculated summary data for the most frequently used articles in each of the 15 JSTOR clusters. The purpose of this assessment was to take an initial snapshot of the relative value of older literature in each of our JSTOR fields. The chart was assembled by first collecting the number of article views from the JSTOR database, ranking the articles in order from most-often viewed to least viewed, and as in the case of the analysis above, pulling out the ten most frequently used articles. We know the year of publication for each article, so we were able to calculate the average age of the top ten articles for each title. We then averaged these data across each discipline to provide an estimate of the average age of the most-used articles in each field. When evaluated in this way, it was apparent that some older articles have truly lasting value, that in most of the JSTOR fields, older articles were well-represented among the "top ten", and that the value of older material seems to vary with the discipline.
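The two-step averaging described above can be sketched in Python as follows. The function names, the sample journals, and the reference year used to convert a publication year into an age are all assumptions made for illustration; the text does not state the year from which ages were measured.

```python
from statistics import mean


def average_age_of_top_n(articles, reference_year, n=10):
    """Average age of the n most-viewed articles of one journal title.

    articles: list of (publication_year, views) tuples.
    reference_year: year from which age is measured (an assumption).
    """
    top = sorted(articles, key=lambda article: article[1], reverse=True)[:n]
    return mean(reference_year - year for year, _views in top)


def cluster_average_age(titles, reference_year, n=10):
    """Average the per-title figures across all titles in one cluster."""
    return mean(
        average_age_of_top_n(articles, reference_year, n)
        for articles in titles.values()
    )


# Hypothetical cluster with two titles and three articles each.
sample_cluster = {
    "Journal A": [(1990, 500), (1980, 300), (1999, 50)],
    "Journal B": [(1970, 400), (1995, 350), (1960, 10)],
}
print(cluster_average_age(sample_cluster, reference_year=2000, n=2))
```

Averaging per title first, and then across titles, gives every journal equal weight in the cluster figure regardless of how heavily it is used; pooling all articles in a cluster before ranking would instead let the most-used titles dominate.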
Again, to use the field of economics as an example, a surprising number of older articles have emerged as the most heavily used. The average age of the articles in the top ten most printed and viewed articles in the economics cluster is 13 years. This is rather surprising, as our expectation before starting JSTOR would have been that usage of economics journals would be much more focused on more recent issues.
Cluster | Number of Titles | Num. of Views from Top 10 | Share of Top 10 Views | Avg. First Year of Publication | Avg. Most Recent JSTOR Year | Avg. Age in Years of Top 10 Articles |
African American Studies | 7 | 16,637 | 4% | 1959 | 1996 | 3 |
Anthropology | 6 | 12,301 | 3% | 1954 | 1994 | 4 |
Asian Studies | 4 | 5,433 | 1% | 1936 | 1994 | 11 |
Ecology | 6 | 19,293 | 5% | 1943 | 1996 | 11 |
Economics | 13 | 87,711 | 22% | 1936 | 1994 | 13 |
Education | 4 | 13,153 | 3% | 1946 | 1995 | 11 |
Finance | 5 | 13,201 | 3% | 1958 | 1995 | 10 |
History | 15 | 58,365 | 15% | 1934 | 1995 | 12 |
Literature | 11 | 23,992 | 6% | 1946 | 1995 | 7 |
Mathematics | 11 | 7,344 | 2% | 1932 | 1994 | 32 |
Philosophy | 10 | 16,538 | 4% | 1931 | 1994 | 16 |
Political Science | 9 | 52,201 | 13% | 1933 | 1995 | 8 |
Population/Demography | 8 | 15,808 | 4% | 1965 | 1995 | 5 |
Sociology | 9 | 41,387 | 11% | 1945 | 1994 | 6 |
Statistics | 11 | 8,480 | 2% | 1936 | 1994 | 9 |
An even more dramatic example is Mathematics, where the average age of the most used articles in the field is 32 years! This result is consistent with what mathematicians have told us about their field; that is, that older mathematics literature remains valuable. (Mathematicians are some of the most enthusiastic supporters of JSTOR and regularly urge us to include more mathematics titles.) However, it is worth pointing out that usage of the mathematics cluster in JSTOR has lagged behind usage in other fields. With the long runs of its 11 journals, mathematics has the highest number of pages of any cluster in JSTOR, and yet usage of the mathematics cluster represents just 3.3% of total usage. One reason for making this point here is that there simply are not enough data to make too much of the average age of the most-used articles in mathematics. With a small number of total accesses for the field, the actions of a few people can sway the data significantly. As mentioned earlier, one has to be careful about drawing conclusions from the data.
Nevertheless, the apparent contradiction between the qualitative value of JSTOR to mathematicians and the usage of the mathematics journals in JSTOR dramatically illustrates an extremely important point. One must define clearly what one means by "value". Usage does not necessarily equate to value in the research sense. Older articles may be absolutely vital to the continuation of high-quality scholarship and research in the field, but that may not lead to extensive use. Increasingly, one hears that libraries are planning to use electronic usage data to help make subscription decisions. If relied upon exclusively, this could prove to be a very dangerous tool, making it more difficult for lesser-used but valuable research journals to survive. Other measures, like citation data, need to be incorporated as well. The nature of these data will also change with the availability of electronic resources. One wonders, for example, if the number of citations to older articles in JSTOR will increase as the older articles become more conveniently accessible. This possibility is worth monitoring, but with the understanding that it will take years before changes in scholars' behavior manifest themselves in the citation data. Understanding the nature of a field and the way that research materials are used in the field is essential before making selection and cancellation decisions. It is our hope that, over the long run, JSTOR can contribute to this kind of understanding.
17.6 Conclusion
This paper provides a brief overview of preliminary information emerging from JSTOR usage data. As JSTOR usage increases, more interesting questions about the way that retrospective electronic collections are used can and should be asked and investigated. Although it is still too early to draw conclusions, and much more data will need to be collected, evidence points to preliminary hypotheses in five primary areas.
Electronic access seems to have increased the use of older materials at JSTOR participating sites.
The interdisciplinary nature of JSTOR seems to be valued by researchers and students.
Citation data alone are not a good predictor of electronic usage, and probably should not be used to make digitization decisions for retrospective content.
Older literature seems to remain valuable in many fields.
Care should be taken to ensure that there is a clear understanding of the definition of "value" for research articles. Judging by the nature of the JSTOR articles that are most used, valuable research articles are not always those that push forward the research and intellectual understanding of an academic discipline; they may very well be "popular" articles used in larger classes. "Value" needs to be clearly defined as libraries consider acquisition and cancellation decisions for electronic content.
What this preliminary evaluation of JSTOR usage does indicate is that electronic databases are leading us into new territory. Their availability impacts the use of scholarly resources in profound ways. It should come as no surprise that improving the convenience of access to an article increases the likelihood and frequency of use of that article. But does that impact the inherent value of the article? In evaluating usage of these materials, we will have to take a long view, as we cannot rely on old metrics, methods, and intuition to guide our sense of value. It will take time before we reach a new level of understanding — a kind of new equilibrium — of the relevant measures that will enable us to make useful comparisons between and among various resources.
Notes
† This paper was first presented at the Pricing Electronic Access to Knowledge (PEAK) conference entitled "Economics and Usage of Digital Library Collections," held at the University of Michigan in Ann Arbor, Michigan on March 23-24, 2000.
1. Participating JSTOR libraries and publishers have requested that JSTOR exercise care when presenting and distributing JSTOR usage data. We therefore aggregate these data whenever possible and do not identify the usage at individual sites, for individual publishers, or for individual articles by title.
2. The original test site libraries were Bryn Mawr College, Swarthmore College, Haverford College, Denison University, and Williams College, in addition to the University of Michigan.
3. JSTOR digitizes and makes accessible participating journals starting with the first volume published. In order to protect publishers' subscription revenue stream, JSTOR does not include current issues, but offers access up to a "moving wall" negotiated with each publisher.
4. In assembling usage statistics, JSTOR counts significant accesses, not server hits. Accesses include actions such as viewing electronic tables of contents or citation data, viewing an article, printing an article and executing a search.
5. JSTOR uses the Carnegie Classes of U.S. Institutions of Higher Education to place colleges and universities into one of five classes ranging from Very Large to Very Small. For a description of our methodology, see http://www.jstor.org/about/us.html#classification .
6. JSTOR's initial database offering, called Arts & Sciences I, includes journals from 15 academic disciplines. The journals are displayed in the interface in these disciplinary groups, which are sometimes referred to as clusters.
7. Citation data were determined using the Dialog service to access ISI Social Science Citations. Individual articles were located, and citations to these articles were analyzed by year of publication. Data were determined for 1997-1999 inclusive by simple addition. The average number of cites per year was figured by totaling citations from the year of publication or 1972, whichever is later, through the end of 1999, and then dividing that total by the number of years since publication or 1972, whichever is later. 1972 is the earliest year of data provided by ISI in the version of the database we consulted.
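The calculation described in note 7 can be sketched in Python as follows. The function names and the sample citation counts are hypothetical. Counting the starting year and 1999 inclusively is our assumption, since the note does not say exactly how the number of years was counted.

```python
FIRST_ISI_YEAR = 1972  # earliest year of citation data available, per note 7
LAST_YEAR = 1999       # end of the period analyzed


def cites_1997_1999(cites_by_year):
    """Total citations for 1997-1999 inclusive, by simple addition."""
    return sum(cites_by_year.get(year, 0) for year in (1997, 1998, 1999))


def average_cites_per_year(publication_year, cites_by_year):
    """Average citations per year from publication (or 1972, if later) through 1999.

    cites_by_year: dict mapping a year to the citations received in that year.
    The span is counted inclusively at both ends (an assumption).
    """
    start = max(publication_year, FIRST_ISI_YEAR)
    years = range(start, LAST_YEAR + 1)
    total = sum(cites_by_year.get(year, 0) for year in years)
    return total / len(years)


# Hypothetical article published in 1995.
sample_cites = {1995: 0, 1996: 2, 1997: 3, 1998: 5, 1999: 10}
print(cites_1997_1999(sample_cites))
print(average_cites_per_year(1995, sample_cites))
```

For an article published before 1972, the average is taken over 1972-1999 only, so citations received in earlier years are not reflected in it.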
18. Measuring the Impact of an Electronic Journal Collection on Library Costs: A Framework and Preliminary Observations[†]
18.1 Introduction
Much has been written about the economic impact of electronic publishing on publishers. There also has been considerable discussion of the cost of subscribing to electronic publications. This paper addresses another important organizational impact triggered by the migration to electronic journals that has heretofore received little attention in the literature: the changes in the library's operational costs associated with shifts in staffing, resources, materials, space and equipment.
In 1998 the W.W. Hagerty Library of Drexel University made migration to an electronic journal collection as quickly as possible a key component of its strategic plan. If a journal is available electronically, only the electronic version is purchased whenever possible.[1] The sole exceptions are (1) when the electronic journal lacks an important feature of the print version (e.g., equivalent visuals) and (2) when the journal is part of the browsing collection (e.g., Scientific American and Newsweek). With the year 2000 renewals, Drexel's journal collection consisted of 800 print-only subscriptions and 5,000 electronic journals; in 2001 the library will subscribe to about 300 print-only journals and over 6,000 electronic journals. A dramatic change in staff workload is the most immediate impact on library operations, but space, equipment, and even supply needs are affected. Some aspects of this transformation were obvious and predictable; others were not. This paper describes the changes experienced so far in the Drexel Library.
A common assumption is that converting library journals to digital format will ultimately improve library service and lower costs, but this is yet to be proven. Understanding the total costs associated with the library model for delivering digital information has now become a requirement for library survival since in the digital world, as opposed to print, the library has many viable competitors. Our goal is to develop a framework for assessing the shifts in personnel and costs that can be used for planning and budgeting at Drexel and provide guidance to other academic libraries who are not yet so far down this path.
18.2 Background
Commonly, journal cost analyses use subscription costs and ignore the operational costs associated with a journal collection. For example, White and Crawford (1998) undertook a cost-benefit analysis to determine whether acquiring Business Periodicals Online (BPO), a full-text database, was more cost-effective for supplying articles than obtaining articles in the database through interlibrary loan (ILL). They found that the out-of-pocket costs (ILL transaction costs versus the BPO subscription costs) were similar, but the level of service was much greater with BPO.
Hawbaker and Wagner (1996) also compute only subscription costs when comparing the costs of print subscriptions to online access of full-text. They conclude that, for a full-text business database, the University of the Pacific's library can offer more than twice as many journals for a 15 percent increase in expenditures.
The Electronic Libraries Programme (eLib, 2001) funded by the Joint Information Systems Committee (JISC) in the United Kingdom, consists of a series of major library projects investigating issues of digital library implementation and integration. The guidelines for evaluation of the eLib projects call for "modelling of functional, cost, organisational and technical variables" as one of the desired components (Kelleher et al., 1996). Pricing models for electronic journal subscriptions, licensing agreements, and infrastructure requirements to provide access are themes that are explored in these projects. Each project tends to be fairly focused in terms of the range of digital content and services offered or the topic addressed. Many deal with organizational and management models that can serve as the basis for scaling up initiatives.
JSTOR (1998), which builds journal backfiles, does address building-related costs. One of the JSTOR objectives is "To reduce long-term capital and operating costs of libraries associated with the storage and care of journal collections." By guaranteeing online availability of backfiles, JSTOR not only makes these files more accessible, but also allows libraries to discard old journal runs without decreasing service to their users.
In an analysis completed several years ago Tenopir and King (2000) report the average cost of acquiring and maintaining a print journal collection to be $71 per title in academic libraries and $81 in special libraries. These are 1998 figures adjusted for inflation.
Odlyzko (1999) also focuses on non-subscription costs. He points out additional factors to consider in evaluating the impact of journal growth on libraries:
Journal subscription costs are only one part of the scholarly information system...internal operating costs of research libraries are at least twice as high as their acquisition budgets. Thus for every article that brings in $4,000 in revenues to publishers, libraries in aggregate spend at least $8,000 on ordering, cataloging, shelving, and checking out material, as well as on reference help. The scholarly journal crisis is really a library cost crisis. If publishers suddenly started to give away their print material for free, the growth of the literature would in a few years bring us back to a crisis situation.
Odlyzko's analysis shows that the library's non-subscription (i.e., operational) costs are on average double the subscription costs. His figures are derived from the Association of Research Libraries (ARL) statistics (Association of Research Libraries, 2000b). His is a macro level measurement that does not take into account, for example, the different processing costs for books and journals or library costs unrelated to the collections which might cause the non-subscription figure to be over-estimated. On the other hand, ARL statistics do not report the considerable costs associated with constructing and maintaining library buildings, a factor which if added to Odlyzko's number would lead to a higher estimate of non-subscription costs. Even if off by a factor of two, Odlyzko's estimate is astounding to consider, and points out the importance of looking at how these operational costs shift in the transition to an electronic model.
18.3 Development of Drexel's Electronic Journal Collection
In the spring of 1998 only one full-text journal collection was accessible via the Drexel Library's web site, and database access was limited to text-based systems. That summer the web site was completely re-designed and by the fall more than 20 databases and several collections of full-text journals were available. The total number of print journals was 1,700 titles at the time. For 1999 and 2000 the number of print-only journal subscriptions was reduced to 1,100 and about 800 respectively. Some of the reductions were made because we had subscribed to an electronic counterpart; the other journals were not renewed primarily on the basis of low use. During the fall of 1998 through 1999, and into 2000, electronic subscriptions were sought out aggressively and added as they became available, bringing the fall 2000 total to over 6,000 unique electronic titles. Print-only renewals for 2001 were about 300.
The selection/ordering/acquiring process is far more complex for electronic journals than for print journals. The methods for purchase often include buying packages of titles or services, many with value-added features. Reviewing and negotiating proper terms for e-journal licenses is a major aspect of the complexity. Additionally, new variables must be considered (e.g., graphics, linking options, comparability with the corresponding print publication, web interface functionality, and other value-added features).
When the publisher's policy is to require purchase of the print journal in order to obtain access to the electronic journal, we attempt to negotiate a discount for the e-journal only. This has met with limited success so far, but does have the advantage of educating publishers about our needs. Because of the added cost of receiving, processing, binding and storing the print issues, we do not retain print journals even if we have paid for them. Some of these journals are never shipped by the publishers; others are retained by our serials vendor for their back issue file; and still others arrive and are given to our back issue jobber.
Drexel's approach to back files of print journals will seem cavalier, if not totally irresponsible, to those concerned with the archival role of libraries. Our position is that archival storage in most subject areas is not part of the mission of the Drexel Library.[2] On a national, even international, basis, archiving of old, little-used journals would be much more cost-effective if done centrally or in a few places for redundancy. This is true of both electronic and print formats. We are willing to make the leap of faith that this will happen, and are ready to pay the cost of access to the archived materials when they are needed. There are numerous well-qualified national and international organizations addressing this issue, including the Research Libraries Group and OCLC.
18.4 Impact on Library Staffing and Other Costs
Here I discuss changes in each area of the library's operations with particular attention to the changes in staffing patterns and shifts in costs. Table 18.1 summarizes these operational effects. No functional area of the library has been left untouched. I describe the changes in each library department below.
Function | Activity | Electronic Format | Print Format | Net Impact |
Infrastructure Systems & Space | campus network | completely upgraded in last 2 years | — | ↑ increased capital costs |
Infrastructure Systems & Space | computer hardware (servers and workstations) | 100% replacement/upgrade of library computers | — | ↑ increased equipment costs |
Infrastructure Systems & Space | computer systems maintenance | installing software, imaging (1.0 FTE) | — | ↑ increased staffing |
Infrastructure Systems & Space | hardware, software maintenance | service contracts | — | ↑ increased costs |
Infrastructure Systems & Space | setting up access | new activity, requires troubleshooting | — | ↑ increased staffing |
Infrastructure Systems & Space | software purchase & development | new activity to manage more complex process | — | ↑ increased staffing |
Infrastructure Systems & Space | printing | increased activity | — | ↑ increased costs and revenue |
Infrastructure Systems & Space | space utilization | content stored remotely | fewer items added/extensive collection weeding | ↓ reduced space needs |
Administration/Management | negotiating contracts | new activity | — | ↑ increased staffing |
Administration/Management | managing the change | closer oversight required | — | ↑ increased staffing |
Administration/Management | attention to decisions | increased number of variables | — | ↑ increased staffing |
Administration/Management | budgeting | greatly increased tracking and planning time | — | ↑ increased staffing |
Administration/Management | subscription fees | titles added | titles reduced | ↑ increased costs |
Technical Services | print journal check-in | — | fewer items to check in | ↓ reduced staffing |
Technical Services | acquisitions | requires higher skill level | fewer items to purchase | ↑ increased staffing |
Technical Services | claiming | URL maintenance | fewer items to claim | ? net impact unclear |
Technical Services | bindery staffing effort and fees | — | fewer items; costs down | ↓ reduced staffing & costs |
Technical Services | cataloging new items | significant increase in number of items | significant decrease in number of items | ↑ increased staffing |
Technical Services | OCLC transactions | increased OCLC charges | decreased OCLC charges | ↑ increased costs overall |
Technical Services | catalog/e-journal list maintenance | significant level of new effort | expected decrease over time | ↑ increased staffing |
Circulation/Access | reshelving | — | fewer items to shelve | ↓ reduced staffing |
Circulation/Access | collecting use data | complex, requires higher skill level to organize | fewer items to count, takes less effort | ↑ increased staffing |
Circulation/Access | stack maintenance | — | fewer items out of place | ↓ reduced staffing |
Circulation/Access | user photocopying | — | fewer copies made; down 20% | ↓ reduced use & revenue |
Reserve | article file maintenance | — | fewer articles on reserve | ↓ reduced staffing |
Reserve | article checkout | — | fewer items checked out | ↓ reduced staffing |
Reserve | maintaining e-reserves | requires equipment, higher skill level | — | ↑ increased staffing |
Document Delivery | faculty copy service | copies from e-journals | copies from print journals | ? net impact unclear |
Document Delivery | interlibrary loan-borrowing | — | slight decline in activity | ↓ reduced vendor charges |
Document Delivery | interlibrary loan-borrowing | — | slight decline in fees | - all services expected to decline |
Information Services | references at desk | fewer but some longer transactions | fewer transactions; down 15% | ? net impact unclear |
Information Services | instruction/promotion | increased need | — | - expect increase |
Information Services | preparing documentation | increased number of items | greater level of review | ↑ increased staffing |
Information Services | journal selection | more detailed evaluation process | — | ↑ increased staffing |
Infrastructure
Systems. While space is the most important requirement for the print format, networks, computer hardware/software and systems staff are required to provide access to electronic resources. These resources are rapidly becoming key components of a well-functioning operation in all academic institutions, as they are essential for so many other reasons. None of the library systems are used for electronic journals exclusively since we provide access to many other applications. The webmaster easily spent 30 percent of his time on electronic journal access during the start-up period. He maintains the entire library web site, which initially included over 200 static HTML pages listing e-journals by title and by subject. When it became clear that maintaining this many continually changing static HTML pages was a major burden, the webmaster developed an e-journal maintenance database using MySQL and Perl scripts to manage the lists and deliver them to the web dynamically.
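The shift from static pages to database-driven lists can be illustrated with a minimal sketch. The paper says only that the webmaster used MySQL and Perl scripts; the Python and SQLite stand-in below, the `ejournals` table layout, the function names, and the sample rows are all assumptions made for illustration and do not describe Drexel's actual system.

```python
import sqlite3
from html import escape


def build_db(rows):
    """Create an in-memory table of e-journals (hypothetical schema).

    Each row is (title, subject, url); a journal listed under several
    subjects simply appears in several rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ejournals (title TEXT, subject TEXT, url TEXT)")
    conn.executemany("INSERT INTO ejournals VALUES (?, ?, ?)", rows)
    return conn


def _render(rows):
    """Turn (title, url) pairs into an HTML unordered list."""
    items = [
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def list_by_title(conn):
    """All journals, one entry per title, in alphabetical order."""
    cur = conn.execute(
        "SELECT DISTINCT title, url FROM ejournals ORDER BY title"
    )
    return _render(cur.fetchall())


def list_by_subject(conn, subject):
    """Journals filed under one subject, in alphabetical order."""
    cur = conn.execute(
        "SELECT DISTINCT title, url FROM ejournals "
        "WHERE subject = ? ORDER BY title",
        (subject,),
    )
    return _render(cur.fetchall())


# Invented sample data, for illustration only.
SAMPLE_ROWS = [
    ("Zeta Journal", "Engineering", "http://example.org/zeta"),
    ("Alpha Review", "Business", "http://example.org/alpha"),
    ("Alpha Review", "Engineering", "http://example.org/alpha"),
]

if __name__ == "__main__":
    conn = build_db(SAMPLE_ROWS)
    print(list_by_title(conn))
    print(list_by_subject(conn, "Engineering"))
```

With the lists generated on request, a changed link or a newly added title is corrected once in the table rather than on every static page that mentions it.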
Space Utilization. The chief impact of print journals on infrastructure is in the physical space for growth of the collection over time. The transition to electronic journals essentially eliminates space concerns—no more trimming the collection, converting it to microfilm, or moving it to a remote location to make space for new volumes. Eventually, because of retrospective conversion efforts like JSTOR, we will be able to reclaim journal storage space for other purposes. The cost savings, both on a capital and annual basis, are considerable. Estimating $100 per square foot (Fox, 1999), the minimum cost for library buildings in large urban centers, the 20,000-square-foot space currently occupied by the Drexel journal stacks would cost $2 million to construct. Estimating annual maintenance costs at $12 per square foot, the cost of maintaining the space occupied by the library's journal collection is approximately $240,000 per year.
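The space estimate above is simple multiplication, and a short sketch makes it easy to rerun with other figures. The function name and parameters are illustrative; the inputs are the figures quoted in the text.

```python
def space_costs(area_sq_ft, construction_per_sq_ft, maintenance_per_sq_ft):
    """Return (one-time construction cost, annual maintenance cost)."""
    construction = area_sq_ft * construction_per_sq_ft
    annual_maintenance = area_sq_ft * maintenance_per_sq_ft
    return construction, annual_maintenance


# Figures from the text: 20,000 sq ft, $100/sq ft to build, $12/sq ft to maintain.
construction, annual_maintenance = space_costs(20_000, 100, 12)
print(f"Construction: ${construction:,}")              # Construction: $2,000,000
print(f"Annual maintenance: ${annual_maintenance:,}")  # Annual maintenance: $240,000
```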
Subscription Costs
Budget allocations reflect the decision to shift from print to electronic subscriptions. Purchase decisions are based on two processes. First, we undertake a major initiative to analyze our current print and electronic holdings prior to renewal of subscriptions for the coming year. Second, throughout the year we invest significant staff resources to keep current with all e-journal offerings from vendors, publishers and consortia within the scope of our collection and initiate negotiations for pricing and packages tailored to our needs. In particular, we seek out electronic equivalents of current print holdings and replace the print with the electronic version of the title unless the title meets the exception criteria. Eventually, we expect to have a browsing collection of fewer than 100 titles.
As a result of these efforts Drexel's total journal subscription costs will be approximately $636,000 for 2001. See Table 18.2 for the breakdown. Aggregator subscription costs are difficult to calculate since these resources are part database, part electronic journals. With a "best guess" allocation of the cost of these services, we are spending or expect to spend a total of $600,000 for electronic journals.
Category | # of Titles | Amount |
Print only subscriptions | 300 | $36,000 |
Electronic subscriptions* | 2800 | $550,000 |
Aggregator/databases with full-text content** | 3500 | $45,000 |
Total E-Journals (Unique titles) | 6300 | $595,000 |
On a raw per-title basis the e-journal subscription dollar has superior purchasing power when the aggregators' titles are included. Our print-only journal subscriptions now cost an average of $120 per title while e-journals are $95 per title. This difference is far greater when one considers that nearly all the electronic journals come, even when a subscription is first entered, with several years of backfiles. In addition, the electronic subscriptions include many titles that cost several thousand dollars in print. The 300 print journals consist mainly of humanities and social science publications, along with some popular titles, all of which are low cost historically. The increased value of electronic journals is even more evident when coupled with use statistics, since our figures show that electronic journals are used more heavily than their print counterparts (see Montgomery and Sparks, 2000).
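The per-title averages can be reproduced from the figures in Table 18.2. The sketch below assumes, as the table's total row does, that the electronic and aggregator title counts do not overlap; the dictionary keys and function name are illustrative.

```python
# Title counts and amounts as given in Table 18.2.
TABLE_18_2 = {
    "print_only": {"titles": 300, "amount": 36_000},
    "electronic_subscriptions": {"titles": 2800, "amount": 550_000},
    "aggregators": {"titles": 3500, "amount": 45_000},
}


def per_title_cost(categories):
    """Average cost per title across the named categories."""
    titles = sum(TABLE_18_2[name]["titles"] for name in categories)
    amount = sum(TABLE_18_2[name]["amount"] for name in categories)
    return amount / titles


print(f"{per_title_cost(['print_only']):.2f}")                               # 120.00
print(f"{per_title_cost(['electronic_subscriptions', 'aggregators']):.2f}")  # 94.44
print(f"{per_title_cost(['electronic_subscriptions']):.2f}")                 # 196.43
print(f"{per_title_cost(['aggregators']):.2f}")                              # 12.86
```

The combined e-journal figure of about $94 matches the rounded $95 quoted above. The last two lines show why the comparison holds only "when the aggregators' titles are included": direct electronic subscriptions alone average well above the print figure.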
Administration/Management
Academic library directors have always paid considerable attention to journal subscriptions. Journal costs usually take most of the materials budget in science and technology libraries. Faculty often have strong feelings about particular titles which they do not hesitate to make known. Traditionally, the decision to subscribe to a new journal has required careful consideration because of the implication that the subscription is a long term commitment. For the last two decades, as prices escalated so dramatically, directors became increasingly involved in both advocating for additional funding to pay for journals and overseeing the time-consuming annual journal evaluation processes and cost-cutting measures. Electronic journals raise new issues that require the director's involvement to an even greater extent. Activities that are new or escalated for a director who makes a major commitment to electronic journals include:
communicating and obtaining institutional funding and support,
joining consortia and other "buying clubs,"
negotiating and reviewing contracts,
determining and revising strategies for e-resource acquisition,
building a library staff with the appropriate skills, and
managing the change in budget allocation.
Drexel created a new position, Electronic Resources Librarian (ERL), to provide a focal point for integrated development of all electronic resources. This position crosses the traditional departmental functions of management, systems, technical services and reference. The person in this position shares the responsibility of keeping up-to-date on the availability of new electronic resources with the Information Services (IS) librarians who do collection development. The ERL initiates contacts with vendors to negotiate favorable pricing and packaging and arranges trials for each new service considered for purchase. She also reviews licenses and contracts and negotiates appropriate amendments and corrections to these documents. For example, one of our goals is to provide remote access to content we make available to our users; some contracts do not allow this. The ERL also interacts with consortia for purchase of electronic resources and evaluates the costs and benefits of going with a particular group offer. Once the purchase decision is made, IP information is communicated to vendors and content changes are made on our web site. The ERL collaborates closely with the webmaster in designing and populating our e-journal database. Finally, gathering and organizing use statistics for electronic resources is a major aspect of her responsibilities. A recent ARL SPEC Kit (Bleiler and Plum, 1999) describes these activities and the various ways large academic research libraries have structured themselves to deal with them.
It is always more difficult and time-consuming to manage change than to maintain the status quo. The amount of time spent managing and overseeing this transition is substantial, when one includes major ongoing efforts to restructure workflow and reorganize staff to respond to the migration to electronic journals.
Technical Services
In the Technical Services department, the transition to e-journals has had a direct impact on the day-to-day work of each staff member. Changes in workflow and procedures are dramatic, with very large shifts in costs. It is clear that the significant reduction in print titles has directly decreased workload related to the print format. Less time is needed to check in print issues, claim non-arrivals, replace missing pages, and prepare and receive bindery shipments. Also, direct costs for cataloging new print titles and maintaining existing MARC records (OCLC charges) have been reduced. Bindery fees are also reduced accordingly.
Offsetting the decrease in activity levels and costs related to the print format is a large increased workload for both the serials acquisitions and cataloging functions related to providing access to electronic journals. Updating the e-journal maintenance database that now creates our e-journal lists is a major new task. The e-journal collection is much more volatile than a print collection: links break, coverage changes, and sometimes the electronic journals themselves are available through a new distributor. An advantage of electronic distribution that creates extra work is that we are not bound to calendar-year-only subscriptions, so journals are added continuously and sometimes discontinued during the year.
Another activity that has greatly affected Technical Services is an expanded review process for journal renewals that includes the IS librarians, who represent the various colleges in the University. During the past two years we have evaluated every journal title, print and electronic, before it is renewed. The work of coordinating and tracking renewal decisions has increased significantly.
Not only has the format of materials shifted, but the volume of materials has increased more than three-fold. We are now managing over 6,000 journal titles as opposed to 1,700 titles two years ago. Unfortunately, we are not always able to switch existing staff to e-journal tasks. In the process of "re-engineering" the entire department, we upgraded two positions, added one temporary position, and replaced one position. We now require detail-oriented support staff who have advanced computer skills and who can adjust to continuous changes in procedures and methods as our environment evolves.
Circulation/Access/Stack Maintenance
Staffing. Obviously, shelving decreases when journals are no longer physically stored in the library. Bound journal re-shelving has been reduced by 40 percent and re-shelving of current journal issues is down over 20 percent over the past two years. At Drexel, the collection of print journal re-shelving statistics is only partly automated. Shelvers track use by title as they shelve bound volumes and current issues. Fewer journals to shelve also translates to less time collecting print re-shelving statistics.
Electronic Use Statistics. In theory, it is easier to collect use statistics and richer, more accurate demographic and search information for electronic journal usage because data collection can be automated and expanded. In reality, at this time it is very difficult and labor intensive to obtain useful and comparable title-by-title use data for electronic journals and compile them in a way that is helpful for making management decisions. Activity measures and, in particular, comparable activity measures across journal vendor services are frustratingly difficult to come by. Mercer (2000) describes the problems encountered in trying to collect and analyze the vendor information to use it for service evaluation and decision-making. Among the statistics reported are session length, number of searches, journal title hits, page hits, types of pages hit, top XX titles accessed each month, "turnaways," form and type of articles downloaded, and number of unique IP addresses using a service or journal title.
Since the data for print volumes are not strictly comparable, they must be interpreted carefully. Our print statistics represent volumes or issues re-shelved rather than actual articles copied or read, while the e-journal statistics below represent articles accessed, which may or may not have been read. The print use data are somewhat under-reported because, even when asked not to, users re-shelve journals after they look at them. Even so, from preliminary data we can say confidently that our users are accessing the electronic journals in numbers far exceeding our print collection.
Photocopying. Because print journal use has decreased so dramatically, it is only logical that photocopier use would also decrease, since copying journal articles is one of the primary uses of our library photocopiers. Photocopy use has decreased about 20 percent since electronic journals were introduced.
Reserve
Circulation of reserve materials, which had been steady at about 30,000 items per year, dropped by 50 percent during the 1999/2000 academic year. What portion of the e-resources used are electronic journals, and what portion are other e-resources, is an open question. We do expect this trend to continue for the print reserve format, particularly when our electronic reserve module is fully functional later this year. It appears that not only are students using fewer reserve materials, but our faculty also are placing fewer items on reserve. With respect to staffing impact, we have reorganized some of the work assignments in this department due to the reduced workload, and upgraded the reserve room supervisor position in anticipation of the electronic reserves activity.
Document Delivery/Interlibrary Loan (DD/ILL)
Our expectation with the implementation of electronic journals was that we would see a significant decrease in user requests for journal articles via our DD/ILL services. In fact, "borrowing" photocopies of journal articles from other libraries increased by 16 percent in FY1999-2000. The library's document delivery service, which provides copies of articles from the Drexel Library collections free of charge to faculty and distance learners, delivered 1,122 articles from the electronic journal collection in this same time period. The majority of these articles are for faculty who presumably are not aware of the ready accessibility of e-journals, or who either cannot or choose not to retrieve the articles themselves. At the moment, it is not possible to measure the net impact of the electronic journals on the DD/ILL department volume because too many other factors are influencing use of the service. Research activity has grown dramatically at Drexel in the past two years, and the provision of over 100 web-based databases likely has increased demand for articles. Our prediction is that ultimately we will see a decrease in net requests for this service as our users become increasingly self-sufficient and as electronic content continues to expand.
Reference/Information Services (IS)
Reference services are nearly always affected by any significant change in library resources. At Drexel the Information Services/Reference staff are responsible for materials selection in addition to the usual functions of answering questions, teaching classes, and performing public relations functions such as promoting the availability of services. So they are involved in several stages of the "life cycle" of electronic journals at Drexel. They share responsibility for identifying e-journal candidates for purchase, evaluating potential purchases, helping students and faculty use the e-journals effectively, incorporating information about them in their classes, and helping publicize them to their constituencies.
Reference Service. Some interesting trends are occurring at the reference desk. Questions decreased in 1999/2000 by about eight percent, although more of the transactions that do occur turn into "teaching opportunities" for those users who are less self-sufficient. Staff observe that, in particular, students using the web-enabled computers in the "hub" near the reference desk are increasingly self-sufficient.
Instructional Program. Offsetting the decrease in reference questions is the amount of time IS staff are spending on instruction and outreach activities to make faculty and students aware of the library's resources and services. Workshops and teaching sessions have increased. Vendor presentations are more frequent. IS librarians are engaging in greatly expanded public relations by personal visits and presentations, email updates to departments, exhibits and other activities. Another effort that has also expanded is the preparation of both online and printed documentation to help users understand how to use electronic journals.
Staffing. The electronic journal option and new processes have most certainly increased the workload for selecting journals. We do expect that over time this increase will level off as the collections and offerings stabilize in the electronic environment. No new staff positions have been added in the IS department but there has been significant turnover and, again, we carefully screen new hires for expanded computer skills and experience using, selecting and promoting electronic resources. A lot of the increased journal evaluation work comes in the summer, a time when many of the other activities of the department are reduced. So far the staff have been able to handle the additional work at current staff levels.
18.5 Discussion
Drexel is probably farther along in the transition to an all-electronic journal collection than most, if not all, academic libraries in the United States. A late 1997/1998 survey of ARL and non-ARL academic libraries found that just 29 and 33.5 percent, respectively, had cancelled print journals in favor of electronic access in the previous 12 months (Shemberg and Grossman, 1999). Fifty-one percent of the ARL libraries and 40 percent of the non-ARL libraries had not cancelled print subscriptions in favor of electronic and declared that they will not in the future. Their reluctance was attributed to the enormous change required in academia to relinquish print.
This has not been a problem for Drexel. Faculty and students have embraced the transition almost universally. Organizational readiness, important in any successful organizational change, has been critical to the ability of the Drexel Library to move so rapidly to a new model. The most important factors have been: (1) a highly computer-literate faculty and student body; (2) programmatic emphases in science, technology and business, areas where publishers have been quickest to provide e-journals; (3) the existence of a high-speed ubiquitous network; (4) general dissatisfaction with the print journal collection; (5) a supportive administration that provided a significant increase in funding; (6) a strong and growing distance education program; and (7) a large number of academic institutions in the Philadelphia area, including the nearby University of Pennsylvania, with substantial libraries that are available to Drexel faculty and students.
This description of the Drexel experience should be useful to others because our transition is indicative of what most academic libraries will eventually experience. There are accredited academic institutions that are functioning with completely digital libraries, i.e., they never had a print library. Examples are Jones International University (2000) and the University of Phoenix (2000). Other libraries have created large electronic journal collections—e.g., the University of California system (California Digital Library, 2000) and most, if not all, large research libraries—but they are maintaining large print collections concurrently. The approach Drexel is implementing—substituting electronic for print—will be the typical scenario in most academic libraries because it will be necessary to make electronic collections affordable.
Preliminary cost comparisons for processing print versus electronic journals indicate that the electronic collection is substantially more expensive to maintain. We estimated staff costs by allocating percentages of individual staff members' time to the various tasks and projects described in this paper using the functional cost analysis approach of Abels, Kantor, and Saracevic (1996). The amount of time spent per task was determined by interviewing staff and supervisors to analyze the impact for each area and by reviewing library statistics and other records. Then, we computed the annual cost in salaries using individual rates of pay. The result indicated that the substantial costs in maintaining an electronic journal collection more than offset the savings from eliminating the clerical chores associated with maintaining a print journal collection. While fewer staff are needed, the new staff are more skilled, and therefore more highly compensated. Likely, as the electronic journal publishing industry and related service industries mature, the change process will become easier, and thereby less costly, for libraries.
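The allocation step described above can be sketched in a few lines. This is only an illustration of the general idea of spreading each salary across tasks by share of time, not a reproduction of the Abels, Kantor, and Saracevic method or of Drexel's calculations; the roles, salaries, task names, and time shares are invented for the example.

```python
from collections import defaultdict

# Hypothetical staff records: every number below is invented for illustration.
HYPOTHETICAL_STAFF = [
    {
        "role": "serials assistant",
        "salary": 30_000,
        "allocation": {"print check-in": 0.5, "e-journal list maintenance": 0.25},
    },
    {
        "role": "electronic resources librarian",
        "salary": 50_000,
        "allocation": {"license negotiation": 0.5, "e-journal list maintenance": 0.25},
    },
]


def cost_by_task(staff):
    """Sum salary * share-of-time over all staff, grouped by task."""
    totals = defaultdict(float)
    for person in staff:
        shares = person["allocation"]
        # A person cannot spend more than 100 percent of their time.
        if sum(shares.values()) > 1.0 + 1e-9:
            raise ValueError(f"{person['role']}: time shares exceed 100 percent")
        for task, share in shares.items():
            totals[task] += person["salary"] * share
    return dict(totals)


for task, cost in sorted(cost_by_task(HYPOTHETICAL_STAFF).items()):
    print(f"{task}: ${cost:,.0f}")
# e-journal list maintenance: $20,000
# license negotiation: $25,000
# print check-in: $15,000
```

Time not allocated to any listed task (here, a quarter of each person's year) is simply left out of the totals, which is one reason such estimates depend heavily on the staff interviews that produce the shares.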
Drexel's per-title subscription costs are lower for electronic journals. While this is a function of our selection process and the particular "deals" we have been able to obtain, we suspect that the majority of academic libraries will have the same experience, particularly if they purchase a large number of titles through aggregator collections. Since use is much higher for e-journals, the cost benefit is even greater. We plan further analysis to refine our calculations of operational costs and of subscription costs, taking into account factors such as backfiles and use data, in order to arrive at good estimates of "real" per-title costs.[3]
There are many areas where improvements made by publishers and vendors could decrease the library workload. Of particular value would be
better information about the existence of electronic journals and their characteristics,
standards for presentation of use data by vendors,
easier methods of providing access to electronic journals, either through cataloging or in list form, and
an assured solution to archiving.
As the entire electronic publishing system matures, we anticipate that these improvements will come.
Notes
† Reprinted with permission from D-Lib Magazine, Vol. 6, No. 10, October 2000. (http://www.dlib.org/dlib/october00/montgomery/10montgomery.html) To conform with publication standards this publication has some format differences from the D-Lib article. Also, some notes have been added.
1. Some publishers continue to insist that print journals must be purchased to gain access to the electronic equivalent, i.e., bundling. We have developed strategies to discontinue receiving and storing the print copies.
3. The Drexel Library recently received a $128,000 grant from the Institute of Museum and Library Services to study these costs and develop models for measurement.
19. The Impact of Digital Collections on Library Use: The Manager's Perspective
This paper focuses on the impact of digital collections on library use based on three years of experience in a metropolitan research university. Through statistics and observations it will be demonstrated that an academic library can become user-centered in the electronic environment. It will also be demonstrated that new educational initiatives from the state government and the university administration can help the library gain a more central place within the academic enterprise. Various education initiatives in Kentucky from 1997 to the present will be used as a case study to demonstrate that the electronic information environment can lend itself to improving learning and teaching outcomes for an educationally underdeveloped population. Information will be presented on cooperative and consortia-based initiatives to contain costs and to expand access to electronic databases.
19.1 Introduction
The availability of electronic information continues to increase. Based on a study at University of California, Berkeley, the world's total annual production of information amounts to about 250 megabytes for each man, woman and child on earth.[1] The challenge will be to learn how to navigate in this sea of endless information. Information is also becoming more accessible each year as more people are acquiring personal computers. Based on 1997 statistics there were 407 computers for each 1000 people in the United States, the highest computers-per-person ratio in the world. Some 42.1 percent of households in the United States, almost half, have a computer.[2] In 1998 higher education spending on computer hardware including personal computers reached $1.4 billion, and this is expected to increase seven percent annually.[3] The growth of the Internet and related Web information during the past several years has likewise been phenomenal and continues to increase at a rapid pace. For example, the sale of electronic books (e-books) was $40 million in the year 2000. The marked increase in Internet and Web use has caused analysts to project that e-book sales will surge to $2.3 billion in 2005. This growth projection is based on convenience in updating data and information, especially as related to college textbooks and reference books.[4] As people access and use the Internet and the Web their expectations for finding information quickly and conveniently are growing rapidly, especially in the higher education environment, and academic libraries increasingly experience the effects of these growing information expectations.
In the current information and technology environment academic library users (students, professors, and researchers) have a variety of expectations. They continue to need and want print, monographs, serials, documents, manuscripts, maps, photographs, archives, and related items for teaching, learning, and research. They also want multi-media formats for learning and curriculum support such as films, slides, videos, digital videodisks (DVD), compact disks, tapes, and microforms. Ultimately, academic library users want and need information electronically. It must be available anytime, anywhere, for multipurpose uses, quickly, conveniently, and in a portable and easy-to-use form. Library users not only want to locate the information quickly, they also want to be able to take it with them either by printing it, copying it, sending it by e-mail to their personal computer, or by downloading it onto a disk.
Librarians must ensure that they are capable of satisfying their users' diverse information needs. In cooperation with the faculty, and based on the curriculum and research needs, academic librarians must continue to build appropriate print and multimedia collections. They must offer convenient access, preferably Web-based, to a large array of electronic information resources including books, documents, journals, and other digitized information, all in full text. They must also provide adequate computer workstations, strong and supportive networks, and printing and downloading capabilities. Finally, they must ensure that their users have or learn the skills to find, access, evaluate, organize, and use electronic information appropriately and responsibly.
In the 1999-2000 academic year 1,787 academic libraries spent almost $56 million, or 4.7%, of $1.2 billion in acquisition expenditures on electronic resources.[5] To accommodate their users' increasing demands for efficient access to and use of electronic information, most academic libraries have had to update their technologies as well as their infrastructures at substantial cost and effort.
Users want up-to-date computing, and they need good training and instruction. Academic librarians have begun to rethink their operations and services in terms of the electronic information environment and their users' needs and demands for electronic information access. They have had to gain expertise in evaluating electronic information and appropriate access mechanisms, as well as gain the technical expertise to handle the networking and computer infrastructure. While in many academic institutions use of the physical library and of print resources has begun to decrease slightly in recent years, the use of electronic information sources has increased rapidly. Although national and international standards in reporting statistical data related to electronic information use have not yet been developed, substantial amounts of data on electronic information use are being accumulated.
Librarians are slowly beginning to understand information-seeking behavior in the electronic information environment and how users search for online information. They are starting to work with vendors and aggregators of electronic information to produce better designs for electronic product use and adequate methods to collect appropriate use statistics for electronic information formats.
19.2 Electronic Information Environment — Kentucky
Kentucky's population is undereducated for the challenges it will face in the 21st-century information environment. The average income per person falls in the lower third for the United States. The majority of Kentucky citizens do not have a college degree and Kentucky's full-time enrollment in higher education is one of the lowest in the United States.
Under the leadership of Kentucky Governor Paul Patton, the Legislature and leaders in education have worked together to upgrade the state's total education system. In 1997 the Governor collaborated with the leaders in higher education to increase the percentage of Kentucky's population who have access to higher education. He allocated more than $167 million in additional resources to higher education during the 1998-2000 biennium to improve research initiatives, technology, workforce development and physical facilities, and to increase financial aid for students.
Governor Patton supported a new governing structure for higher education by creating the Council on Postsecondary Education and giving it responsibility for technical schools, community colleges, comprehensive and research universities and, most recently, continuing education and the newly created Kentucky Virtual University. The Kentucky Virtual University was created in 1998 to help address the problem of access to higher education for Kentucky's citizens living in remote rural areas. The goal of the virtual university is to provide Kentucky's citizens with access to higher education, both undergraduate and graduate, no matter where they reside, through electronic learning and online educational support. Using a computer with online access at any type of library, school, community center or other location, any citizen can have access to information and to instruction.
The Kentucky Virtual Library (KYVL, at www.kyvl.org ) is a library consortium including all types of libraries. KYVL is funded jointly by the Council on Postsecondary Education and all participating libraries. One of the consortium's initial major initiatives was to ensure that all state universities and community colleges utilize the same client-server library system, Endeavor, to provide common electronic access to these collections. The Kentucky Virtual Library is accessible twenty-four hours a day, seven days a week, from any Internet-connected computer and provides access to a variety of commercial, state and local databases for all citizens of Kentucky. Online tutorials help citizens learn valuable information skills. Timely document delivery, online reference services, cooperative digitizing projects and a common interface help give all citizens equitable access to information, including access to such databases as OCLC's FirstSearch and EBSCO.
19.3 University of Louisville — Libraries
The University of Louisville (U of L) is the second largest metropolitan research institution in Kentucky with more than 21,000 students. With close to 12 percent of its student body belonging to minorities, U of L has the highest percentage of minority students in the state. Most students, 85.3 percent, are from Kentucky, while the other 14.7 percent are from other states and other countries. The University is a Research I institution with 1,350 faculty and 163 degree programs, including 30 doctoral programs. There are seven libraries including a medical library located on the health sciences campus, a music library, an art library, a science and engineering library, and a law library, as well as Ekstrom Library, the main library. In addition to serving the university community, the libraries are a net lender of library materials for the state of Kentucky. The libraries possess more than 1.8 million volumes, 16,000 current print serial subscriptions and diversified special collections and other media. Access to several hundred databases and more than 25,000 electronic full-text journals is provided.
At the U of L, expenditures for electronic resources have more than tripled during the past five years, from $301,000 to $1,259,000, or from 6.7 percent to 15.3 percent of the acquisitions budget. That trend is continuing. It is noteworthy that U of L's expenditure for electronic information is more than three times that of Kentucky's average and three times that of the national average. To support the growing expenditures for electronic information resources at the U of L, $2.5 million was spent during the past three years on a new client-server system and on updating both the technological infrastructure and library computers for staff and the public. The libraries went from a mainframe computer system to a state-of-the-art client-server system, from no network to an Internet network featuring 100-megabit connections and a wireless computer environment, from no servers to seven servers, and from fifty "dumb" terminals to 550 state-of-the-art computer workstations. Through major rethinking and reallocation the libraries' technology department grew from four to seven full-time staff and gained a support structure of a ten-member technology team. The libraries have several state-of-the-art interactive computer classrooms utilized for more than 900 class sessions with 11,000 students a year to teach curriculum-related information skills. A state-of-the-art computer laboratory is used by more than 75,000 persons during one year.
19.4 Library Use — Statistics and Observations
From 1997 to 2002 overall library use increased by forty percent. This statistic covers reference services, circulation (including in-house use), reserves and interlibrary loan.
Approximately 1.8 million users enter the libraries physically each year and use library services including electronic information. The number of persons coming to the libraries has steadily increased each of the last five years. Based on annual assessment data, collected by the Library Assessment Team using student and faculty surveys and focus groups, it has been found that people like the services provided for them because they are based on their information needs. The campus community also appreciates the state-of-the-art electronic information environment in the libraries and last but not least they enjoy such amenities as the computer lab and e-mail facilities.
In 1998-99 more than seven million electronic uses of the online catalog, web sites and electronic journals were registered; in 1999-2000 such electronic use went up to eleven million, a fifty-seven percent increase. In 1999-2000, only 38 percent of total use of electronic materials came from inside the U of L libraries. External accesses outnumbered internal at nearly a three-to-one ratio. Close partnerships with the faculty have resulted in teaching information skills to more than 8,000 students a year while beginning to integrate information literacy throughout the curriculum.
The U of L libraries feature eighteen distinct web sites including 1,158 pages and more than 54,000 links. These web sites are updated and increased on a regular basis. Last year alone seven million uses of the electronic catalog and web sites were recorded, an increase of 350 percent over the previous year. It must be noted that the libraries are only at the very beginning of collecting use statistics related to electronic information and the Web. Much more has to be learned to ensure that these statistics are truly meaningful in measuring use.
The U of L libraries are beginning to allocate significant resources for, and to rethink services related to, electronic information. In 2002-2003 library users have been given access to 270 electronic databases, as compared to forty-two databases in 1996-97, more than six times as many. Among these new resources are large databases and services with access to abstracts and full-text articles, such as ABI/Inform, FirstSearch, EBSCO, Biological Abstracts, Beilstein, INSPEC, Medline, Science Direct, Lexis-Nexis, and Web of Science. Databases in almost all subject areas and covering a variety of sources, such as reference books, theses, encyclopedias and biographies, are available.
The U of L libraries, similar to other academic and research libraries, have been forming partnerships and cooperative agreements with one another to ensure preservation and cost containment for electronic and scholarly publications. The U of L libraries subscribe to several of these, such as SPARC, the Scholarly Publishing and Academic Resources Coalition; MUSE, a consortium of more than twenty-six journals from University Presses and Scholarly Societies; JSTOR, a consortium of over 1,000 academic libraries and preserver of online back files; and IDEAL (International Digital Electronic Access Library), a publisher consortium for online journal subscriptions.
Access to and utilization of many electronic journals is achieved through a variety of interfaces and search engines such as OVID, a search engine for psychology and health sciences databases. Users prefer this search engine since it allows them options to use and understand complex databases. The OVID search engine enjoys heavy use among health sciences and science students and faculty. Use of this search engine continues to increase dramatically: 43,000 uses were recorded in 1999, compared to 89,036 in 2000.
Another example of user preferences is Web of Science, a citation database for sciences, social sciences, arts and humanities that covers thousands of research journals across many disciplines and offers searchable author abstracts as well as citations to support research. In 1998 access to all components of the Web of Science and its substantial back files was offered for the first time at U of L libraries. Use statistics indicate increasing use of this database. In 1999, 37,378 searches were recorded, compared to 44,221 in 2000.[6]
19.5 Electronic Information Use in Distance Education
Access to the multitude of electronic information resources has had a major impact on library users in distance education. The University of Louisville has been offering a variety of distance education programs both within the United States and in other countries. At this time programs are offered in several countries, including Greece, Egypt, Panama, Czech Republic and Germany. Approximately 3,500 course enrollments per year are registered in distance education offered through the University and each of the participants receives timely and individualized information support.
During the past nine years the libraries have developed a special program in support of distance education programs. Included in the library support program are document delivery in all types of formats, reference services and instruction in information skills. In 1998 a library distance education office was created with two staff, a library faculty member and a technological support person, and with state-of-the-art technology including a proxy server to keep track of students and faculty involved in these programs. The libraries have installed Ariel software in various locations outside of the United States to facilitate document delivery activities. They work with teaching faculty to create appropriate Web pages for teaching the courses, including appropriate library and information support. Based on five years of experience the librarians have also developed cost data for library support to distance education students and faculty.[7]
19.6 Assessment of Library Users
Assessment of library users takes several forms. During the past two years the libraries' assessment team has completed several surveys of students and faculty. Last year the University, including the libraries, contracted with an assessment firm, Dey Systems, Inc., to develop instruments for measuring students' educational outcomes. In addition, librarians hold meetings and focus groups with graduate students, undergraduate students and faculty to assess information needs and concerns of these groups. Suggestion forms are available throughout the libraries and on the Web sites and help the library staff address library and information needs and concerns.
The libraries have utilized information obtained from the student and faculty surveys in 1997, 1998, 1999 and 2000 to improve library services and document delivery. Users indicated a need for more computers, more electronic information, more books, more journals, better photocopying, quicker interlibrary loans and additional hours. In response, the libraries have regularly added to and improved library holdings and access to electronic information. The libraries have extended library hours, updated and added computers, and instituted a state-of-the-art photocopying service. The libraries also began loaning wireless laptops to people for use in the libraries.
The trends in user needs and utilization of the libraries during the past three years show an increase in the use of all library services, but especially in reference, the online catalog and electronic databases.
19.7 Future
Experience at the University of Louisville during the past several years indicates that users want access to electronic information wherever and whenever possible. They need state-of-the-art computing equipment and strong, supportive networks to make this possible. They also need much training and professional advice to be successful information users. These findings, based on library surveys and interviews conducted at U of L, are similar to information presented at library conferences and in discussions with colleagues around the country. Computers and software products can make information use difficult. Librarians need to provide assistance and make information more usable. Librarians already provide value-added services such as instructional tools, teaching sessions and reference assistance to create a layer of intervention between the user and the products. Librarians are concerned with user needs and provide a user-centered environment. As librarians build good web sites, they will be able to provide more user-centered information through attractive, easy-to-use sites, intuitive navigation, currency and appropriate text links. Librarians facilitate information retrieval, helping users avoid aimless information surfing.
Librarians have been utilizing the Web and electronic information while working with vendors for several years now to provide their users with the best possible access to online information. They need to work with vendors and providers of electronic information to ensure consistency and user control. They must also work with electronic information providers to utilize feedback from users. Vendors of electronic information and databases should work with librarians to create better common interfaces to electronic databases, and consistent statistical reporting. Such statistical reports should include number of logons, number of actual searches of a particular database, number of actual users of full-text articles, type of subject searches completed, and how many users were unsuccessful. Such data will enable librarians to assess actual use of particular databases and specific journal articles so they can make electronic material selection decisions based on actual user needs.
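The reporting measures listed above could be held in a simple record, sketched here in Python. This is only an illustration: the class name, the field names, the two derived ratios and the sample figures are all assumptions made for the example and do not come from any vendor's actual reports.

```python
from dataclasses import dataclass, field


@dataclass
class VendorUsageReport:
    """Hypothetical record of the measures the text asks vendors to report."""
    database: str
    logons: int
    searches: int
    full_text_users: int
    unsuccessful_users: int
    # Maps a subject area to the number of searches completed in it.
    subject_searches: dict = field(default_factory=dict)

    def searches_per_logon(self):
        """Average number of searches per logon (0.0 when there are no logons)."""
        return self.searches / self.logons if self.logons else 0.0

    def unsuccessful_share(self):
        """Unsuccessful users as a share of logons (an illustrative ratio only)."""
        return self.unsuccessful_users / self.logons if self.logons else 0.0


# Invented sample figures, for illustration only.
report = VendorUsageReport(
    database="Example Database",
    logons=200,
    searches=500,
    full_text_users=80,
    unsuccessful_users=20,
    subject_searches={"chemistry": 300, "engineering": 200},
)
print(report.searches_per_logon(), report.unsuccessful_share())
```

If every vendor supplied the same fields, figures like these could be compared across databases when making selection decisions.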
Librarians need to assess the use of digital collections in terms of comparisons to print use; previously underserved populations; change in usage patterns; value of the collections for campus information support; change in and preservation of scholarly communication; and finally, the effect on overall expenditures.
Furthermore, librarians need to regularly assess the impact of electronic information on library operations and services. Operations have already changed and continue to change, especially in terms of cataloguing, processing and collection building. More outsourcing of processing to obtain shelf-ready monographs is becoming the norm. Use of approval plans is increasing. Networking in cataloguing facilitates faster and less expensive cataloguing.
Services are similarly changing in terms of electronic information provision, reference, instruction, reserves and document delivery. Academic libraries have been implementing electronic reserves to facilitate access and faster document delivery. They have been implementing software packages such as Ariel and ILLiad to improve interlibrary loan processes. Academic librarians have energetically made electronic information available through their libraries and they are beginning to rethink reference in an electronic environment.
User studies are beginning to indicate that the following factors statistically influence the use of electronic information: form of access; available technology; available guidance and instruction; full text availability. Librarians need to work closely with teaching faculty to assess the impact of digital information in terms of learning outcomes for students and with researchers to assess the effect of electronic information on research results.
Notes
1. Lyman, Peter and Hal R. Varian, How Much Information? 2000. Retrieved from http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html on 24 July 2003.
2. Statistical abstracts of the United States. Washington, D.C.: U.S. Dept. of Commerce, Bureau of the Census, 1999.
3. U.S. industry & trade outlook. New York : DRI/McGraw-Hill : Standard & Poor's. Washington, D.C. : U.S. Dept. of Commerce/International Trade Administration, 2000.
4. Standard & Poor's Corporation. Standard & Poor's industry surveys. New York: Standard & Poor's Corp., 2000, p. 6.
5. Although there are 4,723 academic libraries in the United States, only 1,787 report information on their acquisition expenditures nationally and the above statistics originated from these 1,787 libraries. The Bowker annual library and book trade almanac. New York : Margaret M. Spier, R.R. Bowker, 2000, pp. 420-421.
6. More cooperation is needed from vendors to ensure that such statistics are recorded and that they measure different types of usage.
7. Edge, S. M. (2000). Faculty-librarian collaboration in online course development. In S. Reisman (Ed.), Electronic Learning Communities - Issues and Practices (pp. 135-185). Greenwich, CT: Information Age Publishing, Inc.
20. Economics and Usage of a Corporate Digital Library
This paper analyses the usage of the journals and books available at the BT Digital Library, looking in detail at the usage by the 3,500 people at BT's development site at Adastral Park, near Ipswich, and the impact of this usage on purchase decisions in the library.
20.1 Background
British Telecommunications (BT) is a leading provider of telecommunications services. Its main products and services are providing local, long distance, and international calls; providing telephone lines, equipment, and private circuits for homes and businesses; providing and managing private networks; and supplying mobile communications services.
BT's library is organisationally located in a unit called Advanced Communications Engineering (ACE). ACE's 3,500 people mainly work in software and telecoms engineering. ACE develops advanced communication technologies for the companies across the BT Group and for selected other businesses.
ACE is headquartered at Adastral Park, Martlesham, in the East of England. It is the centre of technical expertise for the BT Group and works with the company's businesses worldwide to help them to deliver new products and services to their customers and to build infrastructure for their future. Its reputation was established with BT's pioneering work in the field of optical communications.
While the majority of ACE's people are based at Adastral Park, significant numbers are located in offices worldwide, including locations in Asia, continental Europe, and North America. They include many who are in the forefront of their specialist fields, leading the development of standards and new technologies in areas including multimedia, IP and data networks, mobile communications, network design and management, and business applications.
Like other companies in high-tech sectors, BT experiences major pressures on costs and products, requiring a challenging combination of cost-cutting and innovation to maintain competitiveness. These pressures have led to a change of direction in BT's research focus, moving from research in the optics area towards software and internet engineering.
20.2 The BT Library
Until the mid-1990s, BT's library had a large collection to meet the needs of these researchers, with more than 800 titles, 23 staff, and accommodation totalling 450 square meters. The library provided a full range of services including document delivery from its own collection, inter-library lending, and carrying out online searches for its users.
The pressures on BT have fed through to the library. Costs were increasing at a time when pressure to reduce overheads was insurmountable. The library's user community was changing as BT shifted its research efforts away from pure science into more directed research. As they moved into these new areas, users became much less inclined to visit the physical library, preferring the information they needed to be delivered to them.
In 1994, the library's management team realised it could no longer pursue the path of looking for incremental budget cuts and savings. Radical change was needed. Benefiting from an extensive study of the usage of library material and building on increasing confidence in the in-house enhancements to library automation systems, the library chose to fundamentally rethink the way it provided services to its users.
The library collection was cut to a core 250 journals that were heavily used by people visiting the library in person. Accommodation and staff were trimmed by 67%, with library staff redeployed elsewhere in the company. Attention was focussed on establishing a digital library that provided users with access to content over the network through online journals backed up by commercial document delivery.
The provision of loans and photocopies was outsourced with the development of the BLADES system. BLADES accepts and validates user requests, tracks the progress of requests and provides status reports for users and input for billing systems for those users who pay for requests (Broadmeadow, 1997). User requests are transmitted semi-automatically to the British Library for fulfilment, with the BL delivering photocopies and loans directly to the user. The substantial savings in journal subscriptions more than offset the cost of commercial document delivery. Part of the savings was due to a communications campaign that highlighted the real cost to BT of requesting a photocopy, thus reducing demand.
The BT Library provides over 800 online journals to its users, either loading them onto its own server or linking through to the publishers' or aggregators' servers. The Inspec and ABI/Inform databases act as gateways to these journals, using software developed by BT's knowledge management research team for searching, current awareness and collaboration features. The databases are used to provide end-user searching and browsing, with the provision for users to save searches for selective dissemination of information (SDI) as the databases are updated each week. A table of contents service is also provided.
In the physical library, the librarian could readily see users when they were floundering in their search for information and discreetly offer assistance. In the digital library, users are relatively invisible to the librarian. The Digital Library is developing methods for understanding its users' behavior more effectively. Studies of user behavior are intended to highlight the server's problem areas, which then can be redesigned to make them easier to use, and to develop ways of automatically profiling users' interests and work areas. The unspoken purpose of this analysis is also to develop a compelling case showing how effectively the library supports BT's business processes.
20.3 Methodology
Although no detailed study of the usage of the Digital Library has been undertaken, it is important to understand how users access the collection, what problems they find there, and what their usage patterns are. Data about where users are in the organisation are important because of the need to allocate costs and charges.
The Digital Library studies the library server's log files and uses a number of channels to encourage user feedback. The difficulties of log file analysis are well documented (Wright, 1999). The log files are huge and processing them can be time-consuming. Unless user authentication is required, logs record only machine addresses and not personal identifiers. Each server transaction is logged, so a user retrieving a page with five graphics is recorded in six lines in the log file. Users share machines at cybercafes, operate behind proxies, or use dynamic IP addresses, so that the IP address cannot readily be tied to a single user. Although Wright is describing the problems of log file analysis for servers on the internet, the Digital Library faces the same challenges. BT's intranet is large and proxies and firewalls are installed between different portions of the network. Users share machines in public spaces or borrow colleagues' PCs. Dynamic allocation of IP addresses is used within the intranet to increase flexibility.
A number of packages are available to assist in log file analysis (Busch, 1997). These packages report statistics such as the total hits on the server, the number of Not Modifieds (304's), Redirects (302's), Not Founds (404's), Server Errors (500's), the number of unique URL's served, the number of unique client hosts accessing the server, the total kilobytes transferred, the top one second, one minute, and one hour periods, the most commonly accessed URL's, and the top 5 client hosts accessing the server. These deliver a higher level of management information than the librarian needs and are not used in the BT Library.
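The gap between server transactions and pages actually read can be shown with a minimal sketch. It assumes the server writes the Common Log Format, which the paper does not state, and the sample lines, addresses and paths are invented for illustration.

```python
import re

# Assumed Common Log Format; the paper does not say which format was in use.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample: one HTML page plus the five graphics embedded in it.
SAMPLE_LOG = """\
10.0.0.7 - - [01/Nov/1998:09:15:02 +0000] "GET /journals/index.html HTTP/1.0" 200 5120
10.0.0.7 - - [01/Nov/1998:09:15:02 +0000] "GET /img/logo.gif HTTP/1.0" 200 812
10.0.0.7 - - [01/Nov/1998:09:15:03 +0000] "GET /img/banner.gif HTTP/1.0" 200 2048
10.0.0.7 - - [01/Nov/1998:09:15:03 +0000] "GET /img/nav1.gif HTTP/1.0" 200 300
10.0.0.7 - - [01/Nov/1998:09:15:03 +0000] "GET /img/nav2.gif HTTP/1.0" 200 300
10.0.0.7 - - [01/Nov/1998:09:15:03 +0000] "GET /img/nav3.gif HTTP/1.0" 200 300
"""


def parse(log_text):
    """Return one dict per well-formed log line; malformed lines are skipped."""
    matches = (LOG_LINE.match(line) for line in log_text.splitlines())
    return [m.groupdict() for m in matches if m]


def is_page(path):
    """Count HTML and PDF files as pages read; graphics are supporting hits."""
    return path.lower().endswith((".html", ".htm", ".pdf"))


entries = parse(SAMPLE_LOG)
hits = len(entries)
pages = sum(1 for e in entries if is_page(e["path"]))
print(f"{hits} log lines, {pages} page read")
```

All six lines share one address, which illustrates the second difficulty: nothing in the log says whether they came from one reader or from several people sharing a machine or a proxy.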
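The kind of summary such packages produce can be approximated in a few lines. The sketch below is not taken from any of the packages cited; it works on invented, already-parsed records and covers only some of the statistics listed, leaving out the busiest-period figures.

```python
from collections import Counter

# Invented, already-parsed records: (client host, URL, HTTP status, bytes sent).
RECORDS = [
    ("10.0.0.7", "/index.html", 200, 5120),
    ("10.0.0.7", "/img/logo.gif", 304, 0),
    ("10.0.0.9", "/index.html", 200, 5120),
    ("10.0.0.9", "/old/page.html", 404, 210),
    ("10.0.0.12", "/search.cgi", 500, 0),
    ("10.0.0.12", "/moved.html", 302, 0),
    ("10.0.0.7", "/papers/a1.pdf", 200, 204800),
]


def summarise(records, top_n=5):
    """Tally hits by status, URL and client host, plus the volume transferred."""
    statuses = Counter(status for _, _, status, _ in records)
    urls = Counter(url for _, url, _, _ in records)
    hosts = Counter(host for host, _, _, _ in records)
    return {
        "total_hits": len(records),
        "not_modified_304": statuses[304],
        "redirects_302": statuses[302],
        "not_found_404": statuses[404],
        "server_errors_500": statuses[500],
        "unique_urls": len(urls),
        "unique_hosts": len(hosts),
        "total_kilobytes": sum(size for *_, size in records) / 1024,
        "top_urls": urls.most_common(top_n),
        "top_hosts": hosts.most_common(top_n),
    }


summary = summarise(RECORDS)
for name, value in summary.items():
    print(name, value)
```

These figures describe the server's workload rather than what readers did with the collection.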
Wright describes techniques for grouping unidentified readers into "constituencies", based on their usage patterns (Wright, 1999). These constituencies, such as robot checkers, users checking the What's New page, new users, or demonstrators, are identified by analysing the server's log files and can then be used to observe navigation of the site and spot usability problems.
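A rule-based grouping of this kind might look like the sketch below. The rules, paths, user-agent strings and sample requests are all hypothetical and are not Wright's actual criteria; they only illustrate assigning each unidentified host to a constituency from the pattern of pages it requests.

```python
from collections import defaultdict

# Invented page requests: (client host, user agent, path).
REQUESTS = [
    ("10.0.0.7", "Mozilla/4.0", "/whatsnew.html"),
    ("10.0.0.9", "LinkChecker/1.0 robot", "/index.html"),
    ("10.0.0.9", "LinkChecker/1.0 robot", "/journals/index.html"),
    ("10.0.0.12", "Mozilla/4.0", "/index.html"),
    ("10.0.0.12", "Mozilla/4.0", "/help/getting-started.html"),
    ("10.0.0.15", "Mozilla/4.0", "/journals/index.html"),
    ("10.0.0.15", "Mozilla/4.0", "/search.cgi"),
]


def constituency(agent, paths):
    """Hypothetical rules, checked in order; the first match wins."""
    if "robot" in agent.lower():
        return "robot checker"
    if set(paths) == {"/whatsnew.html"}:
        return "What's New reader"
    if any(p.startswith("/help/") for p in paths):
        return "new user"
    return "other"


def group_hosts(requests):
    """Collect each host's requests, then assign the host to one constituency."""
    paths_by_host = defaultdict(list)
    agent_by_host = {}
    for host, agent, path in requests:
        paths_by_host[host].append(path)
        agent_by_host[host] = agent
    groups = defaultdict(list)
    for host, paths in paths_by_host.items():
        groups[constituency(agent_by_host[host], paths)].append(host)
    return dict(groups)


groups = group_hosts(REQUESTS)
for name, hosts in groups.items():
    print(name, hosts)
```

Once hosts are grouped, the navigation paths of one constituency, such as new users, can be examined separately for usability problems.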
The BT Digital Library adapted Wright's ideas in its own analysis of its log files. The purpose of the analysis is:
to understand who is using the Digital Library,
to track individual usage of the library, to enable personalisation and collaborative filtering,
to understand which library resources are being used and the extent of that usage, to inform renewal decisions,
to track which material is being requested through the document delivery system, to suggest additions to the collection, and
to ensure that usage of material is within the licenses agreed upon with publishers.
Perl scripts are used to extract meaningful usage data from the server's log, concentrating on the html pages and pdf files read and the server's cgi-scripts run, and ignoring less meaningful traces of usage. Accesses from robot checkers and from the library's own staff are excluded from the analysis. Weekly reports are prepared detailing:
the number of distinct IP addresses accessing the server (as a proxy for the number of individual users),
the number of users who logged into the server,
the number of searches done in each of the library databases,
the number of individuals searching each of the library databases,
the total number of online journal articles read from the library server,
the number of readers of the library's collection of online books,
the number of articles read from each of the online journals purchased through individual subscriptions,
the number of users accessing their SDI pages,
the number of users who subscribe to journals' tables of contents and the number of journals which have subscriptions to their tables of contents,
the number of users annotating database records and the number of annotations made.
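Two of the weekly measures listed above, distinct IP addresses and articles read, can be sketched as follows. The paper's own scripts were written in Perl and are not reproduced in it; this Python stand-in uses invented accesses, a hypothetical list of excluded addresses, and the assumptions that PDF files are articles and that cgi-scripts live under /cgi-bin/.

```python
from collections import defaultdict
from datetime import date

# Invented, already-parsed accesses: (date, client IP, path).
ACCESSES = [
    (date(1999, 3, 1), "10.0.0.7", "/journals/a1.pdf"),
    (date(1999, 3, 1), "10.0.0.7", "/img/logo.gif"),
    (date(1999, 3, 2), "10.0.0.9", "/journals/a1.pdf"),
    (date(1999, 3, 3), "10.0.0.99", "/journals/a2.pdf"),  # library staff machine
    (date(1999, 3, 9), "10.0.0.7", "/cgi-bin/search"),
    (date(1999, 3, 10), "10.0.0.12", "/journals/a3.pdf"),
    (date(1999, 3, 10), "10.0.0.12", "/index.html"),
]
EXCLUDED_HOSTS = {"10.0.0.99"}  # hypothetical staff and robot addresses


def is_meaningful(path):
    """Keep HTML pages, PDF files and cgi-scripts; ignore graphics and the like."""
    return path.endswith((".html", ".pdf")) or path.startswith("/cgi-bin/")


def weekly_report(accesses):
    """Return distinct IPs and PDF articles read, keyed by (ISO year, ISO week)."""
    weeks = defaultdict(lambda: {"hosts": set(), "articles": 0})
    for day, host, path in accesses:
        if host in EXCLUDED_HOSTS or not is_meaningful(path):
            continue
        year, week, _ = day.isocalendar()
        entry = weeks[(year, week)]
        entry["hosts"].add(host)
        if path.endswith(".pdf"):
            entry["articles"] += 1
    return {
        key: {"distinct_ips": len(v["hosts"]), "articles_read": v["articles"]}
        for key, v in sorted(weeks.items())
    }


report = weekly_report(ACCESSES)
for week, figures in report.items():
    print(week, figures)
```

The distinct-IP count is only a proxy for the number of readers, for the reasons given earlier about shared machines, proxies and dynamic addressing.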
20.4 Qualitative feedback
The Digital Library strenuously encourages user feedback, although it has not yet carried out formal user surveys. Less formal methods of obtaining feedback are used, such as user meetings and publicity events, e-mailing users when they have had their password reset, and user feedback links on the server.
20.5 Usage of the Digital Library
Journals
The BT Library offers 800 online journals to its user community, within the limits of the publishers' agreements. The collection of online journals is supplemented by a disintermediated document delivery service providing what might be called near-online journals. Articles not available online on the Library's server can be requested in a straightforward way and are usually delivered in two to three days.
A noticeable impact of the move from the physical to the digital library is in the distribution of the user base. In 1994 the library served the research community almost solely. In spite of current awareness bulletins, which were distributed throughout the company, and a document delivery service to supplement this, approximately 90% of the library's usage was from the Adastral Park site. By the end of 1998, when ACM, IEE, and IEEE journals became available on the server in addition to the in-house journals and a selection of titles from Elsevier, this figure had gone down to 61%. In 1999 the Library's collection was enhanced with the addition of material from ABI/Inform. Since then, the balance has shifted so that only 40% of users come from BT's Adastral Park site.
As part of this study, the usage of the 3,500 potential users at Adastral Park was examined. These users are readily traceable, because they use relatively static IP addressing, allowing a more detailed study of individual usage.
In 1999, 1,091 users from BT ACE read 9,108 journal articles from the digital library 12,919 times. (These figures exclude journals from the ACM Digital Library and from the selection of other journals available only on the publishers' sites, where usage data is not available.) In comparison, the library had 1,500 users registered for access to the physical library and lent fewer than 8,000 documents in the same period.
IEL
The IEEE's IEL product offers IEE and IEEE journals and conference proceedings in Adobe Acrobat format. These are received monthly and loaded onto the library server, so that data on usage are available for study. In the BT implementation, user registration is not necessary to access these publications, so user analysis is limited to studying access by IP address. Data on usage are available from November 1998, which is too short a period to do more than speculate on seasonal variations beyond noticing the obvious drop in readership at the Christmas/New Year period.
A mean of 39 users read IEL papers each week, with a minimum of 7 and a maximum of 56. These users read a mean of 11 papers each, with a minimum of 4 and a maximum of 38.
The IEL collection offers a typical usage pattern, with 80% of journal usage concentrated on 21% of the titles in the package.
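A concentration figure of this kind can be computed from per-title read counts. The sketch below uses invented counts for ten hypothetical titles, not the IEL data; it finds the share of titles, taken from the most read downwards, needed to account for a given share of all reads.

```python
def titles_share_for_usage(reads_per_title, usage_share=0.8):
    """Fraction of titles, most read first, needed to reach `usage_share` of reads."""
    counts = sorted(reads_per_title.values(), reverse=True)
    total = sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= usage_share * total:
            return n / len(counts)
    return 1.0


# Invented read counts for ten hypothetical titles (1,000 reads in total).
READS = {
    "Title A": 500, "Title B": 310, "Title C": 60, "Title D": 40,
    "Title E": 30, "Title F": 25, "Title G": 20, "Title H": 10,
    "Title I": 3, "Title J": 2,
}

share = titles_share_for_usage(READS)
print(f"{share:.0%} of titles account for 80% of reads")
```

In this invented sample two of the ten titles carry 81% of the reads, so the function returns 0.2.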
ABI/Inform
ABI/Inform offers a wide range of management and trade journals. Some of these, such as Harvard Business Review or trade journals in the telecoms area, are of key interest to the library's user base. ABI/Inform has been fully available through the Digital Library for two months, which restricts the possibility for usage analysis.
During this limited period, a mean of 97 users a week have read ABI/Inform papers, with a minimum of 25 and a maximum of 214. These people have read 3 papers each, with a range from 2 to 6 papers read per user during this limited period.
Elsevier
The Digital Library holds a collection of more than 20 Elsevier titles online. Unlike ACM, IEL, and ABI/Inform, which offer packages of publications, the Elsevier collection is based on the set of journals the library took in paper form. These were selected as being core journals, based on library usage and on the library's understanding of BT's research interests. Because access to these journals can be more easily recorded than that to those in paper format, usage patterns can be tracked more easily, advising the librarian which titles are no longer appropriate to hold in the library's collection. In addition, the library's document delivery system records the publishers of documents requested, allowing the librarian to monitor new titles for inclusion. Three titles are heavily used, but even these are not accessed at all in 30% of the weeks. BT's recent decision to stop research in the speech processing area is reflected by a sharp drop in accesses to journals in these areas. In between these two extremes are the majority of journals that are used sporadically. In spite of the additional data on how frequently these journals are used, collection management is still difficult because the masses of data now available make it challenging to extract meaningful information.
Books
The Library has made a limited start to providing online books to its user base. Twenty-four computing books from O'Reilly on Perl, Java, Unix, and networking were made available in 1999. An average of 87 users a week access one of the O'Reilly books online, with the number of users ranging from 8 to 179 in any one week. These books, serving as reference books for problem-solving as well as textbooks, are ideal for online publishing. They have certainly produced the most positive unsolicited feedback.
20.6 Conclusion
The corporate librarian is pressed to demonstrate his or her value to the parent organisation, leading to efforts to reduce costs and improve library usage. In BT's case, these pressures resulted in outsourcing labour-intensive activities, such as document delivery, and in replacing paper-based publications with online versions. Susan Rosenblatt is reported as commenting that "available information drives patterns of usage" (Odlyzko, 1997a). BT's experience bears this out. Making more information available on the intranet increases library usage both by local users choosing online access in preference to using the library in person and by remote users who previously had no practical means of access.