
    16. The Columbia University Evaluation Study of Online Book Use[†]

    16.1 Introduction

    This paper reports some observations about cost, use, and users of online books during the Columbia experiment. From winter 1995 to autumn 1999, the Online Books Evaluation Project at Columbia University explored the potential for online books to become significant resources in the academic world. The project prepared books in HTML format, a choice that seemed reasonable at the time it was made (1995); later observation of user behavior has made us less certain of it. The evaluation component of the project included monitoring of the national technological environment. The project analyzed (1) the Columbia community's adoption of and reaction to online books, (2) the relative life cycle costs of producing and owning online books and their print counterparts, and (3) the implications of traditions in scholarly communications and publishing. The experience involved integrating two very different cultures, and it has taught us the relevance of the following joke.

    A manager, an engineer and a computer scientist are all traveling in a car in the mountains when the brakes fail and the car careens down the road and eventually stops just hanging over the edge of a cliff. They carefully climb out of the car and the manager says, "Well, now we'll have to form a focus team for a matrix review of vision and objectives." The engineer says, "Let me have a screwdriver; I may be able to fix this in 10 minutes." And the computer scientist says, "Let's push this back up to the top of the hill and see if the brakes fail again."

    Our approach to online books at Columbia was like that of the engineer, but "10 minutes" has been more like four years. One of the lessons learned is that, as libraries become more interdependent with online information services, we must become more accustomed to the kind of trial-and-error approach exhibited in the joke.

    Figure 16.1: Abstract variables of interest

    We started from an abstract formulation of the relation between users, libraries and constraints upon each of them; see Figure 16.1. Our goal is to understand the behavior of the user of the system, shown in the middle of the diagram. The capabilities of the individual users obviously influence their behavior. Their disciplines also probably influence it. The overall environment including technology and attitudes toward computers influences it. The resources available in the library also influence behavior. In turn, library management controls those resources. Our study is an effort to insert the dotted line shown in Figure 16.1, to provide management with feedback about the behavior of users which it can use to better manage the library resources.

    16.2 Economic Perspective

    We take an economic perspective on the complex problem of establishing an online books service at an existing major research library. We presume that the actors involved weigh the costs and benefits of various alternatives available to them. Each applies some kind of personal utility function to those costs and benefits, and chooses the action with the largest personal utility. In this complex setting there are many different kinds of economic actors: students, faculty, and staff. Indeed, the library itself, and even the entire university, can be thought of as "actors."

    Individual Economic Factors

    User Costs. What are some of the forces affecting individuals? First, there are costs of two kinds, capital and continuing. The capital costs are the cost of the equipment needed to use the digital library or online books, and the cost of acquiring the needed skills. Since, in the setting of our project, no funds are transferred from users to the library when they use it, the continuing costs are (a) the cost of connecting to the library and (b) the mental costs or effort associated with use. Little is known about these costs to the user at this point. However, in the transition from page-based books to the HTML format that we chose, we sense that certain mental landmarks that readers have developed over years of working with print on paper are removed. It seems likely that this results in additional mental cost to users.

    User Benefits. There are also benefits to the users. First among these, of course, is ubiquity of access. Our system also provided a search capability. In addition, bookmarking (supported through the browser) lets users store pointers to important locations within an extended text. Our system did not directly support annotations, but annotations can of course be kept on the users' own computers. Finally, using a system like this provides the intangible benefit of being up to date relative to one's peers.

    Beyond all this, having and using a digital library provides symbolic utility. Symbolic utility is a concept introduced by the philosopher Robert Nozick to represent the utility assigned to something good to have or to do, even if it doesn't necessarily "work" (Nozick, 1993, p. 226). In this case, members of the Columbia community may have felt proud about contributing early to the development of digital library systems.

    Given the nature and variety of benefits, it seems probable that they outweigh the costs to individual users.

    Staff Economic Factors

    Staff Costs. Economic forces also affect the staff of any library that introduces a substantial digital component. One important cost is the learning curve, representing costs that must be incurred in order to get the system to work. The other, which is becoming a pervasive feature of the library world today, is the cost of continuous change, which involves not only learning but psychological stress as well. Because some older librarians did not foresee and do not enjoy those stresses, we will probably see a gradual change in the psychological profile of the profession.

    Staff Benefits. Among benefits to the staff, the first and most important is the ability to provide better service to patrons. Another important benefit is the ability, in the online books or digital library setting, to adapt materials developed by others. In focus groups conducted at New York University, we heard librarians report, for the first time, that they were pleased to be able to develop web resources in which pointers to resources developed by librarians at other institutions played a major role. A final benefit to staff is that, by working in the digital environment, they develop skills that are much more portable than traditional library skills.

    Library Economic Factors

    Library Costs. There are several incremental costs to the library in implementing this project. The most notable are costs of equipment, materials development, and training.

    Library Benefits. Among the important library benefits are the contribution to the competitiveness of the university and the contribution that digitizing the library makes to the shared professional goals of growth and service.

    Publisher Economic Factors

    Publishers must consider the potential of electronic books in terms of their business plans and goals. While publishers share some objectives with libraries, authors, and readers, the relationship is sometimes antagonistic, because some portion of the price to readers and libraries goes to publishers rather than authors. We presume that for-profit publishers seek to maximize profit, while non-profit publishers seek to maximize the net of income over expenses attributable to each book.

    16.3 Issues Affecting the Design of the Studies

    Based on the economic framework above, we studied the environment, publishing costs, and library costs. We explored various views on the function and design of online books. We conducted numerous and diverse studies of use and of user preferences. In this chapter we summarize some of what we've learned and discuss implications for the future.

    In concrete terms, the Columbia University Online Books Evaluation Project repackaged books for online delivery, studied the use of those books, and estimated the costs for publishers and libraries of providing print and online books. There were four publishing partners in the project: Columbia University Press, Oxford University Press, Garland Publishing, and Simon and Schuster Higher Education. We analyzed the costs of development and delivery, and the use of digital texts. We sought to relate those costs to that use, within the context of university library service, and to the potential for service. Our analysis of the potential for service is by no means complete.

    In the remainder of this section we review several considerations that framed our studies and motivated particular choices we made in the design of the studies.

    Why put books online?

    Online books have several advantages. First, we anticipate that online books will be cheaper to produce, to purchase, to acquire, and to maintain. We also expect that online books will provide increased functionality such as searching and linking. They offer obvious potential for enriched content through the addition of links to multimedia, computer simulations, and other features. There is also potential for developing expanded products, rather like a collection of books linked through a web site. Not least in importance, online books can provide availability around the clock and calendar.

    Why not put books online?

    The issue of whether to put books online at all was a serious one in 1994 and 1995, when the project was planned and launched. At that time, it seemed that the most important negative point was users' objections to reading books online. We do not know how true this is in the year 2000. There are definitely many who do not want to read books online, but we must entertain the possibility that most of those users are of an older generation, and will eventually be replaced by people who do want to read online.

    Usability. When the project began, it was anticipated that online books would be difficult to use; at that time (1995), it was not even apparent that Web technology would be easy to use. We were also concerned that there was no feasible market model for the development of online books. We cannot say today that there is a clearly defined market, but the activities of netLibrary, Questia, and other commercial online academic libraries show that there are multiple possible paths into the market for scholarly books, aimed at libraries and at students, respectively.

    Accessibility. We were concerned about the adequacy of access and connectivity. We had in mind, primarily, people working at home, connecting over telephone lines with top speed of 14.4 kilobits per second, which seemed likely to be inadequate. However, about halfway through the project the typical home-access speed had moved up to about 56 kbps, and increases in access speed continue to occur.
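
    To make the bandwidth concern concrete, here is an idealized transfer-time calculation. The chapter size is a hypothetical value chosen for illustration (the project's file sizes are not given here), and the calculation assumes the line runs at its full nominal rate, which dial-up connections rarely achieved.

```python
# Hypothetical HTML chapter size; not a figure reported by the project.
CHAPTER_BYTES = 100 * 1024  # about 100 KB

def transfer_seconds(size_bytes: int, line_kbps: float) -> float:
    """Idealized download time at a nominal line rate (1 kbps = 1000 bits/s).

    Real dial-up throughput was usually lower, so treat the result as a lower bound.
    """
    bits = size_bytes * 8
    return bits / (line_kbps * 1000)

for kbps in (14.4, 56.0):
    print(f"{kbps:5.1f} kbps: {transfer_seconds(CHAPTER_BYTES, kbps):5.1f} s per chapter")
```

    Under these assumptions a chapter takes close to a minute at 14.4 kbps and roughly a quarter of a minute at 56 kbps, which illustrates why faster home connections mattered for reading online.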

    Production Cost. We were concerned that online books would be too costly to produce. In fact, we shall see that the production method we employed was relatively costly. Nonetheless, when compared with the total life cycle cost of paper, online production is something of a bargain.[1]

    Author Interests. We also believed that authors might oppose the presentation of their books in online form. Authors might fear loss of royalties and object to aesthetic compromises: HTML is a limited rendering language, and conversion to HTML might lose some important aspect of layout. Also important is a fear, on the part of young scholars, that exclusive online publication might become common for first-time authors, and that this would demean their works and lessen their chances for career advancement. On the other hand, many academic authors are concerned with documenting the impact of their works and the extent to which they are being read, and the online environment is ideal for this.

    National Environment: Access

    Reviewing the environment for online books from 1995 to 1999, we see a number of changes. First is the improved price-to-power ratio for personal computers, discussed further below. Internet use reached more than 50% of all USA households by 1999, and by the same year half of all adults in the USA were Internet users. There was little improvement in Internet service provider pricing between 1997 and 1999. Hand-held book readers emerged in 1998, and some of our focus group work suggests that such devices will be important in the future growth of the online book market.

    National Environment: Computer Pricing
    Figure 16.2: Prices do not follow Moore's Law (impressionistic)

    Moore's famous law is that computing power at a given price doubles every 18 months. The inverse formulation is that the cost of a given amount of computing power falls by half every 18 months. However, the corollary that consumer prices for computers fall at the same rate does not hold. Starting from a base price of about $4000 for an adequate computer, we would have expected the price to drop well below $1000 by the end of our study. What we actually saw, by tracking advertisements for entry-level computers, is that prices dropped fairly rapidly to around $2000 and held there for some time. Toward the end of the study period they broke through to a new level of about $1000. Apparently the manufacturers' strategy was to identify price points acceptable to consumers and to improve the configuration of the computers rather than cut the price below those points. If Moore's law held strictly, a general-purpose computer adequate for using online books over 56 kbps lines should now cost only about $300.
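
    A minimal sketch of the arithmetic behind the last sentence, assuming a strict halving of the $4000 starting price every 18 months. It is an idealized projection, not a model of actual prices, and the function names are illustrative.

```python
import math

START_PRICE = 4000.0   # entry-level price at the start of the study (from the text)
HALVING_MONTHS = 18.0  # halving period stated in the text

def moore_price(months: float) -> float:
    """Price of a fixed amount of computing if it halved every 18 months."""
    return START_PRICE * 0.5 ** (months / HALVING_MONTHS)

def months_to_reach(target: float) -> float:
    """Months after the start at which the idealized price falls to `target`."""
    return HALVING_MONTHS * math.log2(START_PRICE / target)

for years in (1.5, 3.0, 4.5, 5.5):
    print(f"after {years} years: ${moore_price(years * 12):,.0f}")
print(f"$300 is reached after about {months_to_reach(300) / 12:.1f} years")
```

    Under this idealized rule the price passes $1000 after three years and reaches about $300 after roughly five and a half years (about 67 months).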

    Local Columbia Environment

    The local environment at Columbia for online books changed substantially between 1995 and 1999. By the end of this period every building and dormitory had Ethernet connectivity. By 1997 (the last year in which we could justify the cost of a survey asking the question), 80% of students and faculty had adequate access to a networked computer. In 1997, most library users reported an average of six hours per week of online activity of all kinds. That works out to nearly an hour a day, and we estimate that this figure has since at least doubled when averaged over the entire community.

    By spring 1999, online use of complete texts had become common at Columbia. For example, the level of JSTOR use was equal to one use per month per potential user, on average. We found that most online book use was from on-campus computers. This is consistent with a concern that access from home might not be adequate. It is quite possible that, as bandwidth to the home increases, the usage of online books will increase further.

    16.4 Cost data

    We developed a variety of sources of data in the online books evaluation project. We conducted surveys online, by mail, by telephone and in class. We also conducted individual in-person and telephone interviews of scholars and a number of focus groups involving users, potential users, and librarians. In this report, we focus on cost analyses and on Web data.[2]

    In a traditional print production environment, preparing texts for online access incurs an additional cost. We found a remarkably wide range of estimates for this cost, from four cents per page to more than $2.00 per page, which works out to approximately $100 to $1000 per title. The range reflects the enormous variation in the format and quality of the source files publishers supplied at the time, and in the conversion processes employed by various projects. Achieving the low-end cost requires a very standard, well-behaved PostScript source file. In addition, these figures include some unknown component of experimentation cost, as this project and others adapted to variations in input and in desired presentation format.

    Table 16.1: Sample e-book production costs ($)
    Cost            Item
    1.51/page       Conversion: OCR or SGML to HTML
    1.00/page       Conversion: ASCII to HTML
    0.04/page       Conversion: PostScript to PDF
    20.00/title     Conversion management
    1.00/MB/year    Server maintenance

    In Table 16.1 we present some sample electronic book production costs. One conversion route goes from OCR output (or from SGML) to HTML; the other, somewhat less expensive, route begins with ASCII and goes to HTML. Conversion from PostScript to PDF is done using software from Adobe and costs about four cents per page. Note that this process, which has been tested at the University of Pennsylvania, does not yield fully navigable HTML files; it yields PDF output only. Management of conversion is estimated to have cost about $20 per title at Columbia. Keeping books on the server is becoming steadily less expensive; by the end of 1999 it was estimated to cost about $1 per megabyte per year.
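
    The rates in Table 16.1 can be combined into a rough per-title estimate. This is a sketch under stated assumptions: the page count, file size, and years online are hypothetical, server upkeep is left undiscounted, and the route labels are shorthand for the rows of the table.

```python
# Per-unit rates from Table 16.1, in US dollars.
RATE_PER_PAGE = {
    "OCR/SGML to HTML": 1.51,
    "ASCII to HTML": 1.00,
    "PostScript to PDF": 0.04,
}
MANAGEMENT_PER_TITLE = 20.00
SERVER_PER_MB_YEAR = 1.00

def title_cost(route: str, pages: int, megabytes: float, years: int) -> float:
    """Conversion plus management, plus undiscounted server upkeep for `years` years."""
    conversion = RATE_PER_PAGE[route] * pages
    upkeep = SERVER_PER_MB_YEAR * megabytes * years
    return round(conversion + MANAGEMENT_PER_TITLE + upkeep, 2)

# Hypothetical title: 300 pages, 1.5 MB of files, kept online for 5 years.
for route in RATE_PER_PAGE:
    print(f"{route:18} {title_cost(route, pages=300, megabytes=1.5, years=5):8.2f}")
```

    With these made-up inputs the HTML routes come to several hundred dollars per title, dominated by the per-page conversion rate, while the PDF route is dominated by the fixed management charge.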

    A fully electronic production process (bypassing print) would be less expensive. Through conversations with scholarly publishers, we have estimated that moving to an online-only format, without paper, would save about 10% at the plant (that is, in typesetting costs) and perhaps an additional 15% in avoided costs for paper, printing, and binding. Costs associated with warehousing and shipping would also be eliminated; we did not attempt to estimate them.

    On the other hand, there are offsets to these savings for online production. They include costs of customer service, continuing file maintenance, and migration. These latter, archival, functions are very important. A rational economic publisher will only maintain the file for a book as long as the discounted total expected future revenue from sales exceeds the total discounted projected cost of keeping the file. Thus, libraries cannot rely on publishers to maintain the files of books with very low demand, unless they are willing to pay service fees that cover the publishers' expenses.
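
    The retention rule in the preceding paragraph can be sketched as a comparison of two present values, re-evaluated each year on the remaining future. The revenue and upkeep streams below are hypothetical, and treating amounts as arriving at year end is an assumption made for simplicity.

```python
def present_value(amounts, rate):
    """Discount year-end amounts (year 1, 2, ...) back to the present."""
    return sum(a / (1 + rate) ** year for year, a in enumerate(amounts, start=1))

def keep_file(expected_revenue, upkeep_cost, rate):
    """Keep the file only while discounted expected revenue exceeds discounted upkeep."""
    return present_value(expected_revenue, rate) > present_value(upkeep_cost, rate)

def drop_year(expected_revenue, upkeep_cost, rate):
    """First year in which the remaining future no longer justifies keeping the file."""
    for year in range(len(expected_revenue)):
        if not keep_file(expected_revenue[year:], upkeep_cost[year:], rate):
            return year
    return None  # worth keeping for the whole horizon

# Hypothetical backlist title: revenue halves each year, upkeep is flat.
revenue = [100.0 * 0.5 ** t for t in range(10)]   # 100, 50, 25, ...
upkeep = [10.0] * 10
print(drop_year(revenue, upkeep, rate=0.05))
```

    With these numbers the file is worth keeping at first but is dropped once shrinking sales no longer cover upkeep, which is exactly the situation in which a library would need to pay a service fee or take over the file.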

    From our review of the literature we have prepared an estimate of the life cycle costs to the library of online and paper books. Projected over a thirty-year life cycle and discounted at a 5 percent real cost of money, these costs are lower for online books; our summary is shown in Table 16.2. The difference arises essentially from avoiding the costs of managing circulation, partly offset by higher storage and maintenance costs for online books. In addition, long-run costs for online books would likely be considerably lower, because copy cataloging would replace the original cataloging performed for this project (and included in these costs). Original cataloging costs about $25 per title, while copy cataloging would cost significantly less.
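
    The discounting behind Table 16.2 can be illustrated with a standard annuity factor, which converts a level yearly cost into a present value. This is a sketch: the text does not say how each line of the table was built up, so treating any line as a level annual stream is an assumption made only for illustration.

```python
RATE = 0.05   # real discount rate stated in the text
YEARS = 30    # life-cycle length stated in the text

def annuity_factor(rate: float, years: int) -> float:
    """Present value of $1 paid at the end of each year for `years` years."""
    return sum(1 / (1 + rate) ** t for t in range(1, years + 1))

# Discounted life-cycle totals from Table 16.2 (dollars per title).
print_total = 47.00 + 14.00 + 44.00   # acquisition + storage + circulation
online_total = 39.00 + 38.00          # circulation is folded into the other lines
factor = annuity_factor(RATE, YEARS)

print(f"30-year annuity factor at 5%: {factor:.2f}")
print(f"print {print_total:.2f}, online {online_total:.2f}, difference {print_total - online_total:.2f}")
# Hypothetical reading: print storage treated as a level yearly expense.
print(f"implied yearly print storage cost: {14.00 / factor:.2f}")
```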

    Table 16.2: Estimated life cycle costs ($)
                               Print     Online
    Acquisition/Processing     47.00      39.00
    Storage/Maintenance        14.00      38.00
    Circulation                44.00      (included above)
    TOTAL                     105.00      77.00
    NOTE: Calculated over a 30-year life, at a 5% discount rate.

    Design Considerations: Librarians

    We conducted focus groups with librarians to identify market and design features that they consider important in building a collection of online books. The first feature emerging is the ability to search across selected groups of titles. A second, rather technical issue is the existence of "stable, granular" URLs. Stable means that the URLs remain the same over time, or at least that the system does not have to be manually updated. Granular has to do with the level of specificity with which a user can access a book. In the Columbia approach to online books, an individual file corresponds to a chapter within a book. We found that librarians want good bibliographic control of online books, with direct linking from the catalog into the book. But they would also like to see usage data on individual titles in some standard form. This usage data can feed back to rationalize online book acquisition policies. Finally, librarians want to be assured that an online book system will support reliable migration to new platforms.

    Design Considerations: Scholars

    Both in-depth interviews and focus groups with scholars generated a somewhat different list of desired design features. Scholars would like to be able to move directly into the online book via a direct link from the online catalog. They would like to be able to define groupings of texts on the fly, and search across that collection of texts. They would like a comprehensive and detailed table of contents, with direct linking into the book (providing, in effect, analytic indexing). When images are a significant part of the text they would like to see browsable, linked, thumbnail images. They would like screens and displays supporting the ability to show two nonconsecutive pages at once, permitting comparisons. They would like to be able to see footnotes and text in parallel on the same screen, even if the "footnotes" are actually endnotes. They would also like to see pagination matching the print version, not only for navigational bearings, but also because, frequently, the citation that led them to a book specified a particular page.

    Scholars would prefer that, whenever the collection contains the relevant material, references be hyperlinked directly into the cited material. They would also like to be able to link to a dictionary. They would like to be able to adjust fonts and formats for easier reading on screen. They would like to have annotation and highlighting capability that they could store with the book. They also expressed an interest in having the ability to share annotations on a single text.

    16.5 Study of Users

    The remainder of this paper discusses some of what we have learned about the users. The first interesting point is a relation among technology, behavior, and attitudes. We expected that the technology, as it grew, would influence the attitudes of scholars, both faculty and students, which in turn would influence their behavior. However, we tracked attitudes carefully over the entire study and saw only the smallest movement toward believing that online books are a better way to do one's scholarly work. This leads us to conclude that technology in fact influences behavior directly, and that attitudes simply have to catch up. This may mean that scholars are moved to technology by a subliminal perception of benefits that they cannot articulate. On the other hand, it may mean that fashions in scholarly behavior are simply no more rational than any other kind of fashion.

    Analysis of individual use

    A key innovation in the Columbia online books project was the introduction, in 1997, of the ability to identify the activity of unique users. This was a fortunate byproduct of the security system, developed to permit people to read online books from home. To maintain confidentiality of the users, system analysts replaced the identities of individual users with uninformative labels.
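
    The chapter does not say how the analysts generated their labels. One common way to do this, shown here purely as an assumed illustration and not as the project's method, is a keyed hash (HMAC) of each user ID: the same person always receives the same label, so sessions can still be linked, but without the secret key the label cannot be traced back to the ID, even by trying candidate IDs. The key, IDs, and log paths below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the system analysts; never published with the data.
SECRET_KEY = b"replace-with-a-long-random-secret"

def pseudonym(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a real user ID to a stable, uninformative label (keyed SHA-256)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "u_" + digest[:16]

# Hypothetical log entries: (user ID, requested resource).
log = [("jdoe12", "/oed/entry/42"), ("asmith3", "/books/ch01.html"), ("jdoe12", "/oed/entry/97")]
anonymized = [(pseudonym(uid), path) for uid, path in log]
```

    Because the labels are stable, the anonymized log can still be joined to demographic records that were labeled with the same key.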

    Table 16.3: Status of users at time of first use
    Status                   Frequency    Percent
    Undergraduate Student         2088       58.0
    Other                          607       16.9
    Missing                        328        9.1
    Graduate Student               295        8.2
    Other Student                  145        4.0
    Faculty                        136        3.8
    TOTAL                         3599      100.0

    With anonymity ensured, we were permitted to link usage to administrative files containing demographic information about the users. Typical results are those shown in Table 16.3, reporting the distribution of the status of individual users at the time they first used a particular resource. The resource in this case was the online version of the Oxford English Dictionary. While we had a number of reference works available online and, by the close of the project, close to 200 books in online form, the total usage of the OED represented approximately 50 percent of all online usage, and so it is used here to illustrate the types of analyses that we performed.

    N Stem Leaves
    1049.00 0 00000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111
    491.00 0 22222222222222222222222222333333333333333333
    265.00 0 444444444444455555555555
    202.00 0 666666666667777777
    156.00 0 88888889999999
    140.00 1 0000000111111
    92.00 1 22223333
    102.00 1 4444455555
    86.00 1 66667777
    62.00 1 888999
    68.00 2 0000111
    49.00 2 22233
    57.00 2 44455
    48.00 2 6677
    38.00 2 899
    38.00 3 001
    41.00 3 2233
    29.00 3 45
    35.00 3 667
    32.00 3 899
    34.00 4 011
    25.00 4 23
    18.00 4 45
    20.00 4 67
    16.00 4 89
    11.00 5 0&
    Figure 16.3: Stem and Leaf diagram of time spent viewing the OED
    NOTE: This figure is a stem-and-leaf diagram that represents a histogram of use. Rotated 90 degrees to the left, it looks like an ordinary histogram, but it contains more detail, because values are represented by digits rather than marks. The value of an observation in minutes of usage is 10*Stem+Leaf. Each leaf digit summarizes a group of observations rather than a single one (about 11, judging from the ratio of N to the number of leaves in the fuller rows), and "&" marks a fractional leaf. The last row represents fifty or more minutes of usage. The value in the first column (N) is the total number of observations summarized in that row.
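
    Reading values off the diagram follows the 10*Stem+Leaf rule in the note. The small sketch below decodes one row and, as a cross-check, divides a row's N by its number of leaves; for the fuller rows this ratio is roughly 10 to 12, which is why each leaf should be read as a group of observations. The helper names are hypothetical.

```python
def decode_row(stem: int, leaves: str, stem_width: int = 10) -> list:
    """Minute values encoded in one stem-and-leaf row: stem_width * stem + leaf."""
    return [stem * stem_width + int(leaf) for leaf in leaves if leaf.isdigit()]

def cases_per_leaf(n: float, leaves: str) -> float:
    """Average number of observations each leaf digit stands for in a row."""
    return n / sum(1 for leaf in leaves if leaf.isdigit())

print(decode_row(1, "22223333"))       # the N = 92 row of Figure 16.3
print(cases_per_leaf(92, "22223333"))
```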

    Nearly 3,600 individuals used the OED during the study period (3,599 in Table 16.3). Just over 2,000 of them were undergraduate students at the time of first use, nearly 300 were graduate students, and close to 140 were faculty members.

    We analyzed the ways in which individual users used the resource. To do this we adopted the rule that an inactive period of 15 minutes or more marks the end of a session. This rule is based on detailed analysis, which showed a natural break, somewhere around 10 to 15 minutes, in the distribution (over all users) of the interval between "clicks." We interpret this to mean that continuation of a session across a break of this duration is a rare event, which we can safely ignore. We also studied the total amount of use that individuals made of specific resources, as illustrated by data on the OED. The mode (that is, the most common) number of clicks that an individual user made on the OED is between 2 and 3. Above that number, the number of clicks per person drops exponentially, at a rate such that the chance of going on to two more clicks is always about 2/3. (The chance of adding one more click is the square root of this number, or about 83%.)
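
    The 15-minute session rule can be sketched as follows for one user's click timestamps. The timestamps are hypothetical; a gap of exactly 15 minutes starts a new session, matching "15 minutes or more."

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=15)  # inactivity threshold stated in the text

def split_sessions(click_times):
    """Group one user's click timestamps into sessions separated by gaps of 15+ minutes."""
    sessions = []
    for t in sorted(click_times):
        if sessions and t - sessions[-1][-1] < SESSION_GAP:
            sessions[-1].append(t)   # gap under 15 minutes: same session
        else:
            sessions.append([t])     # first click, or a long enough pause: new session
    return sessions

# Hypothetical clicks by one user on one morning.
clicks = [datetime(1999, 3, 1, 10, 0), datetime(1999, 3, 1, 10, 4),
          datetime(1999, 3, 1, 10, 30), datetime(1999, 3, 1, 10, 44)]
print([len(s) for s in split_sessions(clicks)])
```

    Here the 26-minute pause after 10:04 starts a second session, while the 14-minute pause before 10:44 does not.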

    As shown in Figure 16.3, the time spent using the OED online follows an exponential distribution. This indicates that at any time in the course of using the OED an individual has a constant probability of just quitting and deciding never to use it again (roughly 100%-83%=17%).
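
    A constant per-click quitting probability is the memoryless property of a geometric distribution (the discrete counterpart of the exponential). A minimal sketch, assuming each additional click happens independently with the same probability:

```python
import math

def p_at_least(k: int, p_next: float) -> float:
    """Chance of at least k more clicks if each click is followed by another with probability p_next."""
    return p_next ** k

p_two_more = 2 / 3                 # "about 2/3", from the text
p_next = math.sqrt(p_two_more)     # implied chance of one more click
print(f"one more click: {p_next:.3f}; quit at any given click: {1 - p_next:.3f}")
print(f"with p_next = 0.83: two more {p_at_least(2, 0.83):.3f}, five more {p_at_least(5, 0.83):.3f}")
```

    The square root of 2/3 is about 0.816, so the 83% quoted above corresponds to a two-click probability slightly above 2/3 (0.83 squared is about 0.69); the figures agree within the rounding of "about 2/3."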

    This apparently exponential behavior is intriguing, and we pursued it in another way. Since we could track individual users anonymously, we could plot how much an individual used the resource against how long it had been since that individual's first use. If every adopter continued to use the resource at a steady rate, this graph would be roughly linear. We show the actual data for the OED (which had heavy use) in Figure 16.4.

    Figure 16.4: Scatter plot of total use against time since first use

    Figure 16.4 is a scatter plot. Each point represents one individual user. The y-coordinate of the point represents the number of sessions that an individual had with the OED and the x-coordinate represents the number of days since that individual first used the OED. The steep line represents the expected usage relationship if adopters continued to use the resource at a steady rate.[3] In fact, a regression analysis shows that the best fit is nearly horizontal, which indicates there is little ongoing use by individuals. It is apparent that many observations are not well-predicted by this model, and indeed, that some usage did persist.
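
    The regression mentioned above is an ordinary least-squares line through the scatter. A minimal sketch on hypothetical points (not the project's data) shows how a cloud of users with few sessions, regardless of tenure, yields a nearly horizontal fit:

```python
def ols_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical users: days since first use (x) and number of sessions (y).
days = [10, 50, 100, 200, 300, 400, 500, 600]
sessions = [1, 2, 1, 1, 3, 1, 2, 1]
intercept, slope = ols_fit(days, sessions)
print(f"intercept {intercept:.2f} sessions, slope {slope:.5f} sessions/day")
# For comparison, a steady rate of one session per week would give a slope of about 0.143.
```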

    We can plot these data in a more familiar form by showing the distribution of time since first use, without regard to the amount of use. We do so by projecting the preceding figure onto the horizontal axis; see Figure 16.5. As most researchers in academic settings have found before us, the semester is very easy to detect: each of the five peaks in this graph corresponds to an academic semester. There may be some cause for optimism in the fact that the leftmost peak, which represents the most recent surge in use (spring 1999), rises higher than any of the earlier ones. However, we do not know what to make of the fact that the peak before it (fall 1998) represents a drop from the preceding fall.

    Online Versus Paper: Usage Data

    Our data (based on comparison between the online book usage figures and data collected through circulation statistics and slips placed in corresponding reference titles in the library) suggest that online books were used more than their print counterparts. If we count circulation alone we find that there were about three times as many accesses per book online as for the paper version. After consultation with librarians we believe that a reasonable correction for in-house use is to increase circulation by 50%. This would reduce the ratio to twice as many online uses per book.
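
    The in-house correction works as a simple rescaling of the print side of the ratio, using the two figures given in the paragraph:

```python
circulation_ratio = 3.0   # online accesses per title vs. print circulations (from the text)
in_house_uplift = 0.50    # correction for in-library use suggested by librarians

# Each recorded circulation stands for 1.5 print uses once in-house use is counted.
print_uses_per_circulation = 1.0 + in_house_uplift
adjusted_ratio = circulation_ratio / print_uses_per_circulation
print(adjusted_ratio)  # 2.0
```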

    Figure 16.5: Histogram of time since first use
    NOTE: Height of each bar is the number of sessions logged by users starting the indicated number of days before data collection.

    We conjecture that higher usage of online books is due to their lower convenience costs compared with other access options. Having purchased a paper copy for the library does not ensure that the book is available: it might be in circulation or missing from the shelf, and if the library is closed the paper copy is not available at all. A common access option is the online public access catalog (OPAC). However, an OPAC does not support even the roughest form of browsing into a book until the book itself is put online. It provides so little information about a book that a scholar might not be aware that the book contains material relevant to his or her work; in that case, the mere ownership of the book by the library does not make it truly available. Catalog records enhanced with tables of contents and book indexes are a relatively new offering and a major asset to scholars locating books relevant to their research, but they do not eliminate the higher convenience costs of accessing the physical book at the library.

    Hence, the online access to a full book represents a quantum leap in the availability of the contents of that book, and, we believe, lowers the barriers to access for many modalities. Perhaps the only modality for which it is not clear that online access is preferable is "plain old reading at length."

    We were also interested in studying patterns of access when readers use online books. We have approached this in two different ways. One is essentially qualitative: we asked people in surveys and in interviews how they used online books. In doing that we were able to identify at least the following kinds of activity: browsing, grazing (that is, reading portions of text scattered through the book, punctuated by visits to the index or table of contents), citation checking, finding individual facts or quotations, reading on reserve for a course, determining the need for a paper copy, printing (that is, turning the online book into paper), and directly reading online.

    Because we can track individual users, we have also been able to break some new ground in quantitative analysis of how people use books online. Generally, each chapter is a separate file, and hence a separate entry in the web server log. Thus, by analyzing the sequence of clicks on chapters, we are able to distinguish several ways in which individuals use online books. The first style we characterize as linear use: an individual reads chapters of a book in exactly the order in which they appear in the printed volume. The second pattern is quasi-linear, in which the sections of the book are visited in some personalized order but each section is read once and only once. We also observe a pattern we call hyper-linear, in which sections are visited in an arbitrary order and some sections are visited more than once. Hyper-linear usage occurs about 12% of the time. See Figure 16.6.
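
    Given one reader's chapter sequence from the server log, the three patterns can be told apart mechanically. A minimal sketch; it assumes that skipping chapters still counts as linear as long as the printed order is respected and nothing is revisited, and the sample sequences are hypothetical.

```python
from collections import Counter

def classify_path(chapters):
    """Label the chapters one reader visited, in visit order, with a motion pattern."""
    if len(set(chapters)) < len(chapters):
        return "hyper-linear"   # some chapter visited more than once
    if chapters == sorted(chapters):
        return "linear"         # each chapter once, in printed order
    return "quasi-linear"       # each chapter once, in a personal order

# Hypothetical chapter sequences reconstructed from a server log.
paths = [[1, 2, 3], [2, 5, 4], [1, 3, 1, 6], [4, 5]]
print(Counter(classify_path(p) for p in paths))
```

    Tallying the labels over all sessions gives shares such as the 12% hyper-linear figure reported above.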

    Figure 16.6: Patterns of motion in online books
    Figure 16.7: Use of index in online books

    A use pattern may involve the index (or, more generally, search tools) in several ways; see Figure 16.7. In the first pattern, the reader uses a search tool once, at the outset, and then views portions of the book in some linear or quasi-linear order. In another, the reader uses the index, goes to a section, returns to the index, goes out to another section, and continues in this way. We do not yet know whether this is a natural behavior that evolves in the presence of online books or an artifact of the fact that returning to the index or search tool may be the easiest way to reach the next section. In thinking about these patterns of use, we may compare them to what a person might do with the book in hand, at the library shelf, or with access to the catalog in some online format.
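    The two index-use patterns can likewise be read off a session log if requests for the index or search tool are distinguished from requests for chapters. The following is a minimal Python sketch under that assumption: a hypothetical `INDEX` marker stands for an index request, integers stand for chapters, and the category labels are ours rather than the project's.

```python
from typing import Sequence, Union

INDEX = "index"  # hypothetical marker for a request to the index or search tool

Event = Union[int, str]  # a chapter number, or the INDEX marker


def index_pattern(events: Sequence[Event]) -> str:
    """Describe how one session uses the index or search tool."""
    index_positions = [i for i, e in enumerate(events) if e == INDEX]
    if not index_positions:
        return "no index use"
    if index_positions == [0]:
        # A single lookup at the outset, followed by reading sections.
        return "index once at outset"
    # Count index visits sandwiched between two section reads: the index
    # then serves as a hub for moving from one section to the next.
    hub_visits = sum(
        1
        for i in index_positions
        if 0 < i < len(events) - 1
        and events[i - 1] != INDEX
        and events[i + 1] != INDEX
    )
    if hub_visits >= 1:
        return "index as hub"
    return "other"
```

    A session such as `[INDEX, 3, INDEX, 5]` is classified as `"index as hub"`, the section-index-section alternation described above, while `[INDEX, 3, 4]` is `"index once at outset"`.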

    16.6 Economic Behavior

    Economic Behavior of Scholars

    Given our original framework, we would like to bring together everything we have learned to formulate an economic model of scholars' preferences among modalities of book access. We believe that one key variable is cost, which we characterize simply as low or high. (For the moment, let us take this to be the purchase price of the book as far as the scholar is concerned.) We propose that the other key variable is whether the scholar intends to read much or little of the book. We believe that, whether the book is cheap or expensive, if only a little of it is to be read, the scholar will prefer to get it online. Based on the data available to us during the span of this project, we believe that if much of the book is to be read, the scholar will prefer to get it in paper form: if the cost is low, the scholar will buy it, and if the cost is high, the scholar would like the library to buy it so that he or she can borrow it.
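    The two-variable model amounts to a small decision table. As a sketch only, with hypothetical enum and function names and with the outcomes taken directly from the preceding paragraph, it can be written in Python as:

```python
from enum import Enum


class Cost(Enum):
    LOW = "low"
    HIGH = "high"


class Reading(Enum):
    LITTLE = "little"
    MUCH = "much"


def preferred_access(cost: Cost, reading: Reading) -> str:
    """Scholar's preferred mode of access under the two-variable model."""
    if reading is Reading.LITTLE:
        # Cheap or expensive, a brief consultation is best served online.
        return "online"
    # Reading at length: paper is preferred, and price decides who buys it.
    if cost is Cost.LOW:
        return "buy paper copy"
    return "borrow library paper copy"
```

    Note that in this model cost matters only when the scholar intends to read at length; for brief consultation the predicted preference is online access either way.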

    In short, what we seem to find is that users want online books for convenient access and for assured availability. They also want online books for many of the purposes discussed above. They are particularly attracted by the added functionality of annotating and hyperlinking. Nonetheless, our results indicate that when scholars want to read books at length, they still want them in paper form.

    Economic Perspective of Librarians

    Complementing this analysis of when scholars will prefer online books, our focus group studies with librarians indicate that librarians want online versions of high-demand books (for example, instead of buying a second copy). Librarians also want online books to meet transient demand, rather than having to purchase additional copies that will go unused later. And, of course, librarians want online books for the anticipated cost savings.

    On the other hand, librarians are concerned about having to pay separately for the online version of a book that they already hold in paper. They are concerned about the uncertain preservation and migration of digital forms. They are also concerned about the inclusion of unwanted and unused material in bundled packages. While bundling in general can increase both consumer and producer benefit (Shapiro and Varian, 1999), librarians are particularly concerned with the flow of cash from the institution to publishers, and would like the finest possible control over the allocation of those funds so that they can avoid materials that are less in demand.

    Speculations on Marketing Strategies

    We have tried to speculate on library-oriented strategies for the introduction of online books. For example, one might imagine that online versions are made available at little or no additional cost to purchasers of paper copies. One might hope to see entire collections of online material priced very attractively. At the other end of the bundling spectrum, one might see some kind of on-demand licensing or on-demand print ordering. netLibrary offers yet another alternative by mimicking the circulation system for print books: it provides online books to individual libraries or library consortia and allows just one user at a time for each book. Other marketing strategies are more reader-oriented and less tailored to the concerns of a library. These include Questia's effort to build an online book collection the size of a college library (250,000 volumes) and to sell subscriptions to students. Another path is the hand-held device and downloadable book that are now coming to market. Generally speaking, in reader-oriented strategies the pricing of the electronic form will be unrelated to print purchase, as there is little chance that consumers can be persuaded to buy the same "book" twice.
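    The one-user-at-a-time rule behaves like a single print copy on loan. The Python sketch below is a hypothetical illustration of that rule only, not netLibrary's actual system; the class and method names are invented for the example.

```python
from typing import Iterable, Optional


class SingleUserLicense:
    """Hypothetical model: each licensed title can be read by one user at a time."""

    def __init__(self, titles: Iterable[str]) -> None:
        # Maps each licensed title to the user currently reading it, or None.
        self._holder: dict = {t: None for t in titles}

    def checkout(self, title: str, user: str) -> bool:
        """Try to open `title` for `user`; return False if someone else holds it."""
        if title not in self._holder:
            raise KeyError(f"title not licensed: {title}")
        if self._holder[title] is not None:
            return False  # like a print copy that is already on loan
        self._holder[title] = user
        return True

    def checkin(self, title: str, user: str) -> None:
        """Release `title` so that the next user can open it."""
        holder: Optional[str] = self._holder.get(title)
        if holder != user:
            raise ValueError(f"{user} does not currently hold {title}")
        self._holder[title] = None
```

    Under this rule a second reader is turned away until the first returns the title, which reproduces the scarcity of a single print copy in the online setting.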

    We speculate, but at this point can only ask, whether different strategies will emerge for different classes of print materials, such as textbooks, scholarly books, and narrow-interest (sometimes called endangered) scholarly books.

    At the end of our study, it appears that a number of transitional strategies are available or being developed. The leading one is the dual provision by publishers of publications in print and online. Among the virtues of this strategy is the possibility of electronic publication of both a backlist (the books that have been available for over a year) and a front list (the books newly published). Since publishers still need to protect ultimate paper sales, they may place some limits on the accessibility or functionality of new titles presented in front-list form.

    16.7 Concluding Unscientific Postscript

    Use of online books can be tracked at a micro level, providing valuable information for authors and publishers. In fact, scholarly authors must become concerned about these data since their advancement may depend on being able to document the degree to which their works are used, as well as the degree to which they are cited.

    Having studied the provision and use of online books for four years, we feel emboldened to make a few predictions. Because of cost, complex functionality will be reserved for books that have large sales or are developed in subsidized projects. We anticipate that endangered monographs will be available from academic or society servers, from sites like the Los Alamos Preprint site, or from the individual authors themselves. In other words, they won't be "published" as we understand the term today. Many books will appear in both electronic and print versions. Commercial enterprises or academic organizations, not library experiments, will define the product that eventually comes to dominate.


    This research has been supported by the Andrew W. Mellon Foundation and Columbia University. The views expressed herein are not necessarily those of that Foundation or the University. The first author acknowledges support from Columbia University through a contract with Tantalus Inc., SCILS, Rutgers University, the Fulbright Foundation, and Ragnar Nordlie and the Journalism, Library and Information Science Department of the Oslo University College, Norway. At Columbia the authors are indebted to many individuals in the Libraries, in Academic Information Systems, and in the academic departments for their participation, encouragement, and cooperation. Elaine Sloan, University Librarian, was critical to the formulation of the project and an insightful supporter. Walter Bourne, David Millman and Gordon Dahlquist were particularly important to the process of creating the online books and various online questionnaires. Lynn Jacobsen Rohrs was a key project participant as the analyst of the web server data. Kate Wittenberg of Columbia University Press, Leo Balk of Garland Press, and Ursula Bollini of Oxford University Press provided books from their presses and shared their insights into the publishing business and critical issues for our research. The authors thank the editors for careful revision of the manuscript.

    1. As we will hear during this conference, publishers are working hard to reduce those prices to make online books very competitive.

    2. The reader may visit the project web site: to review other studies and reports.

    3. This is a qualitative relationship: our prediction is merely that the relationship would be linear and rising, but the slope in the figure is arbitrary.