Institutional and Policy Issues in the Development of the Digital Library

An earlier version of this paper was presented at the International Conference on Scholarship and Technology in the Humanities, Elvetham Hall, England, April 1994.

The information revolution is electronic, digital, and networked. It brings the efficiency and economy of magnetic and optical media and character encoding. It brings "smart text" — machine-readable, linked, manipulable. And it brings ubiquity, the emancipation of text and image from points of origin and processing. These changes will radically transform many institutions, none more so than publishers and libraries.

In the networked environment, the pipeline model of publishing collapses. Authors can speak directly to readers. Publishers and libraries find themselves in the same business: providing access to information. Under the old model, publishers saw that books and journals were manufactured and physically delivered; libraries cataloged and archived books and journals from many publishers and made them available to one user at a time. In the new model, these classical functions, and the neat division of labor that characterized the pipeline model, disappear. Publishers traditionally evaluate, assemble, and integrate products. Libraries evaluate products, and assemble and integrate multipublisher environments. But is CompuServe a publisher or a library? It publishes its own material, mounts material provided by others, provides gateways for other publishers, and offers forums for user interaction. Is cable television a publishing operation or a library service? Are these simply contemporary equivalents of commercial lending libraries, offering access not by rentals but by the drink, by the hour, or by flat monthly fees?

Libraries traditionally represent the interests of end users, whether members of an academic community, employees of a company, or the general public. Commercial publishers, of course, represent the interests of their shareholders, but they must also meet the needs and desires of their customers, i.e., both authors and users. Nonprofit publishers, such as university and society presses, may claim to represent the entire community of users and producers, as well as the public interest in maximizing the dissemination of knowledge. The key difference is that libraries lack the strategic position in the distribution chain that publishers, commercial or noncommercial, have. Libraries are not inherently entrepreneurial because their users are captive. And although they are often an important part of the chain, their role is not exclusive, because many consumers of information buy directly from publishers or bookstores.

Like other institutions, both nonprofit publishers and libraries are captive to long-established policies and practices. In addition, their nonprofit structure makes it difficult for them to capitalize new modes of doing business. As an alternative, libraries characteristically look to cooperative cost-sharing or resource-sharing enterprises to spread the costs and risks of new activities. OCLC's Online Union Catalog and the Research Libraries Group's Research Libraries Information Network are classic examples of cooperative resource-sharing that date from the mainframe era. Other resource-sharing enterprises have arisen independent of libraries within specific academic communities: the Inter-university Consortium for Political and Social Research (ICPSR), ARTFL (The Project for American and French Research on the Treasury of the French Language), and Comserve (for Communications studies). All of these have been around for some time. Why, given the continually dropping costs of mounting electronic information, the growth of the Internet, and the increasing capabilities of academic users, are there not many more?

The Cost-Shared Journal

The Chicago Journal of Theoretical Computer Science, a peer-reviewed electronic journal under development by the MIT Press, provides a promising new model for applying cost-sharing and risk reduction to academic publishing. In return for an annual subscription fee of $125, libraries are licensed for unlimited use of the journal at their institution. The journal can be mounted locally or accessed over the Internet. It will be archived at MIT by the MIT Libraries so that subscribing libraries will not have the burden of archiving as they do with paper journals. The journal will take advantage of its electronic form by including executable computer code. The market for the Chicago Journal of Theoretical Computer Science is fairly clearly defined: perhaps 400 research libraries worldwide. Since there will be virtually no market for separate individual subscriptions, the economics are nearly transparent. There is no press run, no shipping cost, and no inventory to maintain. As for marketing and promotion, the Internet puts scholarly communities (and their libraries) in such virtual proximity that marketing costs are trivial. The opportunities to streamline the processes of solicitation, review, and editing promise to further reduce the costs of operating a scholarly journal.
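
The transparency of these economics can be illustrated with a rough calculation. The sketch below simply multiplies the two figures given above (a $125 annual fee and perhaps 400 subscribing libraries). The function name and the partial-uptake scenarios are illustrative assumptions, not figures from the MIT Press.

```python
# Illustrative arithmetic only. The fee and the market size come from the text;
# the uptake fractions below are hypothetical assumptions.
ANNUAL_FEE_USD = 125
POTENTIAL_LIBRARIES = 400


def annual_revenue(subscribers: int, fee: int = ANNUAL_FEE_USD) -> int:
    """Gross subscription revenue under a flat-fee institutional license."""
    return subscribers * fee


if __name__ == "__main__":
    for uptake in (0.25, 0.5, 1.0):  # hypothetical shares of the potential market
        subscribers = round(POTENTIAL_LIBRARIES * uptake)
        revenue = annual_revenue(subscribers)
        print(f"{uptake:.0%} uptake: {subscribers} libraries -> ${revenue:,} per year")
```

Even at full uptake the gross revenue would be $50,000 a year, which suggests the scale of editorial and managerial costs such a journal would need to fit within.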

The core managerial and editorial functions will remain. These include the often intangible front-end costs of attracting a prestigious advisory board, reviewers, and contributors. These initial costs will be greater than normal because electronic journals lack the acceptance and broad reach of print journals.

Ironically, the startup costs are increased by the very fact that early electronic publications have been distributed free of charge, often with limited or no peer review. While free distribution has much to commend it, it does not instill confidence that the journal, or the authors represented, will endure. The result is that the electronic journal has been stigmatized as an underfunded, technologically driven novelty — a periodic bulletin board.

The Chicago Journal of Theoretical Computer Science takes aim at the gulf between the free information and unmetered environment that characterizes the current Internet and the tightly metered world of online publishing. MIT Press must put effort and resources into selling the model, but that burden will diminish for those that follow. Basically, electronic scholarly publishing must be capitalized as an institution, although much of the required capital may take the form of prestige and other intangibles. The commitment must span the entire academic enterprise — faculty, libraries, computing services, and administration — encompassing tenure and promotion decisions, acquisition and licensing policies, and infrastructure development. And it must be a mutual commitment entered into by a critical mass of the most respected universities.

As the problem of global capitalization is solved, commitment to support individual electronic journals can and should be analogized to the periodical subscriptions that libraries have entered into for the past two centuries. The library pays a fixed amount of money for a reasonably fixed flow of information of known quality. This predictability helps simplify and rationalize the process of scholarly communications.

But while it is helpful and probably necessary to focus on the journal as an enterprise, the individual article is really the fundamental unit, and it is the citations and links between articles that define scholarly communications. Whereas print journals constrain the flow and format of information, digital technology and networks make it possible to quicken, enhance, and intelligently order information in new ways. The difference is as dramatic as the difference between the one-way analog channels of cable television and the packet-oriented environment of the Internet. For the near term, however, it is challenging enough to wrestle with accountability at a journal level of granularity.

Site Licensing for Cost Recovery

While digital technology provides opportunities for many different kinds of added value supported by greater accountability, other characteristics push the other direction. With marginal costs approaching zero, networked digital information behaves increasingly like a pure public good, suggesting that there may be consumer welfare loss associated with strict controls. Thus, the local site license (allowing unrestricted local use) looks highly desirable, if the licensor can be reasonably confident of controlling leakage from institution to institution.

This usually means circumscribing the community of authorized users more rigorously than simply those individuals with accounts on the institution's computers. Licensing practices for electronic journals could well follow definitions and security procedures that have been developed for commercial databases under institutional site license. Some may feel uncomfortable with this unless the license preserves universal access within the walls of the institution's library — so that the library does not have to erect internal barriers to access and use. More generally, the challenge is how to balance the need for accountability at an institutional level with options for occasional access by others. In some fields (including, quite possibly, theoretical computer science), there may be a relatively neat fit. Virtually all users would be expected to subscribe, especially if the scope of the enterprise is small (like a specialized journal) and the cost of unlimited local access is affordable. However, if the field is interdisciplinary or otherwise ill-defined, and as the enterprise grows to a library-like scale with commensurate costs, then the need to accommodate occasional users grows. Furthermore, the larger the enterprise the greater the costs of capitalization — and so the necessity of discriminating between the initial funders and those that buy in later when the enterprise has proved viable.

In some disciplines, the potential users will be very mixed. Whereas users of humanities journals and databases may be almost exclusively academics, users of medical resources will be an extremely heterogeneous lot: academic researchers, industry researchers, practicing professionals.... Among the professionals, some may be in lucrative private practice; others may work in adverse conditions in developing countries.

Under the traditional library model, such issues of market segmentation and/or equity are obscured. The library is a local service that is taken for granted, not a global service accountable to the many institutions that support it. Library use is without direct cost, because copyright law does not require accounting to rights holders for simple use and because both efficiency and equity militate against cost allocation based on use. In the United States, there is no public lending right, so that even loans for use outside the library require no accounting. There continues to be great ambivalence about accounting for inter-library loan, even though the recent Association of Research Libraries study shows per-transaction costs of greater than $30. Electronic publishing, by contrast, proceeds from a model in which access to information (whether direct or through an online vendor) is contracted for on a usage-sensitive basis.

Both the free loan/free use model and the metered use model have been modified to fit particular circumstances. Some libraries charge user fees for some services (such as access to online databases) where the library incurs direct costs. However, a number of online databases are made available to educational institutions on a library-like fixed-fee basis, in large measure for the purpose of building a future customer base.

Public libraries, in contrast, are seldom offered promotional pricing and must face the fundamental problem directly: How does the library as an institution maintain the principle of universal access in a networked environment where all transactions are increasingly contractual and accountable? What replaces the lending library, where those who cannot afford to buy a book can borrow it? Do we perpetuate the tradeoff between cost and convenience by creating electronic queues, where anybody can wait to get free information they could get instantly if they were able and willing to pay for it?

Defining the Digital Library

The term "digital library" has become widely used within the past five years, but there is considerable uncertainty about what it means. Nonetheless, it can be contrasted with conventional libraries in important respects:

The conventional library is local and generalized (even if it is focused on a specific field or discipline); the digital library is a unique, specialized, global resource.

The conventional library is supported as a line-item in an agency, institutional, or corporate budget; the digital library is supported by memberships, subscriptions, and service fees.

The print library is a cataloged repository of mass-produced physical objects; the digital library is a software-enabled environment of nearly uniquely located virtual objects.

The conventional print library differs fundamentally from the mass-produced objects it contains, which are global and specialized. The digital library is functionally similar to the electronic journal, although there may be significant differences of scale and editorial control. Both are specialized and globally centralized, like books and unlike conventional libraries.

Other characteristics of the digital library are less clear. Will it be an extension of publishing? Or is it a product of user needs to access information in a diverse multipublisher environment? To what extent will it be an extension of academic networking and computing infrastructure?

Quite possibly, the answer varies from discipline to discipline. Perhaps the humanities are at one end of the spectrum, and technical information, where most of the market is outside academia, is at the other end. University presses are heavy at the humanities end and weak in applied sciences and technical information. Will this be the natural state of affairs for digital libraries as well?

One hopes not. The university press model is fundamentally a sub-market model, in which the university subsidizes publishing that in general cannot be supported in the commercial marketplace. It contrasts with the role the universities have played in the development of the Internet, where, with help from the public sector, they have been in the vanguard. Indeed, the computer networks that give coherence and scope to the emerging information infrastructure of higher education and research have been driven in large part by uses in the sciences, including science applied to problems such as global climate change.

Two factors have been critical to this growth and leadership in infrastructure. One is the compelling economic and political case for resource-sharing, which led to NSF funding of access to computer science centers (CS-NET in 1979) and to supercomputers (NSFNET in 1986). The synergy between networking and resource-sharing has led to a proliferation of volunteered resources on the Internet, most of which are available at no charge, including the pre-print bulletin boards which have become so important to communicating new knowledge in high-energy physics and other sciences.

The other factor has been the remarkably close relationship between academic research in computer science and the development and implementation of TCP/IP-based computer networking with the Internet serving as a common testbed. Substantial federal funding for computer science provided a base of resident expertise and leverage for the development of an in-house infrastructure. As network users, computer scientists have had a strong professional interest in advancing networking technology and infrastructure.

More generally, the growth of distributed computing, personal computers, and the availability of inexpensive leased lines created important opportunities to develop academic infrastructure. Use of these technologies is characterized by fixed costs, which have made it easy to bring in other academic uses at the margin, i.e., virtually for free, provided that occasional congestion can be tolerated. This has been particularly evident at large research universities where network capacity has been driven by remote visualization and other high-bandwidth uses and it has been possible to achieve large economies of scale.

Control of Information

The remarkable strength of academic information infrastructure at the network level contrasts with remarkable weakness at the information level. Here universities find themselves having to buy back research information that their faculty and staff have generated, especially in the sciences. It is in this context, with journal costs rising rapidly, that interest in the electronic journal as a university-based alternative has been most intense. Preprint bulletin boards have sprung up to provide fast and, in some respects, nearly costless means of disseminating new research.

The basic problem is this: Once established as a critical channel of communications within a new field, a specialized journal can grow with the field, occupying it very effectively to the exclusion of the others. For potential entrants, the risks of starting a new journal are compounded by the risks of taking on a known journal of record. Secure from competition and owning the primary means of communication within the field, the journal which successfully occupies its niche has considerable latitude to raise prices, which it may rationalize by increasing its frequency or the number of pages per issue, further entrenching itself in the process.

From the academy's perspective, this problem is aggravated by the routine assignment of copyright by academic authors of journal articles. The loss of copyright to commercial publishers means that any alternatives to journal subscriptions (beyond interlibrary loans under "the rule of five") will be subject to copyright fees entirely within the discretion of the publisher. As institutions cancel subscriptions, publishers will rely on copyright fees to recover lost revenue. The fees are likely to become an increasingly visible element of interlibrary loan, document delivery service, and classroom copying ("course packs").

As we have seen, much of the value added by publishers in the case of the print journal is unnecessary to the electronic journal. Beyond the individual articles, the enduring value is added by the editor, who is likely to be an academic, and the reviewers, who are typically uncompensated academics and professionals. As publishing becomes network-enabled and less encumbered with costly physical processes, the opportunity for universities to repatriate and internalize the publishing process grows. (See the report of the Association of American Universities Task Force on a National Strategy for Managing Scientific and Technical Information, May 1994.)

However, there is presently no institutional framework to enable this transformation. In some respects, academic societies are positioned to do so in that they provide global services on a cooperative basis. But their resources are minuscule, and those that publish journals are often reluctant to disturb an established member benefit and source of income. For the most part, only the largest scientific societies have been able and willing to experiment with electronic publishing.

Recasting the Legal Issues

There has been much concern within the library community that publishers will seek to sidestep copyright provisions on fair use, including interlibrary loan practices, by licensing information under restrictive contracts. Networks facilitate contracting and avoid the problems of shrink-wrap licenses (i.e., the difficulty of establishing contracts concerning products that appear to have been sold outright). Copyright remains as a powerful means of dealing with uses that are not explicitly permitted under the contract. In fact, direct network delivery makes copyright more powerful than ever because the copyright holder's distribution and public display rights are not vitiated by the first sale doctrine. The first sale doctrine applies when copies are sold into commerce, but if instead access is licensed by the copyright holder, then those rights are enforceable. If public display is forbidden under the license, it is not only a violation of the license, it is a violation of copyright.

Arguably, use of information in violation of a contract could even work to limit a fair use, because it would color the "character of the use," which is one of the four statutory factors that determine fair use.

The ability to license access to information effectively makes it easier to exercise market power. The legislative history of the U.S. Copyright Act makes it clear that Congress did not intend copyright to preempt freedom of contract under state law. Even if that were the case for libraries, as some have argued it should be, content owners would be inclined to cut off access to libraries altogether and deal only with users who can enter into enforceable contracts. At least this would be the case for much scientific and technical information where there is substantial demand outside academia. In the U.S., publishers might well try to treat their information as a trade secret. Given the strong American regard for freedom of speech, it is improbable that content owners could be compelled to "publish" information rather than circulate it in private.

The answer, in my opinion, is not to look nostalgically back at the conventions of the print environment but to develop the information infrastructure of the academy aggressively to provide effective cost-based alternatives to conventional publishing. This includes implementing institutional policies that discourage wholesale assignment of copyright to commercial interests. In the case of scholarly publication where academic authors do not expect compensation (i.e., articles, but not textbooks), copyright should be held jointly by authors and their institutions with the understanding that publishers will normally be granted a right of first publication. Allowing a reasonable degree of exclusivity (depending on the subject matter and publisher's ability to reach the whole of the potential audience) to the first publisher will not disrupt or jeopardize the operation of the present system.

A New Model: Functional Integration of Distributed Resources

Steps to rationalize and reform the conventional system must be combined with the continued development of generic infrastructure and the validation and enhancement of new means of organizing, presenting, and accessing information. This leads to the view of the digital library not as an institution, but as functional infrastructure: the distributed, unbounded global information infrastructure enabled by gopher, world-wide-web, and other high-level protocols. It is embodied in the Mosaic software created by the National Center for Supercomputing Applications that integrates much of this functionality under an elegant user interface.

Just as the Internet is not a network but a metanetwork of many autonomous networks, Mosaic and the underlying protocols define a metalibrary of autonomous interlinked "libraries." These libraries may be all original material, or they may be collections from multiple sources. They may be electronic journals, bulletin boards, or archives. They point as they wish to other libraries and objects within those libraries. They arise and evolve independently, leveraged by their own ad hoc networking and standardized means of access. As in the Internet, the centralized functions associated with the metalibrary are the development, maintenance, and enhancement of the software, standards, and conventions that enable the linked libraries to appear and interoperate as part of a common environment.

The Patent Threat

The functionality of the networked digital environment, at both library and metalibrary levels, is subject to preemption by patents in the United States. In Europe, the presence of certain statutory exceptions to patentable subject matter, such as "presentations of information," together with a more conservative attitude in the European Patent Office and national patent offices, has discouraged the patenting of software processes.

However, in the U.S., erosion of judicially developed limits on patentable subject matter, promotion of patents by the patent bar, lack of patent examiner expertise, and the absence of pre-grant publication of patent applications have led to a flood of patents on software processes. In general, this has pleased the hardware industry and the patent bar, while it has forced software companies to engage in "defensive patenting" and engendered considerable anxiety among content-driven multimedia developers and publishers.

The academic community has not reacted coherently to the explosion of software patents. While computer scientists seethe with contempt for the patent office (which did not hire computer scientists as examiners until early 1994), they assume that software patents, as ludicrous as many are, will not directly affect academic interests. The voice of academia on patent policy is the Association of University Technology Managers, which takes the perspective of the university as a small non-manufacturing licensing entity, not as a builder of a highly complex and interdependent information infrastructure.

The one incident that came closest to affecting academic interests involved the X Windows system developed by MIT and licensed free to the computer industry as a public platform, just as the Internet protocols are available for public use and implementation. After X Windows had been widely implemented by the industry, AT&T sent letters to commercial licensees claiming that X Windows infringed on a patent it held. Although X Windows was developed at MIT independently of the AT&T work, independent creation is not a defense to patent infringement.

Ironically, it is the very breadth and integrative power of Mosaic, as well as its growing ubiquity, that makes it commensurately vulnerable to U.S. patents. Furthermore, because of the complete secrecy of the patent application process in the U.S., we will not know until 1996 or 1997 or even later what patents may be infringed by the current version of Mosaic. It is important to note that while AT&T may have compunctions about going after educational and research institutions for patent infringement, small licensing companies that hold strategic patents are much less concerned with public perception or relations with a large customer base. Their dominant ethical concern is: "Is it fair to our shareholders?" States, including state universities, are now fair game with deep pockets, because Congress saw fit to abrogate state sovereign immunity for patent infringement in 1992.

If there are any doubts about the relevance of patents to the conduct of research and education, consider the following recent spectacles:

  1. At a major computer trade show, the Executive Vice-President of Compton's New Media announces that Compton's has been awarded a patent on accessing text and images through multiple entry paths and that Compton's is inaugurating a licensing program that would have virtually the entire multimedia industry pay tribute on every sale. The patent creates such an uproar that the Commissioner of Patents and Trademarks takes the extraordinary step of ordering a reexamination.

  2. The President of Optical Data Corporation, a publisher of multimedia instructional materials, writes a friendly letter informing the state educational technology directors in Florida, Texas, and California that the company had been granted two patents which most videodisc-based curriculum products were probably infringing. He thereby implicitly informs the states that they will be liable for triple damages, since henceforth their infringement would be willful.

These are not examples of technology companies fighting it out with other technology companies but publishers using the patent system to hamstring competing publishers. This is not the genteel world of copyright where there must be an actual taking somewhere along the line to find infringement. This is a world where first in time is first in right regardless of how many others independently arrive at the same result. It is a world in which state-sanctioned monopolies cover not only copying and public use, but all use, private as well as public — and not only use of processes but the use of products of patented processes, so that merely reading a document created with a patented process infringes the exclusive rights of the patentee.

This is a world that exalts technology over content by allowing patent holders to control the flow of information and knowledge. In the United States, despite our love of free expression and the free flow of information, the order and syntax of interactive speech is up for grabs. So, too, is the design and operation of the digital library — especially the new fabric of the global metalibrary that is taking shape on the Internet. The promise of a technology-enabled knowledge infrastructure built on the accumulated wealth of human enterprise and expression has been stood on its head. Such aspirations are hostage to secret proceedings in a federal bureaucracy, where speculators in abstract processes tough it out with the electrical engineers that examine the patents.


The full realization of the digital library challenges the higher education and research community in several distinct ways. It requires universities to cooperate in new and unfamiliar ways. Instead of bricks and mortar or faculty positions, it asks for a commitment to an intangible inter-institutional infrastructure. It asks a considerable short-term investment in the expectation of long-term returns. At the same time, it means reorienting internal communities that have developed their own intramural practices and cultures — and reconstructing many well-worn practices.

Along with this entrepreneurial and managerial challenge comes a challenge to develop public policies that support the creation, management, and dissemination of knowledge within the emerging information infrastructure. The higher education community, with its enormous stake in these processes, its experience and leadership in information infrastructure development, and its ability to draw on a wide range of faculty experts, is uniquely positioned to provide such stewardship. However, such policies must not be merely reactive and protective of established practices and short-term interests. Nor can they be abstract or speculative. They must be informed and tested by hands-on experience in developing and maintaining alternative forms of publication and by the actual design and implementation of digital libraries as functionally sophisticated global enterprises.

About the Author

Brian Kahin is Director of the Information Infrastructure Project and Adjunct Lecturer in Public Policy at Harvard University's John F. Kennedy School of Government. He also serves as General Counsel for the Annapolis-based Interactive Multimedia Association and directs the Association's Intellectual Property Project.