Economics and Usage of Digital Libraries: Byting the Bullet

4.3 System design and implementation

Our research team worked with the University Library to design an online system and market this system to a variety of information clients. We primarily targeted libraries, focusing on academic and corporate libraries. We contacted institutions that had expressed interest as well as institutions already invested in digital library activity. Over thirty institutions were contacted as potential participants, of which twelve agreed to join the effort. Decisions not to participate were frequently driven by budget limitations, or by the absence of pricing options of interest to the institution. [3] The resulting mix of institutions was diverse in size and information technology infrastructure, as well as in organizational mission. PEAK participants were the University of Michigan, the University of Minnesota, Indiana University, Texas A & M, Lehigh University, Michigan Technological University, Vanderbilt, Drexel, Philadelphia College of Osteopathic Medicine, University of the Sciences in Philadelphia, Dow Chemical, and Warner-Lambert (now part of Pfizer Pharmaceuticals).

The PEAK system provided full-text search of and retrieval from the entire body of Elsevier content for the duration of the experiment, including some content from earlier years that Elsevier provided to help establish a critical mass of content. Several search and browse options were available to users, including mechanisms that limited searches to discipline-specific categories designed and assigned by librarians at the University of Michigan. Any authorized user could search the system, view abstracts, and have access to all free content (see below). Access to "full-length articles" (a designation assigned by Elsevier) depended on the user's institutional subscription package. With this access, articles could be viewed on screen or printed.

The delivery and management of such a large body of content (over 11,000,000 pages at the conclusion of the experiment) and the support of the PEAK experiment required a considerable commitment of both system and human resources. In addition to the actual delivery of content, project staff were responsible for managing the authentication mechanisms, collecting and extracting statistics, and providing user support for, potentially, tens of thousands of users.

PEAK ran primarily on a Sun E3000 with four processors, and its content was stored on several different configurations of RAID (redundant, fast access drive systems). User authentication and subscription/purchase information were handled by a subsidiary Sun UltraSparc.

Searching was conducted primarily with a locally-developed search engine called FTL (Faster Than Light). Bibliographic search used the OpenText search engine. The authentication/authorization server ran Oracle to manage user and subscription information. Several other types of software came into play with use of the system. They included:

  • Cartesian Inc.'s compression software, CPC, which allowed us to regain a significant amount of disk space through compression of the TIFF images;

  • Tif2gif software developed at the University of Michigan, which converted images stored in CPC to GIFs;

  • CPC, printps (for generating PostScript), and Adobe Distiller, which were used in combination to deliver images to users as PDF files; and

  • The Stronghold web server, which provided SSL encryption for the security of user information.

Project staff at the University of Michigan Digital Library Production Service (DLPS) wrote middleware to manage the interoperation of the tools discussed above.

Designing and maintaining the PEAK system, as well as providing user support and service for the participant institutions, required significant staff resources. Once the system was specified by the research staff, design and maintenance of the system were undertaken by a senior programmer working close to full time in collaboration with a DLPS interface specialist. DLPS programming staff contributed as needed, and the head of DLPS provided management. A full-time programmer provided PEAK database support, collecting statistics for the research team and the participants, as well as maintaining the database of authorized users and the transaction database. Two librarians provided about one full-time equivalent of user support (one was responsible for the remote sites, the other for the University of Michigan community). Other UM library staff put in considerable time during the setup phases of PEAK to market the service to potential participants, some of whom required substantial education about the methods and aims of the experiment, and to formalize the licensing agreements with participants.

To facilitate per-article purchases, PEAK also needed the capacity to accept and process credit card charges. In the early months of the service, this billing was handled by First Virtual, a third-party electronic commerce company. This commercial provider also verified the legitimacy of users and issued virtual PINs that were to be used as passwords for the PEAK system. Less than halfway through the PEAK experiment, First Virtual restructured and no longer offered these services. At that point, DLPS took over the processing of applications and passwords. Credit card operations were transferred to the University of Michigan Press.

We designed the system to support the planned research as well as to serve the daily information needs of a large and varied user community. Accommodating both purposes introduced a number of complexities. We sought to balance conflicting demands and to adhere to some fundamental goals:

  • Providing meaningful intellectual access via a Web interface to a very large body of content.

  • Balancing the aims of the research team with the library's commitment to making the content as easily accessible as possible.

  • Enabling and supporting a number of different transaction types, taking into account that not all users have access to all types of transactions and that the suite of transaction choices may change over time, depending on the manipulation of experimental conditions.

  • Enabling and supporting a number of different access levels, based on whether the user authenticates by password, the location of the user, the date of the material, and the type of material (e.g., full-length articles vs. other materials).
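The last two goals amount to a decision over a small matrix of user and material attributes. The Python sketch below is purely illustrative and is not the PEAK implementation. The names `Item`, `User`, and `can_view_full_text`, the `free_before` cutoff date, and the order in which the rules are applied are all assumptions made for the example; the source states only that access depended on password authentication, user location, the date of the material, the type of material, and the institution's subscription package or per-article purchases.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Item:
    journal: str
    published: date
    full_length: bool  # Elsevier's "full-length article" designation


@dataclass
class User:
    on_site: bool        # request originates at a participating institution
    has_password: bool   # user authenticated with a personal password
    subscribed_journals: set = field(default_factory=set)
    purchased: set = field(default_factory=set)  # items bought per article


def can_view_full_text(user: User, item: Item, free_before: date) -> bool:
    """Hypothetical access rule; the ordering of checks is an assumption."""
    # Only authorized users (by location or by password) get any access.
    if not (user.on_site or user.has_password):
        return False
    # Assumed: material other than full-length articles is free content.
    if not item.full_length:
        return True
    # Assumed: material older than a cutoff date is free content.
    if item.published < free_before:
        return True
    # Otherwise access depends on the subscription package ...
    if item.journal in user.subscribed_journals:
        return True
    # ... or on a per-article purchase.
    return item in user.purchased
```

Under these assumptions, an on-site user with no subscription could read older or non-article material but would need a subscription or a per-article purchase for a recent full-length article. Changing the experimental conditions would then mean changing the data attached to each user rather than the rule itself.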

Tensions were exacerbated by our reliance on content from just one large commercial publisher and by the specific requirements for the research experiments. John Price-Wilkin, Head of DLPS, compared the production system problems to those of a standard service (Price-Wilkin, 1999):

The research model further complicates these methods for access, where all methods for access are not available to all institutions, and not all institutions choose to take advantage of all methods available to them. This creates a complex matrix of users and materials, a matrix that must be available and reliable for the system to function properly. Independence from Elsevier was critical in order for us to be able to test these models, and the body of Elsevier materials was equally important to ensure that users would have a valuable body of materials that would draw them into the research environment. The ultimate control and flexibility of the local production environment allowed the University of Michigan to perform research that would probably not have otherwise been possible, or could not have been performed in ways that the researcher stipulated.