The volatile environment of the Web poses several challenges to developers of commercial services. Offering a service that works well for a heterogeneous group of widely dispersed users leads us to pay special attention to client-software diversity, fluctuating network connections, and the volume of data to be transferred. Development decisions that are practical for a service delivered to a controlled user base or via a limited intranet may be impractical for a service with a global audience and network.

The Research Libraries Group (RLG) [formerly http://www.rlg.org] offers access to its data resources through an online search-and-retrieval system called "Eureka" [formerly http://www.rlg.org/eurekaweb.html]. The content has expanded in recent years from bibliographic metadata to include richly structured text and digital images. The first version of Eureka, built prior to the explosion of Web applications, used the telnet protocol and provided a VT-100 interface to databases of library-cataloging and article-citation records. A Web version of Eureka was created in 1996, with access to the same databases as the telnet version. Eureka has now been extended to provide access to SGML-encoded full text and to still and motion images and sound files.

Here we discuss approaches RLG's Eureka designers have taken to:

  • support a variety of Web browsers
  • keep the network connections to a minimum
  • shorten network paths
  • manage the delivery of large files
  • choose wisely among a range of digital-object file formats
  • provide reliable access to distributed resources

Set a Browser Baseline

As Web developers, we need to design services that can be used with as wide a range of Web browsers as is practical. Set the bar too high, and your service's customer base will be too limited. Set the bar too low, and you may not be able to deliver the functionality demanded by your customers.

Although the Web site that explains RLG's programs and initiatives is designed to be usable with as many browsers as possible, we decided to set the browser-compatibility level higher when we developed and released the Web version of the Eureka search service. In order to make a place for help messages, to update help as options are selected, and to manage opening and closing of a separate help window, we chose to incorporate the JavaScript scripting language in the design. That scripting language was first supported in Netscape Navigator 2, and Eureka is accessible with that browser and all later Netscape releases. We expected to offer support for Microsoft Internet Explorer 3, but that version did not offer enough JavaScript functionality. Internet Explorer 4 resolved those scripting issues, and Eureka accepts connections from it.

That decision is a good example of the sort of dilemma that faces every Web developer when service enhancements are being considered. Do you forgo enhancements that would improve the interface for the majority of users, or knowingly restrict access for a portion of the potential user base? In the case of Eureka, we found the interface improvements that were possible with a judicious use of JavaScript and HTML frames were worth the restrictions on supported browsers. We were in a better position to make that decision than some Web developers might be, in that RLG continues to support the telnet version of Eureka for anyone unable to use the Web version. Without that alternative, we would have considered offering a version of Eureka that did not incorporate HTML frames or JavaScript.

The newest version of Eureka raises the browser-compatibility bar another notch. To take advantage of image replacement and HTML table functionality not present in Navigator 2, the most recent Eureka version requires Netscape Navigator 3.0 or later (Internet Explorer 4 support continues).
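
A baseline like this one can be enforced with a small check on the browser's User-Agent string. The following is a minimal sketch in Python, not a description of how Eureka does it: the function name `meets_baseline` is hypothetical, and the parsing assumes the common User-Agent conventions of these browsers ("Mozilla/3.0 (...)" for Navigator, "Mozilla/4.0 (compatible; MSIE 4.0; ...)" for Internet Explorer).

```python
import re

# Minimum major versions taken from the text: Navigator 3, Internet Explorer 4.
MIN_NAVIGATOR = 3
MIN_MSIE = 4


def meets_baseline(user_agent):
    """Return True if the User-Agent string looks like a supported browser."""
    # Internet Explorer also announces itself as "Mozilla/...", so it has to be
    # recognized before the generic Mozilla pattern is tried.
    msie = re.search(r"MSIE (\d+)", user_agent)
    if msie:
        return int(msie.group(1)) >= MIN_MSIE
    # Other "compatible" agents are not Navigator; this sketch rejects them.
    if "compatible" in user_agent:
        return False
    navigator = re.match(r"Mozilla/(\d+)", user_agent)
    if navigator:
        return int(navigator.group(1)) >= MIN_NAVIGATOR
    return False
```

A request that fails the check could be pointed to an alternative, such as the telnet version mentioned above, rather than simply refused.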

Reduce Network Connections

An obvious visual difference between the latest version of Eureka and earlier versions is that frames are more widely used to provide visual stability and persistent tools for navigation and orientation. That kind of visual persistence has advantages for usability that are widely recognized. We've found that we can build on the same framework to improve network connectivity.

Combining JavaScript and HTML frames is an effective technique for reducing the number of network connections in a search-and-retrieval service. Some components of the user interface are requested many times but do not change between requests, such as online-help components, search screens, and options displays. We've found that delivering the source HTML for those components as JavaScript objects at the start of the session, and retaining it in an HTML frame of the user interface for later use, greatly reduces network traffic. The components are then read directly from the frame in which they are stored and displayed in another frame, without requiring another connection across the network to the server.
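
The underlying idea is a session-level store that is filled once and then read locally. The sketch below illustrates that idea in Python only as an analogy; in Eureka the store is a set of JavaScript objects held in a frame, and the class name `ComponentCache`, its methods, and the `fetch` callback are all hypothetical names chosen for this example.

```python
class ComponentCache:
    """Keeps static interface components after their first delivery."""

    def __init__(self, fetch):
        self._fetch = fetch          # stands in for a request to the server
        self._store = {}             # plays the role of the storage frame
        self.network_requests = 0    # counts trips across the network

    def preload(self, names):
        """Fetch the unchanging components once, at the start of the session."""
        for name in names:
            self.get(name)

    def get(self, name):
        """Return a component, contacting the server only on first use."""
        if name not in self._store:
            self.network_requests += 1
            self._store[name] = self._fetch(name)
        return self._store[name]
```

After `preload` has run, any number of later `get` calls for the same component leave `network_requests` unchanged, which is the saving the technique is after.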

That combination of frames and JavaScript immediately raises the stakes when it comes to systematic, detailed testing. Much more attention must be paid to testing the user interface in different browsers. We've found that testing must be taken to the Manufacturer/Model/Version/System level. It isn't sufficient to test with just Navigator 4; you must test with every release from 4.0 through the latest 4.x, test on all supported platforms (Windows 3.11, Windows 95/NT, Macintosh, etc.), and continue that same approach with all other supported browsers. Careful use of JavaScript functionality that is well established across a range of browsers, and adherence to such standards as currently exist for scripting, pays dividends when testing time arrives.
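
Enumerating the combinations makes the size of the testing job visible. This is a small Python sketch under stated assumptions: the browser and version lists are placeholders rather than RLG's actual test plan, and `build_test_matrix` is a hypothetical helper name.

```python
from itertools import product

# Placeholder inventory for illustration; a real plan would list every
# point release (4.0 through the latest 4.x, and so on).
BROWSERS = {
    "Netscape Navigator": ["3.0", "4.0"],
    "Microsoft Internet Explorer": ["4.0"],
}
PLATFORMS = ["Windows 3.11", "Windows 95/NT", "Macintosh"]


def build_test_matrix(browsers, platforms):
    """List every (browser, version, platform) combination to be tested."""
    return [
        (name, version, platform)
        for name, versions in browsers.items()
        for version, platform in product(versions, platforms)
    ]
```

Even this short inventory yields nine combinations; each added point release adds one more row per platform.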

Provide a Variety of Network Paths

RLG has offered services over the Internet since 1988, and since then the network has dramatically evolved. One continuing issue has been the fluctuating stability of Internet connection paths. Although the physical location of individual systems becomes nearly meaningless in the Internet landscape, we have found that access to our services is greatly improved by offering multiple Internet paths to our hosts. In recent years, we have supplemented our primary Internet Service Provider (ISP) with direct connections to a second international ISP, to an ISP with network facilities concentrated in the United Kingdom, and to an ISP providing access to educational institutions in New York state. Each new path is a more direct route to our systems for some users.

That approach offers the benefits of "mirroring" — faster response over a shorter path — but differs fundamentally in that only one instance of the host needs to be maintained. The one system can be accessed by multiple network interfaces, and so appears to be directly connected to a set of different and important ISPs.

It's not practical to arrange direct connections to many ISPs, but by surveying the customer base and selecting wisely, it is possible to come up with a manageable set of connections. The investment in upgrading the host network to manage multiple network interfaces, and the cost of maintaining links to multiple ISPs, are offset by increased use of the services offered via those paths.

Employ Progressive Delivery and Surrogates

The Eureka service has the potential for producing large amounts of HTML-encoded text. A search in the 30-million-plus title RLIN Bibliographic file can easily result in many thousands of individual records. Each record, in its fullest display format, can be 1,000 characters or more.

To manage the user's interaction with the service, Eureka sets a maximum number of records that can be returned to the browser at any one time, and includes navigation tools for moving through large result sets. That is a common approach in online search-and-retrieval systems. A recent Eureka enhancement allows the user to set both the number of records to be returned and their sort order.
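
Result-set management of this kind reduces to slicing an ordered list. The following is a minimal Python sketch, not Eureka's code; the function names `page_of_results` and `page_count` and their parameters are assumptions made for the example.

```python
def page_of_results(records, page, page_size, sort_key=None):
    """Return one page (numbered from 1) of a result set, optionally sorted."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    ordered = sorted(records, key=sort_key) if sort_key else list(records)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


def page_count(total_records, page_size):
    """Number of pages needed to show every record."""
    return -(-total_records // page_size)  # ceiling division
```

Here `page_size` corresponds to the user-settable number of records per display and `sort_key` to the user-settable sort order; a page number past the end of the set simply returns an empty list.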

In the latest version of Eureka, where access is provided to SGML-encoded guides to archival collections and to digital images of museum objects with extensive documentation, the familiar concepts of brief displays and result-set management needed to be extended.

A large digital collection guide, or finding aid, has delivery issues comparable to a large result set of records. Transferring the entire guide to the user can be too much information, too soon. Not only is there a practical limit that may be exceeded in transferring the file, but the user is also likely to prefer less information at first, with navigation tools to find the relevant components of the guide. Our approach is to offer a brief record with selected fields from the guide, accompanied by a small portion of the guide's Scope and Content note. Once the user decides to obtain the whole guide, we transfer a table of contents and one piece of the guide. The table of contents can then be used to display other pieces, reducing the amount of data sent at any one time.
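
The delivery of a guide in pieces can be sketched as follows. This Python example is illustrative only: a guide is modeled as a list of (title, text) pairs, a deliberately simplified stand-in for an SGML-encoded finding aid, and the function names `split_guide` and `first_response` are hypothetical.

```python
def split_guide(sections):
    """Split a guide into a table of contents and separately retrievable pieces.

    `sections` is a list of (title, text) pairs.
    """
    toc = [(index, title) for index, (title, _text) in enumerate(sections)]
    pieces = {index: text for index, (_title, text) in enumerate(sections)}
    return toc, pieces


def first_response(sections):
    """Build what is sent when the whole guide is requested:
    the table of contents plus one piece (here, the first)."""
    toc, pieces = split_guide(sections)
    return {"toc": toc, "piece": pieces[0] if pieces else None}
```

Each table-of-contents entry carries the index of its piece, so a later request needs to transfer only that one piece.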

For digital images of museum objects, we provide a sequence of image sizes, beginning with a "thumbnail" size to supplement a brief display of textual data; larger image sizes suitable for inspection or presentation can be requested separately as needed. Since network connections and user needs vary, we also offer the options of text without images (for slow connections), or images with very minimal text (for scanning large numbers of images).

Rely on the Baseline Browser

Eureka now includes access to text files encoded in SGML and images formatted as TIFFs. Those file formats are not viewable directly in the Web browsers supported by our services, but browser "helper" applications and "plug-ins" are readily available for both formats.

Our experience has shown that a default view of those digital objects is best made available in a format that the browser can display on its own. For SGML documents, we provide a form of the document converted to HTML as the default view, while offering a user-settable option to view documents in their native SGML form. For TIFF files, we convert the images to either GIF or JPEG (as appropriate), and also offer access to the images in their original format.
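
That policy amounts to a small lookup with a user override. The Python sketch below is an assumption-laden illustration: the names `BROWSER_DISPLAYABLE`, `DEFAULT_CONVERSION`, and `delivery_format` are hypothetical, and the choice between GIF and JPEG, which the service makes as appropriate for each image, is collapsed to JPEG here for brevity.

```python
# Formats the baseline browsers can display without helpers or plug-ins.
BROWSER_DISPLAYABLE = {"html", "gif", "jpeg"}

# Hypothetical conversion table: native format -> default delivery format.
DEFAULT_CONVERSION = {"sgml": "html", "tiff": "jpeg"}


def delivery_format(native_format, want_original=False):
    """Pick the format to send: the native one on request, else a displayable one."""
    fmt = native_format.lower()
    if want_original or fmt in BROWSER_DISPLAYABLE:
        return fmt
    if fmt not in DEFAULT_CONVERSION:
        raise ValueError(f"no default conversion for {native_format!r}")
    return DEFAULT_CONVERSION[fmt]
```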

That compromise enhances the service with a default view that any Web browser can handle (especially important to first-time or casual users of the service), and still provides an opportunity to view those objects more or less as their providers intended (depending on the capabilities of the viewer).

Balance Distributed Storage and Integrated Access

Eureka now provides integrated access to archival collection guides that reside on servers at various institutions across the Internet. The design of the system includes a centralized index, housed at RLG, that provides search access to the SGML-encoded guides. We initially assumed that once a search had identified relevant guides, the default access path to the guides would be via direct connections from the user's browser to the server on which the guide was stored.

Our experience in developing the service has confirmed that it is possible to maintain an up-to-date index for search access to the guide text. We've discovered, however, that for several reasons it is preferable to offer an HTML version of each guide using a converted copy stored on an RLG server, at least for the time being. First, access to the SGML guides on remote servers can be too intermittent, for mostly administrative reasons, to support a commercial service that is expected to be available 24 hours a day. Second, a user requirement to highlight search terms within the text of the retrieved guides requires Eureka to manipulate the guide text on the fly before it transmits the results to the user. In addition, our preliminary SGML-to-HTML conversion facilitates the delivery of the guide in pieces of manageable size.
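
Highlighting search terms on the fly is one reason the text has to pass through the server. The following is a minimal Python sketch of the general technique, not Eureka's implementation; the function name `highlight_terms` and the use of <b> tags are assumptions, and the input is treated as plain text rather than marked-up HTML.

```python
import html
import re


def highlight_terms(text, terms):
    """Return HTML in which each search term found in plain `text` is bolded."""
    terms = [term for term in terms if term]
    if not terms:
        return html.escape(text)
    # Longest terms first so a phrase wins over a word it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    # re.split with one capturing group puts the matches at odd indexes.
    parts = pattern.split(text)
    return "".join(
        f"<b>{html.escape(part)}</b>" if index % 2 else html.escape(part)
        for index, part in enumerate(parts)
    )
```

Escaping each fragment separately keeps characters such as "&" in the guide text from being confused with the markup added around the matches.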

We are able to keep the contents of the delivered guides and their access points current with frequent reindexing of the guide servers, and we retain a user option for retrieving the source SGML-encoded guide from the remote server (for those so inclined and enabled). That approach leverages some of the strengths of both a consolidated "union" database and remote networked servers, offering a system with low administrative overhead for the users and the guide contributors.

From a production perspective, access to completely distributed SGML doesn't seem sufficiently reliable to be the basis of a commercial service at the present time. We do expect that, in the future, building a system with remotely stored objects in a more completely distributed fashion will be increasingly practical for RLG. Several initiatives are helping to prove the effectiveness of that concept: The Arts and Humanities Data Service (funded by the Joint Information Systems Committee of the U.K.'s Higher Education Funding Council) and Project Isaac (based in the Computer Sciences Department at the University of Wisconsin, Madison) are two interesting examples.

Conclusion

We have been able to deliver commercial Web services in an acceptably reliable way on today's Internet to users whose browser choices we cannot really influence. In order to do that, we have been conservative about file formats and browser requirements and resourceful about managing results and network connections. We expect to be able to rely more and more on functionality at the browser end of the connection, rather than the server end, although we don't expect to depend on features present only in recent releases. We'll continue to prefer conventional forms of navigation to novel ones, and we'll face the challenges of network reliability with creativity.

At the Research Libraries Group (a not-for-profit membership corporation of institutions devoted to enhancing access to information that supports research and learning), Arnold Arcolio helps to design and develop new digital services, including RLG's forthcoming Museum Resources service and other resources with web interfaces. Previously at RLG, he helped to fashion Windows and telnet interfaces. He came to RLG in 1989 to be a technical trainer, from a background in the history of the book. You may contact him by e-mail at bl.aja@rlg.org.

Bruce Washburn has worked for and with libraries for 20 years. He's been with the Research Libraries Group, Inc. since 1987, initially as one of the first members of the RLIN Information Center, and more recently as a programmer and web designer helping to develop web interfaces to RLG's online services. He's currently working on the development of a new version of RLG's Eureka service and the formation of RLG's new Archival and Museum Resources.