All Things to All People: Choosing Delivery Mechanisms For Commercial Web Applications
The volatile environment of the Web poses several challenges to developers of commercial services. Offering a service that works well for a heterogeneous group of widely dispersed users leads us to pay special attention to client-software diversity, fluctuating network connections, and amounts of data to be transferred. Development decisions that may be practical for a service delivered to a controlled user base or via a limited intranet may be impractical for a service with a global audience and network.
The Research Libraries Group (RLG) [formerly http://www.rlg.org] offers access to its data resources through an online search-and-retrieval system called "Eureka" [formerly http://www.rlg.org/eurekaweb.html]. The content has expanded in recent years from bibliographic metadata to include richly structured text and digital images. The first version of Eureka, built prior to the explosion of Web applications, used the telnet protocol and provided a VT-100 interface to databases of library-cataloging and article-citation records. A Web version of Eureka was created in 1996, with access to the same databases as the telnet version. Eureka has now been extended to provide access to SGML-encoded full text and to still and motion images and sound files.
Here we discuss approaches RLG's Eureka designers have taken to:
- support a variety of Web browsers
- keep the network connections to a minimum
- shorten network paths
- manage the delivery of large files
- choose wisely among a range of digital-object file formats
- provide reliable access to distributed resources
Set a Browser Baseline
As Web developers, we need to design services that can be used with as wide a range of Web browsers as is practical. Set the bar too high, and your service's customer base will be too limited. Set the bar too low, and you may not be able to deliver the functionality demanded by your customers.
"It isn't sufficient to test with just Navigator 4"
The newest version of Eureka raises the browser-compatibility bar another notch. To take advantage of image replacement and HTML table functionality not present in Navigator 2, the most recent version of Eureka requires Netscape Navigator 3.0 or higher (Internet Explorer 4 support continues).
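A baseline like this can be checked on the server before a page is built. The following is only a minimal sketch, not a description of how Eureka does it: it assumes the browser is identified from the HTTP User-Agent header, that Internet Explorer 4 is the minimum supported Explorer release, and that only major version numbers matter. The function name and constants are hypothetical.

```python
import re

# Hypothetical baseline, based on the versions named in the text.
MIN_NAVIGATOR_MAJOR = 3
MIN_EXPLORER_MAJOR = 4  # assumption: IE 4 is the oldest supported Explorer


def meets_baseline(user_agent: str) -> bool:
    """Return True if the User-Agent string appears to meet the baseline."""
    # Internet Explorer announces itself as
    # "Mozilla/x.y (compatible; MSIE a.b; ...)", so test for it first.
    msie = re.search(r"MSIE (\d+)\.", user_agent)
    if msie:
        return int(msie.group(1)) >= MIN_EXPLORER_MAJOR

    # Other "compatible" browsers borrow the Mozilla token; this sketch
    # treats them conservatively as unsupported.
    if "compatible" in user_agent:
        return False

    navigator = re.match(r"Mozilla/(\d+)\.", user_agent)
    if navigator:
        return int(navigator.group(1)) >= MIN_NAVIGATOR_MAJOR

    return False
```

A service would more likely use a result like this to show a polite upgrade notice than to refuse the connection outright.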
Reduce Network Connections
An obvious visual difference between the latest version of Eureka and earlier versions is that frames are more widely used to provide visual stability and persistent tools for navigation and orientation. That kind of visual persistence has advantages for usability that are widely recognized. We've found that we can build on the same framework to improve network connectivity.
Provide a Variety of Network Paths
RLG has offered services over the Internet since 1988, and since then the network has dramatically evolved. One continual issue has been the fluctuating stability of Internet connection paths. Although the physical location of individual systems becomes nearly meaningless in the Internet landscape, we have found that access to our services is greatly improved by offering multiple Internet paths to our hosts. In recent years, we have supplemented our primary Internet Service Provider (ISP) with direct connections to a second international ISP, to an ISP with network facilities concentrated in the United Kingdom, and to an ISP providing access to educational institutions in New York State. Each new path is a more direct route to our systems for some users.
That approach offers the benefits of "mirroring" — faster response over a shorter path — but differs fundamentally in that only one instance of the host needs to be maintained. The one system can be accessed by multiple network interfaces, and so appears to be directly connected to a set of different and important ISPs.
It isn't practical to arrange direct connections to many ISPs, but by surveying the customer base and selecting wisely, it is possible to come up with a manageable set of connections. The investment in upgrading the host network to manage multiple network interfaces, and the cost of maintaining links to multiple ISPs, are offset by increased use of the services offered via those paths.
Employ Progressive Delivery and Surrogates
The Eureka service has the potential for producing large amounts of HTML-encoded text. A search in the 30-million-plus title RLIN Bibliographic file can easily result in many thousands of individual records. Each record, in its fullest display format, can be 1,000 characters or more.
"Transferring the entire guide to the user can be too much information, too soon"
To manage the user's interaction with the service, Eureka sets a maximum number of records that can be returned to the browser at any one time, and includes navigation tools for moving through large result sets. That is a common approach in online search-and-retrieval systems. A recent Eureka enhancement allows the user to set both the number of records to be returned and their sort order.
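The result-set handling described above can be sketched as a small paging function. This is an illustration under stated assumptions, not Eureka's implementation: the ceiling of 50 records, the field names, and the `get_page` and `Page` names are all hypothetical, and the whole result set is held in memory for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PAGE_SIZE = 50  # hypothetical server-side ceiling; the article gives no figure


@dataclass
class Page:
    records: List[dict]
    start: int                 # zero-based offset of the first record returned
    total: int                 # size of the full result set
    next_start: Optional[int]  # offset for a "next" link, or None on the last page
    prev_start: Optional[int]  # offset for a "previous" link, or None on the first page


def get_page(results: List[dict], start: int = 0,
             page_size: int = 10, sort_key: str = "title") -> Page:
    """Return one page of a result set, sorted by a user-chosen field."""
    # The user may choose a page size, but never one above the ceiling.
    size = max(1, min(page_size, MAX_PAGE_SIZE))
    ordered = sorted(results, key=lambda r: r.get(sort_key, ""))
    # Clamp the offset so an out-of-range request still returns records.
    start = max(0, min(start, max(len(ordered) - 1, 0)))
    end = start + size
    return Page(
        records=ordered[start:end],
        start=start,
        total=len(ordered),
        next_start=end if end < len(ordered) else None,
        prev_start=max(0, start - size) if start > 0 else None,
    )
```

The `next_start` and `prev_start` values are what the navigation tools would turn into links, so the browser never receives more than one page at a time.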
In the latest version of Eureka, where access is provided to SGML-encoded guides to archival collections and to digital images of museum objects with extensive documentation, the familiar concepts of brief displays and result-set management needed to be extended.
A large digital collection guide, or finding aid, has delivery issues comparable to a large result set of records. Transferring the entire guide to the user can be too much information, too soon. Not only is there a practical limit that may be exceeded in transferring the file, but the user is also likely to prefer less information at first, with navigation tools to find the relevant components of the guide. Our approach is to offer a brief record with selected fields from the guide, accompanied by a small portion of the guide's Scope and Content note. Once the user decides to obtain the whole guide, we transfer a table of contents and one piece of the guide. The table of contents can then be used to display other pieces, reducing the amount of data sent at any one time.
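The three stages described above (brief record, then table of contents plus one piece, then further pieces on request) can be sketched as follows. The data model here is an assumption made for illustration: the `Guide` and `Section` classes, the 200-character excerpt length, and the function names are hypothetical and are not taken from Eureka.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Section:
    title: str
    text: str


@dataclass
class Guide:
    title: str
    scope_and_content: str
    sections: List[Section]


def brief_record(guide: Guide, excerpt_chars: int = 200) -> Dict[str, str]:
    """Stage 1: selected fields plus a small portion of the Scope and Content note."""
    excerpt = guide.scope_and_content[:excerpt_chars]
    if len(guide.scope_and_content) > excerpt_chars:
        excerpt = excerpt.rstrip() + "..."
    return {"title": guide.title, "scope_excerpt": excerpt}


def table_of_contents(guide: Guide) -> List[Tuple[int, str]]:
    """Index and title of every piece, used to build navigation links."""
    return [(i, s.title) for i, s in enumerate(guide.sections)]


def open_guide(guide: Guide, piece: int = 0) -> dict:
    """Stages 2 and 3: the table of contents plus exactly one piece of the guide."""
    return {"toc": table_of_contents(guide), "piece": guide.sections[piece]}
```

Each later request calls `open_guide` with the index chosen from the table of contents, so only one piece of the guide crosses the network at a time.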
For digital images of museum objects, we provide a sequence of image sizes, beginning with a "thumbnail" size to supplement a brief display of textual data; larger image sizes suitable for inspection or presentation can be requested separately as needed. Since network connections and user needs vary, we also offer the options of text without images (for slow connections), or images with very minimal text (for scanning large numbers of images).
Rely on the Baseline Browser
Eureka now includes access to text files encoded in SGML and images formatted as TIFFs. Those file formats are not viewable directly in the Web browsers supported by our services, but browser "helper" applications and "plug-ins" are readily available for both formats.
Our experience has shown that a default view of those digital objects is best made available in a format that the browser can display on its own. For SGML documents, we provide a form of the document converted to HTML as the default view, while offering a user-settable option to view documents in their native SGML form. For TIFF files, we convert the images to either GIF or JPEG (as appropriate), and also offer access to the images in their original format.
That compromise enhances the service with a default view that any Web browser can handle (especially important to first-time or casual users of the service), and still provides an opportunity to view those objects more or less as their providers intended (depending on the capabilities of the viewer).
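The choice between a browser-displayable default and the native format reduces to a small decision rule. The sketch below is illustrative only. The article says TIFFs are converted to GIF or JPEG "as appropriate" without giving the criterion, so the `photographic` flag is an assumption (JPEG for continuous-tone images, GIF otherwise), and the function and constant names are hypothetical.

```python
# Formats the baseline browser is assumed to display without helpers or plug-ins.
BROWSER_NATIVE = {"html", "gif", "jpeg"}


def default_view_format(source_format: str,
                        prefer_native: bool = False,
                        photographic: bool = True) -> str:
    """Pick the format to deliver for a stored digital object."""
    fmt = source_format.lower()
    # The user-settable option: deliver the object as its provider stored it.
    if prefer_native or fmt in BROWSER_NATIVE:
        return fmt
    if fmt == "sgml":
        return "html"
    if fmt == "tiff":
        # Assumed criterion, not stated in the article.
        return "jpeg" if photographic else "gif"
    raise ValueError(f"no browser-displayable derivative known for {source_format!r}")
```

For example, `default_view_format("SGML")` gives `"html"`, while the same call with `prefer_native=True` gives `"sgml"` for users who have a suitable viewer installed.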
Balance Distributed Storage and Integrated Access
Eureka now provides integrated access to archival collection guides that reside on servers at various institutions across the Internet. The design of the system includes a centralized index, housed at RLG, that provides search access to the SGML-encoded guides. We initially assumed that once a search had identified relevant guides, the default access path to the guides would be via direct connections from the user's browser to the server on which the guide was stored.
Our experience in developing the service has confirmed that it is possible to maintain an up-to-date index for search access to the guide text. For various reasons we've discovered that it is preferable to offer an HTML version of each guide using a converted copy stored on an RLG server, at least for the time being. Access to the SGML guides on remote servers can be too intermittent, for mostly administrative reasons, to support a commercial service that is expected to be available 24 hours a day. A user requirement to highlight search terms within the text of the retrieved guides requires Eureka to manipulate the guide text on the fly before it transmits the results to the user. In addition, our preliminary SGML-to-HTML conversion facilitates the delivery of the guide in pieces of manageable size.
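Highlighting search terms on the fly is one reason a converted copy on the local server is convenient: the text can be rewritten just before it is sent. The sketch below shows one way this might be done on an HTML fragment, under the assumptions that matching is on whole words, is case-insensitive, and must leave markup untouched. The function name and the choice of `<b>` for emphasis are hypothetical, not details from Eureka.

```python
import re
from typing import Iterable


def highlight_terms(html: str, terms: Iterable[str],
                    open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap each whole-word occurrence of a search term in emphasis tags."""
    wanted = [t for t in terms if t]
    if not wanted:
        return html
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in wanted) + r")\b",
        re.IGNORECASE,
    )
    # Split into tags and text so that terms inside attributes
    # (for example href="letters.html") are never rewritten.
    parts = re.split(r"(<[^>]*>)", html)
    out = []
    for part in parts:
        if part.startswith("<") and part.endswith(">"):
            out.append(part)  # markup: pass through unchanged
        else:
            out.append(pattern.sub(
                lambda m: open_tag + m.group(0) + close_tag, part))
    return "".join(out)
```

Because the function works on one fragment at a time, it fits naturally with delivering a guide in pieces: only the piece being sent needs to be processed.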
"We have been conservative about file formats and browser requirements"
We are able to keep the contents of the delivered guides and their access points current with frequent reindexing of the guide servers, and we retain a user option for retrieving the source SGML-encoded guide from the remote server (for those so inclined and enabled). That approach leverages some of the strengths of both a consolidated "union" database and remote networked servers, offering a system with low administrative overhead for the users and the guide contributors.
From a production perspective, access to completely distributed SGML doesn't seem sufficiently reliable to be the basis of a commercial service at the present time. We do expect that, in the future, building a system with remotely stored objects in a more completely distributed fashion will be increasingly practical for RLG. Several initiatives are helping to prove the effectiveness of that concept: The Arts and Humanities Data Service (funded by the Joint Information Systems Committee of the U.K.'s Higher Education Funding Council) and Project Isaac (based in the Computer Sciences Department at the University of Wisconsin, Madison) are two interesting examples.
We have been able to deliver commercial Web services in an acceptably reliable way on today's Internet to users whose browser choices we cannot really influence. In order to do that, we have been conservative about file formats and browser requirements and resourceful about managing results and network connections. We expect to be able to rely more and more on functionality at the browser end of the connection, rather than the server end, although we don't expect to depend on features present only in recent releases. We'll continue to prefer conventional forms of navigation to novel ones, and we'll face the challenges of network reliability with creativity.
For more information:
The Research Libraries Group, Inc. [formerly http://www.rlg.org]
Eureka [formerly http://www.rlg.org/eurekaweb.html and http://www.rlg.org/arrhome.html]
Arts and Humanities Data Service
Internet Scout Project Isaac
At the Research Libraries Group, a not-for-profit membership corporation of institutions devoted to enhancing access to information that supports research and learning, Arnold Arcolio helps to design and develop new digital services, including RLG's forthcoming Museum Resources service and other resources with web interfaces. Previously at RLG, he helped to fashion Windows and telnet interfaces. He came to RLG in 1989 to be a technical trainer, from a background in the history of the book. You may contact him by e-mail at email@example.com.
Bruce Washburn has worked for and with libraries for 20 years. He's been with the Research Libraries Group, Inc. since 1987, initially as one of the first members of the RLIN Information Center, and more recently as a programmer and web designer helping to develop web interfaces to RLG's online services. He's currently working on the development of a new version of RLG's Eureka service and the formation of RLG's new Archival and Museum Resources.