Creating Wiley InterScience: Moving from Ink Molecules to Computer Bits

The World Wide Web has enabled a new way to communicate that is transforming our lives at a rapid pace. A vast field of information is now at the world's fingertips: book lovers can order their favorite titles from an electronic bookstore such as Amazon.com, news hounds can personalize their daily newspaper, information scientists can read the latest in technology developments, and medical researchers in labs located hundreds of miles from the nearest library can access the latest innovations in medicine. The Web has raised the world's information expectations and has changed forever the infrastructure of information delivery in the science publishing community.

John Wiley & Sons, Inc., has had to develop a rapid response in order to meet the diverse demands of the customers and users of its four hundred science, technology, and medical journals. Wiley has responded with a creative, innovative and evolving Web-publishing strategy that was crafted to meet the needs of users, ease the stress on the existing print-journal infrastructure and lay the groundwork for the journal of the 21st century.

This article will review the history of Wiley's online publishing activities, describe the reasoning behind the choice of presentation format of the current offering, and touch on the issues associated with building an infrastructure that will sustain a creative, innovative, and evolving Web-publishing service to the scientific community.

Timeline

John Wiley & Sons, Inc., launched its first electronic journal, the online version of The Journal of Image Guided Surgery (recently retitled Computer Aided Surgery), in March 1995. It was one of the first peer-reviewed scientific journals to offer full HTML text online. Computer Aided Surgery was great training for Wiley production and information technology personnel: The work was largely manual and very labor intensive for the staff of five, and what we learned allowed us to begin to lay the framework for a workflow process that now serves the production of hundreds of journals. Today the print and online versions of Computer Aided Surgery come from one content stream in an automated fashion.

In the fall of 1995, Wiley developed its first journal Web page, for the Journal of Computational Chemistry. That page provided information about the journal (tables of contents, editorial boards, contact information, and instructions for authors) but stopped short of delivering full text in any form. As with Computer Aided Surgery, that effort was largely a manual process. Soon, however, Wiley found an outside contractor who set up an SGML system to generate tables of contents. In parallel with the outside contractor's efforts, Wiley developed an internal Web-page production system powered by an application built around the Informix Illustra database. That application allows production personnel to maintain informational Web pages for nearly three hundred of Wiley's four hundred journals.

Two years later, in January 1997, Wiley began putting Cancer on line in full text. Cancer, the official journal of the American Cancer Society, presented a unique challenge in that it publishes thirty issues and 6,000 pages annually. Both the print and online issues are produced from the same content stream using the process designed for Computer Aided Surgery. What set Cancer apart from Computer Aided Surgery was a much more demanding schedule and greater volume of content and the fact that an external contractor managed all Web production.

By the fall of 1997 we were ready to stop adding journals piecemeal and to coordinate a more formalized online publishing system for our journals. That's when we introduced Wiley InterScience (http://www.interscience.wiley.com).

Wiley InterScience is more than just a storefront for 150 journal Web pages; it offers additional content and services to journal readers, including tables of contents, article abstracts (dating back to 1996) and complete articles in Adobe Acrobat PDF (Portable Document Format) back to January 1997. Those 150 journals represent 60 percent of Wiley's journal-page output. We expect that the remaining 40 percent of Wiley's content will be on line by this fall. Today, in June 1998, the service is open to anyone who registers as a guest user; Wiley is using the information gained during this open-access period to fine-tune both the features and business terms of what will become a subscription-based offering.

The 5-Minute Wiley InterScience Overview

Currently, a user's front door to Wiley InterScience is the Personal Homepage that serves as a workspace within InterScience to organize and keep track of favorite Wiley journals, articles of interest, specific search criteria, and personal annotations to the journal articles. Next year, when we make Wiley InterScience a commercial service, we will no longer offer the Personal Homepage through institutional accounts: Libraries have told us they don't want to be responsible for keeping track of individualized services; they want to offer only the content.

Getting to the Articles

From the Personal Homepage the user may browse a list of journals by title or subject, or search the database of article metadata. Browse by Title reveals an alphabetical listing of the journals. Clicking on a journal title reveals a listing of volumes and issues available for that journal. Linked from there are tables of contents showing titles and authors. Each table of contents is dynamically created from SGML metadata stored in a Sybase back-end database. Clicking on a particular article will reveal an article summary, including the article abstract, with a link to "View Article". The full text of all articles published since January 1997 is available in Adobe Acrobat PDF.

If, while browsing, a user finds a journal or an article that he or she would like to flag, it can be bookmarked by clicking on "Add to Hot Journals" or "Add to Hot Articles." That article or journal is then listed on the user's Personal Homepage, and the user can jump to it directly.

Users can search article metadata, and save any number of searches on the Personal Homepage, to be run as often as needed. Searching is powered by a Verity search engine.

The unusual "Article Annotation" feature allows a user to attach notes to an article, and those notes can be shared with others. That is a particularly useful feature for someone in a work group. The feature comes up with the article abstract; it presents a text box where notes can be entered, named, and saved. Annotations are then linked to the article for future reference.

Future Enhancements

The features described above are but a few of the planned services to InterScience subscribers. As we begin to analyze system usage and usability, we will add, change, or remove services according to user habits and demands. Future features may include:

  • A table of contents e-mail service
  • Supplemental material for journal articles (currently in place for the Journal of Computational Chemistry)
  • Discussion forums
  • Professional services, such as abstract submission and review
  • Author services such as manuscript status lookup and manuscript template tools

Link Services

Linking requires special mention. Wiley is currently building links from the over 300,000 cited references in 30,000 articles to abstracts of the original documents. Since Wiley publishes only a small percentage of the world's science, technology, and medical literature, access to those original documents is not totally under Wiley's control. As a result, Wiley must collaborate with other primary as well as secondary publishers to provide users the best possible access to sources.

Wiley is collaborating with other major science publishers in the development of the Digital Object Identifier, a system that will create a repository of bibliographical data with its corresponding descriptive metadata. In addition, Wiley is working with the major commercial and public abstracting and indexing services to standardize creation of sustainable links between Wiley reference citations and the source abstracts within the various services. As the industry projects progress, linking from the more than 300,000 references will grow more complex, because readers will be given the choice of going to a number of sources from the same citation.

Creating InterScience

The two major goals for Wiley InterScience are to provide a rich source of searchable information about our journals and the articles they contain, and to make those articles available electronically without sacrificing the advantages of print (especially the expression of mathematics). To accomplish those goals, we had to do a lot of work, and that work allowed us to move toward a third goal: developing skill, in house and in our vendors' shops, in processing SGML effectively, efficiently, and correctly.

As we began to build the foundation for InterScience, the first major decision that we made was the presentation format. Our choices were:

  • full text HTML converted from typesetting files
  • full text HTML driven by an SGML database
  • a combination of HTML "headers" (or metadata), and PDF for the full text

We made the most practical choice available to us: the combination of HTML headers and PDF full text. Our typesetters were already converting PostScript to PDF, and we were already using SGML to store information about our journal articles.

Our next step will be to offer reference citations in HTML format so that we can build the rich linking described above. We are pursuing that as part of a plan to offer full-text SGML (Standard Generalized Markup Language) presentation. As part of our Web-publishing strategy, we will steadily expand our offering of full-text HTML (HyperText Markup Language) and XML (Extensible Markup Language) for journals over the coming years. As good as PDF is, its static nature means it is not the format of the future science journal. HTML and XML will give us the ability to create hypermedia features such as interactive mathematics and illustrations, along with other as-yet-unimagined features.

"Our goal is to have one seamless process drive several publishing formats"

SGML

Wiley developed its first journal SGML Document Type Definition (DTD) in 1994. That DTD, which defined journal headers, was based on the European Working Group's MAJOUR DTD for article headers. Development of our DTD required that we carefully analyze all four hundred of our journal titles so that every essential element of those journals (such as journal subtitle, the ISSN of the journal as previously published under a different name, the distinction between the author's professional and personal affiliations) was accounted for in the DTD. That analysis was complicated by the diversity in our journal-publishing program, requiring in some cases subject-specific exceptions inside the DTD, such as mathematics character sets, the identification of subject-related article headings, and specific funding-agency information that is important to the author and reader, to cite three examples.

That analysis caused us to make some changes in the structure of all of our journal articles in order to create a logically consistent format that would make the print and SGML-capture process more efficient. Those changes took months to accomplish as they had to be coordinated within all the Wiley units worldwide (in the U.S., U.K., and Germany), with the various external journal editors (who retain a strong voice in the format of the journals), and with the external typesetting companies that prepare the material.

SGML codes are marked up on paper (or in many cases electronically, using Wiley's copy-editing software tools) by Wiley's independent contractors. Typesetters (as many as twenty worldwide) create the SGML either by direct input to the typesetting system or as the result of a conversion process out of typesetting. Currently Wiley does not dictate the method that typesetting contractors use to create the SGML, apart from secondary overall specifications. We realize that our contractors, important partners in the communication chain, need to create the most efficient process for themselves. With our contractors, we are working toward the goal of creating an SGML database that will allow us to publish our material in several different formats from one content source.

Expansion of the DTD to include the article-reference sections as well as the full body of the journal article is planned for this summer. The reference section portion of the DTD is perhaps the most difficult, as again we are faced with having to standardize the reference-section content styles of four hundred journals, which requires coordination between Wiley, the external journal editors, and in some cases scholarly societies, all of whom have different definitions of "the complete citation." A good example of those differences is that references in the physical sciences typically do not include the title of the cited paper, unlike the reference styles for life sciences or clinical medicine.

PDF Creation

Adobe Acrobat PDF allows for many enhancements as well as choices in figure resolution. Wiley's guiding principle in creating a standard PDF specification is that the PDF file size be as small as possible without adversely affecting screen viewing or printing. Accordingly, we instruct our vendors to downsample both color and greyscale images to 120 dpi, compress the images using Zip compression (4 bits), and reduce the color and greyscale image depth to 4 bits. Monochrome images are downsampled to 300 dpi.
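
The effect of such settings on raw image size can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes a hypothetical 4 x 3 inch greyscale figure captured at 300 dpi and 8 bits per pixel (neither the figure size nor the source resolution comes from Wiley's specification) and compares its uncompressed size with the same figure at 120 dpi and 4 bits per pixel, before any Zip compression is applied.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a single-channel image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bits_per_pixel + 7) // 8


# Hypothetical source figure: 4 x 3 inches, 300 dpi, 8-bit greyscale.
before = raw_image_bytes(4, 3, 300, 8)
# Same figure at the downsampled resolution and reduced depth: 120 dpi, 4 bits.
after = raw_image_bytes(4, 3, 120, 4)

print(f"before: {before} bytes, after: {after} bytes, ratio: {before / after:.1f}x")
```

Under these assumptions the raw data shrinks by a factor of (300/120)^2 x (8/4) = 12.5 before compression; real figures will vary with their original resolution and depth.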

We do not enhance the PDF in any other way; we do not add PDF bookmarks, nor do we create any links within the document (except for some experimental titles). The PDF file has been created with the expectation that the user will print it rather than read it online. Those specifications have resulted in PDF files that average less than 500 kilobytes, yet retain sharpness in both screen and print quality. The PDF is created by typesetters from the PostScript that is used to make the printing film. Wiley worked with all of its typesetting contractors over a twelve-month period to ensure that proper quality checks were in place.

Workflow and Quality Control

After the PDF and SGML for a journal issue are complete at the typesetter (about two weeks before print mailing), typesetters transmit the files to a secure FTP server located at Wiley's data center in Somerset, New Jersey. Typesetters follow an exact file-naming standard for both the SGML and PDF to ensure that each pair is correctly matched. Those files are checked automatically to make sure that the metadata (journal title, ISSN, CODEN, and abbreviated title) and the SGML syntax are correct. Any errors are automatically corrected and noted in a log file. That log file provides a way to monitor quality and offer feedback to our contractors. An internal Web application (named JASPER) allows Wiley editors to do a final quality check before they put the articles on the InterScience site for public access. JASPER alerts production personnel that the files have arrived, and provides a way for the production department to review, edit, and parse the SGML, and view the PDF. The incoming SGML is rendered into a more readable HTML form, and the production editor can easily flip back and forth between the SGML and HTML views.
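
The kind of automatic check described above can be illustrated with a short sketch. It assumes, hypothetically, that each article arrives as a pair of files sharing a base name with ".sgm" and ".pdf" extensions (the article does not state Wiley's actual naming standard), and it validates an ISSN with the standard ISSN check-digit rule. The function names and sample file names are illustrative and are not part of JASPER or any Wiley system.

```python
from pathlib import PurePath


def issn_is_valid(issn: str) -> bool:
    """Check an ISSN such as '0378-5955' against its check digit."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars.isascii() or not chars[:7].isdigit():
        return False
    if not (chars[7].isdigit() or chars[7] == "X"):
        return False
    # The first seven digits are weighted 8 down to 2 and summed.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def unmatched_files(filenames):
    """Return base names that lack either the SGML or the PDF half of a pair."""
    sgml = {PurePath(n).stem for n in filenames if PurePath(n).suffix.lower() == ".sgm"}
    pdf = {PurePath(n).stem for n in filenames if PurePath(n).suffix.lower() == ".pdf"}
    # Symmetric difference: names present in one set but not the other.
    return sorted(sgml ^ pdf)


# Hypothetical delivery from a typesetter: the second article has no PDF.
delivery = ["art001.sgm", "art001.pdf", "art002.sgm"]
print(unmatched_files(delivery))
print(issn_is_valid("0378-5955"), issn_is_valid("0378-5954"))
```

A production check would also parse the SGML against the DTD; that step is omitted here because it depends on tooling the article does not describe.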

JASPER forces the production editor to view both the SGML and the PDF before clicking an approval button for both. All production activity is monitored and recorded by the system. Satisfied that the content is accurate, the production editor "publishes" a journal issue to Wiley InterScience via a simple mouse click. That copies the files to an InterScience file tree, where a program picks up incoming material, loads the SGML into a Sybase database, and creates a pointer to the PDF file. Creating that internal system has provided us with the experience and perspective that will help us to design other workflow applications in the future. In addition to revolutionizing the distribution process, the Web is proving an invaluable tool in the production process.
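
The view-then-approve-then-publish rule described above can be modeled as a small state check. This is a minimal sketch of the idea, not JASPER's actual design: the class, method names, and issue identifier are all hypothetical, and the real system's copying and database loading are reduced to a log entry.

```python
from dataclasses import dataclass, field

# Both file types must be viewed before approval is allowed.
REQUIRED_VIEWS = frozenset({"sgml", "pdf"})


@dataclass
class IssueReview:
    issue_id: str
    viewed: set = field(default_factory=set)
    approved: bool = False
    log: list = field(default_factory=list)  # every action is recorded

    def view(self, kind: str) -> None:
        if kind not in REQUIRED_VIEWS:
            raise ValueError(f"unknown file kind: {kind}")
        self.viewed.add(kind)
        self.log.append(f"viewed {kind}")

    def approve(self) -> None:
        missing = REQUIRED_VIEWS - self.viewed
        if missing:
            raise PermissionError(f"view first: {', '.join(sorted(missing))}")
        self.approved = True
        self.log.append("approved")

    def publish(self) -> str:
        if not self.approved:
            raise PermissionError("issue has not been approved")
        self.log.append("published")
        return f"{self.issue_id}: published"


review = IssueReview("example-issue-01")  # hypothetical identifier
review.view("sgml")
review.view("pdf")
review.approve()
print(review.publish())
print(review.log)
```

Keeping the log inside the review object mirrors the article's point that all production activity is recorded, which is what makes the approval step auditable.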

The entire process allows Wiley to keep separate databases for archival and Web publishing purposes. Although we in effect double our storage costs, we have created crucial redundancy for our journal content.

Computing and Telecommunications Infrastructure

Wiley InterScience and the journal content warehouse that serves it are supported by two multi-processor Sun Enterprise 4000 servers. Each server has one gigabyte of RAM, and 40 gigabytes of hard-disk storage. InterScience is connected to the Internet via a single T-1 connection. Wiley's European operations in Chichester, U.K., and Weinheim, Germany, are connected to the data center via a private frame-relay network to facilitate moving data from those operations. That infrastructure has been made possible by the experienced computer and telecommunications operations that Wiley has had in place for decades.
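
A rough sense of what a single T-1 connection implies for article delivery can be had from a back-of-the-envelope calculation. The sketch below assumes the nominal T-1 line rate of 1.544 megabits per second, a PDF of 500 kilobytes (the upper end of the average cited earlier, taking 1 kilobyte as 1,000 bytes), and no protocol overhead or competing traffic, so the results are optimistic bounds rather than measurements.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T-1 line rate


def transfer_seconds(size_bytes: int, bits_per_second: int = T1_BITS_PER_SECOND) -> float:
    """Ideal time to send size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / bits_per_second


pdf_bytes = 500 * 1000  # assumed article size: 500 kilobytes
seconds = transfer_seconds(pdf_bytes)
per_hour = int(3600 / seconds)  # whole articles per hour at full utilization

print(f"{seconds:.2f} s per article, at most {per_hour} articles per hour")
```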

People Issues

Training, Retaining, Retooling

Just as the automobile industry responded to the oil crisis in the 1970s by retooling operations to produce more fuel-efficient vehicles, Wiley has recently initiated a publishing-process re-engineering that will result in a more efficient publishing engine. We expect to make changes over the next five years that will allow us to treat the content of our journals as bits of data rather than as molecules of ink on paper. To do that, we will automate a great deal of our work.

Automation requires both training and thought, and one of the greatest challenges in creating Wiley InterScience has been addressing the personnel, training, and process issues that have resulted from the need to create new production and distribution systems. Although many of the skills used in traditional publishing can be carried over to the digital world, our biggest hurdle was training a large, internationally dispersed publishing production staff (seventy in New York, one hundred worldwide) to understand the esoterica of SGML. Wiley provided training classes through outside consulting groups over a one-year period.

Once staff are trained in Web and SGML skills, it is important that they are retained in order to sustain the quality that has become the hallmark of Wiley publications. In the face of growing demand for their skills in a competitive job market, Wiley has created a working environment that is at once rewarding and challenging, and Wiley is committed to hiring and retaining a professional publishing staff. Staff are encouraged to further their education in technical as well as traditional publishing skills through a liberal education-subsidy program that is supplemented by in-house training seminars sponsored by the Human Resources Department. It has been gratifying to work with staffers who have embraced change, and who have re-engineered themselves as professionals for the future.

The Web is changing scientific communication and the way we produce and deliver scientific information in professional journals. When one considers the history of journal publishing and the fact that nearly a century passed without any wholesale change to the process, it is amazing to reflect on the accomplishments of the industry over the past four years.

Publishing, particularly scholarly publishing, is a complex communication process involving people and technology. Changes to the process must be efficient and sustainable. Wiley's strategy for producing the journal of the 21st century will continue to evolve to meet customer needs. More importantly, we will ensure that our processes maintain quality publishing standards during this period of rapid change.

While the individual Web site is the yardstick by which Web publishing is often measured, Wiley recognizes that the long-term viability of useful Web products will be secured only by a strong back-end operation and a devoted staff. Experience has taught us that it is rather straightforward to create innovative Web sites for a handful of journals, yet a big challenge to create and sustain hundreds of Web sites with tens of thousands of articles and associated reader services that will endure well into the 21st century.



Gerry Grenier is Director of Development for Wiley InterScience, developing the online features and reader services for John Wiley's 400 science, technical and medical journals. He has spent the past 18 years at Wiley in various production and technology-related jobs. From 1993 to 1997 he was Director of Journal Production Technology, where he played a hands-on role in spearheading Wiley's SGML and journal Web development.