
Abstract

The set of NLM DTDs has emerged as a de facto standard for content interchange among STM publishers over the past several years. Recently, the ACS Publications Division has used customized forms of these DTDs, which the NLM has made public, to implement XML-based publishing processes for our chemistry-related journals, books, and magazine publications. In this paper, we look at the drivers behind our decisions of whether customizations should be made, and if so, how much customization is needed, to meet the needs of our publication processes. To frame the discussion of the various customizations, we also offer the concepts of a customization level, a customization implementation method, and a customization profile. At the end, we share some of the successes and lessons from our experiences.

Introduction

When considering standards that apply to technology or processes (such as to publishing), the Merriam Webster Dictionary offers a useful definition:

standard: something established by authority, custom, or general consent as a model or example[1]

However, is it a truism that standards should never be modified or extended? Should the concept of a “customized standard” be considered an oxymoron? In this paper, we suggest that customizations of standards are sometimes acceptable, and at times, even preferred.

Standards bodies such as NISO and W3C rarely develop updates to standards in a vacuum, considering only academic or abstract rationales for making a change. In practice, the more prominent customizations and extensions that third parties make to a current standard often become candidate features in a subsequent version of the core standard itself. We saw this in the progressive development of the HTML standard: During the “Browser Wars,” companies behind mainstream web browsers sought to innovate by creating extensions beyond the core HTML standard of the time, and many of these extensions have since been folded into later versions of the HTML standard by W3C.[2],[3],[4]

Since the introduction of the first “NLM DTD” in 2003, the set of DTDs developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) has steadily gained acceptance as a quasi-standard for using XML markup to produce, exchange, and archive journal articles and books within the scientific, technical, and medical (“STM”) publishing spaces. Now referred to as “JATS,” the Journal Article Tag Suite,[5] the underlying NLM DTD components are on their way to being codified as an actual NISO standard.[6] In fact, see the companion article in this issue from Jeff Beck that covers NISO Z39.96.

This paper presents a case study of why the Publications Division of the American Chemical Society (“ACS Pubs”) has chosen to customize versions of the NLM DTDs for use in the production and delivery of ACS Pubs’ content products.

After a quick introduction of ACS Pubs as a publisher and our three primary content product types (journals, books, magazine), we introduce a set of concepts and terminology that we’ll use to profile our own customizations. The remainder of the paper discusses, for each of the three core ACS Pubs product types, what drivers led to the decisions to implement customizations and to determine how much customization was needed. Finally, we share some lessons learned and successes that we experienced along the way.

Note that for the purposes of this paper, we use generalized definitions for the terms tag, tag definition, tag set, module, and schema; for some readers with detailed experience with XML and DTDs, these terms might otherwise conjure up more specific technical definitions than what is intended here.

ACS and ACS Publications

The American Chemical Society (“ACS”) is a professional membership organization, chartered by the U.S. Congress in 1876, representing over 160,000 professionals at all degree levels and in all fields of chemistry and sciences that involve chemistry. Primary ACS divisions include Membership and Programs; Chemical Abstracts Service (“CAS”), a secondary publisher of chemical-related data, information abstracts, and databases; and the Publications Division (“ACS Pubs”).

ACS Pubs Product Types

ACS Pubs publishes three types of content products.

  • Journals: Forty peer-reviewed journals cover all types of chemistry and cross-disciplinary fields. Annually, they represent 300,000 published print pages. A number of journals, representing about half of ACS’s annual volume, are published on a weekly basis, while others are published on monthly or bi-weekly schedules.
  • Books: An average of 35 peer-reviewed books are edited, compiled, and published annually through the Symposium Series. Each book averages around 25 chapters, and while each chapter is independently authored and submitted, the whole book is the primary focus of publication and is delivered as a cohesive, publishable work.
  • Magazine: ACS’s flagship magazine, Chemical & Engineering News (“C&EN”), is published weekly and is sometimes likened to a Businessweek for those in chemistry-related disciplines. In addition to specific issues published in print and online, C&EN also publishes daily news articles online. In contrast to ACS’s journals and books, C&EN content is largely written by staff or contract writers. Its highly designed and styled pages also require additional considerations when working with XML.

Each of these three product types has incorporated XML into its content production processes to some degree, allowing each to distribute its content in multiple forms: print, online, mobile, etc. Each product type has unique characteristics, however, that influence the design of the XML that is produced.

Customization Terminology

Before discussing ACS’s customizations of the NLM tag sets, we want to lay out some terminology that we will use as a framework for that discussion.

Customization Levels

The possible approaches to customizing a standard can be viewed as points along a spectrum between two ends.

  • On one end, no customizations are applied and the standard is used precisely as-is.
  • On the other end, only the general principles behind the standard are used to “inform” the design of one’s own approach; the actual standard isn’t used at all.

In reality, ACS’s application of the NLM tag sets occurs somewhere between these extremes. From our own anecdotal experience and discussions with other organizations, we suspect that many other XML publishing applications also fall somewhere between these extremes.

We will give names to a few points along this spectrum; we will call these customization levels.

  1. As-is: The standard is used without changes or modification.
  2. Extended: A superset of the standard is used, with additional but optional features defined. Any XML that is valid according to the standard would also be valid to the extended version of the schema, although the reverse may not be true.

    For example, the definition of tag <xyz> in a standard schema may allow tags <a> and <b>, while the extended version defines tag <xyz>, allowing tags <a>, <b>, and new tag <c>.

  3. Reduced: A subset of the standard is used, with some features removed or made unavailable to the application. Any XML that is valid for the standard may not be valid to the reduced version, although the reverse would be true.

    For example, the definition of tag <xyz> in a standard schema may allow three child tags <a>, <b>, and <c>, while the reduced version defines tag <xyz> as only allowing tags <a> and <b> and omitting the option for <c>.

  4. Customized: Modifications to the standard are a combination of both extensions and reductions. A “customized” schema still uses the same tag names and similar tag hierarchies as in the standard but has changes that are a result of a combination of extension and reduction. XML that is valid for the standard is likely not valid to the customized schema, and likewise XML that is valid to the customized schema is likely not valid to the standard.

    For example, the definition of tag <xyz> in a standard schema may only allow tags <a> and <b>, while the customized version defines tag <xyz> as allowing only tags <b> and <c>.

  5. Built from: Modifications made to the standard are more substantial, including changes to existing features. A schema at this level could include renamed tags and non-trivial changes to the tag hierarchy, etc. Many of the modules from the standard are used as the starting point for a schema in this category.

    For example, the definition of tag <xyz> in a standard schema may allow child tags <a> and <b>, while the built-from version renames the tag as <abc> and only allows tags <a> and <b> to exist when grouped within <c>: <abc><c><a/><b/></c></abc>

  6. Informed by: The standard may be referenced during the design of the application but is not directly used in its implementation. A schema at this level could occasionally share similar names or models to the standard, but they are not defined using the same modules.
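The first four levels above can be sketched by treating the content model of a single tag as the set of child tags it allows, and comparing the standard set against the customized one. This is only an illustrative simplification: real content models also carry order and occurrence rules, and the “Built from” and “Informed by” levels involve renaming and restructuring that a plain set comparison cannot capture. The function name is hypothetical.

```python
def customization_level(standard: set[str], custom: set[str]) -> str:
    """Classify how a customized content model for one tag relates to the standard's.

    Both arguments are the sets of child tags allowed inside the same parent tag.
    """
    if custom == standard:
        return "as-is"
    if custom > standard:   # strict superset: only additions were made
        return "extended"
    if custom < standard:   # strict subset: only removals were made
        return "reduced"
    return "customized"     # additions and removals combined


# Allowed children of <xyz>, mirroring the examples in the list above.
print(customization_level({"a", "b"}, {"a", "b", "c"}))  # extended
print(customization_level({"a", "b", "c"}, {"a", "b"}))  # reduced
print(customization_level({"a", "b"}, {"b", "c"}))       # customized
```

The superset and subset tests mirror the validity statements above: content that is valid to the standard stays valid under an extension, and content that is valid to a reduction stays valid under the standard.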

The following table illustrates and summarizes the customization levels by comparing a standard XML tag set to a customized version.

Table 1. Customization Levels

Customization Implementation Methods

In addition to defining various levels of customization, we further recognize two basic approaches to implementing these customizations within an XML tag set (with the exception of the “as-is” and “informed by” levels). Note that the set of implementation method options may differ for other types of standards customizations, depending on the type of technology involved.

  1. Overrides. With this approach, the standard tag set files are not directly modified but are left intact. Any customized tag definitions are located within separate, schema-specific modules that are positioned within the schema to “override” the standard versions of the same tag definition.

    An advantage to this approach is a sharper divide between customizations and original tag set definitions. The original module from the standard tag set is not modified in this approach, so any other schemas that use the standard module will continue to pick up the correct tag definitions.

    However, this approach does require that the original tag set was specifically developed to allow “override” capabilities. The NLM tag sets are fortunately and thoughtfully designed with many override capabilities.

  2. Modifications. In contrast, a rather simple and straightforward approach is to merely edit the tag definitions within the actual modules provided with the standard tag set. An advantage of this approach is that it can be done rapidly with little understanding of the underlying design of the standard tag set.

    However, one drawback to this approach is that it may not be obvious to other future schema developers if a given module is the original version or a customized version. Another drawback is the need to carefully manage versions of a module. Otherwise, an application that has awareness of several different schemas, where those schemas “share” common modules (for example, a book and journal schema share a “paragraph” module), may pick up the wrong version of a module when attempting to load a given schema. This could lead to runtime errors or unintended consequences for applications like composition systems and XML editors that are often configured to handle different types of content.

    These two implementation methods are not mutually exclusive, and a given tag set or schema could be customized using some combination of both Modifications and Overrides.
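As a rough analogy for the two methods, the sketch below models a module as a mapping from tag name to allowed child tags. In a DTD, overrides generally rely on the XML rule that the first declaration of an entity is the binding one, so a customization module read before the standard module takes precedence. Python’s `ChainMap`, which returns the first match it finds, mimics that lookup order. The module contents are invented for illustration and are not taken from the NLM or ACS tag sets.

```python
from collections import ChainMap

# Tag definitions as shipped with the standard (tag -> allowed child tags).
STANDARD_MODULE = {"xyz": ("a", "b"), "p": ("bold", "italic")}

# Overrides method: customizations live in a separate module that is consulted
# first; the standard module itself is never edited.
OVERRIDE_MODULE = {"xyz": ("a", "b", "c")}
override_schema = ChainMap(OVERRIDE_MODULE, STANDARD_MODULE)

# Modifications method: the definition is edited directly in a copy of the
# standard module, so an "original" and an "edited" version of the same
# module now exist and must be told apart by careful version management.
modified_module = dict(STANDARD_MODULE)
modified_module["xyz"] = ("a", "b", "c")

print(override_schema["xyz"])   # customized definition wins
print(override_schema["p"])     # falls through to the standard definition
print(STANDARD_MODULE["xyz"])   # the standard module is still intact
```

With the override layering, any other schema that reads `STANDARD_MODULE` directly continues to see the original definitions, which is the advantage described for the Overrides method.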

Customization Profile

A customization profile for a tag set/schema can be defined using the combination of the customization levels and customization implementation methods defined above. A two-dimensional chart that contains the possible combinations can be formed using the customization levels on the Y-axis and customization implementation methods on the X-axis. Within this chart, customization implementation methods would not apply to the extremes of the Y-axis customization spectrum: Neither the “as-is” nor the “informed by” customization levels would utilize the customization implementation methods on this scale.[7]

Table 2. Tag Set/Schema Customization Profile

Below we will discuss how the three ACS product types fit into this table.
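One way to make the profile concrete is as a small record that pairs a level with an implementation method and rejects the combinations that the chart leaves empty. The class and member names below are hypothetical; “Mixed” stands for the combination of Overrides and Modifications mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    AS_IS = "As-is"
    EXTENDED = "Extended"
    REDUCED = "Reduced"
    CUSTOMIZED = "Customized"
    BUILT_FROM = "Built from"
    INFORMED_BY = "Informed by"


class Method(Enum):
    OVERRIDES = "Overrides"
    MODIFICATIONS = "Modifications"
    MIXED = "Mixed"


@dataclass(frozen=True)
class CustomizationProfile:
    level: Level
    method: Optional[Method] = None

    def __post_init__(self) -> None:
        # The two extremes of the spectrum take no implementation method.
        takes_no_method = self.level in (Level.AS_IS, Level.INFORMED_BY)
        if takes_no_method and self.method is not None:
            raise ValueError(f"{self.level.value!r} does not take an implementation method")
        if not takes_no_method and self.method is None:
            raise ValueError(f"{self.level.value!r} requires an implementation method")


profile = CustomizationProfile(Level.EXTENDED, Method.OVERRIDES)
print(profile.level.value, "/", profile.method.value)
```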

ACS Pubs’ Use of NLM Tag Sets—Overview

When we set out to implement XML-based production for each of our primary product types (journals, books, magazine), we were faced with answering questions such as:

  • Whether to leverage a standard schema or develop one from scratch
  • If utilizing a standard schema, whether customization was needed (i.e., where do we land on the customization levels spectrum)
  • If customization was needed,
    • How much customization was needed
    • What customizations were needed
    • How to implement the customizations (i.e., where do we land on the customization implementation methods spectrum)

For each product type, we weighed four primary factors when attempting to answer the questions above.

  1. ACS-specific product requirements: What unique characteristics does ACS Pubs have for journal article, book chapter, or magazine story print/page layout? For online HTML delivery? For metadata feeds, for example, bibliographic feeds to institutions and indexing organizations? For other eventual external content consumers, such as secondary publishers or online repositories?
  2. ACS-specific process requirements: What unique process characteristics does ACS Pubs have for production and delivery of journal article, book chapter, or magazine content? (Process requirements can be rooted in either technical constraints and/or ACS business practices.)
  3. ACS-specific terminology: Over the decades, certain vocabularies have become entrenched in ACS’s products and the production processes. During the XML production implementation efforts, our philosophy was that the XML vocabulary should fit ACS’s existing terminology—when appropriate—instead of trying to adapt ACS products or processes to a new vocabulary articulated within a standard schema. An XML vocabulary with a radically new terminology would have, at best, dramatically increased staff training requirements, and at worst, inhibited the success and acceptance of the overall project.
  4. Availability of applicable standard schemas: Considering ACS’s product, process, and terminology, which of the publicly available schemas comes the closest to meeting ACS’s needs for the given product type (journals, books, magazine)?

Journals: ACS Pubs’ Use of NLM Journals Tag Sets

What We Use

In 2005, ACS Pubs assembled a panel of ACS participants and STM markup experts to determine current and future product, process, and terminology requirements for the journals product line. The ACS panel represented all areas of the journals program, including production, electronic delivery, sales and marketing, editorial office operations and peer review, IT, and product development.

At the time of that effort, the NLM journal tag sets were just starting to emerge as a de facto standard for modeling journal articles within the STM publishing community. This status, combined with a relatively close match to the journal article constructs already used by ACS journals, made the NLM tag sets a natural top candidate. However, as the NLM tag sets were primarily intended to be a common interchange format between publishers and archives, and not specifically intended to support the production of any journal or publisher,[8],[9] direct “as-is” use of the NLM journal tag sets or one of their schemas was unlikely without at least some level of customization.

The recommendation by this panel was to develop an ACS-specific DTD, built on an ACS tag set that was loosely based on version 2.3 of the NLM Journal Archiving and Interchange Tag Set. This recommendation reflected findings that ACS had non-trivial differences in both product-specific requirements and terminology when compared to the NLM tag set. ACS accepted and proceeded to implement this recommendation, resulting in a customized tag set and schema (implemented as a modular DTD).

Some examples of the notable differences between the current ACS Journal Tag Set in use within journal production and the NLM Journal Archiving and Interchange Tag Set (version 3.0) are given here.

  • To retain ACS production terminology in use, ACS Pubs opted for the more generic term of <document> instead of using the NLM <article> as the root element.
  • ACS Pubs also retained product-related terminology of “figure,” “chart,” and “scheme” by defining separate <fig>, <scheme>, and <chart> tags instead of using NLM’s single <fig> tag having a @fig-type attribute. The differences go beyond mere naming, however. For example, per ACS journal style, a <fig> is displayed with its label and caption underneath the image, while a <chart> is displayed with its label and title above the image.
  • Extensions to the CALS table module were added to the ACS Journal Tag Set for handling desired indentation behavior for table columns and individual cells.
  • The model for tagging ACS references followed the highly structured tagging design of NLM’s <nlm-citation> and <element-citation> tags. However, since ACS was not seeking to adopt NLM’s specific citation style, the tag set was extended to define structured citation models that matched ACS’s three primary journal citation styles: one with article titles, one that omitted titles, and one specific to biochemistry journals.
  • As an example of a domain-specific customization, support for a special chemical notation occasionally used in some chemical expressions called “tie-bars” was added to the ACS Journal Tag Set.

While the NLM tag set served as the starting point for development of the ACS Journal Tag Set, the original tag set modules themselves were modified to implement the ACS-specific requirements. The ACS Journal Tag Set and DTD are represented in the customization profile with a customization level of “Built from” and customization implementation method using “Modifications.”

Table 3. Tag Set/Schema Customization Profile for ACS Journals Production Tag Set

Current State and Maturity Level

The ACS Journal Tag Set was finalized in late 2006 and went into production in early 2007, incorporating minor adjustments discovered during early testing. With a few minor exceptions, subsequent version and patch releases have been backward compatible with the prior version, meaning that previous XML could still be validated with the newest DTD version.

The ACS Journal DTD was a monolithic schema, intended to be used for all internal ACS journal article XML operations and for distribution to our external XML consumers. As our internal applications evolved and external consumer use of our XML increased, making updates to the ACS Journal DTD required ever-increasing overhead for the testing, communication, and logistics of joint deployments across all XML applications and consumers.

With a subsequent version of our ACS Journal Tag Set (which rolled into production in late 2010), we sought to alleviate this logistical bottleneck by packaging several different schemas that all use the same tag set.

  • The External/Interchange DTD is functionally equivalent to the prior version. With the rollout of the new tag set version, we opted to only introduce the changes internally to ACS Pubs journal production; all external XML content feeds would continue to receive XML that is valid to the schema that those consumers already had.
  • The Production DTD contains the updated functionality to serve internal journal production process requirements. This DTD is coded as a superset of the External/Interchange DTD. XML content intended for delivery to external consumers is transformed into the External/Interchange DTD using XSLT scripts (losing granularity introduced in the revised tag set).
  • The Layout DTD is a specialized superset of the Production DTD that expresses an additional set of vocabulary that is focused on page layout and composition processes. For example, additional tags are defined to indicate whether a given table should be allowed to spread across the width of an entire page or constrained to fit within a single text column—information that has no meaning outside of the context of a composed page. XSLT scripts are used to automatically inject, and remove, the specialized Layout tags when content is moving to and from the basic Production schema.

The Production and Layout DTDs are implemented as customizations to the External/Interchange (“base”) DTD, defined at the “Extended” customization level and implemented using the “Override” method. The Production DTD extends the base tag set, and the Layout DTD further extends the Production DTD. Because all customizations are implemented using the “Override” method, all three DTDs live happily within the same directory and share the same core modules.
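The text above states that XSLT scripts inject and remove the Layout tags. The Python sketch below only illustrates the removal direction and is not the ACS implementation. The tag name `layout-hint` and its `span` attribute are hypothetical stand-ins for the layout vocabulary, which is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for elements that exist only in the Layout DTD.
LAYOUT_ONLY_TAGS = {"layout-hint"}


def strip_layout_tags(root: ET.Element) -> ET.Element:
    """Remove layout-only elements so the tree conforms to the Production schema again."""
    # Take a snapshot of the elements first, because the tree is modified during the walk.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag in LAYOUT_ONLY_TAGS:
                # Preserve any text that followed the removed element.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
            else:
                previous = child
    return root


layout_xml = '<table-wrap><layout-hint span="page"/><table/></table-wrap>'
production = strip_layout_tags(ET.fromstring(layout_xml))
print([child.tag for child in production])
```

Because the Layout DTD is a pure superset of the Production DTD, removing the additional elements is enough to return to Production-valid content in this simplified model.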

Figure 1. Architecture of current ACS Journal DTDs sharing a common tag set.

ACS’s Content Interface with NLM-Based Web Delivery System

ACS Pubs’ web delivery platform also utilizes a customized version of the NLM tag set, containing several types of documented extensions. It is this “Delivery NLM” XML that is used as the source for generating both the HTML article content pages and the metadata that drives other features of the delivery site. Because these features were implemented without breaking compatibility with XML content that is compliant with the NLM tag set, and because we coded many of the changes directly in the original standard modules, we characterize these extensions within the customization profile at the “Extended” customization level and using a “Mixed” customization implementation method.

“ACS production” XML is not functionally equal to the “Delivery NLM” XML required by our delivery system, so a conversion process occurs during the transfer between the two systems to account for the differences in the two customized versions of the standard NLM tag set.
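To give a flavor of what such a conversion involves, the sketch below maps two of the differences listed earlier, the <document> root and the separate <scheme> and <chart> tags, onto NLM-style <article> and <fig> with a @fig-type attribute. It is an assumption-laden illustration rather than the actual ACS conversion: the function name and the specific `fig-type` values are made up, and the “Delivery NLM” extensions are not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from ACS figure-like tags to NLM fig-type values.
FIG_TYPE_FOR_TAG = {"scheme": "scheme", "chart": "chart"}


def to_nlm_style(root: ET.Element) -> ET.Element:
    """Rename ACS-specific tags to their closest NLM counterparts, in place."""
    if root.tag == "document":
        root.tag = "article"
    for element in root.iter():
        if element.tag in FIG_TYPE_FOR_TAG:
            # Keep the ACS distinction as an attribute value on the single NLM tag.
            element.set("fig-type", FIG_TYPE_FOR_TAG[element.tag])
            element.tag = "fig"
    return root


acs_xml = "<document><fig id='f1'/><scheme id='s1'/><chart id='c1'/></document>"
nlm = to_nlm_style(ET.fromstring(acs_xml))
print(nlm.tag, [(fig.get("id"), fig.get("fig-type")) for fig in nlm.iter("fig")])
```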

Table 4. Tag Set/Schema Customization Profile for the Journal Tag Set Used by the ACS Journal Delivery System

Books: ACS Pubs’ Use of a Customized NLM Book Tag Set

What We Use

When ACS Pubs set out to implement an XML-first workflow for ACS Symposium Series books, a few factors played into our analysis and selection:

  • Delivery System: We knew at the beginning of the project that we would be delivering HTML editions that leveraged ACS Pubs’ investment in our existing journal delivery platform.
  • Composition: A project goal was to implement a highly automated book page composition that used XML as its source.
  • Like Journals: We wanted to leverage as much of our journal production processes as applicable.
  • Unlike Journals: At the same time, we knew that books had unique product characteristics of their own; it was highly unlikely that we would be able to simply “shoehorn” our book production into the very same journal XML production processes and tools.
  • Book vs. Chapter: We knew that we needed to perform editing and draft pagination at the chapter level, compose and paginate at the book level, and provide a combination of both book and chapter XML deliverables to our online delivery system.
  • Learned from Experience: We wanted to take advantage of our prior experience of implementing XML-based journal production where applicable.

Because we knew from the start that our existing web delivery system already had built-in support for delivering book content using an extended version of the NLM Book Tag Set, and our journals’ experience suggested that we should seek ways to minimize the amount of XML translation needed, the NLM Book Tag Set was an obvious first candidate for use in our workflow. A gap analysis between ACS Pubs’ book requirements and the NLM book tag set used by our web delivery system revealed that this tag set was indeed a very close match, with many of the extensions meeting our production needs as well. While DocBook was also briefly considered, it would have required significant staff training as well as non-trivial development of transformations to convert from production DocBook XML to the web delivery system’s extended NLM XML.

Our web delivery system’s version of the NLM book tag set (and the specific tagging conventions required to drive the delivery system) still had two primary limitations when considering it for our production use. The first is how books are linked to chapters, and the other is related to having the XHTML table model as the default table model. As a result, we made further production-focused customizations to the instance of the NLM book tag set used by our delivery system.

Customizations made to the NLM book tag set include the following.

  • We added XInclude[10] support so that chapter XML files can stand alone as valid XML documents and, at the same time, be linked into their parent book XML. This approach gives production the flexibility to edit specific chapters before the entire book is ready, while still allowing book-level processing of all book and chapter content, such as pagination and indexing. We have found the use of XInclude to be a very natural, successful, and beneficial extension to the Book tag set, and have proposed to the NLM tag set owners that inclusion of XInclude be considered in a future version of the public NLM book tag set.
  • While the NLM book tag set featured the XHTML Table Model, ACS chose to swap in the CALS OASIS Table Model in its place. We made this customization for two business reasons.
    1. The OASIS model allowed greater control and flexibility in meeting ACS book product requirements as compared to the XHTML model.
    2. Production staff already had experience and training in using the OASIS table model from our journal tag set.
  • One notable gap from the NLM book tag set is the lack of dedicated tags to facilitate the creation of an index. We added a simplified version of DocBook’s indexing tag model to meet our needs for generating the subject index section for each book.
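The XInclude arrangement described in the list above can be sketched with Python’s standard `xml.etree.ElementInclude` module. To keep the example self-contained, chapter files are held in an in-memory dictionary and loaded by a custom loader. The file names, element names, and titles are illustrative assumptions and do not reproduce ACS’s actual book tagging.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-ins for chapter files on disk; each one is a complete XML document.
CHAPTER_FILES = {
    "ch01.xml": "<book-part id='ch01'><title>First Chapter</title></book-part>",
    "ch02.xml": "<book-part id='ch02'><title>Second Chapter</title></book-part>",
}

# The book-level file links to its chapters instead of containing them.
BOOK_XML = """
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Example Symposium Volume</title>
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
</book>
""".strip()


def load_chapter(href, parse, encoding=None):
    """Loader for ElementInclude that resolves hrefs against CHAPTER_FILES."""
    if parse != "xml":
        raise ValueError("only XML inclusion is supported in this sketch")
    return ET.fromstring(CHAPTER_FILES[href])


# Chapter-level work: each chapter parses on its own.
chapter = ET.fromstring(CHAPTER_FILES["ch01.xml"])

# Book-level work: resolve the includes to obtain one consolidated tree.
book = ET.fromstring(BOOK_XML)
ElementInclude.include(book, loader=load_chapter)

titles = [part.findtext("title") for part in book.findall("book-part")]
print(chapter.get("id"), titles)
```

After the includes are resolved, book-level steps such as pagination or index generation can walk a single tree, while editors continue to work on the individual chapter files.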

When comparing ACS Pubs’ production book tag set to NLM’s book tag set, we characterize the full set of these extensions (whether implemented directly by ACS Pubs or within the tag set instance from our delivery system) using a customization profile with a “Customized” customization level and using a “Mixed” customization implementation method. (While almost all of the customizations could be characterized as “extensions,” one notable change, replacing the XHTML table model with the OASIS table model, excludes the tag set from the “Extended” level.)

Table 5. Tag Set/Schema Customization Profile for the ACS Books Production Schema

Magazine: ACS Pubs’ Customized Tag Set for C&EN

What We Use

In 2010, the production team for our C&EN magazine began implementing a schema specially tailored to its production and electronic delivery processes. This schema is based on the ACS Journal Tag Set, with extensive customizations made to further meet magazine publishing requirements.

Unlike ACS’s journals and books, the driving goal was not to implement an XML-first workflow in which the XML served as the common content format within C&EN production editing and page-design activities. Indeed, we determined that introducing XML during these workflow stages would have forced a disruptive change in the production tools and processes while offering little tangible production benefit in return. Instead, the primary goal for the use of XML with C&EN was two-fold.

  1. To store a “content of record” version of article content that is independent of any particular production application format or technology, thus allowing for future reuse of this content
  2. To serve as a technology-neutral “content interchange format” that facilitates automated content delivery, such as to a web delivery platform or external syndication

The choice to use a customized version of the ACS Journal Tag Set to implement a schema occurred only after a careful evaluation of other public schemas. An emphasis was placed on the ability of the tag set to retain C&EN–specific semantics from a product perspective. The ability to use tag names consistent with C&EN–specific terminology was a plus but not a driving requirement.

While several tag sets had wide adoption within the news and magazine communities, they were primarily intended for interchange of metadata and formatted content, with little support for capturing content semantics (without further customization). While we felt that we could meet our objectives using either “DITA For Publishers” or a customized ACS journal tag set, the latter was selected because it offered a few advantages.

  • It already offered many existing tag names that referenced terminology already familiar to C&EN staff.
  • It already had support for many C&EN product-specific content features, which were previously defined in the ACS journal tag set to handle “magazine-like” front-matter content published in some ACS journals.
  • It was already familiar to the team responsible for supporting ACS Pubs’ various schemas and XML implementations, resulting in a shorter learning curve when implementing the needed customizations.

Some examples of the further customizations made to the base ACS Journal Tag Set to meet specific C&EN magazine requirements include the following.

  • As with the ACS Book DTD, we incorporated XInclude[11] into our magazine schema to allow for a consolidated issue XML while keeping the individual XML component articles, sub-articles, and media readily available for reuse. The issue-level XML provides a packaging mechanism for all the print and online articles, metadata, and media elements, and also incorporates the relevant information for the table of contents.
  • Several content constructs required for the stylized nature of magazine publishing were absent in the NLM tag set. Accounting for these nuances required the addition of XML structures for pull quotes, eyebrows, ads, and thematic sections to categorize sub-articles. In addition, while the ACS schemas for journals and books allow for pagination information, the possibility of noncontiguous paging exists with C&EN. This required customization to allow for multiple iterations of first- and last-page pairs within the metadata for a single article XML.
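The noncontiguous paging case in the list above can be illustrated as follows. The wrapper element `page-span` is hypothetical and is used only to show repeated first- and last-page pairs. The actual C&EN markup is not given here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata for an article that starts on pages 12-14 and
# continues on page 37; <page-span> is an invented wrapper element.
ARTICLE_META = """
<article-meta>
  <page-span><fpage>12</fpage><lpage>14</lpage></page-span>
  <page-span><fpage>37</fpage><lpage>37</lpage></page-span>
</article-meta>
""".strip()


def page_spans(meta: ET.Element) -> list[tuple[int, int]]:
    """Return every (first page, last page) pair recorded for the article."""
    return [
        (int(span.findtext("fpage")), int(span.findtext("lpage")))
        for span in meta.iter("page-span")
    ]


def page_count(spans: list[tuple[int, int]]) -> int:
    """Total number of pages covered by all spans, counting both endpoints."""
    return sum(last - first + 1 for first, last in spans)


spans = page_spans(ET.fromstring(ARTICLE_META))
print(spans, page_count(spans))
```

A single first- and last-page pair, as used for journals and books, would report this article as running from page 12 to page 37; the repeated pairs record that it occupies only four pages.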

When comparing the C&EN magazine schema to the NLM journal tag set on which the source ACS Journal Tag Set was based, we characterize the set of these extensions with a customization profile at a “Built From” customization level and using a “Mixed” customization implementation method.

Table 6. Tag Set/Schema Customizations Profile for the C&EN Magazine Schema

Summary: The ACS Tag Set Inheritance and Interchange Map

Figure 2. This map illustrates the progression of customizations that transformed the standard NLM tag sets into the ACS-specific tag sets/schemas, and how XML content is interchanged between the various levels at ACS Pubs.

Successes and Lessons Learned

In addition to sharing how ACS Pubs has customized and applied NLM-based tag sets, we want to share some lessons that we have taken away from these experiences in customizing standard tag sets.

  1. Busting the standards “compatibility” myth: We found a common misconception, both within our organization and within the larger STM publishing community, that XML is either compatible with the NLM DTD or it is not. As we have shown here, reality is more flexible: multiple levels of customization and compatibility are possible. Because the NLM DTD cannot be all things to all organizations and their respective products and processes, customization of this particular “standard” tag set should be expected.

    Additionally, there is no “standard” way to tag an “NLM XML” content file. Due to the inherent model flexibility within the NLM tag sets, we find that organizations or systems that leverage the NLM tag sets will often, out of necessity, enforce additional specific product, process, or system tagging requirements that go beyond the rules encoded within NLM’s schema. The “NLM XML” produced by one organization or system may fail to meet the tagging conventions required by another organization or system—without some type of translation process between them.

    The view that the NLM Journal Archiving and Interchange Tag Set provides the basics for encoding STM content seems most appropriate to us. It promotes effective exchange of information between organizations while still being designed to allow many different styles of tagging.

  2. Moving from a monolithic, one-size-fits-all standard to specialized versions: As we described in the section titled “Journals: ACS Pubs’ Use of NLM Journals Tag Sets,” we originally had a single DTD to handle production, print composition, web delivery, content interchange with external parties, and content archival. Not only did we struggle to support increasingly varied and sometimes conflicting requirements, but the logistics of testing, coordinating, and distributing updates to all parties and systems also grew increasingly challenging.

    Watch for warning signs that you are overextending the use of a standard or a single version of a customized standard. Creating more than one specialized version of the standard may be a better approach.

  3. Supplementing customized versions of a standard: We have found that specifying tagging requirements via a customized tag set is insufficient. Use of additional descriptive information is essential. In our case, a package consisting of these additional deliverables provides the highest level of success.

    • Documented tagging conventions, with validation tools and services as needed to enforce the conventions
    • Complete XML samples that are valid to both the schema and the documented tagging conventions
    Providing just the schema or DTD itself to another group or organization, with no additional guidelines, provides an incomplete picture regarding how to create or use the required tagging. Documented tagging conventions will articulate how to supply or interpret tagging that will meet product or process-related requirements. Because written documentation can be incomplete or misinterpreted despite the best efforts of the author(s), sample XML files can help fill in any gaps in comprehending the requirements.

    In addition to more clearly specifying our tagging to external consumers, we also found benefits internally during the development process. The process of creating all three deliverables reinforced a comprehensive specification of our tagging. Writing the “convention documentation” often revealed gaps within the schema that needed further tuning, while creating XML samples often revealed gaps in the tagging conventions.

    When we are asked for “a copy of our DTD,” we do not supply just the DTD itself because this only tells part of the story; we supply the complete package of DTD, conventions, and samples. As a rule, we consider that development of an XML schema is not complete until the conventions and samples are also finalized.

    Lastly, we developed a validation service that is used throughout the production workflow to enforce requirements that a DTD cannot enforce. This service helps ensure that the customized tagging conventions are actually followed by all parties modifying our XML.

  4. More semantics require more standards or extensions: In the future, we expect that a renewed focus on content interchange, with a special emphasis on capturing and exchanging higher degrees of semantic markup, will emerge both within and between organizations, and that this focus will spur additional standards or tag set extensions. Interchange between applications and services will require additional semantic conventions to be developed and shared outside of the tag sets. We expect that a need for greater exchange of common sets of semantic information will drive further extensions to the NLM tag sets and others.
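
To make lesson 3 more concrete, the following is a minimal sketch of the kind of convention checking that a DTD cannot express. It is not the ACS validation service: the element names (article, title, page-range, fpage, lpage) and the two conventions checked are hypothetical, chosen only to show the pattern of running small, documented rules against an XML file that may already be valid to its schema.

```python
import xml.etree.ElementTree as ET


def check_title(article):
    """Hypothetical convention: every article has a non-empty title."""
    title = article.findtext("title")
    if title is None or not title.strip():
        return ["title: must be present and non-empty"]
    return []


def check_page_ranges(article):
    """Hypothetical convention: each page pair is numeric and ordered."""
    problems = []
    for n, rng in enumerate(article.findall("page-range"), start=1):
        first, last = rng.findtext("fpage"), rng.findtext("lpage")
        if first is None or last is None:
            problems.append(f"page-range {n}: needs both fpage and lpage")
        elif not (first.isdigit() and last.isdigit()):
            problems.append(f"page-range {n}: pages must be whole numbers")
        elif int(first) > int(last):
            problems.append(f"page-range {n}: fpage {first} is after lpage {last}")
    return problems


# Each documented convention becomes one small, independently testable check.
CHECKS = [check_title, check_page_ranges]


def validate(xml_text):
    """Run every convention check and return a flat list of messages."""
    article = ET.fromstring(xml_text)
    return [problem for check in CHECKS for problem in check(article)]


if __name__ == "__main__":
    sample = (
        "<article><title>Cover Story</title>"
        "<page-range><fpage>41</fpage><lpage>40</lpage></page-range>"
        "</article>"
    )
    for message in validate(sample):
        print(message)
```

Keeping each convention as a separate function mirrors the idea that the written conventions, the samples, and the checks evolve together: a new entry in the convention documentation can be paired with a new check and a new sample that exercises it.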


Residing in Worthington, Ohio, Dan O’Brien is a Manager of Publication Production Systems at the American Chemical Society with 18 years of IT experience. He has spent most of the past seven years in a wide variety of roles within the technology and publishing groups within the ACS Publications division, leading and contributing to various markup-related initiatives.

Residing in Fripp Island, South Carolina, Jeff Fisher is a Senior Scientist at the American Chemical Society with over 30 years of IT experience. He worked for 12 years with the Chemical Abstracts Service division and has spent the last 6 years leading XML-related projects within the ACS Publications division.

Residing in Hilliard, Ohio, D. J. Haines is a Senior Systems Engineer at the American Chemical Society with 16 years of IT experience. He has spent the last five years working on the customization of XML editors and DTDs in support of workflow projects within the ACS Publications division.

Disclosure

Portions of this article were repurposed by the authors from their paper presented at NLM’s Journal Article Tag Suite Conference 2010. This paper is freely available from NCBI’s Bookshelf: http://www.ncbi.nlm.nih.gov/books/NBK47083/

Notes

    1. http://www.merriam-webster.com/dictionary/standard (accessed April 2011).

    2. W3C, section “1.5, Design Notes,” HTML5. http://dev.w3.org/html5/spec/Overview.html#design-notes (accessed April 2011).

    3. W3C, section “1.4, History,” HTML5. http://dev.w3.org/html5/spec/Overview.html#history-1 (accessed April 2011).

    4. W3C, “W3C Relaunches HTML Activity: Developers and Browser Vendors Shape HTML Future,” W3C Press Release Archive. http://www.w3.org/2007/03/html-pressrelease (accessed April 2011).

    5. http://jats.nlm.nih.gov/.

    6. http://www.niso.org/workrooms/journalmarkup (accessed April 2011).

    7. For specifics of how the JATS authors describe customization and extensibility options to its tag set, see http://dtd.nlm.nih.gov/#custom.

    8. http://dtd.nlm.nih.gov/.

    9. Bill Kasdorf. “The Benefits to be Gained from the New DTD Standard,” presented at the Association for Learned and Professional Society Publishers (ALPSP) Technical Update: “A Standard XML Document Format: The case for the adoption of NLM DTD?”, December 3, 2007, London. http://www.alpsp.org/ForceDownload.asp?id=606 (PDF 2.4MB)

    10. The W3C XInclude specification allows multiple separate XML files to be linked and then processed as a single composite XML document. For more information, see http://www.w3.org/TR/xinclude/.

    11. The W3C XInclude specification allows multiple separate XML files to be linked and then processed as a single composite XML document. For more information, see http://www.w3.org/TR/xinclude/.