Appendix: Modifications to the ISO 12083 DTD

Since early 1996, the University of California Press has been marking up some of its books in SGML, using the ISO 12083 DTD. Extensive modifications to the DTD proved necessary; all of them were made within the constraints specified by the ISO. Some of them might usefully be incorporated in the next version of the DTD.

What follows is a summary of the main areas in which the original DTD has been modified (see the UC DTD [formerly http://www.ucpress.edu/scan/epub/ucp_dtd.html]). For more information, send an e-mail message to tony.hicks@ucop.edu.

Front Matter

Considerable modifications were needed in the preliminary pages, particularly for the copyright page (PUBFRONT).

Because it is impossible to foresee all the front-matter sections that may need to be included (e.g., Note on Transliteration), it was found useful to define a general front-matter section element, FMSEC, which can then be identified by an ID attribute (e.g., <FMSEC ID="NTRANS">).
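A general front-matter section of this kind might be declared roughly as follows. This is only a sketch: the content model is an assumption made for illustration, not the UC Press declaration, and TITLE, P, and LIST are assumed to be available from the base DTD.

```sgml
<!-- Hypothetical sketch of a generic front-matter section.
     The content model is an illustrative assumption. -->
<!ELEMENT FMSEC  - -  (TITLE?, (P | LIST)+) >
<!ATTLIST FMSEC  ID   ID   #IMPLIED >

<!-- Instance markup: the ID value identifies the kind of section. -->
<FMSEC ID="NTRANS">
<TITLE>Note on Transliteration</TITLE>
<P>...</P>
</FMSEC>
```

Because the ID attribute carries the identification, a new kind of front-matter section needs no new element declaration.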

It was also found useful to define a general front-matter list element, FMLIST, for lists of tables and illustrations.

Chapters

Additional elements are needed in the chapter opening display (e.g., subtitle).

In the printed book, the notes may need to be a section or subsection within the chapter.

The element POEM needed considerable modification, and a new element DIALOG needed to be defined.

Divisions Other than Chapters

Books in the humanities may have divisions other than chapters (e.g., acts and scenes in a play). It would be desirable to define a general element, DIV, that could then be identified with an ID attribute (e.g., <DIV ID="I.ii"> for act I, scene ii of a play).
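One way such a DIV element could be declared is sketched below. The recursive content model (a division holds either subdivisions or text) is an assumption chosen so that acts can contain scenes; P, POEM, DIALOG, and TITLE are assumed to be declared elsewhere in the DTD.

```sgml
<!-- Hypothetical sketch: a division holds either nested divisions
     or running text. The content model is an assumption. -->
<!ELEMENT DIV  - -  (TITLE?, (DIV+ | (P | POEM | DIALOG)+)) >
<!ATTLIST DIV  ID   ID   #IMPLIED >

<!-- Instance markup: act I containing scene ii. -->
<DIV ID="I">
<TITLE>Act I</TITLE>
<DIV ID="I.ii">
<TITLE>Scene ii</TITLE>
<P>...</P>
</DIV>
</DIV>
```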

Model Group "m.pseq"

The model group "m.pseq" defines the content of several quite different types of elements: BQ; ITEM and DD; FOOTNOTE and NOTE; and TSTUB and CELL. It would be preferable to define each type in a separate model group, because each type has a different structure.

A poetry quotation should preferably not require a P as its first subelement. Nor should it be required to contain the element POEM, because generally it is only poemlines that are being quoted, not an entire poem.

A list should preferably not require a P as its first subelement.

A table cell should preferably not require a P as its first subelement. A table cell should be able to include untagged data (e.g., a number).
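The separation suggested above could be sketched with one parameter entity per type of element. The entity names (m.bq, m.item, m.note, m.cell) and the content models are hypothetical, chosen only to show that no group forces a leading P; POEMLINE and LIST are assumed to be declared elsewhere in the DTD.

```sgml
<!-- Hypothetical entity names; content models are illustrative assumptions. -->
<!ENTITY % m.bq    "(P | POEMLINE)+"        -- block quotations, prose or verse -- >
<!ENTITY % m.item  "(#PCDATA | P | LIST)*"  -- list items and definitions -- >
<!ENTITY % m.note  "(P+)"                   -- footnotes and notes -- >
<!ENTITY % m.cell  "(#PCDATA | P)*"         -- table stubs and cells -- >

<!ELEMENT BQ                 - -  %m.bq;   >
<!ELEMENT (ITEM | DD)        - -  %m.item; >
<!ELEMENT (FOOTNOTE | NOTE)  - -  %m.note; >
<!ELEMENT (TSTUB | CELL)     - -  %m.cell; >
```

With a mixed-content model such as m.cell, a cell can hold a bare number (e.g., <CELL>42</CELL>) without a wrapping P.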

Emphasis

For emphasis types, it's desirable to specify the reason for the formatting rather than the formatting itself. The following emphasis types have been found useful:

Italic: frnlang = foreign language; name = name (e.g., of a ship); wdaswd = word as word

Small caps: datetime = date (such as B.C.) or time (such as P.M.)

It was found useful to tag cited titles as T (title), with an attribute specifying the type of title, e.g., <T TYPE="book">, <T TYPE="article">, etc.
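A sketch of how such semantic emphasis might be declared follows. The element name EMPH and both content models are assumptions made for illustration; the TYPE values are the emphasis types listed above. For T, the declared value NAME is used here so that the set of title types stays open.

```sgml
<!-- Hypothetical sketch: EMPH and its content model are assumptions. -->
<!ELEMENT EMPH  - -  (#PCDATA) >
<!ATTLIST EMPH  TYPE  (frnlang | name | wdaswd | datetime)  #REQUIRED >

<!-- Cited titles: NAME leaves the set of title types open. -->
<!ELEMENT T     - -  (#PCDATA) >
<!ATTLIST T     TYPE  NAME  #IMPLIED >

<!-- Instance markup -->
<EMPH TYPE="frnlang">Weltanschauung</EMPH>
<T TYPE="book">Moby-Dick</T>
```

The formatting (italic or small caps) is then decided by the stylesheet or typesetting system, keyed on the TYPE value.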

Back Matter

Some books have back-matter sections other than those specified in the ISO DTD, such as acknowledgments, an afterword, or a list of contributors. It's desirable to have a general back-matter section element, BMSEC, that can be identified with an ID attribute as needed.

In the printed version of the book, the notes usually constitute a separate back-matter section.

Bibliography

Many modifications were found necessary in the tagging of citations. In particular, it is essential to be able to list citations with the author's name first rather than the title.

Illustrations

It is desirable to treat an illustration and its caption as a unit that can be inserted at an appropriate point in the text. For this, a new element ILLGRP (illustration group) is needed, with an ID attribute so that it can be referenced from the text. It needs to contain the empty elements PLATE and MAP in addition to FIG. It also needs to contain the elements CAPTION (a text element) and ILLUSTR (an image element, e.g., EPS or TIFF).
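The illustration group described above might be declared along the following lines. This is a sketch under several assumptions: the order and optionality of the subelements, the FILE attribute on ILLUSTR, and the referencing element ILLREF are all hypothetical; FIG is assumed to be declared in the base DTD; and the "- O" minimization assumes OMITTAG YES in the SGML declaration.

```sgml
<!-- Hypothetical sketch: content model, FILE, and ILLREF are assumptions. -->
<!ELEMENT ILLGRP         - -  ((FIG | PLATE | MAP), ILLUSTR?, CAPTION?) >
<!ATTLIST ILLGRP         ID    ID      #REQUIRED >

<!ELEMENT (PLATE | MAP)  - O  EMPTY >
<!ELEMENT CAPTION        - -  (#PCDATA) >

<!-- The image itself, referenced as an external (e.g., EPS or TIFF) entity. -->
<!ELEMENT ILLUSTR        - O  EMPTY >
<!ATTLIST ILLUSTR        FILE  ENTITY  #REQUIRED >

<!-- Hypothetical pointer from the running text to an illustration group. -->
<!ELEMENT ILLREF         - O  EMPTY >
<!ATTLIST ILLREF         RID   IDREF   #REQUIRED >
```

Declaring the ID as #REQUIRED means every group can be the target of a reference such as <ILLREF RID="MAP3">, where MAP3 is an illustrative ID value.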

Universal Attributes

It would be desirable to give all elements the attributes TYPE and ID. The TYPE attribute would give flexibility by permitting similar elements to be distinguished (e.g., <TITLE TYPE="book">, <TITLE TYPE="article">, etc.) without having to create completely new element names. The ID attribute would allow specific instances of elements to be identified for typesetting, formatting, or cross-reference purposes.
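One common way to express shared attributes is a parameter entity that is referenced from each attribute definition list. The entity name a.univ and the choice of elements below are hypothetical. Note also that SGML as originally standardized allows only one attribute definition list per element type, so in practice the entity reference would have to be added to each element's existing ATTLIST rather than declared separately as shown here.

```sgml
<!-- Hypothetical entity name; NAME keeps the set of TYPE values open. -->
<!ENTITY % a.univ  "ID    ID    #IMPLIED
                    TYPE  NAME  #IMPLIED" >

<!-- Illustrative only: applied to a few elements at once. -->
<!ATTLIST (P | TITLE | BQ)  %a.univ; >
```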

Character Entities

The ISO lists of character entities referenced in the 12083 DTD have many gaps, and it has been found necessary to add a supplementary list. The Unicode character set might be a better alternative.

SGML Declaration

It was found necessary to increase the maximum value of LITLEN from 240 (in the reference concrete syntax) to 480. This change makes it possible to increase the size of some model groups so as to add the necessary number of new elements.
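In the SGML declaration, a change of this kind is made in the QUANTITY section of the concrete syntax. The fragment below is a sketch of that section only; it assumes that all other quantities keep their reference values. LITLEN bounds the length of a parameter literal, which presumably matters here because model groups held in parameter entities are subject to that limit.

```sgml
         QUANTITY SGMLREF
                  LITLEN  480   -- reference concrete syntax value is 240 --
```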