Electronic journals offer new capabilities for communicating scholarly information to the research community. This article explores ways to optimize that communication for journals in mathematics and related sciences.

The most obvious benefit of electronic distribution of scholarly work is convenience. Papers are available instantly from home or office, 24 hours a day. But that is only the tip of the iceberg. Many other benefits come from the use of links: Internal cross-reference links make electronic formats easier to read than paper copy, and external links permit readers to travel through the primary and secondary literature without leaving their desks. Electronic media also offer more indirect methods of passing between resources: searchable databases of abstracts, full texts of papers, and on-line reviews. Also, papers may be linked to on-line archives of programs and supporting materials, which, since they are not journal articles in themselves, may be updated. Links may also be made to reviews and related material published later.

Here, I explore these issues in the context of the mathematical sciences. One of the central issues in electronic publishing in the mathematical sciences is the need for complex notation, requiring fonts and typesetting commands not available in the standard Web browsers. Much of this paper will also apply to more general fields, e.g., foreign languages with non-Roman alphabets, or sciences that make use of mathematical or chemical notation in the body of papers. I will also explore the specific electronic infrastructure available in mathematics.

Economic issues are important in this analysis. The costs of academic publishing have increased dramatically in recent years, especially in the sciences. Commercial publishers charge much more than not-for-profit publishers. Electronic publication cuts some of the publishing costs and requires less start-up money than paper publication (assuming that the publisher already has a computer and the technology infrastructure, as most scholars do today), enabling new publishers to enter the game. Cost containment now becomes an important issue. Without it, these new players will be driven out of the business.

Formats of Articles

The standard paradigm for electronic publication comes from HyperText Markup Language, HTML. File sizes are small, and connectivity is high: Hypertext links are freely used, both for internal cross-references and to external resources.

HTML remains a common format for electronic journal articles, but it has certain drawbacks. First, it is difficult to control the appearance of the articles on the reader's screen. While cascading style sheets provide a certain degree of control, the articles still look different in different browsers or on different platforms, even when the reader has all the specified fonts. And the fonts available on different systems vary considerably.

Symbols not available in standard fonts are generally rendered as in-line gifs, and illustrations and other graphics are generally rendered as in-line inclusions of external files, as well. Style sheets are also external to the HTML file, and are not saved by the browser when it saves the source file. Thus, many HTML documents are not portable (that is, they depend on being on the Web to present properly; they cannot be downloaded easily to an offline computer).

And printed documents are still important. But the print quality for HTML documents produced by most Web browsers tends to be low, and page numbers are not standardized. One loses a familiar and important method of identifying segments of papers, and of identifying the location of a particular article within a journal.

Also, while links are freely available, creating them is labor intensive. There is no macro language in HTML that can be used to generate automatic placement of links. Similarly, apart from the elements that can be specified in style sheets, there is no automation available in the markup of HTML files. Table writing, for instance, can be tortuous. A language other than HTML is therefore preferable for authoring papers, and conversion programs from other languages to HTML are needed.

In reaction to some of these issues, a number of journals have chosen to use Adobe's Portable Document Format, most popularly used in Adobe Acrobat, for their papers. PDF permits internal and external links, giving it the same functionality on the Web as HTML. It also has fixed typesetting and page numbers, permitting the screen and printed copy to look the same. But placing links in PDF documents can also be labor intensive, unless they can be placed automatically in the Postscript file used to create the PDF. Again, conversion software is important.

On the other hand, fonts and graphics may be incorporated directly into the PDF file, making the result portable for use on off-line computers. The Acrobat Reader (a freely available software package for reading PDF files) also prints excellent copy on virtually any printer, preserving the fonts, graphics, layout, pagination, etc., specified by the publisher. That guarantees print-journal quality output from an online document.

Nevertheless, some readers prefer HTML format for papers. For instance, the HTML files are smaller and transfer more quickly. And in journals that offer both paper and electronic formats, the PDF version is often formatted identically to the paper version. If the paper version has large pages, small print, and multiple columns (as is the case for some science journals), an identically formatted PDF version is likely to be difficult to read on-screen, and even slower to load. Also, some readers like to be able to override publisher-specified formatting and alter the appearance to suit their preferences. So some journals offer both HTML and PDF versions of their papers. Indeed, offering papers in multiple formats is likely to become more common in electronic journals.

Formats for Papers in Mathematics

The mathematical sciences have special typesetting needs, so much so that the famous computer scientist Donald Knuth developed a typesetting system from the ground up, in part to be able to typeset his own work. Known as TeX (pronounced "tech"), Knuth's system has become the standard tool for communicating mathematics. Virtually every mathematical scientist has access to a TeX system, and uses it for typesetting papers. (The output contains all information needed for printing the paper. Mathematical journal and book printings, for instance, are generally made from photoplates made from printouts of TeX files.)

Since mathematical scientists write in TeX, most mathematics journals (both print and electronic) either prefer or require that articles be submitted in TeX, which is then edited and reprocessed by the journal's production department.

The input language for TeX is ASCII, with the text interspersed with typesetting commands. It is a mark-up language with macros: user-definable commands that can be used to automate processes like repetitive editing functions.
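As a minimal sketch of what such macros look like in practice, the LaTeX fragment below defines two shorthands and then uses them. The command names \R and \norm are hypothetical, chosen only for illustration; they are not standard commands and do not come from any particular journal's style.

```latex
\documentclass{article}
% \R and \norm are hypothetical author-defined shorthands.
\newcommand{\R}{\mathbf{R}}                 % the real numbers
\newcommand{\norm}[1]{\left\| #1 \right\|}  % one-argument macro: norm of #1
\begin{document}
For every $x \in \R^n$ we have $\norm{x} \ge 0$.
\end{document}
```

Changing the definition of \norm in one place changes every occurrence in the paper, which is the kind of automation that repetitive editing would otherwise require.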

Indeed, macros are used to set all the basic style parameters for an article: page layout, formatting of headings of sectional units, formatting of theorems, equations, and all other elements in the paper. Much of this is done in a style file written by a typesetting designer and loaded when the paper is processed. The author need only include statements like

\section{Introduction}

to begin a new section. The style file then sets the headers, passes information to the table of contents, etc.

Similarly, theorems may be specified by commands of the form

\begin{theorem} Text of theorem. \end{theorem}

which can be set to produce an automatically numbered display like

Theorem 5. Text of theorem.

Similar formatting instructions, including automatic numbering, may be done for equations, figures, and other standard sorts of environments. Commands are differentiated from text by the leading backslash, and may be defined (or redefined) at various levels, including in the style file(s) and in the input file for the document itself.
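The pieces described above can be put together in a small, self-contained sketch. It assumes the standard LaTeX article class rather than a journal's own style file, and it declares the theorem environment in the input file itself, although a style file could do the same.

```latex
\documentclass{article}
% Declare a "theorem" environment with its own automatic counter.
% A journal style file would normally contain this line.
\newtheorem{theorem}{Theorem}
\begin{document}
\section{Introduction}

\begin{theorem}
Text of theorem.
\end{theorem}

% Equations are numbered automatically in the same way.
\begin{equation}
  1 + 2 + \cdots + n = \frac{n(n+1)}{2}
\end{equation}
\end{document}
```

Neither the section, the theorem, nor the equation carries a number in the source; the numbers are assigned when the file is processed, so inserting a new theorem earlier in the paper renumbers everything that follows.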

These anchors and links are not visible to every existing software package for reading TeX documents, but may be converted to anchors and links in a PDF file that can be prepared from the TeX document.

Notice that the macro capability allows the author to move virtually all the typesetting decisions to the style file, which HTML cannot currently do.

The ASCII input file I've been describing is converted by TeX's software system into a file format called DVI (for "device independent"). The DVI file contains all the typesetting and font information, but not the font files themselves. Large software installations (with many font files) are needed to display or print DVI files, or to convert them to Postscript.

Graphics information is not contained in the DVI file itself, and must be bundled with the DVI file to be accessed by the software for displaying, printing, or converting the file. Moreover, the commands used for accessing the graphics are not native to TeX, but are part of a system of extending TeX's capabilities via commands called "\specials". The \special commands are passed through to the DVI file to be read by whatever software is used to view, print, or convert the DVI to another format. And different software packages recognize different formats of \special commands (the format being contained in the argument to "\special").
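To make the driver dependence concrete, here is a hedged sketch using the LaTeX graphicx package, which is my choice of illustration and is not discussed elsewhere in this article. The package takes the name of the DVI driver as an option and writes the matching \special commands for it. The file name figure.eps is hypothetical.

```latex
\documentclass{article}
% The driver option tells graphicx which \special syntax to emit.
% "dvips" is one example; another DVI driver needs a different option.
\usepackage[dvips]{graphicx}
\begin{document}
% figure.eps is a hypothetical file name used only for illustration.
\includegraphics[width=0.8\textwidth]{figure.eps}

% Roughly what a hand-written, dvips-specific inclusion looks like.
% Software expecting another \special format may ignore or misread it:
% \special{psfile=figure.eps}
\end{document}
```

Either way, the DVI file records only a reference to the graphic in one driver's dialect, so the graphics file must travel with the DVI file and the reader's software must understand that dialect.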

Because of this difference in syntax, it is difficult to distribute TeX with graphics over the Web in a way that will be accessible to viewers on different platforms. Also, the graphics files themselves, usually in Postscript, are not necessarily viewable with any of the standard software packages on some platforms. The software and hardware needed to read or print TeX is both specialized and large. Indeed, many mathematicians don't bother to keep it locally (instead using network resources at their offices), which further limits accessibility of TeX files on the Internet. Nevertheless, many of the earliest electronic mathematics journals used TeX source files and DVI files as the formats of their papers.

As an alternative, the most common choice was Postscript, which had other drawbacks. Postscript files are large, and while it is possible to compress embedded graphics files, there is no uniform system of decompression available. So documents that include large graphics files become unwieldy. Also, Postscript readers and printer drivers are far from universal, and are in fact quite rare in the PC world — and many mathematicians have trouble installing them.

Moreover, hypertext links are not native to either TeX or Postscript. Systems have been developed to incorporate them (using \specials), but there are at least two incompatible methods, and most TeX and Postscript viewers can't process either of them. Thus, publishing in TeX or Postscript is far from optimal.

But as I mentioned above, by using TeX as input it is possible to harness the macro capabilities of the TeX input language to produce PDF files with numerous automatically generated hypertext links, to both internal and external resources. The internal links are especially important in mathematical papers, where various theorems get quoted in the proofs of subsequent results. Thus, when Theorem 2 is invoked in the proof of Theorem 3, the reader can jump to the statement of Theorem 2 with one mouse click, and jump back with another.
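One way to obtain such links, sketched here under the assumption that the hyperref package is used (the article does not say which macros a given journal relies on), is to let the ordinary cross-referencing commands generate them. The label names thm:key and thm:main are hypothetical.

```latex
\documentclass{article}
% Assumption: hyperref is one package that turns \ref and \href
% into clickable links when the paper is converted to PDF.
\usepackage{hyperref}
\newtheorem{theorem}{Theorem}
\begin{document}

\begin{theorem}\label{thm:key}
Text of the first theorem.
\end{theorem}

\begin{theorem}\label{thm:main}
Text of the second theorem.
\end{theorem}

Proof of Theorem~\ref{thm:main}.
By Theorem~\ref{thm:key}, the result follows.

% An external link, written once in the source:
See \href{http://www.ams.org/mathscinet/}{MathSciNet} for reviews.
\end{document}
```

The author writes only \label and \ref, as for a paper copy. The internal anchors and links are produced automatically, so no link has to be placed by hand in the PDF file.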

As far as the appearance of the viewed or printed image is concerned (i.e., in terms of fonts, layout, etc.), the resulting PDF file is identical to the TeX DVI, as the typesetting is all done by TeX.

The graphical compression available for PDF is also extremely useful in mathematics journals. Many authors use hand-drawn illustrations, which must then be scanned to be included in an electronic format, and high-resolution scans are often needed to get an appealing result. Indeed, one paper with 12 megabytes of scans was reduced to a PDF file of less than 500 kilobytes for the Pacific Journal of Mathematics.

Thus, it would be tempting to use PDF files as the sole format for electronic journals in the mathematical sciences. But there is a great deal of inertia among the readership. My own experience in this regard comes from the New York Journal of Mathematics, which was started in 1994, using only DVI and Postscript formats. We introduced the PDF format in 1996, but it didn't catch on right away. By the summer of 1997, the PDF format was roughly equal in popularity to DVI and Postscript, despite the fact that the latter two lack links. Currently, PDF is a strong favorite, but the other formats are still being accessed.

MathML

While standard HTML cannot handle mathematical notation, an extension within the XML framework is being developed that does. The idea is to create new tags that can be embedded in (enriched) HTML documents, to encode mathematical information. Known as MathML, it has tags for expressing highly complex mathematical syntax. The language itself is so complex that it is impractical to write directly in MathML, and conversion utilities from other input languages are needed. To be cost effective in the current environment, a good conversion tool from TeX is needed.

Conversion from TeX, however, is somewhat problematic, as the syntactic content of MathML is richer than that expressed by even very highly structured versions of TeX, such as LaTeX. This is not too surprising, as newer systems are often more ambitious than older ones. The aim is to encode enough information in the syntax to be able to pass the mathematical information along to other systems (e.g., computer algebra systems) in a form that would actually permit calculations to be done. In TeX, things like integrals are treated as typesetting phenomena, rather than mathematical syntax, and the information is not readily transferable to other systems. The ultimate ambitions of MathML in that regard have not yet been settled (not surprising, given that MathML is being developed by committee). It is an evolving system.
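A small LaTeX example, offered only as an illustration, shows how little mathematical syntax the TeX source records.

```latex
\documentclass{article}
\begin{document}
% The source says: integral sign, lower limit 0, upper limit 1,
% then the symbols x^2, a thin space (\,), and the letters d and x.
\[ \int_0^1 x^2 \, dx \]

% Nothing marks x as the variable of integration, and "dx" is coded
% exactly like the product of d and x in the next formula.
\[ c + dx \]

% Likewise, f(x+y) could be a function applied to x+y or a product.
\[ f(x+y) \]
\end{document}
```

A human reader resolves these ambiguities from context, but a conversion program aimed at a computer algebra system has to guess, which is why a faithful translation from TeX into the richer syntax of MathML is hard to automate.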

Given those considerations, it seems that new authoring systems will be needed for MathML to be useful. Given the reluctance of most researchers to learn new systems, these developments will likely take years, and much will depend on the actions of the mathematical societies and other influential groups. The acceptance of TeX was heavily influenced by support from the American Mathematical Society.

There are also issues concerning displaying and printing of MathML documents. MathML cannot be rendered by the standard browsers, so plug-ins and/or helper programs are needed.

Current and soon-to-be-implemented methods of creating and rendering MathML include the following, none of which can do everything needed:

  • IBM's Techexplorer (http://www.software.ibm.com/enetwork/techexplorer/) is a plug-in for standard Web browsers. It can render a subset of the current version of MathML, and is available on Windows platforms. IBM promises a Unix version, too. Users need to purchase the professional version of Techexplorer for printing.

  • WebEQ (http://www.Webeq.com/) is a Java-based suite of products, including an editor that can produce MathML code.

  • Amaya (http://www.w3.org/Amaya/) is a Windows- and Unix-based stand-alone browser that doesn't interface with standard Web browsers. It renders MathML, and has an editing mode for writing MathML expressions.

  • MathType (http://www.mathtype.com/) is an equation editor that runs on Windows and Macintosh machines, and interfaces with the standard PC-based word processors (but not with TeX). It can embed MathML expressions in HTML documents.

  • A subset of MathML can be generated by EZmath (http://www.w3.org/People/Raggett/EzMath/), a Windows-based editor. It uses its own input language, also developed for use on the Web.

  • The computer algebra systems Maple and Mathematica are developing support both for generating and rendering MathML expressions.

It will be very interesting to watch these developments and see which, if any, catch on with the authors and readers of mathematical documents. MathML may well offer a useful alternative to PDF in the future. It is quite likely that many journals will eventually publish in both formats.

Greater Connectivity

The issue here is how readers find mathematical research papers: by searching for subject or keyword information in central resources, or by following a sequence of links emanating from such resources.

In mathematics, the most powerful central resources are the reviewing journals Mathematical Reviews (available online as MathSciNet, http://www.ams.org/mathscinet/) and Zentralblatt fuer Mathematik, http://www.zblmath.fiz-karlsruhe.de/MATH/subscription/subscription. They publish reviews of the papers published in mathematical journals.

MathSciNet has an especially powerful system of internal links between reviews, and provides direct links to the papers themselves, if they are available on line. The reviews are classified by subject and are searchable in multiple ways. Thus, it provides an unparalleled method of browsing the literature for serious content (along with accurate bibliographic data). To take advantage of MathSciNet, the New York Journal of Mathematics has begun including links to MathSciNet reviews from each of the bibliographic entries in its papers. If that practice becomes widespread in electronic journals, and if more journals go on line, then readers will be able to tour the primary and secondary literature in their fields without leaving their desks.

A similar, but less systematic source of connectivity comes from the "living review articles" being pioneered by the Electronic Journal of Combinatorics, at http://www.combinatorics.org/Surveys/. Those articles give periodically updated reviews of the literature in particular subject areas.

Another method of finding papers is through robot-compiled indices. But one of the drawbacks there is that very few journals distribute the TeX input files for their articles, and most robots are unable to parse information from PDF, DVI, or Postscript files. Thus, the only information available, in most cases, is the material available in HTML files at the journal site, such as abstracts. The resulting indices are less powerful than those used by the reviewing journals, but they do cover articles that have not yet been reviewed.

Some journals, such as the New York Journal of Mathematics and the Pacific Journal of Mathematics, provide indices of the full text of their TeX input files (see the NYJM index at http://nyjm.albany.edu:8000/search/j/ghindex.html) and redirect the output for a query to the versions available online.

Much work remains to be done on linking such databases at disparate sites.

Another potential source of connectivity comes from preprint and e-print archives. The xxx Mathematics e-Print Archive (http://front.math.ucdavis.edu/) at Los Alamos National Labs [1] is a comprehensive archive of e-prints in mathematics. The author, title, and abstract information is searchable, and the citations are given for papers that have subsequently been published, but there are no links, as yet, to the published versions of the papers.

That archive has a strong connection to the research community, as it maintains e-mail lists in each of its subject areas, notifying users of any new papers in areas they choose. [2] The archive also keeps the full TeX source of its papers, and may at some point implement full-text indexing. (There are other servers in some specialty areas that offer subsets of those services, but the xxx archive is the only one I know that attempts to cover all of mathematics. Given the interrelationships between fields, it is useful to be able to search a database that covers the whole of mathematics, rather than a particular subfield.)

The xxx archive is interesting in that it is in some sense in competition with journals. The e-prints are freely available in perpetuity (superseded versions are available on request). Journals are encouraged to make use of the archive by contributing the final versions of papers and simply linking to xxx from the journal's own Web site to provide access to the papers themselves. Such journals are called "overlay" journals: the journal acts as an overlay to the xxx archive.

But at present, it is unlikely that overlay journals will be able to recoup the costs of copy editing and typesetting unless they mandate page charges. Even for journals that don't recoup their costs, the use of overlay technology would result in a one-size-fits-all look and feel. Many journals will be motivated to maintain their own archives and production systems, instead.

Without links to the published version, effective cooperation between xxx and non-overlay journals is difficult. Thus, in the current environment, we have a cleavage between xxx, with its full-text database and broad e-mail notification, and the general run of electronic mathematics journals, which are served primarily by the reviewing journals.

Additional Features

An increasing number of mathematical papers include arguments settled by running a computer program. The electronic environment permits efficient distribution of such programs, in a form that may be used by the reader to verify the results. See, e.g., the programs available through the following "associated links" files for journal articles:

Such programs may be updated to reflect version changes in other programs.

Journals may also maintain links to reviews and other commentary, and may archive errata files, author's elucidations, and pointers to subsequent applications of a particular paper.

In particular, the electronic environment permits updating the connections between published papers (which remain fixed in form) and new work.



Mark Steinberger is Editor-in-Chief of the New York Journal of Mathematics, a refereed electronic mathematics journal founded in 1994. His mathematical research is in algebraic and geometric topology, with a special interest in symmetry. (See http://nyjm.albany.edu:8000/~mark/rsch.html for details.) Receiving his Ph.D. from the University of Chicago in 1977, he has taught at M.I.T. and Cornell, and has been an Associate Professor of Mathematics and Statistics at the University at Albany since 1987. His other activities in electronic publishing include membership in the steering committee of the xxx Mathematics e-Print Archive at Los Alamos National Labs, and administration of the EMJ mailing list for discussion of electronic publishing in mathematics. You may contact him by e-mail at mark@csc.albany.edu.

Notes

1. The xxx e-Print archive at Los Alamos National Labs has been phenomenally successful in physics, primarily as a forum for circulating preprints: most of the papers appear in traditional journals later on.

Later, they established an archive in nonlinear sciences, and more recently, archives in mathematics and computer science. The URL http://front.math.ucdavis.edu/ is to a front end set up by the steering committee of the mathematics archive.

2. A number of electronic journals also maintain such mailing lists.