ACM's Journal of Experimental Algorithmics: Bridging the Gap Between Theory and PracticeSkip other details (including permanent urls, DOI, citation information)
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact firstname.lastname@example.org for more information. :
For more information, read Michigan Publishing's access and usage policy.
The Association for Computing Machinery's Journal of Experimental Algorithmics was founded in 1995 with the aim of fostering experimental work in the area of data structures and algorithms. At the time of its founding, such experimental work was clearly becoming necessary, but was not a traditional part of the research enterprise in computer science. What experimental research did take place was isolated; in particular, it suffered from the lack of established test suites for comparative purposes. In addition, the lack of experimental work had created a large gap between the research work in academic circles and the tools actually used in industry, with industry often lagging behind by nearly ten years. Much of that lag could be directly attributed to the fact that recent advances in algorithms and data structures had been purely pencil-and-paper affairs, so that any implementation would have required large investments of time. Thus a second purpose of the ACM's JEA was to help bridge the gap between theory and practice in the area of algorithms and data structures.
ACM's JEA was announced in the fall of 1995, and published its first article in the summer of 1996. A year later, it has published eight articles, with another two accepted and a dozen in various stages of revising and refereeing. JEA belongs to the line of ACM's archival, refereed journals and transactions; like its cousins in that line, it is available only by subscription. Unlike all other ACM refereed publications, JEA has no print component: publication is exclusively online, as are all interactions between the editors, the referees, the authors, and the readers.
In this article, I will review the rationale and means envisioned for JEA, and comment on the current state of the journal and the lessons that may be learned from the enterprise to date. Of necessity, my perspective is informed by the academic context of theoretical computer science, so I will briefly describe the characteristics of research publications in this area before discussing JEA itself. While my comments are motivated by JEA, I believe that most of them apply to any research journal published by a scholarly society and many to journals published by commercial publishers as well.
Research Publications in Algorithms
The research community in algorithms and data structures has a well-defined set of flagship conferences and journals. Acceptance rates in the flagship yearly conferences (IEEE Foundations of Computer Science, ACM Symposium on the Theory of Computing, and SIAM/ACM Symposium on Discrete Algorithms) runs around 25 percent; acceptance rates in the main journals (Journal of the ACM, Journal of Algorithms, SIAM Journal on Computing, and Algorithmica) are more variable (and somewhat harder to evaluate), but rarely above 40 percent, in spite of strong self-selection exercised by authors. The next tier of conferences and journals have similar acceptance rates.
All of the conferences are run by program committees that review all submissions; complete proceedings of the conferences are published and are part of subscription packages offered to libraries by the various sponsoring societies (ACM, IEEE, SIAM). Conferences typically have a lead time of six months and publish articles of about ten pages each, while publication in the main journals (of articles that vary in length from a few pages to as many as fifty pages) takes around two years (SIGACT News features a regular column that keeps track of these values) — a rather long time that results in roughly equal proportions from refereeing delays and publication backlogs. As a result of the long turnaround time of journals, conference proceedings are referenced more than journal articles, although most authors eventually turn their conference papers into journal articles.
"Recent advances in algorithms and data structures had been purely pencil-and-paper affairs."
Those journals and conferences tend to favor a similar style of presentation: to a large extent, that style is mathematical — definitions, theorems, and proofs. Until recently, very few of those outlets published papers of an experimental nature.
Since computer scientists have been early participants in the Internet, more and more areas of computer science are turning away from journal publication and even formal conference publication to the much faster mechanism of "home-page publishing": most researchers maintain home pages with links to their most recent work. At many institutions, the home-page publications have completely supplanted the traditional departmental technical report as the primary means of early and rapid dissemination of results.
Theoretical computer science as a whole makes constant use of the Internet, but academics continue to be evaluated on the basis of the work they present at the flagship conferences or publish in the main journals. As the backlog of these journals and the rejection rate of these conferences indicate, there is no sign of decline in the number or quality of submissions.
Formal online publishing is still quite rare in theoretical computer science. While all three scholarly societies with an interest in the area (ACM, IEEE, and SIAM) have long-term plans to move their publications online, only ACM's JEA is a true on-line journal. A few other journals are on-line: the Chicago Journal of Theoretical Computer Science (MIT Press) and the Journal of Universal Computer Science (Springer Verlag), and, somewhat farther afield, the Electronic Journal of Combinatorics (Springer Verlag) and the Electronic Transactions on Numerical Analysis (Kent State University).
All of those journals are just a few years old. Unlike JEA, the others started as free publications, but most are now converging to some method of subscription.
Why an On-Line Journal
When JEA was founded, online publication without any print component was considered a key feature of the journal for at least four reasons:
An online journal can publish data, programs, animations, and multimedia components that no print journal can publish; programs and data are critical to the aims of the JEA.
An online journal is inexpensive: at a time when libraries everywhere have to cut their subscription lists and when some print journals can cost a thousand dollars a year, anyone starting a new journal has a duty to the academic community to make that journal permanently affordable.
An online journal can offer features, such as online search, not available to print journals. It can also evolve quickly.
An online journal is not tied to a printer, a format, or a distribution network; it has no page count limitation and cannot suffer from the printing backlogs that are plaguing print journals in the area. In comparison to print journals, an on-line journal should thus have a shorter turnaround time as well as more flexibility in what it can publish.
Publishing Software and Data Along With Articles
Since one of the main aims of JEA is to bridge theory and practice, one of its main components is a library of software, test data, test generators, and past results. While most authors keep copies of the code and data used in their work on their home pages, placing all of those suites in a common location has many advantages. Some derive from the nature of the maintainer: a scholarly society like the ACM is likely to endure beyond the tenure of any given researcher at his or her current institution, ensuring the archival quality of the work; and a scholarly society can afford to employ professionals to maintain its Web site and ensure that its access mechanisms are up-to-date, flexible, and fast.
Other advantages derive from synergistic effects: building a repository of well-tested software with fully documented performance provides a very valuable resource that encourages further implementation and testing of other theoretical concepts. Users of the repository communicate with the authors and the journal's editors, providing a level of review and quality control otherwise unattainable in an area where all editors and referees serve on a purely voluntary basis.
The main cost of a print journal comes from the printing, distribution, editing, and marketing. There is no printing cost for an online journal, and distribution costs (the acquisition and maintenance of servers and network connections that support the journal) are minuscule compared to those of print journals. Editing costs are low: authors typically typeset their own articles; the editors will review each article for basic compliance in style, for spelling and grammar, but will not do additional typesetting, just some nearly automated format conversions. Comparable costs include the advertising, merchandising, and administrative costs associated with a subscription-based publication. Such costs can be quite high in many print journals, but most online journals "advertise" only through the Web and rely on links from many sites to establish their presence.
"At many institutions, the home-page publications have completely supplanted the traditional departmental technical report as the primary means of early and rapid dissemination of results."
While ACM's publications are generally quite inexpensive, the subscription cost of JEA (currently $22 for ACM members, $17 for students) is below that of other ACM publications (most of ACM's quarterly Transactions cost between $33 and $40 for ACM members). A look at my institution's library budget shows that the subscription cost for JEA is about the lowest for a publication from a major scholarly society.
Flexibility and Expandability
The nature of an online journal allows plenty of room for growth, experimentation, and customization. The type of publication can evolve along with the Net and the community. For instance, before the end of the century I would expect to see interactive scientific publications that customize reader paths within the structure, based on usage patterns. Even now, editors and publishers can easily alter procedures and access mechanisms. For instance, I do not expect the standard subscription model to endure much longer; instead, I expect to see pay-per-article mechanisms to be in place in the near future and pay-per-search (in a scholarly society's archive of publications) to follow quickly. Even in the limited current context, online publications offer their readers capabilities not available with print journals, such as searching and linking.
Obviously, an online publication has no page count problems and so can publish every accepted article as soon as it is ready. With the proper use of typesetting tools by the authors, and of conversion tools by the editors, the overhead costs of placing a new article on the server are minimal. Thus the time delay from acceptance to publication, which is on the order of a year or more for most of the flagship journals in the area of algorithms, can be reduced to a few days.
Online Publishing: Charging for Access
Rationale for Charging
Since the costs of on-line publishing are so low, it may make sense to expect free access to scholarly online publications. Much of today's research is indeed available free of charge on the net, through the researchers' home pages. However, the point of a scholarly publication is the imprimatur conferred upon it by the publisher — otherwise there would be no need for refereeing. Indeed, several models of scholarly discourse have been proposed that do away with formal refereeing entirely. In mathematics, the hard sciences, and engineering, however, peer review remains the single mark of quality — and no viable alternative has yet been proposed. The publisher, by setting up the infrastructure, maintains the refereeing process, and the publisher must be able to survive. Thus it is entirely reasonable to charge for access to refereed online journals. On the other hand, as the costs are indeed minimal compared to the costs of print media, the charges must reflect that very different cost structure. I can see no rationale for on-line journals with annual subscription costs in the hundreds of dollars, as is all too often the case with print journals.
Private and Authors' Copies
Once we accept the principle of charging for access, we face the problem of dissemination of free copies. I firmly believe (and the ACM has taken a similar stand) that authors should be free to place a copy of their articles on their home page, providing free access to those articles. Those copies must carry a statement to the effect that the copyright rests with the publisher, not the author, in order to limit reproduction, archiving, and redistribution. In other words, the publisher has sole rights to set up and maintain a server that archives its publications; other than authors, no one else may place an article on the Net. (Of course, subscribers are free to keep copies of articles on their machines, but they cannot place them on the Web themselves. Instead, they should place a link to the journal's version.) Enforcing that policy is difficult except perhaps in the case of egregious violations; however, the current system of publication and access is also largely based on honorable conduct by all participants, and it works well enough.
A fundamental question in charging policies is the distinction often made between institutions and individuals. The source of subscription revenues varies considerably. For instance, the ACM relies more on individual subscribers than on institutions, while the American Mathematical Society relies almost entirely on institutional subscribers. Once a library subscribes to an on-line journal, how should access be granted? And should the library be charged more than an individual? I firmly believe in the library model and thus strongly advocate that access for institutional subscribers be granted on the basis of IP addresses, while access for individual subscribers can be handled through a conventional username-and-password pair. That is the access mechanism supported by JEA: individual subscribers set up their personal usernames and passwords at the central ACM Web site; institutional subscribers register a set number of IP addresses with the ACM so that anyone accessing JEA from one of those IP addresses is automatically granted access. In consequence, the institutional subscription fee is higher than the individual one, although it remains reasonable. One can easily conceive of slightly more flexible mechanisms whereby an institution can purchase the right to add more IP addresses to the authorized list; an alternate flexible solution is to have the server keep track of the total number of accesses associated with a collection of IP addresses and charge the institution according to the volume of transactions it generates.
"Since JEA publishes articles that are really made of a variety of pieces, we need to consider what should remain free and what should require payment."
Another point about access has to do with the nature of the data. Subscribers to print media receive a physical product that they can keep forever, assuming they have the room for it. Subscribers to an online journal receive an entrance key that is revoked once their subscription expires. The simplest access mechanisms for online publications, including that used for now by JEA, all grant access to all years of publication to current subscribers and deny access to any year to nonsubscribers — a very different model from that of the print media. Of course, nothing prevents a subscriber from making private copies of the online journal's complete database before his or her subscription expires, but the database itself does not reflect the totality of the information delivered by the journal: updates will be missing (to links, in particular) and search facilities set up by the journal and tailored to its format and contents will no longer be available.
What Do We Charge For?
Since JEA publishes articles that are really made of a variety of pieces, we need to consider what should remain free and what should require payment. Other components of JEA articles are:
- software and test data;
- readers', authors', and editors' letters and comments; and
- search capabilities.
I proposed, and ACM endorsed, a policy whereby those three components remain free. The software and test data have not been refereed to the same level as the texts of the articles: referees simply do not have that kind of time. Thus the publisher cannot assume full responsibility (or indeed, any responsibility) for the software and data. All that the JEA says about the code, for instance, is that it was found to compile and run and to produce results generally compatible with the claims made by the authors in the text of the article. Allowing free access to the software archive relieves the publisher from legal responsibilities and enables everyone in the community to use the software and data, thereby encouraging further research. The correspondence and the search mechanisms both encourage a reader to get the article(s) referred to and thus are best left free as well.
Design, Formats, Architecture, Navigation, and Interactivity
To a large extent, the issues of architecture and navigation are the hardest to define, because they are the most constrained by the current state of technology.
The most important limiting factors have to do with patterns of usage.
- Readability: Whatever is published online will be viewed on today's monitors; their resolution is a far cry from what is needed to compete with a high-quality printed page.
- Portability: One issue of a printed journal is easily carried in one hand, can be taken to the local coffee shop and read while sipping a cup of coffee. Notebook computer technology remains a long way from such convenience.
- Annotation: A printed copy is easily annotated, while annotating electronic documents is difficult at best.
Those are very serious limitations for a research journal. Most researchers I know will read articles in a comfortable armchair with pencil in hand and coffee cup nearby. Many articles require several close readings for a complete understanding; many will prompt questions that the reader will want to write down. Conducting all of these activities while pinned down in a chair in front of a terminal is nearly impossible.
The limitations just discussed ensure that, for the next five years or so, any online research journal will have to provide a printable version of the articles so that the reader can make a copy on a high-resolution printer and then read and annotate the copy.
Unfortunately, most formats that lend themselves well to high-quality printing (mostly Postscript and PDF) do not lend themselves well to searching and navigation and cannot be downloaded incrementally. In contrast, a format such as HTML, which supports searching, navigation, and incremental downloading, offers only the most rudimentary support for formatting and nearly none for typesetting. The latter is a nearly fatal flaw for science publishers: any paper with a reasonable amount of notation or mathematics simply cannot be written in HTML — it will need dozens, perhaps hundreds of in-line figures to represent the symbols that HTML does not support.
Unless an online journal intends to be simply a means of electronic delivery of printed documents (serving Postscript or PDF files and nothing else), it has to maintain documents in a format that lends itself to true online viewing and interaction. I conclude that, at least for the next five years or so, online journals will have to maintain their publications in three formats:
- a format supporting high-quality printing, such as Postcript or PDF;
- a format supporting navigation and incremental downloading, such as HTML; and
- the true authoring format from which the other two were created.
"Usually half of the typical two years from submission to publication is spent in the refereeing process."
Articles published in JEA are provided in Postscript, HTML (with a few exceptions), and original (LaTeX) formats.
The third format need not be provided online, but it is likely to be the medium of communication between the authors and the publisher. Here again, we hit a problem: authoring tools abound and are only rarely compatible. Certain communities have adopted a particular tool and have a tradition of use that sustains it. The community most likely to read and contribute to ACM's JEA, theoretical computer scientists and mathematicians, normally uses TeX (or LaTeX) as an authoring tool, because of its superior support for mathematical expressions and for typesetting. Preparing Postscript or PDF files from a TeX or LaTeX file is simple; preparing an HTML document from the same is a challenge, even with the excellent LaTeX2HTML conversion tool. The worst problem occurs in communities where no consensus exists: the array of conversion tools that a publisher would then need would be formidable indeed, and most of those tools do not exist.
Navigation and search are two of the most attractive features that an online publication currently has to offer. For now, though, both features remain fairly primitive in most settings and are of limited value until the volume of publications grows significantly. Relatively few research journals are truly online in the sense of providing all of their material online rather than just abstracts and correspondence; those that are have been online only for a short period. As a result, the amount of material published online to date is modest; search mechanisms are not particularly useful as yet, and navigation is confined to what is online. Indeed, most online journals to date have only primitive search mechanisms, if any at all. JEA, with only eight articles, has implemented navigation through hyperlinks (although it is left to the authors' discretion), but has not yet activated planned search mechanisms.
Interactivity and Related Issues
One exciting promise of online publication is interactivity. In a simple form, such interactivity is already present: for instance, a publication can feature Java applets that allow the reader to run customized examples. Other even simpler types of interactivity include the use of forms for querying a database (as discussed above under navigation, but put to different uses, such as making use of results discussed in an article by submitting a query to the authors' software). In the future, with full multimedia publishing, those types of low-level interactivity should become far more common; it is mostly a process of educating authors and making them believe that the extra effort put into that type of interactivity is worth it. More subtle interactions will also appear. For instance, with the proliferation of online publications both browsers and servers will have to become more responsive to a particular user's needs, to the point of completely tailoring the interface so that each user actually gets a distinct, unique view of the journal she or he is browsing. Of course, such a view has to evolve, taking into account the recent history of browsing by the user and automatically prioritizing the information available.
It was my hope when I started JEA that refereeing could be streamlined. A major problem experienced by all editors in computer science is that of obtaining timely reviews: all researchers are extremely busy and so often have to be harassed to deliver a promised review. Reviewing time is a major factor in the long publication delays that plague computer science journals. Usually half of the typical two years from submission to publication is spent in the refereeing process. I wanted the ACM JEA to have a turnaround time of six months from submission to publication. Naturally enough, though, refereeing is no different for an online journal than for a print journal; collecting the two or three complete reports necessary to make a decision still takes closer to a year than to a few months.
"Editors have to press friends into doing them favors or tell authors who have had their submission accepted that they will be expected to do a few reviews in return."
Unconventional mechanisms for refereeing are made possible by the online nature of a journal. For instance, one could post submissions to an area of the server available only to a pool of associate editors and let "natural selection" take its course: if no one volunteers to handle the article, that article is rejected. The problem we face in computer science and thus at JEA, however, is simply that people are already working 60 hours or more a week and have no time for reviewing submissions; editors have to press friends into doing them favors or tell authors who have had their submission accepted that they will be expected to do a few reviews in return.
While online publication for now shows more promise than results, the promise is overwhelming: flexibility, ease of access, intelligent searches, portability, and ecological benefits — the promise of the "paperless office" (which so far has been an ironic reversal, since computing, with its easy production of documents, has contributed to the destruction of untold square kilometers of forests). I firmly believe that all scientific research publications will be online within the next decade, especially as technological advances may soon produce a true notebook computer — i.e., something of the size of a modest notebook, with the display quality of a good print journal, the annotation capabilities of paper and pencil, and enough memory to store all issues of several journals. In the process, however, publishers will need to rethink their approach to publication and the research community will need to reevaluate its standards for refereed publications versus instant dissemination of results. The progress to date of JEA shows that a purely online journal can hold its own in the academic world and attract the same quality of submissions as a print journal; that it can deliver benefits not otherwise available in print journals; but also that it will not solve the most urgent quality and volume problems confronting researchers and academics everywhere.
Bernard Moret was born in Switzerland, where he studied the classics before entering the Swiss Federal Institute of Technology in Lausanne, where he earned his Diploma of Electrical Engineer in 1975. After working for two years in industry (for Omega Sports Timing), he entered graduate school at the University of Tennessee, earning an MS in 1977 and a PhD in 1980. He then joined the Department of Computer Science at the University of New Mexico, where he has been ever since. He served as chairman of the department from 1991 to 1993. He is the founding editor and editor-in-chief of the ACM Journal of Experimental Algorithmics. He has won numerous teaching awards, has published over 40 papers, and authored two textbooks, Algorithms from P to NP at Benjamin-Cummings (1991) with Henry Shapiro and The Theory of Computation (1988) at Addison-Wesley. His research interests include complexity theory, discrete and combinatorial optimization, and the design and experimental assessment of algorithms and data structures. His hobbies are reading fiction, collecting antique clocks, skiing, and rock climbing.