It is no secret that the Internet is in a state of impending chaos. Not only are items difficult to find but, once found, they often disappear without notice, producing the dreaded "404 The requested URL was not found" or "HTTP/1.0 404 Object Not Found" error messages. In other cases, the URL can still be found, but the object sought is no longer there. And the present crop of search engines has numerous problems in discovering and retrieving documents due, in part, to the enormous size of the Web, which has jumped from about 76 million Web pages in 1996 to well over 300 million today.[1] Moreover, the size of the Internet is expected to increase by 1000% over the next few years, so the situation is not likely to improve soon.

The transience of Web sites, coupled with the difficulty of discovering and retrieving their contents, causes serious problems for libraries and other institutions that attempt to integrate Web documents into local systems. Even many of the URLs found in databases such as online catalogs are no longer valid.[2]

Furthermore, the majority of users have been conditioned to believe that most things on the Internet (especially text) should be free. This anti-commercial attitude has not made the Internet a very friendly environment for commercial publishing, as was demonstrated recently when the popular online magazine Slate introduced a very reasonable subscription fee of $20 a year. Overnight, readership fell from nearly 60,000 to a paid subscription list of about 17,000. Although it has recovered slightly since then, Slate is still running an annual deficit of at least $4 million,[3] and Microsoft's patronage may not be enough to save it.

In spite of this hostile and chaotic climate, publishers of the most costly scholarly journals, mainly those produced by the scientific, technical, and medical (STM) publishers, realize that long-term survival depends on their ability to market products successfully over the Internet. Within the past two years the majority of the STM publishers have established a significant (though probably not profitable) Web presence. Academic Press, John Wiley & Sons, Springer-Verlag, Elsevier Science, and many other publishers have begun duplicating an impressive percentage of their print journal output in electronic format, and have made it available on the Internet. Some have even begun publishing journals that are only available electronically, such as Elsevier's GENE-Combis.[formerly http://www.elsevier.nl/journals/genecombis/Menu.html]

Electronic publishing produces some classic dilemmas for publishers, as digital material is simultaneously both remarkably constant and amazingly protean. It can be copied effortlessly and quickly an indefinite number of times with absolute fidelity, and just as easily be cut and pasted and otherwise modified. Even more alarmingly, from a rights-holder's perspective, digital material can be freely distributed, with or without authorization, to enormous numbers of people. Some well-known cases of such piracy have already occurred for software[4] and music.[5] Such abuses can lead to serious financial losses for the original rights owners, yet making the material too inaccessible through encryption, or by placing it in secure electronic containers, can lead to frustration and disaffection among users. A number of software publishers discovered the drastic consequences of such restrictive policies a few years ago when they made their software programs difficult or impossible to copy and ended up losing market share.

Origin of the DOI System

Faced with these conflicts, the Association of American Publishers (AAP) set up the Enabling Technologies Committee in 1994 to design a system that would protect copyright while facilitating commercial transactions. It soon became clear that "no single identifier is capable of serving all purposes"[6] and the committee finally decided that its first step should be the introduction of an industry-wide standard identifier that would facilitate the control of transactions and other operations, support systems interoperability between publishers and their clients, and serve as the underpinning of a workable rights- and permissions-management system.[7]

From the start, the committee recognized that many of the problems with the Internet were due to the function of the URL, which was never meant to be an identifier but only to designate the location of objects.[8] To overcome that difficulty, the group's first and primary task to date has been to design and begin the implementation of a system based on a persistent identifier that would be assigned to an object when it was created, or even before that, and stay with that object throughout its life.

Structure of DOIs

The DOI System is an application of the Handle System Resolver®, [formerly http://www.doi.org/resolver.html] originally developed by the Corporation for National Research Initiatives. A DOI consists of two parts, a prefix containing the directory designation and the registrant number, both assigned by the Directory Manager (yet to be designated, but likely to be the international ISBN agency), and a suffix that uniquely identifies the particular item. Eventually there may be many Directory Managers, perhaps one in each country or for each industry sector (publishing, photos, music, software, etc.).[9] The first two characters of the prefix designate which Directory Manager issued the prefix, and for the time being all DOIs begin with 10 — for example, 10.1002. The second part of the prefix (e.g., the .1002) defines the organization, publisher, or any rights-owner or controller registrant that purchased the DOI prefix. (Andy Powell has published a particularly good discussion of the structure of the DOI and how servers currently deal with DOI redirection.[10]) The two parts of the prefix are followed by a slash; everything after the slash is the suffix, a unique character string created by the publisher that defines the specific digital object.

10.1000/92 is a typical DOI, albeit a simple one
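The prefix/suffix structure described above can be illustrated with a minimal Python sketch that splits a DOI such as 10.1000/92 into its directory code, registrant code, and suffix. The `DOI` type and `parse_doi` function are hypothetical names invented for illustration; they are not part of the Handle System or any DOI software, and the sketch checks only the basic shape of the string, not whether the DOI is registered.

```python
from typing import NamedTuple


class DOI(NamedTuple):
    directory: str   # directory code; "10" for every DOI issued so far
    registrant: str  # registrant code bought from the Foundation, e.g. "1002"
    suffix: str      # publisher-chosen string naming the specific object


def parse_doi(doi: str) -> DOI:
    """Split a DOI into directory code, registrant code, and suffix.

    The split is made at the FIRST slash only, so a suffix that itself
    contains slashes is kept intact.
    """
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError(f"expected 'prefix/suffix', got {doi!r}")
    directory, dot, registrant = prefix.partition(".")
    if not dot or not directory or not registrant:
        raise ValueError(f"expected a prefix like '10.1002', got {prefix!r}")
    return DOI(directory, registrant, suffix)


print(parse_doi("10.1000/92"))
# DOI(directory='10', registrant='1000', suffix='92')
```

Because the split happens at the first slash, a suffix built from a legacy code that contains slashes still parses as a single suffix.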

To try out the Handle System and assign and experiment with identifiers at the CNRI site, go to http://www.handle.net/ietf/handle/register_handle.html [link no longer active]

Many publishers may choose to use legacy identifiers, such as the ISBN, ISSN, or SICI, as part of their suffixes. The problem with legacy identifiers is that such codes will lose their original meaning if the object changes ownership and context during the course of its useful life, just as the ISBN cannot always be used to identify the current publisher of a work. For that reason, among others, users are cautioned to regard the DOI as having no affordance; that is, the DOI prefix and suffix cannot always be derived only from information about the object — a bibliographic citation that describes it, for instance.[7]

The fact that DOIs are designed to be dumb numbers, with any necessary intelligence carried in the metadata (descriptive data about the object that is often invisible to the user) rather than in the identifier,[11] has led Berinstein[12] to conclude that efficient DOI finder systems will be essential for the success of the system. A few publishers, e.g. Wiley and Academic Press, have already made such finders available. Wiley's finder is a sophisticated implementation based on the Sybase database server. Academic Press's finder [formerly http://www.hbuk.co.uk/tony/doi/DOIFinder.cgi] consists of an HTML form that allows searching only by publisher and journal name, combined with the issue, date, and starting page. In part to alleviate the difficulty of finding DOIs only through their bibliographic data, the International DOI Foundation (the Foundation) is beginning to investigate the possible use of metadata for reference linking in the DOI System, based on a "'bare bones' DOI metadata set such as assignor, creation type and information identifier . . . , enumeration . . . , [and] format."[13] Another approach being considered is to have the metadata elements associated with a digital object include lists of related or derived objects linked to that object.[14]
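As a rough illustration of what a finder does, the following Python sketch indexes DOIs by the same fields the Academic Press form accepts: publisher, journal name, issue, date, and starting page. The class and field names are hypothetical, the date is reduced to a year for simplicity, and the DOI in the example is invented; a production finder would query a database rather than an in-memory dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical finder key: (publisher, journal, issue, year, first page).
FinderKey = Tuple[str, str, str, int, int]


def _norm(text: str) -> str:
    # Case-fold and collapse whitespace so minor typing differences still match.
    return " ".join(text.casefold().split())


class DOIFinder:
    def __init__(self) -> None:
        self._index: Dict[FinderKey, str] = {}

    def _key(self, publisher: str, journal: str, issue: str,
             year: int, first_page: int) -> FinderKey:
        return (_norm(publisher), _norm(journal), issue.strip(), year, first_page)

    def add(self, doi: str, publisher: str, journal: str, issue: str,
            year: int, first_page: int) -> None:
        self._index[self._key(publisher, journal, issue, year, first_page)] = doi

    def find(self, publisher: str, journal: str, issue: str,
             year: int, first_page: int) -> Optional[str]:
        # Returns None when no registered article matches the citation data.
        return self._index.get(self._key(publisher, journal, issue, year, first_page))


finder = DOIFinder()
finder.add("10.9999/jex.1998.0042", "Example Press", "Journal of Examples", "3", 1998, 117)
print(finder.find("example press", "Journal of  Examples", "3", 1998, 117))
# 10.9999/jex.1998.0042
```

Exact-match lookup on citation fields also shows the weakness behind Berinstein's concern: a citation with a wrong page number finds nothing at all.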

How the DOI System Operates

Existing identifiers such as the ISBN and ISSN, which were created for print publishing, do not meet the full range of identification and trading needs of the electronic environment.[15] That limitation led the AAP to form a partnership with the Corporation for National Research Initiatives (CNRI) to develop the Digital Object Identifier (DOI) system specifically for digital materials. The DOI System was officially launched in the second half of 1997 at the Frankfurt Book Fair and was demonstrated again at the October 1998 Fair.[16]

Very briefly, the DOI version of the CNRI Handle System® involves attaching to a digital object a unique and persistent identifier, created with or before the object itself, thereby providing a label that will define that object for its entire life, independent of where it is located. For current discussion purposes, a digital object is any machine-readable file that can be addressed on a computer. Ideally that identifier (in this case the DOI) will stay with the object, perhaps as part of its associated descriptive data, throughout the object's existence. A record of that DOI, along with information on the location of its object, is then sent to a central server where the DOI data are stored. Once a DOI is registered in the DOI System server, or repository, it will be stored there virtually forever. Those centrally stored data form a resolver database that, in conjunction with special software, can link or "resolve" a DOI to the location of its associated object. When a user asks for a digital object, or for information about a digital object, a DOI query goes to the DOI server. That server finds the record of the DOI and the address of its associated object, links the two together, and sends the location (most likely the URL) back to the user's browser. The browser then retrieves either the object itself or the information about it, and shows it to the user. The process is about as fast as going directly to the URL, and it significantly increases the chance of retrieval in environments where URLs change frequently.
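The resolution step just described amounts to a lookup in a central table. The Python sketch below is a deliberately simplified model under stated assumptions: the class and method names are hypothetical, the table lives in memory rather than on the Handle System's servers, and the URL is an invented example. It registers a DOI with a location and returns that location when the DOI is resolved, which is where the user's browser would then be sent.

```python
from typing import Dict


class CentralResolver:
    """Toy model of the central DOI server: DOI -> current location."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, doi: str, url: str) -> None:
        # Registration is meant to be permanent, so a DOI can be registered once only.
        if doi in self._locations:
            raise ValueError(f"{doi} is already registered")
        self._locations[doi] = url

    def resolve(self, doi: str) -> str:
        # The returned URL is what the browser is redirected to.
        if doi not in self._locations:
            raise LookupError(f"{doi} is not registered")
        return self._locations[doi]


resolver = CentralResolver()
resolver.register("10.1000/92", "http://www.example.org/objects/92")
print(resolver.resolve("10.1000/92"))  # http://www.example.org/objects/92
```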

The objects themselves — computer programs, digital text files, digital audio, digitized images or video, or any other material that can be digitally expressed — will reside in databases controlled and maintained by the publishers. When an object's location changes, either within that publisher's databases or when it is transferred to another publisher's database, information concerning that change is sent by the object's owner to the server, which automatically updates the record. As long as the Foundation gives authority for issuing and updating DOIs only to those who will be conscientious about keeping their data up to date, the central database should retain a high degree of integrity. The Foundation has not yet established a procedure to cover the transfer of an object from a publisher in the DOI System to a publisher that does not participate in the system.
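The update procedure can be sketched the same way. In this hypothetical Python model, each record remembers which registrant currently controls it, only that registrant may change the location, and a transfer to another publisher replaces the controller while the DOI string itself stays the same. Plain owner names stand in for the prefix-and-password credentials the Foundation's guidelines call for; real authentication is out of scope, and all names and URLs are invented.

```python
from typing import Dict, Tuple


class OwnedRegistry:
    """Toy DOI registry in which only the current controller may edit a record."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, str]] = {}  # doi -> (controller, url)

    def register(self, controller: str, doi: str, url: str) -> None:
        self._records[doi] = (controller, url)

    def _check(self, requester: str, doi: str) -> None:
        controller, _ = self._records[doi]
        if requester != controller:
            raise PermissionError(f"{requester!r} does not control {doi}")

    def move(self, requester: str, doi: str, new_url: str) -> None:
        # Object moved within the same publisher's databases.
        self._check(requester, doi)
        self._records[doi] = (self._records[doi][0], new_url)

    def transfer(self, requester: str, doi: str, new_controller: str, new_url: str) -> None:
        # Object sold to another publisher: same DOI, new controller and location.
        self._check(requester, doi)
        self._records[doi] = (new_controller, new_url)

    def resolve(self, doi: str) -> str:
        return self._records[doi][1]


reg = OwnedRegistry()
reg.register("Publisher A", "10.9999/item7", "http://a.example/item7")
reg.transfer("Publisher A", "10.9999/item7", "Publisher B", "http://b.example/item7")
print(reg.resolve("10.9999/item7"))  # http://b.example/item7
```

After the transfer the DOI still begins with Publisher A's prefix, so in this model the prefix says nothing reliable about who controls the object now.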

DOI Size

The DOI is currently limited to 128 characters.[15] With at least 40 possible characters (letters, numbers, and symbols) allowed in each position of the DOI code, a 120-character string (the length of the suffix) can already code for about 1.8 × 10^192 objects, in a universe that contains only about 10^80 atoms. It is true that the number of meaningful strings, those with mnemonic content, for example, is far smaller than that,[17] but DOIs are supposed to be dumb. The Foundation does not consider such identifier lengths to be a problem, since it maintains that DOIs will seldom have to be entered manually. It seems unlikely that this will really be the case. Much depends on how often DOIs are manually transferred from digital to print format and back again. There may be shortcuts for such transfer in the future, such as scanners, but for the present a significant number of URLs are copied manually from printed sources into browsers. If such manual transfers remain a common means of transferring DOIs from DOI-labeled text-based objects (in some cases these will be print versions of digital objects), the prospect of typing in 128 or more characters with no mnemonic content seems very daunting indeed. Such long DOIs also make it more difficult to hide them in electronic watermarks, which, in text objects, often rely on changes in line width, character spacing, or subtle modifications of the characters themselves to store and hide the information.[18]
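The arithmetic behind that figure, assuming exactly 40 possible characters in each of the 120 suffix positions, is:

```latex
40^{120} = 10^{120 \log_{10} 40} = 10^{120 \times 1.60206\ldots} \approx 10^{192.25} \approx 1.8 \times 10^{192}
```

By the same calculation, a suffix of only 50 characters already allows 40^50 ≈ 1.3 × 10^80 distinct strings, roughly the number of atoms cited above.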

The issue of the optimal length of the DOI is not going to be easy to resolve. Affordance, mnemonic aids, and brevity would be ideal for human manipulation and use, but that is probably not practical or even desirable in the ever-changing electronic environment. If the DOI is asked to carry accessory information about the object (format, version, edition, rights, etc.) as an intelligent component of its syntax, it will become cumbersome and difficult to manage. Yet if the DOI System depends on the object's metadata component to contain such information, then the metadata component will similarly become more complex and expensive to maintain. The resolution of these two conflicting directions will have a great deal to do with the final structure of the DOI and with electronic identifiers in general.

The IDF and DOI System Management

The DOI System is administered by the not-for-profit International DOI Foundation (IDF), currently headed by Norman Paskin, an expert on digital identifiers. He was recently recruited from his position as director of information-technology development at Elsevier. The Foundation was set up by the AAP and is governed by a board composed of representatives of major publishers.[19] They include Microsoft, Elsevier, and John Wiley & Sons, as well as representatives of societies that promote publishers' interests, such as the AAP, the Authors' Licensing and Collecting Society (from Great Britain), and the International Association of Scientific, Technical and Medical Publishers, all groups that can afford the $10,000 to $30,000 annual membership fee.

International DOI Foundation Members
Current as of: 14 Oct 98

CHARTER MEMBERS

Association of American Publishers*

Academic Press/Harcourt Brace

American Chemical Society

Blackwell Science

Elsevier Science*

International Publishers Association*

Institute for Scientific Information

ISBN International

MCPS/PRS/BUMA/STEMRA Alliance of European Music Rights Societies*

Publishers Licensing Society

Springer Verlag

International Association of Scientific, Technical and Medical Publishers*

John Wiley and Sons*

GENERAL MEMBERS

Addison Wesley Longman

ALCS*

American Mathematical Society

Association for Computing Machinery

Bokforlaget Natur Och Kultur

Copyright Clearance Center

EDP Sciences

Houghton Mifflin

IEEE

Kluwer Academic Publishers

Microsoft Corporation*

National Music Publishers Association

New England Journal of Medicine*

Novelon Inc.

RCP Consultants

Thomson Technology Labs

Xerox Corporation

*Denotes IDF Board Member Organization

Foundation members help set DOI policy. Because no library association, university, or other such body is a member of the Foundation, institutions like libraries have had only indirect input into the policies being implemented. (Some Foundation society members, such as the American Mathematical Society, the Association for Computing Machinery, and the Institute of Electrical and Electronics Engineers, share at least some common interests with libraries and other non-profit institutions.) Libraries are being consulted to a limited extent through the NISO standards-development process for the DOI, since library representatives sit on the NISO committees. Recently the Coalition for Networked Information has been asked "to help increase understanding of the DOI's objectives and roles, particularly as they related to library services, and to help to suggest ways in which the DOI might be made more useful to the broader bibliographic community."[20] This is in line with the Foundation's stated aim to coordinate with authors, libraries, and subscription agents, but the role of libraries thus far has been, at best, peripheral. There has been almost no official discussion between the Foundation and any library organization such as the American Library Association or the Special Libraries Association. In this environment it seems unlikely that the DOI will be designed to meet the needs of libraries, although libraries will certainly find DOIs useful. Indeed, the mission statement of the Foundation, adopted in October 1997, clearly stated that the primary function of the IDF was to "insure that the system meets the needs of publishers." However, the Foundation has since removed its mission statement from its DOI Web site for discussion and revision.

When it was first set up, the Foundation anticipated operating under well-organized and stringent guidelines. Whether these guidelines still apply is unclear, however, since they are under revision; the Foundation is expected to make the new draft guidelines available for study at its Web site. A similar degree of quality control over participants will likely be necessary for the health of the system. Under the original guidelines, publishing companies or other entities that wish to participate in the DOI System must agree to have a designated intermediary, called a Requestor, who is responsible for all transactions with the Foundation and for maintaining the accuracy of the links to specific URLs and the validity of the URLs. The guidelines also say that URLs must be under the organization's control, and the prefix and password may only be used by the organization that requested and paid for them. Finally, the guidelines say that DOIs can only be assigned to materials to which the organization has copyright privileges, and the Requestor (i.e. the publisher or rights-owner that manages the objects and their associated DOIs) is liable for any damages caused by malfeasance. Such restrictions and organizational requirements are clearly designed primarily for a corporate environment, and would be difficult to support and enforce adequately in many less-formal institutions such as universities or professional societies. In those organizations even officially sanctioned Web publishing is often decentralized, and it would be extremely burdensome (and often bureaucratically absurd) to establish and fund central clearinghouses.

DOI Assignments

Publishers buy unique DOI prefixes from the Foundation, at a price currently set at $1,000 a prefix. A publisher might well want separate prefixes for different corporate subdivisions or for different imprints, and there is no limit on the number of prefixes a publisher may purchase to meet operational needs. The prefixes assigned to publishers, however, should not be thought of as carrying meaning that is useful for identifying a particular publisher or a specific piece of information. It is quite likely that publishers will sell some items or even whole imprint series to other publishers. Thus, while a publisher might start out owning every digital object marked with its original prefix, as objects are sold that prefix will become associated with other publishers, and the connection of the prefix to the owner will have no long-term identification value. (The same issue already exists with the ISBN.)

Publishers will eventually have to pay an annual fee for each DOI they register on the DOI server. That fee will support the DOI server that stores information about registered digital objects. The charge has not yet been established, but may be set anywhere from one cent to ten cents an object. As the revenue received from these fees will need to cover operating costs, it is entirely possible that the Foundation will have to charge even more than ten cents per DOI per year. For publishers like Academic and Wiley that have tens of thousands of DOIs already registered with the Foundation, such annual costs could become considerable as the numbers cumulate over the years, especially if DOIs are used to define parts of objects as well as whole ones. As publishers increase the granularity of the material that is identified (objects such as encyclopedias or image collections might have hundreds, or even thousands, of identifiers) these annual costs will increase proportionally. Cost models for this maintenance await real implementation.
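To make the cumulative effect concrete, here is a small Python calculation under stated assumptions: a hypothetical publisher registers the same number of new DOIs every year (20,000 is an invented figure), never retires any, and pays a flat annual fee on every registered DOI at either end of the one-to-ten-cent range mentioned above. The function name is illustrative only.

```python
def cumulative_fees(new_dois_per_year: int, years: int, fee_per_doi: float) -> float:
    """Total annual DOI fees paid over `years`.

    Assumes a constant number of new DOIs each year, none ever retired,
    and a flat fee charged every year on every registered DOI.
    """
    total = 0.0
    registered = 0
    for _ in range(years):
        registered += new_dois_per_year    # the billable stock keeps growing
        total += registered * fee_per_doi  # this year's bill
    return total


# Hypothetical publisher adding 20,000 DOIs a year, over ten years.
for fee in (0.01, 0.10):
    print(f"${fee:.2f} per DOI: ${cumulative_fees(20_000, 10, fee):,.0f}")
# $0.01 per DOI: $11,000
# $0.10 per DOI: $110,000
```

Because the annual bill itself rises every year, the total grows roughly with the square of the number of years, before any increase in granularity is taken into account.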

In the long run, however, a bigger expense than the direct costs of the DOI System may well be the overhead costs associated with the staff needed to administer the DOI System within each organization. Some of the detailed housekeeping for such a system could be handled by machine, although programs capable of doing so efficiently have not yet been developed. The intangible nature of the digital inventory will necessitate the maintenance of a high degree of detail, accuracy, and completeness of descriptive data about each object. Details such as size and rights, which the printed book conveyed through its physical form and format, must now be logged for each object into a searchable file system that is ubiquitously available and persistently maintained. Programs might be set up to oversee and continuously monitor the local state of articles and their locations, and automatically send information to the central DOI server as needed. Although it has not yet been attempted, the DOI System currently plans to do this by periodically "pinging" each DOI's linked URL to make sure it is still at least a valid site. When such a test finds a URL to be invalid, the system will presumably notify the publisher and ask for an update or correction. It is easy to see how the in-house maintenance costs of DOIs could become burdensome and expensive for publishers, since staff expenses would also mount proportionally with the number of DOIs that are registered.
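The planned "pinging" of registered URLs could be approximated by a periodic job like the Python sketch below. It is an illustration under stated assumptions, not the Foundation's design: it sends an HTTP HEAD request to each URL with the standard library's `urllib.request`, treats any error or failed connection as a dead link, and returns the DOIs that should be reported back to the publisher. As in the text, a live response shows only that the site answers, not that the right object is still there; some servers also reject HEAD requests, which this sketch would misreport as dead links.

```python
import urllib.request
from typing import Callable, Dict, List


def url_is_alive(url: str, timeout: float = 10.0) -> bool:
    """True if a HEAD request to `url` (following redirects) ends in a non-error response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def find_broken_links(records: Dict[str, str],
                      is_alive: Callable[[str], bool] = url_is_alive) -> List[str]:
    """Return the DOIs whose registered URL fails the liveness check."""
    return [doi for doi, url in records.items() if not is_alive(url)]


# Offline demonstration with a stand-in checker instead of real network calls.
records = {"10.9999/a1": "http://a.example/a1", "10.9999/b2": "http://gone.example/b2"}
print(find_broken_links(records, is_alive=lambda url: "gone" not in url))
# ['10.9999/b2']
```

Passing the checker in as a parameter keeps the reporting logic testable without network access and lets a publisher substitute a stricter check later.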

It is a new responsibility for publishers to work extensively with such descriptive data, especially as that data must conform to exacting standards acceptable to a wide variety of systems. Location information, rights information, and administrative data are all volatile and require vigilant upkeep procedures.

One of the more valuable uses of DOIs will be to enable publishers to link directly from one publisher's product, such as a bibliographic database or article citation, to another publisher's abstract or full-text version of the object. As Clifford Lynch has pointed out, the ability to create such "actionable" citations is one of the more powerful and significant features of such identifier systems, and one of the more problematic.[20] An interesting collaboration [formerly http://www.apnet.com/www/doi/gallery.htm#Journals] is under way between Academic Press and John Wiley & Sons to show how such a system might work. In their prototype, citations in an electronic-journal article incorporate DOIs that are linked directly to the text of the original articles in either of the two publishers' online-journal databases. That is an area where the DOI can provide a real benefit to the academic community. In actual practice, of course, the operation of such a system would require that the user have access rights to both journal databases, or at least be willing to pay to see each of the original articles; this is only one of the difficulties involved in actually implementing such a system.
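One way to make citations "actionable" in this sense is to rewrite each DOI in a reference as a link through a resolver. The Python sketch below does that for plain-text citations. The regular expression is a simplification of DOI syntax, and the resolver address https://doi.org/ (the present-day DOI proxy) is used here as an assumption; the Academic Press and Wiley prototype may have linked differently.

```python
import html
import re
from urllib.parse import quote

# Simplified DOI pattern: "10." + registrant digits, "/", then a suffix that
# does not end in trailing sentence punctuation.
DOI_PATTERN = re.compile(r"\b10\.\d+(?:\.\d+)*/\S*[^\s.,;)]")
RESOLVER_BASE = "https://doi.org/"  # assumption: present-day DOI proxy


def link_citation(citation: str) -> str:
    """Return `citation` as HTML with every DOI wrapped in a resolver link."""
    parts = []
    last = 0
    for match in DOI_PATTERN.finditer(citation):
        doi = match.group(0)
        href = RESOLVER_BASE + quote(doi, safe="/")  # percent-encode unsafe characters
        parts.append(html.escape(citation[last:match.start()]))
        parts.append(f'<a href="{html.escape(href)}">{html.escape(doi)}</a>')
        last = match.end()
    parts.append(html.escape(citation[last:]))
    return "".join(parts)


print(link_citation("Smith J. Example article. doi:10.1000/92."))
# Smith J. Example article. doi:<a href="https://doi.org/10.1000/92">10.1000/92</a>.
```

The link only gets the reader as far as the resolver; whether the full text then appears still depends on the access rights discussed above.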

The DOI can also be used to take the user directly from a citation or abstract to the object itself. The ability of the DOI to keep track of the current location of that object "is especially useful with documents that may move from server to server, software that becomes available in new versions, digital music in different file formats, or a scientific article that has been revised or augmented."[21]

What a DOI Identifies and/or Resolves To

The Foundation is currently grappling with the issue of what the DOI should identify. This is one of the more difficult and important issues to be decided before the DOI can become a fully operational system. Today a DOI is used to identify digital or print documents, images, sounds, video clips, parts of works, works under development or evolving works, or even constantly changing sources like news headlines or stock quotes (the "ongoing entities" concept discussed by Reynolds, as described in the DOI Discussion paper).[22] How the DOI will reflect a work versus a version of a work, and how different versions of the same work will be identified (whether these are format differences or differences in content based on a chronology of development) have yet to be established. Increasingly, it appears that publishers will create their own in-house policies. One development that has helped clarify concepts is the clear separation between intellectual-property objects and services. Any new developments of DOI assignments are expected to include that separation. Delivery and presentation formats are defined as services that use DOIs to resolve to an object.

Also, if accurate and complete descriptive data is easy to get to, solutions to problems of defining relationships and establishing definitions will be decoupled from the object identifier. Another development is toward the assignment of DOIs to printed books and other non-digital creations, as defined by Godfrey Rust[23],[24] and more recently Norman Paskin, since "the scope of what DOI identifies is defined by meaningful content that is to be traded electronically, not by digital manifestation."[25]

The DOI is evolving into a digital ID for intellectual-property trading, not unlike the UPC (Universal Product Code) for the retail trade. Some implementations of the CNRI Handle System, such as those at the Library of Congress, can point to several digital objects. The current DOI System, however, requires that a DOI only be assigned to a single digital object defined both by its content and its data type. This is termed a "level 1 DOI," but a change has been proposed to allow a "level 2 DOI"[26] to point to different versions of the same material. Such a change will require some changes in the DOI System implementation policy.[27]
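The difference between the two levels can be pictured as two record shapes. In this hypothetical Python sketch, a level 1 record binds one DOI to exactly one object of one data type, while a level 2 record holds several versions of the same material and picks one according to the user's preferences. Modeling "versions" as one URL per data type (here, MIME type strings) is an assumption made for illustration; the proposal itself may define versions differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Level1Record:
    """Current rule: one DOI, one object, one data type."""
    doi: str
    data_type: str
    url: str


@dataclass
class Level2Record:
    """Proposed: one DOI pointing at several versions of the same material."""
    doi: str
    versions: Dict[str, str] = field(default_factory=dict)  # data type -> URL

    def resolve(self, preferred: List[str]) -> str:
        # Return the first available version in the user's order of preference.
        for data_type in preferred:
            if data_type in self.versions:
                return self.versions[data_type]
        raise LookupError(f"{self.doi}: none of {preferred} is available")


article = Level2Record("10.9999/art1", {
    "application/pdf": "http://pub.example/art1.pdf",
    "text/html": "http://pub.example/art1.html",
})
print(article.resolve(["text/sgml", "text/html"]))  # http://pub.example/art1.html
```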

Most of the time when people click on a DOI from a commercial publisher, they will see a description of the item and an invoice rather than the object itself. That is because the system must be flexible enough to meet the proprietary business needs of publishers. However understandable, such an intermediate step appears as a barrier or hurdle to the scholar doing research. For commercial journals and other subscription-based objects, only those users who can demonstrate access-and-use privileges (through their IP address, for example, or by entering a password, or perhaps by entering a credit-card number) will be allowed to proceed to the desired object. Some users may be part of large institutions that have subscriptions to the output of many, or even most, large publishers, but most users will surely not be in that situation. Berinstein warns that the frustration of frequent blocks to access will quickly cause users to lose interest in using DOIs.[28] As long as DOIs are primarily used for commercial products, their general acceptance could well be inhibited because access within the scholarly community to those products will be limited to users with purchased or licensed rights privileges. Of course, one way around that conundrum would be for every user to begin a session by signing on with an e-cash debit card that would automatically be debited for each financial transaction that occurred. Such a scenario, interestingly enough involving a library patron, was presented in the white paper In Search of the Unicorn, [formerly http://www.bic.org.uk/unicorn2.pdf] written by Mark Bide for the publishing industry. It is clearly a concept not foreign to the strategic plans of the industry.
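The choice between delivering the object and showing a description-and-invoice page can be sketched as a small access check. This Python example is hypothetical: it grants access if the request comes from a licensed institutional IP range or carries a valid password, and otherwise returns the response page. Credit-card payment is left out, and the IP ranges are reserved documentation addresses rather than real subscribers.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional, Set


@dataclass
class AccessRequest:
    client_ip: str
    password: Optional[str] = None


def landing_page(req: AccessRequest, licensed_networks: Iterable[str],
                 valid_passwords: Set[str]) -> str:
    """Return "object" for entitled users, otherwise "response page"."""
    addr = ip_address(req.client_ip)
    if any(addr in ip_network(net) for net in licensed_networks):
        return "object"            # site licence recognized by IP address
    if req.password is not None and req.password in valid_passwords:
        return "object"            # individual subscriber
    return "response page"         # description plus invoice


licensed = ["192.0.2.0/24"]        # hypothetical subscribing campus
print(landing_page(AccessRequest("192.0.2.15"), licensed, set()))    # object
print(landing_page(AccessRequest("198.51.100.7"), licensed, set()))  # response page
```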

DOIs and Rights Management

In its current implementation, the DOI System functions primarily as a means to "facilitate electronic commerce and enable copyright management systems."[29] The DOI concept was first proposed by the AAP as a means of establishing copyright control; its original hope was to fix DOIs in some irreversible manner to copyrighted online materials, perhaps as part of something like a digital watermark. It was soon discovered, however, that this was not feasible with the technology then available.[30] There are still some technical problems associated with adding digital watermarks (usually hidden digital codes or images not visible to the user, but recognizable by means of special programs or procedures) to text objects. For instance, they can cause letters to jam together on the screen or in a printout. However, the experimental ByLine system, designed by the Authors' Licensing and Collecting Society of the UK in cooperation with the European Commission-sponsored IMPRIMATUR project, claims to be able to embed DOIs as indelible watermarks in each digital object, either text or image. ALCS claims that its product will allow the tracking and policing of any use of such watermarked materials, for example, when copies are transferred to other sites. Similarly, FileOpen Systems markets a method of incorporating DOIs into PDF files.

DOIs as Persistent Identifiers

If the material does disappear from the Internet and the original publisher does not supply a response page explaining its disposition, then the DOI resolver database server might or might not be obligated to provide such information (the policy is still under discussion). Whether a response from the DOI System that an object is no longer available would be preferable to the notorious 404-error message is questionable. Only if the DOI-generated messages were significantly fewer than the current frequency of 404-error responses would the DOI System be an improvement. In a worst-case scenario, DOI negative-response pages could become as common as 404-error responses are now. However, by primarily allowing only major, or at least responsible, publishers to issue DOIs, the Foundation's early guidelines clearly hoped to minimize this sort of vacant response. It should be remembered that, once issued, a DOI would probably never truly disappear from the Internet. Copies could last indefinitely in various files and bookmarks scattered all over the globe.

Because of the persistence of the DOI, the Foundation would be ineffective if it could not guarantee that records of all DOIs will be indefinitely available in some form, if only within a separate archive database. Some type of archival record should be kept even of those DOIs for which errant publishers are no longer paying their annual DOI-upkeep fee. The Foundation policy on dead DOIs is still very much in flux, and a final policy on how to deal with this problem has yet to be decided. However, Andy Powell, coordinator of the UK Office for Library and Information Networking, has made the point that identifiers need to significantly outlast their objects and "probably need to outlast current Internet technology and computer systems."[31] DOIs are, after all, defined as examples of "generic" URNs,[32] and URNs in turn are defined as identifiers that provide both persistence and availability. Practically, however, any guarantee of persistence in the digital environment, where objects can appear and disappear nearly as rapidly and mysteriously as quantum particles, or, like rapidly mutating bacteria, divide and evolve into dozens of related objects, is going to require a lot of ongoing detail work.
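Keeping an archival record of dead DOIs, rather than deleting them, could look like the following hypothetical Python sketch. A retired DOI keeps its record with a short disposition note, so resolving it yields an explanation instead of a bare failure, and only DOIs that were never registered come back as unknown. The status strings and method names are invented for illustration; the Foundation's actual policy is, as noted, undecided.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Record:
    url: Optional[str]   # None once the object has been withdrawn
    note: str = ""       # disposition kept for the archive


class ArchivingResolver:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def register(self, doi: str, url: str) -> None:
        self._records[doi] = Record(url)

    def retire(self, doi: str, note: str) -> None:
        # The record is kept, not deleted, so the DOI never becomes meaningless.
        self._records[doi] = Record(url=None, note=note)

    def resolve(self, doi: str) -> Tuple[str, str]:
        record = self._records.get(doi)
        if record is None:
            return ("unknown", doi)
        if record.url is None:
            return ("withdrawn", record.note)
        return ("redirect", record.url)


r = ArchivingResolver()
r.register("10.9999/old", "http://pub.example/old")
r.retire("10.9999/old", "Object withdrawn by publisher")
print(r.resolve("10.9999/old"))  # ('withdrawn', 'Object withdrawn by publisher')
```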

Buying a DOI prefix gives the publisher (or, more precisely, the rights-owner) the right to attach unique identifier suffixes for each digital object it wishes to tag. Under the original set of Foundation guidelines, DOI prefixes would only be issued by the DOI agency to those that it determined had the means and intent of managing their DOIs in a manner that met the Foundation's very strict contractual requirements. For the most part, major publishing companies and larger societies, bodies that produce materials that are already relatively stable on the Internet, would most assuredly meet these criteria. Even though DOI-issuing bodies do not yet include institutions like libraries or other non-traditional publishers, the Foundation's original information pages stated that the organization envisioned that such groups, along with "perhaps mid-size and small publishers," would be represented. It is promising that the Foundation has recently loosened restrictions on the types of institutions that can issue DOIs and now says that any rights owner can buy a prefix. How this will work out in practice has yet to be tested, and it would probably be counterproductive for the Foundation to accept too wide a variety of publishers.
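
The prefix/suffix division can be seen in the DOIs cited in this article's own references. The helper below is a hypothetical sketch of ours, not a Foundation tool; it assumes only the form visible in those references (a prefix beginning with "10." that identifies the registrant, a slash, then a registrant-chosen suffix) and splits at the first slash, because a suffix may itself contain slashes. It does not check a DOI against any registry.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI of the form 10.<registrant>/<suffix>: {doi!r}")
    return prefix, suffix


# The two Science articles in the reference list share one purchased prefix.
for doi in ("10.1126/science.280.5360.98", "10.1126/science.281.5382.1459"):
    print(split_doi(doi))
```

Both calls return the prefix "10.1126"; only the suffixes differ, and it is the right to mint such suffixes under its own prefix that a rights-owner buys.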

The preceding discussion raises a major issue and a quandary, both for large commercial publishers and for non-traditional ones like universities and libraries. We believe that the DOI System, or some other handle-based system, provides the only currently viable means of identifying online objects with any degree of persistence across a variety of publishers. If, however, too many borderline institutions are allowed to participate in the DOI System, the system may ultimately collapse through the accumulation of identifiers for objects that no longer exist. On the other hand, should such non-traditional publishers not be allowed to issue DOIs and participate in the DOI System, they will likely be forced to create parallel systems with many of the same features, and police these systems themselves to guarantee their quality. This is not a trivial task, as illustrated by the money, complexity of design, and sophistication of organization that are going into the construction of the DOI System. It certainly does not seem likely that all types of publishers, from individuals and political or religious groups to the existing variety of small publishers and societies, will be able to expend indefinitely the time and money needed for the sophisticated, ongoing information maintenance that a system like the DOI requires. Clearly, any comprehensive identifier system for digital objects is going to be very difficult to implement. It may be that there simply is no identifier that will prove to be an adequate, affordable solution for these non-mainstream information providers, even though such groups will probably continue to be responsible for the majority of material published on the Internet.

Summary of Selected Concerns

The importance of the work being done on the design of the DOI System, and its consequences with respect to digital identifiers in general, would be difficult to overrate. Solving the problems of identifying specific objects on the Internet is extremely important, and the work being done on the DOI System will help with that solution. Still, there are a number of current issues concerning this system that have no easy solutions and particularly concern us:

  1. At present, only established commercial and society publishers are purchasing publisher prefixes and so are allowed to issue DOIs. This means that most individual or non-traditional publishers are not participating directly in the DOI System, but are merely acting as end users. Since the biggest problems with URL stability and the lack of persistence of Internet objects lie outside the products provided by large publishers, it is unclear how the DOI System is going to have any generally beneficial effect on the solution of the Internet's problems.

  2. Those who participate in the DOI System will need to include in their operating costs the overhead of detailed housekeeping of the DOIs and each item's associated metadata, upon which many of the DOI's more advanced functions will depend. In addition, there are the fees that the Foundation will need to levy to support the maintenance of the resolver-database server for the continued tracking of traded, retired, erased, or simply forgotten and abandoned identifiers. Even with computerized aids, the cost to publishers of maintaining the robust and persistent matrix of numbers and descriptive text that a handle-based system requires will be considerable. Under the current model, the annual fees exacted by the Foundation from its participating publishers must cover operating expenses. Since no one yet knows how high these fees might be, we are concerned that costs for smaller publishers and not-for-profit participants might be so prohibitive that they will be largely excluded.

  3. At up to 128 characters, DOIs are simply too long to be practical outside of the digital universe. The Publisher Item Identifier (PII), for example, at seventeen characters, is a much more reasonable length and probably is still long enough to identify every item we will ever need to identify. Indeed, Norman Paskin estimates that only 10^11 (one hundred billion) digital objects will ever require identification.[33] Since we will probably need to copy DOIs manually from print into electronic format at least occasionally, and since both their length and limited affordance (mnemonic content) will make it difficult to transfer them accurately by any manual means, this could turn out to be a nuisance factor that will hinder their widespread acceptance. Long identifiers are also harder to code into watermarks, especially in text objects that lack background noise in which to hide such data.

  4. DOIs will probably not lead to more open access to online materials, at least not to those that are commercially published. In fact, except for users who can demonstrate access rights, most DOI queries will probably lead to invoice forms of one sort or another rather than directly to the requested object. This aspect of the DOI System could make the Internet even more frustrating for the majority of users than it is now.
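
The capacity argument in point 3 is simple arithmetic: an identifier of n characters drawn from an alphabet of k symbols can distinguish at most k^n objects. The sketch below (our illustration, using a hypothetical `min_length` helper) computes the shortest length that covers Paskin's estimate of 10^11 objects, assuming either digits only (k = 10) or case-insensitive letters plus digits (k = 36). It ignores any characters a real scheme devotes to internal structure, so it gives a lower bound, not a design.

```python
def min_length(count: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size ** n >= count, using exact integers."""
    if alphabet_size < 2:
        raise ValueError("alphabet needs at least two symbols")
    n, capacity = 0, 1
    while capacity < count:
        capacity *= alphabet_size  # each extra character multiplies capacity by k
        n += 1
    return n


OBJECTS = 10 ** 11  # Paskin's estimate of objects that will ever need identifiers

for label, k in (("digits only", 10), ("letters + digits", 36)):
    print(f"{label}: {min_length(OBJECTS, k)} characters")  # 11 and 8 respectively
```

Both results fall well within the PII's seventeen characters, which supports the point that a 128-character ceiling is not needed for capacity alone.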

Conclusion

The DOI System is still very much a work-in-progress and many of the issues that concern us may well be solved over the next few months. We are greatly encouraged by the willingness that the Foundation has shown in modifying its system design in response to comments and criticisms from both inside and outside parties. For example, the Foundation sponsors an open online forum on the DOI [formerly http://www.doi.org/maillist-info.html]. Anyone may participate, and Norman Paskin, the forum coordinator and Director of the IDF, reads all comments and suggestions.

Certainly an identifier system like the DOI is needed for the continued viability of the Internet, and if the DOI did not exist then one of the other identification schemes[34], [35] would be the subject of this paper. However, it is not a perfect system, especially from the point of view of the scholarly community. It does not provide a stable, reliable, affordable, and standardized system that everyone can use. It does not offer rights holders an efficient mechanism for fair return on intellectual effort while providing adequately for fair use. While the DOI System will help to satisfy many of the needs of the major publishers, both in supporting commercial interactions and in protecting intellectual property, those publishers produce only a fraction of the total material available on the Internet. Besides, their materials were already under good control and relatively stable before the DOI was introduced.

However, if the majority of those publishing on the Internet are not able to participate directly in the DOI System, either because of its expense and complex upkeep requirements or because they cannot meet the Foundation's stringent standards for participation, then it will be up to them to develop a comparable system on their own. A clear demonstration by the Foundation that the handle-based DOI System is adequate to systematize Internet materials for at least major publishers might give other groups the incentive they need to collaboratively develop an operational but more affordable parallel system.

The DOI provides a creative opportunity for publishers to introduce systems that protect them from serious financial losses accruing from the misuse of digital materials, while allowing the continuation of socially beneficial institutions and policies like fair-use and inter-library loan. However, if end users find that the DOI begins to function too frequently as a hindrance in their attempts to access scholarly information, they might well be driven to less formal literature resources, such as the proliferating preprint archives.

So far, such alternative resources are best developed in the sciences, for example, the Los Alamos physics preprint archive, the various mathematics preprint archives [formerly http://euclid.math.fsu.edu/Science/Preprints.html], and the human geneticists' and molecular biologists' preprint collection. They could certainly expand into other fields as well if traditional access routes become infeasible. Similarly, should universities continue to find that they can neither afford to subscribe to journals nor gain access through other channels to the scholarly output of their faculties, it is likely they, too, will need to explore alternatives. Indeed, that is beginning to happen, as evidenced by the recent controversy over whether academic authors should retain copyright over their works.[36] The Association of Research Libraries' SPARC project is another example of an attempt by members of the scholarly community to re-establish control over their intellectual products.[37]

It is obvious that it would be best for publishers to collaborate and cooperate with their clients, such as libraries, to establish DOI System policies and applications that adequately support both distribution and access.

Bill Rosenblatt, in his article on the DOI, illustrated the relationship publishers find themselves in with respect to the DOI by quoting a statement made by Benjamin Franklin at the signing of the Declaration of Independence. We will end with the same quote, but suggest that it might better apply not just to relationships between publishers, but also to those between publishers and their close partners in the organization and distribution of scholarly information, libraries: "We must all hang together, or assuredly we shall all hang separately."



Lloyd Davidson is the life-sciences librarian and bibliographer and head of access services at Northwestern University's Seeley G. Mudd Library for Science and Engineering. He just finished a year as a fellow in the Alice Berline Kaplan Humanities Center at Northwestern, during which he studied the ways in which faculty from all disciplines use electronic communication, particularly e-mail and the Internet. He is also an adjunct associate professor at the School of Library and Information Science at Dominican University in River Forest, IL, where he teaches science-reference sources. He is interested in DOIs from the user's perspective, based on his work with the Library and Information Technology Association's electronic-publishing/electronic-journals interest group, which he founded in 1996. For that group he organized two panels on DOIs in 1998, at the American Library Association's midwinter and annual meetings. He is currently organizing a LITA workshop on identifiers for the 1999 ALA annual meeting in New Orleans. He holds a B.A. and M.A. in Paleontology and a Ph.D. in Cell Biology from the University of California in Berkeley, as well as an M.L.I.S. from Indiana University at Bloomington. You may contact him by e-mail at Ldavids@nwu.edu.

Kimberly Douglas is the director of the Sherman Fairchild Library for Engineering and Applied Science and head of the library system's technical-information services at the California Institute of Technology. She received her MS in library science from Long Island University in 1978. Since that time, she has held positions of increasing responsibility in scientific-research libraries, first with the Bigelow Laboratory of Ocean Sciences in Boothbay Harbor, Maine, and then at the University of Southern California. At USC she was director of the Hancock Library of Biology and Oceanography from 1982-1985 and then head of the Science and Engineering Library from 1985-1988. She began at Caltech in 1988 as head of reader services and has been involved in developing and implementing automated systems since that time. She is currently a member of the NISO Standards Committee that is developing an American National Standard Syntax for Digital Object Identifiers. You may contact her by e-mail at kdouglas@caltech.edu.

References:

    1. Lawrence, Steve and C. Lee Giles, "Searching the World Wide Web," Science vol. 280 (3 April 1998) pp. 98-100. [doi: 10.1126/science.280.5360.98]

    2. Ford, Charlotte E. and Stephen P. Harter, "The Downside of Scholarly Electronic Publishing: Problems in Accessing Electronic Journals through Online Directories and Catalogs," C&RL vol. 59, no. 4 (July 1998) pp. 335-346. http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ572238&ERICExtSearch_SearchType_0=no&accno=EJ572238 (Abstract only)

    3. Pogrebin, Robin, "For $19.95, Slate Sees Who Its Friends Are," New York Times, Monday, 30 March 1998, C1 (col. 2), C7 (col. 1).

    4. Software Piracy Information Site. [formerly http://www.nopiracy.com/]

    5. Kipi Tape Disc Business (music piracy site). See, e.g., RIAA Files Civil Action. [formerly http://www.kipinet.com/tdb/tdb_aug97/dept3.htm]

    6. DOI Discussion Paper, version 3 (17 August 1998), p. 9. [doi: 10.1000/92]

    7. Rosenblatt, Bill, "The Digital Object Identifier: Solving the Dilemma of Copyright Protection Online," The Journal of Electronic Publishing vol. 3, no. 2 (December 1997). [doi: 10.3998/3336451.0003.204]

    8. Bide, Mark, "In Search of the Unicorn: The Digital Object Identifier from a User Perspective," British National Bibliography Research Fund Report, Book Industry Communication, London, November 1997. [formerly http://www.bic.org.uk/bic/unicorn2.pdf]

    9. DOI Foundation, "Introduction to the Digital Object Identifier." http://www.doi.org/about_the_doi.html

    10. Powell, Andy, "Resolving DOI Based URNs Using Squid: An Experimental System at UKOLN," D-Lib Magazine (June 1998). http://www.dlib.org/dlib/june98/06powell.html

    11. DOI Discussion Paper (17 August 1998), p. 11. [doi: 10.1000/92]

    12. Berinstein, Paula, "DOI: a new identifier for digital content," Searcher vol. 6, no. 1 (January 1998) pp. 72-78. [formerly http://www.infotoday.com/searcher/jan98/story4.htm]

    13. DOI Discussion Paper (17 August 1998), pp. 23, 24. [doi: 10.1000/92]

    14. DOI Discussion Paper (17 August 1998), p. 9. [doi: 10.1000/92]

    15. Bide, Mark, "In Search of the Unicorn: The Digital Object Identifier from a User Perspective," British National Bibliography Research Fund Report, Book Industry Communication, London, November 1997. [formerly http://www.bic.org.uk/bic/unicorn2.pdf]

    16. Announcement of DOI demonstration at 1998 Frankfurt Book Fair. [formerly http://www.ipa-uie.org/ipa_iic.html]

    17. Holtzman, David, "Domain Names: Will We Run Out." [formerly http://comsvc.on.ca/endlessdomains.html]

    18. Tuck, Bill, "Rights Management: Issues and Solutions," Information Online & On Disc 97: The Eighth Australasian Conference and Exhibition. http://www.csu.edu.au/special/online97/proceedings/onl201.htm

    19. DOI Board. [formerly http://www.doi.org/board-announce.html]

    20. Lynch, Clifford, "Identifiers and Their Role in Networked Information Applications," ARL: A Bimonthly Newsletter of Research Library Issues and Actions, no. 194 (October 1997). [formerly http://www.arl.org/newsltr/194/identifier.html]

    21. Green, Brian and Mark Bide, "Unique Identifiers: A brief introduction." [formerly http://www.bic.org.uk/bic/uniquid]

    22. DOI Discussion Paper 1(3) (17 August 1998), p. 12. [doi: 10.1000/92]

    23. Rust, Godfrey, "The Fire and the Rose: An Integrated model for Descriptive and Rights metadata," Data Definitions, 11 June 1998. [formerly http://www.bic.org.uk/rights.html]

    24. Rust, Godfrey, "Metadata: The Right Approach; An Integrated Model for Descriptive and Rights Metadata in E-commerce," D-Lib Magazine (July/August 1998). http://www.dlib.org/dlib/july98/rust/07rust.html

    25. DOI Discussion Paper 1(3) (17 August 1998), p. 20. [doi: 10.1000/92] See also: DOI Discussion Paper 1(3). http://www.doi.org/handbook_2000/policies.html

    26. DOI Discussion Paper (17 August 1998), p. 16. [doi: 10.1000/92]

    27. DOI Discussion Paper (17 August 1998). [doi: 10.1000/92]

    28. Berinstein, Paula, "DOI: a new identifier for digital content," Searcher vol. 6, no. 1 (January 1998) pp. 72-78. [formerly http://www.infotoday.com/searcher/jan98/story4.htm]

    29. "Digital Information Objects and the STM Publisher," reproduced from STM Annual Report, 1997. [formerly http://www.elsevier.nl/inca/homepage/about/diginfo/]

    30. Green, Brian and Mark Bide, "Unique Identifiers: A brief introduction." [formerly http://www.bic.org.uk/bic/uniquid]

    31. Powell, Andy, "Unique Identifiers in a Digital World," Ariadne, Issue 8 (9 April 1997). http://www.ariadne.ac.uk/issue8/unique-identifiers/

    32. DOI Discussion Paper (17 August 1998), p. 18. [doi: 10.1000/92]

    33. Paskin, Norman, "Information Identifiers," Learned Publishing vol. 10, no. 2 (April 1997) pp. 135-156. [doi: 10.1087/09531519750147139]

    34. Biblink list of Existing Identification Schemes. [formerly http://hosted.ukoln.ac.uk/biblink/wp2/d 2.1/doc0005.htm]

    35. PADI, Preserving Access to Digital Information, "Unique Identifiers for Digital Information." http://www.nla.gov.au/padi/

    36. Bachrach, Steven, R. Stephen Berry, Martin Blume, Thomas von Foerster, Alexander Fowler, Paul Ginsparg, Stephen Heller, Neil Kestner, Andrew Odlyzko, Ann Okerson, Ron Wigington and Anne Moffat, "Intellectual Property: Who Should Own Scientific Papers?" Science vol. 281 (4 September 1998) pp. 1459-1460. [doi: 10.1126/science.281.5382.1459]

    37. Rambler, Mark, "Do It Yourself? A New Solution to the Journals Crisis," Lingua Franca (December/January 1999) pp. 61-69.

    Links from this article:

    Slate Magazine. (http://slate.msn.com)

    Academic Press. (http://www.academicpress.com/)

    John Wiley & Sons. (http://www.wiley.com/)

    Springer-Verlag. (http://www.springer.de/)

    Elsevier Science. (http://www.elsevier.com/)

    Elsevier's GENE-Combis online journal. [formerly http://www.elsevier.nl/journals/genecombis/Menu.html]

    Association of American Publishers. (http://www.publishers.org/)

    Handle System Resolver®. [formerly http://www.doi.org/resolver.html]; also Handle System. (http://www.handle.net)

    Corporation for National Research Initiatives. (http://www.cnri.reston.va.us/)

    Description of the ISBN standard. (http://www.isbn.org/)

    Description of the ISSN standard. (http://www.issn.org/)

    Copy of the Serial Item and Contribution Identifier (SICI) Standard, ANSI/NISO Z39.56-1996 Version 2 (http://sunsite.Berkeley.EDU/SICI/)

    Wiley DOI Prototype Server and Finder. (http://doi.wiley.com)

    Academic Press DOI Finder.[formerly http://www.hbuk.co.uk/tony/doi/DOIFinder.cgi]

    International DOI Foundation Home Page. (http://www.doi.org)

    Wiley-Academic Press collaboration in cross-linking. [formerly http://www.apnet.com/www/doi/gallery.htm#Journals]

    Bide, Mark, "In Search of the Unicorn: The Digital Object Identifier from a User Perspective," British National Bibliography Research Fund Report, Book Industry Communication, London, November 1997. [formerly http://www.bic.org.uk/bic/unicorn2.pdf]

    ByLine System. [formerly http://www.universalbyline.com/]

    Authors' Licensing and Collecting Society of the UK. (http://www.alcs.co.uk/)

    FileOpen Systems. (http://www.fileopen.com/)

    DOI Foundation policy-discussion page (http://www.doi.org/policy.html)

    Los Alamos physics preprint archive. (http://xxx.lanl.gov/)

    List of mathematics preprint archives.[formerly http://euclid.math.fsu.edu/Science/Preprints.html]

    Human geneticists' and molecular biologists' preprint collection. (http://www.hum-molgen.de/)

    Scholarly Publishing and Academic Resources Coalition. (http://arl.cni.org/sparc/)

    Discuss-DOI forum sign-up page. [formerly http://www.doi.org/x-info.html]