Digital Libraries and the Need for a Universal Digital Publication Format

Hillesund, Terje; Noring, Jon E.

doi:https://doi.org/10.3998/3336451.0009.203

Terje Hillesund and Jon E. Noring

Journal of Electronic Publishing

Volume 9, Issue 2, Summer 2006

DOI: https://doi.org/10.3998/3336451.0009.203

Permissions: This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

For more information, read Michigan Publishing's access and usage policy.

This paper was refereed by the Journal of Electronic Publishing's peer reviewers.

ABSTRACT

Reports have revealed low use of e-books and other lengthy texts held in digital libraries. In this article we claim that one of the main reasons for the lack of interest is the current multitude of end-user text formats, some oriented towards print, others proprietary, and few optimized for sustained reading of text-intensive publications. We note IDPF's reluctance to develop a common digital publication format, discuss requirements for a universal, open-standard end-user format, and present the effort to establish such a format by the OpenReader Consortium. The main objective of the article is to examine the pros and cons of a universal, reader-oriented text format for different types of critical text editions and digital libraries.

Digital libraries need a universal digital publication format. Today, digital publishing suffers from a plethora of incompatible end-user text formats. Some formats were originally created to facilitate print publishing; others are proprietary, tied to specific operating systems, reading applications and reading devices. An open-standard, fully cross-platform, end-user format will alleviate the format chaos, and specialized open-source reading software will enhance the reading of text-based digital publications. Digital libraries, and their users, will greatly benefit from this standardization.

Our assertion is supported by the reported low usage of freely available e-books in digital libraries,[1] a situation paralleled by low commercial sales of e-books.[2] Some potential users (teachers, researchers, and students) express doubts about the authoritativeness and accuracy of free e-books produced by some projects, especially classical works transcribed from printed editions. Others complain about the lack of availability of complete ranges of titles. However, the slow diffusion of e-books is partly due to usability issues: most people find printed books more readable and easier to navigate than most e-books — a circumstance we believe is due to current format limitations — and so they print out the e-books so they can read them. For digital libraries, this situation represents a paradox: they expend a large amount of resources in digitizing texts, which in the end — and once again — are printed and read on paper.

In Books in the Digital Age, Thompson states that the modest use of digital books is explained by hardware limitations, a multitude of proprietary formats, inconvenient DRM systems and high prices.[3] To this list we would add unsatisfactory software design. To address the format and software issues, efforts are currently under way to establish an interoperable, open-standard, end-user digital-publication format. Among the initiatives are OpenReader[4] and OpenBerg.[5] OpenReader is the more ambitious of the two; in addition to developing a common digital publication format, the OpenReader Consortium is working with commercial and open source advocates to develop OpenReader user agents (reading software) for a very wide range of digital computing platforms.

In this article, we will: 1) define and describe "digital reading," 2) give a brief historical explanation of the format chaos, 3) place OpenReader in the historical context, and 4) discuss the potential use of the OpenReader format in digital libraries.

Digital reading

There are constant discussions on the future designs of digital libraries, where digital libraries is a broad term encompassing scholarly archives, text collections, cultural heritage, and educational resource sites.[6] It is important to point out the obvious fact that whatever the design — collaborative or user-oriented — much of the information in digital libraries is preserved for the purposes of access and reading. This has led researchers to call for stronger emphasis on the user's point of view in digital libraries, in addition to technological and organisational viewpoints, especially on interoperability,[7] and, in the case of reading applications, a call to comply with preferences of real readers.[8]

Reading is the aim and essence of all text production. Screen reading — or digital reading — has become immensely widespread and is an integral part of interactive communication and reception of all kinds of digital texts. Digital texts have generated a wide range of new reading practices, such as complementary reading of text in multimedia and Web presentations, e-mail, Web browsing and e-learning. Computer users have to develop multiple literacies,[9] but despite a wide variety of uses, today's display screens fall short when it comes to reading lengthy texts such as journal articles and books. Research heavily supports the general belief that most people still prefer to read lengthy texts on paper.[10]

Reading, or the process of understanding written linguistic messages, is a complicated cognitive task involving important parts of the brain. It takes years of training to become a fluent reader. Even small disturbances in typography, ergonomics, or word understanding can disrupt the reading process and bring the act of reading to a stop.[11] Today's commonly used digital equipment and software cannot compete with printed paper as a medium for sustained reading.

For libraries distributing digital texts, this situation represents a challenge. To analyze the challenge, we dichotomize the concept of reading, and study the uses of digital libraries in light of two types of reading, intentional and functional:[12]

Intentional reading, as defined here, is attentive reading of literature. It is primarily an activity we perform to be entertained or receive information. Usually this form of reading lasts for some time, as when we read newspapers, journal articles, or books. Intentional reading is a constrained, linear reading form heavily influenced by the printed-book culture.
Functional reading, by contrast, is what we engage in when we manipulate different types of content. For example, functional reading is what we do when we browse the Web, search text databases, or write. In these cases reading is a part of more complex tasks.[13] Most screen reading is functional; it is an integral part of the wider activities of searching, studying, and creating texts.

In practice, reading forms a continuum from pure intentional reading (e.g., the reading of novels) to pure functional reading (e.g., reading involved in the act of writing). Most reading activities related to digital libraries, such as study and research, can be placed somewhere between the extremes of this continuum.

When it comes to digital texts, the type of reading varies with the nature of the digital texts held by digital libraries. For example, the use of many critical editions is dominated by functional reading: students and researchers closely read texts in order to study word forms, grammatical constructions, literary styles, or philosophical arguments. On the other hand, intentional reading is more dominant with classical and scholarly literature. For digital libraries that hold a wide range of digital texts, both modes of reading are significant, as when students read lengthy texts and do closer studies of the same texts. Even if functional reading dominates in certain digital libraries, those libraries could enhance and increase readership with reading software optimized for intentional reading of texts.

Comments on Text Formats

Sustained or intentional reading of digital texts on screen displays is difficult in part because of hardware limitations and ergonomics: stationary computer screens require static reading positions, and poor type representation impedes reading and sometimes causes eyestrain. Many of these shortcomings are being overcome with better handheld devices, higher resolution screens, and the use of new screen technologies such as electronic paper. In the future, the hardware challenges to readability will be solved. However, in addition to display ergonomic issues, intentional reading is also dependent on properly designed reading applications and formats. Most current formats and reading software are rooted in print technology; they are usually not optimized for intentional screen reading on a wide range of display types and sizes.

To better explain the digital transformation of text, we introduce the concept of a text cycle. A text cycle consists of several interrelated phases, such as writing, storing, producing, distributing, and reading. After centuries dominated by written and printed text cycles, in which all phases were based on ink and paper, digitization has established a new set of cycles: digital text cycles.[14] Nevertheless, the digitization of the different phases of the text cycle has been a gradual process.

In the 1960s, digital text encoding standards such as ASCII were developed. By the 1970s, keyboards and computer screens became the interface between man and computer. Beginning in the 1980s, powerful word processors and desktop publishing applications were developed. The writing and production phases of the text cycle were thus digitized, but the applications were primarily designed to facilitate print production. The accompanying file formats mixed text content with relatively inflexible, fixed-page typographical presentation information oriented towards printing. Microsoft's DOC and Adobe's PDF formats are well-known examples. It wasn't until the early 1990s, when HTML was developed for use on the World Wide Web, that we saw a fundamental change in the distribution phase. HTML is a mix of content and presentation markup, oriented, for the first time, towards reflowable rather than fixed-page presentation. Nevertheless, the focus of HTML was primarily towards functional reading.

At the time, most electronic publishing was a digitization of the print process; it ignored possibilities that would meet the special needs of digital reading. Several text formats from this period are common in digital libraries: ASCII, DOC, PDF, and HTML. None of these formats was designed specifically for intentional digital reading. All of these formats (with the exception of HTML) are oriented towards fixed-page presentation such as print. The use of these formats (particularly PDF and DOC) is one of the reasons the reading phase of the text cycle is still dominated by paper.

In the late 1990s, XML was developed partly to address various shortcomings of the underlying structure of HTML, with the intent of conforming HTML to what is now called XHTML. But the use of XML goes well beyond its application to normalize HTML markup. It is used today in all kinds of text and database applications with a broad range of markup vocabularies. Today XML is widely employed in many commercial workflows to produce various types of digital documents and publications.

XML is also the underpinning of the open standard Open eBook Publication Structure (OEBPS), a universal e-book production and exchange format, first developed in 1999. OEBPS is currently maintained by the International Digital Publishing Forum (IDPF). The advantage of OEBPS (now at version 1.2)[15] is that it intelligently utilizes XML to create e-books that are flexible in presentation and better adapted to intentional digital reading. Because it is reflowable (like HTML), OEBPS allows the publication to be "typeset" on the end-user side, thereby optimally adapting the publication to end-user hardware and typographic preferences such as font size. The page-based formats described previously do much of the typesetting on the publisher end, thus reducing or even eliminating the ability to optimally adapt to end-user hardware and personal preferences.[16]
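The contrast between reflowable and fixed-page presentation can be illustrated with a minimal Python sketch. It does not render OEBPS; it only breaks plain text into lines with the standard `textwrap` module, and the sample sentence and the two display widths are assumptions chosen for illustration.

```python
import textwrap

# Sample content; a reflowable format stores the text, not the line breaks.
TEXT = (
    "Reading is the aim and essence of all text production, "
    "and a reflowable format lets each device break lines for itself."
)


def typeset(text: str, chars_per_line: int) -> list[str]:
    """Break the text into lines that fit a display of the given width."""
    return textwrap.wrap(text, width=chars_per_line)


# The same content "typeset" on the end-user side for two hypothetical displays.
handheld = typeset(TEXT, 24)
desktop = typeset(TEXT, 60)

for line in handheld:
    print(line)
```

A page-based format would in effect ship only one of these two line layouts, fixed at the publisher end.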

Through its work, IDPF has addressed several of the challenges concerning the reading phase of the digital text cycle. The OEBPS format is the basis of several dedicated reading applications, such as Cybook, Rocket eBook, Electronic Book Technologies (ETI), and Microsoft Reader. These applications employ fairly high-quality screen typography, and are primarily targeted for intentional reading.

OEBPS is built on tested and stable open standards, incorporating XHTML, CSS, Dublin Core and Unicode, among other open standards. To be more specific, an OEBPS Publication consists of at least two types of files: one or more OEBPS Document files and an OEBPS Package file. The package may also include CSS-based style sheets and PNG/JPEG images. OEBPS Documents are XML documents containing the publication's textual content; the basic markup vocabulary is based on XHTML, but may be extended. The OEBPS Package file describes the organization of the publication, and includes publication metadata, a manifest of the resources making up the publication (documents, images, style sheets, etc.), navigational information, and a description of the order (the "spine") in which the documents comprising the main text should be rendered in reading systems.
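As a rough illustration of how a reading system might use the package file, the Python sketch below parses a simplified package and derives the reading order from the manifest and the spine. The sample XML is an assumption: it omits namespaces and most metadata and is not a schema-exact OEBPS package, and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative package file (not schema-exact OEBPS).
PACKAGE_XML = """
<package>
  <metadata>
    <title>Sample Publication</title>
    <language>en</language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="ch2" href="chapter2.html" media-type="text/x-oeb1-document"/>
    <item id="style" href="book.css" media-type="text/x-oeb1-css"/>
    <item id="cover" href="cover.png" media-type="image/png"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(package_xml: str) -> list[str]:
    """Return the document files in the order given by the spine."""
    root = ET.fromstring(package_xml.strip())
    # The manifest maps each resource id to its file name.
    manifest = {item.get("id"): item.get("href") for item in root.iter("item")}
    # The spine lists ids of the main-text documents in rendering order.
    return [manifest[ref.get("idref")] for ref in root.iter("itemref")]


print(reading_order(PACKAGE_XML))
```

The style sheet and the cover image appear in the manifest but not in the spine, so they are available as resources without being part of the linear reading order.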

However, an OEBPS Publication by itself cannot be easily distributed since it is not a single file; for distribution and end-user use it must be wrapped or encapsulated into a single file. When OEBPS 1.0 was released in 1999, such a "wrapper" format was also proposed.[17] Unfortunately, there was little interest in elevating this proposal to a recommendation, probably because it was seen by a few of the major players behind OEBPS as being competitive with proprietary end-user e-book formats being developed at the time. Since then, OEBPS has been officially described as an e-book "exchange" format, even though it may be used natively, when encapsulated, as an end-user e-book format.
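The idea of encapsulation can be sketched in a few lines of Python. The example assumes a plain ZIP archive as the wrapper, which is only one possible choice and not the wrapper proposed in 1999; the file names and contents are hypothetical placeholders.

```python
import io
import zipfile


def wrap_publication(files: dict[str, bytes]) -> bytes:
    """Pack all files of a publication into one compressed archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical publication: a package file, one document, one style sheet.
publication = {
    "package.opf": b"<package/>",
    "chapter1.html": b"<html><body><p>Chapter one.</p></body></html>",
    "book.css": b"p { text-indent: 1em; }",
}

single_file = wrap_publication(publication)

# A reading system would open the single file and read its members back.
with zipfile.ZipFile(io.BytesIO(single_file)) as archive:
    names = archive.namelist()
    first_chapter = archive.read("chapter1.html")

print(names)
```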

Today, OEBPS is used in the digital publication industry for various purposes. The most well-known use of OEBPS is in the Microsoft Reader. The proprietary Microsoft Reader LIT format is a slightly modified and encapsulated OEBPS 1.0.1 Publication. Many regard Microsoft Reader as having the best typographical richness and styling flexibility of all e-book readers (although others decry the lack of end-user options to change typography other than font size, even though Microsoft Reader could allow greater end-user typographic control if it wanted). On the downside, and certainly of importance to digital libraries, is that the LIT format is proprietary, and at this time Microsoft Reader runs only on Microsoft operating systems (e.g., XP and Mobile). Microsoft distributes a Reader SDK, which allows anyone to build an OEBPS to LIT converter application. OverDrive's ReaderWorks is the most popular commercial application to produce LIT from OEBPS Publications and from HTML sources.

Since 1999 when the first version of OEBPS was released, other major technology companies, notably Adobe, Palm and Gemstar (which divested itself of its e-book division two years ago) decided to push their own proprietary solutions, and relegated OEBPS to a supporting role at best. For readers, retailers, publishers, and libraries, these decisions have been disastrous.

Today, as a result, there is a jungle of competing e-book formats. In 2006, one of the leading American online e-book retailers, Fictionwise, offers e-books in ten unsecured and five "secure" formats, all of which are proprietary formats connected to different reading applications, reading devices, and digital-rights-management systems.[18]

For digital libraries, this situation is correspondingly complicated. When presenting e-books or other lengthy texts, the libraries have to choose among a maze of proprietary reading formats (plus the open HTML), most of which are not optimized for intentional reading. Often they try to provide multiple formats to cover end-user needs. In the Etext Center at University of Virginia Library, e-books are distributed in three formats: HTML, LIT, and PDB (Palm).[19] An e-book report from the Oxford Text Archive is distributed in eight formats: HTML, PDF, XML (using TEI), RTF, TXT, PRC (Mobipocket), PDB, and LIT.[20] Needless to say, many regret IDPF's reluctance to promote a universal, high-quality, end-user format based on OEBPS.

Universal digital publication format requirements

In the area of digital books, we can safely assume that most readers want a simple system that allows them to purchase or borrow e-books from any retailer, publisher, or library they like and then be able to read the books on whatever device, operating system, or reading software they choose. To accomplish this, the publishing industry needs a universal digital publication format that meets the important needs of both publishers and end users, and thus, by extension, digital libraries.

Lee, Guttenberg, and McCrary present requirements for a universal end-user standard:[21]

"Interoperability: The eBook industry (...) should be able to exchange eBooks independent of software and hardware."
"Extensibility: An eBook standard should be able to be extended to include new functionalities such as multimedia and user interaction"
"Applicability: An eBook format should be easily applicable to various kinds of related fields such as database system and wireless Internet."
"Openness: An eBook standard should be independent of a particular vendor. That is, it must be an open standard that is freely accessible."

Noring adds three requirements based on well-recognized publisher and end-user requirements:[22]

Typographical Richness: "The format must have adequate internal structural resolution and presentation richness to allow very high typographic quality presentation".
Adaptability: "The format must allow end-users some latitude of control over the presentation parameters for personal needs and reading preferences, such as font size and other typographic settings. (...) A corollary of this requirement is that the format must be fully reflowable (...) in response to differing presentation hardware and end-user settings."
International: "The format must be capable of representing any language and glyph set in use today. The format is not universal unless it is truly international."

Publishers and end users are the critical stakeholders in the digital publishing ecology, and unless their needs are met, a format cannot be universally accepted.

While the authors do not mention digital libraries specifically, the listed requirements also apply to reading formats applicable to digital libraries, especially in libraries distributing lengthy digital texts and e-books. Both articles point to XML, and conclude that a distribution format encapsulating an OEBPS Publication (or similar type of XML-based document framework) will meet the requirements, provided the distribution format itself is an open standard.

One such open format is currently in development: the OpenReader format.

The OpenReader Format

The vision of the OpenReader Consortium is to establish a universal, open-standard digital-publication format, and to encourage the development of multiple OpenReader user agents (reading software) for both intentional and functional reading of lengthy texts such as e-books, periodicals and newspapers, journal articles, business documents, and other types of digital publications. The consortium's universal end-user format will be a single, portable, compressed archive file containing one or more publications that adhere to the OpenReader Publication Framework. The framework is essentially an improved version of OEBPS; it will support Web mode in addition to the standard OEBPS mode, thereby allowing it to be used for highly non-linear publications such as Web sites.

The OpenReader Consortium also plans to establish an open-source application-code base to allow others to build OpenReader user agents, both open source and commercial, for a wide range of hardware and operating systems. A "reference implementation" user agent, developed by the OpenReader Consortium itself, is under consideration.

An important goal for the OpenReader System (format + user agent) is that it should be capable of high quality typographical presentation, which is possible considering its support for the full CSS 2.1 specification and a rich, extensible mark-up vocabulary. (In the future, OpenReader plans to support TEI and other well-established publication-oriented markup vocabularies.) OpenReader will encourage user agents to present publications in a page-by-page view rather than the typical scrolling fashion of Web browsers, although a scrolling view can be an end-user option. OpenReader will also encourage giving the end user substantial choice in changing typographic parameters from the default publisher-supplied CSS settings. This would give readers control over issues such as font size and type, margins, leading, and character spacing.
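How reader preferences might take precedence over publisher defaults can be sketched as follows. This is a simple dictionary merge in Python, not the CSS cascade as specified, and not a description of any actual OpenReader user agent; the property values and function names are assumptions for illustration.

```python
# Hypothetical publisher-supplied defaults for the main text.
PUBLISHER_DEFAULTS = {
    "font-family": "serif",
    "font-size": "1em",
    "line-height": "1.2",
    "margin": "5%",
}


def effective_style(defaults: dict[str, str], reader_prefs: dict[str, str]) -> dict[str, str]:
    """Reader preferences win wherever they are set; defaults fill the rest."""
    return {**defaults, **reader_prefs}


def to_css(selector: str, declarations: dict[str, str]) -> str:
    """Serialize the declarations as a single CSS rule."""
    body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
    return f"{selector} {{ {body} }}"


# A reader who wants larger type and more leading.
style = effective_style(PUBLISHER_DEFAULTS, {"font-size": "1.4em", "line-height": "1.5"})
print(to_css("body", style))
```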

The OpenReader System will be designed to facilitate intentional reading. However, user agents will be able to include features associated with functional reading such as annotation, bookmarking, highlighting, copy-and-paste, searching, dictionary access, and network integration, and will be encouraged to do so. Furthermore, the OpenReader framework will facilitate powerful inter- and intra-publication deep linking into and between OpenReader Publications, including worldwide across networks, through its associated URI/IRI addressing scheme. OpenReader, following the OEBPS paradigm, supports non-linear content, and will make it easy for user agents to present non-linear content (notes, sidebars, and other "amplificatory" content) in ways not possible in print, such as display in separate popup windows, known in OpenReader terminology as "booklets."

The strength of the OpenReader vision is its openness, its reliance on repurposeable text-based standards (e.g., XML and CSS), and its commitment to encourage multiple, and even competing, user agents. This is important to digital libraries, not only for short-term access to content, but also for long-term preservation to survive greatly evolving technology.

One of the open-source reading applications now under development that may be used to render OpenReader Publications is OpenBerg. OpenBerg is an open-source project building a reader capable of rendering OEBPS Publications. Any user agent capable of rendering OEBPS should readily adapt to the OpenReader framework, since the two are quite similar. OpenBerg uses the Mozilla codebase, and its first goal is to design an add-in application that will enable users to read OEBPS Publications in Firefox. OpenBerg will, in the short term, use ZIP as a wrapper format, but will probably switch to the OpenReader wrapper format once it is finalized, provided that it is consistent with that currently under development by IDPF.

On the commercial side, OSoft is planning to upgrade its reading system to support OpenReader as its primary format.[23] The OSoft system, which is primarily used for functional reading, will be renamed "dotReader" and will incorporate reading modes more conducive to intentional reading. The dotReader is quite powerful in its feature set of annotating, bookmarking, searching, and highlighting of digital publications, and is now used for a wide range of computer, high technology and educational materials. The tentative planned release for the OpenReader-compliant dotReader is August 2006. The codebase will be open source.

Like any new initiative that has not yet been implemented on a wide scale, OpenReader faces uncertainty. The consortium is working hard to secure wide support from publishers large and small, librarians and archivists, document conversion houses, high technology companies, academia, and other stakeholders in the e-book and digital publication industry. Despite the uncertainty, OpenReader is a serious contender to resolve many of the problems that digital libraries currently face with e-books and other digital publications.

Potential Use of OpenReader in Digital Libraries

From the point of view of users, the idea of a universal reading format is undoubtedly appealing.[24] Even though there is no guarantee that OpenReader will succeed in its goals, it is worthwhile to explore, using several specific examples, both how and where OpenReader might benefit digital libraries.

Critical Text Editions

The Norwegian Newspaper Corpus contains the editorial content of six major online newspapers (currently 350 million words). They are gathered, automatically tagged in XML, and stored in an ever-growing database.[25] The occurrences of new words are registered and linked to their source Web pages. The database is primarily accessed by lexicographers and linguists in statistical research, who use Web browsers to study, for example, collocations and the influence of foreign words on the Norwegian language. Such statistical uses of the text corpus are at the far functional end of the reading continuum, essentially being database-oriented. For them, OpenReader is irrelevant.

The Medieval Nordic Text Archive (MENOTA) in Bergen, Norway[26] is a project to encode, using TEI, every single word of old Nordic manuscripts, and do so in three distinct versions: text facsimiles, diplomatic transcriptions, and normalized texts. Web browsers will be used to interact with the collection, and for online screen presentation the project has developed several new digital characters for which they seek Unicode recognition. Users will include linguists (who study word formations, grammatical construction, and the evolution of language) as well as historians, archaeologists, and literati familiar with Old Norse. The facsimile and diplomatic versions are definitely not for intentional reading. Users of the normalized versions could perhaps benefit from dedicated readers, but considering the overall purpose of the MENOTA corpus, OpenReader is of minor relevance, although a few users may find OpenReader versions of the normalized text desirable.

Another type of text edition is based on images: digitized photographs and scans of manuscripts, letters, official documents, newspapers, and books. The source documents may be handwritten, typed, or printed. For example, the U.S. Library of Congress's American Memory collection provides many examples of photographed manuscripts including the Abraham Lincoln papers.[27] Many libraries collect digitized photographic reproductions of medieval and ancient manuscripts (for example, the Bodleian Library, University of Oxford).[28] Such collections are obviously not for intentional reading; historians, literary scholars, linguists, and medievalists use them for research. Usually Web browsers are used to directly access the document images, but in some cases the images are encapsulated in Adobe PDF or DjVu. OpenReader confers no special advantage in the case of image collections.

However, whenever text in critical editions is captured in digital text form and properly marked up to reflect document structure, OpenReader will be quite relevant. For example, a group of French researchers is currently publishing digital text versions of the 19th-century Lyon newspaper L'Écho de la Fabrique.[29] The weekly newspaper was issued for a few years in an important revolutionary period starting in 1832, and each issue of the paper is now being republished in a critical digital text version. The digital text of L'Écho de la Fabrique can be read online, and the digitized images of the original newspaper pages are encapsulated in DjVu. Each digitized text edition of the newspaper could be converted to the OpenReader format, with hypertext links in the appropriate places to the original page scans. Doing so would allow both intentional and functional reading.

The Wittgenstein Archives at the University of Bergen is a long-term project to digitize more than 20,000 pages of unpublished manuscripts[30] of the Austrian/British philosopher Ludwig Wittgenstein, who published only one tiny book in his lifetime ("Tractatus Logico-Philosophicus"). This "Nachlass" has been encoded in its entirety using the MECS markup vocabulary. Oxford University Press has published the whole corpus on CD-ROM.[31] The University of Bergen has produced the "Bergen Electronic Edition" that allows subscribers to study Wittgenstein's writings in three versions: photographic facsimile reproductions of the manuscripts, diplomatic transcriptions (http://www.oup.co.uk/academic/humanities/philosophy/wittgenstein/details/), and normalized text versions. When working with a document in one version, the user can readily access the other versions. The manuscripts are thoroughly indexed and the whole corpus is fully searchable. Such an extensive use of a text corpus in its various forms is outside the scope and purpose of OpenReader. However, the Wittgenstein Archives is discussing future uses of the corpus. In the past, several edited printed editions have been produced from the Nachlass, presenting the fragmented writings of Wittgenstein in a fashion more suited for intentional reading. OpenReader versions of these editions would certainly be useful for intentional reading and even some types of functional reading.

Collections of classic literature

A number of institutions have digitized the complete writings of famous authors, such as Henrik Ibsen,[32] an influential Norwegian playwright. In the Henrik Ibsen's Writings project, the first editions of Ibsen's plays and poems, his letters, a large collection of professional commentaries, and other writings are all marked up in digital text form using TEI. This critical text edition will be published both in digital form and in print. The printed book edition will comprise a total of 30 volumes: 15 volumes of text, and 15 volumes of commentaries. The digital format of this collection has yet to be determined. OpenReader versions would be intriguing since the OpenReader format will be able to properly represent, and user agents display, the complex document structures used in many of Ibsen's writings.

The Ibsen Project wants its digital collection to be useful for both research and intentional reading. Scholars and students must be able to compare works and to follow themes and ideas throughout the entire Ibsen corpus. Since the OpenReader format is being specifically designed to enable powerful inter- and intra-publication linking and related functionality, it is suitable for this purpose. OSoft's dotReader is a good candidate reading application, as it already provides powerful linking, annotation and text searching tools for an entire library of digital books, and it will soon support OpenReader as its primary format.

For many other collections of writers' works, such as the Oxford Text Archive and the Electronic Text Center at the University of Virginia Library, OpenReader is likewise a viable and intriguing format. These libraries hold freely accessible collections of classic literature in various e-book formats. The collections are for teachers, students, scholars, and the general public. The texts are mastered in XML (TEI), and converted into usable e-book formats using various tools (including XSLT). Since these texts are meant for intentional reading and certain forms of functional reading associated with study and learning, OpenReader is an intriguing and advantageous e-book format.
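The kind of conversion involved can be sketched in Python. The example below is a stand-in for an XSLT transformation, written with the standard `xml.etree.ElementTree` module rather than XSLT itself; the TEI-like fragment is heavily simplified (no namespaces, no header), and the mapping from TEI elements to XHTML elements is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

# Heavily simplified TEI-like source; real TEI documents are far richer.
TEI_FRAGMENT = "<text><body><div><head>Act I</head><p>A room.</p></div></body></text>"

# Illustrative mapping from source elements to XHTML elements.
TAG_MAP = {"div": "div", "head": "h2", "p": "p"}


def convert(element: ET.Element) -> ET.Element:
    """Copy an element and its children, renaming tags via TAG_MAP."""
    out = ET.Element(TAG_MAP.get(element.tag, "div"))
    out.text = element.text
    out.tail = element.tail
    for child in element:
        out.append(convert(child))
    return out


def tei_to_xhtml(tei_source: str) -> str:
    """Build a minimal XHTML document from the body of the source text."""
    tei_body = ET.fromstring(tei_source).find("body")
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for child in tei_body:
        body.append(convert(child))
    return ET.tostring(html, encoding="unicode")


print(tei_to_xhtml(TEI_FRAGMENT))
```

Because the master text carries structure rather than presentation, the same source could feed several output formats by swapping the mapping.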

Scholarly and scientific archives

Scholarly and scientific information is published in digital form by online (Web-formatted) journals and in electronic document versions of printed articles and books. The information is available in both subject-specific and more general institutional archives.[33] BioMed Central[34] is an example of the first, and the California Digital Library an example of the second.[35] Today, HTML and PDF are the predominant text-based formats used in these archives. PDF is excellent for distribution of content intended primarily for print (it exactly reproduces the printed page), but is not ideal for screen reading. HTML is more flexible in presentation, but lacks some of the important features necessary for intentional reading.

Scholarly articles are typically read both intentionally and functionally. For research purposes, what is needed in addition to readability is the ability to do robust searching, add notes, insert bookmarks, highlight text, and, when allowed, copy portions of the text for pasting into other documents. The OpenReader format is suitable for all of these purposes, provided, of course, that the user agent has such functionality built in. As noted previously, OSoft's dotReader, which will soon support OpenReader, has already implemented these functional reading features. The openness of the OpenReader format is also a considerable advantage for scholarly access.

Conclusion

The minimal use of e-books in digital libraries has been disappointing. We argue that one of the main reasons is that the current e-book formats, many of which have their roots in traditional print publishing, are unsuitable for the flexible, high-quality screen presentation needed for digital library use, particularly for intentional reading. In addition, the very existence of multiple and mutually incompatible formats, many of which are specific to particular hardware and OS platforms, is itself an inhibitor. Finally, none of the existing formats meets all the well-recognized requirements (especially the needs of publishers and end-users) that a format must meet to become universal.

Research indicates that the open-standard, XML-based Open eBook Publication Structure framework (OEBPS), developed in 1999, is an excellent starting point for developing such a universal, open-standard e-book and digital publication format.

The OpenReader Consortium is currently developing a universal, end-user digital publication format based on the OEBPS framework model. The OpenReader framework employs an updated and improved version of OEBPS. In addition, because of features in the original OEBPS framework plus the OpenReader improvements, the OpenReader format will enable powerful features and functions very useful to both intentional and functional reading in the digital library environment, including for certain research purposes. At least two organizations, one commercial (OSoft) and the other open source (OpenBerg), are now building OpenReader-capable user agents, with more likely to follow.

Thus, for digital libraries, a standardized end-user format with the necessary functionality, such as OpenReader, will be good news: it will ease the burden of providing multiple, non-optimum formats and will offer useful features that enhance the use of the texts.

The suitability of OpenReader or a similar format will differ among digital libraries. Some digital libraries are simply text databases used primarily for statistical analysis, while others just collect manuscript images. Neither of these types of digital libraries has a need for OpenReader or a similar format.

However, where the collection is already in a digital text form, and access is through recognized publications such as a collection of literary works, OpenReader is intriguing, not only for high-quality and flexible electronic presentation which allows intentional reading on a wide range of platforms, but also to enable powerful functional reading features such as annotating, bookmarking, highlighting, searching, and inter-linking between works.

Acknowledgment

We want to thank participants in the AURORA program "Digital Publishing and Reading," especially Claire Bélisle, ingénieure de recherche CNRS, who suggested using the concepts of "intentional" and "functional reading," and Alois Pichler, leader of The Wittgenstein Archive, for fruitful discussions and comments on earlier drafts of the article.

Jon E. Noring is a long-time e-book technologist, advocate, standards developer, and publisher. He is a cofounder and the interim executive director of the OpenReader Consortium. In the last few years he has cofounded, advised, and served as a director or trustee for several companies and non-profit organizations in the digital-content arena. Since 1999 Dr. Noring has been an invited expert in the development, authoring, and maintenance of all versions of the Open eBook Publication Structure (OEBPS), and has served in a leadership capacity, including acting vice chair, in the associated working group. Dr. Noring holds a Ph.D. (1981, University of Minnesota) in mechanical engineering with supporting work in chemical engineering and computer modeling. He was a staff scientist or engineer for three U.S. Department of Energy National Labs, primarily researching alternate energy technologies such as solar, biomass, and hydrogen.

Terje Hillesund is an Associate Professor at the University of Stavanger, Norway. He has a master's degree in Sociology from the University of Oslo and a Ph.D. in Media and Communication Theory from the University of Bergen. His main research interests have been in printed media (newspapers, journals, and books) and their transformation in the age of computers and the Internet. From 2000 to 2002 he led a project for Arts Council Norway in which a research team studied the potential impact of e-books on the Norwegian book industry. The results were presented in three reports in 2002, and since then Dr. Hillesund has published several scholarly articles on subjects such as e-books, digital reading, XML, digital text editions, and open access.

NOTES

1. Berglund, Y., Morrison A., Wilson R. and M. Wynne. (2004) "An Investigation into Free eBooks." AHDS literature, languages and linguistics/JISC/The Oxford Text Archive. http://www.ahds.ac.uk/litlangling/ebooks/
2. International Digital Publishing Forum. http://www.idpf.org/
3. Thompson, J. B. (2005) Books in the Digital Age: The Transformation of Academic and Higher Education Publishing in Britain and the United States. Cambridge: Polity Press.
4. OpenReader Consortium. http://www.openreader.org/
5. SourceForge project: OpenBerg. http://sourceforge.net/projects/openberg/
6. Coleman, A. and T. Sumner. (2004) "Digital Libraries and User Needs: Negotiating the Future." Journal of Digital Information, Volume 5, Issue 3. http://jodi.tamu.edu/Articles/v05/i03/editorial/
7. Arms, W.Y. (2005) "A Viewpoint Analysis of the Digital Library." D-Lib Magazine, Volume 11, Issue 7/8. http://www.dlib.org/dlib/july05/arms/07arms.html
8. Malama, C., Landoni M. and R. Wilson. (2005) "What Readers Want: A Study of E-fiction Usability." D-Lib Magazine. http://www.dlib.org/dlib/may05/wilson/05wilson.html
9. Kellner, D.M. (2002) "Technological Revolution, Multiple Literacies, and the Restructuring of Education" in Silicon Literacies: Communication, Innovation and Education in the Electronic Age. Florence, KY, USA: Routledge.
10. Marshall, C.C. and C. Ruotolo. (2002) "Reading in the Small: A Study of Reading on Small Form Factor Devices." Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries. Waycott, J. and A. Kukulska-Hulme. (2003) "Students' Experiences with PDAs for Reading Course Materials." Personal and Ubiquitous Computing 7 (1). Berglund, Y., Morrison A., Wilson R. and M. Wynne (2004).
11. Hill, B. (2001) The Magic of Reading. Redmond, WA: Microsoft Corporation. http://www.microsoft.com/reader/includes/TheMagicofReading.lit (to view this text requires Microsoft Reader, which is available at http://www.microsoft.com/reader/downloads/pc.asp)
12. Concepts introduced by Claire Bélisle at the Bergen Workshop 26 - 27 May of the AURORA program "Digital publishing and reading: Challenges and processes in critical editions and reading activities on digital medium." http://gandalf.aksis.uib.no/info/AURORA%20Programme.pdf
13. Sellen, A.J. and R.H.R. Harper. (2002) The Myth of the Paperless Office. Cambridge, Massachusetts: MIT Press.
14. Hillesund, T. (2005) "Digital Text Cycles: From Medieval Manuscripts to Modern Markup." Journal of Digital Information, Volume 6, Issue 1. http://jodi.tamu.edu/Articles/v06/i01/Hillesund/
15. OEBPS 1.2. http://www.idpf.org/oebps/oebps1.2/index.htm
16. For a presentation and discussion of OEBPS, see Chapter 11 "Electronic Books & the Open eBook Publication Structure," A. Renear and D. Salo in Kasdorf, W.E. (2003) The Columbia Guide to Digital Publishing. New York: Columbia University Press.
17. Open eBook File Format 1.0 Draft Version 001, November 5, 1999. http://web.archive.org/web/20000926004335/www.nuvomedia.com/oebff/OEBFile1DRAFT001.htm
18. Fictionwise. http://www.fictionwise.com/
19. Etext Center at the University of Virginia Library. http://etext.lib.virginia.edu/
20. The Oxford Text Archive. http://ota.ahds.ac.uk/ebooks/
21. Lee, K., N. Guttenberg and V. McCrary. (2002) "Standardization aspects of eBook content formats." Computer Standards and Interfaces, Vol. 24, No. 3. [doi: 10.1016/S0920-5489(02)00032-6]
22. Noring, J. (2003) "OEBPS: The Universal Consumer eBook Format?" First published on eBookWeb (an online ebook journal, now offline). http://www.openreader.org/OEBPS-UCF.html
23. Arms, W.Y. (2005) "A Viewpoint Analysis of the Digital Library." D-Lib Magazine, Volume 11, Issue 7/8. http://www.dlib.org/dlib/july05/arms/07arms.html
24. The Norwegian Newspaper Corpus. http://avis.uib.no/leksikon.page
25. Medieval Nordic Text Archive at the University of Bergen. http://gandalf.aksis.uib.no/menota/
26. The Library of Congress American Memory. http://memory.loc.gov/ammem/index.html
27. Bodleian Library, University of Oxford. http://www.bodley.ox.ac.uk/dept/scwmss/wmss/medieval/browse.htm
28. "L'Écho de la Fabrique." http://echo-fabrique.ens-lsh.fr/
29. The Wittgenstein Archives at the University of Bergen. http://gandalf.aksis.uib.no/wab/
30. Oxford University Press: Wittgenstein's Nachlass: The Bergen Electronic Edition. http://www.oup.co.uk/academic/humanities/philosophy/wittgenstein/
31. Henrik Ibsen's Writings. http://www.ibsen.uio.no/his/hjemmeside/english.html
32. Westrienen, G. van and C. A. Lynch. (2005) "Academic Institutional Repositories: Deployment Status in 13 Nations as of Mid 2005." D-Lib Magazine, Volume 11, Issue 9. http://www.dlib.org/dlib/september05/westrienen/09westrienen.html
33. BioMed Central. http://www.biomedcentral.com/
34. California Digital Library. http://www.cdlib.org/
