Humanities Text Initiative American Verse Project
The Humanities Text Initiative (HTI) is assembling an electronic archive of volumes of American poetry. Most of the archive consists of 19th-century poetry, although a few 18th-century and early 20th-century texts are included. The full text of each volume is being converted into digital form and encoded in Standard Generalized Markup Language (SGML) using the TEI Guidelines.(1) The volumes already online, which include books of poetry by a number of African-American and women poets, represent an interesting selection. In many cases, the texts selected are the only existing editions of the author's work.
The University of Michigan Press is collaborating with the HTI in an experimental venture to make these materials available to a wider audience over the Internet. The project has several purposes: first, it allows the Press to explore new ways of providing access to World-Wide Web documents. The HTI provides several levels of access to the American verse texts, and guidelines for use (2) are stated clearly at the beginning of each document. Individuals may use the texts freely, whether to create new editions, distribute them to students, or use them as the basis for multimedia products. Institutions such as universities, publishers, and online providers are required to seek permission from the Press and, in some cases, pay a fee to use or distribute the texts.
A second goal of the project is to provide a service to scholars by advancing their ability to use Web documents in their work. Currently, the Internet has no well-established mechanism for authors seeking to integrate complete texts, or parts of texts, into their scholarship. The TEI Guidelines provide clearly defined ways of linking from one SGML document to portions of another; however, no one has yet set up a Web server to accept this sort of linking. The HTI proposes to explore this as part of the American Verse project. This would allow, for example, someone writing about Dickinson to embed links in his or her electronic text pointing the reader to particular poems, stanzas, or lines in volumes that are part of the project, without having to replicate the material within his or her own document, as is currently necessary. The evidence of scholarship would remain on this central archival server rather than be replicated on a number of different scholars' machines.
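A link of this kind from a scholar's own SGML document into the archive might look like the following TEI extended pointer. This is a minimal sketch only: the document name "dickinson1890" and the target identifier "poem12.st2" are invented for illustration and do not refer to actual archive files.

```sgml
<!-- Hypothetical citation of a stanza held on the archive server.
     "dickinson1890" and "poem12.st2" are invented identifiers. -->
<p>On Dickinson's imagery of light, compare the second stanza of
the poem as printed in the 1890 volume
<xptr doc="dickinson1890" from="ID (poem12.st2)">.</p>
```

The point of the mechanism is that the stanza itself is never copied into the citing document; a server that understood such pointers would retrieve the target passage from the archive on demand.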
The project is designed to reproduce already-published texts without any additional (i.e., modern) critical material. The project is also structured as an archive to which additional volumes of poetry will be added continually. The HTI is responsible for the production of coded texts; the Press's role includes developing guidelines for access and publicizing the archive through regular and subject catalogs, as well as over the Internet.
Selection Process: American Verse
In selecting verse for this pilot project, standard bibliographies, anthologies, and histories of American literature were consulted, including the 1993 Columbia History of American Poetry, Spiller's Literary History of the United States, Waggoner's American Poets from the Puritans to the Present, and Matthiessen's 1950 Oxford Book of American Verse. These were supplemented by specialized bibliographies of writing by American women and people of color.
American literary historians from Michigan's Department of English were consulted, and the list was expanded to include poets of special interest. Contemporary scholars emphasized the extent of current scholarly interest in eighteenth- and nineteenth-century popular poetry and in poetry by women and by African Americans. As part of the project's work, a list of nearly 400 American authors of poetry was assembled.
Finally, a survey of books held by the University Library was made. Several hundred volumes were evaluated, and texts were selected for scanning based on their scholarly interest as well as physical properties (e.g., deterioration and "scanability").
Creation and Encoding
The American Verse texts are created with accuracy as the primary concern, on the assumption that access to reliable texts matters most and that a well-encoded, reliable edition is a sound foundation for later editorial and analytical markup. The texts are scanned using a variety of software packages, primarily BSCAN (for batch scanning) and TypeReader (which seems best suited to older texts). A careful proofing stage follows, using the OCR package's proofing capabilities. Automated routines then perform the first layer of markup and identify probable scanning errors. In the next stage, markup is refined manually, using either emacs with psgml or Author/Editor, to resolve ambiguities that the automated markup left unresolved or misinterpreted. A formatted, printed copy is then used for final proofing and revision. Markup is performed according to the Text Encoding Initiative (TEI) Guidelines. The TEI "lite" DTD has proven more than adequate for all work done to date; only minor deficiencies (e.g., in where figures may be located) have been encountered with the TEI Guidelines. All images found in the original text are scanned and indicated in the encoded text. An image of the table of contents, title page(s), and verso for each text is also available from the text selection menu. Full bibliographic information, including a local call number, is included in the TEIHeader.
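The encoding described above might look roughly like the following TEI Lite fragment. This is a sketch only, not a transcription of an actual archive file: the volume title, identifiers, poem text, and call number are all invented for illustration.

```sgml
<!-- A minimal sketch of a TEI Lite encoded volume. All content
     and identifiers here are invented for illustration. -->
<TEI.2>
 <teiHeader>
  <fileDesc>
   <titleStmt><title>Poems (a hypothetical volume)</title></titleStmt>
   <publicationStmt><p>University of Michigan Humanities Text
    Initiative</p></publicationStmt>
   <sourceDesc><bibl>Printed source; local call number
    PS 0000 .X0 (invented)</bibl></sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <div1 type="poem" id="poem1">
    <head>A Sample Title</head>
    <lg type="stanza" id="poem1.st1">
     <l>First line of the opening stanza,</l>
     <l>second line of the opening stanza.</l>
    </lg>
   </div1>
  </body>
 </text>
</TEI.2>
```

Giving each poem, stanza, and line its place in a structure of this kind is what makes the fine-grained linking discussed earlier possible, and the TEIHeader carries the bibliographic record for the volume as a whole.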
(1) The TEI Guidelines for Electronic Text Encoding and Interchange are the product of a multi-year effort by humanities scholars with an interest in computing. The Guidelines provide mechanisms for representing information using international standards, ensuring the long-term viability of texts and making it possible to share and transfer texts across networks and divergent computing platforms. The TEI Guidelines are also available online.