Using SGML tags

Advanced users can enter SGML tags into the search box to locate particular structures within texts.

Why would I want to search with SGML tags?

Using SGML tags allows you to customize your search in ways that the drop-down menus on the interface do not.  The search screen drop-down menus are made to suit the queries that users most commonly want to construct.  These menu options get their returns by reading SGML tagging, but there are plenty of tags in the texts that are not picked up when these options are selected.  If you know how to enter these tags into the search box, you can structure queries that meet more particular research needs.

How to use SGML tags in searching

SGML tags can be entered in any search box to locate occurrences of a particular label in the encoded text.  This sort of searching can be helpful for finding particular features of a text, helping to single out structures like epigraphs, block quotations, and lists.  It can also allow you to explore, for example, how many quotations are attributed to Horace in the database or in a bookbagged group of texts.

You can use SGML tags as query terms on any TCP search screen.  SGML tags can be combined with other non-tag terms for Boolean searching. 

Be sure to leave the pointed bracket open (that is, omit the closing ">") and end the tag name with an asterisk when searching with SGML tags: type <Q*, not <Q>.  Otherwise, the interface will not know how to interpret your query.  Because the database's display software suppresses the tags to make TCP returns easier to read, the tags will not be highlighted in red on the results page.  In fact, they will not appear at all, but the structures will be present, and you can either see them in the text layout or rely on highlighted, non-SGML query terms to locate the information you are looking for.
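
The formatting rule above can be sketched in a few lines of Python.  The helper name tag_term is hypothetical and is not part of the TCP interface; it only shows how a tag name becomes the text you would type in the search box.  Writing the tag name in capitals follows the examples on this page and is an assumption, not a documented requirement.

```python
import re

def tag_term(tag_name: str) -> str:
    """Return the search-box form of an SGML tag name, e.g. 'Q' -> '<Q*'.

    Hypothetical helper for illustration; not part of the TCP interface.
    """
    # Accept input such as "q", "<Q>", or "<Q*" and keep only the name.
    name = tag_name.strip().lstrip("<").rstrip(">*").upper()
    if not re.fullmatch(r"[A-Z][A-Z0-9]*", name):
        raise ValueError(f"not a plain SGML tag name: {tag_name!r}")
    # Open bracket, tag name, asterisk; no closing bracket.
    return f"<{name}*"

print(tag_term("<EPIGRAPH>"))  # <EPIGRAPH*
```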

To learn more about the SGML tags used in TCP encoding, please consult the Keyboarding Instructions and the project DTD.

Block Quotations: <Q*, <BIBL*

Block quotations have been marked with a <Q> tag, which is not applied to ordinary, shorter quotations.

If the source of a quotation is given, the name will be labeled with <BIBL>, which is used to denote information attached to a quotation. 

Typeface Changes:  <HI*

Every change from the predominant typeface is marked with the <HI> tag; the nature of the shift (e.g., from bold to italic) is not noted. 

Early modern printers often shifted fonts in order to emphasize place names and the names of people, so searching with the <HI> tag may serve as a useful shortcut for locating this sort of information.

Illustrations:  <FIGURE*, <HEAD*

TCP texts note the presence of illustrations with a <FIGURE> tag.  These illustrations have not been categorized (as maps, portraits, etc.), because the TCP interface Boolean search option provides such categorization.

The captions for figures, however, have been recorded and can be searched.  They are labeled with the <HEAD> tag.  To search for captions mentioning Queen Elizabeth, then, you could perform a Boolean search for "<HEAD*" AND "<FIGURE*" AND "eliza*".
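
As a rough sketch, the caption query above can be assembled programmatically.  The function name boolean_and is hypothetical, and the output is only the query text as written on this page; how the Boolean search form accepts the individual terms is not specified here.

```python
def boolean_and(*terms: str) -> str:
    """Join query terms with AND, wrapping each term in double quotes.

    Hypothetical helper; it builds display text, not a request to the TCP server.
    """
    if not terms:
        raise ValueError("at least one term is required")
    return " AND ".join(f'"{t}"' for t in terms)

# Tag terms keep the open bracket and trailing asterisk; "eliza*" is an
# ordinary truncated word, so it has no bracket.
query = boolean_and("<HEAD*", "<FIGURE*", "eliza*")
print(query)  # "<HEAD*" AND "<FIGURE*" AND "eliza*"
```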

Arguments:  <ARGUMENT*

These summary paragraphs often appear directly under chapter headings and make mention of the different subjects treated in the section to follow.  They may be worded like a Table of Contents, but they are formatted as paragraphs, often with dashes separating their components.

Epigraphs:  <EPIGRAPH*

Generally serving as introductory comments on what is to come rather than previews of a section's contents, these epigraphs often take the form of quotations from famous authors. 

If the material in an epigraph is indeed a quotation, it will be labeled with the <Q> tag.  If the material is in verse, the lines will also be marked with the <L> tag.

Abbreviations:  <ABBR*

There has been no attempt to expand abbreviations, but they have been marked wherever they are recognized as such.

Where Latin abbreviations are noted, they are marked with their own particular codes.  A separate page lists these codes along with the characters early printers used to signal these abbreviations.

Tables and Lists:

Tables and lists have been noted in the encoding, and their parts can be searched using the following tags:

Tables:  <TABLE*, <ROW*, <CELL*

Lists:  <LIST*, <ITEM*

Other Encoding for Printed Characters:

Ampersands, all symbols of the zodiac, and paragraph symbols that appear in the source text have been noted in the encoded version.

Tags Common to Particular Text Types:

Poetry

All poems are marked as such: each line of poetry is marked with an <L> tag, and each stanza is marked as a line group with an <LG> tag.  You can choose "Line" or "Stanza" from the interface drop-down menu to focus your search, or you can enter these tags if doing so better suits your needs.

Where lack of space led a printer to place the concluding words of a line next to the previous line, the break cannot be searched: keyboarders have been instructed to "patch" the two parts of the line together.

Drama

The following features are marked and could be used to narrow searches:

<STAGE*       Stage directions

<SPEAKER*  Speakers' names (which are often abbreviated)

<SP*               Speeches (which can appear without speaker names)

Letters

Letters are marked as such when they differ from the predominant form of the text.  Specific parts within them can be searched as well.

<SALUTE*    Salutations

<CLOSER*     Closing of the letter

Other Helpful SGML Tags to Know about:

General Divisions

The "section and work titles" search scans the divisions generated by the Table of Contents command.  These can also be searched by entering the tags that mark the divisions.  For example, you can enter:

<TEXT*         marks the start of every text in the database

<DIV*             marks the different divisions of the texts and is followed by a number -- DIV1, for example, would be assigned to Book 1, Book 2, and Book 3 of the Faerie Queene, and the chapters within each book would be classified as DIV2

<PB*               marks each page break
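
To illustrate why a single <DIV* query can cover every division level, the sketch below treats the asterisk as a simple trailing wildcard.  This is an assumption about the interface's matching, made only for illustration: Python's fnmatch module stands in for the real search engine, and the sample tags are invented.

```python
from fnmatch import fnmatchcase

# Invented sample of opening tags as they might begin in an encoded text.
sample_tags = ["<TEXT", "<DIV1", "<DIV2", "<PB", "<DIV3"]

def matching(pattern: str, tags: list[str]) -> list[str]:
    """Return the tags matched by a trailing-asterisk pattern such as '<DIV*'."""
    return [t for t in tags if fnmatchcase(t, pattern)]

print(matching("<DIV*", sample_tags))   # ['<DIV1', '<DIV2', '<DIV3']
print(matching("<DIV1*", sample_tags))  # ['<DIV1']
```

Under the same assumption, adding the level number before the asterisk would narrow the match to a single division level.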

Markers denoting absences in the encoding:

<UNCLEAR* marks places where text cannot be read because of blurred print or damage to the original

<GAP*           denotes a place in the text where material could not be encoded -- this can signal the presence of:

  - Non-Roman alphabets (the individual characters have not been recorded)
  - Musical notation
  - Missing pages

Related topics:

Searching regions