Early modern spelling and TCP searching
The original spellings and letter choices present in TCP texts
have been retained in the database. Since spelling was not regularized
in this early period of printing, the keyword term you are searching
for may appear in multiple forms. The more of these spellings you
include in your searches, the more results you will get. This takes
some practice and some creativity, but knowing a few simple habits
of early modern typesetters can help you increase the number of
results you get.
Early modern spelling and typesetting habits to keep in mind:
The letter e often appears on the end of words where you
might not expect it. For example, regard may well appear
as regarde. Truncating your search with an asterisk, looking
for regard*, will ensure that you get returns that include
the e, as well as forms like regarding.
The letters u and v are often interchangeable. As
a result, if you are searching for the word slave, you might
also look for slaue. The Boolean search screen can be very
helpful in constructing these searches. Unfortunately, although the
system supports truncation at the end of a term, it does not support
wildcards inside a word, so you cannot use sla*e in a simple search
to find more hits.
w often appears as vv. A search for wonder, then,
will be more productive if done from the Boolean
search screen, entering wonder in the first box, selecting "or"
from the drop-down menu, and then entering vvonder in the second
box. You can also search for wonder* and vvonder* if you
would like to pick up instances of wonderful or vvonderous.
The letter i often replaces j. You may want to search
for iealous as well as jealous for more complete results.
In many cases, TCP texts simplify characters for easier searching.
There are many early modern symbols that can be ignored in structuring
your search. For example:
ſ (long s) = s
œ = oe, and other ligatured or "joined" letters are
rendered separately
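The habits listed above can be combined to produce a list of candidate spellings to try, for example on the Boolean search screen. The following is only a minimal sketch in Python, not part of the TCP search system: the function name spelling_variants and the substitution table are hypothetical, and the approach deliberately over-generates, so some of the forms it returns may never occur in the database.

```python
from itertools import product

# Hypothetical substitution table based on the habits described above.
# Each key is a modern letter; the tuple lists the forms worth trying.
SUBSTITUTIONS = {
    "u": ("u", "v"),
    "v": ("v", "u"),
    "w": ("w", "vv"),
    "j": ("j", "i"),
}


def spelling_variants(word):
    """Return a sorted list of candidate early modern spellings of a word."""
    # For every letter, collect the forms it might take in an early text.
    choices = [SUBSTITUTIONS.get(ch, (ch,)) for ch in word.lower()]
    variants = {"".join(combo) for combo in product(*choices)}
    # A final e was often added, so also try each form with an extra e.
    variants |= {v + "e" for v in variants if not v.endswith("e")}
    return sorted(variants)


if __name__ == "__main__":
    for term in ("slave", "wonder", "jealous"):
        print(term, "->", ", ".join(spelling_variants(term)))
```

Each spelling returned can then be entered as a separate term joined with "or". Adding an asterisk to a variant, as in vvonder*, would also pick up longer forms.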
To see a list of spellings that might have been used for your search
term during the early modern era, try entering the term into the
Word Index. You will be taken to a screen that shows the spellings
present in the database, listed in alphabetical order with your word
highlighted. On that screen, you can look for other spellings that
might be related and select them for viewing.
While the TCP encoding standards have worked to make these
texts as accessible as possible, your search may well skip over
some occurrences of your search term for the following reasons:
Macrons
Early modern typesetters often purposely omitted letters from words
so that they would fit more easily on a line. These omissions are
often signaled with a horizontal bar over a character in the word
(such bars are called macrons). For example, convenient may appear
as cōvenient, with a bar over the o. TCP texts
transcribe such omissions using tildes, as in co~venient,
rather than attempt to expand these abbreviations, which are often
hard to make out. A search for convenient will not turn up this
spelling.
Truncation can help you work around macrons, though it may not
help when the macron appears at the beginning of a word. In cases
where thoroughness really matters, you may want to query the
database using the tilde in the spelling of your search term. A
search for co~venient will work and will find this variant
spelling.
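If you want to try tilde spellings systematically, the candidate forms can be generated in advance. The sketch below is illustrative only and rests on two assumptions that go beyond the example above: that the omitted letter is an m or n following a vowel, and that the tilde is written directly after that vowel, as in co~venient. The function name tilde_variants is hypothetical.

```python
import re

# Assumption: the omitted letter is an m or n that follows a vowel, and the
# tilde is placed directly after that vowel (as in co~venient).
NASAL_AFTER_VOWEL = re.compile(r"(?<=[aeiou])[mn]")


def tilde_variants(word):
    """Return forms of word in which one m or n is replaced by a tilde."""
    word = word.lower()
    return [
        word[: match.start()] + "~" + word[match.end():]
        for match in NASAL_AFTER_VOWEL.finditer(word)
    ]


if __name__ == "__main__":
    print(tilde_variants("convenient"))
```

Each form replaces a single letter, so a word with several possible omissions yields several separate search terms rather than one combined spelling.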
Abbreviations
Some TCP works use abbreviations borrowed from Latin texts. Words
with common Latin roots are sometimes spelled using special symbols
in place of ordinary letters. For example, perform can occasionally
be spelled as ꝑform, where ꝑ stands for per.
These abbreviations are labeled with SGML tags but not spelled out
in the TCP encoded texts. As explained in Using SGML tags, you may
search for the presence of these tags if you wish.
Word Division
In many texts, space considerations drove early modern typesetters
to break a word in two at the end of one line and continue it on the
next. Sometimes, these typesetters used hyphens, as we do today.
When that happens, a pipe symbol (|) is inserted at the break in the
transcription, and you will see this mark on your screen. If the
early modern typesetter omitted the hyphen, simply placing hap on
one line and py on the next, the word will be encoded as hap+py, and
the plus sign will be visible.
The search engine will find words that are interrupted by the pipe
symbol and the plus sign. For example, a search for happy
will pick up hap|py, as well as hap+py.
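If you are working with transcribed text outside the search interface, the same behavior can be approximated by removing the break marks before comparing words. This is a minimal sketch, not the search engine's actual implementation; the function names strip_break_marks and matches_term are hypothetical.

```python
def strip_break_marks(token):
    """Remove the pipe and plus signs that mark line-end word breaks."""
    return token.replace("|", "").replace("+", "")


def matches_term(term, token):
    """Return True if token equals term once break marks are ignored."""
    return strip_break_marks(token).lower() == term.lower()


if __name__ == "__main__":
    for token in ("happy", "hap|py", "hap+py", "hap~py"):
        print(token, matches_term("happy", token))
```

The tilde is left untouched here because it marks an omitted letter rather than a line break, so hap~py is not treated as a match for happy.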
Related topics:
Search tips
Searching regions