All of the material included here was prepared by Prof. Larry Benson (ldb@wjh12.harvard.edu) as part of a larger Glossarial DataBase of Middle English. It was subsequently marked up according to the TEI (P3) Guidelines, following the section on analytical markup. Considerable work on this version of the DataBase is still necessary, including linking in Prof. Benson's "dictionary" and adding a variety of searches.
A sample line is included below to illustrate the markup of the text.
<DIV0 TYPE="frag" N="Frag1">
<HEAD>Fragment I, CT</HEAD>
<DIV1 TYPE="CT" N="GP">
<HEAD>General Prologue</HEAD>
<L N="GP:1">
<W LEMMA="whan" ANA="ADVC">Whan</W>
<W TYPE="gram" LEMMA="that" ANA="CONJ">that</W>
<W LEMMA="april" ANA="NOUN">Aprill</W>
<W LEMMA="with" ANA="PREP">with</W>
<W TYPE="infl" LEMMA="his" ANA="P1GN">his</W>
<W TYPE="infl" LEMMA="shour" ANA="NPL0">shoures</W>
<W LEMMA="sote" ANA="ADJ0">soote</W>
</L>
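As a rough illustration of how this markup can be read by a program, the following Python sketch pulls the surface form and attributes out of each W element. It is not part of the DataBase: the function name parse_words and the regular-expression approach are assumptions made for this example, chosen because the sample is an SGML fragment with unclosed DIV elements that an XML parser would reject.

```python
import re

# One <W ...>form</W> element; tag and attribute names are upper-case as in the sample.
W_RE = re.compile(r'<W\b([^>]*)>(.*?)</W>', re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_words(markup):
    """Return one dict per <W> element: the surface form plus its attributes."""
    words = []
    for attr_text, form in W_RE.findall(markup):
        entry = {"form": form.strip()}
        # Attribute names are lower-cased for convenience (hypothetical convention).
        entry.update({name.lower(): value for name, value in ATTR_RE.findall(attr_text)})
        words.append(entry)
    return words

sample = ('<L N="GP:1"> <W LEMMA="whan" ANA="ADVC">Whan</W> '
          '<W TYPE="gram" LEMMA="that" ANA="CONJ">that</W> '
          '<W LEMMA="april" ANA="NOUN">Aprill</W> </L>')

for word in parse_words(sample):
    # Words without a TYPE attribute are shown with "-".
    print(word["form"], word["lemma"], word["ana"], word.get("type", "-"))
```

Each dictionary keeps the lemma and ana values together with the word as it appears in the text, which is the minimum needed for lemma or part-of-speech searches.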
Some words are marked, using the TEI "lang" attribute, as LAT (Latin), FR (French), or GR (Greek). Remaining words
are indexed as English and show in tables as EN.
Words are sometimes marked (using the TEI "type" attribute) as
infl (inflected), gram (grammatical), or
infl+gram (a combination of both). Most words are not
marked for form. Benson describes grammatical words and his practice
of marking them as follows:
Grammatical Words
A number of words are labelled "grammatical": e.g. al as an adjective is labelled: al@al_gram#adj. This departs from the usual practice of following the MED's head word (which in this case follows the # rather than the @); it was adopted to prevent syntactic searches for, say, adjective+noun from bringing up this+noun, swich+noun, etc. and to clarify syntactic patterns (@this_gram+@n rather than @adj+@n).
If this proves unsatisfactory it can easily be changed. Note that by no means are all the grammatical words in Chaucer so labelled. This practice was adopted only for those words that were numerous enough and used often enough as nouns and adjectives to interfere with searches for combinations of fully semantic words.
The following words are labelled "grammatical":
al ani another ech everi ilke mani no non other same som swich that these thilke this tho tother what which
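The effect of the "grammatical" label on an adjective+noun search can be sketched in Python as follows. This is only an illustration under stated assumptions: the token dictionaries, their tagging, and the function names are invented for the example, and the prefixes used to recognise adjective and noun codes (ADJ, NOU, NPL) are guesses based on the codes in the sample line, not a documented rule of the DataBase.

```python
def is_grammatical(token):
    # The type value may be "gram", "infl", or "infl+gram".
    return "gram" in token.get("type", "").split("+")

def is_adjective(token):
    # Assumption: adjective codes start with "ADJ" (e.g. ADJ0).
    return token["ana"].startswith("ADJ")

def is_noun(token):
    # Assumption: noun codes start with "NOU" or "NPL" (e.g. NOUN, NPL0).
    return token["ana"].startswith(("NOU", "NPL"))

def adjective_noun_pairs(tokens, include_grammatical=False):
    """Find adjacent adjective+noun pairs, skipping grammatical words by default."""
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if not (is_adjective(first) and is_noun(second)):
            continue
        if is_grammatical(first) and not include_grammatical:
            continue
        pairs.append((first["form"], second["form"]))
    return pairs

# Hypothetical tokens; the tagging here is made up for illustration only.
tokens = [
    {"form": "swich", "lemma": "swich", "ana": "ADJ0", "type": "gram"},
    {"form": "licour", "lemma": "licour", "ana": "NOUN"},
    {"form": "yonge", "lemma": "yong", "ana": "ADJ0"},
    {"form": "sonne", "lemma": "sonne", "ana": "NOUN"},
]

print(adjective_noun_pairs(tokens))
print(adjective_noun_pairs(tokens, include_grammatical=True))
```

With the default setting only the fully semantic pair is returned; passing include_grammatical=True also brings back the swich+noun pair, which is the kind of noise the label is meant to suppress.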
Codes used for the ana attribute on each word are
listed below. In some cases, the prose description from Prof.
Benson is provided. In other cases, his code (e.g., adv#interj)
is provided instead. Parenthetical numbers indicate the number of
occurrences in the Canterbury Tales for this characteristic.
A1NS ABBR ADJ1 ADJ2 ADJ3 ADJ0 ADJC ADJE ADJI ADJN ADJP ADJS ADJV ADV1 ADV2 ADV0 ADVC ADVI ADVJ ADVN ADVP ADVS AJ2C AJ1N AJ1S AJNP AJNS AJPL AV-J
CNJ1 CONJ
DEFA
G1PL G2PL GER1 GER2 GER3 GER0 GERA GRPL
IDFA IN12 INTJ INTN
JVNC
N1AJ N2AJ N5AJ N1GN N2GN N3GN N5GN N1IN N1NG N2NG N2NO N1PG N1PL N2PL N3PL N4PL N6PL NADJ NADV NAPL NGAB NGEN NINT NNON NOU1 NOU2 NOU3 NOU4 NOU6 NOU8 NOUN NPL0 NPLG NPRG NPRP NSUP NUAJ NUAV NUM0 NUMA NUMC NUMN NUMR
P1FG P2FI P2FJ P2FO P1GN P2GN P1IN P1NM P2OJ P2PL PR1J PART PN1I PN4J PNAJ PPAB PPAJ PPJN PPL0 PRAB PRAJ PRAV PRCO PRCT PREP PRGA PRGN PRIN PRN1 PRN2 PRN3 PRN4 PRN5 PRN6 PRN7 PRN8 PRNM PRNO PROJ PRON PRP2 PRPG PRPJ PRPL PRPN PRPO
V1IM V2IM V3IM V4IM V1IN V2IN V3IN V4IN V8IN V1JP V2JP V1JR V2JR V1PA V2PA V3PA V4PA V1PE V1PL V2PL V3PL V4PL V1PN V1PP V2PP V3PP V4PP V1PR V2PR V3PR V4PR V1PT V2PT V3PT V4PT V1TN V1TR V12I V12L V12P V12R VAJP VAJR VAVR VERB VIM0 VINF VPPA VPPL VPPN VPPR VPR0 VPRA VPRL VPRN VPRP VPT0 VPTL VPTN VTLN VTPR
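Occurrence counts per code can be tallied directly from the markup. The Python sketch below is one possible way to do so and is not the procedure used to produce the DataBase's own figures; the function name count_ana_codes is hypothetical, and the input is the sample line quoted earlier in this description.

```python
import re
from collections import Counter

# Captures the ANA value of every <W> element (attribute names upper-case, as in the sample).
ANA_RE = re.compile(r'<W\b[^>]*\bANA="([^"]*)"')

def count_ana_codes(markup):
    """Return a Counter mapping each ANA code to its number of occurrences."""
    return Counter(ANA_RE.findall(markup))

sample_line = (
    '<L N="GP:1"> <W LEMMA="whan" ANA="ADVC">Whan</W> '
    '<W TYPE="gram" LEMMA="that" ANA="CONJ">that</W> '
    '<W LEMMA="april" ANA="NOUN">Aprill</W> '
    '<W LEMMA="with" ANA="PREP">with</W> '
    '<W TYPE="infl" LEMMA="his" ANA="P1GN">his</W> '
    '<W TYPE="infl" LEMMA="shour" ANA="NPL0">shoures</W> '
    '<W LEMMA="sote" ANA="ADJ0">soote</W> </L>'
)

counts = count_ana_codes(sample_line)
for code in sorted(counts):
    print(code, counts[code])
```

Run over the whole text rather than a single line, the same tally would give one total per code.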