Author: Maximilian Kalus
Title: Semantic Networks and Historical Knowledge Management: Introducing New Methods of Computer-based Research
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
December 2007

vol. 10, no. 3, December 2007
Article Type: Article
URL: http://hdl.handle.net/2027/spo.3310410.0010.301

Semantic Networks and Historical Knowledge Management: Introducing New Methods of Computer-based Research

Maximilian Kalus

Chair of Economic and Social History
Faculty of Economics
Friedrich-Schiller-Universität Jena
07737 Jena, Germany
+49 3641 94 33 24

Abstract: Historical semantic networks are a computer-based method for working with historical data. Objects (e.g., people, places, events) can be entered into a database and connected to each other relationally. Both qualitative and quantitative research could profit from such an approach. Moreover, data can easily be shared among researchers. histcross is a project in progress that implements historical semantic networks.

Keywords: historical semantic database, network, knowledge management, histcross, prosopographics, sociographics

Introduction

Arguably, the most difficult element of historical research is the management of information. Historians have to cope with both abundant and scarce sources, and the information contained in either tends to be fragmentary and often random. Modern history is lucky enough to have statistical data readily available, but before 1800 data become much harder to come by. Before 1800 qualitative research (i.e., traditional history) becomes much more important, while quantitative methods are often unavailable due to the lack of coherent data. In fact, it seems that quantitative and qualitative research do not mix well at all: many historians consider themselves followers of one faction, distrusting the other.

Computer scientists have long been convinced that computers are quite apt at doing calculations and thus can solve most statistical problems. In recent years the really hard nut to crack has been the evaluation of qualitative data. Interestingly, neither "traditional" historians nor "quantitative" ones have yet realized the potential of these developments in computer science (or, if they have, the results were comparatively modest, at least from a computer scientist's point of view). Examples of qualitative computer science research that might be of interest to historians include natural language processing (or computational linguistics, e.g., semantic recognition of single sentences or Bayesian networks to analyse whole texts [1]), structured documents (e.g., the Semantic Web [2]), and knowledge management. This last example is where historical knowledge comes into play.

Simplifying and structuring qualitatively complex knowledge, quantifying it in some way, and making it reusable and easily accessible are all desiderata that are not new to historians. Notably, computer science is currently approaching a solution to some of these problems of working with historical data. This paper gives a brief introduction to historical knowledge management. It presents a way to enhance qualitative research by introducing semantic networks. These also allow qualitative knowledge to eventually be transformed into quantitative data (a procedure introduced here as a topic for future research). A work in progress called histcross serves as an example of what such a semantic networking database can look like.

Traditional Methods of Historical Knowledge Management

The term knowledge management (KM) refers to a wide range of concepts, such as corporate memories and instincts, expert systems, document management systems, and learning organizations [3]. In a more general sense (and this is how the term will be used in this paper) it refers to methods to identify and capture knowledge in general, and to make knowledge assets available for transfer and reuse. In short, one wants to transform unstructured text into structured data. Technical systems that implement knowledge management are called knowledge management systems, and the knowledge assets they use are contained within a knowledge base. Numerous other labels have been attached to these concepts; this paper will use only the aforementioned terms in order to minimize confusion for the uninitiated reader.

On an abstract level KM is nothing new. In fact, written matter in natural language (that is, text) represents a very traditional method of knowledge management, having been in use for several thousand years. Following the definition given above, knowledge is "captured" on paper (or parchment, papyrus, or clay, for that matter) and made available for reuse. Various levels of structuring provide more or less easy access to such knowledge. A book might contain chapters, page numbers, a table of contents, and an index: all of these facilitate accessing the knowledge. It is easy to see, though, that knowledge assets contained in books are relatively costly to search. Usually a person looking for specific information has to browse through a book and often has to search several books. And even then there is still a high chance of missing important pieces. To that effect, books and other written matter are not necessarily the most efficient way to store knowledge.

Another traditional example is knowledge contained within (paper) files. Compared to books, the knowledge is much more structured, say alphabetically or by date. This greatly facilitates searches. In fact, with the advent of computers this approach has inspired a large number of methods to hold structured information in IT systems. Digital file systems (which adopted the terms "folder" and "file") and many databases (using structured sets of data) are two such examples.

Historians have been using databases for quite some time, with two types of special interest: prosopographical databases and social networking models.

Prosopography [4] identifies individuals and their biographies within a group. A prominent example is The Prosopography of the Later Roman Empire [5], which alphabetically lists every person attested to have lived in the Roman world between AD 260 and 641. Attached to each name are data on that person (e.g., occupation, title, and years of birth and death). Another example is Augsburger Eliten [6], which uses highly structured data sets to describe the economic elite of the city of Augsburg in the 16th century. It makes use of cross-referencing to other data sets in order to describe family (siblings, marriage relationships, etc.) and economic (debts, participation in companies, etc.) relationships, which makes it much easier to grasp the complex interlacing between members of such a class. Prosopographic databases comply with KM systems in the strictest sense: They clearly define data records that require specific information (the knowledge base). Unfortunately, they are generally quite rigid and use a predefined data model. There are times when data are too rich or too complicated to be modelled in such a system. For example, a data set might provide ten "slots" for the children of a specific person. What if a person actually had twelve children? One would either have to provide additional slots (thus expanding the model, which would be costly, error-prone, and possibly wasteful of memory) or live with the restriction (e.g., dropping all children above ten, something clearly not desirable). This, among other reasons (i.e., the rigid data structure is simply outdated), is why the prosopographical base model will not be used in histcross, although the database is quite able to act as a prosopographical knowledge base as well.
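
As a rough illustration of the slot problem (a sketch in Python with invented names, not code from any of the databases mentioned here): a record type with a fixed number of child fields breaks down as soon as the eleventh child turns up, whereas keeping children in a growing list of relations leaves the schema untouched.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical fixed-slot record in the spirit of a rigid prosopographical
    # data model: a predetermined number of "child" fields.
    @dataclass
    class PersonFixed:
        name: str
        child_1: Optional[str] = None
        child_2: Optional[str] = None
        # ... child_3 to child_10 would follow; an eleventh child simply
        # cannot be stored without changing the data model itself.

    # Flexible alternative: children are kept as a growing list (in a database,
    # as rows of a separate relation table), so the schema never changes.
    @dataclass
    class PersonFlexible:
        name: str
        children: List[str] = field(default_factory=list)

    merchant = PersonFlexible("N. N.")
    merchant.children.extend("child %d" % i for i in range(1, 13))  # twelve children, no problem
    print(len(merchant.children))  # 12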

Social networks are another way to store information. Compared to prosopographical databases they possess both advantages and disadvantages. On the positive side, social networks are similar to the semantic networks described below. More precisely, semantic networks could be considered a superset of social networks. Social networks are well established in the scientific community; therefore standard software such as UCINET or Pajek [7] is readily available and has been thoroughly tested on historical data. [8] Figure 1, created with Pajek, shows family relationships in Florence in the 15th century. [9] The figure shows the prominent positions of the Strozzi and Medici in a way that is easy to understand. This is one of the main advantages of social network analysis software: It visualizes complicated correlations. Moreover, statistical methods can be applied to the model to find core nodes (Strozzi and Medici in Figure 1), isolated ones (like the Pucci), or to identify groups within the system (e.g., the closely related faction of the Strozzi, Peruzzi, Castellani, and Bischeri). On the negative side, the software can only be employed in a relatively narrow field. Data sets have to be quite complete in order to work correctly. This can pose a problem with fragmentary data, which is common in historical research. Additionally, social network theory and analysis is mostly used on quantitative data and when only a few specific variables have to be defined. In the case of Figure 1 all kinds of blood and marriage relationships are mapped to a single variable "relationship". There is a certain danger in oversimplifying complex real-world facts to simple lines, even when assigning weights to edges and nodes (which could be visualized using different sizes or colours). As such, graphs can only show "refined" (as opposed to "raw") data, which makes it difficult for researchers to re-use data acquired by colleagues. Still, social network analysis is an extremely useful tool to visualize and simplify complicated patterns.
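
The kind of centrality measure mentioned above can be illustrated with a few lines of code (a sketch only; the analyses cited were carried out with UCINET and Pajek, not with this snippet). Degree centrality simply counts the ties attached to each family, which already separates central houses from isolated ones:

    from collections import defaultdict

    # A few marriage ties in the style of the Florentine data set
    # (only a fragment, for illustration).
    ties = [
        ("Medici", "Salviati"), ("Medici", "Tornabuoni"), ("Medici", "Albizzi"),
        ("Strozzi", "Peruzzi"), ("Strozzi", "Bischeri"), ("Peruzzi", "Castellani"),
    ]

    degree = defaultdict(int)
    for a, b in ties:
        degree[a] += 1
        degree[b] += 1

    # Families without any tie in the data (such as the Pucci) keep degree 0.
    for family, d in sorted(degree.items(), key=lambda item: -item[1]):
        print(family, d)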

[figure]
[Figure 1: Florentine family relationships in Pajek.]

Both prosopographical and social data sets possess important properties, but each is too narrow to meet the needs of working with data on a broader basis. Several relatively recent developments have made it possible to merge the strengths of both concepts.

Theoretical Background: Frames and Semantic Networks

In the wake of artificial intelligence research, scientists soon had to cope with the question of how a mind retains information and how it would be possible to represent such information in an artificial system. Several structural proposals were made, but one received special attention and still plays a role today. In 1974, Marvin Minsky [10] proposed a new theory based on memory structures called frames. In essence, frames represent pieces of information that contain attributes called slots or fillers. As an example, a frame could represent a specific human being. In this case slots could be first and last names, birth date, hair color, etc. Frames do not necessarily model specific objects but can also stand for prototypical and derived knowledge. A generic frame city (as depicted in Figure 2) could contain prototypical slots like walled = yes, bishop's seat = no and number of households = unknown. A bishop's city frame could inherit that information, but change the prototypical bishop's seat to yes. Finally, a specific city might have walled = no and number of households = around 5,000. Frames are machine-usable formalizations of concepts or schemata. Frame theory has close ties to object-oriented programming, which could be regarded as a specific implementation of frame theory. (Historically, they developed side by side.)

[figure]
[Figure 2: An example of frame-based knowledge.]

Frames are a useful way to represent knowledge. In addition, they are quite easy for human beings to understand since they do indeed model some of our own memory patterns. The histcross database has partly been inspired by frame theory.
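
Because frame theory and object-oriented programming developed side by side, the city example of Figure 2 maps almost directly onto class inheritance. The following Python sketch (an illustration only, not histcross code) shows prototypical slots being inherited and overridden:

    class City:
        """Generic frame: prototypical slots with default fillers."""
        walled = True            # walled = yes
        bishops_seat = False     # bishop's seat = no
        households = None        # number of households = unknown

    class BishopsCity(City):
        """Inherits the prototype but overrides one slot."""
        bishops_seat = True

    class SpecificCity(BishopsCity):
        """A concrete city with its own fillers."""
        walled = False
        households = 5000

    print(SpecificCity.bishops_seat)  # True, inherited from BishopsCity
    print(SpecificCity.walled)        # False, overridden locally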

Alas, Minsky's frame concept is monolithic: every piece of information related to a certain object is stored within a frame. Although this is operationally easy to implement, it is somewhat limiting. It can be difficult to extend frames without creating contradictions. In the example above, adding an additional slot "administration" with the default value of "city council" would create incorrect knowledge for all cities that were not ruled by a city council. In recent years, a more flexible approach has grown in prominence in computer science: semantic networks. Formally, a semantic network is a directed graph. This means it can be represented by vertices (nodes, points) and edges (lines). A vertex represents a word, phrase, concept, or location. Each edge connects two vertices, representing a semantic connection between the two. A vertex could be "London", connected to other nodes "England" and "walled city" to represent the idea that London is a walled city in England. Since such semantic networks are graphs, all the concepts and lemmas of graph theory [11] can be applied. Thus, measures such as the depth of a graph (the longest distance within it), the shortest path between two nodes (representing "closeness"), or the degree of a node (its connections to other nodes, representing its "centrality" within the network) can be used to explore semantic networks. Structurally, semantic networks differ from frame theory in how information is stored. Frames store slots and other information within the frame, while in semantic networks these are represented by semantic relations between nodes. Figure 3 is an example of a semantic network of natural language representing the sentence, "The 1755 earthquake of Lisbon destroyed the records of the Casa da Índia". Node c is the center node of the sentence: a verb of the type "to destroy" with the time modification PAST. This verb has an agent o1 of the type earthquake with the location "Lisbon" and the temporal tag "1755". The affected object o2 represents a non-specified set of records possessed by the object "Casa da Índia". (It should be noted that the historical networks described below are much simpler than MultiNet, which tries to model the whole complexity of natural language.)

[figure]
[Figure 3: A semantic network in MultiNet [12] syntax.]

Based on linguistic theories, semantic networks try to meet both mind structure and natural language modelling requirements. They are generic concepts to describe knowledge which can be analysed both qualitatively, by browsing the net and interpreting nodes and edges, and quantitatively, by giving nodes weight numbers, by counting relations, or by applying statistical algorithms or rule-based inferences (meaning deductive rules like "If I know that x and y are related and that y and z are related, I can deduce that x and z are related, too"). In fact, social networks (see above) are just a highly specialized form of semantic networks. Consequently, semantic networks can describe formal and informal information in a very generic (and still operationally useful) way.
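
Because a semantic network is simply a directed graph with typed edges, the measures just mentioned can be computed even on a naive triple representation. The following sketch (illustrative only) stores relations as triples and derives node degrees and a shortest path by breadth-first search:

    from collections import defaultdict, deque

    # A semantic network as a list of directed, typed edges (triples).
    edges = [
        ("London", "is_located_in", "England"),
        ("London", "is_a", "walled city"),
        ("England", "is_part_of", "Europe"),
    ]

    successors = defaultdict(list)
    degree = defaultdict(int)
    for subject, _, obj in edges:
        successors[subject].append(obj)
        degree[subject] += 1      # out-edges ...
        degree[obj] += 1          # ... and in-edges both count towards "centrality"

    def shortest_path(start, goal):
        """Breadth-first search along edge direction; returns a node list or None."""
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in successors[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    print(degree["London"])                   # 2
    print(shortest_path("London", "Europe"))  # ['London', 'England', 'Europe']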

Both concepts can be merged: If frames are regarded as nodes in a network, frames can be integrated into a semantic network. histcross is an implementation of this.

Before describing histcross, it should be noted that there are other possibilities to manage knowledge. One such approach is XML [13], an extremely versatile format that has become quite popular since its official recommendation in 1998. In short, XML is a language to generically describe any type of data in a way that is both human- and machine-readable. It can be used to model formal data, natural language, and meta-information. In connection with KM it has attracted attention in combination with structured document theory, which provides a means to separate content (data) from layout. Without going into further details, XML has tremendous possibilities but one grievous drawback: XML is slow [14]. Because of this, XML is generally not used to store live data, but only in data exchange and description (pre-rendering) scenarios, neither of which is performance-critical. Thus, the role of XML in KM is relatively small and histcross will not use XML to store information.

Introducing Historical Semantic Networks: histcross

[figure]
[Figure 4: Screenshot of histcross.]

Historic Crossroads or histcross is the name of a database that combines a frame-based model with a semantic networking one in order to create and maintain historical knowledge (see Figure 4 above). histcross follows three principles:

  • Simplicity: Complexity slows down the person entering or changing data. To speed up working with the database and to keep the learning curve low, the database has to be comparatively simple.
  • Generic data: The data model of histcross has to be flexible. Users can define their own types and create their own inference rules.
  • Accessibility: histcross is web-based. Individual implementations can be accessed world-wide without the need of installing extra software (except for a web browser).

In line with graph theory, there are two principal data structures: nodes and relations. Nodes, called objects in histcross, represent historical events, places, people, goods, groups, concepts, and the like. Each object has a number of fields that may contain data:

  • Type and class: Types and classes help to classify objects. Each object is member of one type. Examples for types are person, city, village, event, ship or trading good. Each type is member of a class: Cities and villages are both types of the class location.
  • Title: This is a label given to the object, a person's name for example. Alternative spellings would generally go into the comment section.
  • Comment: A text of any length that describes the object. Usually, natural-language information, alternative spellings of the title, or excerpts from sources are kept here. In short, "Comment" is a catch-all field for information.
  • Start and stop date: Naturally, dates are a central theme in history. A number of ontologies [15] exist to logically describe time and time intervals. histcross uses the common interval model: there is a start date b(t) and a stop date e(t), and a point in time is described as a time interval where b(t) = e(t). Both entries are optional; the user can enter a start date, a stop date, both, or none at all. Moreover, there are several granularity options. First, the user can enter the date as year, as year-month, or as year-month-day (the smallest unit of time histcross can handle). Second, a granularity option allows the settings exact, circa, or unknown/unsure. Although the system does not (yet) interpret this additional qualifier, it gives the user an easy way to see the reliability of the information. The last option is the calendar setting. At the moment, histcross can handle the Julian and Gregorian calendars (the latter being called "automatic" because it automatically switches the calendar system on October 14th, 1582; using Julian makes sense if one tries to compare English and Spanish dates in the 17th century, for example). Internally, histcross keeps dates in the Julian Day Number format, i.e., the number of days elapsed since Monday, January 1st, 4713 BC in the proleptic Julian calendar (see the conversion sketch after this list).
  • Icon: Each object may have one icon attached from the icon database which shows up near the label. This makes it easier to visually recognize an object.
  • Bibliographic entries: A list of bibliographic entries (from the bibliographic database of histcross) can be attached to each item.
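
To make the internal date format concrete, the following sketch converts calendar dates to Julian Day Numbers (a standard textbook algorithm, shown here for illustration; it is not the histcross implementation, which additionally has to track the granularity flags described above):

    def julian_day_number(year, month, day, calendar="gregorian"):
        """Convert a calendar date to a Julian Day Number (whole days).

        Years BC follow the astronomical convention (1 BC = year 0),
        so 4713 BC is passed as -4712.
        """
        a = (14 - month) // 12
        y = year + 4800 - a
        m = month + 12 * a - 3
        if calendar == "gregorian":
            return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045
        # proleptic Julian calendar
        return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

    # Day 0 of the count: 1 January 4713 BC in the proleptic Julian calendar.
    print(julian_day_number(-4712, 1, 1, calendar="julian"))      # 0
    # Last day before the Gregorian reform and the first day after it:
    print(julian_day_number(1582, 10, 4, calendar="julian"))      # 2299160
    print(julian_day_number(1582, 10, 15, calendar="gregorian"))  # 2299161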

Obviously, the data model is relatively simple. Yet, it is possible to gradually add data to the database during research. An example object filled with data can be seen in Figure 5.

[figure]
[Figure 5: histcross object example.]

The second structural data element of histcross is the connection between objects. Relations, as they are called in the database, can each connect two objects in a semantic way. The basic data are similar to that of objects with the following exceptions:

  • No titles: Relations do not possess titles. Rather they are defined by their type only. This could be something like is mother of, is located in, etc.
  • From-object and to-object: These specify the two objects connected by the relation. It should be noted that the relation is directed, forming statements like A–is mother of–B; connections can nevertheless be traversed in both directions.

All the other fields are exactly the same. A relation can contain a start and a stop date, a comment and bibliographic entries. Because of this, the actual difference in handling objects or relations is not that large.

A typical (but simplified) semantic historical network described by histcross can be seen in Figure 6. The shape of the nodes depicts the type and the class as shown in the legend. The example of Octavian Secundus Fugger shows the use of different date granularities: His birth date is unknown, while the date of his death is known. Likewise, it is not known when exactly the Fuggerische Erben company started its operation in Goa, but estimates point to around 1587. This is depicted by the question mark after the start date of the relation Fuggerische Erben company–operating branch in–Goa.

[figure]
[Figure 6: Historical semantic network example.]
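
The relation highlighted above (Fuggerische Erben company–operating branch in–Goa) could be represented roughly as follows. This is a sketch of the data structures just described, not actual histcross code, and the internal date value is merely illustrative:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class HistObject:
        obj_type: str                       # e.g. "company", "city"
        obj_class: str                      # e.g. "group", "location"
        title: str
        comment: str = ""
        start: Optional[int] = None         # Julian Day Number, optional
        stop: Optional[int] = None
        start_granularity: str = "exact"    # "exact", "circa" or "unknown"
        bibliography: List[str] = field(default_factory=list)

    @dataclass
    class Relation:
        rel_type: str                       # relations carry no title, only a type
        from_obj: HistObject                # directed: from-object ...
        to_obj: HistObject                  # ... to to-object
        start: Optional[int] = None
        stop: Optional[int] = None
        start_granularity: str = "exact"
        comment: str = ""

    erben = HistObject("company", "group", "Fuggerische Erben company")
    goa = HistObject("city", "location", "Goa")
    branch = Relation("operating branch in", erben, goa,
                      start=2300700,                 # 1 January 1587 as a JDN (illustrative)
                      start_granularity="circa")     # "around 1587"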

Before some implementation details of histcross are introduced, it should be stated that the database is still a project in development and that there are many more possible additions not yet implemented, including external references (links to other databases on the web), automated plotting of networks, sociological or geographical analysis [16], or the quantification of relations (adding numbers and measurement units to relations).

Operations on the Knowledge Base

After this short introduction of histcross, what is the advantage of using a historical semantic database? The main benefit surely is the possibility of accumulating knowledge in a structured way during research. For this purpose, semantic networks are not the only method, but histcross offers a standardized tool which can be used by individuals or by a group of researchers. Moreover, it can be searched quickly (there is a full-text search as well as the possibility of searching for similar words). The data compiled in the knowledge base can be made public (on the Internet) or at least available to certain people (on an intranet). Last but not least, it is closer to the sources than a scientific paper because it can offer the raw data. This approach saves time and research resources and is a further step toward networking within the scientific community itself.

In short, histcross enables the user to easily create and manage historical knowledge bases step by step. To return to the introduction of this paper, the user can actually enter qualitative data in a standardized form and thus make these data quantifiable (in the future). Additionally, besides the full-text search, the database boasts a query editor for creating complex searches. For example, histcross can answer questions like, "Show me all the Italian merchants that traded with pepper in India after 154". Naturally, in order to get meaningful answers to such a question, the data have to contain objects like India, pepper, Italian and the merchant objects that are related to all those objects. However, this small example shows the potential of semantic systems. The user is not confined to full-text searches, but can undertake complex semantic queries.
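
Conceptually, such a query is a pattern match over the relation graph. The following sketch (with invented names and relation types; the real histcross query editor is not shown here) finds merchant objects related to a nationality, a trading good, and a location at the same time; a date restriction would simply be a further filter on the relations' start dates:

    # Relations as (from, type, to) triples; a toy knowledge base with invented names.
    relations = [
        ("Giovanni Rossi", "is_merchant_from", "Italy"),
        ("Giovanni Rossi", "traded_in", "pepper"),
        ("Giovanni Rossi", "was_active_in", "India"),
        ("Hans Weber", "is_merchant_from", "Germany"),
        ("Hans Weber", "traded_in", "pepper"),
    ]

    def related(subject, rel_type, target):
        return (subject, rel_type, target) in relations

    subjects = {s for s, _, _ in relations}
    hits = [s for s in subjects
            if related(s, "is_merchant_from", "Italy")
            and related(s, "traded_in", "pepper")
            and related(s, "was_active_in", "India")]
    print(hits)  # ['Giovanni Rossi']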

This semantic quality of histcross has many implications. Quantification of data has already been mentioned. Connected to this are certain types of operations such as counting and measuring the "geography" of the network, and analysing "central" and "peripheral" nodes in the network by extracting certain subsets from the knowledge base. Consequently, sociographical analyses become possible. One of the more powerful features will be elaborated below: the possibility of adding automated rules to create new information.

In order to simulate artificial intelligence it is necessary that a system can somehow deduce new information from existing knowledge. This is generally done by implementing so-called inference rules. histcross uses a relatively simple inference engine, which can nevertheless handle most of the inference requirements likely to arise in historical research. Specifically, the application uses several forms of deductive chains that approximate predicate logic. For example, if we know that Goa is in India and India is in Asia, we can assume that Goa also is in Asia. Logically, we could write is_in(Goa, India) ^ is_in(India, Asia) ==> is_in(Goa, Asia), or, as a general rule: P1(x, y) ^ P2(y, z) ==> P3(x, z). histcross can also implement variants of this formula, e.g. P1(x, y) ^ P2(z, y) ==> P3(x, z). P1, P2 and P3 are not necessarily the same predicates, but will be semantically close in most cases. Some examples:

  • is_mother_of(x,y) ^ is_father_of(z,y) ==> is_husband_of(z,x)
  • is_father_of(x,y) ^ is_father_of(x,z) ==> is_sibling_of(y,z)
  • is_citizen_of(x,y) ^ has_confession(y,z) ==> has_confession(x,z)
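
As a rough sketch of how such candidate inferences can be generated (illustrative code with invented names, not the histcross inference engine itself), the pattern P1(x, y) ^ P2(z, y) ==> P3(x, z) can be applied mechanically to a set of relation triples; as the next paragraph stresses, the results are only proposals for the user to confirm:

    # Known relations as (predicate, x, y) triples; the names are invented.
    facts = {
        ("is_mother_of", "Maria", "Anna"),
        ("is_father_of", "Peter", "Anna"),
        ("is_father_of", "Peter", "Paul"),
    }

    def infer(p1, p2, p3):
        """Apply the pattern P1(x, y) ^ P2(z, y) ==> P3(x, z) to all facts."""
        candidates = set()
        for (q1, x, y) in facts:
            if q1 != p1:
                continue
            for (q2, z, w) in facts:
                if q2 == p2 and w == y and z != x:
                    candidates.add((p3, x, z))
        return candidates - facts  # only genuinely new statements

    # With P1 = is_father_of, P2 = is_mother_of, P3 = is_husband_of this
    # realises the first example rule above.
    print(infer("is_father_of", "is_mother_of", "is_husband_of"))
    # -> {('is_husband_of', 'Peter', 'Maria')}: offered to the user, never asserted automatically.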

It has to be stressed that these inference rules have to be optional implications rather than mandatory ones. The father of a child is not necessarily the husband of his or her mother (in the case of an illegitimate child), and a citizen of a city can follow a confession other than the city's official one. Because of this, whenever a user creates or changes relations, a list of possible inferences is presented to the user to decide upon. As such, inference rules speed up data acquisition and support finding new inferences. This becomes very handy when entering genealogical data, for example. Instead of manually connecting all the family members with father-child, mother-child, sibling, uncle, aunt, etc. relations, only one such relation has to be entered and the others can be deduced from knowledge already in the database. Sometimes this might even uncover relations that have not yet received attention, as the author discovered during his work with histcross.

Usability or the Field Test

With new technology there is always the danger of getting carried away by it without considering any real advantage. Is the application of a semantic database worth the effort of entering all the data into the system? In the course of almost two years the author has entered approximately 1400 objects connected by 6400 relations during his research on the European–Asian pepper and spice trade in the 16th century. Often, the work has been tedious and time-consuming. Sources have to be read and it has to be decided what should be entered into the database. The process of entering data can take quite some time, especially when many new objects have to be created at once and connected to each other (like a group of interrelated persons). The time consumption can be reduced somewhat by creating inference rules.

The main advantage of a semantic database reveals itself with time. Almost 1000 of the 1400 objects in the author's database are persons in the complex trade network of the 16th-century spice trade. Persons involved are related to each other by blood, marriage, nationality or long-standing business relationships. In order to understand the underlying patterns and structures of this trade, it is helpful to reconstruct the network. The preliminary analysis supports the hypothesis that the European economy in the 16th century was controlled by a relatively small class of merchant-bankers. Some very distinct groups with strong connections within and few connections outside can be empirically identified, e.g. the Genoese, the Upper Germans, or the New Christians. The data have been accumulated from various sources: archival records, books, articles, data tables, the Internet. Data on persons are often quite narrow in such sources, since biographies seldom exist and often leave out important aspects. [17] A person might first appear as an apprentice working for a major company in Venice. A decade later he might reappear in Lisbon as a merchant working on his own account, doing business with rivals of his former employer but in a different sector (e.g., precious stones instead of spices). Biographies like this are common, which makes it hard to keep track of all persons involved in the trade; a semantic database makes such a task much easier, as it is possible to gradually add information uncovered on different persons. During the work, new and interesting connections can be discovered by thoroughly accumulating data that, until now, have been too diverse to be grasped fully. This is especially true for merchants working as factors or agents for the big companies like the Affaitati, Fugger, or Ximenes.

Contemporary spelling of names in the 16th century was quite arbitrary, and it can become inscrutable when names are rendered in different languages: Filippo Sassetti might be written as Philippo Saseti, Filipo Sasseti, or Phillippo Sasetti. The German merchant Ulrich Ehinger was called Rodrigo Ynha by the Portuguese and Utz by his friends. Some New Christians used different names synonymously: Lopo Rodrigues d'Évora and Ruy Lopes d'Évora e Veiga are actually the same person! histcross uses the Sphinx [18] search engine to search for similar words in order to make finding similar spellings easier (although spelling variants such as Ehinger and Ynha still have to be added manually).
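
The principle behind such similarity searches can be illustrated with a simple edit-distance measure (histcross itself relies on Sphinx for this; the Levenshtein sketch below merely shows the idea and, like Sphinx, would not catch translated forms such as Ehinger/Ynha):

    def levenshtein(a: str, b: str) -> int:
        """Minimum number of single-character edits turning a into b."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                current.append(min(previous[j] + 1,                 # deletion
                                   current[j - 1] + 1,              # insertion
                                   previous[j - 1] + (ca != cb)))   # substitution
            previous = current
        return previous[-1]

    names = ["Filippo Sassetti", "Philippo Saseti", "Filipo Sasseti", "Ulrich Ehinger"]
    query = "Phillippo Sasetti"
    for name in sorted(names, key=lambda n: levenshtein(query.lower(), n.lower())):
        print(name, levenshtein(query.lower(), name.lower()))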

In sum, there are at least two practical uses for the database. First, it makes gradual compilation of data from diverse sources easier. Secondly, the database can be searched quickly and even makes it possible to find names spelled in a different manner. All the aforementioned advantages make histcross quite intriguing for researchers who have to cope with similar problems.

Conclusion

Up until now, semantic databases have received little attention in historical research (their main application being language and other artificial intelligence analysis). The reasons for this are twofold. Owing to their somewhat different perception of time, historians trust traditional storage media (paper) more than modern ones. (The problems of archiving electronic data are obvious.) Moreover, most historians (mostly computer amateurs) possess only a vague idea of the possibilities of computing. On the other hand, computer scientists have little concept of the qualitative problems historians have to cope with; unlike linguistics, history lacks a well-established field of computers and history [19], in which these concepts could be advanced systematically. Also, semantic concepts in computing are still relatively new and are chiefly used within a comparatively close-knit community of knowledge management specialists (often scientific or commercial ones).

histcross as a historical semantic database has the potential to breach an additional barrier between history and computing, using relatively recent developments in information technology. Knowledge management combined with historical research is a matter worthy of future exploration.

In the future, much more might be explored by way of theories and implementation aspects. Prosopographics could form a base for historical semantic databases. Databases like histcross could be used in historical sociology (like the analysis of elites or clienteles), historical geography, the reconstruction of source material, and the analysis of communication networks or of other networks (like this researcher's own project that attempts to reconstruct merchant networks in the European–Indian spice trade of the 16th century). Naturally, a semantic database is not the ultimate tool for working with historical data, but it can support research work tremendously by automating certain research aspects and supporting easy search and addition of data. To conclude, semantic networking achieves something that may sound odd at first glance: it offers a way to merge qualitative and quantitative aspects of historical data analysis and interpretation.

Notes

1. Examples for both cases: Consider the sentence "Joe hits the cow with the bell" which, even for human beings, requires considerable background knowledge to understand the role of bell. This is the field of natural language processing. (See Helbig 2006 and Gabrilovic 2006.) Bayesian networks are based on probabilistic inferences and are used, for example, in e-mail spam filters. (See Pearl 2000.)

2. W3C (a). Berners-Lee (1999).

3. Benjamins et al. (1999, 687).

4. Stone 1971. http://www.le.ac.uk/ee/pot/lichfield/lichwillsfram.html provides a very simple example of such a database.

5. Jones et al. (1971–92).

6. Reinhard (1996).

7. Analytic Technologies. Batagelj et al.

8. A good example is Häberlein (1998) which uses social network analysis based on UCINET to show family and commercial relationships around the Weyer family in Augsburg in the 16th century.

9. Data taken from http://vlado.fmf.uni-lj.si/pub/networks/data/Ucinet/UciData.htm#padgett.

10. Minsky (1975).

11. More on graph theory can be seen in Bondy/Murty (1976) and Diestel (2005).

12. Helbig (2006).

13. For some examples for historical research supported by XML see Schaefer (2004) and Spaeth (2004). For more on XML see W3C (b), W3C (c).

14. Essentially, XML data is "flat file" data that has to be parsed into usable data sets first. For data sets containing n entries this means that, when searching for one entry, the computer has to consider an average of n/2 entries. The same search in a relational database would take an average of log n lookups. In a database of 3,000 entries, finding an entry in the XML set would thus take an average of 1,500 lookups, whereas a relational database would need around 4. Moreover, relational databases try to keep their indexes in main memory, while an XML file would probably be stored on hard disk, which means another speedup factor of about 1,000.

15. van Benthem (1991).

16. In the latest build of histcross it is possible to enter coordinates for each object and show its location on the globe.

17. One example is Filippo Sassetti, who is generally described as a 'Florentine humanist interested in Sanskrit'. His actual business in India was the pepper and spice trade for the Milanese merchant Giovanni Battista Rovelasca, something few biographies bother to mention in detail.

18. Aksyonoff.

19. Meaning that there are very few chairs of "history and computer science", compared to "computational linguistics", which is very well established within linguistics.

Bibliography

Aksyonoff A.: Sphinx Search Engine. http://www.sphinxsearch.com/.

Analytic Technologies: UCINET 6 Social Network Analysis Software. http://www.analytictech.com/ucinet/ucinet.htm.

Benjamins, V. R., et al. 1999. (KA)2: Building ontologies for the Internet: A mid term report. International Journal of Human-Computer Studies 51: 687–712.

van Benthem, J. 1991: The Logic of Time: A Model-Theoretic Investigation into the Varieties of Temporal Ontology and Temporal Discourse. Dordrecht: Kluwer.

Berners-Lee, T. 1999: Weaving the Web. The Past, Present and Future of the World Wide Web by its Inventor. London: Orion Business.

Bondy, J. A., and U. S. R. Murty. 1976. Graph Theory with Applications, London: Macmillan. http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.html.

Diestel, R. 2005: Graph Theory. Heidelberg: Springer. Graduate Texts in Mathematics, 173. http://www.math.uni-hamburg.de/home/diestel/books/graph.theory/.

Gabrilovic, E. 2006: Resources for Text, Speech and Language Processing, http://www.cs.technion.ac.il/~gabr/resources/books.html.

Häberlein, M. 1998: Brüder, Freunde und Betrüger. Soziale Beziehungen, Normen und Konflikte in der Augsburger Kaufmannschaft um die Mitte des 16. Jahrhunderts. Colloquia Augustana, 9. Berlin: Akademie-Verlag.

Helbig, H. 2006: Knowledge Representation and the Semantics of Natural Language. Berlin: Springer.

Jones A. H. M., et al. 1971–92: The Prosopography of the Later Roman Empire. 3 vols. Cambridge: Cambridge University Press.

Minsky, M. 1975: A Framework for Representing Knowledge. In The Psychology of Computer Vision, edited by P. Winston, 211–277. New York: McGraw-Hill (see also: MIT-AI Laboratory Memo 306, June 1974, http://web.media.mit.edu/~minsky/papers/Frames/frames.html).

Batagelj, V., et al.: Networks/Pajek. Program for Large Network Analysis. http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Pearl, J. 2000: Causality: Models, Reasoning, and Inference, Cambridge: Cambridge University Press. http://bayes.cs.ucla.edu/BOOK-2K/index.html.

Reinhard, W., et al. 1996: Augsburger Eliten des 16. Jahrhunderts. Prosopographie wirtschaftlicher und politischer Führungsgruppen 1500–1620. Berlin: Akademie-Verlag.

Schaefer, M. 2004: Design and Implementation of a Proposed Standard for Digital Storage and Internet-based Retrieval of Data from the Tithe Survey of England and Wales. Historical Methods 37/2:61–72.

Spaeth, D. A. 2004: Representing Text as Data: The Analysis of Historical Sources in XML. Historical Methods 37/2:73–86.

Stone, L. 1971: Prosopography. Daedalus 100/1: 46–71.

W3C (a): Semantic Net Activity. http://www.w3.org/2001/sw/.

W3C (b): Extensible Markup Language (XML). http://www.w3.org/XML/.

W3C (c): XML in 10 points. http://www.w3.org/XML/1999/XML-in-10-points.html.en.