Author: Gunner Lind
Title: A Toolbox for Historical Computing
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
August 1999
Availability:

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact mpub-help@umich.edu for more information.

Source: A Toolbox for Historical Computing
Gunner Lind


vol. 2, no. 2, August 1999
Article Type: Work in Progress
URL: http://hdl.handle.net/2027/spo.3310410.0002.206

A Toolbox for Historical Computing

Gunner Lind

Department of History
University of Copenhagen

The value of machine-readable historical data has been recognized from the beginning of historical computing. Data are conserved and made accessible by many people and in many ways, ranging from the well-established historical sections of some social science data archives to small, individual websites. "Somewhere on the World Wide Web" is not quite the same as "at your fingertips", as we all know, but with some perseverance you will be able to locate a very large fraction of all machine-readable historical data in existence. They may be directly available for queries and/or downloading, or you may find the up-to-date contact information you need.

Historical computing is more than data, however. Over the years much thought has been devoted to methods for working with historical data. It is easy for the experts to underestimate the need for channels for the dissemination of these methods. For obvious reasons historical computing is a field dominated by novices, and will continue to be so for the foreseeable future. Every year there will be new graduate students wanting to work with prosopography and needing a proven database structure which can be adapted to their needs. This is a task which has been solved many times. But the fact that the expert finds it trivial will not make the task any easier if you are a young historian with little computing experience. And even if you are counted among the experts you may need access to tools and methods developed by others. Years of experience with the analysis of historical accounts will not be very helpful if you are asked for advice on family reconstitution by a student or a colleague.

Most of these methods have been developed in the context of a historical project. So they have been directed at specific data. They are embodied in specific programming languages, perhaps even as macros and other internal programming of applications. Still, many of them are generally valuable for a large number of people working with data of a comparable nature, at least if they are described in a way accentuating what is general rather than what is specific.

It is not impossible to find descriptions of methods used in historical computing, but it is difficult. They must be located in old issues of Historical Methods or Historische Sozialforschung, or dug out from Appendix X of a monograph. When they are found, they are often less detailed and less technically explicit than the user may wish, because of the space and readability constraints imposed by printing. In short, we need an online toolbox which can provide both readable texts and other files like the archives and finders for freeware and shareware programs.

In March 1999 I put out an invitation on the AHC-L list to join me in a virtual working group with the purpose of making an actual design for the toolbox. Three persons joined: Juan Grigera (Argentina), Daniel Pfeifer (USA) and Paul Craven (Canada). Since then we have discussed a number of issues. This article presents the toolbox project as it stands now. A number of questions have been clarified, but it is still a work in progress. All contributions to the project will be welcome at the list address ahc@biophnet.unlp.edu.ar.

What would you like to find in the toolbox?

Probably everything of general value for historical computing, except data. These objects can be roughly divided into three categories:

  1. Software used for solving problems in historical computing, such as normalization of names, record linkage, conversion of dates, analysis of accounts...
  2. Descriptions of data structures designed for complex historical data, such as reconstituted families, life histories, document types ...
  3. Supportive material, such as tables for the normalization of names, conversion of measures and currencies...

Files or links?

It is possible to make the objects accessible in two ways. An object can be stored as a file at the toolbox location, or the toolbox may just provide the URL linking to the place where it can be found. The first approach means one-stop shopping, no broken links, no objects disappearing from public view forever. The second approach, however, means no trouble with file handling, no difficulties keeping track of different versions, and potentially richer information for the user, because a URL can provide a link to a website with much richer contents than the usual README and Help files. A small, simple and stable object will be best served with the first approach, a large ongoing project with the second. Perhaps the wiser choice is to provide both possibilities, and this can be implemented in a simple way:

  1. The toolbox catalog guides the user to the URL.
  2. The URL may point to a file at the same site as the catalog - a file warehouse at the toolbox site - or it may point to somewhere else in the world.

In this way bits and pieces with no home of their own can be placed physically at the toolbox site, and objects with their own home on the web can stay where they are. This solution places the decision in the hands of those who want to upload items for the toolbox. They get the choice between uploading the URL for a site they control, or uploading the actual file(s) for storage at the toolbox site. Since they know more than anybody else about the nature of the object they want to store in the toolbox, they should be able to make a good choice.
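The two-step scheme can be illustrated with a minimal sketch in Python. It is not part of the working group's design: the class name `CatalogEntry`, its fields, and the warehouse address `http://toolbox.example.org/warehouse/` are assumptions made only for illustration. The point is that the catalog always hands the user one URL, whether the object lives in the toolbox warehouse or elsewhere.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

# Hypothetical address of the file warehouse at the toolbox site.
WAREHOUSE_BASE = "http://toolbox.example.org/warehouse/"


@dataclass
class CatalogEntry:
    """One object in the catalog (illustrative structure, not a fixed design)."""
    name: str
    external_url: Optional[str] = None  # the object stays at its own home
    stored_file: Optional[str] = None   # the object was uploaded to the warehouse

    def url(self) -> str:
        """Return the single URL the catalog shows to the user."""
        # The contributor must choose exactly one of the two possibilities.
        if (self.external_url is None) == (self.stored_file is None):
            raise ValueError("give exactly one of external_url or stored_file")
        if self.external_url is not None:
            return self.external_url
        # A stored file is addressed relative to the warehouse.
        return urljoin(WAREHOUSE_BASE, self.stored_file)
```

A contributor with a small, stable object would fill in `stored_file`; a contributor with an ongoing project would fill in `external_url`. The user of the catalog sees no difference between the two.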

Describing the contents

The content providers must also provide the basic information for the catalog. A good catalog will consist of a number of indexes, describing both the history and the computing side of the objects in the toolbox. Some indexes will be straightforward, such as an index of authors and an index of object names. An index of historical periods may also be useful, even if not all objects can be assigned to a period.

A type classification is more difficult to make. So far we have identified three different flavors of "type":

  1. Technical type: Is this a binary program, an algorithm, a database structure, an SGML DTD ...? This index may have to be divided into several indexes or (more probably) given a hierarchical structure.
  2. The kind of sources which can be treated with this method. This may be general (like "texts") or more specific ("customs records"). Again, this means several indexes or a hierarchical index structure.
  3. The kind of problem you want to solve: Name studies, family reconstitution, voter behavior ...

A classical controversy in catalog making runs between those who love and those who hate the thesaurus approach. A thesaurus - in this context: a fixed list of classification terms - means clarity and ease of use for those who classify and those who search. But no thesaurus will fit every object, or every user's ideas about the natural terminology, no matter how carefully it is designed.

A combination of approaches is possible, however. Part of the description can be bound to a thesaurus, part of it free. A practical implementation may be to ask those who provide an object to describe it in two ways. First, answer a few questions by picking the most appropriate class from a list. This will provide a thesaurus-bound basic classification within the three indexes outlined above. Second, give a brief text description of the tools and their application, which can provide a basis for free-text searching tools. Such a description will probably be needed anyhow as a part of lists for browsing and responses to queries.
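The combined approach might look roughly like the following Python sketch. Everything in it is an assumption for illustration: the thesaurus lists have not yet been drawn up, so the terms below are simply the examples mentioned in this article, and the names `THESAURUS`, `Description`, `problems` and `search` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder thesaurus lists built from the examples in the text;
# the real lists remain to be designed.
THESAURUS: Dict[str, set] = {
    "technical_type": {"binary program", "algorithm", "database structure", "SGML DTD"},
    "source_kind": {"texts", "customs records"},
    "problem": {"name studies", "family reconstitution", "voter behavior"},
}


@dataclass
class Description:
    name: str
    classes: Dict[str, str]  # thesaurus-bound part: index name -> chosen term
    text: str                # free part: brief description by the provider


def problems(desc: Description) -> List[str]:
    """List what is wrong with a description; an empty list means it is acceptable."""
    found = []
    for index, allowed in THESAURUS.items():
        term = desc.classes.get(index)
        if term not in allowed:
            found.append(f"{index}: {term!r} is not in the thesaurus")
    if not desc.text.strip():
        found.append("free-text description is empty")
    return found


def search(descriptions: List[Description], word: str) -> List[str]:
    """Naive free-text search: names of objects whose description contains the word."""
    word = word.lower()
    return [d.name for d in descriptions if word in d.text.lower()]
```

The fixed lists keep the classification consistent, while the free text catches whatever the lists do not foresee. A real search tool would of course do more than match substrings.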

Conditions of use

Until now we have not modified my original proposal for conditions of use: The contents of the toolbox shall be open for any noncommercial use, provided that the user

  1. notifies the provider
  2. declares that these materials have been used, citing the name of the provider in all appropriate places, i.e. in online and printed documentation of software, in the notes of scholarly works etc.

Contributions are accepted on the following conditions:

  1. That you have the copyright of what you contribute, or can substantiate that it is freely distributable.
  2. That you accept the conditions of use above.
  3. That you accept that the toolbox administrators have the right to refuse the contribution and to remove it from the toolbox.

The future

We need to flesh out some of the points which are presented in outline above, for example the thesaurus lists for description. But the most difficult part will probably be the contents of the toolbox. We want everything of general value. So we want to locate many old contributions to the methodology of historical computing, and to avoid what is commonplace even when it is offered. And we want all contributions in a form that makes them as easy as possible to adapt and apply to new problems. It is not clear how these goals can be conveniently realized. The design of the toolbox is only the first step in any case. The future of the project will depend on the goodwill and labor of many people, most of all of the many who have something to contribute.