Economics and Usage of Digital Libraries: Byting the Bullet
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. For more information, read Michigan Publishing's access and usage policy.
13.3 The ReDIF metadata
From the material that we have covered in the previous section, we can draw a simple organizational model of RePEc as:
Many archives ⇒ One dataset ⇒ Many services
Let us turn from the organization of RePEc to its contents. RePEc is about more than the description of resources. It is probably best to say that RePEc is a relational database about economics as a discipline.
One possible interpretation of the term "discipline" is given by Karlsson and Krichel (1999). They have come up with a model of a discipline as consisting of four elements arranged in a table:
resource | collection
person | institution
A few words may help explain that table. A "resource" is any output of academic activity: a research document, a dataset, a computer program, or anything else that an academic person would claim authorship for. A "collection" is a logical grouping of resources. For example, one collection might consist of all articles that have undergone the peer review process. A "person" is a physical person; a person may also be a corporate body acting as a physical person in the context of RePEc.
These data collectively form a relational database describing not only the papers, but also the authors who write them, the institutions where the authors work, and so on. All this data is encoded in the ReDIF metadata format, as illustrated in the following examples.
A closer look at the contents
To understand the basics of ReDIF it is best to start with an example. Here is a piece of ReDIF data at http://www.econ.surrey.ac.uk/discussion_papers/RePEC/sur/surrec/surrec9601.pdf:[2]
When we look at this record, the ReDIF data resembles a standard bibliographical format, with authors, title, etc. The only thing that appears a bit mysterious here is the "Author-Person" field. This field quotes a handle that is known to RePEc. This handle leads to a record maintained at a RePEc handle server.[3]
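To make the shape of such a record concrete, here is a minimal sketch of reading one template into field/value pairs. The sample record, its handles, and the helper name `parse_template` are all invented for illustration; it is not the Surrey record cited above. The sketch assumes a plain "Field-Name: value" layout in which an indented line continues the previous value.

```python
from collections import defaultdict

# Hypothetical record for illustration only; every value here is invented.
SAMPLE_PAPER = """\
Template-Type: ReDIF-Paper 1.0
Author-Name: Jane Example
Author-Person: RePEc:per:1970-01-01:jane_example
Title: An Illustrative Working Paper
  with a title that continues on a second line
Creation-Date: 1996
Handle: RePEc:aaa:wpaper:9601
"""


def parse_template(text):
    """Parse one 'Field-Name: value' template into {field: [values]}.

    Field names are lower-cased; repeated fields keep all their values.
    """
    fields = defaultdict(list)
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and current is not None:
            # An indented line continues the value of the previous field.
            fields[current][-1] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a field line; skip it
        current = name.strip().lower()
        fields[current].append(value.strip())
    return dict(fields)


paper = parse_template(SAMPLE_PAPER)
print(paper["author-person"])  # the handle pointing at the author's own record
```

Splitting at the first colon only matters here, because the handles themselves contain colons.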
In the person record, we have the handles of documents that the person has written. This record allows user services to list all the papers by a given author. This is obviously useful when we want to find papers that one particular author has written. It is also useful to have a central record of the person's contact details. This eliminates the need to update the relevant data elements on every document record. In fact the record on the paper template may be considered as the historical record that is valid at the time when the paper was written, but the address in the person template is the one that is currently valid.
In the person template, we find another RePEc identifier in the "Workplace-Institution" field. This points to a record that describes the institution, stored at another RePEc handle server.
The information in the institution record is self-explanatory. Less apparent is the origin of these records.
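The way handles tie paper, person, and institution records together can be sketched as a small lookup table keyed by handle. This is only an illustration of the relational idea: all handles and values are invented, and the "author-paper" key for the list of documents in a person record is an assumed name, since the text does not name that field.

```python
# All handles and field values below are invented for illustration.
RECORDS = {
    "RePEc:aaa:wpaper:9601": {
        "type": "paper",
        "title": "An Illustrative Working Paper",
        "author-person": ["RePEc:per:1970-01-01:jane_example"],
    },
    "RePEc:aaa:wpaper:9702": {
        "type": "paper",
        "title": "A Second Illustrative Paper",
        "author-person": ["RePEc:per:1970-01-01:jane_example"],
    },
    "RePEc:per:1970-01-01:jane_example": {
        "type": "person",
        "name": "Jane Example",
        # Assumed field name for the documents the person has written.
        "author-paper": ["RePEc:aaa:wpaper:9601", "RePEc:aaa:wpaper:9702"],
        "workplace-institution": "RePEc:edi:exampleuni",
    },
    "RePEc:edi:exampleuni": {
        "type": "institution",
        "name": "Department of Economics, Example University",
    },
}


def titles_by(person_handle, records=RECORDS):
    """List the titles of all papers named in a person record."""
    person = records[person_handle]
    return [records[h]["title"] for h in person["author-paper"] if h in records]


def current_workplaces(paper_handle, records=RECORDS):
    """Follow paper -> person -> institution and return (author, workplace) pairs."""
    pairs = []
    for person_handle in records[paper_handle]["author-person"]:
        person = records.get(person_handle)
        if person is None:
            continue  # the author has no person record
        institution = records.get(person.get("workplace-institution"))
        if institution is not None:
            pairs.append((person["name"], institution["name"]))
    return pairs


print(titles_by("RePEc:per:1970-01-01:jane_example"))
print(current_workplaces("RePEc:aaa:wpaper:9601"))
```

Because the workplace is stored once, in the person record, a change of affiliation needs a single update rather than one per paper.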
Institutional registration
The registration of institutions is accomplished through the Economics Departments, Institutions and Research Centers (EDIRC) project, compiled by Christian Zimmermann, an Associate Professor of Economics at Université du Québec à Montréal, on his own account, as a public service to the economics profession. The initial intention was to compile a directory of all economics departments that have a web presence. Many departments have a web presence now; about 5,000 of them are registered at the time of this writing. All these records are included in RePEc. For each institution, data on its homepage is available, as well as postal and telephone information. For some, there is even data on the main area of work. Thus it is possible to find a list of institutions where, for example, a lot of work in labor economics is being done. At the moment, EDIRC is mainly linked to the rest of the RePEc data through the HoPEc[4] personal registration service. Other links are possible, but are rarely used.
Personal registration
HoPEc has a different organization from EDIRC. It is impossible for a single academic to register all persons who are active in economics. One possible approach would be to ask archives to register people who work at the related institution. This would make archive maintainers' work more complicated, but the overall maintenance effort would be smaller once all current authors are registered. However, authors move between archives, and many have work that appears in different archives. To date, there is no satisfactory way to deal with moving authors. For this reason, the author registration is carried out using a centralized system.
A person who is registered with HoPEc is identified by a string that is usually close to the person's name and by a date that is significant to the registrant. HoPEc suggests the birth date but any other date will do as long as the person can remember it. When registrants work with the service, they first supply such personal information as the name, the URL of the registrant's homepage, and the email address. Registrants are free to enter data about their academic interests—using the Journal of Economic Literature Classification Scheme—and the EDIRC handle of their primary affiliation.
When the registrant has entered this data, the second step is to create associations between the record of the registrant and the document data that is contained in RePEc. The most common association is the authorship of a paper; however, other associations are possible, for example the editorship of a series. The registration service then looks up the name of the registrant in the RePEc document database. The registrant can then decide which potential associations are relevant. Because authentication methods are weak, HoPEc relies on honesty.
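The two-step flow described above (look up the registrant's name, then let the registrant decide) can be sketched as follows. The matching rule used here, surname plus first initial, is an assumption chosen for illustration; the text does not say how HoPEc compares names. The document list, the handles, and the function names are likewise hypothetical.

```python
def name_key(name):
    """Reduce 'Jane Q. Example' or 'Example, J.' to ('example', 'j')."""
    if "," in name:
        last, _, first = name.partition(",")  # 'Surname, Given' order
    else:
        parts = name.split()
        if not parts:
            return ("", "")
        first, last = " ".join(parts[:-1]), parts[-1]  # 'Given Surname' order
    return (last.strip().lower(), first.strip()[:1].lower())


# Invented document data standing in for the RePEc document database.
DOCUMENTS = [
    {"handle": "RePEc:aaa:wpaper:9601", "authors": ["Jane Example"]},
    {"handle": "RePEc:bbb:journl:v1p1", "authors": ["Example, J.", "A. Other"]},
    {"handle": "RePEc:ccc:wpaper:0003", "authors": ["Alex Other"]},
]


def candidate_documents(registrant_name, documents):
    """Step 1: propose every document with an author name that might match."""
    key = name_key(registrant_name)
    return [
        doc["handle"]
        for doc in documents
        if any(name_key(author) == key for author in doc["authors"])
    ]


def confirm(candidates, accepted):
    """Step 2: keep only the candidates the registrant accepted."""
    return [handle for handle in candidates if handle in accepted]


candidates = candidate_documents("Jane Example", DOCUMENTS)
associations = confirm(candidates, accepted={"RePEc:aaa:wpaper:9601"})
print(candidates)
print(associations)
```

A loose rule like this deliberately over-proposes: "Example, J." could be a different person, which is why the final decision rests with the registrant.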
There are several significant problems that a service like HoPEc faces. First, since there is no historical precedent for such a service, it is not easy to communicate the raison d'être of the service to a potential registrant. Second, some people think that they need to register in order to use RePEc services. While this delivers data about who is interested in using RePEc services (and to whom we have failed to communicate that these services are free), it clutters the database with records of limited usefulness. Last but by no means least, there are all kinds of privacy issues involved in the composition of such a dataset.
To summarize, HoPEc provides information about a person's identity, affiliation and research interests, and links these data with resource descriptions in RePEc. This allows the identification of a person and the maintenance of related metadata in a timely and cost-efficient way. These data could fruitfully be employed for other purposes, such as maintaining membership data for scholarly societies or lists of conference participants.