Economics and Usage of Digital Libraries: Byting the Bullet
The architecture of RePEc
RePEc can be understood as a decentralized academic publishing system for the economics discipline. RePEc allows researchers' departments and research institutes to participate in a decentralized archival scheme which makes information about the documents that they publish accessible via the Internet. Individual researchers may also openly contribute, but they are encouraged to use EconWPA.
Each contributor needs to maintain a separate collection of data using a set of standardized templates. Such a collection of templates is called an "archive". An archive operates on an anonymous ftp server or a Web server controlled by the archive provider. Each archive provider has total control over the contents of its archive. There is no need to transmit documents elsewhere. The archive provider retains the liberty to post revisions or to withdraw a document.
An example archive. Let us look at an example. The archive of the OECD is at http://web.archive.org/web/20010829193045/http://www.oecd.org/eco/RePEc/oed/. In that directory we find two files. The first is oedarch.rdf:
This file gives basic characteristics about the archive. It associates a handle with it, gives an email address for the maintainer, and most importantly, provides the URL where the archive is located. This archive file gives no indication about the contents of the archive. The contents list is in a second file, oedseri.rdf:
This file lists the content as a series of papers. It associates some provider and maintainer data with the series, and it associates a handle with the series. The format that both files follow is called ReDIF. It is a purpose-built metadata format. Appendix B discusses technical aspects of the ReDIF metadata format that is used by RePEc. See Krichel (2000) for the complete documentation of ReDIF.
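To make the description of these template files more concrete, here is a minimal sketch of reading a ReDIF-style template in Python. It assumes a simplified reading of the format: one "field: value" pair per line, with indented lines continuing the previous value. The sample values are hypothetical and are not the contents of the OECD files. The field spellings (Template-Type, Handle, Name, Maintainer-Email, URL) are assumptions modelled on the fields described above; Krichel (2000) remains the authoritative reference.

```python
def parse_template(text):
    """Parse one ReDIF-style template into a dict of field name -> list of values.

    Simplified sketch: field names are lower-cased, repeated fields are kept
    in order, and indented lines are treated as continuations.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace() and last_key is not None:
            # Continuation line: append to the most recent value.
            fields[last_key][-1] += " " + raw.strip()
            continue
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # not a "field: value" line; ignored in this sketch
        last_key = key.strip().lower()
        fields.setdefault(last_key, []).append(value.strip())
    return fields


# Hypothetical archive template, for illustration only.
SAMPLE_ARCHIVE = """\
Template-Type: ReDIF-Archive 1.0
Handle: RePEc:xxx
Name: Example Institute
  Working Paper Archive
Maintainer-Email: maintainer@example.org
URL: http://www.example.org/RePEc/xxx/
"""

archive = parse_template(SAMPLE_ARCHIVE)
print(archive["handle"][0], archive["url"][0])
```

Splitting on the first colon only keeps values that themselves contain colons, such as handles and URLs, intact.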
The documents themselves are also described in ReDIF. The location of the paper descriptions is found by appending the handle to the URL of the archive, i.e. at http://web.archive.org/web/20010627025821/www.oecd.org/eco/RePEc/oed/oecdec/. This directory contains ReDIF descriptions of documents. It may also contain the full text of documents. It is up to the archive to decide whether to store the full text of documents inside or outside the archive. If the document is available online, inside or outside the archive, a link may be provided to the place where the paper may be downloaded. Note that the document need not be the full text of an academic paper; it may also be an ancillary file, e.g. a dataset or a computer program.
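The rule for locating a series directory can be sketched as a small function. The sketch assumes that a series handle has the form RePEc:archive:series and that only its last component is appended to the archive URL. That assumption is consistent with the OECD URLs quoted above, but the handle value used here is inferred rather than quoted from the archive files. The function name is hypothetical.

```python
def series_directory_url(archive_url, series_handle):
    """Return the directory holding the ReDIF descriptions of one series.

    Assumes a handle of the form "RePEc:<archive>:<series>" (an assumption
    of this sketch) and appends the series component to the archive URL.
    """
    parts = series_handle.split(":")
    if len(parts) != 3 or parts[0] != "RePEc" or not all(parts):
        raise ValueError("unexpected series handle: %r" % series_handle)
    series_code = parts[2]
    return archive_url.rstrip("/") + "/" + series_code + "/"


url = series_directory_url("http://www.oecd.org/eco/RePEc/oed/",
                           "RePEc:oed:oecdec")
print(url)
```

Stripping and re-adding the trailing slash means the function gives the same result whether or not the archive URL ends in a slash.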
Participation does not imply that the documents are freely available. Thus, a number of journals have also permitted their contents to be listed in RePEc. If a user's institution has made the requisite arrangements with publishers (e.g. JSTOR for back issues of Econometrica or Journal of Applied Econometrics), the links in RePEc give direct access to the documents.
Using the data on archives. One way to make use of the data would be to have a web page that lists all the available archives and allows users to navigate the archives in search of documents of interest. However, that would be a primitive way to access the data. First, the data as shown in ReDIF form is not itself hyperlinked. Second, there is neither a search facility nor any filtering of contents.
Providing services that allow for convenient access is not a concern for the archives, but for user services. User services render the RePEc data in a form that makes it convenient for a user. User services are operated by members of the RePEc community, libraries, research projects, etc. Each service has its own name. There is no "official" RePEc user service. A list of services in operation at the time of writing may be found in Appendix A.
User services are free to use RePEc data in whatever way they see fit, as long as they observe the copyright statement for RePEc. This statement places some constraints on the usage of RePEc data:
Within the constraints of that copyright statement, user services are free to provide all or any portion of the RePEc data. Individual user services may place further constraints on the data, such as quality or availability filters.
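As an illustration of an availability filter, the following sketch keeps only those records that carry a download link. It assumes that each record has already been read into a dictionary mapping lower-cased field names to lists of values, and that the download link is stored under a field called file-url. Both the record layout and the sample records are hypothetical.

```python
def only_available_online(records):
    """Keep the records that carry at least one non-empty download link."""
    return [r for r in records
            if any(u.strip() for u in r.get("file-url", []))]


# Hypothetical records, for illustration only.
records = [
    {"handle": ["RePEc:xxx:wpaper:0001"],
     "title": ["A hypothetical paper"],
     "file-url": ["http://www.example.org/RePEc/xxx/wpaper/0001.pdf"]},
    {"handle": ["RePEc:xxx:wpaper:0002"],
     "title": ["Another hypothetical paper"]},
]

kept = only_available_online(records)
print([r["handle"][0] for r in kept])
```

A quality filter would have the same shape, with a different test applied to each record.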
Because all RePEc services must be free, user services compete through quality rather than price. All RePEc archives benefit from simultaneous inclusion in all services. This leads to a breadth of dissemination that a proprietary system cannot afford.
Building user services. The provision of a user service usually starts with putting frequently updated copies of RePEc archives on a single computer system. This maintenance of a frequently updated copy of archives is called "mirroring". Everything contained in an archive may be mirrored. For example, if a document is in the archive, it may be mirrored. If the archive management does not wish the document to be mirrored, it can store it outside the archive. The advantage of this remote storage is that the archive maintainer will get a complete set of access logs to the file. The disadvantage is that every request for the file will have to be served by the archive provider's own server rather than by the RePEc site that the user is accessing.
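The core decision in one mirroring pass can be sketched as follows. This is not the mirroring software used by RePEc sites; it is a minimal illustration that assumes a site can obtain, for both the remote archive and its local copy, a listing of file paths with last-modified timestamps. The function name, file names, and timestamps are hypothetical.

```python
def plan_mirror_update(remote, local):
    """Compare two listings (path -> last-modified timestamp).

    Returns (to_fetch, to_delete): files that are new or newer in the remote
    archive, and files the archive has withdrawn since the last pass.
    """
    to_fetch = sorted(path for path, mtime in remote.items()
                      if path not in local or local[path] < mtime)
    to_delete = sorted(path for path in local if path not in remote)
    return to_fetch, to_delete


# Hypothetical listings: timestamps are seconds since an arbitrary epoch.
remote = {"xxxarch.rdf": 100, "xxxseri.rdf": 250, "wpaper/0003.rdf": 300}
local = {"xxxarch.rdf": 100, "xxxseri.rdf": 200, "wpaper/0001.rdf": 150}

to_fetch, to_delete = plan_mirror_update(remote, local)
print(to_fetch)   # files to download
print(to_delete)  # files to remove from the mirror
```

Removing files that have disappeared from the remote listing is what allows a withdrawal by the archive provider to reach the mirror.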
An obvious way to organize the mirroring process overall would be to mirror the data of all archives to a central location. This central location would in turn be mirrored to the other RePEc sites. The founders of RePEc did not adopt that solution because it would be quite vulnerable to mistakes at the central site. Instead, each site installs the mirroring software and mirrors its own data. Not all sites adopt the same frequency of updating. Some may update daily, while some may only update weekly. A disadvantage of this system is that it is not known how long it takes for a new item to be propagated through the system.