    13.4 The open library

    This section attempts to find a general theory applicable to a wide set of circumstances in which systems similar to RePEc are desirable. I call this general concept the open library. The parallel to the open source concept is intentional. It is therefore useful to review the open source concept first.

    The open source concept

    There is no official, formal definition of what the term "open source" means. The website of the Open Source Initiative at http://opensource.org/ offers an elegant introduction to the idea:

    The basic idea behind open source is very simple. When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing.

    We in the open source community have learned that this rapid evolutionary process produces better software than the traditional closed model, in which only a very few programmers can see the source and everybody else must blindly use an opaque block of bits.

    Open source software imposes no restrictions on the distribution of the source code required to build a running version of the software. As long as users have no access to the source code, they may be able to use a running version of the software, but they cannot change the way that the software behaves. Changing that behavior involves modifying the source code and rebuilding the running version of the software from it. Since building the software from the source code is quite straightforward, software that has freely available source code is essentially free.

    Open Source and open library

    The open source movement claims that building software in an open, collaborative way, enabled by the sharing of the source code, allows software to be built better and faster. The open library concept is an attempt to apply the open source concept to a library setting. We start off with the RePEc experience.

    Within the confines of RePEc as a document collection, it is unrealistic to expect free distribution of a document's source code. Such a source code is, for example, the word processor file of an academic paper. If such a source code were available for others to change, then the ownership of the intellectual property in the document would be dissolved. Since intellectual ownership over scientific ideas is crucial in the academic reward system, it is unlikely that such source code distribution will take place. Within the confines of RePEc's institutional and personal collection, there is no such source code that could be freely shared.

    To apply the open source principle to RePEc we must conceptualize RePEc as a collection of data. In terms of the language adopted by the open source concept, the individual data record is the "source code". The way the data record is rendered in the user interface is the "software" as used by the end user. We can then define the open library as a collection of data records that has a few special properties.
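
    The analogy can be illustrated with a minimal sketch. The record layout, the field names (`handle`, `title`, `author`), and the handle value are hypothetical and are not taken from RePEc's actual format. The point is only that one data record, the "source code", can be rendered by several independent user services, the "software".

```python
# Hypothetical data record: the "source code" in the open source analogy.
# Field names and the handle value are illustrative assumptions.
record = {
    "handle": "example:paper:0001",
    "title": "An Illustrative Working Paper",
    "author": "A. N. Author",
}


def render_plain(rec):
    """One user service renders the record as a plain-text citation."""
    return f"{rec['author']}. {rec['title']}. [{rec['handle']}]"


def render_html(rec):
    """Another user service renders the same record as an HTML list item."""
    return f"<li><b>{rec['title']}</b>, {rec['author']}</li>"


print(render_plain(record))
print(render_html(record))
```

    Because both renderings read the same record, a correction to the record is picked up by every service that displays it.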

    The definition of the open library

    An open library is a collection of data records that has the following characteristics:

    • Every record is identified by a unique handle. This requirement distinguishes the library from an archive. It allows for every record to be addressed in an unambiguous way. This is important if links between records are to be established.

    • The syntax in all records of field names and field values is homogeneous. This constraint causes the open library to appear like a database. If this requirement were not present, all public access pages on the Web would form an open library. Note that this requirement does not constrain the open library to contain a homogeneous record format.

    • The documentation of the record format is available for online public access. For example, a collection encoded in MARC format would not qualify as an open library because access to the documentation of MARC is restricted. Without this requirement the cost of acquiring the documentation would be an obstacle to participation.

    • The collection is accessible on a public access computer system. This is the precondition to allow for the construction of user services. Note that user services may not necessarily be open to public access.

    • Contributing to the collection is without monetary cost. There are of course non-monetary costs of contributing to the open library. However, the general principle is that there is no need to pay for either contributing to or using the library. The copyright status of data in an open library should be subject to further research.
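
    The first two characteristics can be sketched in a few lines of code. The sketch assumes a hypothetical "field: value" syntax with blank lines between records. The helper names `parse_records` and `index_by_handle` and the sample handles are illustrative and do not describe any existing system. Note that the two sample records share one syntax but not one record format.

```python
def parse_records(text):
    """Parse blank-line-separated records made of 'field: value' lines."""
    records = []
    for block in text.strip().split("\n\n"):
        rec = {}
        for line in block.splitlines():
            # Split at the first colon only, so values may contain colons.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a 'field: value' line: {line!r}")
            rec[name.strip().lower()] = value.strip()
        records.append(rec)
    return records


def index_by_handle(records):
    """Build a handle -> record index, rejecting missing or duplicate handles."""
    index = {}
    for rec in records:
        handle = rec.get("handle")
        if handle is None:
            raise ValueError("record without a handle")
        if handle in index:
            raise ValueError(f"duplicate handle: {handle}")
        index[handle] = rec
    return index


# Two records with the same syntax but different sets of fields.
SAMPLE = """\
handle: example:paper:0001
title: An Illustrative Working Paper

handle: example:person:0001
name: A. N. Author
"""

library = index_by_handle(parse_records(SAMPLE))
print(sorted(library))
```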

    The open library and the Open Archive

    Stimulated by the work of Van de Sompel, Krichel, Nelson, et al. (2000), there have been recent moves towards improving the interoperability of e-print archives such as arXiv.org, NCSTRL, and RePEc. This work is now called the Open Archives Initiative (OAI); see http://www.OpenArchives.org . The basic business model proposed by the OAI is very close to that of the RePEc project. In particular, the open archive technical protocols allow data provision to be separated from data implementation, a key feature of the open library model as pioneered by RePEc since 1997. In addition, because of their ability to transport multiple data sets, the open archive protocols allow for several open libraries to be established on one physical system.

    The conceptual challenge raised by the open library

    The open library as defined in Subsection 13.4 may be a relatively obvious concept. It certainly is not an elaborate intellectual edifice. Nevertheless, the open library idea raises some interesting conceptual challenges.

    Supply of information. To me as a newcomer to the Library and Information Studies (LIS) discipline, there appears to be a tradition of emphasizing the behavior of the user who demands information rather than the publisher—I use the word here in its widest sense—who supplies it. I presume this orientation comes from the tradition that almost all bibliographic data were sold by commercial or not-for-profit vendors, just as the documents that they describe. Libraries then see their role as intermediaries between the commercial supply and the general public. In that scenario, libraries take the supply of documents and data as given.

    The open library proposes to build new supply chains for data. If all libraries contribute metadata (data about data) about objects that are local to them, then a large open library can be built. What "local" means here would have to be defined.

    An open library will only be as good as the data that contributors give to it. It is therefore important that research be conducted on what data contributors are able to contribute; on how to provide documentation that the contributor can understand; and on understanding a contributor's motivation.

    Digital updatability. For a long time, libraries could only purchase material that is essentially static. It might decay physically, but the content is immutable. The advent of digital resources provoked a debate. Because they may be changed at any time, digital resources may be used for more than the preservation of ideas. Traditionally inclined libraries have demanded that digital resources be like non-digital resources in all but appearance, and view the mutability of digital data more as a threat than as an opportunity. The open library, however, is more concerned with digital updatability than preservation. Clearly, this transition from static to dynamic resources poses a major challenge to the LIS profession.

    Metadata quality control. In the case of a decentralized dataset, an important problem is to maintain metadata quality. Some elements of metadata quality can be controlled by a computer. For example, each record must use a structure of fields and values associated with these fields to be interoperable with other records. In some cases the field value only makes sense if it has a certain syntax. This is the case, for example, with an email address. One way to achieve quality control is through the use of relational metadata. Each record has an identifier. Records can use the identifiers of other records. It is then possible to update elements of the dataset in an independent way. It is also simple to check whether the handle referenced in one record corresponds to a valid handle in the dataset. Highly controllable metadata systems are an important research concern related to the open library concept.
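
    The two checks named above, the syntax of a field value and the resolution of a referenced handle, can be sketched as follows. The field names `email` and `author-handle`, the function `check_dataset`, and the sample records are assumptions made for illustration. The email pattern is deliberately loose and is not a full address validator.

```python
import re

# Deliberately loose pattern: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_dataset(records, email_fields=("email",), link_fields=("author-handle",)):
    """Return a list of (handle, problem) pairs found in the dataset."""
    handles = {rec["handle"] for rec in records}
    problems = []
    for rec in records:
        # Syntax check: a field value must look like an email address.
        for field in email_fields:
            value = rec.get(field)
            if value is not None and not EMAIL_RE.match(value):
                problems.append((rec["handle"], f"bad {field}: {value}"))
        # Relational check: a referenced handle must exist in the dataset.
        for field in link_fields:
            target = rec.get(field)
            if target is not None and target not in handles:
                problems.append((rec["handle"], f"dangling {field}: {target}"))
    return problems


records = [
    {"handle": "example:person:0001", "name": "A. N. Author", "email": "author@example.org"},
    {"handle": "example:person:0002", "name": "B. Writer", "email": "writer-at-example.org"},
    {"handle": "example:paper:0001", "title": "Paper", "author-handle": "example:person:0001"},
    {"handle": "example:paper:0002", "title": "Other", "author-handle": "example:person:0009"},
]

for handle, problem in check_dataset(records):
    print(handle, "->", problem)
```

    Because a paper record stores only the handle of its author, the person record can be corrected independently, and the only thing left to verify is that the handle still resolves.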