    Appendix B: The ReDIF metadata format

    The ReDIF metadata format is inspired by Deutsch et al. (1994), commonly known as the IAFA templates. In particular, it borrows the idea of clusters from that draft:

    There are certain classes of data elements, such as contact information, which occur every time an individual, group or organization needs to be described. Such data as names, telephone numbers, postal and email addresses etc. fall into this category. To avoid repeating these common elements explicitly in every template below, we define "clusters" which can then be referred to in a shorthand manner in the actual template definitions.

    ReDIF takes a slightly different approach to clusters. A cluster is a group of fields that jointly describe a repeatable attribute of the resource. This is best understood by an example. A paper may have several authors. For each author we may have several fields of interest: name, email address, homepage, etc. If we have several authors, then we have several such groups of attributes. In addition, each author may be affiliated with several institutions, and each institution may in turn be described by several attributes for its name, homepage, etc. Thus, a nested data structure is required.

    It is evident that this requirement is best served by a syntax that explicitly allows for nesting, such as XML. However, when ReDIF was designed in 1997, XML was not yet available. The template syntax is more readable for humans and easier to understand, but software cannot tell which attributes belong to the same cluster unless some ordering is imposed. Therefore we proceed as follows. For each group of attributes that make up a cluster, we designate one attribute as the "key" attribute. Whenever the key attribute appears, a new cluster begins. For example, if the cluster describes a person, then the name is the key. If an "author-email" appears without an "author-name" preceding it, the parsing software aborts the processing of the template.
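
    The key-attribute rule can be illustrated with a minimal Python sketch. It is not the actual ReDIF parsing software: the names parse_template, TemplateError and CLUSTER_KEYS are hypothetical, only a single, non-nested author cluster is handled, and the input is assumed to consist of one "attribute: value" pair per line.

```python
class TemplateError(ValueError):
    """Raised when a cluster attribute appears before its key attribute."""


# Hypothetical cluster table: attribute prefix -> key attribute of the cluster.
CLUSTER_KEYS = {"author": "author-name"}


def parse_template(text):
    """Group 'attribute: value' lines into plain fields and clusters."""
    record = {"fields": []}
    for prefix in CLUSTER_KEYS:
        record[prefix] = []

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise TemplateError(f"not an attribute line: {line!r}")
        name = name.strip().lower()
        value = value.strip()
        prefix = name.split("-", 1)[0]

        if prefix in CLUSTER_KEYS:
            clusters = record[prefix]
            key = CLUSTER_KEYS[prefix]
            if name == key:
                # The key attribute always opens a new cluster.
                clusters.append({})
            elif not clusters:
                # Cluster attribute with no preceding key: abort the template.
                raise TemplateError(f"{name} appears before {key}")
            clusters[-1][name] = value
        else:
            record["fields"].append((name, value))
    return record


SAMPLE = """\
Title: A sample paper
Author-Name: Ada Example
Author-Email: ada@example.org
Author-Name: Ben Example
Author-Homepage: http://example.org/~ben
"""

record = parse_template(SAMPLE)
```

    Here record["author"] holds two clusters, one per author, because each "Author-Name" line starts a new one. Nested clusters, such as the institutions of an author, would need the same rule applied one level down, which this sketch leaves out.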

    Note that the designation of key attributes is not a feature of ReDIF itself but of its template syntax. It is only this syntax that makes nesting more involved. I do not think that this is an important shortcoming, because I believe that the nested structure involving persons and organizations should not be included in the document templates at all. Instead, the personal information should be moved out of the document templates into separate person templates. This approach is discussed extensively in the main body of the paper.

    ReDIF is a metadata format that comes with tools to make it easy to use in a framework where the metadata is harvested. A file that is simply harvested from a computer system could contain any type of digital content. Therefore the harvested data must be parsed by special software that filters it. This task is accomplished by the rr.pm module written by Ivan V. Kurmanov. It parses ReDIF data and validates its syntax. For example, any date within ReDIF has to be of the ISO 8601 form yyyy-mm-dd. A date like "14 Juillet 1789" would not be recognized by the ReDIF reading software and would not be passed on to the application software that a service provider would use.
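
    The date check described above can be sketched in a few lines of Python. This is an illustration only, not the code of rr.pm (which is a Perl module); the function name is_valid_redif_date is hypothetical, and the sketch assumes that only the full yyyy-mm-dd form is accepted, as stated in the text.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_redif_date(value):
    """Return True if value has the form yyyy-mm-dd and names a real day."""
    match = _DATE_RE.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # datetime.date rejects impossible dates such as February 30.
        date(year, month, day)
    except ValueError:
        return False
    return True


candidates = ["1789-07-14", "14 Juillet 1789", "1789-02-30"]
accepted = [value for value in candidates if is_valid_redif_date(value)]
```

    Only the first candidate survives the filter. A harvesting pipeline built along these lines would drop or flag the other two values rather than hand them to the service provider's application software.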

    The rr.pm software uses a formal syntax specification, redif.spec. This formal specification is itself encoded in a purpose-built format code-named spefor. Because the syntax rules live in this specification file rather than in the parser itself, it is possible for ReDIF-using communities to change the syntax restrictions or even to design a whole new metadata vocabulary from scratch.