Author: Tatiana I. Filimonova
Title: The "Depository" Information Retrieval System: New Facilities to Research and Present Archival Documents
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
August 2001

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact for more information.

Comprehensive manuscript investigation is a continuous process involving regular recording of new findings, and it therefore requires automated frameworks. The "Depository" information retrieval system supports research cataloguing of archival material, user-formulated requests for research or educational purposes, and record printing in required formats. It implies a hierarchy of different types of directories providing various levels of access to information and accounting for the different tasks of staff, investigators, and general users. The structure incorporates digitized documents and supports image data, opening the way to palaeographic, codicological, textual, and content analysis of the material under study.

Rapid development of computer engineering and advanced Internet technologies has opened up new ways to improve the form and quality of education. Today there are strong grounds for believing that libraries, archives and museums are not mere repositories of human cultural heritage but rather sources of unlimited opportunities for interaction and interpenetration of different cultures, science and education in society. The contribution of advanced technologies to worldwide processes of cultural integration can therefore hardly be overestimated.

On the other hand, current developments in information and telecommunications also indicate that specialist knowledge, depending on the specific field, "becomes obsolete" within 3-5 years. This poses the special problem of working out a system for providing knowledge which, because of its systemic and comprehensive nature, should constantly lead the human race to new levels in understanding the social and natural world.

Providing information on all available items of cultural heritage is among the central missions of libraries. While all major world depositories undertake electronic processing of new acquisitions, archival and, in particular, manuscript collections in electronic formats are far fewer than book publications because their records require a more complex structure.

The situation is explained by a number of objective reasons. Large-scale book production generally overshadows the need to implement automated processes for manuscript collections. The financial costs of developing an electronic manuscript catalogue far exceed the public demand, because archives are used less frequently than printed publications.

But the importance of facilitating public access to valuable historical sources and their availability for research, irrespective of geographical distance, and the fundamentally new level of processing, collection description, and preservation of endangered texts and images, govern the need for electronic catalogues of manuscript collections. Realization of this goal, aiming at promotion of humanistic personal development — a task incomparable in social value — will eventually justify all possible costs. The National Library of Russia is among the centers where the mentioned obstacles are being overcome.


The Library was founded in 1795 by order of Catherine II, and was one of the first public libraries in Europe. With book collections exceeding 32 million volumes, the Library also hosts one of the world's richest archives. Its holdings, representing different civilizations and cultures, include over 400,000 items, spanning the period from the 2nd century BC to the end of the 20th century.

The total holdings of the Slavic manuscript collection now amount to over 30,000 items, and include Old Slavonic, Bulgarian and Serbian manuscripts. These are kept in 42 collections. Among them we find the famous Ostromir Gospel, the earliest known Russian book dated 1057.

Large library collections came from ecclesiastical and secular institutions such as the Cyril-Belozersky monastery, the Saint-Petersburg Academy of Theology, etc., incorporating nearly 10,000 items dated from the 11th to the 19th centuries. The Russian archival collection of the 18th-20th centuries has about half a million storage units and is one of the richest and most significant collections in the country. The papers represent the economic, political, spiritual, and cultural life of Russia. The archival collection possesses a great number of documents on Russian literature, fine arts, theatre, and music. The latter section includes handwritten materials by M.I.Glinka, A.S.Dargomyzhsky, M.P.Musorgsky, C.A.Cui, A.P.Borodin, P.I.Chaikovsky, A.K.Glazunov, A.G.Rubinstein, N.A.Rimsky-Korsakov, and others.

The unique collections of West European manuscripts in Petersburg contain 6,000 fifth- to twentieth-century codices and over 70,000 documents. Among them are: a manuscript of St. Augustine (5th century) and the Venerable Bede's Ecclesiastical History of the English People (746); the only surviving copy of a hymn by Caedmon, the first Anglo-Saxon poet in the 7th century; and the world-renowned Grandes Chroniques de France illuminated by Simon Marmion, the 15th-century "Prince of Miniature". There are medieval tales of chivalry, poems, treatises, and psalm books. Most of the "gems" came from royal or princely libraries in France. The most famous book in the collection belonged to Mary, Queen of Scots, who, according to legend, took it to the scaffold.

Other collections are no less valuable, in particular the Greek manuscripts. The specimens of ancient writing include papyri of the 2nd to 4th centuries BC, fragments of the famous Codex Sinaiticus, the Porphyrian Gospel (835) and the Psalter (862), and the 10th-century Gospel of Trebizond.

The Eastern manuscript collection demonstrates the evolution of writing in the East. Rich information can be derived from Egyptian papyri, texts written on palm leaves, stone, leather, parchment, boards, birch bark, paper, or metal plates, or embroidered in silk or painted on canvas.

Manuscripts in eastern languages have been acquired by the Imperial Public Library since its foundation, and by the time of its formal inauguration in 1812 there were 183, including 103 from the Russian diplomat and collector P. P. Dubrovsky. Large acquisitions in the first years were associated with Russian military victories in wars against Persia and Turkey, amounting to 429 Persian, Arabic and Turkish manuscripts. Subsequent developments were due to the collecting activities of Russian diplomats and missionaries. The richest private collection belonged to the Karaite traveller, merchant and archaeologist Abraham Firkovich, and includes about 18,000 storage items and private archives of over a thousand items.

One of the library's unique divisions is the Plekhanov House, which holds archives and book publications. It was established in 1928 as a research center for the history of European and Russian social thinking and working-class movements. The department was founded with the Plekhanov family's donation to the Soviet Union, including the archives and private library of the prominent philosopher and leader of international social and revolutionary activity Georgii Valentinovich Plekhanov (1856 - 1918), members of the "Emancipation of Labour" group, and their comrades-in-arms and confederates.  

The collection also includes extensive archives of Saint-Petersburg Theological Academy. To date, the Plekhanov House holds 44 collections amounting to over 30,000 items and covering the period between 1832 and 2000. They incorporate documents on Russian and international history, philosophy, art, history of religions and religious studies, fine literature, economics, and a range of illustrative material. An important part is the extensive correspondence of the holders and members of their milieu. Georgii V. Plekhanov's private collections include:

  • Over 9,000 volumes of books and periodicals in 18 European languages, including 1,500 copies with notes in Plekhanov's hand.
  • G. V. Plekhanov's archives of 5,030 items, comprising biographical material, works (including edited and manuscript versions, preparatory material such as plans, notes, page-to-page book reviews, abstracts, extracts from books, and fragments of translations), correspondence, handwritten materials of other persons, leaflets, and illustrative material.

Early efforts of the Archive cataloguing team faced a range of problems concerning attribution. These are unresolved to date in part due to the following problems: a) many of Plekhanov's papers have hierarchical pagination and contain unrelated notes on their back sides; b) there are a number of disconnected sheets preserving titles and fragments or separate portions of either lost or uncompleted works; c) in numerous cases Plekhanov's dictation was taken by secretaries or assistants, whose manner of writing is seldom identifiable; d) Plekhanov's own hand and style varied over time. Another difficulty is associated with attributing and explaining Plekhanov's marginalia in the books: some are partially cropped in binding, others, made in pencil, are gradually becoming obliterated. All these peculiarities of G.V.Plekhanov's papers have been specified here because, with very few exceptions, they constitute characteristic features of the Modern History documents, which make up the major part of the collections of national archives, and, consequently, the institutions have to deal with the same dilemmas.

In the course of two hundred years the Library has carried out publication and research of the most significant materials from its collections. The results were registered, apart from printed editions, in voluminous catalogues and card files. In many ways their scholarly significance is still indubitable. At the same time, it must be admitted that a number of important characteristics could not be recorded by means of traditional card files. Their abundance and variety, the impossibility of using them simultaneously in research, and the need to handle originals that are not always obtainable have governed the need to accumulate the previously recorded information as a comprehensive and consistent body of knowledge on the subject under study, in the form of an electronic catalogue.


What new facilities, compared with traditional ones, has our group aimed at? a) the possibility of presenting a required amount of information simultaneously; b) the possibility of demonstrating detailed images, especially desirable for historical documents; c) the capability of locating necessary segments of the document in order to carry out searches using the directories incorporated into the catalogue; d) the possibility of organizing associative searching. Have we succeeded in solving all the problems? Not at the moment. "The Depository", as the information retrieval system that has been produced is called, has been developed in the FOXPRO environment. Its second version will be developed in the VISUAL C++ environment, but the basic principles of the catalogue will not be affected by the corresponding alterations. The paper below formulates some basic conceptual principles that make up the core of the "Depository" information retrieval system.

The two following articles, written by Drs Luidmila Emelianova and Catherine Krushelnitskaya, dwell on the technology implicit in the above aims, and on some of the system's functions. UNIMARC, USMARC, UKMARC formats exist for cataloguing archival collections. They follow national traditions in manuscript storage and cataloguing. Their recording system covers available documents in collections, accounting for the relevant formal characteristics in this specific storage system.

In Russia the situation is different, and the UNIMARC-based RUSMARC is recommended for book cataloguing. It is therefore to be used in developing a record format for archival documents in machine-readable form. We have no state standard record format for archival collections that would provide a solution adequate to current cultural, scientific and educational levels for:

  • The recording, storage, and accumulation of specific bibliographic, archeographic, palaeographic and content information.
  • Supporting research in the humanities, objectively aimed at an enhanced level of knowledge and, consequently, an expanded information space.
  • Establishing a preservation collection in the form of digitized documents.

All these goals necessitated identification of analytical hallmarks able to represent the variety of attributive and substantial characteristics of documents in a single system, irrespective of their type, form or contents. The difficulty of structuring this multi-faceted historical data set for recording, preservation, conservation and research purposes transformed the problem into a research and development project in its own right: the development of a recording system for archival collections. Implementing this conception turned out to be our main difficulty. Working out a bibliographic record format for current printed books, which actually amounts to title-page data, required many years of joint efforts in major library agencies specially established for the purpose.

However, the much more sophisticated set of problems associated with electronic cataloguing of manuscript collections has required, in many cases, radically different approaches, as the following example clearly demonstrates. Electronic cataloguing of current printed books is primarily information recording. Its search mechanism is confined within the range of traditional parameters, invariable and quite sufficient both for library services and human communications. Formal components of bibliographic records for current printed books (the author, place and date of production) are easily standardized on the principles of electronic database structure, search engine and communicative facilities.

In manuscript cataloguing, the same traditional components of bibliographic record — the author, date and place of production — require research. Thus, with traditional title-page information on the author, time and place of publication lacking, dating and localization of a manuscript are accomplished by research. Author information is specified where established with confidence. Otherwise, in accordance with archeographic traditions, author name is incorporated in research title rather than constituting the "Author" record component. Only findings from subsequent historical text and supplementary studies can provide information for author identification.
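
The rule for the author component can be pictured with a small sketch. It is an illustration only, under stated assumptions: the names `ManuscriptHeading` and `build_heading`, the `confident` flag, and the "(attributed to ...)" wording are hypothetical and are not taken from the "Depository" system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManuscriptHeading:
    """Hypothetical heading of a manuscript record."""
    author: Optional[str]  # filled only when authorship is established
    title: str             # research title assigned by the cataloguer


def build_heading(research_title: str,
                  attributed_author: Optional[str],
                  confident: bool) -> ManuscriptHeading:
    """Place the author name according to how firmly it is established."""
    if attributed_author and confident:
        # Authorship established with confidence: use the "Author" component.
        return ManuscriptHeading(author=attributed_author, title=research_title)
    if attributed_author:
        # Tentative attribution: keep the name inside the research title
        # and leave the "Author" component empty (wording is an assumption).
        title = f"{research_title} (attributed to {attributed_author})"
        return ManuscriptHeading(author=None, title=title)
    # No attribution at all: the research title stands alone.
    return ManuscriptHeading(author=None, title=research_title)
```

Keeping the "Author" component empty for tentative attributions means that an author search returns only established attributions, while the name remains findable through the title text.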

The "Depository" IRS version described below is a system equally applicable for recording and research purposes, and also serves as a preservation/deposit collection. The latter is of special importance because, irrespective of a user's purposes for the source, access is provided to the electronic version, ensuring preservation of the original copy.


The following concepts form the basis of the IRS:

The database is constructed on the existing format for current printed books. This is essential for database communications and for shared bibliographic record standards for printed and manuscript texts; only if this condition is observed will the database supplement the search systems already in operation.

The database supplies records and search facilities for dozens of specific parameters (omitted in current printed book cataloguing) of archeographic, palaeographic, codicological, art, linguistic and content analysis of the manuscript, which are of particular significance for documents of Modern history. This is important because the manuscript is both the source of information and the subject of research for several disciplines in the humanities. Therefore an electronic database for manuscript collections inevitably involves the following purposes of scholarly research:

  • Accounting in the record format for objective manuscript features, with sufficient adaptability and a structure representing relevant research information.
  • Provision for adding new records to the electronic catalogue. This will lead to new classifications for analyzed components, new record outlines, and identification of relevant parameters. The range of manuscript record parameters is not limited, and can be extended, detailed and formally classified over time. This capacity is essential because multifaceted manuscript research is a continuous process.
  • Functioning of the electronic database within a hierarchical system of directories that provide authority information for data susceptible to standardization and allow those data to be used in query-based search. Research data that cannot be standardized in the existing electronic system should be recorded as they are.
  • Provision of the maximum use of the system's search components for successful research, as they enhance associative searching.
  • Allowing record reuse for research, educational, publishing and other purposes.
  • Incorporation of digitized and sound files linked in the electronic catalogue structure, with their programmed functioning.
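
The role of directories in the purposes listed above can be sketched as follows. This is a minimal illustration, assuming a directory is simply a table that maps variant spellings to one authority term; the `Directory` class and its methods are hypothetical and do not describe the actual "Depository" implementation.

```python
class Directory:
    """Hypothetical authority directory: one canonical term, many variants."""

    def __init__(self, name):
        self.name = name
        self._canonical = {}  # normalized variant -> canonical term

    def add_term(self, canonical, variants=()):
        """Register a canonical term together with its accepted variants."""
        for form in (canonical, *variants):
            self._canonical[form.strip().lower()] = canonical

    def resolve(self, value):
        """Return (stored_value, standardized).

        Values known to the directory are replaced by the canonical term.
        Values that cannot be standardized are returned unchanged, so that
        the research data are recorded as they are.
        """
        key = value.strip().lower()
        if key in self._canonical:
            return self._canonical[key], True
        return value, False


# Example: a directory of writing materials (terms chosen for illustration).
materials = Directory("writing material")
materials.add_term("birch bark", variants=["birchbark", "birch-bark"])
materials.add_term("parchment")

stored, standardized = materials.resolve("Birchbark")
```

Because every variant resolves to a single term, a query for "birch bark" retrieves all records regardless of how the cataloguer originally typed the value, while unrecognized values remain available for later classification.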

The "Depository" information retrieval system involves research cataloguing for archival material, user-formulated requests for investigation or educational purposes, and record printing in required formats. It implies a hierarchy of different types of directories providing various information access levels and facilitating the different needs of three groups of users: the staff, investigators, and the general public. Any solution here is complicated by the fact that, in pursuing research goals, one cannot foresee the full range of retrieval elements. The exception is the authority-record directory types, although these too are due to be restructured or extended during maintenance.

The system of directories is being developed in the process of document data input and search in the electronic catalogue. Making the fullest possible use of the directories allows recurrent description elements to be entered only once and query data to be unified. The archival description format is based on the RUSMARC communications format for book publications. RUSMARC includes the national usage block (block 9), designed to represent multiple manuscript description characteristics. The range of database fields can be categorized in two groups:

  • Fields taken from book publication formats (RUSMARC, Authorities) and adapted for manuscript description.
  • Fields structured to contain specific information on manuscripts that is irrelevant in classifying modern publications. They present a system of analytic hallmarks (chronologically conditioned attributive and content characteristics), specifying data essential for medieval papers, and differentiating and fixing content elements of modern history documents.

Database applications for eastern manuscripts or music collections will not require a different field selection but rather additional basic directory components derived from the concepts and terms used in recording each specific manuscript type. For some fields the directory content may be modified, for instance to target genre, binding design, etc.
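
The division of fields into two groups can be pictured with a short sketch. It is not the RUSMARC specification: tag 200 is used for the title as in UNIMARC, but the 9xx tags and their meanings (901 for writing material, 902 for binding) are invented for illustration, as are the names `ArchivalRecord` and `find_records`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchivalRecord:
    """Hypothetical record holding the two groups of fields separately."""
    record_id: str
    # Group 1: fields taken from the book format and adapted for manuscripts.
    book_fields: Dict[str, str] = field(default_factory=dict)
    # Group 2: manuscript-specific fields in the national usage block (9xx).
    manuscript_fields: Dict[str, str] = field(default_factory=dict)

    def set_field(self, tag: str, value: str) -> None:
        """Store a value in the group implied by the tag's leading digit."""
        if tag.startswith("9"):
            self.manuscript_fields[tag] = value
        else:
            self.book_fields[tag] = value


def find_records(records: List[ArchivalRecord], tag: str, value: str) -> List[str]:
    """Return the ids of records whose field `tag` equals `value` exactly."""
    hits = []
    for record in records:
        group = record.manuscript_fields if tag.startswith("9") else record.book_fields
        if group.get(tag) == value:
            hits.append(record.record_id)
    return hits


# Illustrative data only; identifiers and values are made up.
letter = ArchivalRecord("F1-001")
letter.set_field("200", "Letter, untitled")
letter.set_field("901", "paper")        # hypothetical tag: writing material

gospel = ArchivalRecord("F2-014")
gospel.set_field("200", "Gospel lectionary")
gospel.set_field("901", "parchment")
gospel.set_field("902", "leather over boards")  # hypothetical tag: binding

on_parchment = find_records([letter, gospel], "901", "parchment")
```

Keeping the manuscript-specific parameters in their own block leaves the book-format part of the record intact, which is what allows it to be exchanged with systems that know only the book format.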

Consequently, data input work sheets will differ for different manuscript types. The structure incorporates digitized documents and supports image data, opening the way to palaeographic, codicological, textual, and content analysis of the material under study. The inclusion of digitized document data and auxiliary imagery considerably reduces the need to enter descriptive information that is irrelevant to retrieval. A file directory matches the imagery to identified document description fields. Similarly, the image file name contributes to record retrieval in the electronic catalogue. This mechanism permits simultaneous data use in a single searching environment by separating input of descriptive document characteristics from partial digitizing. The digitized data stock is particularly useful in attribution by handwriting. The available directory facilitates correct attribution and dating of historical documents.
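
How an image file name might contribute to retrieval can be sketched as below. The source does not state the naming scheme, so the pattern `<record>_<field>_<sequence>.<ext>` is purely an assumption made for this example, and `index_images` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming scheme (not from the source): record id, description field,
# zero-padded sequence number, then a tif or jpg extension.
IMAGE_NAME = re.compile(
    r"^(?P<record>[A-Za-z0-9]+)_(?P<field>[a-z]+)_(?P<seq>\d+)\.(?:tif|jpg)$"
)


def index_images(file_names):
    """Group image file names by (record id, description field).

    Names that do not follow the assumed scheme are skipped.
    """
    index = defaultdict(list)
    for name in file_names:
        match = IMAGE_NAME.match(name)
        if match is None:
            continue
        index[(match["record"], match["field"])].append(name)
    for names in index.values():
        names.sort()  # zero-padded sequence numbers sort in page order
    return dict(index)


# Illustrative file names only.
images = index_images([
    "P1093_handwriting_002.tif",
    "P1093_handwriting_001.tif",
    "P1093_binding_001.jpg",
    "readme.txt",
])
handwriting_samples = images[("P1093", "handwriting")]
```

With such an index, a search on a description field can return the matching images together with the record, so that handwriting samples, for instance, can be compared without re-entering descriptive text.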

Another important element of the data structure approach was the enhanced search by auxiliary fields, providing the best possible insight into the document. This involved a subjective query formulation in electronic catalogue enquiry that leads the investigator to associative exploration.

Database records in this format can be converted to European and US formats such as USMARC and UKMARC. The complexity of the task in the context of rapid developments in engineering and information technologies necessitates continuous programme upgrading and creative expert approaches to new electronic catalogues and digitized collections. Careful provision for adequate quality of the programme product would enhance the efficiency and motivation of both the staff and investigators.

