Authors: Gary K. Peatling, Chris Baggs
Title: Creating a database of British Public Library Annual Reports, 1850-1919
Publication Info: Ann Arbor, MI: MPublishing, University of Michigan Library
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact firstname.lastname@example.org for more information.
vol. 3, no. 3, November 2000
This paper describes preliminary steps taken in a project involving the construction of a database using a collection of public library annual reports covering the period 1850 to 1919 recently acquired by the University of Wales Aberystwyth. Particular problems faced included the size of the collection, and the variety of data contained in the reports. As is described in the paper, the procedures that are being used to resolve these difficulties involve a description of the content of the reports by a set of data types which are each defined by an adjustable knowledge base.
By 1918, 566 library authorities in Britain and Ireland had been created [1]. Almost all of these resulted from the adoption by a local authority in the relevant area of the Public Libraries Act 1850 or subsequent legislation. These institutions had deep roots in their surrounding societies, often being linked at their foundation to other local institutions of education, recreation and culture, such as museums and art galleries. As public libraries they evolved into one of the most popular public services in modern Britain [2]. Yet it has been suggested that, certainly until recently, historians have overlooked the wider historical significance and context of these institutions [3].
The majority of British public libraries produced and published annual reports in most years of their existence during the period 1850-1919. These documents often present evidence of having been prepared with considerable care and potentially represent a major source of information about the foundation phase of British public libraries. Contemporary commentators on the British public library movement made extensive use of annual reports [4]. However, the statistical and non-statistical evidence in the reports requires careful interpretation [5]. Indeed, the annual reports produced today or in recent history by a variety of agencies do not have a reputation as particularly interesting or reliable publications. Thus while a variety of library and other historians have made sporadic use of annual reports [6], there has been little systematic study of the annual report as a genre, and little systematic consideration of the ways in which they might best be put to use by historians.
The University of Wales Aberystwyth recently acquired a major collection of the annual reports of the public libraries of England, Scotland, Wales and Ireland. Funding has been obtained from the British AHRB (Arts and Humanities Research Board) for a project investigating the value of these documents for historians of the period 1850-1919. While the most obvious utility of these documents is to library historians, the reports have a wider potential value: they also document the experiences of museums and other sister institutions of the public libraries. The range of attitudes and behaviours described therein should be of interest to social, local and cultural historians, and to the expanding discipline of book history.
These annual reports were prepared either by librarians or local library committees. Public libraries were under no legislative compulsion to produce them, but the reports were available to several groups of people and served a number of purposes. They acted as a means of reporting to local government authorities, of circulating information of professional interest among librarians, and of attracting support from the general public, particularly influential members of the local community.
The aims of the current project are:
- to write a commentary on the collection of public library annual reports at the University of Wales Aberystwyth, evaluating their uses for historians, and
- to construct a database from the content of those annual reports in the collection produced during the years 1850-1919. It is anticipated that this database will ultimately be freely available to users over the World Wide Web.
The rest of this paper describes recent progress towards the attainment of the latter aim.
03. Recent Progress
The production of the most useful possible database from the annual reports has already necessitated detailed consideration and a number of delicate decisions. Early on in the database project, it was determined that it would not be feasible to transcribe data or to create an online image of the reports as the time and expense involved would be prohibitive.
It was decided instead that a database containing a content analysis or listing of the annual reports would at once be a more realistic aim, and would yield benefits to other scholars. In determining appropriate software for such a database it is necessary to reconcile the needs to concentrate expertise in particular software and to avoid committing to inappropriate software [7]. Examination of the annual reports quickly revealed one-to-many relationships between elements of the data contained therein, and relational database software thus seemed appropriate to the task. Among relational packages, the use of Microsoft Access 2000 corresponded best to the skill set available to the database project and certainly seemed the most likely software to use.
However, as writers on computer-assisted history have been pointing out for over a decade, the relational database paradigm is by no means supremely adapted to all historical research [8]. Indeed the exact applicability of any relational package to at least part of the current database project was by no means obvious. Two immediately distinct features of the annual reports are narrative accounts of proceedings affecting the library and/or other linked institutions for the relevant period (usually by way of introduction to a report), and statistics or lists of data relating to the library and/or other linked institutions. It was considered highly desirable to include both of these in the database if at all possible: excluding either element would have involved prejudging more than was absolutely necessary the uses to which the eventual database would be put by scholars [9].
Other historical databases have included some data from annual reports (such as trade union annual reports) [10], but not all the data contained therein. In the current database, by contrast, it will be necessary to classify each element of the annual reports as precisely as is feasible. This has to be balanced against the desirability of avoiding an excessively complex database structure, which would make the database daunting and difficult to use [11].
04. A Description of the Model
The process described as entity relationship modelling (or entity-attribute-relationship modelling [12]) was undertaken, and demonstrated the structure of the data contained in the annual reports to be complex, but not excessively complex. The model eventually produced might best be conceived as a series of linked hierarchies of classifications of data. Initial review of the data revealed that much of it could be classified into certain broad thematic types. To take an example, a number of statistics, tables and graphs in the annual reports could be broadly categorised as "Issue information", that is, statistical information relating to the number of items issued at a library. Within this broad type, however, there are a large number of variations in the nature of such statistics presented in annual reports: for instance, a report might contain aggregate issue statistics, issue statistics broken down according to a library's classification system, and/or issue statistics broken down by month of the year. It was thus necessary to link records of particular pieces of information to more narrowly defined issue information types. It was also desirable to retain the ability to add to these narrow issue information types as data was accumulated in the database.
The relevant small section of the database has been structured as follows:
[Figure: Section of the relationships in the annual reports database]
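The structure described above can be approximated in a few lines of code. The following is a minimal sketch only, using Python's built-in sqlite3 module as a stand-in for Microsoft Access; the table names, column names and sample rows are hypothetical, since the actual schema of the project database is not reproduced in this paper.

```python
import sqlite3

# Hypothetical table and column names; the project's real Access schema is not published here.
SCHEMA = """
CREATE TABLE report (
    report_id INTEGER PRIMARY KEY,
    library   TEXT NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE issue_info_type (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    notes   TEXT NOT NULL DEFAULT ''   -- the editable "Information type notes"
);
CREATE TABLE issue_info (
    record_id INTEGER PRIMARY KEY,
    report_id INTEGER NOT NULL REFERENCES report(report_id),
    type_id   INTEGER NOT NULL REFERENCES issue_info_type(type_id)
);
"""


def build_demo():
    """Create an in-memory database holding one invented report."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Invented placeholder row, not taken from the collection.
    conn.execute("INSERT INTO report VALUES (1, 'Example Town', 1890)")
    conn.executemany(
        "INSERT INTO issue_info_type (type_id, name) VALUES (?, ?)",
        [(1, "Aggregate issues"), (2, "Issues by class"), (3, "Issues by month")],
    )
    # One report can carry several kinds of issue information (one-to-many).
    conn.executemany(
        "INSERT INTO issue_info (report_id, type_id) VALUES (1, ?)",
        [(1,), (3,)],
    )
    return conn


def issue_types_in_report(conn, report_id):
    """List the names of the issue information types recorded for one report."""
    rows = conn.execute(
        "SELECT t.name FROM issue_info AS i "
        "JOIN issue_info_type AS t ON t.type_id = i.type_id "
        "WHERE i.report_id = ? ORDER BY t.type_id",
        (report_id,),
    )
    return [name for (name,) in rows]
```

Because new rows can be added to the type table at any time, the set of narrow issue information types can grow as data accumulates without any change to the table structure.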
The Microsoft Access form derived from this section of the database, where the presence of issue information can be recorded, appears as follows:
[Figure: Issue information form]
Through this form a procedure has been designed to facilitate linking each record to relevant identifiers of a particular report, and of a particular type of issue information. An important part in this process is played by the "Information type notes" field. This field offers clarification as to the meaning and past usage in the database of each data type. This thus facilitates accurate and consistent linking of each new record for a particular piece of information to a data type.
Although the "Information type notes" field cannot be edited at this point, it is possible to edit it elsewhere in the database. There are a large number of occurrences in the annual reports of pieces of data of a very similar structure but with small differences between them. It is desirable to avoid the creation of an excessive number of data types since this will create complexity and difficulty for an operator deciding to which data type to link each record for a piece of information. However, it is also desirable to avoid using any data type which does not really describe the particular piece of data. This would be similar to the malpractice sometimes referred to in discussion of relational databases as "squeezing", a consequence of 'the difficulty in making a real world source document "fit" into the rigid artificial model of the desktop data analysis package' [13]. These contradictory requirements have been reconciled in this database through the ability to edit the information type notes linked to each data type. In other words, the definition or knowledge base linked to a data type is not rigid, but can be developed to cover new pieces of information with slight differences. It is anticipated that these notes fields ultimately will be searchable by end users; in other words, they should act as thesauri or sets of keywords, linking searchers to data types which are most likely to contain records of data items of interest.
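The idea of an adjustable knowledge base that doubles as a keyword index can be illustrated with a short sketch. This is not the project's Access implementation: the class, the function and the sample notes (including the weekly-totals case) are hypothetical and are meant only to show the principle of widening a definition rather than adding a near-duplicate data type.

```python
from dataclasses import dataclass


@dataclass
class DataType:
    """A data type with editable notes (hypothetical structure)."""
    name: str
    notes: str = ""

    def extend_notes(self, addition: str) -> None:
        """Broaden the definition instead of creating a near-duplicate type."""
        self.notes = f"{self.notes} {addition}".strip()


def search_types(types, keyword):
    """Return names of data types whose name or notes mention the keyword."""
    needle = keyword.lower()
    return [
        t.name
        for t in types
        if needle in t.name.lower() or needle in t.notes.lower()
    ]


# Invented examples for illustration only.
types = [
    DataType("Issues by class", "Issues broken down by the library's classification."),
    DataType("Issues by month", "Monthly issue totals."),
]
# Suppose a report tabulates issues by week: widen an existing type rather than add one.
types[1].extend_notes("Also used for weekly totals.")
```

After the notes have been extended, a search for "weekly" leads to the "Issues by month" type, which is how the notes can guide both the operator and the eventual end user.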
We thus do not regard defining and maintaining data types for the statistics or lists of information in the annual reports (such as those relating to issues from public libraries) as insurmountable tasks. Many such statistics and lists are, by their very nature, quite formulaic. The problem of defining and maintaining data types for the largely unstructured free text of the narrative introductions to the annual reports is apparently more chronic. Many writers on computerised historical databases advise against the use of relational databases with unstructured sources, since there are potentially acute problems of data "squeeze" [14]. Where a source contains some unstructured data, and some tabulated or explicitly structured data, one possible solution is simply to omit the former from the database [15].
For reasons discussed above, however, this was an option we wished to avoid in this database. Fortunately there are reasons for regarding the danger of data squeeze with unstructured free text as not prohibitive, given a certain type of treatment.
Even scholars commenting on relatively early relational database models intimated possible lines of progress in the treatment of unstructured free text in relational databases. In 1994 James Bradley suggested three conditions for the use of relational databases: one should not be interested in reproducing the source, data should be open to aggregation and thus some form of statistical analysis, and the majority of relationships between elements of data should be one-to-many or many-to-many. These conditions do not seem too restrictive, though Bradley suggested other forms of analysis not statistical in basis might be more useful for unstructured text [16]. Prior to this Philip Hartland and Charles Harvey had pointed out that even some ostensibly unstructured data might contain underlying structures [17]. This suggests that certain types of treatment of free-text data would help to negate doubts about its handling within relational databases. This is corroborated by a number of projects (some of them recent) which have dealt with unstructured text, not by entering it directly into a database, but by indexing the text via keywords or characteristics of the text, within or outside of the context of a database [18].
There are thus grounds for believing that data concerning the narrative free-text part of public library annual reports can be recorded in our database with a procedure similar to that used for other kinds of data. A form for entering records of narrative data similar to the forms for other types of data has thus been created:
[Figure: Narrative information form]
As elsewhere in the database, one can record the existence of a piece of information by linking a new record to two identifiers, respectively for the appropriate annual report, and for a more specific data type (or in this case, "narrative theme"). This process is designed to represent the appearance of a particular topic in the narrative part of an annual report. The number of "narrative themes" in this case is likely to become unavoidably large, meaning that the "Information type notes" field has correspondingly increased value as a knowledge base informing the operator and the eventual end-user about a narrative theme.
An additional feature of this form is the "Extent of coverage" field. This enables the operator to indicate in the database the amount of narrative free text in each annual report devoted to discussion of a particular theme. One of three options can be entered into this field, corresponding to 5 lines of narrative or less, 6 to 15 lines, or more than 15 lines per theme. Clearly this is only a very approximate guide, but should be of some use to end-users in guiding them to those reports which have most to say about particular subjects.
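The three-way "Extent of coverage" coding can be expressed as a simple function. The thresholds are those given above; the function name and the wording of the returned labels are assumptions made for illustration, not the values used in the Access form.

```python
def extent_of_coverage(line_count: int) -> str:
    """Map the number of narrative lines devoted to a theme to one of three bands.

    Band boundaries follow the text: 5 lines or fewer, 6 to 15 lines,
    more than 15 lines. The label strings are hypothetical.
    """
    if line_count < 1:
        raise ValueError("a recorded theme must occupy at least one line")
    if line_count <= 5:
        return "5 lines or fewer"
    if line_count <= 15:
        return "6 to 15 lines"
    return "more than 15 lines"
```

Storing a band rather than an exact line count keeps data entry quick while still letting an end user filter for the reports with the most to say on a theme.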
In this way, the design of this part of our relational database acts as a mechanism for making explicit the implicit structure of the narrative parts of the annual reports. Of course, data entry still depends upon the operator's interpretation as to what themes are present. However, there are five reasons for suggesting that this too is not a critical defect of this part of the database. First, interpretation of sources is, of course, an integral part of many of even the most basic aspects of historical research [19]. Second, without this mechanism, one would either have to enter unstructured text from the annual reports verbatim into a database, or not record unstructured text within the database at all, or produce some summary of the narrative which would be totally reliant upon operator interpretation.
Third, similar databases have evolved some methods for minimising the percentage error involved in reliance upon operators' interpretation of free text. One of the most useful such techniques is that of multiple entry or multiple definition [20]. This technique recognises that it is possible (indeed common) for even relatively short pieces of free text to allude to a number of themes. This suggests that an operator should be prepared if appropriate to create multiple records in the narrative information form for the same piece of text to cover allusions to more than one theme. This technique could of course be carried to absurd lengths, producing a database which bombards the end-user with numerous, often trivial "hits" and is thus hard to search. We are working to minimise this problem by developing relatively broadly defined narrative themes, which can often adequately encapsulate sections of text.
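Multiple entry amounts to creating one record per theme for the same passage. A minimal sketch follows, assuming a hypothetical record structure and invented theme names; it is not a description of the Access form itself.

```python
from typing import Dict, List, NamedTuple


class NarrativeRecord(NamedTuple):
    """One narrative record: a report, a theme and a coverage band (hypothetical)."""
    report_id: int
    theme: str
    extent: str


def records_for_passage(report_id: int, themes: Dict[str, str]) -> List[NarrativeRecord]:
    """Create one record per theme identified in a single passage.

    `themes` maps each theme name to its extent-of-coverage band. Using a
    mapping means a theme cannot be entered twice for the same passage.
    """
    return [
        NarrativeRecord(report_id, theme, extent)
        for theme, extent in themes.items()
    ]


# Invented example: one short passage touching on two themes.
demo_records = records_for_passage(
    7,
    {"Sunday opening": "5 lines or fewer", "Juvenile readers": "5 lines or fewer"},
)
```

Keeping themes broad limits the number of keys in such a mapping, which is the safeguard against the flood of trivial hits mentioned above.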
A fourth reason for believing that reliance upon operator interpretation of unstructured free text will produce minimal percentage error in our database is suggested by the process of validation which will be undertaken. As Charles Harvey and Jon Press observe, errors are likely even when a project involves merely the transcription of source materials: 'errors will result from the inevitable tendency when keying in data to make assumptions about what the source says rather than transcribing it literally, character by character' [21]. In the context of the current database, the accurate description of even lists and tables requires meticulous attention for similar reasons: there will be an 'inevitable tendency' to assume that a particular piece of data corresponds to a previously defined data type when in fact there may be subtle differences between the two. After an initial stage of testing and modifying the structure of the database, once initial data was eventually entered into the finalised database structure, it was found useful periodically to review what had been done. This involved careful comparisons between entries in the database and the actual sources. This was particularly useful for checking the definition and range of the narrative themes that had been entered into the database. As more data is entered, going back to every report will become at once less feasible and less necessary, though it will certainly be desirable to review a sample of reports. It will also be useful to continue periodically to review the creation and usage of narrative themes; rationalisation, merging, and disaggregation of themes may occasionally be desirable in order to maintain the optimum number of distinct themes for the accurate and efficient recording of data. The multi-table nature of the database should also assist review of other tables and the identification of inaccurate and inconsistent entries in specific fields [22].
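One way to choose a sample of reports for re-checking against the sources is a simple random draw. The sketch below is an assumption about how such a sample might be selected; the paper does not specify a sampling method, and the function name and the default fraction of one in ten are hypothetical.

```python
import random


def sample_for_review(report_ids, fraction=0.1, seed=None):
    """Pick a random subset of report identifiers to compare with the sources.

    `fraction` (an assumed default of 10%) sets the sample size; at least one
    report is always chosen when any exist. `seed` makes the draw repeatable.
    """
    ids = sorted(set(report_ids))
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, k))
```

Passing a fixed seed lets a second operator reproduce the same sample, which may help when reviews are shared between members of a project team.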
The project aims accord a final justification for this treatment of unstructured free text in the database. The real focus of the project is raising the profile of public library annual reports among the historical community. It is not our intention to promote research dependent on any database and on the interpretations of those who compiled it, but to encourage historians to read the sources from which the database is derived. The database is supposed to facilitate historians' use of collections of annual reports, by providing an easier means of locating parts of such a collection likely to be of relevance for those who may or may not previously have been aware of the utility of such materials to their historical research programmes.
05. Limitations of the Database
One possible limitation of the database is that while it is a source of information about public library annual reports, it does not really provide a point of access to them. The database will only to some extent fulfil the functions of what Hartland and Harvey describe as a "hybrid" resource, i.e., one that acts as both an indirect and a direct research facility [23]. Several (though not all) of the aforementioned projects which have also adopted the strategy of indexing free text come closer to fulfilling this typology by also offering access to online editions of the sources themselves. As previously mentioned, the extent of the public library annual reports themselves, and other limitations on this project, prohibit the provision of full online editions of the reports within the database. This lack of direct access to public library annual reports themselves may become a source of frustration to some users of the eventual database.
But this certainly does not mean that the public library annual reports database will have no uses. Indeed, we envisage that it will serve three functions:
- It will be useful for scoping possible research programmes. Initial searches in the database will indicate the utility of public library annual reports to particular research interests. This will enable researchers not based in the vicinity of the collection of annual reports in Aberystwyth, or of any other collection, to assess the importance of utilising such a collection, and may thus also provide assistance with funding bids for research trips or projects.
- It will direct those who do have access to a number of annual reports to relevant parts of such a collection. It is worth noting in this connection that there are a large number of small collections of British public library annual reports from the period 1850-1919 held in many locations in the United Kingdom and elsewhere. Many public libraries or archive centres, for instance, have collections of the annual reports produced by a local library.
- It will be possible to use the database as a direct research facility for certain purposes. The percentage error is likely to be higher in some parts of the database than in others. Nonetheless, the database will provide an indication as to the presence or absence of certain features or themes in annual reports. It could be used, for instance, as an index of changes over time in the level of specific attention library authorities gave to issues such as women's or juvenile use of public libraries.
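The third function listed above, using the database as an index of change over time, comes down to counting the reports in which a theme appears, year by year. The following is a minimal sketch under stated assumptions: the tuple layout, the function name, the theme names and the sample rows are all invented for illustration and are not data from the collection.

```python
from collections import Counter


def reports_mentioning_theme_by_year(records, theme):
    """Count distinct reports per year that contain a record for the theme.

    `records` is an iterable of (report_id, year, theme) tuples (assumed layout).
    """
    matching_reports = {(rid, year) for rid, year, t in records if t == theme}
    counts = Counter(year for _, year in matching_reports)
    return dict(sorted(counts.items()))


# Invented rows for illustration only.
demo = [
    (1, 1880, "Juvenile readers"),
    (2, 1900, "Juvenile readers"),
    (3, 1900, "Juvenile readers"),
    (3, 1900, "Juvenile readers"),  # duplicate record, counted once
    (3, 1900, "Sunday opening"),
]
```

Raw counts of this kind would need to be read against the number of reports held for each year, since the collection is unlikely to be evenly spread across the period.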
Within the restrictions of this project, the approach described above seems to be the most promising way to resolve the methodological difficulties inherent in the attempt to create a database from British public library annual reports in the period 1850-1919. To date only a small amount of data has been entered into the database. Data entry is currently proving a process requiring considerable care. We anticipate however that as data types and themes are defined in more detail, data entry will become more efficient. It is hoped that greater clarity of the eventual interface and enhanced convenience to the user will result from this initial investment of time and attention. Naturally, however, future progress in collating further data, and ultimately the uses to which the database is put, will constitute more tangible tests of how well this part of the project's aims is achieved.
1. Thomas Kelly, A History of Public Libraries in Great Britain, 1845-1965, 2nd ed. (London: Library Association, 1977), 112.
2. Margaret Kinnell and Paul Sturges, 'Introduction', in: Margaret Kinnell and Paul Sturges eds., Continuity and Innovation in the Public Library: The Development of a Social Institution (London: Library Association Publishing, 1996), ix-x.
3. Alistair Black, A New History of the English Public Library: Social and Intellectual Contexts, 1850-1914 (London: Leicester University Press, 1996), 16-9.
4. Edward Edwards, Free Town Libraries (London: Trübner and Co., 1869): Thomas Greenwood, Public Libraries: A History of the Movement and a Manual for the Organization and Management of Rate-supported Libraries, 4th ed., [reprinted] (High Wycombe: University Microfilms Ltd for the College of Librarianship, Wales, 1971). See also the scathing review of the first edition of Greenwood's book, 'A Book on Free Public Libraries', Library Chronicle, iii (1886), 73-5.
5. Black, New History of the English Public Library, 22-4.
6. Graeme Morton, Unionist-Nationalism: Governing Urban Scotland, 1830-1860 (East Linton: Tuckwell Press, 1999), 124 n.44, 119 n.39, 115 n.33, 110 n.22: R.J. Morris, 'Voluntary Societies and British Urban Elites, 1780-1850: An Analysis', Historical Journal, xxvi, no.1, (1983) 98 n.10, 102 n. 27, 108 n.46, 116 n.77: W.L. Guttsman, The German Social Democratic Party, 1875-1933 (London: George Allen & Unwin, 1981), 185, 187, 211: John Prest, Liberty and Locality: Parliament, Permissive Legislation, and Ratepayers' Democracies in the Mid-nineteenth Century (Oxford: Clarendon, 1990), 146 n. 31: Robert Snape, Leisure and the Rise of the Public Library (London: Library Association Publishing, 1995), 81-7, 93-4.
7. Charles Harvey and Jon Press, Databases in Historical Research: Theory, Methods and Applications (Basingstoke: Macmillan, 1996), 257-8.
8. Gwyn Price and Alec Gray, 'Object Orientated Databases and their Application to Historical Data', History and Computing, vi, no.2, (1994), 45: Manfred Thaller, 'The Need for a Theory of Historical Computing', in: Peter Denley, Stefan Folgevik and Charles Harvey eds., History and Computing II (Manchester: Manchester University Press, 1989), 4: Harvey, Press, Databases in Historical Research, 188-96, 204-8.
9. Peter Wakelin and David Hussey, 'Investigating Regional Economies: The Gloucestershire Portbooks Database', in: Harvey, Press, Databases in Historical Research, 16.
10. David Gilbert and Humphrey Southall, 'Case Study E: Indicators of Regional Economic Disparity: The Geography of Economic Distress in Britain before 1914', in: Harvey, Press, Databases in Historical Research, 144: 'Visualising Geographical Mobility', available at http://www.geog.qmw.ac.uk/lifeline/consult/essay.html#geog_mobility, in: Humphrey Southall and Ben White, 'Mapping the Life Course: Visualising Migrations, Transitions and Trajectories', paper relating to the LifeLine project, available at http://www.geog.qmw.ac.uk/lifeline/consult/essay.html.
11. '3.3, Links Between Source and Database', available at http://hds.essex.ac.uk/g2gp/digitising_history/sect33.asp, in: Sean Townsend, Cressida Chappell and Oscar Struijvé, Digitising History: A Guide to Creating Digital Resources from Historical Documents, AHDS Guides to Good Practice (Arts and Humanities Data Service, 1999), available at http://hds.essex.ac.uk/g2gp/digitising_history/index.html.
12. 'Design your own Database', available at http://seastorm.ncl.ac.uk/itti/design.html, in: Lorna Scammell, 'The Database Service', (1997, some updates 1999 and 2000), available at http://www.ncl.ac.uk/ucs/databases/.
13. '3.3, Links Between Source and Database', available at http://hds.essex.ac.uk/g2gp/digitising_history/sect33.asp, in: Townsend, Chappell, Struijvé, Digitising History, available at http://hds.essex.ac.uk/g2gp/digitising_history/index.html: Harvey, Press, Databases in Historical Research, 223-4.
14. Harvey, Press, Databases in Historical Research, 55.
15. '3.3, Links Between Source and Database', available at http://hds.essex.ac.uk/g2gp/digitising_history/sect33.asp, in: Townsend, Chappell, Struijvé, Digitising History, available at http://hds.essex.ac.uk/g2gp/digitising_history/index.html.
16. James Bradley, 'Relational Database Design and the Reconstruction of the British Medical Profession: Constraints and Strategies', History and Computing, vi, no.2, (1994), 75.
17. Philip Hartland and Charles Harvey, 'Information Engineering and Historical Databases', in: Denley, Folgevik, Harvey, History and Computing II, 48.
18. Julie Dunk and Sebastian Rahtz, 'Strategies for Gravestone Recording', in: Denley, Folgevik, Harvey, History and Computing II, 72-80: Catherine Harbor, 'Case Study B: The Register of Music in London Newspapers, 1660-1800', in: Harvey, Press, Databases in Historical Research, 40-6: Arts and Humanities Data Service, 'Developing a Database Interface: The Continental Origins of English Landholders', available at http://ahds.ac.uk/casestudies/coel.html: Arts and Humanities Data Service, 'Coded Letters: The Kircher Correspondence Project. Structuring and Classifying Records for a Database', available at http://ahds.ac.uk/casestudies/kircher.html: Stéphane Haffemayer, 'Data Processing and Study of Gazettes from the Ancien Régime, an Account of the Work', Journal of the Association for History and Computing, ii, nr. 1, (April 1999), available at http://hdl.handle.net/2027/spo.3310410.0002.103: 'Example 1: The correspondence of William of Orange 1533-1584', in: Donald Haks, 'Two Examples of the Impact of Computer Technology on Historical Editing: The Correspondence of William of Orange 1533-1584 and the Resolutions of the States General 1626-1651', ii, nr. 3, (Nov. 1999), available at http://hdl.handle.net/2027/spo.3310410.0002.302. Also see the CD-ROM HarpWeek: The Civil War Era, (1857-1865) (Norfolk, Virginia: HarpWeek, 1997), information available at http://www.harpweek.com/6HarpWeekProducts/TheCivilWarEra/TheCivilWarEra.htm, reviewed in Joel D. Kitchens, 'HarpWeek: The Civil War Era, (1857-1865)', Journal for MultiMedia History, i, nr. 1, (Fall 1998) available at http://www.albany.edu/jmmh/
19. Lou Burnard, 'Relational Theory, SQL and Historical Practice', in: Denley, Folgevik, Harvey, History and Computing II, 71.
20. 'Visibility: Reaching the Correspondence', available at http://ahds.ac.uk/casestudies/kircher2.html and http://ahds.ac.uk/casestudies/kircher3.html, in: AHDS, 'Coded Letters', available at http://ahds.ac.uk/casestudies/kircher.html.
21. Harvey, Press, Databases in Historical Research, 88.
22. Daniel I. Greenstein, A Historian's Guide to Computing (Oxford: Oxford University Press, 1994), 82-3.
23. Philip Hartland and Charles Harvey, 'Information Engineering and Historical Databases', in: Denley, Folgevik, Harvey eds., History and Computing II, 47-8.