    20.3 Methodology

    Although no detailed study of the usage of the Digital Library has been undertaken, it is important to understand how users access the collection, what problems they find there, and what their usage patterns are. Data about where users are in the organisation are important because of the need to allocate costs and charges.

    The Digital Library studies the library server's log files and uses a number of channels to encourage user feedback. The difficulties of log file analysis are well documented (Wright, 1999). The log files are huge and processing them can be time-consuming. Unless user authentication is required, logs record only machine addresses and not personal identifiers. Each server transaction is logged, so a user retrieving a page with five graphics is recorded as six lines in the log file. Users share machines at cybercafes, operate behind proxies, or use dynamic IP addresses, so an IP address cannot readily be tied to a single user. Although Wright is describing the problems of log file analysis for servers on the internet, the Digital Library faces the same challenges. BT's intranet is large, and proxies and firewalls are installed between different portions of the network. Users share machines in public spaces or borrow colleagues' PCs. Dynamic allocation of IP addresses is used within the intranet to increase flexibility.
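    The "one page, six log lines" effect can be illustrated with a minimal sketch. It is written in Python purely for illustration and assumes the server writes the widely used Common Log Format; the sample lines, the host address, the file paths, and the helper names `parse_line` and `is_page_view` are hypothetical and do not come from the Digital Library's own logs.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: only these suffixes (or a directory URL) count as a "page".
PAGE_SUFFIXES = (".html", ".htm", ".pdf")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def is_page_view(entry):
    """True when the request is for a page rather than an embedded graphic."""
    path = entry["path"].split("?", 1)[0].lower()
    return path.endswith(PAGE_SUFFIXES) or path.endswith("/")


# Hypothetical log: one page request followed by its five embedded graphics.
stamp = "[12/Mar/2001:09:15:02 +0000]"
sample_log = [f'10.0.0.7 - - {stamp} "GET /index.html HTTP/1.0" 200 5120']
sample_log += [
    f'10.0.0.7 - - {stamp} "GET /img/pic{i}.gif HTTP/1.0" 200 812'
    for i in range(1, 6)
]

entries = [e for e in map(parse_line, sample_log) if e is not None]
page_views = [e for e in entries if is_page_view(e)]

print(len(entries), "log lines,", len(page_views), "page view")
```

    Note that the `user` field is "-" on every line here: without authentication the log carries only the machine address, which is the identification problem described above.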

    A number of packages are available to assist in log file analysis (Busch, 1997). These packages report statistics such as the total hits on the server, the number of Not Modifieds (304s), Redirects (302s), Not Founds (404s), Server Errors (500s), the number of unique URLs served, the number of unique client hosts accessing the server, the total kilobytes transferred, the top one-second, one-minute, and one-hour periods, the most commonly accessed URLs, and the top five client hosts accessing the server. These deliver a higher level of management information than the librarian needs and are not used in the BT Library.
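    As a rough indication of what such summary statistics involve, the following Python sketch computes several of them from already-parsed log entries. It does not reproduce any particular package: the function name `summarise`, the entry layout, and the sample data are assumptions made for illustration, and the busiest-period figures are left out for brevity.

```python
from collections import Counter


def summarise(entries, top_n=5):
    """Compute server-level summary statistics from parsed log entries.

    Each entry is assumed to be a dict with 'host', 'path', 'status' (int)
    and 'size' (bytes transferred, int).
    """
    status_counts = Counter(e["status"] for e in entries)
    host_counts = Counter(e["host"] for e in entries)
    url_counts = Counter(e["path"] for e in entries)
    total_bytes = sum(e["size"] for e in entries)
    return {
        "total_hits": len(entries),
        "not_modified_304": status_counts[304],
        "redirects_302": status_counts[302],
        "not_found_404": status_counts[404],
        "server_errors_500": status_counts[500],
        "unique_urls": len(url_counts),
        "unique_hosts": len(host_counts),
        "kilobytes": total_bytes / 1024,
        "top_urls": url_counts.most_common(top_n),
        "top_hosts": host_counts.most_common(top_n),
    }


# Hypothetical parsed entries, for illustration only.
entries = [
    {"host": "10.0.0.7", "path": "/index.html", "status": 200, "size": 2048},
    {"host": "10.0.0.7", "path": "/index.html", "status": 304, "size": 0},
    {"host": "10.0.0.9", "path": "/missing.html", "status": 404, "size": 512},
    {"host": "10.0.0.9", "path": "/index.html", "status": 200, "size": 2048},
    {"host": "10.0.0.12", "path": "/cgi-bin/search", "status": 500, "size": 512},
]

stats = summarise(entries)
for name, value in stats.items():
    print(f"{name}: {value}")
```

    Every figure here describes the server rather than the reader, which is why reports of this kind say little about who is using which library resource.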

    Wright describes techniques for grouping unidentified readers into "constituencies", based on their usage patterns (Wright, 1999). These constituencies, such as robot checkers, users checking the What's New page, new users, or demonstrators, are identified by analysing the server's log files and can then be used to observe navigation of the site and spot usability problems.
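    The idea of grouping unidentified hosts by their behaviour can be sketched loosely as follows. This is not Wright's procedure, whose details are not given here: the two rules (a host that fetches `/robots.txt` is treated as a robot; a host that requests nothing but the What's New page is treated as a What's New checker), the page path `/whatsnew.html`, and the function names are all assumptions chosen only to show the shape of such an analysis in Python.

```python
from collections import defaultdict

WHATS_NEW = "/whatsnew.html"  # hypothetical path of the What's New page


def classify(paths):
    """Assign one host to a constituency from the list of paths it requested."""
    if "/robots.txt" in paths:
        return "robot"
    if set(paths) == {WHATS_NEW}:
        return "whats_new_checker"
    return "other"


def constituencies(requests):
    """Group hosts into constituencies; `requests` is a list of (host, path)."""
    by_host = defaultdict(list)
    for host, path in requests:
        by_host[host].append(path)
    groups = defaultdict(set)
    for host, paths in by_host.items():
        groups[classify(paths)].add(host)
    return dict(groups)


# Hypothetical requests, for illustration only.
requests = [
    ("10.0.0.99", "/robots.txt"),
    ("10.0.0.99", "/index.html"),
    ("10.0.0.7", "/whatsnew.html"),
    ("10.0.0.7", "/whatsnew.html"),
    ("10.0.0.9", "/index.html"),
    ("10.0.0.9", "/cgi-bin/search"),
]

groups = constituencies(requests)
for name, hosts in sorted(groups.items()):
    print(name, sorted(hosts))
```

    Constituencies such as new users or demonstrators would need further signals, for example the date a host first appears or the order in which pages are visited, which this sketch does not attempt.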

    The BT Digital Library adapted Wright's ideas in its own analysis of its log files. The purposes of the analysis are

    • to understand who is using the Digital Library,

    • to track individual usage of the library, to enable personalisation and collaborative filtering,

    • to understand which library resources are being used and the extent of that usage to inform renewal decisions,

    • to track which material is being requested through the document delivery system, in order to suggest additions to the collection,

    • to ensure that usage of material is within the licences agreed upon with publishers.

    Perl scripts are used to extract meaningful usage data from the server's log, concentrating on the HTML pages and PDF files read and the server's CGI scripts run, and ignoring less meaningful traces of usage. Accesses from robot checkers and from the library's own staff are excluded from the analysis. Weekly reports are prepared detailing

    • the number of distinct IP addresses accessing the server (as a proxy for the number of individual users),

    • the number of users who logged into the server,

    • the number of searches done in each of the library databases,

    • the number of individuals searching each of the library databases,

    • the total number of online journal articles read from the library server,

    • the number of readers of the library's collection of online books,

    • the number of articles read from each of the online journals purchased through individual subscriptions,

    • the number of users accessing their SDI pages,

    • the number of users who subscribe to journals' tables of contents and the number of journals which have subscriptions to their tables of contents,

    • the number of users annotating database records and the number of annotations made.
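    A few of the weekly figures listed above can be sketched as follows. The library's own scripts are written in Perl and are not reproduced here; this Python sketch only illustrates the filtering and counting steps described. The staff and robot addresses, the search script path `/cgi-bin/search`, its `db` parameter, the database names, and the use of PDF requests as a stand-in for articles read are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qs

STAFF_HOSTS = {"10.0.0.2"}          # assumed addresses of library staff
ROBOT_HOSTS = {"10.0.0.99"}         # assumed addresses of robot checkers
EXCLUDED_HOSTS = STAFF_HOSTS | ROBOT_HOSTS
SEARCH_SCRIPT = "/cgi-bin/search"   # hypothetical search script


def is_meaningful(path):
    """Keep HTML pages, PDF files and CGI scripts; drop graphics and the like."""
    p = urlsplit(path).path.lower()
    return p.endswith((".html", ".pdf")) or p.startswith("/cgi-bin/")


def weekly_report(entries):
    """Entries are dicts with 'host', 'user' (None if not logged in) and 'path'."""
    kept = [
        e for e in entries
        if e["host"] not in EXCLUDED_HOSTS and is_meaningful(e["path"])
    ]
    searches = Counter()
    searchers = defaultdict(set)
    for e in kept:
        url = urlsplit(e["path"])
        if url.path == SEARCH_SCRIPT:
            for db in parse_qs(url.query).get("db", []):
                searches[db] += 1
                # Fall back to the address when the user is not logged in.
                searchers[db].add(e["user"] or e["host"])
    return {
        "distinct_ips": len({e["host"] for e in kept}),
        "logged_in_users": len({e["user"] for e in kept if e["user"]}),
        "searches_per_db": dict(searches),
        "searchers_per_db": {db: len(who) for db, who in searchers.items()},
        "pdf_reads": sum(
            1 for e in kept if urlsplit(e["path"]).path.lower().endswith(".pdf")
        ),
    }


# Hypothetical week of parsed entries, for illustration only.
entries = [
    {"host": "10.0.0.7", "user": "alice", "path": "/cgi-bin/search?db=db_a&q=atm"},
    {"host": "10.0.0.7", "user": "alice", "path": "/journals/j1/article1.pdf"},
    {"host": "10.0.0.9", "user": None, "path": "/cgi-bin/search?db=db_a&q=ip"},
    {"host": "10.0.0.9", "user": None, "path": "/img/logo.gif"},
    {"host": "10.0.0.2", "user": "libstaff", "path": "/index.html"},
    {"host": "10.0.0.99", "user": None, "path": "/index.html"},
    {"host": "10.0.0.12", "user": "bob", "path": "/cgi-bin/search?db=db_b&q=xml"},
]

report = weekly_report(entries)
for name, value in report.items():
    print(f"{name}: {value}")
```

    Counting distinct addresses gives only an approximation of the number of individual users, for the reasons of shared machines, proxies, and dynamic addressing discussed earlier; the logged-in count is more reliable but covers only authenticated sessions.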