Appendix A: Interviews with Librarians and Publishers

OhioLINK, Los Alamos National Labs (LANL), and the Florida Center for Library Automation (FCLA) all host journal databases. They were selected for inclusion in this white paper because they had to develop the same capabilities being requested of publishers. Villanova University was included because it has closed stacks for its bound journals, which means that it has good measures of use. James Mullins, the university librarian at Villanova, was on the task force that created guidelines for the statistics that JSTOR delivers.

Academic Press, Elsevier, MCB, and the Institute of Physics (IOP) host their own journals and have experience with collecting statistics. The American Institute of Physics (AIP) and Association for Computing Machinery (ACM) are in the process of developing this capability.

JSTOR and Catchword both host content from a variety of publishers. JSTOR was part of the initial discussions about library requirements, while Catchword is further developing its statistics capability. Like the library hosts, these providers have a standard platform that provides consistent data to enable comparisons.

Academic Press

Academic Press found that the off-the-shelf software packages that summarize hits do not provide the data that libraries need. It is hiring a full-time statistician and measurement analyst to help address the issue. The company experienced a dramatic increase in usage when it introduced its new platform in the fall of 1999.

Data gathering is complicated because Academic Press's journal database (IDEAL) is also loaded on remote sites such as OhioLINK and OCLC, so Academic Press must combine data from several sources to obtain a complete picture of the use of its own journals. Sales, accounting, and editorial staff use the data internally to examine correlations and draw conclusions about the cost per article for each institution. This helps the publisher understand how a library might relate the cost per article to a relevant measure of value.
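Combining download counts from several hosts and deriving a cost per article can be sketched as follows. This is a minimal illustration, not Academic Press's actual procedure. The host names, institutions, counts, and annual cost are hypothetical, and the sketch assumes every host counts a download the same way, which is exactly what the lack of standards makes uncertain.

```python
from collections import Counter

def combine_usage(*sources):
    """Sum download counts keyed by (institution, journal) across hosts."""
    total = Counter()
    for source in sources:
        total.update(source)  # adds counts rather than replacing them
    return total

def cost_per_article(annual_cost, downloads):
    """Annual cost divided by articles downloaded; None if nothing was used."""
    if downloads == 0:
        return None
    return annual_cost / downloads

# Hypothetical counts from three hosts of the same journal database.
publisher_site = {("Univ A", "Journal X"): 120, ("Univ B", "Journal X"): 40}
consortium_host = {("Univ A", "Journal X"): 60}
library_network = {("Univ B", "Journal X"): 20}

usage = combine_usage(publisher_site, consortium_host, library_network)
print(usage[("Univ A", "Journal X")])                          # 180
print(cost_per_article(3600, usage[("Univ A", "Journal X")]))  # 20.0
```

Summing the hosts before dividing matters: using only the publisher's own count (120) would overstate the cost per article at Univ A by half.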

In the print world, subscription revenues indicate the health of a journal. When that journal is part of a database, the equation changes completely, because some of the articles used are in titles to which the institution did not previously subscribe.

For every 1.5 log-ins to the database, one article is downloaded, and for every abstract viewed, there is one article downloaded. Academic Press summarizes the total number of log-ins by journal and of articles downloaded by journal each month for each institution and consortium.

Chrysanne Lowe, director of online sales and marketing, noted that the journals that have the most articles downloaded are considered the company's most successful titles. These are large journals with many articles. The list of journals in greatest demand changes when the number of articles downloaded is compared with the number of articles published in the title.
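The shift in rankings described above can be shown with a short sketch that orders journals first by raw downloads and then by downloads per article published. The journal names and figures are hypothetical and chosen only to illustrate how normalization favors smaller titles.

```python
def rank_by_downloads(stats):
    """Journals ordered by raw downloads, highest first."""
    return sorted(stats, key=lambda j: stats[j]["downloads"], reverse=True)

def rank_by_downloads_per_article(stats):
    """Journals ordered by downloads per article published, highest first."""
    return sorted(stats,
                  key=lambda j: stats[j]["downloads"] / stats[j]["published"],
                  reverse=True)

# Hypothetical figures: one large journal and two smaller ones.
stats = {
    "Journal A": {"downloads": 9000, "published": 600},  # 15 per article
    "Journal B": {"downloads": 4000, "published": 100},  # 40 per article
    "Journal C": {"downloads": 2500, "published": 50},   # 50 per article
}
print(rank_by_downloads(stats))              # ['Journal A', 'Journal B', 'Journal C']
print(rank_by_downloads_per_article(stats))  # ['Journal C', 'Journal B', 'Journal A']
```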

Philosophically, Academic Press is opposed to a business model in which charges increase with use because that discourages use. Academic Press offers marketing support with promotional items and coordinates training with librarians and faculty members.

American Institute of Physics

Doug LaFrenier, director of marketing at AIP, noted that the market has changed dramatically. Providing statistical data to libraries represents a new set of responsibilities for publishers, one that carries associated costs. LaFrenier's primary concern is the lack of standards, which makes it impossible to compare data.

AIP is concerned that it is undercounting because its system does not count searches or requests for abstracts. It counts only requests for the full text of an article that requires either a subscription or pay-per-view access. At the same time, AIP has discovered that one of the interfaces was triple-counting downloads because of the way it retrieved the content.
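One common way to keep an interface's repeated fetches from inflating download counts is to collapse repeat requests for the same article by the same user within a short time window. The sketch below illustrates that idea; it is not AIP's fix, and the 30-second window, user IDs, and article IDs are assumptions.

```python
from datetime import datetime, timedelta

def count_downloads(requests, window=timedelta(seconds=30)):
    """Count full-text requests, treating repeats of the same article by the
    same user within `window` of that user's previous request as one download.

    requests: iterable of (user, article_id, timestamp) tuples.
    """
    last_seen = {}
    count = 0
    for user, article, ts in sorted(requests, key=lambda r: r[2]):
        key = (user, article)
        previous = last_seen.get(key)
        if previous is None or ts - previous > window:
            count += 1
        last_seen[key] = ts  # window is measured from the most recent request
    return count

t0 = datetime(2000, 6, 1, 9, 0, 0)
requests = [
    ("u1", "art-1", t0),
    ("u1", "art-1", t0 + timedelta(seconds=1)),  # interface re-fetch
    ("u1", "art-1", t0 + timedelta(seconds=2)),  # interface re-fetch
    ("u1", "art-1", t0 + timedelta(minutes=5)),  # a genuine second download
    ("u2", "art-1", t0),                         # a different user
]
print(count_downloads(requests))  # 3
```

Without the window, the same five requests would be reported as five downloads.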

The American Institute of Physics, working with the American Physical Society (APS), has devoted much of one full-time programmer's time to developing Web-based statistics that libraries can access for their own use. The statistics, which will also be available to other publishers whose journals AIP hosts, are planned for delivery early in 2001.

AIP demonstrated the system at the Special Libraries Association 2000 meeting. The demonstration showed year-to-date download statistics. Librarians who attended this session persuaded AIP that libraries want to be able to specify their own time periods. They also want to be able to compare current data with information from prior years. AIP found it difficult to identify who within the library should have rights to view this information.
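Letting librarians choose their own reporting period and compare it with the same period a year earlier could look something like the sketch below. The event format and journal names are hypothetical; this is not AIP's system.

```python
from collections import Counter
from datetime import date

def shift_year(d, years):
    """Same calendar date in another year; 29 February falls back to the 28th."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def downloads_in_period(events, start, end):
    """Count downloads per journal between start and end, inclusive.
    events: iterable of (date, journal) pairs."""
    counts = Counter()
    for day, journal in events:
        if start <= day <= end:
            counts[journal] += 1
    return counts

def compare_with_prior_year(events, start, end):
    """Map each journal to (current count, count for the same period a year earlier)."""
    events = list(events)  # allow two passes over the data
    current = downloads_in_period(events, start, end)
    prior = downloads_in_period(events, shift_year(start, -1), shift_year(end, -1))
    return {j: (current[j], prior[j]) for j in current.keys() | prior.keys()}

events = [
    (date(1999, 3, 10), "Journal X"),
    (date(2000, 3, 12), "Journal X"),
    (date(2000, 3, 20), "Journal X"),
    (date(2000, 3, 25), "Journal Y"),
]
print(compare_with_prior_year(events, date(2000, 3, 1), date(2000, 3, 31)))
```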

Previously, AIP had given its own publishing customers reports from the server logs that summarize activity by journal title. The company has also analyzed time-of-day performance data to support decisions about running an online journal platform. It has been able to identify the most active journals and accounts, and it believes that much of the information developed for its online publishing customers will be useful in developing usage-statistics reports for libraries.

Anyone using the AIP Web site has the option of buying an article online. Sales grew significantly when the company simplified its interface and reduced the number of steps required for the user to obtain the article. This further demonstrates the effect of ease of use on usage.

Association for Computing Machinery

The Association for Computing Machinery is evaluating what statistics need to be collected. As staff experimented internally with data, they found that the most frequently downloaded article in any given month was neither a current article nor one they would have expected to be so popular. High-use article titles provide clues for editors about the topics in demand.

Catchword

Catchword's service is paid for by the publishers, who decide what information to share with libraries. Catchword is expanding its statistics capability in line with the ICOLC guidelines and will have data that libraries can use. It has decided to add turnaway statistics, which reflect the number of times a user attempts to access the full text of an article in a journal to which the library does not subscribe. Catchword can also track pay-per-view access. Although the company has a single source for these data, its challenge is to summarize data from eleven servers around the world.
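Merging logs from several servers and counting turnaways against each institution's subscription list can be sketched as below. The log layout, institution names, and journal titles are hypothetical; a production system would also need to reconcile clock and format differences between servers, which this sketch ignores.

```python
from collections import Counter

def summarize_turnaways(server_logs, subscriptions):
    """Merge full-text requests from several servers and count those for
    journals the requesting institution does not subscribe to.

    server_logs: one list per server of (institution, journal) requests.
    subscriptions: institution -> set of subscribed journal titles.
    """
    turnaways = Counter()
    for log in server_logs:
        for institution, journal in log:
            if journal not in subscriptions.get(institution, set()):
                turnaways[(institution, journal)] += 1
    return turnaways

subscriptions = {"Univ A": {"Journal X"}}
server_logs = [
    [("Univ A", "Journal X"), ("Univ A", "Journal Y")],  # server 1
    [("Univ A", "Journal Y"), ("Univ A", "Journal Z")],  # server 2
]
print(summarize_turnaways(server_logs, subscriptions))
```

Here Journal Y is turned away twice across two servers, so neither server's log alone would reveal the full demand for it.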

Elsevier

Elsevier has at least two staff devoted to managing usage data from its ScienceDirect database installations. Most libraries subscribe to only a portion of the 1,170 titles in Elsevier's database; therefore, data on the use of non-subscribed titles are helpful in considering the addition of electronic or print versions of a title.

Although Elsevier is committed to providing as much information as the customer believes is useful, staff acknowledge that custom reports are not economical to generate. The company can see the impact of marketing on journal usage, and it has a staff of account-development managers devoted to training librarians and users on the system. As the volume of articles used rises, the cost per use drops.

To keep current in their field, researchers scan about a dozen journals regularly by browsing their tables of contents. This activity is reflected in how the database is used when researchers select a journal title from a list and then browse the tables of contents of various issues, rather than search by subject, author, or title.

Elsevier has paid particular attention to global requirements for a privacy policy, which appears as a full page on the ScienceDirect Web site. Some customized services, such as an alerting service, which requires an e-mail address, cannot be provided unless the user supplies a minimal amount of personal information. To ensure privacy, data on individual users are stripped of personal identifiers, leaving only organization-level information, before being processed and aggregated.
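A simple way to keep individual identities out of aggregated statistics is to replace each user ID with its organization before anything is counted. The sketch assumes a hypothetical lookup table from user IDs to organizations; it is not Elsevier's procedure.

```python
from collections import Counter

def aggregate_by_organization(events, user_to_org):
    """Replace each user ID with its organization before counting, so the
    aggregated output never contains individual identifiers.

    events: iterable of (user_id, journal) pairs.
    user_to_org: user_id -> organization name.
    """
    counts = Counter()
    for user_id, journal in events:
        organization = user_to_org.get(user_id, "unaffiliated")
        counts[(organization, journal)] += 1
    return counts

events = [("jdoe", "Journal X"), ("asmith", "Journal X"), ("guest42", "Journal Y")]
user_to_org = {"jdoe": "Univ A", "asmith": "Univ A"}
print(aggregate_by_organization(events, user_to_org))
```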

Florida Center for Library Automation

FCLA is the central agency that supports the online catalogs of the 10 universities in Florida. Like OhioLINK and LANL, FCLA loads a number of full-text journal databases locally and produces its own statistics for them; it also provides links to publishers' remote sites.

FCLA would like to track the number of searches, the number of documents retrieved, and the number of requests denied. The number of hits is not a valid indicator of use because there is no consistent way to measure hits. Article views are counted by journal title when the PDF is viewed. Usage reports for full-text journals are updated nightly in a formatted report that librarians can download.
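Counting article views by journal only when a PDF is delivered, rather than counting every hit, might look like the sketch below. It assumes Web server logs in Common Log Format and a hypothetical URL layout of `/journals/<journal-code>/<article-file>.pdf`; FCLA's actual log format is not described here. Only successful responses (status 200) are counted.

```python
import re
from collections import Counter

# Hypothetical URL layout: /journals/<journal-code>/<article-file>.pdf
PDF_VIEW = re.compile(
    r'"GET /journals/(?P<journal>[^/ ]+)/[^ ]+\.pdf HTTP/[\d.]+" 200 '
)

def pdf_views_by_journal(log_lines):
    """Count successful PDF requests per journal, ignoring HTML page hits."""
    counts = Counter()
    for line in log_lines:
        match = PDF_VIEW.search(line)
        if match:
            counts[match.group("journal")] += 1
    return counts

log = [
    '10.1.1.5 - - [01/Oct/2000:10:00:00 -0400] "GET /journals/jbio/toc.html HTTP/1.0" 200 5120',
    '10.1.1.5 - - [01/Oct/2000:10:00:40 -0400] "GET /journals/jbio/a101.pdf HTTP/1.0" 200 88211',
    '10.1.1.9 - - [01/Oct/2000:10:02:13 -0400] "GET /journals/jchem/b7.pdf HTTP/1.1" 200 40213',
    '10.1.1.9 - - [01/Oct/2000:10:03:00 -0400] "GET /journals/jchem/b8.pdf HTTP/1.1" 404 210',
]
print(pdf_views_by_journal(log))
```

The table-of-contents hit and the failed (404) request are both excluded, leaving one view each for the two journals.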

When users link to a publisher's database, they have effectively left their home system. The library can tell which database they linked to, but it cannot track actions taken on the publisher's Web site. Consequently, libraries must rely on publishers for usage data and then merge such information with their own local data.

Institute of Physics

Bridget Pairaudeau, producer of electronic publications at IOP, has just completed the design of IOP's statistics form for internal use. It allows staff to select the following variables:

  • Who: user files and the subscription records from IOP's internal systems

  • What: data from log files on the type of activity and time frame

  • View: display options, such as grouping subscribed journals
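A query built from the Who, What, and View selections listed above could be modeled roughly as follows. The field names, activity labels, and event format are hypothetical; the sketch shows only how the three kinds of variables might combine to filter and group log data, not how IOP's form works.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class UsageQuery:
    institutions: set              # Who: accounts from subscription records
    activities: set                # What: activity types found in the log files
    start: date                    # What: beginning of the time frame
    end: date                      # What: end of the time frame
    group_subscribed: bool = True  # View: group journals by subscription status

def run_query(query, events, subscriptions):
    """events: iterable of (institution, journal, activity, day) tuples.
    subscriptions: institution -> set of subscribed journals."""
    result = Counter()
    for institution, journal, activity, day in events:
        if institution not in query.institutions or activity not in query.activities:
            continue
        if not query.start <= day <= query.end:
            continue
        if query.group_subscribed:
            subscribed = journal in subscriptions.get(institution, set())
            key = "subscribed" if subscribed else "non-subscribed"
        else:
            key = journal
        result[key] += 1
    return result

events = [
    ("Univ A", "Journal X", "full_text", date(2000, 5, 2)),
    ("Univ A", "Journal Y", "full_text", date(2000, 5, 3)),
    ("Univ A", "Journal Y", "abstract", date(2000, 5, 3)),
    ("Univ B", "Journal X", "full_text", date(2000, 5, 4)),
]
query = UsageQuery({"Univ A"}, {"full_text"}, date(2000, 5, 1), date(2000, 5, 31))
print(run_query(query, events, {"Univ A": {"Journal X"}}))
```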

Users of the IOP system also have the option of creating a graph by selecting elements for the x and y axes. If they choose to graph usage of Web pages on both axes, they can compare navigation to full text from the table of contents with navigation from the subject keyword search. Data on the use of customizable options, such as profiling, filing cabinets, and a table of contents alerting service, show which features are most used.

The editorial and marketing staffs are interested in knowing which articles and journals are most requested and which institutions are most active. The sales department is interested in the level of use by specific customers, and system designers want information they can use to enhance features, navigation, and usability.

IOP screens out data on internal use, guests, free use, trials, production applications, and robot attacks, because they can greatly skew statistics. When IOP's internal data analysis did not match that of the commercial package, staff discovered that NetTracker counts HTML views but not PDF downloads.
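Screening out non-customer traffic before counting, while keeping HTML and PDF views separate, can be sketched as below. The record fields, account types, internal address prefix, and robot markers are all hypothetical; real screening rules would come from the publisher's own account records and lists of known robots.

```python
from collections import Counter

# Hypothetical screening rules.
INTERNAL_NETWORKS = ("192.168.",)
EXCLUDED_ACCOUNTS = {"internal", "guest", "free", "trial", "production"}
ROBOT_MARKERS = ("bot", "crawler", "spider")

def is_countable(record):
    """True if a log record should count toward customer usage."""
    if record["ip"].startswith(INTERNAL_NETWORKS):
        return False
    if record["account_type"] in EXCLUDED_ACCOUNTS:
        return False
    agent = record["user_agent"].lower()
    return not any(marker in agent for marker in ROBOT_MARKERS)

def views_by_format(records):
    """Count countable views, keeping HTML and PDF separate so neither is lost."""
    return Counter(r["format"] for r in records if is_countable(r))

records = [
    {"ip": "130.1.2.3", "account_type": "subscriber", "user_agent": "Mozilla/4.7", "format": "html"},
    {"ip": "130.1.2.3", "account_type": "subscriber", "user_agent": "Mozilla/4.7", "format": "pdf"},
    {"ip": "192.168.0.4", "account_type": "subscriber", "user_agent": "Mozilla/4.7", "format": "pdf"},
    {"ip": "130.9.9.9", "account_type": "trial", "user_agent": "Mozilla/4.7", "format": "html"},
    {"ip": "64.2.2.2", "account_type": "subscriber", "user_agent": "ExampleBot/1.0", "format": "html"},
]
print(views_by_format(records))
```

Of the five records, only the two subscriber views survive screening, one HTML and one PDF. A tool that counted only HTML views would report half the real usage.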

JSTOR

The ICOLC guidelines are based on those developed by a task force in conjunction with JSTOR in 1997. JSTOR data are updated nightly and can be queried and exported to a spreadsheet. Individual site data can be compared with average data for all sites in the same JSTOR classification and with summary data for all JSTOR titles. Both publishers and librarians can sign on and retrieve data.
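Comparing one site's usage with the average for all sites in the same classification can be sketched as follows. The site names, classification labels, and counts are hypothetical and do not reflect JSTOR's actual classifications.

```python
from statistics import mean

def compare_with_class(site_usage, site_class, site):
    """Return a site's usage alongside the average for all sites in its class.

    site_usage: site -> usage count for the period.
    site_class: site -> classification label.
    """
    peers = [count for other, count in site_usage.items()
             if site_class[other] == site_class[site]]
    return site_usage[site], mean(peers)

site_usage = {"College A": 1200, "College B": 800, "University C": 9000}
site_class = {"College A": "Small", "College B": "Small", "University C": "Large"}
print(compare_with_class(site_usage, site_class, "College A"))
```

Comparing College A with its peer class (average 1,000) rather than with all sites keeps the large university from distorting the benchmark.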

Data presented include the number of pages viewed, PDFs printed, searches conducted, and tables of contents browsed. Because JSTOR counts all items (e.g., reviews and letters) as articles, it lists full-length articles separately for clarity.

In a presentation at the Conference on Economics and the Usage of Digital Library Collections, JSTOR President Kevin Guthrie observed that the articles that are most often downloaded are not those that advance research or that are most often cited (Guthrie 2000). "Value needs to be clearly defined as libraries consider acquisition and cancellation decisions for electronic content," Guthrie stated. (Marthyn Borghuis from Elsevier noted that citations reflect author activity, while usage reflects reader activity.)

The notion of perishability of content varies with the discipline. The average age of the most-used articles was also surprising: 13 years in economics and 32 years in mathematics. When a discipline has a small total number of accesses, however, the actions of a few people can sway the results.
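The average age of the most-used articles, with a guard against disciplines whose total use is too small to be meaningful, can be computed as sketched below. The publication years, use counts, and threshold are hypothetical; this is not JSTOR's calculation.

```python
def average_age_of_top_articles(articles, current_year, top_n=10, min_total_uses=0):
    """Mean age, in years, of the top_n most-used articles.

    articles: list of (publication_year, uses) pairs.
    Returns None when there are no articles or when total uses fall below
    min_total_uses, since a handful of readers can otherwise dominate the result.
    """
    if sum(uses for _, uses in articles) < min_total_uses:
        return None
    top = sorted(articles, key=lambda a: a[1], reverse=True)[:top_n]
    if not top:
        return None
    return sum(current_year - year for year, _ in top) / len(top)

articles = [(1968, 50), (1988, 40), (1999, 5)]
print(average_age_of_top_articles(articles, 2000, top_n=2))  # 22.0
```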

Guthrie cautioned that usage does not necessarily equate to value in the research sense. "Older articles may be absolutely vital to the continuation of high-quality scholarship and research in the field, but that may not lead to extensive use," he said.

Los Alamos National Labs

The LANL Library has gone through three stages of development, according to Director Rick Luce. The data the library collects depend on how far it parses its log files. The first stage, which entailed parsing UNIX logs and "beat code," cost $20,000 and required nominal staff support. The second stage, which involved producing static usage data on the basis of scripted code, cost $50,000; one staff member performed this activity. The third stage, designed to enable the user to perform a query and export the results, may cost $250,000. Programming staff will be involved in doing the analysis.

LANL has 3,500 electronic journals available to its users; of these, 2,000 titles are loaded locally and 1,500 are accessed remotely. When LANL did a trial with Elsevier, all titles in the database were used and the participating libraries did not own the most-used titles.

Luce concludes that librarians do not know exactly what users need, a finding that reflects both the discovery process in research and the learning curve in the electronic environment.

LANL enables its users to connect to full text from links within secondary publications, from browsing selected titles, and from performing subject searches. It takes six months for users to discover, remember, and fully use a new service. Keys to success are to ensure that links are established, to allow sufficient ramp-up time, and to promote awareness. LANL has expanded its electronic holdings since 1995, and user satisfaction with library services has increased dramatically.

MCB University Press

In addition to the normal data on time-of-day activity that helps it determine the load on systems, the system at MCB University Press tracks hits and sessions. To learn how users come to the site, MCB also analyzes the top referring sites, top browsers, top entry pages, and the most popular and least popular pages in the database.
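Deriving sessions, entry pages, and referring sites from raw hits is commonly done by grouping each client's hits and starting a new session after a period of inactivity. The sketch below uses the client IP address as a stand-in for the user and a 30-minute timeout; both are assumptions, not MCB's method, and proxies shared by many users make IP-based sessions approximate.

```python
from collections import Counter
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff

def summarize_sessions(hits):
    """Group hits into sessions per client address and tally entry pages
    and the referring sites that started each session.

    hits: iterable of (client_ip, timestamp, page, referrer) tuples;
    referrer is an empty string when the browser sent none.
    """
    last_hit = {}
    sessions = 0
    entry_pages = Counter()
    referrers = Counter()
    for ip, ts, page, referrer in sorted(hits, key=lambda h: h[1]):
        previous = last_hit.get(ip)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            sessions += 1
            entry_pages[page] += 1
            if referrer:
                referrers[referrer] += 1
        last_hit[ip] = ts
    return sessions, entry_pages, referrers

t0 = datetime(2000, 9, 1, 14, 0)
hits = [
    ("10.0.0.1", t0, "/search", "http://www.example.edu/library"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/toc/jmgt", ""),
    ("10.0.0.1", t0 + timedelta(hours=2), "/browse", ""),
    ("10.0.0.2", t0, "/search", "http://www.example.org/portal"),
]
sessions, entry_pages, referrers = summarize_sessions(hits)
print(sessions, entry_pages.most_common(1))  # 3 [('/search', 2)]
```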

MCB University Press is interested in knowing which institutions generate the most requests and which articles and journals are most requested. How users search is also of interest; for that reason, data on the tables of contents, search pages, and browse pages are collected.

Heavy use of the tables of contents through the browse functions indicates that many users know the title they wish to see. However, MCB discovered that the most-used titles at some institutions were the first titles in the alphabet. This indicates that users are still learning how to use the system and suggests a need to evaluate the interface or provide more training.
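One possible way to check an institution's data for this alphabetical pattern is to compute the Spearman rank correlation between each title's position in an alphabetical list and its usage rank. This diagnostic is an illustrative choice, not something MCB is reported to use, and the titles and counts below are hypothetical. A value near +1 suggests that the first titles in the alphabet are also the most used; it is a prompt for investigation, not proof of an interface problem.

```python
def alphabetical_bias(usage):
    """Spearman rank correlation between a title's alphabetical position and
    its usage rank (1 = most used). Values near +1 mean the titles at the top
    of an alphabetical list are also the most used. Assumes no tied counts
    and at least two titles.
    """
    n = len(usage)
    alpha_rank = {title: i + 1 for i, title in enumerate(sorted(usage))}
    by_use = sorted(usage, key=usage.get, reverse=True)
    use_rank = {title: i + 1 for i, title in enumerate(by_use)}
    d_squared = sum((alpha_rank[t] - use_rank[t]) ** 2 for t in usage)
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical counts at one institution.
usage = {"Asian Journal": 90, "British Review": 60, "Corporate Studies": 30, "Work Study": 10}
print(alphabetical_bias(usage))  # 1.0
```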