Economics and Usage of Digital Libraries: Byting the Bullet
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact firstname.lastname@example.org for more information.
For more information, read Michigan Publishing's access and usage policy.
Analysis of individual use
A key innovation in the Columbia online books project was the introduction, in 1997, of the ability to identify the activity of unique users. This was a fortunate byproduct of the security system, developed to permit people to read online books from home. To maintain confidentiality of the users, system analysts replaced the identities of individual users with uninformative labels.
With anonymity ensured, we were permitted to link usage to administrative files containing demographic information about the users. Typical results are those shown in Table 16.3, reporting the distribution of the status of individual users at the time they first used a particular resource. The resource in this case was the online version of the Oxford English Dictionary. While we had a number of reference works available online and, by the close of the project, close to 200 books in online form, the total usage of the OED represented approximately 50 percent of all online usage, and so it is used here to illustrate the types of analyses that we performed.
There were 3,600 individuals who used the OED during the study period. Just over 2,000 of these were undergraduate students at the time of first use. Nearly 300 were graduate students and close to 140 were faculty members.
We analyzed the ways in which individual users used the resource. To do this, we introduced the rule that an inactive period of 15 minutes or more marks the end of a session. The rule rests on detailed analysis, which showed a natural break in the distribution (over all users) of the interval between "clicks" at around 10-15 minutes. We interpret this to mean that a session continuing across a pause of this length is a rare event that we can safely ignore.
We also studied the total amount of use that individuals made of specific resources, again illustrated by data on the OED. The modal (that is, most common) number of clicks that an individual user made on the OED is between 2 and 3. Above that number, the count of users making a given number of clicks drops exponentially. The rate of the drop is such that, at any point, the chance of going on to make two more clicks is about 2/3. (The chance of making one more click is the square root of this number, or about 83%.)
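The 15-minute session rule can be illustrated with a minimal sketch. It is not the project's actual analysis code: the function name `split_into_sessions`, the constant `SESSION_GAP`, and the sample click log are all hypothetical, and the sketch assumes each user's clicks are available as a list of timestamps.

```python
from datetime import datetime, timedelta

# Inactivity threshold taken from the text: a pause of 15 minutes
# or more ends a session.
SESSION_GAP = timedelta(minutes=15)


def split_into_sessions(click_times, gap=SESSION_GAP):
    """Group one user's click timestamps into sessions.

    A new session starts whenever the pause since the previous
    click is `gap` or longer.
    """
    sessions = []
    for t in sorted(click_times):
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)   # short pause: same session
        else:
            sessions.append([t])     # first click or long pause
    return sessions


# Hypothetical click log for a single anonymized user.
clicks = [
    datetime(1998, 10, 5, 14, 0),
    datetime(1998, 10, 5, 14, 4),
    datetime(1998, 10, 5, 14, 30),  # 26-minute pause: new session
    datetime(1998, 10, 5, 14, 44),  # 14-minute pause: same session
    datetime(1998, 10, 5, 14, 59),  # exactly 15 minutes: new session
]
sessions = split_into_sessions(clicks)
```

Counting the resulting lists per user gives the number of sessions; counting the timestamps gives the number of clicks.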
As shown in Figure 16.3, the time spent using the OED online follows an exponential distribution. This indicates that at any time in the course of using the OED an individual has a constant probability of just quitting and deciding never to use it again (roughly 100%-83%=17%).
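The arithmetic behind these figures can be checked with a short sketch. It assumes a constant per-click continuation probability, as the exponential drop suggests; the function names are hypothetical. Note that taking the square root of exactly 2/3 gives roughly 82%, so the 83% and 17% quoted in the text are best read as rounded values based on "about 2/3".

```python
import math


def one_click_continuation(two_click_continuation):
    """Per-click continuation probability p, given p squared."""
    return math.sqrt(two_click_continuation)


def prob_at_least_more_clicks(p, k):
    """Chance of making at least k further clicks when every click
    is followed by another with the same probability p."""
    return p ** k


p = one_click_continuation(2 / 3)   # about 0.816
quit_prob = 1 - p                   # chance of stopping after any click
expected_more = p / (1 - p)         # mean number of further clicks
```

Because the continuation probability does not depend on how many clicks have already been made, the chance of at least `k` more clicks is the same at every point in a user's history, which is the memoryless property the paragraph describes.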
This apparently exponential behavior is intriguing and we pursued it in another way. Since we could anonymously track individual users, we could plot how much an individual used the resource against how long it was since the first time that the individual used it. With 100% adoption this graph would be roughly linear. We show the actual data for the OED (which had heavy use) in Figure 16.4.
Figure 16.4 is a scatter plot. Each point represents one individual user. The y-coordinate of the point represents the number of sessions that the individual had with the OED, and the x-coordinate represents the number of days since that individual first used the OED. The steep line represents the expected relationship if adopters continued to use the resource at a steady rate. In fact, a regression analysis shows that the best fit is nearly horizontal, which indicates that there is little ongoing use by individuals. It is apparent that many observations are not well predicted by this model, and indeed that some usage did persist.
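The comparison between the fitted line and the steady-use line can be sketched as follows. This is an ordinary least-squares fit written out by hand. The data points and the "one session per week" steady rate are invented for illustration and are not the project's figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical users: days since first use, and total sessions to date.
days_since_first_use = [10, 100, 200, 300, 400, 500]
session_counts = [2, 1, 3, 2, 1, 3]

slope, intercept = fit_line(days_since_first_use, session_counts)

# Hypothetical steady-use benchmark: one session per week.
steady_rate = 1 / 7

# A fitted slope far below the steady rate means that session counts
# barely grow with time since first use.
little_ongoing_use = slope < 0.1 * steady_rate
```

A slope near zero says that users who adopted long ago have accumulated about as many sessions as recent adopters, which is the pattern the nearly horizontal fit describes.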
We can plot this data in a more familiar form by showing the distribution of time since first use, without regard to how much use there has been. We do so by projecting the preceding figure onto the horizontal axis; see Figure 16.5. We see, as have most researchers in the academic setting before us, that it is very easy to discover the existence of the semester. Each of the five peaks in this graph corresponds to an academic semester. There might be some cause for optimism in the fact that the leftmost peak, which represents the most recent surge in use (spring 1999), seems to rise higher than any of the earlier ones. However, we don't know quite what to make of the fact that the one before it (fall 1998) represents a drop from the preceding fall.
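Projecting the scatter plot onto the horizontal axis amounts to counting users by days since first use and discarding their session counts. A minimal sketch, assuming a hypothetical 30-day bin width and invented sample values:

```python
from collections import Counter


def histogram_of_days(days_since_first_use, bin_width=30):
    """Count users per bin of days since first use.

    Each key is the lower edge of a bin; session counts are ignored.
    """
    counts = Counter(
        (d // bin_width) * bin_width for d in days_since_first_use
    )
    return dict(sorted(counts.items()))


# Hypothetical x-coordinates taken from a scatter plot like Figure 16.4.
days = [5, 12, 28, 31, 95, 180, 185, 190, 400]
hist = histogram_of_days(days)
```

Peaks in such a histogram mark periods when many users made their first visit, which is how the semester pattern becomes visible.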