The Open-Factor: Toward Impact-Aligned Measures of Open-Access eBook Usage
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
A statistical analysis of usage data for open-access ebooks from two different publishers and from a free ebook distribution platform indicates that open-access ebook usage follows lognormal statistics, and that meaningful analysis results from calculating the logarithm of the download counts. To assess usage impact from raw usage data in alignment with the goals of open-access ebook publishing, future impact analyses should use logarithm-based metrics to measure an “open-factor”.
Introduction
Ever since the notion of an “impact advantage” was posited in the early years of open access,[1] open-access publishers have looked for ways to measure their impact. Unit sales and profits do not align well with the goals of open-access publishing. Since the whole point of removing toll-access barriers is to increase access to information, open-access publishers have looked to their usage logs for validation of their efforts and mission. Early attempts to correlate download logs to journal impact in digital libraries (for example, Kaplan and Nelson[2] and Thirion et al.[3]) had shown uncertain promise. Bollen et al.[4] demonstrated that coincidence data derived from usage logs could reveal structural properties of the scientific information landscape, including “journal centrality rankings”. Brody and Harnad[5] found correlations between arXiv downloads and later citations, but only after taking logarithms of the data.
The search for impact metrics is not new to the digital world; libraries have always needed ways to measure the effectiveness of their collections. Books are for use,[6] and if a print collection doesn’t circulate, then readers aren’t having their needs met. In the absence of validated measures of impact, presentation of cherry-picked statistics will inevitably undermine advocacy around the impact of open-access content. For a review of evidence-based measures of open-access impact, see Tennant et al.[7]
Counting downloads or circulations may seem easy to do, but there are pitfalls to avoid when interpreting the resulting data. Counting hits on a web server is particularly fraught with danger of misinterpretation. For example, a research item might be accessed multiple times in the course of a scholar's use, on multiple devices. In some cases, an item might require redownloading because of a poor internet connection or a bad user interface, so that an inflated hit count reflects a failed access rather than a successful one. Software agents such as indexing spiders routinely download materials; although many of these agents can be excluded from download counts, all downloads are ultimately performed by software, and it can be impossible to distinguish software serving an individual from software operating as a robot.
As open-access ebooks become more common, the complexities of measuring their impact have become apparent. Compared to journal articles, ebooks are delivered and used in diverse formats and modalities. They may be browsed, downloaded, and segmented. Ebooks cost more to produce than individual journal articles, so publishers have a bigger stake in justifying open access. As might be predicted, there have been exuberant press releases and white papers presenting gaudy download statistics without much context or statistical grounding.[8]
Supporters of open-access publishers, such as libraries, also need impact validation. It's reasonable for a library administrator to ask "How many times have users from the library used items from Publisher X?" In answering questions like these, publishers want to present their statistics in a favorable light. Initiatives such as COUNTER[9] have sprung up to help publishers and libraries produce and analyze statistics that purport to compare usage across publishers and across libraries. Physical circulation counts can also be problematic, depending on how the data is used. Circulation of a particular item is affected by time off the shelf, which may depend on many factors completely unrelated to the item’s usage. Reshelving statistics can also undercount because of “helpful” patrons. Any of these problems may lead to poor decisions if the data are used to inform collection development and management processes.
Several groups have begun tackling the problem of measuring impact for open-access ebooks. One strand of activity has drawn from the field of altmetrics. Commercial services such as Altmetric.com[10] aim to provide data relevant to ebook impact. Projects such as HIRMEOS[11] have similarly begun to fill the data vacuum. A study of data for UCL Press open-access ebooks was a good first step towards providing much-needed grounding and context.[12]
Objectives
Data on usage informs discussions about how open-access resources are discovered and accessed. It's reasonable to ask questions about the effect on usage of license, format, subject area, and indexing. The author recently participated in such a study, focusing on open-access scholarly ebooks, funded by the Andrew W. Mellon Foundation. A joint effort of University of Michigan Press, Open Book Publishers, and the Free Ebook Foundation[13] looked at a variety of data, including server log data, to determine how and where the ebooks are discovered and used. A full report of our conclusions can be found in Michigan’s Deep Blue repository.[14] Cumulative download data on open-access ebooks downloaded via Unglue.it[15] was also examined.
One set of objectives of this study was to compare attributes of the open-access ebooks across the two participating publishers' catalogs, to learn what factors promoted sales and usage. For example, we wanted to know whether usage was correlated with sales, and what effect price had on sales. To answer these and other questions, we collated bibliographic, usage, and sales data from the disparate systems used by the two publishers.[16] Google Analytics[17] was used to gather book download and webpage usage data for both publishers; Google Analytics data was validated by comparison to data from server logs. In particular, we were careful to aggregate and assign URLs to specific books to account for the different webpage layouts used by the two publishers. Sales and usage data were taken from roughly equivalent time periods; a small burst of usage for each book appeared at its initial publication, but for most books this burst was eclipsed by usage over the study’s time span, and we did not attempt to correct for its effect. The Python libraries NumPy[18], SciPy[19] and Pandas[20] were used in data analysis, curve fitting, and regression analysis.
Results
Figure 1 shows a scatter plot of gross sales versus downloads, normalized by the length of the time period measured, for the 118 titles[21] we studied. A linear regression of sales vs. downloads, if we are to believe it, indicates a weak dependence of sales on online traffic. A linear regression of downloads vs. unit sales, by contrast, shows a strong dependence. The inconsistency between the two fits suggests that this sort of regression analysis is inappropriate: outlier titles at the sparse ends of the sales and download distributions are overweighted in the regression.
Other statistical measures cast doubt on the observed correlation. Statistical distributions can be characterized by their variance and kurtosis, and daily download statistics can be examined to determine whether they are described by specific distributions. For example, normally distributed data has an excess kurtosis of 0. We computed an excess kurtosis of 100–500 in the daily download numbers for popular books in our data set. This is characteristic of “spiky” data, not data that follows familiar statistical distributions.
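This "spikiness" can be quantified directly. The sketch below is a minimal illustration using synthetic daily counts (the spike sizes, rates, and random seed are invented for demonstration, not taken from the study data); it shows how excess kurtosis separates bursty download series from normally distributed ones. Note that scipy.stats.kurtosis reports excess kurtosis by default.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)

# A normally distributed daily-count series: excess kurtosis near 0.
normal_series = rng.normal(loc=50, scale=10, size=365)

# A "spiky" series: mostly small counts plus a few huge bursts,
# qualitatively like the daily download counts described above.
spiky_series = rng.exponential(scale=2, size=365)
spiky_series[rng.integers(0, 365, size=3)] += 500  # rare viral spikes

# kurtosis() returns *excess* kurtosis by default (fisher=True), so a
# normal process scores near 0 and spiky data scores very high.
print(kurtosis(normal_series))
print(kurtosis(spiky_series))
```

The same computation applied to a year of per-book daily counts reproduces the comparison made in the text: a large excess kurtosis signals that averages and variances are dominated by a handful of days.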
Statistical measures of both downloads and sales are dominated by titles with sales and downloads that are not representative of the collection as a whole: by selectively omitting outliers, we can make the numbers tell any story we want.
Is there anything that we can learn about the collections by looking at the downloads or sales numbers? Do the numbers mean anything? If one book has been downloaded more than another book, does it mean that the book was easier to find, or that the publishing press had done a better job of promotion or search engine optimization, or that it was more "viral"?
A better understanding of the download data is obtained by examining the distribution of downloads across the collections and their distribution over time. We looked at the daily download count for individual books and computed averages, variances, and kurtosis. Most of the books exhibited very large kurtosis in the daily download counts; only 2 books had excess kurtosis below 2 (excess kurtosis would be 0 for a normal statistical process). This indicated that comparing averages or variances was poorly justified: averages and variances would typically be dominated by the download counts on a very small number of days in the year.
Figure 2a shows a plot of daily page views for a less “spiky” book in the study, exhibiting the irregular nature of the data even though the kurtosis for this book is still 6. Seasonal usage is evident, suggesting its use in courses tied to an academic calendar.
Figure 2b shows page view data for a more typical ebook in the study, with kurtosis=53.8. Seasonal variation is less apparent; spikes of usage occur, apparently at random times.
Analysis of the page load data indicated that the data could be most usefully characterized by lognormal distributions. Figures 3a and 3b show the distribution of daily page loads for the ebooks examined in figures 2a and 2b. All the “spikiness” seen in figure 2 is well described by a normal distribution in the natural logarithm of the daily page views. There is a departure from the lognormal fit at small counts, where the continuous lognormal distribution is a poor approximation of the discrete measurements.[22]
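The lognormal description can be checked numerically: take the natural logarithm of the daily counts and fit a normal distribution to the result. A minimal sketch with synthetic page-view data (the generating parameters are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated daily page views drawn from a lognormal distribution.
daily_views = rng.lognormal(mean=3.0, sigma=1.0, size=365)

# The raw counts are "spiky"; their logarithms are bell-curve shaped.
log_views = np.log(daily_views)

# Fit a normal distribution to the logged data, as in figures 3a/3b;
# the fit recovers approximately the generating parameters (3.0, 1.0).
mu, sigma = norm.fit(log_views)
print(mu, sigma)
```

On real data the quality of such a fit can be judged with a histogram of the logged counts, as in figure 3, keeping in mind the departure at small counts noted above.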
Building on the insight that ebook page views, downloads, and related quantities can be described by lognormal distributions, we can return to our analysis of downloads and unit sales across the 118 books in the study. Figure 4 shows the histogram of the logarithm of the download count per title. It is well fit by the classic bell curve of a normal distribution. Figure 5 shows a similar analysis for unit sales: a histogram of the logarithm of unit sales per year also looks like a normal distribution.
Repeating the regression analysis on the logarithms, figure 6 shows that there’s little correlation between sales and page views. The slope obtained, 0.35 ± 0.10, corresponds to a rather weak power law, and there is a large amount of scatter, leading to a correlation coefficient of 0.306. Apparently, the qualities that attract sales are only weakly related to the qualities that attract page views. There's no statistically significant evidence in this dataset that downloads drive sales or that downloads suppress sales.
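A regression of this kind, on the logarithms of per-title downloads and sales, can be sketched with scipy.stats.linregress; on log-log axes the fitted slope is the exponent of the implied power law. The data below are synthetic (the log-means, scatter, and 0.35 exponent are used only to mimic the shape of the reported result, not to reproduce the study's numbers):

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n = 118  # number of titles in the study

# Hypothetical per-title data: lognormal downloads, with sales tied to
# downloads by a weak power law (exponent 0.35) plus large scatter.
log_downloads = rng.normal(7.0, 1.0, size=n)
log_sales = 0.35 * log_downloads + rng.normal(0.0, 1.0, size=n)

# Regressing the logarithms yields the power-law exponent directly:
# sales is proportional to downloads ** slope.
fit = linregress(log_downloads, log_sales)
print(fit.slope, fit.rvalue)
```

With this much scatter the correlation coefficient stays well below 0.5, matching the qualitative picture in the text: a weak power law buried in noise.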
With a methodology grounded in lognormal statistics, it’s possible to answer a variety of questions. For example, the lists from the two publishers in the study, University of Michigan Press and Open Book Publishers, were statistically quite similar: unit sales for a Michigan title had a logarithmic mean of 7.27 and standard deviation of 1.056, while an OBP title registered a logarithmic mean of 7.45 with standard deviation of 0.77. We wondered whether characteristics of a book’s title affected the page views. The answer: apparently not. Measures such as title length and title entropy were not significantly correlated with ebook page views. We wondered if there was a relationship between a book’s sales and its price. The answer: not in this collection. We found no single book characteristic that strongly correlated with downloads or sales. The book publishing industry has an aphorism that explains this: "every book is different".
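Once the logarithms are approximately normal, standard significance tests apply to collection comparisons like the one above. A sketch, using samples drawn with the quoted logarithmic means and standard deviations (the catalog sizes here are invented, not the study's):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# Hypothetical unit-sales samples for the two lists, generated from the
# logarithmic means and standard deviations quoted in the text.
michigan_sales = rng.lognormal(mean=7.27, sigma=1.056, size=60)
obp_sales = rng.lognormal(mean=7.45, sigma=0.77, size=58)

# Welch's t-test on the *logarithms* asks whether the two lists'
# log-mean sales differ significantly.
t_stat, p_value = ttest_ind(np.log(michigan_sales), np.log(obp_sales),
                            equal_var=False)
print(t_stat, p_value)
```

The same test applied to the raw (unlogged) sales would be dominated by a few bestsellers and would not satisfy the test's normality assumption.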
The website Unglue.it provides free hosting and links for thousands of free-licensed and public-domain ebooks. The larger number of titles and diverse content should make it easier to see patterns in the usage distribution. Download data for 10,109 ebooks available for at least 9 months via Unglue.it was analyzed. As seen in figure 7, the distribution of usage appears to result from at least two factors. The frequently downloaded distribution, marked in green, corresponds to books that have been featured on the Unglue.it homepage; these books are promoted in the Unglue.it Twitter feed and on the Unglue.it Facebook page, and are given a higher weight in the website’s sitemap. The distribution with the orange histogram corresponds to ebooks that are available in EPUB and MOBI formats. It’s clear from the data that both of these factors are important determinants of usage on the Unglue.it website.
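The effect of catalog-level factors like homepage promotion or format availability can be examined by grouping the logged download counts, for instance with Pandas. The sketch below uses a wholly synthetic catalog (the factor rates and log-mean shifts are invented) to show the pattern of analysis, not the Unglue.it numbers:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 10_000

# Hypothetical catalog flags, with lognormal downloads whose log-mean
# shifts upward for each factor (shift sizes are illustrative only).
featured = rng.random(n) < 0.1   # featured on the homepage
has_epub = rng.random(n) < 0.5   # EPUB/MOBI files available
log_mu = 2.0 + 1.5 * featured + 1.0 * has_epub
downloads = rng.lognormal(mean=log_mu, sigma=1.0)

df = pd.DataFrame({"featured": featured, "epub": has_epub,
                   "log_downloads": np.log(downloads)})

# Group-wise means of the *logarithm* expose each factor's contribution,
# echoing the separated histograms of figure 7.
grouped = df.groupby(["featured", "epub"])["log_downloads"].mean()
print(grouped)
```

In a population like this, histograms of the raw counts overlap into one long tail; histograms of the logged counts separate into the distinct bell curves seen in figure 7.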
Discussion
Distributions that look like a bell curve after application of a logarithm are called lognormal distributions. These are well known across science but are periodically rediscovered in new fields of inquiry. Lognormal distributions are typically found in systems exhibiting exponential growth (or, in some fields, “preferential attachment”).[23] The size distribution of raindrops is a good example: drops grow in size at a rate proportional to their size. It should not be surprising to see this distribution in ebook downloads or page views; the publishing industry understands the process as "word of mouth". Sales or downloads of books are driven by favorable comment; the number of these comments is proportional to the number of books that have been sold or downloaded.[24]
In fact, it has been noted that the long tails in data sets of book sales and website hits can be well fit by lognormal distributions.[25] Analysis of circulation data from University of Huddersfield library was very well fit over several orders of magnitude by a lognormal distribution.[26] In the field of altmetrics, patterns of article citation are often characterized by lognormal distributions.[27]
Having observed that open-access ebook downloads and library book circulation exhibit lognormal statistics, we can apply well-grounded statistical analyses. To make collection-level statistical comparisons of download counts, unit sales, or circulation counts, we first compute the logarithm of the count.
When we compute an average or a variance, we’re implicitly making an assumption about the quantity we’re averaging: that it has an average value, and that if we use a larger sample, we’ll get a better measure of that average. In statistics, this convergence is described by the Law of Large Numbers and the Central Limit Theorem. If the distribution of the measured quantity is “normal” or “Gaussian”, we can use the measured variance to compute the probable value of a subsequent measurement. Further, we can use familiar tools and criteria to decide whether a result is “statistically significant”. If we try to average measurements of a quantity that follows lognormal statistics, we’ll need to make an exponentially large number of measurements before the averages converge.
An important message of this work is that statistical averages are frequently misleading. It’s obvious in the extreme case: if averages are used in a sales analysis of novels published in 2011, the analysis will erroneously conclude that books with “grey” in the title[28] will tend to outsell books with other sorts of titles. If, as is likely, book sales generally follow lognormal statistics, a more useful analysis will result if the quantity analyzed is the logarithm of the sales.[29]
The present results are specifically relevant to anyone trying to use download counts to measure or understand open-access ebook usage. For example, Emery and co-workers[30] reported ebook usage data purporting to show that OA increases average downloads per book compared to non-OA books. This effect is large and supported by the data, but the authors go on to say that "Engineering, mathematics and computer science OA books perform much better than the average number of downloads for OA books across all subject areas", while declining to release any actual download data or any statistical characterizations of the data.[31] Since this conclusion is based on average download counts for a collection of similar size to our study, it's as likely as not that the reported subject-area differences would disappear in an analysis using lognormal statistics.
More generally, the present results suggest a rethinking of how the OA ebook community measures “usage”. Even the term “usage” is wrong, in that it implies that a resource is used up or depleted. Open-access ebooks provide value in many ways: communities read them, digest the information they contain, recommend them to colleagues, cite them, repurpose them. OA ebooks are NOT used up or depleted. Each of these activities will leave a signature in the download data. Because the downloads appear to follow lognormal statistics, it’s the logarithm of the download counts that usefully measures a normally distributed quantity. Let’s call this quantity the “open-factor” (explained below). For a commercial publisher, optimizing sales (and thus profits) is the goal. For an impact-driven publisher, whether university press, library publisher, or scholarly nonprofit, perhaps the figure of merit should be something more like the open-factor, not download counts, which are its exponential. The open-factor isn’t so tricky to measure (because it allows use of familiar statistical methods), and it aligns better with organizational impact than raw downloads do.
What is this mysterious “open-factor”? It’s a good name because it’s a quantification of what causes the usage or impact of an open ebook to grow. Mathematically, it’s the probability that a usage will generate another usage. Practically, it’s a combination of quality and accessibility: the quality of a book makes people want to spread its use, and open accessibility enables them to do so. If the data is download counts, simply taking the logarithm of the counts will give a quantity roughly proportional to the open-factor.[32] If the measurement was of internet memes instead of ebook downloads, the quantity could be called the “repost-factor” of the meme. A large repost-factor blows up exponentially, resulting in a viral meme. If social media feeds valued the repost-factor rather than its exponential, they would probably fill up with reasoned discourse instead of cats and angry babies.
Conclusion
This article describes a study of open-access ebook usage data and finds that it is lognormally distributed across the titles studied. It suggests that the open-factor, a quantity characterizing the lognormally distributed data, is a useful measure of an open-access ebook’s impact.
Organizations that optimize this open-factor rather than downloads will avoid being slaves to virality and the radicalization that results from using a measurement (download count) that values the extremes produced by exponentiation of a book’s qualities. They’ll value a breadth of quality; we don’t want open access to be dominated by 50 shades of scholarly grey.
Acknowledgements
This work was sponsored in part by a grant from the Mellon Foundation, “Mapping the Free Ebook Supply Chain”: Grant 11600118 (03/01/16 to 02/28/17).
Bibliography
 Altmetric. Accessed 29 Mar, 2019. https://www.altmetric.com/.
 Anderson, Porter. "University College London Press Passes 1 Millionth Open Access Book Download". Publishing Perspectives, May 24, 2018. https://publishingperspectives.com/2018/05/universitycollegelondonuclpressmillionopenaccessdownloads/.
 Bollen, J., H. Van de Sompel, A. Hagberg, L. Bettencourt, R. Chute, M. A. Rodriguez, L. Balakireva. "Clickstream Data Yields High-Resolution Maps of Science". PLOS ONE 4, no. 3 (2009): e4803. https://doi.org/10.1371/journal.pone.0004803.
 Brody, Tim and Stevan Harnad. "Earlier Web Usage Statistics as Predictors of Later Citation Impact." arXiv:cs/0503020 (2005).
 Brody, Tim, Stevan Harnad, and Leslie Carr. “Earlier Web Usage Statistics as Predictors of Later Citation Impact." Journal of the American Society for Information Science and Technology 57, no. 8 (2006):1060–1072.
 Chakraborty, Subrata. "Generating discrete analogues of continuous probability distributions: A survey of methods and constructions". Journal of Statistical Distributions and Applications 2, no. 6 (2015). https://doi.org/10.1186/s40488-015-0028-6.
 Clauset, Aaron, Cosma Rohilla Shalizi, M. E. J. Newman. "Power-law distributions in empirical data". arXiv:0706.1062 [physics.data-an] (2007).
 Emery, Christina, Mithu Lucraft, Agata Morka, Ros Pyne. "The OA Effect: How Does Open Access Affect the Usage of Scholarly Books?". Springer Nature, November 2017. https://media.springernature.com/full/springercms/rest/v1/content/15176744/data/v3.
 Gatti, Rupert. "Handle with Care: pitfalls in analysing book usage data". Dr Rupert Gatti. 11 Dec. 2017. Accessed 29 Mar. 2019. https://rupertgatti.wordpress.com/2017/12/11/handlewithcarepitfallsinanalysingbookusagedata/.
 Google Analytics. Accessed 26 Nov. 2018. https://analytics.google.com/analytics/web/.
 Harnad, Stevan and Tim Brody. "Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals". D-Lib 10, no. 6 (June 2004). doi:10.1045/june2004-harnad. http://www.dlib.org/dlib/june04/harnad/06harnad.html.
 Hirmeos Project – High Integration of Research Monographs in the European Open Science infrastructure. https://www.hirmeos.eu/
 Kaplan, Nancy R. and Michael L. Nelson. "Determining the Publication Impact of a Digital Library". Journal of the American Society for Information Science 51, no. 4 (2000): 324-339.
 Leaver, Tama, Lucy Montgomery, Cameron Neylon, Alkim Ozaygen. "Getting the best out of data for open access monograph presses: A case study of UCL Press". Humanities Commons (2018). http://dx.doi.org/10.17613/M6HQ3RZ0T.
 Michigan Publishing. "Mapping the Free Ebook Supply Chain  Michigan Publishing." Accessed 26 Nov. 2018. https://www.publishing.umich.edu/projects/mappingthefreeebook/.
 Mitzenmacher, Michael. "A Brief History of Generative Models for Power Law and Lognormal Distributions". Internet Mathematics 1, no. 2 (2004): 226-251.
 NumPy. Accessed 28 Nov. 2018. http://www.numpy.org/.
 Pandas. Accessed 28 Nov. 2018. https://pandas.pydata.org/.
 Project COUNTER. Accessed 26 Nov. 2018. https://www.projectcounter.org/.
 SciPy.org. Accessed 28 Nov. 2018. https://www.scipy.org/.
 Shalizi, Cosma Rohilla. "The Distribution of Library Book Circulation Is Not a Power Law, or, Gauss and Man at Huddersfield". Bactra. 16 Mar. 2011. Accessed 28 Nov. 2018. http://bactra.org/weblog/744.html.
 Tennant, J.P., F. Waldner, D.C. Jacques, P. Masuzzo, L.B. Collister, C. H. Hartgerink. "The academic, economic and societal impacts of Open Access: an evidence-based review". F1000 Res. 5 (2016): 632. doi: 10.12688/f1000research.8460.3. PMID: 27158456; PMCID: PMC4837983.
 Thelwall, Mike. "Three practical field normalised alternative indicator formulae for research evaluation". Journal of Informetrics 11, no. 1 (2017): 128-151. https://doi.org/10.1016/j.joi.2016.12.002.
 Thirion, B., Stefan Darmoni, Jacques Bénichou, "Reading factor: A bibliometric tool to manage a virtual library", Studies in health technology and informatics 84 (2001): 3859. doi:10.3233/9781607509288385.
 Unglue.it. Accessed 28 Nov. 2018. https://unglue.it/.
 University of Michigan. "Mapping the Free Ebook Supply Chain: Final Report to the Andrew W....". 16 Jun. 2017. Accessed 26 Nov. 2018. https://deepblue.lib.umich.edu/handle/2027.42/137638.
 Wikipedia. "Fifty Shades of Grey". Accessed 29 Mar. 2019. https://en.wikipedia.org/wiki/Fifty_Shades_of_Grey.
 Wikipedia. "Five laws of library science". Accessed 29 Mar. 2019. https://en.wikipedia.org/wiki/Five_laws_of_library_science.
Notes

1. Stevan Harnad and Tim Brody, “Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals”, D-Lib 10, no. 6 (June 2004). doi:10.1045/june2004-harnad. http://www.dlib.org/dlib/june04/harnad/06harnad.html
2. Nancy R. Kaplan and Michael L. Nelson, “Determining the Publication Impact of a Digital Library”, Journal of the American Society for Information Science 51, no. 4 (2000): 324-339.
3. B. Thirion, Stefan Darmoni, Jacques Bénichou, “Reading factor: A bibliometric tool to manage a virtual library”, Studies in health technology and informatics 84 (2001): 3859. doi:10.3233/9781607509288385.
4. J. Bollen, H. Van de Sompel, A. Hagberg, L. Bettencourt, R. Chute, M. A. Rodriguez, L. Balakireva, “Clickstream Data Yields High-Resolution Maps of Science”, PLOS ONE 4, no. 3 (2009): e4803. https://doi.org/10.1371/journal.pone.0004803
5. Tim Brody and Stevan Harnad, “Earlier Web Usage Statistics as Predictors of Later Citation Impact”, arXiv:cs/0503020 (2005). Also: Tim Brody, Stevan Harnad, and Leslie Carr, “Earlier Web usage statistics as predictors of later citation impact”, Journal of the American Society for Information Science and Technology 57, no. 8 (2006): 1060–1072.
6. The first of Ranganathan’s Five Laws of Library Science. https://en.wikipedia.org/wiki/Five_laws_of_library_science
7. J.P. Tennant, F. Waldner, D.C. Jacques, P. Masuzzo, L.B. Collister, C. H. Hartgerink, “The academic, economic and societal impacts of Open Access: an evidence-based review”, F1000 Res. 5 (2016): 632. doi: 10.12688/f1000research.8460.3. PMID: 27158456; PMCID: PMC4837983.
8. Some examples of the genre include Porter Anderson, “University College London Press Passes 1 Millionth Open Access Book Download”, Publishing Perspectives, May 24, 2018, https://publishingperspectives.com/2018/05/universitycollegelondonuclpressmillionopenaccessdownloads/, and Christina Emery, Mithu Lucraft, Agata Morka, Ros Pyne, “The OA Effect: How Does Open Access Affect the Usage of Scholarly Books?” (Springer Nature, November 2017), https://media.springernature.com/full/springercms/rest/v1/content/15176744/data/v3.
9. "Project COUNTER." https://www.projectcounter.org/. Accessed 26 Nov. 2018.
10. "Altmetric." https://www.altmetric.com/. Accessed 29 Mar. 2019.
11. "HIRMEOS Project – High Integration of Research Monographs in the European Open Science infrastructure." https://www.hirmeos.eu/
12. Tama Leaver, Lucy Montgomery, Cameron Neylon, Alkim Ozaygen, “Getting the best out of data for open access monograph presses: A case study of UCL Press”, Humanities Commons (2018). http://dx.doi.org/10.17613/M6HQ3RZ0T
13. "Mapping the Free Ebook Supply Chain - Michigan Publishing." https://www.publishing.umich.edu/projects/mappingthefreeebook/. Accessed 26 Nov. 2018.
14. "Mapping the Free Ebook Supply Chain: Final Report to the Andrew W...." 16 Jun. 2017. https://deepblue.lib.umich.edu/handle/2027.42/137638. Accessed 26 Nov. 2018.
15. "Unglue.it." https://unglue.it/. Accessed 28 Nov. 2018.
16. The work required to assemble the data sets was greatly in excess of what was anticipated!
17. "Google Analytics." https://analytics.google.com/analytics/web/. Accessed 26 Nov. 2018.
18. "NumPy." http://www.numpy.org/. Accessed 28 Nov. 2018.
19. "SciPy.org." https://www.scipy.org/. Accessed 28 Nov. 2018.
20. "Pandas." https://pandas.pydata.org/. Accessed 28 Nov. 2018.
21. Data was collected for 138 titles, taken from the catalogs of University of Michigan Press, including its Open Humanities Press imprint, and of Open Book Publishers. 20 of the 138 titles were excluded from the present statistical analysis because either sales data or page view data was unavailable in a form that could be compared to the rest of the list.
22. You can’t take the logarithm of zero! For a review of discrete analogs of the lognormal distribution, see Subrata Chakraborty, “Generating discrete analogues of continuous probability distributions: A survey of methods and constructions”, Journal of Statistical Distributions and Applications 2, no. 6 (2015). https://doi.org/10.1186/s40488-015-0028-6
23. Michael Mitzenmacher, “A Brief History of Generative Models for Power Law and Lognormal Distributions”, Internet Mathematics 1, no. 2 (2004): 226-251.
24. In calculus, growth of a quantity in proportion to the quantity is the defining characteristic of the exponential; the logarithm is the inverse of the exponential.
25. Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman, “Power-law distributions in empirical data”, arXiv:0706.1062 [physics.data-an] (2007).
26. Cosma Rohilla Shalizi, “The Distribution of Library Book Circulation Is Not a Power Law, or, Gauss and Man at Huddersfield”, Bactra, 16 Mar. 2011. http://bactra.org/weblog/744.html. Accessed 28 Nov. 2018.
27. Mike Thelwall, “Three practical field normalised alternative indicator formulae for research evaluation”, Journal of Informetrics 11, no. 1 (2017): 128-151. https://doi.org/10.1016/j.joi.2016.12.002
28. "Fifty Shades of Grey - Wikipedia." https://en.wikipedia.org/wiki/Fifty_Shades_of_Grey. Accessed 29 Mar. 2019.
29. Note that Brody and Harnad, cited above, intuitively took logarithms before analyzing data; a lognormal distribution in the data would justify that treatment.
30. Christina Emery, Mithu Lucraft, Agata Morka, Ros Pyne, “The OA Effect: How Does Open Access Affect the Usage of Scholarly Books?” (Springer Nature, November 2017). https://media.springernature.com/full/springercms/rest/v1/content/15176744/data/v3
31. Rupert Gatti, “Handle with Care: pitfalls in analysing book usage data”, Dr. Rupert Gatti, 11 Dec. 2017. https://rupertgatti.wordpress.com/2017/12/11/handlewithcarepitfallsinanalysingbookusagedata/. Accessed 29 Mar. 2019.
32. It’s more complicated than just taking a logarithm, because real measurements sum over a distribution of lognormally distributed phenomena.