Most discussion around reading data focuses on how it can be used by producers to fine-tune a written product. In a number of areas, especially instructional and pedagogical, this is a sensible priority. However, for expressive writing (literature, journaling), producers have little practical need of or use for reading data. Moreover, avoiding confirmation bias around reading data may be tremendously difficult for the producers, and consumers may not like sharing reading data with producers. I believe that for expressive writing, reading data is best used by the reader rather than the writer, and that this may represent the most culturally powerful expression of reading data technology. This talk represents a call to developers to offer readers a deeper understanding of their own reading experience, to provide feedback loops, ways to visualize and abstract the tacit processes of reading. Tools like Jawbone Up and Fitbit in exercise, and RescueTime in productivity, are the crude first efforts in what could be called self-nudging: tools that allow us to coax ourselves to do things that reward us in the long run, when otherwise we’d succumb to instant gratification.


Prepared Remarks


I try to always speak about something new here with this friendly and curious crowd Peter brings to Books in Browsers.

This year I'm not just trying to unveil some new start-up, or announce the demise of a new start-up, or try out a new talk; I'm trying to figure out a fresh approach to reading and data.

I’m going to try to do it somewhat in real time, because a big chunk of this talk, including its very title, was inspired by a conference I attended Tuesday afternoon in New York City.

But let me for a minute go back to the initial feelings that have given rise to this. Over the past year or so, as I’ve been advocating for the "reading is a service" model of understanding the publishing business, itself derived from a lot of what I’ve been thinking about in the previous five years, I became somewhat dissatisfied with the conversation around data and reading.

So much of it was focused on how data can help creators and producers make better books and/or sell books better. Now this isn’t unreasonable. There’s an entire universe of work being done around personalizing education, on and offline, that relies on analytics around learning, and understanding reading is a very important part of that. Moreover it is perfectly reasonable to proceed from formal education and say that a good deal of instructional/how-to publishing will benefit greatly from feedback loops, allowing the materials to be fine-tuned and/or personalized, to improve the skills-acquisition outcomes.

But there were large swathes of creative endeavor in which this data seemed completely beside the point. Where the creator is engaged in a more emotional act, where seeking to communicate is less about feedback loops, and more about authenticity. Data about how one of your readers reads will make you a better teacher, but it will not make you a more authentic writer.

Which is a long preamble to the second slide.


This, to my mind, is about the funniest image I can use to talk about what is otherwise a tremendously pedantic subject, choice architecture. The idea that, given what we now know about how irrational we really are as human beings, it is good public policy to frame choices in a way that makes us more likely to do the optimal thing. More popularly called the nudge. The fly in the urinal, toward which men are more likely to aim, is placed where there will be the least splashback, making life better for you, and the next man, and the cleaning staff.

Kin to choice architecture is another phrase, the quantified self, coined by a man who is no stranger to BiB, Kevin Kelly. To architect choices, whether the trajectory of your pee, or the size of the ice cream, or the intensity of your workout, or your drinking, you need data. And since choice architecture is about enabling individual choice, then it is about your data.


Which brings me to the name of the talk, which, as I mentioned, I decided on about 36 hours ago. Small Data is the term used by Deborah Estrin, a Professor of Computer Science at the new Cornell Tech campus in New York City and co-founder of the non-profit startup Open mHealth, to describe how we might gather all the digital breadcrumbs we leave, the ones captured by all the hardware and software we interact with on a daily basis: our motion, our location, even the language we use in our speech and writing, the tone of our speech, our heart rate, etc. All the little things. Small data is my data.

And what interests me about reading data is exactly that. My data. It is data about my reading, not data about your writing. Which again is not to say that there isn’t a very valuable role for my reading data to help your writing practice (because it is also data about your writing), and for my reading data to help companies seeking to enhance our skills by tracking us more effectively, but that we’ve not thought nearly enough about how to put data directly into readers’ hands. Not only will it make us more likely, I believe, to read more fully, but also, by giving data to readers, it’ll enhance transparency around data, which is a key element of best practices around managing privacy in a Big Data world.


I believe that reading functions similarly to the main area in which the quantified self has been applied thus far, health, or, more specifically, exercise and nutrition. The benefits of exercise, of healthy eating, are, by now, extremely well-documented; the benefits of complex and immersive reading increasingly well-documented. But the payoff is long. And the opportunity cost is now. I will gladly run on the treadmill Tuesday, for a hamburger today. And Tuesday never comes.


And, in my dataverse, the world of my own data, I have in fact started reading more since I got my Jawbone. Why? Because Jawbone pointed out that I was significantly below average in the amount of deep sleep I got. And it suggested that I eliminate screen time before sleeping, and instead read a print book. (Although eInk works almost as well...) Of course I knew in the abstract I should be doing that, but seeing how my actual deep sleep patterns compared to men my age got me going.


We obviously see some activity in quantifying behavior for individual use in other areas, like Last.fm’s various data visualizations, designed, I should emphasize, not just to make pretty pictures but to put data in our own hands: "God, I miss listening to X," or "Man, I’m stuck in a rut."


But reading is harder than music, to capture I mean. In fact, even in the example I cited earlier, reading in print or on an eInk device, it can be extremely difficult, depending on the device, to capture, share, or integrate your data. Data is siloed. Data requires manual input. A truly user-friendly reading quantification project should include your web reading, your Pocket reading, your print magazine reading, and it should not capture what you don’t want it capturing, and so forth. So I don’t pretend that any of this will be easy. I wish I could do show-and-tell, rather than tell-and-hector. But it will be valuable. Most of the value created around reading is that there’s more of it, and it’s cheaper, but relatively little value has been created around actually helping people take advantage of that by integrating reading more fully into their lives.

And while the logic for this is largely driven by emotional and immersive work, it can only help creatives and producers of instructional work too.

Now, as it so happens, a couple months ago a couple guys approached me because I’d been noodling around these ideas in a published interview, and it turns out they’ve gotten started on doing something just like this. It’s very much an alpha product.