It has been nearly two years since Google first announced its intention to scan the contents of five of the world's largest libraries. Since then, the project has been discussed everywhere from the New York Times to NPR to the Duke Law and Technology Review, and at countless media outlets and blogs in between. Until recently, most of the discussion has explored the immediate legal question: whether the Google Books Library Project tramples the spirit of copyright, as claimed by the lawyers representing the Association of American Publishers (AAP) and the Authors Guild (AG) in their lawsuits against Google. And yet, mass digitization initiatives like Google's implicate a far broader range of social issues than copyright alone.

This March, the University of Michigan Libraries and the National Commission on Libraries and Information Science (NCLIS) sponsored a symposium entitled "Scholarship and Libraries in Transition: A Dialogue About the Impacts of Mass Digitization Projects," dedicated to pushing public discourse on mass digitization beyond the existing adversarial debate over copyright law. With the word "Google" notably absent from its title, the event cast a broad conceptual net, taking into consideration the many nuanced possibilities and problems that digitization can raise for a diverse range of stakeholders - not only the publishing industry, but also librarianship, research, and public policy.

The symposium presented a mix of speakers: Tim O'Reilly, Founder and CEO of O'Reilly Media, delivered the keynote address, and Clifford Lynch of the Coalition for Networked Information delivered the closing remarks. University of Michigan President Mary Sue Coleman and Google's Adam Smith also spoke. A series of three-person panels supplemented the speakers' points with dialogues on the major areas that large-scale digitization affects most: libraries, research, publishing, economics, and public policy.

If the symposium had a failing, it was the notable lack of representation of the widely held viewpoint that the Google Library Project is both illegal and immoral, and should thus be stopped. Though this imbalance in perspective likely allowed the conversation to proceed deeper into more nuanced philosophical areas than might have been possible with a greater focus on the legal debates, at times the event seemed overly insulated from serious engagement with the perspectives of those who oppose the digitization of copyrighted materials. Whatever the symposium may have gained through the absence of the opposition, it gained at the expense of a fully rounded consideration of the issues - and, more sadly, at the expense of a valuable opportunity to try to shift entrenched positions.

Still, the symposium left me, and I suspect most participants, with a wealth of new information to consider. The speakers, panels, and question-and-answer sessions both provided fodder for golden visions of digitization's vast potential for social good and delved into a number of less-often-aired issues and problems that bear further consideration. Both of these sets of ideas help to push the discourse on mass digitization beyond the acrimonious legal squabble that has dominated the conversation to this point.

Set One: A Digital Elysium

1. Mass Digitization Provides a Massive Backup

In her introductory speech, University of Michigan President Mary Sue Coleman focused strongly on the potential of mass digitization to save libraries and collections from catastrophic devastation such as that which befell New Orleans libraries and archives during Hurricane Katrina. Once books are digitized, the information in them can outlive the destruction or decay of the physical volume. Later speakers and audience members echoed this emphasis on preservation, though some noted that, for the purpose of posterity, keeping a redundant copy of Google's aggregated digital archive intact would be of greater value than backing it up in parts by sending the digital files to the libraries involved. An intact copy would retain not only the information, but also the added value that such a critical mass of data can provide.

2. And All the World Shall Have a Book to Read

Another, perhaps vaster potential of mass digitization lies in its ability to extend the geographic and intellectual reach of the creativity and ideas contained in books. As University of Michigan Economist Paul Courant pointed out on the second day of the symposium, digitization helps turn "a local public good [into] a global one." Though few in the developing world (or in less-affluent parts of the developed world) have access to an Internet connection, doubtless even fewer have access to the means to purchase a plane ticket to Ann Arbor or Stanford. Digitization is not a 100% solution to access-to-knowledge issues - but it is nonetheless a major step forward.

3. BookPods, Citation Webs, and Other Playgrounds for Textual Data

A third inspiring possibility - or rather, set of possibilities - that digitization enables arises from the spirit of the "Web 2.0" concept pioneered by the symposium's keynote speaker: copious digital textual content can provide the raw materials for an array of Web services and technological innovations limited only by the imagination. Stanford Librarian Michael Keller described a number of plans under development at his home library, including taxonomic indexing, associative searching, citation linking, and book recommendation systems. Google's Adam Smith cited various kinds of statistical analyses that could create linkages in authorship, citation, temporal relationships, and topical similarity, thus making the digitized books more conducive to search and digital indexing. Tim O'Reilly suggested that the digital library could provide fodder for the kind of user-driven collaborative annotation, reviewing, subject tagging, and application mash-ups that sites like Flickr and Google Maps have inspired; he also noted, in a nod to more physical innovation, that the existence of mass quantities of raw textual data could do for the development of e-book readers what the massive, user-driven conversion of CDs into MP3s did for the development and salability of the iPod and its ilk.

4. Seeding the Public Consciousness

Google's foray into the world of mass digitization has raised the public profile of such initiatives to previously unimaginable levels. This rise in widespread consciousness of digitization has, in turn, pulled a number of important but often ignored social issues into popular discourse. A number of speakers alluded to the range of topics Google's project has brought out into the open - among them, fair use, copyright law, orphan works, the rights and responsibilities of libraries, the inefficiencies of the publishing system, and the future of the printed word. Many of these problems enter the public mind only rarely, and then often in ways not productive for the public good (like the RIAA suing thirteen-year-olds for copyright infringement). The elevation of such topics to the level of broad-based, meaningful public discourse will itself be a major achievement of mass digitization.

Set Two: But What About...

1. Copyright, Trust, and the Still-Frightened Publishers

The issue of copyright formed a baseline for much of the symposium's content; yet, perhaps because so much had already been said on that issue, the symposium ultimately added few truly novel ideas on the subject. University of Michigan Associate Provost and Interim University Librarian James Hilton gave an impassioned presentation on the problematic conflation of chattels and intellectual property in the public mind, while Adam Smith and Hal Varian - both affiliated with Google - ran through similar, somewhat disappointing, versions of Google's boilerplate fair use argument. Still, the trepidation expressed by ProQuest's Suzanne Bell about the still-perceived possibility that the dark archive might come to light, and the concerns voiced by the Publishers' Licensing Society's Alicia Wise about the Google Publisher Program's one-size-fits-all licensing models, gave some indication of interesting directions in which the discussion could have ventured, had further such ambivalent views been given the symposium stage.

2. The Potential Mendacity of Digital Avatars

The issue of digital document authentication and the problems with citing electronic media came up on nearly every panel, and formed the focal point of two presentations, given by Jean-Claude Guédon of the University of Montreal and Bruce James of the GPO. Guédon asserted that because digital information is so easy to alter, it is sometimes difficult to identify the best, most authoritative version for research; he cited Wikipedia as an example. In the context of the digital library, this ease of mutation means that there needs to be some way for researchers to confirm that the digital version they are using is in fact an accurate representation of the original. On the library panel, Michael Keller suggested that such trust might be accomplished through the use of hashes and other cryptographic procedures. Still, if the products of mass digitization are to be useful for academic research, the development of such authentication systems will be an ongoing subject of concern.
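
To make the hash-based idea concrete, here is a minimal sketch in Python of how a researcher might check a downloaded file against a digest published by the holding library. It is only an illustration under stated assumptions: the function names, the sample bytes, and the choice of SHA-256 are hypothetical, and nothing here describes a system that Keller or any other speaker proposed.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a digitized file's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_digest: str) -> bool:
    """Check a local copy against a digest published by the holding library.

    Assumption: the digest reached the researcher through a channel
    they already trust (for example, the library's own catalog record).
    """
    return sha256_digest(data) == published_digest.strip().lower()


# Hypothetical stand-in for the bytes of a scanned page or OCR text.
original_scan = b"Page image or OCR text as released by the library"
published = sha256_digest(original_scan)

# An unaltered copy matches; a copy with even a small change does not.
altered = original_scan.replace(b"library", b"librarian")
print(matches_published_digest(original_scan, published))
print(matches_published_digest(altered, published))
```

A matching digest shows only that the copy is identical to the file the digest was computed from. It says nothing about whether that file was a faithful scan of the physical volume in the first place, which is why hashing addresses only part of the authentication problem raised on the panels.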

3. The 800-Pound Gorilla (and the things it might sit on)

A set of fears related to Google's control over the aggregate collection formed another overarching theme that ran through the symposium. The questions in this area were many and varied: what happens to the critical mass of the collection if Google disappears? What limitations will Google place on the content it holds? Will the libraries be allowed to be more open with their copies? Is it necessarily beneficial to pursue mass digitization according to the corporate principle of competitive advantage? In his closing remarks, Clifford Lynch astutely observed that having a copy of something that is in the public domain does not carry any imperative to make that copy available to others. In Google's case, this lack of imperative is somewhat tempered by the provision of copies to the partner libraries, for whom providing access is an imperative. Even so, the limitations already placed on the public domain materials in its Book Search service - a user who wishes to download an entire book must do so one page at a time - left open access and public domain advocates in the assembly feeling a bit queasy. Google may not be the only avenue through which users can access these books, but it will undoubtedly be the most popular one; as such, its Book Search team ought to think carefully about balancing Google's responsibilities as a corporation against the obligations that ought to come with the power the company will wield as the purveyor of such a valuable public service.

4. The Space Beyond 32 Million Volumes

For many symposium-goers, scanning the contents of five libraries - roughly 32 million volumes, by OCLC's estimate[1] - does not go far enough. Several participants pointed out that archival ephemera, film, and artworks - some of the likeliest materials to become orphan works - also make deeply valuable targets for digitization, particularly in light of their uniqueness and fragility. Additionally, many noted that in the rush to digitize the analog materials of the world, libraries and their partners should not neglect the archival needs of born-digital materials. Tim O'Reilly observed that much of the early history of the Web, of potential value to current debates and ongoing lawsuits, has been completely lost - erased through a constant process of update and iteration. The Internet Archive currently freezes and saves many areas of the Web, but it did not come early enough, and its reach is not far enough; it is, after all, only one organization. The constant, ongoing loss of vast quantities of digital-only information will remain a vital issue for archivists and historians moving forward.

Large-scale digitization carries with it the massive potential for social good. It can help protect cultural heritage, disseminate information more democratically around the world, feed the imaginations of innovators, and spark meaningful and necessary public debate. But it is not perfect. Many intellectual property issues remain unresolved; document authority is difficult to ensure; the involvement of one of the most powerful companies in the world complicates the public-good orientation of the largest going initiative; books, while the current focus, are not the end of information. But one line, which kept popping up at the "Scholarship and Libraries in Transition" symposium, provides a good mantra for going forward:

"Don't let the perfect get in the way of the good."

Current efforts at large-scale digitization are not perfect, but they are workable. And in the long run, perfection simply may not be required to realize the enormous benefits digitization can bring - in which case, waiting for it would become a disservice to the public good.


Elisabeth Jones is currently pursuing an MSI in Information Economics, Management, and Policy at the University of Michigan School of Information, focusing on policies related to libraries and archives, access to information, and intellectual property in the digital environment. During the academic year, she works as a Google Partnership Research Intern at University of Michigan Media Relations and Public Affairs.

NOTES

    1. Brian Lavoie, Lynn Silipigni Connaway, and Lorcan Dempsey, "Anatomy of Aggregate Collections: The Example of Google Print for Libraries," D-Lib Magazine 11, no. 9 (September 2005), retrieved April 5, 2006 from: http://www.dlib.org/dlib/september05/lavoie/09lavoie.html.