Open Access Publishing and the Emerging Infrastructure for 21st-Century Scholarship

In the movie Shakespeare in Love, there is a rousing scene in which rehearsals are just beginning for the play that young Will is struggling to draft. The working title is “Romeo and Ethel, the Pirate’s Daughter.” Eventually, the romance between Will and Viola de Lesseps catches fire, and inspires Will to finish the play under the new and much improved title of “Romeo and Juliet.” However, in this particular scene, the theater owner, Philip Henslowe, is worried about the thinness of the script, and the financier, Hugh Fennyman, to whom Henslowe is massively indebted, is pressing hard to see a product. Will tries to mask the panic induced by his writer’s block, and complains that there are not enough actors to perform the play he wants to write. In bursts Ned Alleyn with his troupe of actors, the Admiral’s Men. Alleyn makes a long speech trumpeting his performance credentials, during which Fennyman interrupts: “One moment, sir.” Alleyn shouts him down: “Who are you?” Fennyman responds, “I’m, uh...I’m the money.” Unimpressed and confident that the theater is primarily for actors and their audience, not for the moneymen and the administrators, Alleyn retorts, “Then you may remain, so long as you remain silent.”[2]

As a representative of the Andrew W. Mellon Foundation—the money, or at least some of it, in our collective business—I am acutely aware of my primary obligation to stay silent and listen carefully to the higher education community that we serve, and I note in passing that one of the minor story lines in Shakespeare in Love is how Fennyman, the money man, learns this lesson. In the opening scene of the movie, Fennyman is putting Henslowe’s feet to the fire—literally torturing him by holding his feet over burning coals—to pay off his debts. By the end, Fennyman is an engaged partner, supporting the theater and defending artistic integrity, and the scene I have quoted is a turning point in his learning.

In the spirit of engaged partnership, I welcome the opportunity to share some thoughts with you about the future of scholarly communications. This is a big topic and includes the processes by which scholars generate, record, report, preserve, and disseminate their knowledge-building activities for each other and the larger society. Here, I want to focus your attention at least in part on arguments about so-called “open access” publications. My intention is not to rehearse these arguments. Rather I want to situate them in a context and suggest that the issues may not be as straightforward as they appear to those partisans who are actively engaged in the debates. I also want to suggest that we should not allow the fine points of dispute—and the scarce intellectual and political capital they require—to distract us from the many other weighty issues that lie before us in higher education and that require all-hands-on-deck attention.

The Global Context

First, let me begin by summarizing some of the key global trends.[3] They are of course suggestive, not definitive, mainly because they are based largely on data about trends in science and engineering. The state of the humanities and social sciences is much less clear because we lack adequate measures of attainment in those fields, but the general trends seem to be that:

  1. Time-to-degree at the undergraduate level has been steadily increasing and there has been a pronounced slowdown in the rate of increase in the receipt of college degrees over the last several decades;
  2. The US is now behind a number of Asian and European countries in the percentage of the 25-to-34-year-old population that has attained tertiary education;
  3. The US is even farther down the rank order of countries when we examine the ratio of first-university degrees in the natural sciences and engineering to the 24-year-old population: the US ranked 15th out of 19 countries included in a recent study;
  4. At the doctoral level, the number of degrees awarded to US citizens in all fields has declined by about 5% over the past 30 years. Growth in international recipients has offset the fall-off in the US numbers, and in science and engineering the representation of international students has approached 30%;
  5. In 2000, out of a worldwide total of doctoral degrees in science and engineering, 78% were earned outside the US and the growth in doctoral education has been especially dramatic in China, followed by Japan, South Korea, and the United Kingdom;
  6. With regard to scholarly publication, the US has had by far the largest share of science and engineering article output at 31% with Japan as the next-highest-ranking country at 9%. However, article output by United States–based authors “has remained flat since 1992, even though real R&D expenditures and the number of researchers continued to rise.” Article output has continued to grow strongly in Western Europe and Asian countries, and the US-based share of world scientific output has declined. The reasons for the “flattening out” of US output are said to be “unknown and under investigation.”

What do these various trends mean for US higher education and particularly for scholarly communications? Overall, one can conclude that countries in Asia and Europe are intent on trying to mimic the success the US has had in building an exceptional base of highly educated talent. They are developing a widening array of educational and research opportunities and they are aggressively competing for talent—and this development is probably true in the humanities and social sciences as well as the sciences. Given these opportunities, students and researchers on whom US higher education, especially at advanced levels, has so much depended are now increasingly able either to stay home or go elsewhere. Simply to ensure that the US labor force has the brainpower necessary to sustain a competitive economy, US higher education must therefore do a much better job of reaching the broadest possible citizenry here in this country. Moreover, in a world where there is a rapidly growing and broadly distributed system of centers of excellence in research and higher education, US higher education institutions are necessarily re-defining leadership in terms of being outstanding worldwide partners with other institutions in the knowledge-building enterprise rather than solely in terms of their own output and excellence.

These objectives urgently need attention as a matter of public policy as well as significant investment both within the academy and in the broader society, but the solutions are necessarily complex and multifaceted, requiring deeper understandings ranging from the study of race and equity issues at home to the building of deep, long-term relationships with institutions abroad. Moreover, both objectives (reaching citizens in this country and partnering with academic institutions abroad) are themselves activities of scholarly communications and require, among other factors, openness in the flow of knowledge as a means to advancing knowledge. It is with this principle of openness that arguments for open access publishing deeply resonate, but the danger is that arguments for the general principle have become confused with, and have tended to replace, arguments about workable public policy.

Open Access Publishing

In its narrow formulation, open access publishing would shift the burden of generating revenue from the demand side through widespread use of subscriptions, to the supply side by charging authors or their sponsors for dissemination, or by some kind of institutional subsidy, making use cost-free. In very few disciplines other than a handful in the life sciences do scholars have sufficient funds from grants and other sources to pay author fees. The suggested alternatives—institutional sponsorships and subsidies—would require academic institutions to support author fees with massive reallocations from library acquisitions budgets and other sources when they are already contributing mightily to the “real” costs of scholarship and scholarly publication by paying faculty salaries, providing library and laboratory resources, and so on. The cost of peer review and distributing the publication, significant as it is, is only the “final stage” of the process, and reallocating existing funds among faculty across disciplines for this purpose in any kind of fair and equitable fashion is an administrative challenge that few institutions have undertaken or even seriously contemplated. Publications adhering to this model of open access will undoubtedly continue to be created and survive, but they will probably be limited in number unless and until sources of supply-side revenue can be found other than grant support, or there is a fundamental administrative and financial overhaul of our institutions.

A broader approach to open access recognizes some of these practical difficulties and either encourages publishers to make articles freely accessible after a limited time during which they generate subscription revenue, or embraces the older call for authors to “self-archive,” that is, to retain rights to make their articles openly accessible in pre-print and/or post-print form. To help advance this approach, universities are deploying institutional repositories and, in some cases, cheap publishing tools that are easy to use and manage, and are relying on search engines for discovery and retrieval. So far, however, faculty have generally been reluctant to participate in open access schemes because they do not see the advantages: it takes too much effort, and they have so much at stake in the ways that the current systems of peer review and publication preserve trust in the authenticity of academic work and reliably allocate credit for that work. Even in the life sciences, many of the top scholars understandably want to publish in the best journals, even if those journals charge expensive subscription prices. In response to these objections, open access advocates argue that institutions and funders, including the federal government, need to persuade or require faculty members to participate.

The idea that foundations and universities can or should mandate open access publishing raises a number of very delicate policy questions that rarely receive the attention needed. In some fields and for some types of research, a limited number of very wealthy public or private funding agencies, such as the US government’s National Institutes of Health or the private Wellcome Trust, are able to make very large grants that provide comprehensive funding for major research projects. In these cases, open access requirements seem reasonable and appropriate, and accountability may be straightforward. In many other fields, however, funding at both the university and foundation or government agency level is much more fragmentary and partial, and mandates therefore run up against de minimis limitations. That is, if the funding from a particular source does not reach a minimum threshold as part of the overall costs of the project, then it may be completely unreasonable for the particular funder to impose demands or mandates regarding downstream products.

In addition, responsible grant-making often requires that interests in open access be balanced against the need for sustainability. It may be in the public interest to mandate open access, but it may equally be a failure of public trust if such a mandate is not balanced by consideration of a requirement for sustainability so that the content and the publisher endure. In many cases of funding by the Mellon Foundation, a sustainability model based, for example, on user fees spread across a broad base of users, and funded by academic institutions at relatively low cost so that access is free to members of the institution at the point of use, is often more effective than having the Foundation bear the long-term costs of open access. JSTOR is a good example of this model and the public good is arguably better served because colleges and universities have accepted responsibility for funding the resources that are core to the academic mission. Meanwhile, the Mellon Foundation has been able to contribute to more and varied projects than it might have been able to do had it tried to enforce an open access model and pay to sustain it. One can only admire the responsible approach that the Wellcome Trust has taken to invest time, effort, and money in building an infrastructure for open access in the biomedical sciences and in helping publishers in those fields craft alternative business models to accommodate the Trust’s open access mandates. But one can only wonder at the effects of unfunded mandates by government legislatures and other agencies that have expressed little interest in fostering the development of alternative models of sustainability. After all, pricing can be a barrier to access, but it is also a means of allocating responsibility, and not charging for scholarly publications can mean in the end that no one accepts responsibility for this key resource.

It is all too easy to focus on the trendy, glitzy, heart-pounding rhetoric about the initial step of making materials freely available, especially those materials that “your tax dollars helped make possible,” and to trust that only good consequences will follow downstream. It is much harder to focus strategically on the full life cycle of scholarly communications and ask hard questions such as: open access for what and for whom and how can we ensure that there is sufficient capital for continued innovation in scholarly publishing? One worry about mandates for open access publishing is that they will deprive smaller publishers of much needed subscription income, pushing them into further decline, and making it difficult for them to invest in ways to help scholars select, edit, market, evaluate, and sustain the new products of scholarship represented in digital resources and databases. The bigger worry, which is hardly recognized and much less discussed in open access circles, is that sophisticated publishers are increasingly seeing that the availability of material in open access form gives them important new business opportunities that may ultimately provide a competitive advantage by which they can restrict access, limit competition, and raise prices.

For scholarship to thrive, there is little question that data and other forms of content need to be open, in the sense that economic, intellectual property, and other barriers must be low enough to permit an easier flow of information, especially into rich computational environments that help readers locate and, indeed, discover new information. In many fields, these barriers have been falling steadily for over a decade with innovations such as embargo periods, moving walls, toll-free access, and special forms of license. Even the major publishers such as Elsevier, Wiley, and others, are actively participating in these activities, opening their published content. However, these publishers are not innovating in the direction of greater openness for its own sake, but to advance innovative new business opportunities that depend on open content. That is, they can begin to incorporate and recombine materials that they, other publishers, and the academy have produced with data and other related materials in sophisticated databases and subject them to sophisticated search, data mining, and semantic algorithms. The principle of openness thus is crucial in the formation of public policy that encourages new forms of sustainable businesses to emerge that support scholarship, but simple advocacy of openness for its own sake is not necessarily sufficient or wise.

There are many opportunities for useful partnerships among scholars, libraries, information technologists, and publishers to develop imaginative and useful data-mining and other computation-based services that advance knowledge in ways that engage the broadest possible citizenry and depend on and extend scholarly collaboration around the world. But the nature of the partnerships is key. It is easy to imagine—especially in the absence of hard-nosed and aggressive strategic planning by, and collective action among, scholars, libraries, information technologists, and their universities—that the large, heavily capitalized publishing and other media firms will simply exploit open access repositories, cherry-pick the most valuable open access products, combine them with the most valuable new databases and resources, and sell them back to the academy at a significant profit, thereby chasing out sources of capital from within the academic community that are desperately needed to advance scientific, humanistic, and social science study. If the academy is unwilling or unable to think carefully now about possible downstream consequences of open access publishing and ways to steer clear of undesirable consequences, then the mantra about journal publishing—that the academy gives away its products only to buy them back at exorbitant prices—will surely return to haunt the academy in an even scarier garb than before, and prove to be even more financially debilitating.

Key Strategic Issues

At a general level, then, the debate about open access publishing has had a significant public policy benefit in helping the academy and the nation to embrace the broad principle of openness that US higher education needs to achieve the twin goals of reaching the broadest possible citizenry in the US and serving as an outstanding partner in the worldwide knowledge-building enterprise. But openness is a means, not an end, and we need constant reminders to keep the larger goals in view. To paraphrase the slogan from the 1992 Clinton campaign: “It’s the scholarship, stupid!”

The conditions of scholarship are rapidly changing in field after field, and they require significant attention. The call to action can be found in the recently published findings of the British Academy committee on E-Resources for Research in the Humanities and Social Sciences, which was chaired by the late Karen Spärck Jones. The committee found that, in general, electronic resource provision is “ad hoc and fragmented”; that “[m]ore strategic, coordinated and well-targeted action is needed”; and that these actions “must . . . be grounded in researchers’ actual, not deemed, requirements.” [4] The British Academy findings are replicated in many other places, including the Atkins report on cyberinfrastructure and the recently released ACLS report on cyberinfrastructure for the humanities and social sciences.[5] The urgency and frequency of these various calls to action prompt me to suggest that as important and as pregnant as the discussions about open access may be, there are even larger forces at play in determining the future of scholarship and the need to address them may well be obscured by a narrow but high-volume focus on pricing and open access publishing.

As Richard Lanham argues in his recently published work on the Economics of Attention, we are awash in information.[6] The broadening deployment of computer-based data conversion and capture instruments and sensors has greatly expanded the scale of humanistic and scientific data to be digested. The Google library project is one example of this expansion; the Sloan Digital Sky Survey is another. To reach the broadest possible citizenry and to partner effectively in the world-wide knowledge-building effort, a key strategic challenge for our institutions of higher education is to mobilize resources in the design of systems that operate at scale in helping faculty and students with the most traditional activities that we associate with rigorous scholarship, including discovering evidence, aggregating it, arranging and editing it for use, analyzing and synthesizing it, and disseminating the results through reports and teaching. Let me conclude with some modest suggestions for how we might together build this environment.

The analytic process. One of the fundamental building blocks of scholarly communications is search for the purpose of both basic discovery and more complex analysis and synthesis. Search is effective as a tool, however, only insofar as a sufficiently rich body of sources is comprehensively aggregated to be worth searching. Successful aggregation of sources at scale is the unsung hero of the success of the search-engine industry, and if we are looking for models of advanced repositories, it is to the Googles and Amazons that we must surely look, not just for how they have stored these aggregations, but also for how they have been gathered in disparate formats from multiple sources operating under a variety of business models and intellectual property regimes, and then normalized and indexed for rapid delivery.

But search for discovery is only the beginning of the scholarly process. Scholars then must zero in on the subsets they find: the primary and secondary source objects of interest to their work. They need to pull together these selected subsets for deeper analysis. The process of aggregation at this stage is more difficult and complicated because data need to be reviewed for anomalies, normalized, and prepared in a more rigorous fashion than is likely to be necessary or affordable to the commodity search engines. Provenance and authenticity of the information need to be established, rights cleared, and databases and database schemas created; textual objects may need to be translated and marked up for grammatical and structural features as well as semantically according to certain knowledge structures; numeric data may need conversion to common measures; and assumptions and guesswork throughout need to be carefully documented. Over centuries of data-driven work in the humanities, such processes were codified and standardized in the hands of what became commonly known as documentary editors. Today, given the amount of data, the more these processes can be automated, the better, and these functions are increasingly regarded as “curatorial” rather than “editorial” in nature. The main difference between the two designations seems to be that a documentary editor engages in active and largely manual tasks, while the data curator tends to take a more hands-off role, and instead presides over an increasingly automated set of transformations.

More research is desperately needed to improve the accuracy and reliability of automated data preparation or “curation” processes. The Mellon Foundation has supported the development of selected aggregations of primary and secondary data in ARTstor and JSTOR. We have also supported the development of smaller online collections in fields ranging from medieval studies to musicology, literary studies, and archaeology. We are now beginning to invest in the systems that support deeper analysis across collections of data by researchers at collaborating institutions in literary studies, medieval studies, and archaeology. We are looking for imaginative, significant scholarly projects in other fields involving other institutions and data sources in the humanities, including those created by Google, and I invite you to bring your ideas and resources forward. Higher education cannot architect the capacity it needs on two or three examples; it needs many. And this brings me to my last point.

Discipline-based culture. Although one can identify important challenges associated with the general features of the scholarly process as it moves from discovery to data preparation and analysis, it is also necessary to recognize, as numerous studies have shown, that significant differences exist in these practices among disciplines and fields of study. Moreover, and perhaps more importantly, Jud King and Diane Harley of Berkeley’s Center for Studies in Higher Education make the useful distinction in their recent work between formal and informal modes of communication and observe that the formal modes, such as publication in peer-reviewed books and journals, tend to be the most deeply resistant to change.[7] In the informal realm, at the edge of the reputational and promotional system where credentials are being formed rather than fixed, innovation is easier and more likely to occur.

The important distinction between formal and informal modes of scholarly communication helps explain why the e-print arXiv, to which all high-energy physicists routinely deposit their papers, continues to exist alongside of rather than (as some have promised for almost two decades) as a replacement for a publication system to which they also routinely submit their papers: one is an informal mode of communication and the other is formal. The innovative automation of the preprint process in arXiv in the early 90s was built on a stunning ethnographic insight about the informal traditions in physics of circulating preprints and working papers, and it has been usefully extended to other fields in the sciences and the social sciences where there have been similar informal communications practices. However, even though physicists remain dependent on arXiv, very little innovation has occurred in this system since the initial breakthrough and, as Paul Ginsparg has recently reported, even the code base for the arXiv system has changed little since the mid-90s.[8] Real innovation in scholarly communications is now occurring elsewhere in the formal and informal systems of communications, and continued attention to the potential interaction between pre-prints and formal publication, which is a mainstay of the open access publishing debates, threatens to divert resources from other areas where they might be needed and better invested.

Perhaps the most important opportunity is to focus on the construction and curation of data sets. It may be sufficient for funding agencies and journal and book publishers to mandate that original datasets on which new publications are based be deposited and maintained in publicly accessible repositories. However, there are some fields that are thinking even more innovatively and are trying to build peer-review systems around the data so that they can be judged formally on qualities of coherence, design, consistency, reliability of access, and so on. With JISC support in the UK, scientists and professional associations in the field of meteorology have joined to establish a new kind of electronic publication called a data journal, where practitioners would submit data sets for peer review and dissemination.[9] Bernard Frischer, who is a specialist in online virtual reconstructions of archaeological sites, has received NSF support to plan a journal-like outlet that would provide peer review of virtual reconstructions.[10] And with Mellon support, in the field of nineteenth-century literary studies, Jerry McGann at the University of Virginia has organized scholarly societies into a federation to provide peer review for data in the form of online documentary editions of nineteenth-century authors.[11] More research and experimentation with forms of peer-reviewed data could have significant impact in helping organize the field of data curation, provide additional information for promotion and tenure committees, and avoid wasting resources in a frontal assault on a long-established and, by many accounts, still highly valued system of formal publication. Again, the Mellon Foundation would welcome your good ideas about how best to proceed in this area.


I am tempted to close by calling for cooperation and good feeling as we march forward into this uncertain but promising future. Instead, I would reiterate the more complex approach that I have suggested throughout this paper. There are certain areas, such as the development of key elements of the infrastructure, in which cooperation is absolutely necessary to achieve scale and other benefits, and we need some careful discussions together about what those elements might be. But some good, old-fashioned aggressive entrepreneurship is also sorely needed to help advance scholarship in various disciplines, to reach the broadest possible citizenry in this country and be the best possible citizen in what I have called the global knowledge-building enterprise, and this entrepreneurship is necessary to ensure the diversity in universities that has long been hailed as one of the strengths of our system of higher education.

Donald Waters is Program Officer for Scholarly Communications at The Andrew W. Mellon Foundation.


    1. This paper is based on a talk delivered at the meeting of the Association of American Universities, Washington, D.C., April 24, 2007.

    2. Shakespeare in Love, DVD, directed by John Madden (1998; Burbank, CA: Walt Disney Video, 1999).

    3. The data and analysis in this section are drawn from: William G. Bowen, “Addressing Disparities in Educational Outcomes and the Need for Stronger Institutional Collaborations,” Remarks at a Conference Celebrating 50th Anniversary Year for the University of Michigan’s Center for the Study of Higher and Postsecondary Education, March 22, 2007; and William G. Bowen, Martin A. Kurzweil, and Eugene M. Tobin, Equity and Excellence in American Higher Education (Charlottesville: University of Virginia Press, 2006), pp. 39–72.

    4. The British Academy, E-Resources for Research in the Humanities and Social Sciences: A British Academy Policy Review (April 2005), p. 1, <http://www.britac.ac.uk/reports/eresources/report/eresources-pdf.pdf>.

    5. National Science Foundation, Revolutionizing Science and Engineering through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure (January 2003), <http://www.nsf.gov/cise/sci/reports/atkins.pdf>. American Council of Learned Societies, Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences (2006), <http://www.acls.org/cyberinfrastructure/OurCulturalCommonwealth.pdf>.

    6. Richard A. Lanham, The Economics of Attention: Style and Substance in the Age of Information (Chicago: University of Chicago Press, 2006).

    7. Diane Harley, Sarah Earl-Novell, Jennifer Arter, Shannon Lawrence, and C. Judson King, “The Influence of Academic Values on Scholarly Publication and Communication Practices,” Research and Occasional Papers Series, CSHE.13.06, September 2006 (Berkeley, Calif.: Center for Studies in Higher Education), <http://cshe.berkeley.edu/publications/publications.php?id=232>.

    8. Paul Ginsparg, “Read as We May,” presentation at the De Lange Conference VI, Rice University, March 6, 2007, webcast available at <http://webcast.rice.edu/webcast.php?action=details&event=985>.

    9. Alan Gadian, Principal Investigator, The Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project, <http://www.see.leeds.ac.uk/research/ias/dynamics/current/ojims.html>.

    10. The SAVE (Serving and Archiving Virtual Environments) project. See <http://www.iath.virginia.edu/save/>.

    11. Bethany Nowviskie and Jerome McGann, NINES: A Federated Model for Integrating Digital Scholarship, September 2005, <http://www.nines.org/about/9swhitepaper.pdf>.