When the Web was young, a common complaint was that it was full of junk. Today a marvelous assortment of high-quality information is available on line, often with open access. As a recent JSTOR study indicates, scholars in every discipline use the Web as a major source of information.[1] There is still junk on the Web — material that is inaccurate, biased, sloppy, bigoted, wrongly attributed, blasphemous, obscene, and simply wrong — but there is also much that is of the highest quality, often from obscure sources. As scholars and researchers, we are often called upon to separate the high-quality materials from the bad. What are the methods by which quality control is established and what are the indicators that allow a user to recognize the good materials?

It might be argued that this is not a new problem. Any visit to the stacks of a major library reveals materials so inferior that it is hard to believe they were ever printed, let alone collected by a serious library: nineteenth-century sermons, deservedly obscure plays, and long-discredited scientific theories. But the problem is much more severe on the Web. Historically, the cost of printing acted as a barrier; publishers played a vital role in validating materials and provided purchasers with a basic level of quality control. On the Web, the barriers are very low; anybody can be an author and anybody can be a publisher.

This paper is motivated by three interrelated questions:

  1. How can readers recognize good quality materials on the Web?
  2. How can publishers maintain high standards and let readers know about them?
  3. How can librarians select materials that are of good scientific or scholarly quality?

Several of us at Cornell University are members of the team developing the National Science Digital Library (NSDL), the National Science Foundation's new digital library for education in the sciences.[2] As part of the planning, we made an informal survey of the places on the Web where high-school students might be expected to look for scientific information. As an example of the quality-control challenge, one of the sites that we examined was the science section of about.com. Figure 1 is what we found. The first page on this site was about astrology. How would young students know that astrology is not science?

Figure 1. An astrology page in the science section of about.com

Human Review

The traditional approaches for establishing quality are based on human review, including peer review, editorial control, and selection by librarians. These approaches can be used on the Web, but there are economic barriers. The volume of material is so great that it is feasible to review only a small fraction of materials. Inevitably, most of the Web has not been reviewed for quality, including large amounts of excellent material.

Peer review is usually carried out by unpaid volunteers, but publishers need staff to administer the review process. Informal figures provided by publishers suggest that, for a typical scientific publisher, these administrative costs are at least $500 to $1,000 per published paper.[3] (An interesting aside is that it costs a publisher more to reject a paper than to accept one.)

For the NSDL, we are often urged to include only materials that have been selected by expert reviewers. Several of the projects that are building collections for the NSDL program follow this approach, but they are seeing administrative costs comparable to those for peer review.

Peer Review

Peer review is often considered the gold standard of scholarly publishing, but all that glitters is not gold. Consider Figure 2, which is from the online version of the Journal of the ACM.

Figure 2. Journal of the ACM

This example illustrates peer review at its best. It is indeed golden. The Journal of the ACM is one of the top journals in theoretical computer science. It has a top-flight editorial board that works on behalf of a well-respected society publisher. Every paper is carefully read by experts who check the accuracy of the material, suggest improvements, and advise the editor-in-chief about the overall quality. With very few exceptions, every paper published in this journal is first rate.

However, many peer-reviewed journals are less exalted than the Journal of the ACM. There are said to be 5,000 peer-reviewed journals in education alone. Inevitably the quality of the papers in them is uneven. Thirty years ago, as a young faculty member, I was given the advice, "Whatever you do, write a paper. Some journal will publish it." This is even more true today.

One problem with peer review is that many types of research cannot be validated by a reviewer. In the Journal of the ACM, the content is mainly mathematics. The papers are self-contained. A reviewer can check the accuracy of the paper by reading the paper without reviewing external evidence beyond other published sources. This is not possible in experimental areas, including clinical trials and computer systems. Since a reviewer cannot repeat the experiment, the review is little more than a comment on whether the research appears to be well done. Many years ago, I wrote a paper that developed a statistical hypothesis by analyzing data from a published dataset. Twenty years later, I heard rumors that the data was fraudulent. The referees of my paper had no way of knowing this fact. The hypothesis in the paper has been confirmed by other experiments, but the erroneous paper — in a respectable, peer-reviewed journal — can still be found on library shelves.

Figure 3. Fifth ACM Conference on Digital Libraries

To illustrate the vagaries of peer review, Figure 3 is from another ACM publication, the Fifth ACM Conference on Digital Libraries. This looks very similar to Figure 2, and might be expected to meet the same quality standards, but it does not, for subtle reasons that only an experienced reader can know. ACM conference papers go through a lower standard of review than journal articles. Moreover, many of the papers in this conference summarize data or experiments that the reviewers could not check for accuracy by simply reading the papers; for a full review, they would need to examine the actual experiment. Finally, the threshold of quality that a paper must pass to be accepted is much lower for this small conference than for the ACM's premier journal. This is a decent publication, but it is not gold.

In summary, peer review varies greatly in its effectiveness in establishing accuracy and value of research. For the lowest-quality journals, peer review merely puts a stamp on mediocre work that will never be read. In experimental fields, the best that peer review can do is validate the framework for the research. However, peer review remains the benchmark by which all other approaches to quality are measured.

Incidental Uses of Peer Review

Peer-reviewed journals are often called "primary literature," but this is increasingly becoming a misnomer. Theoretical computer scientists do not use the Journal of the ACM as primary material. They rely on papers that are posted on Web sites or discussed at conferences for their current work. The slow and deliberate process of peer review means that papers in the published journal are a historic record, not the active literature of the field. In a recent conversation, Kim Douglas of the California Institute of Technology suggested that, when selecting where to publish important results, an established researcher is often more interested in establishing primacy through rapid publication than in the imprimatur of peer review.[4]

Peer review began as a system to establish quality for purposes of publication, but over the years it has become used for administrative functions. In many fields, the principal use of peer-reviewed journals is not to publish research but to provide apparently impartial criteria for universities to use in promoting faculty. This poses a dilemma for academic librarians, a dilemma that applies to both digital and printed materials. Every year libraries spend more money on collecting peer-reviewed journals, yet for many of their patrons these journals are no longer the primary literature.

For the NSDL, the conclusion is clear. We need to devise ways to identify and select the working literature, whether or not it has been formally reviewed, and to indicate to readers its probable quality.

Seeking Gold

As we look for gold on the Web, often all we have to guide us is internal evidence. We look at the URL on a Web page to see where it comes from, or the quality of production may give us clues. If we are knowledgeable about the subject area, we often can judge the quality ourselves. Internal clues, such as what previous work is referenced, can inform an experienced reader, but such clues are difficult to interpret.

Figure 4. Request for Comment (RFC) 791. The Internet Protocol

Figure 4 is an example of the difficulty of judging quality from superficial clues. This document appears to be of little consequence, some sort of typewritten specification, written years ago, long before modern networking. Actually it is one of the most important documents of our time, the official definition of the protocol that forms the basis for the Internet. This specification is still current and this simply formatted version remains the official version. It is definitely gold.

Figure 5. A course Web site

Figure 5 is another example where the clues may be misleading. This is the Web site for one of my courses. The clues are promising: a higher-level course taught by a professor at a reputable university. But is the faculty member an expert in this field? How carefully was this material prepared and checked? Actually, the Web site contains some good material and some that is less good. It has several errors, ranging from obvious typos to a couple of mathematical mistakes. It may be useful, but it is not gold.

Figure 6. The e-Print archives, arXiv.org

Figure 6 is a final example of how difficult it is to judge the quality of a Web site from appearances. This does not look like an important source of scientific information. Yet this is gold. Physics has been a pioneering discipline in exploring new ways to disseminate research. This is the home page of the e-Print archives, arXiv.org, established by Paul Ginsparg at Los Alamos National Laboratory in the early 1990s and recently moved to Cornell University. Of all the publishing innovations on the Internet, this Web site has had the biggest impact on a major scientific discipline. Even before submitting their research papers to journals, physicists post them on this site, where they are available with open access. Ginsparg and his colleagues work closely with the American Physical Society, which subsequently publishes many of the papers in their journals, but for much of physics, arXiv.org is the primary literature while the journals are the historic archive.

Strategies for Establishing Quality

The Publisher as Creator

Many of the most dependable sites on the Web contain materials that are developed by authors who are employed by the publisher. Readers judge the quality through the reputation of the publisher.

Figure 7. The New York Times on the Web

Figure 7, The New York Times on the Web, provides a good example. Everything on this Web site was either written by staff of the newspaper, or selected and edited by the staff. The newspaper's quality and reputation are based on the standards maintained by the staff.

Figure 8. The National Weather Service

Figure 8 is similar to Figure 7. This is the home page of the National Weather Service. It is the entry to an enormous amount of scientific data, forecasts, and statistics about the weather. The National Weather Service acts as a publisher in creating and maintaining this Web site. As with The New York Times on the Web, everything on this site was produced by staff of the National Weather Service or known associates from around the world. The quality and reputation are once again based on the standards maintained by the staff.

Figure 9. American Memory at the Library of Congress

Figure 9 is a final example of this category of publisher as creator. It shows an image from American Memory at the Library of Congress. In this example, the publisher is the Library of Congress. A reader's belief in the accuracy of these collections derives from confidence in the curatorial processes of the library.

Editorial Control

In the three previous examples, the content was created or selected by the publisher's staff. As an alternative, the publisher can rely on an editorial process whereby experts recommend which works to publish. The editors act as a filter, selecting the materials to publish and often working with authors on the details of their work.

Figure 10. D-Lib Magazine

Shown in Figure 10, D-Lib Magazine is an example. Selection decisions are made by a small group of editors who rely on their own expertise to decide which materials to publish. There is no external review. The quality of the publication depends on the judgment of the editors and their success in encouraging authors to submit good material. In practice, much of the best research in digital libraries is first published in D-Lib Magazine, not in peer-reviewed journals. When an author chooses to publish in the magazine, the author is choosing the magazine's benefits of rapid publication (two weeks from receipt of the paper), broad readership, and an established reputation.

Outsiders sometimes think that peer-reviewed materials are superior to those whose quality is established by editorial control, but this is naive. For instance, the Journal of Electronic Publishing contains some papers that have gone through a peer review and others that were personally selected by the editor. This distinction may be important to some authors, but is irrelevant to almost all readers. Either process can be effective if carried out diligently.

Figure 11. "The Digital Dilemma"

Figure 11 is an example that combines features of editorial control and external review. This is "The Digital Dilemma," a report from the National Research Council published by the National Academy Press. Every report of the National Research Council goes through an exceptionally thorough review process by experts. That process enhances the quality of the reports, yet the review process is not the underlying reason that National Academy reports maintain their high standard. The quality comes from the expertise of the people who sit on their panels and the professional staff that works with them in drafting and editing their reports.

Reputation

In every example so far (except about.com), the author, editor, or publisher has a well-established reputation. The observations about the quality of the materials begin with the reputation of the publisher. How can we trust anything without personal knowledge? Conversely, how can a new Web site establish a reputation for quality?

"The only way we can build reputation is over time."

Tom Bruce, co-director of the Legal Information Institute

"Our greatest achievement is coming out, on time, every month for five years."

Amy Friedlander, founding editor of D-Lib Magazine

These two observations are from people who have succeeded in creating Web sites that have established reputations for quality of material and reliability of service.

Caroline Arms of the Library of Congress has suggested that everything depends upon a chain of reputation, beginning with people we respect.[5] As students, we begin with respect for our teachers. They direct us to sources of information that they respect — books, journals, Web sites, datasets, etc. — or to professionals, such as librarians. As we develop our own expertise, we add our personal judgments and pass our recommendations on to others. Conversely, if we are disappointed in the quality of materials, we pass this information on to others, sometimes formally but often by word of mouth. Depending on our own reputations, such observations about quality become part of the reputation of the materials.

Volunteer Reviews

Reviews provide a systematic way to extend the chain of reputation. Reviewers, independent of the author and publisher, describe their opinion of the item. The value of the review to the user depends on the reputation of the reviewer, where the review is published, and how well it is done. A simple use of the Web is to distribute conventional reviews. The issue of D-Lib Magazine in Figure 10 includes a book review. The Web lends itself to novel forms of review, which can be called "volunteer review" processes. In a conventional review process, the editor of a publication carefully selects the reviewers. In a volunteer review process, anybody can provide a review. The publisher manages the process, but does not select the reviewers. Often the publisher will encourage the readers to review the reviewers. The reputation of the system is established over time, based on readers' experiences. The next three figures illustrate these new methods.

Figure 12. A book review from Amazon.com

Amazon.com was one of the first Web sites to allow anybody to post a public book review. Figure 12 is an example. Over time, the reviewers have been divided into categories. This example is an editorial review; the reviewer is associated with Amazon.com. Other categories are customer and spotlight reviewers. A reviewer becomes a spotlight reviewer by a form of popularity test. At the end of Figure 12, readers are asked to vote, "Was this content helpful to you?" Reviewers who receive a sufficient number of "yes" votes are promoted to the category of spotlight reviewer and their reviews are given prominence.

Figure 13. Epinions.com

The pioneer of the concept of voting for reviewers was probably Epinions.com. As seen in Figure 13, this site provides opinions of consumer products, submitted by readers. To encourage conscientious reviewing, the site has a complex process by which readers review the reviewers. Respected reviewers receive recognition, such as having their photographs added to the Web site.

Figure 14. An Internet Draft

Figure 14 is an Internet draft, a proposal to change the technical standards of the Internet. It is published by the Internet Engineering Task Force. Anybody can submit an Internet draft, which will be immediately mounted on the Web site, but for six months only. During this period it is discussed on an open mailing list. Comments on the mailing lists range from minor suggestions to brutal criticism. Every few months the people who are interested in the engineering of the Internet meet and vote on the current drafts. Successful drafts are forwarded through a process that eventually leads to an official publication, such as Figure 4. Unsuccessful drafts are withdrawn. In this way, every expert in the field gets to comment in detail and vote openly before a paper is accepted.

In each of these three examples — Amazon.com, Epinions, and the Internet draft process — the process is completely informal; anybody can write a review, anybody can vote. Yet in practice the level of review is remarkably high. Some people consider Epinions to be superior to Consumer Reports for the quality of its reports; the technical scrutiny given to Internet drafts exceeds all but the most exceptional peer review.

Automated Methods

The success of volunteer reviews shows that systematic aggregation of the opinions of unknown individuals can give valuable information about quality. This is the concept behind measures that are based on reference patterns. The pioneer is citation analysis (see, for example,[6]). More recently, similar concepts have been applied to the Web with great success in Google's PageRank algorithm.[7] These methods have the same underlying assumption. If an item is referenced by many others, then it is likely to be important in some way. Importance does not guarantee any form of quality, but, in practice, heavily cited journal articles tend to be good scholarship and Web pages that PageRank ranks highly are usually of good quality.

The basic concept of PageRank is to rank Web pages by the number and rank of pages that link to them. A page that is linked to by large numbers of highly ranked pages is given a high rank. Since this is a circular definition, an iterative algorithm is used to calculate PageRanks, and various modifications of the basic concept are needed to deal with the peculiarities of the actual Web.
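
The circular definition can be made concrete with a small sketch. The Python below is an illustration only, not the production algorithm: the function name `pagerank`, the four-page example graph, the fixed iteration count, and the damping factor of 0.85 are assumptions introduced here, and the even sharing of rank from pages with no outgoing links is just one simple way of handling one peculiarity of a real link graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate ranks for a small link graph by repeated redistribution.

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n
            for page in pages
        }
        # Each page passes its rank, in equal shares, to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: pages a, b, and d all link to c; c links back to a.
example = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(example).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page c comes out on top because three pages link to it, and page a outranks b and d because its single incoming link comes from the highly ranked c. That is the sense in which both the number and the rank of linking pages matter.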

Since the mathematical bases of citation analysis and PageRank are known, it is natural for people who wish to have their own work ranked highly to attempt to manipulate the data. For example, a group of authors could increase citation counts by repeatedly referencing each other's papers. It is undoubtedly possible to manipulate PageRank slightly, but the Web is so vast that it has proved hard to inflate the ranks excessively.

A promising area of current research is to combine automated ranking methods with language-based methods for estimating relevance. The research combines ideas from information retrieval, Web crawling and machine learning. Perhaps the most advanced work was done by the Pittsburgh company WhizBang![8] This company has unfortunately gone out of business, but it successfully demonstrated large-scale selection of relevant and high-quality materials with minimal human intervention.

Quality Control in the NSDL

The goal of the NSDL is to be comprehensive in its coverage of digital materials that are relevant to science education, broadly defined. To achieve this goal requires hundreds of millions of items from tens or hundreds of thousands of publishers and Web sites. Clearly, the NSDL staff cannot review each of these items individually for quality or even administer a conventional review process. The quality control process that we are developing has the following main themes:

  • Most selection and quality control decisions are made at a collection level, not at an item level.
  • Information about quality will be maintained in a collection-level metadata record, which is stored in a central metadata repository.
  • This metadata is made available to NSDL service providers.
  • User interfaces can display quality information.

As an example, consider the NSDL search service, which is currently under development. When a user submits a query, the search service will return a list of items that match the query, with ranks provided by the search engine. In displaying each item, the user interface can specify the collection from which the item comes and it can also display quality information from the metadata record. For example, one item may be labeled as from a reviewed collection of educational software, another as from an academic publisher selected by the NSDL staff, and another as found by a Web crawler without any human review of its quality. Usability studies will be carried out to see how best to display this information to the users and let them know how to interpret it.
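
As a rough illustration of how collection-level quality information might reach the user, the sketch below attaches a quality note from a collection record to each search result. It is a hypothetical model only: the class names `CollectionRecord` and `SearchHit`, the field names, the function `label_hits`, and the sample data are assumptions made for this example and do not describe the actual NSDL metadata repository or search service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRecord:
    """Hypothetical collection-level metadata record."""
    collection_id: str
    title: str
    quality_note: str  # how the collection was selected or reviewed


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical item returned by a search engine."""
    title: str
    collection_id: str
    score: float  # rank supplied by the search engine


def label_hits(hits, repository):
    """Return display lines, best score first, each with its collection's quality note."""
    lines = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        record = repository.get(hit.collection_id)
        source = record.title if record else hit.collection_id
        note = record.quality_note if record else "no quality information available"
        lines.append(f"{hit.title} [{source}: {note}]")
    return lines


# Sample repository keyed by collection identifier (invented data).
repository = {
    "edu-software": CollectionRecord(
        "edu-software", "Educational Software", "reviewed collection"),
    "publisher-x": CollectionRecord(
        "publisher-x", "Academic Publisher X", "publisher selected by NSDL staff"),
    "crawl": CollectionRecord(
        "crawl", "Web crawl", "found by a Web crawler, no human review"),
}

hits = [
    SearchHit("Orbit simulator", "edu-software", 0.72),
    SearchHit("Planetary motion article", "publisher-x", 0.81),
    SearchHit("Kepler's laws notes", "crawl", 0.65),
]

for line in label_hits(hits, repository):
    print(line)
```

Keeping the quality note on the collection record rather than on each item mirrors the first theme above: a decision is recorded once per collection and then displayed for every item drawn from it.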

Academic Reputation

How does a scholar or scientist build a reputation outside the traditional peer-reviewed journals? A few people have well-known achievements that do not need to be documented in journal articles. IBM's chief scientist moves to Harvard University; Cornell's physics department recruits the creator of the e-Print archives. Such examples are important but rare. A university needs considerable confidence in its judgment to make such appointments. More often, promotions are based on a mechanical process in which publication in peer-reviewed journals is central. Although it is manifestly impossible, most universities wish to have an objective process for evaluating faculty and this is the best that can be done. As the saying goes, "Our dean can't read, but he sure can count."

There are a few exceptions to this emphasis on peer-reviewed journal articles. The humanities have always respected the publication of scholarly monographs, and conference papers are important in some fields.

In a few disciplines, other types of online publication convey academic respect; two have already been mentioned. The Internet RFCs are so important that authorship of a core RFC is valued highly in building a reputation as a leader in the field of networking. Even so, researchers who have written such an RFC will often publish a journal paper on the same topic when they are due for promotion. The paper has no value in scientific communication, but looks good on a resume. Many of the papers that appear in D-Lib Magazine describe research projects. A faculty member who is being reviewed for promotion or tenure puts together a set of papers that are sent to external reviewers for comment. As an external reviewer, I probably see more papers from D-Lib Magazine than from any other publication in the fields of digital libraries and electronic publishing. These are small examples, but they may suggest that a researcher can build a reputation in the online world, outside the conventional system of peer review.

Meanwhile we have a situation in which a large and growing proportion of the primary and working materials are outside the peer-review system, and a high proportion of the peer-reviewed literature is written to enhance resumes, not to convey scientific and scholarly information. Readers know that good quality information can be found in unconventional places, but publishers and librarians devote little effort to these materials.

The NSDL project is one example of how to avoid over-reliance on peer review. Most of the high-quality materials on the Web are not peer-reviewed and much of the peer-reviewed literature is of dubious quality. Publishers and libraries need to approach the challenge of identifying quality with a fresh mind. We need new ways to do things in this new world.


Acknowledgement

This work was partially funded by the National Science Foundation under grant number 0127308. The ideas in this paper are those of the author and not of the National Science Foundation.


William Y. Arms is Professor of Computer Science at Cornell University. He has a background in mathematics, operational research, and computing, with degrees from Oxford University, the London School of Economics and Sussex University. He has very broad experience in applying computing to academic activities, notably educational computing, computer networks, digital libraries, and electronic publishing. His career includes positions as Vice Provost for computing at Dartmouth College, Vice President for Computing at Carnegie Mellon University, and Vice President of the Corporation for National Research Initiatives. He has been chair of the Publications Board of the Association for Computing Machinery and editor-in-chief of D-Lib Magazine. Currently, he is on the Management Board of the MIT Press and the board of eCornell, Cornell University's new distance learning venture. MIT Press published his book, Digital Libraries, in November 1999. You may contact him by e-mail at wya@cs.cornell.edu.


Examples

The examples in the paper come from the following URLs.

Figure 1: About.com (http://www.about.com). [The astrology page is now located in the science subsection of the Homework Help part of this Web site (http://home.about.com/homework/).]

Figures 2 and 3: ACM Digital Library (http://www.acm.org/dl/).

Figures 4 and 14: Publications of the Internet Engineering Task Force (http://www.ietf.org).

Figure 5: Cornell University, Computer Science course Web sites (http://www.cs.cornell.edu/courses).

Figure 6: e-Print archives, arXiv.org (http://arxiv.org).

Figure 7: The New York Times on the Web (http://www.nytimes.com).

Figure 8: The National Weather Service (http://www.nws.noaa.gov).

Figure 9: American Memory at the Library of Congress (http://memory.loc.gov).

Figure 10: D-Lib Magazine (http://www.dlib.org).

Figure 11: The National Academy Press (http://www.nap.edu).

Figure 12: Amazon.com (http://www.amazon.com).

Figure 13: Epinions.com (http://www.epinions.com).

Figure 14: An Internet draft published by the Internet Engineering Task Force (http://www.ietf.org).


References


    1. Kevin M. Guthrie, "What Do Faculty Think of Electronic Resources." ALA Annual Conference Participants' Meeting. June 17, 2001. http://www.jstor.org/about/faculty.survey.ppt

    2. Lee L. Zia, "Growing a National Learning Environments and Resources Network for Science, Mathematics, Engineering, and Technology Education: Current Issues and Opportunities for the NSDL Program." D-Lib Magazine 7(3), March 2001. [doi: 10.1045/march2001-zia].

    3. Presentation by Mark Doyle at the Workshop on the Open Archives Initiative and peer review journals in Europe, CERN, Geneva, March 22-24, 2001.

    4. Kim Douglas, private conversation, June 2002.

    5. Caroline Arms, private conversation, March 2001.

    6. Eugene Garfield, Citation Indexing: Its Theory and Application in Science, Technology, and Humanities. Wiley, New York, 1979.

    7. Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Seventh International World Wide Web Conference. Brisbane, Australia, 1998. http://www7.scu.edu.au/00/index.htm

    8. The WhizBang! home page is currently still available at http://www.whizbang.com/. Links to the underlying research can be found at the home page of Tom Mitchell of Carnegie Mellon University (http://www-2.cs.cmu.edu/~tom).