
Peer Review: Reform and Renewal in Scientific Publishing
Challenges of Peer Review
Peer review has become ubiquitous in scholarly journals, seen as the hallmark of a journal’s credibility and as what qualifies research as science. Yet, as we have seen, peer review did not originate as the guarantor of the validity of research, and it is clear that it has not always performed that function effectively. For example, a study conducted at the British Medical Journal (BMJ) took a paper that was about to be published and introduced eight errors. The paper was then sent to 420 potential reviewers, 221 of whom responded. None of the respondents spotted more than five of the errors; the average respondent identified only two; and 16 percent did not identify any errors at all (Godlee, Gale, and Martyn). However, it is unclear how many of the reviewers continued to review the paper after spotting the first two or three errors.
A similar conclusion emerges from looking at published papers. García-Berthou and Alcaraz (2004) found statistical inconsistencies in 38 percent of papers in Nature and 25 percent in the BMJ. The number of papers retracted every year for problems in methodology, for breaches of research ethics, or for fabrication of data indicates that peer review is not, and cannot be, a truth sieve through which only valid science can pass (see http://www.retractionwatch.com/). John Ioannidis (2005) presented a probabilistic model to demonstrate the likelihood that most published studies are false. Alongside methodological problems, factors such as conflicts of interest, prejudice, and perceptions of which topics are “hotter” have served as filters for journals, and ineffective ones at that. These are problems of editorial peer review that, among other things, increase the likelihood that a journal will publish articles with false conclusions. If we accept that the majority of published work is incorrect and that the primary purpose of peer review is to prevent the publication of false and invalid research, then it is clear why many now question the system of traditional peer review.
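To make the shape of Ioannidis’s argument concrete, the following is a minimal sketch (in LaTeX notation) of his positive predictive value (PPV) framework; the specific input values are assumed here for illustration and are not taken from his paper.

\[
\mathrm{PPV} = \frac{(1-\beta)\,R}{R - \beta R + \alpha}
\]
% R: pre-study odds that a tested relationship is true; 1 - \beta: statistical power; \alpha: significance level.
% Assumed illustrative values: R = 0.05, 1 - \beta = 0.5, \alpha = 0.05 give
% PPV = 0.025 / 0.075 \approx 0.33, so roughly two in three "positive" findings in such a field
% would be false, regardless of how diligent any individual reviewer is.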
It can be argued that peer review does serve a useful function in the filtering of submissions, citing research into the fate of rejected articles. Fifty to sixty-five percent of articles rejected from the Annals of Internal Medicine were eventually published elsewhere, suggesting that the remaining 35 to 50 percent of rejected articles were simply unpublishable and rightly kept out of the scientific literature (Williamson).
If we were to set the standard lower and say that we expect peer review merely to improve the quality of papers, the evidence remains equivocal. A study published in the Journal of the American Medical Association (JAMA) would seem to bear this out, concluding that “editorial peer review, although widely used, is largely untested and its effects are uncertain” (Jefferson et al., “Effects”). Similarly, a Cochrane review (2003, updated in 2007) of studies into peer review found “little empirical evidence to support the use of editorial peer review as a mechanism to ensure quality of biomedical research, despite its widespread use and costs” (also see Hopewell et al.).
However, the absence of study data about the effects of peer review is not the same as evidence that peer review has a null effect. Surveys of researchers consistently find that peer review is perceived as improving the quality of papers. A survey of 4,000 people conducted by Sense about Science found that 9 out of 10 authors believe that peer review improved their last published paper (Mulligan, Hall, and Raphael). This is the same proportion reported by a similar survey of 3,040 academics conducted by the Publishing Research Consortium (Ware; also see Lu). In the perception of authors and reviewers, the question is not whether peer review improves the quality of papers but to what extent and in what areas. In descending order, Ware (2008) found that authors perceived some improvement in presentation (94 percent), language or readability (86 percent), missing or inaccurate references (78 percent), scientific errors (64 percent), and statistical errors (55 percent). These opinions indicate that peer review does improve (or at least is perceived to improve) the quality of papers but is less significant in those areas that might be considered most integral—that is, questions of validity.
The effectiveness of peer review is also challenged by the fact that reviewers are often not unanimous in their recommendations. Rothwell (2000) found that the probability of two reviewers agreeing was only slightly better than chance. While using two to three independent reviewers is usually considered the standard for peer review, Rothwell concluded that six reviewers per paper would be needed to produce a reliable result. Those journals that operate with only a single reviewer per paper (and there are plenty of such journals) are particularly vulnerable to this inconsistency in peer judgment. The discrepancy occurs not only in the general recommendation but also in the specific requests for revisions. Authors unfortunate enough to have new reviewers look at a manuscript they have revised can sometimes find themselves being asked to add elements that the previous reviewers asked them to remove (or vice versa).
In addition to inconsistency, there are other perceived problems with peer review, including the speed of the process, financial cost, potential bias, instances of abuse, and difficulty of detecting fraud or misconduct (Williamson).
The speed of peer review, and thus the speed of publication, often comes as a surprise to the uninitiated. The fact is that peer review is conducted unpaid by academics or practitioners who have other, more rewarding, calls on their time: “A single peer review takes about four hours, but organizing two or three reviews takes on average four months or more” (Johnston). In the fast-paced area of medicine, for instance, waiting a quarter of a year or more for a first decision delays dissemination of significant, sometimes life-changing, results. While there are tools and techniques for reducing unnecessary delays, improving the speed of peer review will be difficult as long as it depends on reviewers donating their spare time.
The cost of peer review is a significant concern, especially for publishers in an age of limited growth in subscriptions and disruption to the market from open access and other forms of digital dissemination. The editorial process costs between $90 and $600 per article, considerably more than copyediting, printing, or other publishing services (Williamson). This cost is incurred even though reviewers are not paid; it includes editor honoraria, fees for electronic systems, and the salaries of editorial office staff, among other things. Were reviewers to be paid, it is unlikely that they would ever receive an amount commensurate with the time they devote to peer review, and in most cases, such reviewer payments would make continued publication prohibitively expensive. Those journals that do pay reviewers often cover the costs by charging a submission fee.
Bias in peer review is more difficult to evaluate. Williamson (2003), for example, asserts that there is evidence of prestige bias, geographical bias, and gender bias. Other common concerns about peer review are the danger of confirmation bias (the tendency to read evidence as confirming your own preconceptions) and the conservatism of peer review, which favors consensus over proposals from outside the current paradigms. However, in a wide-ranging review examining such studies, Lee et al. (2012) questioned Williamson’s methodological assumptions and found little empirical evidence for these concerns about bias. Anecdotal evidence does indicate that bias and prejudice occur, though it is difficult to determine how widespread they are and what impact they have on publication.
It is well known that peer review can be open to abuse. One common problem is that the line between authors and reviewers can actually become blurred. Some have encountered cases where a supervisor of the lead author was initially recommended as reviewer of a paper only to later claim to also be an author on the same paper. There have been other cases where reviewers (or even editors) have attempted to gain authorship credit on a paper by offering to collaborate with authors for a revised version. This problem of “gift authorship” can be preempted by adopting clear policies about what qualifies one as an author, but the incentive of an easy publication means that this form of abuse will always be a danger.
More recently, so-called peer review rings have emerged, where authors have created fabricated reviewer accounts that allowed them to subvert and manipulate the peer-review process. Hundreds of published articles were retracted after it became apparent that authors reviewed their own work or had a friendly third-party review done on their behalf (Oransky).
Other recorded forms of abuse would leave the casual observer incredulous. Reviewers have attempted to plagiarize studies they were reviewing and to “scoop” the original authors by rapidly publishing or releasing their data in a public forum such as an academic conference. The common practice of inviting authors to recommend reviewers (many journals struggle to find reviewers, which is why they sometimes ask authors for help) carries an obvious potential for abuse. There are now agencies that offer to create fake e-mail accounts for unscrupulous authors to use for their recommended reviewers so that, if selected, these accounts can provide favorable reviews and increase the chances of publication. This problem highlights an overdependence on author-recommended reviewers by editors, who often struggle to get reviewers to agree, as well as an overdependence by editors on the recommendations of reviewers in preference to their own assessments of a paper.
In noting that reviewers are “almost useless” at spotting fraud or misconduct, Williamson is really only highlighting the fact that reviewers are limited by circumstance. Reviewers were not present when a research study was conducted, when the results were analyzed, or when the conclusions were authored. Often the only information the reviewer will have is the submitted paper. If the authors have misreported their methodology, omitted reference to their unethical conduct, or fabricated their data, how would the reviewer be able to tell? Sometimes a reviewer might be able to identify that the reported results look “fishy” or too good to be true, but accusations of misconduct would be hugely detrimental to a researcher’s career, and reviewers are unlikely to bandy such accusations lightly. To address this concern, efforts such as the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network have emerged. The EQUATOR Network is an international initiative that promotes transparent and accurate reporting and wider use of robust reporting guidelines. Currently they provide information on more than 300 guidelines (http://www.equator-network.org).
A reviewer might spot other types of fabrication, such as image manipulation. But most lack the training and expertise to identify all possible forms of manipulation. Similarly, a reviewer can only spot plagiarized material if he or she happens to have read and remembered the prior publication. While reviewers can assist in spotting cases of misconduct, it is simply unrealistic to expect reviewers to identify such misconduct consistently (let alone infallibly).
Another threat to the peer-review system is the advent of “predatory” open-access publishers. These are sham journals that exist only to take advantage of the author-pays business model by publishing almost anything for a price. Most of these fake publications claim to conduct peer review but in fact do not. Young researchers from developing countries are particularly at risk of becoming the victims of predatory publications (Xia). Awareness of predatory open-access publishers has grown due to the work of Jeffrey Beall, a librarian at the University of Colorado Denver, as well as some high-profile exposés. In 2010, Beall coined the term “predatory publisher” and created what has come to be known as “Beall’s List of potential, possible, or probable predatory scholarly open-access publishers” (Butler). In 2009, Phil Davis, a Cornell University doctoral student interested in investigating the increasing prevalence of scam journals, submitted a manuscript composed of computer-generated nonsense to a suspected predatory journal (Basken). The paper was accepted and the ruse exposed on the popular industry blog the Scholarly Kitchen. More recently, in 2013, John Bohannon, a staff writer for Science magazine, targeted the open-access system by submitting an intentionally and deeply flawed paper to more than 300 open-access journals. Approximately 60 percent of those journals accepted the fake medical paper (Bohannon).
While some have tried to use the practices of predatory publishers to paint all open-access journals and all forms of peer review as illegitimate, it is important to note that most open-access journals are authentic and do carry out peer review in an effort to publish material that contributes to the body of scientific knowledge.
Different Perspectives on Peer Review
This process of sending manuscripts to independent experts for critical appraisal—peer review—has become a mainstay of academic publishing. Yet despite the perception that peer review is an essential minimum requirement for a journal, and despite general expectations that peer review will be fair to authors and ensure or improve the quality of papers, there is actually considerable variance in the way peer review operates.
According to the Committee on Publication Ethics (COPE), journal editors should make decisions based on importance, originality, clarity, validity, and relevance. Editors should ensure that peer review is fair, unbiased, and timely and that materials submitted are kept confidential while under review. They should also encourage reviewers to comment on ethics, including research misconduct and plagiarism. However, as to the specifics of peer review, COPE states that editors should adopt “peer review methods best suited for their journal and the research community it serves” (Committee on Publication Ethics 6).
According to the International Committee of Medical Journal Editors (ICMJE), journal editors should ensure that manuscripts are reviewed in a timely manner, that reviewers are not part of the editorial staff, and that peer review is “unbiased, independent, and critical.” The ICMJE also states, “A peer-reviewed journal is under no obligation to send submitted manuscripts for review, and under no obligation to follow reviewer recommendations, favorable or negative. The editor of a journal is ultimately responsible for the selection of all its content, and editorial decisions might be informed by issues unrelated to the quality of a manuscript, such as suitability for the journal. An editor can reject any article at any time before publication, including after acceptance if concerns arise about the integrity of the work.”
Again the ICMJE does not provide stipulations as to the specifics of peer review, recognizing that these will vary among fields and among journals. For this reason, they encourage journals to publish a description of their peer-review process (http://www.icmje.org/).
When considering editorial peer review across all disciplines and subject areas, there is indeed a great deal of difference in practice and process. Some examples are as follows:
Number of Reviewers. There is some intuitive appeal in the idea that more than one reviewer should be consulted to ensure the objectivity of the outcome. While many journals, perhaps the majority, usually require two reviews before making a decision, a significant proportion of journals obtain only one review per manuscript. On the other hand, a significant number of journals require three or more reviews as standard. As cited previously, Rothwell’s study indicates that six reviewers would be required to ensure a reliable majority agreement between reviewers, so there is no empirical basis for the preference for two reviewers. Two is probably the preferred standard simply because of the difficulty of acquiring more than two reviews in a timely manner.
Decisions without Review. Many authors now expect that their papers will be sent out for review because there is a general perception that this independent assessment is essential for a fair judgment of their manuscript. However, as quoted previously, the ICMJE (among others) places no requirement on editors to actually send papers out for review. Many journals operate a form of triage, known as the desk reject, which dismisses a proportion of papers on submission. For some journals, desk rejects are limited to such reasons as being outside the scope of the journal, but for others, the editors are making an assessment of whether the paper will survive peer review or will certainly be rejected. This triage policy reduces the time authors wait to receive a negative decision, allowing them to submit to another journal more quickly, and it reduces the number of papers for which reviewers must be found. However, the fact that not all papers are peer reviewed, and that there is no unanimity about the circumstances under which a paper is not sent for review, indicates that it is editors, not reviewers, who are ultimately responsible for what is published.
Focus of Review. The perception that peer review is the standard for what is worthy to be published assumes that the focus of review is on validity and quality. While most, if not all, journals will ask reviewers to consider these issues, they are not the only factors on which reviewers might be asked to comment. Since many journals are only interested in publishing articles that will be highly cited (to improve their impact factor) or highly downloaded, reviewers might be asked to comment on the novelty or significance of a topic. Since many journals have constraints on page budgets, reviewers might be asked to comment on whether a paper is a high priority for publication. These other decision factors are symptomatic of the fact that journal publishing is not only concerned with maintaining the scientific record.
Objects of Review. While there is probably a general sense among journals as to what peer review entails, there is considerable variance as to what reviewers are expected to comment on. Expectations might include appropriateness of title, succinctness of abstract, relevance of images, quality of language and expression, appropriateness of methodology, validity of statistical analysis, and validity of conclusions, among many other things. Some journals give reviewers specific instructions, perhaps even a checklist or questionnaire detailing those areas to consider, but there are generally few checks and balances to ensure that reviewers actually conduct the review systematically and thoroughly. There is certainly no universally recognized standard as to what reviewers should (and should not) comment on.
Therefore, despite the perception that peer review is the minimum standard for journal publishing and should be a guarantee of the validity of what is published, in reality, peer review as it operates today is a somewhat amorphous and inconsistent practice. Yet this should come as no surprise given how peer review arose and developed. Editors turned to reviewers to supplement their opinions and to assist them in their task of filtering submissions for publication (Spier). Peer review was not established to remove responsibility for decisions from the editor or to act as the guardian of science or a safeguard against unethical or fraudulent papers. Peer review has never been a monolithic concept or standard, and this diversity of practice is likely to increase.