Reference citations in scholarly publishing are critical, providing measurability and discoverability. The problem is that reference quality and accuracy have been known weaknesses in scholarly publishing for many years. Most academic authors understandably pay more attention to their ideas than to their references. As a result, many references are inconsistent, incomplete, poorly structured, or just plain wrong. At one time publishers might have assumed that informed readers (or their graduate students) who encountered faulty references would dig deeper to find the cited works, but today we entrust computers to make the links. Despite continuing advances in reference-matching technology, reference-link quality for the most part still follows the rule “garbage in, garbage out.” This article looks at the importance of reference citations, the common problems, and the steps publishers are taking to solve them.

Reference citations in scholarly publishing have never been more important than in the age of electronic distribution. Not only do references cite relevant previous work, they also provide one of the most valuable functions in electronic publishing: an actionable, clickable link to more information about the citation. The link may be to additional metadata in a secondary source or to the full text of the cited reference from a library collection or a publisher’s Web site.

Actionable reference links are only the most visible part of an interconnected scholarly communication system. Almost from the earliest days of the Web, publishers have worked toward an interlinked body of knowledge and have envisioned links to gene sequences, chemical structures, multimedia, and Web site data in addition to the journal literature.[1] Increasingly, scholars are making their data available through databases such as GenBank, the Protein Data Bank, or the fMRI Data Center. In some fields, journals are beginning to require authors to submit data sets to these repositories. Links between the literature and the data will enhance the value of each.

In an age of evolving publishing models, participants in the scholarly communication process are paying more attention than ever to measurable results. At the forefront of this emphasis on quantification are statistics that rely on accurate references and reference linking. Inaccurate references can cause the impact of an author’s work, or of a journal, to be underestimated. For example, the measurable impact of institutional repositories will increasingly depend on linking their content to other sources.[2] Studies of open-access journals have compared the impact (measured by citation analysis) of journals accessible under traditional subscription models with that of journals that are free to readers.[3]

Thomson Scientific’s Impact Factor, developed in the 1960s, was the original citation-analysis measure of journal quality. The impact factor, along with more recently developed citation measures such as the h-index (proposed in 2005 by physicist Jorge Hirsch[4]), provides input to a host of decisions, from library journal selection to tenure, appointments, and grants. Citation analysis remains a growing and important part of the environment,[5] available not just through Thomson’s Web of Science and Citation Indexes, but also through comprehensive tools such as Scopus from Elsevier and Google Scholar,[6],[7] and through discipline-specific secondary services such as the Smithsonian/NASA Astrophysics Data System (ADS).[8]

Clearly, reference citations are critical, enabling measurability and discoverability. The problem, of course, is that reference quality and accuracy have been a known issue in journal publishing for many years. Even before the prevalence of electronic journal distribution, reference studies exposed sloppy reference work and exhorted authors and journal publishers to do better.[9]

Such studies continue in a number of fields and report reference error rates in published articles ranging from 4% to 48%, depending on the community and journal studied.[10],[11] According to Diane Berneath Lang, editorial manager of the journals division at the University of Chicago Press, “Difficulties in reference accuracy didn’t come to light until everything became electronic.”

What Readers and Authors Want

Studies of journal readers consistently show that reference linking is among the most highly valued features of electronic publications. A reader survey by Canada’s National Research Council reported that 59% wanted linking from citations to cited articles—second only to backfile availability.[12] Likewise, studies of academic and family physicians have shown that reference linking is among the most highly valued electronic journal features.[13],[14]

Authors are readers too, and also appreciate the convenience of linking. As an example, the journal Peptide Science includes in its list of author benefits a variety of linking services—including reference linking to and from Chemical Abstracts Service and Web of Science, as well as database linking to Cambridge X-Ray Group, the Genome Database, and the Protein Data Bank.[15]

Researchers also increasingly rely on linking variations like forward linking and multiple resolution. Forward linking identifies papers that have cited a particular article after the article has been published, providing a valuable navigation path to researchers. Multiple resolution links allow a single reference citation to point to more than one location for the same article. Multiple resolution helps readers get to the appropriate copies of cited articles, since they may have subscription access to a copy from one source but not from another.
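
Both ideas come down to simple data structures. The sketch below is a minimal illustration, assuming citations have already been matched to article identifiers: forward links are the citation graph inverted (each cited article pointing to the articles that cite it), and multiple resolution picks, from several locations for one article, the first copy the reader can reach. The function names, article identifiers, and hosts are hypothetical.

```python
from collections import defaultdict


def build_forward_links(cites):
    """Invert a {citing_id: [cited_id, ...]} map into {cited_id: [citing_id, ...]}."""
    forward = defaultdict(set)
    for citing, cited_ids in cites.items():
        for cited in cited_ids:
            forward[cited].add(citing)
    # Sort the citing lists for stable, readable output.
    return {cited: sorted(citing) for cited, citing in forward.items()}


def resolve(locations, subscriptions):
    """Multiple resolution: return the URL of the first copy the reader can access.

    `locations` is a list of (host, url, is_free) tuples for the same article;
    `subscriptions` is the set of hosts the reader's library subscribes to.
    """
    for host, url, is_free in locations:
        if is_free or host in subscriptions:
            return url
    return None  # no accessible copy


cites = {"A": ["X", "Y"], "B": ["X"], "C": ["Y", "X"]}
print(build_forward_links(cites))  # {'X': ['A', 'B', 'C'], 'Y': ['A', 'C']}

copies = [
    ("publisher.example", "https://publisher.example/article/X", False),
    ("repository.example", "https://repository.example/X", True),
]
print(resolve(copies, {"publisher.example"}))  # subscriber gets the publisher copy
print(resolve(copies, set()))                  # non-subscriber gets the free copy
```

Ordering the locations list is the design choice here: putting the publisher's copy first means subscribers land on the version of record, while others still reach a free copy when one exists.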

Value Added

Author guidelines from many journals place the burden of reference quality squarely on authors. For instance, the author instructions for the journal Radiology say, “It is the responsibility of the authors to ensure the accuracy of all references. This accuracy is essential for Radiology Online: In the reference section of the online article, the hyperlinks to the referenced articles will not function unless the bibliographic information matches.”[16]

Despite this tradition, publishers in reality assume a large part of the burden of checking and correcting references. A report on the recent Web forum on access to the primary literature, sponsored by Nature, provides a succinct list of the value added by scientific publishers, including copyediting, reference checking, and linking.[17]

One might expect claims of added value from publishers who charge for their content, but even open-access publishers acknowledge the importance of copyediting and reference linking and are now paying for those services. Hindawi and the Public Library of Science (PLoS) are examples of highly visible open-access publishers that invest in these processes to ensure quality for their readers.[18],[19]

Many journals have their own reference styles. Some use numbered styles; some use alphabetical references. Some are based on standard reference styles, such as the AMA Manual of Style[20] or the Publication Manual of the American Psychological Association.[21] Most authors submit to journals with differing reference styles, and the tedium of changing from one style to another may not be the most pressing activity on a researcher’s to-do list. And, as we shall see, even if the style is correct, references may contain inaccurate data.

A good copyeditor, working alone or in combination with an automated reference extraction and checking process, can catch most reference errors, which means higher link-matching rates. As a result, the linked articles will see increased usage, and metrics such as the impact factor will rise for the target journals.

Ten years ago, Clifford Lynch, director of the Coalition for Networked Information, speculated that journals not available electronically would become second-rate—not by their lack of quality, but by their lack of accessibility.[22] Similarly, even references to resources that are available electronically may still be limited in accessibility if data errors prohibit seamless linking access.

The Importance of Copyediting to Accurate Reference Linking

In the pre-electronic world, faculty members would frequently send research assistants to the library building to copy journal articles they discovered in an article’s references. If the references contained major errors, retrieving the original articles became difficult.

Reference errors today interfere more fundamentally with a researcher’s ability to retrieve a work. They also increase costs: despite the near-universal policy of holding authors responsible for reference accuracy, publishers spend a substantial amount of their production time on quality control for references. Some publishers acknowledge that authors may not be able to perform this task adequately. Schwartzman and his colleagues at the American Geophysical Union have stated that shifting responsibility for reference accuracy from authors to publishers can allow authors to concentrate on scholarship, while publishers take on the burden of creating accurate reference links.[23]

For most publishers, however, the responsibility is likely to continue as a shared load. Petit Ferrer, editorial-services manager for SPi’s Publishing Division, reports that for one publisher customer, 16% of the time in copyediting and post-editing was spent on the references. Similarly, in a recent study of manuscripts at Blackwell Publishing, Wates and Campbell found that a high percentage of authors’ changes in response to copyediting queries concerned reference clarifications.[24]

Citation reference linking is primarily concerned with links from the reference section of a scholarly article or book to the full text of the cited articles—whether in journals or proceedings. Many other links can be found in reference sections: links to secondary records (such as PubMed); links to books or theses; links to Web resources such as technical reports, author home pages, company Web sites, and so forth; and increasingly, links to data.

Citation references to full-text articles can be linked in several ways:

  1. Links to full text at the publisher’s (or aggregator’s) own site;
  2. Links to full text at another publisher’s site, implemented through digital object identifiers (DOIs) managed through the CrossRef linking service;
  3. Links to link resolvers or other OpenURL systems available from library-systems vendors;
  4. Links to bibliographic records, which may then lead to additional full-text services; and
  5. Links to author preprints posted on individual home pages or collected in institutional repositories.

All of these types of linking rely on the accuracy of the reference data to provide the best results. Below I look at how some organizations are handling reference accuracy.

CrossRef Matching Issues

One of the metrics tracked closely at CrossRef is the query-matching rate, which measures the percentage of queries that find a matching DOI. CrossRef’s overall matching rate has been steadily increasing over the years. For example, Ed Pentz, executive director of CrossRef, reported that the match rate for 2007 was up 16% over 2006, to 40%.[25] Amy Brand, director of business development at the organization, says the query matching rate is dependent on a number of factors, including the content (breadth and depth) of material in the CrossRef database, CrossRef’s matching technology, and, of course, data quality. Data quality issues can be caused both by errors in the metadata of target articles deposited by publishers, and by errors in the references submitted as part of the CrossRef query process.

CrossRef staff have made improvements in all of the components of the equation, although the data-quality issues are somewhat beyond their control, as they accept deposits and queries prepared by a wide variety of publishers and agents. Still, CrossRef has added staff to support metadata-quality initiatives. The plan is to give more feedback to publishers on the journal metadata and reference query data they submit so that they may either correct data or improve their processes for higher-quality submissions in the future. “I still find it surprising and frustrating that there is no truly reliable source of metadata in our industry,” Brand says. “There will eventually be more machine-driven techniques to help create what we call trustworthy information.”

Typical Data Problems

Data problems that cause CrossRef queries to fail include inaccuracies in the year, problems with author names, and variations in journal titles. Koscher estimates that 50% of all failed queries may be caused by problems in journal titles. Another problem, which can affect both author names and titles, is variation in character representation. For example, a dash in a manuscript can be represented in many ways: a short dash, an ASCII dash, an en-dash, or a dash from a particular font set. The computer reads them all as different, although a human would have no trouble interpreting them even if they were used inconsistently. CrossRef is constantly revising its matching algorithms to catch these kinds of inconsistencies, but the number of possible variations is very large, and each code change requires a time-consuming cycle of discovery, implementation, testing, and resolution, Koscher says.
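
The dash problem is easy to reproduce. The sketch below is an illustration only, not CrossRef's matching code: it applies Unicode NFKC normalization and then folds a hypothetical list of dash-like characters to the ASCII hyphen-minus, so that strings differing only in their dashes or spacing compare as equal.

```python
import unicodedata

# Hypothetical list of dash-like code points to fold to ASCII "-".
DASHES = {
    "\u2010",  # hyphen
    "\u2011",  # non-breaking hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
    "\u2015",  # horizontal bar
    "\u2212",  # minus sign
}


def normalize_chars(text):
    """Return text with compatibility characters, dashes, and whitespace normalized."""
    # NFKC maps compatibility forms (no-break space, full-width hyphen-minus) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    text = "".join("-" if ch in DASHES else ch for ch in text)
    # Collapse runs of whitespace to single spaces.
    return " ".join(text.split())


print(normalize_chars("Proc. Natl. Acad. Sci. 88, 3248\u20133252"))
# Proc. Natl. Acad. Sci. 88, 3248-3252
```

Applying the same function to both the query and the stored metadata is what makes the comparison work; normalizing only one side would leave the mismatch in place.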

Alphanumeric page numbers can create another common type of reference data problem. In preparing references, an author might inadvertently leave off a letter in a page number that should contain both numbers and letters. Even the best copyeditor would not necessarily catch such a mistake without an automated reference-validation process.
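
Once a matching record has been retrieved from a metadata database, this case can be caught mechanically: if the digits of the author's page agree with the record but the letters do not, the reference is probably missing a prefix or suffix. A minimal sketch, assuming the record's first page is available as a string; `check_page` and its return labels are hypothetical.

```python
import re

# Optional letters, digits, optional letters: "S12", "e1001", "12a", "345".
PAGE = re.compile(r"^([A-Za-z]*)(\d+)([A-Za-z]*)$")


def check_page(author_page, record_page):
    """Classify an author-supplied first page against a metadata record's page.

    Returns "match", "letter_mismatch" (same digits, different letters), or "mismatch".
    """
    a, r = author_page.strip(), record_page.strip()
    if a.upper() == r.upper():
        return "match"
    ma, mr = PAGE.match(a), PAGE.match(r)
    if ma and mr and ma.group(2) == mr.group(2):
        return "letter_mismatch"  # likely a dropped or wrong letter; query the author
    return "mismatch"


print(check_page("12", "S12"))  # letter_mismatch
```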

Koscher says that CrossRef’s matching technology is binary—either it finds a DOI or it doesn’t. Although fuzzy matching has been added to the technology, he notes that machines don’t deal as well with ambiguities as humans. One possible solution is to return multiple hits, where an editor can choose the appropriate answer. “The critical point,” Koscher says, “is that once the article is out of human hands, the world is very rigid. The dash variations are insignificant to humans, but they will stop a machine that doesn’t know about them. We can accommodate some problems by making technological improvements in the matching algorithms, but it’s a lot like playing ‘Whac-a-Mole’—several new problems pop up each time you fix one. Editorial processes are a far more effective way of dealing with the problem, and changes can be made instantly.”
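
The alternative Koscher mentions, returning several scored candidates instead of a yes/no answer, can be sketched with Python's standard `difflib`. This is an illustration only: the titles and DOIs (which use the 10.1000 example prefix) are made up, and the 0.6 similarity threshold is an arbitrary assumption, not a CrossRef setting.

```python
from difflib import SequenceMatcher


def candidate_matches(query, records, limit=3, threshold=0.6):
    """Return up to `limit` (score, doi, title) tuples whose titles resemble `query`.

    `records` maps DOI -> title. Scores are difflib similarity ratios (0.0 to 1.0);
    candidates are sorted best first so an editor can pick the right one.
    """
    q = query.lower()
    scored = []
    for doi, title in records.items():
        score = SequenceMatcher(None, q, title.lower()).ratio()
        if score >= threshold:
            scored.append((score, doi, title))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


records = {
    "10.1000/a1": "Protein folding in the cell",
    "10.1000/a2": "Protein folding in vitro",
    "10.1000/b7": "Galaxy rotation curves",
}
for score, doi, title in candidate_matches("Protein foldng in the cel", records):
    print(f"{score:.2f}  {doi}  {title}")
```

The misspelled query still ranks the intended article first, but the second candidate shows why a person, not the machine, should make the final choice.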

HighWire Linking Experience

A journal article or other scholarly content often resides in interlinked publisher, subject, or other aggregations. One such collection, HighWire Press, analyzes references in each article and inserts toll-free links to other HighWire-hosted content, according to Helen Atkins, a journal manager at HighWire. As an optional service, HighWire offers publishers the ability to send any references left unmatched after the first filter through subsequent matching processes. These processes create links to the publisher’s choice of additional services, including CrossRef, bibliographic databases, and Web of Science. HighWire is also developing additional link types, including OpenURL outbound links.
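
HighWire does not describe the internals of this service, but the general pattern (a first matching pass followed by optional fallback passes for whatever is left over) can be sketched as a cascade. In this minimal sketch the matchers are hypothetical lookup tables standing in for services such as an internal index or CrossRef; each returns a link target or `None`.

```python
def link_references(references, matchers):
    """Pass references through an ordered list of (name, matcher) pairs.

    A reference matched by one stage is not sent to later stages; whatever is
    still unmatched at the end is returned for human review.
    """
    links = {}
    pending = list(references)
    for name, matcher in matchers:
        unmatched = []
        for ref in pending:
            target = matcher(ref)
            if target is None:
                unmatched.append(ref)
            else:
                links[ref] = (name, target)
        pending = unmatched
    return links, pending


# Hypothetical stand-ins for real services.
hosted = {"Smith 2001": "/content/12/3/45"}
crossref = {"Jones 1999": "10.1000/xyz"}
links, leftover = link_references(
    ["Smith 2001", "Jones 1999", "Lee 2003"],
    [("hosted", hosted.get), ("crossref", crossref.get)],
)
print(links)     # {'Smith 2001': ('hosted', '/content/12/3/45'), 'Jones 1999': ('crossref', '10.1000/xyz')}
print(leftover)  # ['Lee 2003']
```

Ordering the stages lets the cheapest or most preferred target (here, co-hosted content) win, with later services consulted only for the remainder.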

Although HighWire does not publish its matching rates, its publishers report that where time and effort are spent proofing references, higher link rates result. Some publishers incorporate DOIs, PubMed IDs, and other identifiers in the tagged references submitted to HighWire in order to increase reference-linking rates.

PubMed Linking

Many electronic refereed articles in the biological and medical sciences are linked to PubMed, a service of the US National Library of Medicine (NLM). In addition, as we shall see, some publishers in the disciplines covered by PubMed also use its citations to validate references during production, or even as early in the process as peer review. PubMed provides a host of linking services that include full-text links to PubMed Central, publisher or aggregator sites, related articles, sequences in molecular biology databases, sections of textbooks, library holdings, and non-bibliographic resources. The combination of inbound and outbound links means that PubMed can act as a “hub through which you can link to and from the Web-based version of a journal and much other data.”[26]

A total of 75% of the metadata in PubMed is created from data supplied by publishers, and the remainder is processed using scanning and optical character recognition.[27] Although a quality-control process is in place to verify the accuracy of the records, any data collection activity of this magnitude will inevitably contain data errors. Still, many organizations use PubMed as a de facto authority file for bibliographic records in the fields of life sciences and medicine.

Workflow Options

So how are publishers incorporating the important step of reference checking and validation into their workflows?

Publisher Workflows

All publishers struggle with the problem of improving reference quality. Many of those interviewed for this paper have instituted automated processes to supplement the critical hands-on copyediting step.

For example, the current process at the University of Chicago Press (UCP) is for accepted manuscripts to be converted to a structured format. After a pre-editing step, references are checked against the CrossRef and PubMed metadata databases for author, journal, and page number information. At this stage, the process does not retrieve the DOI, as UCP does not print the DOI in its journals. Echoing the problems CrossRef’s Koscher identified, Diane Lang notes that the automated checking process against these metadata databases doesn’t handle Greek letters, italics, or special characters very well, so matches may fail even if the items are present in the databases. If the check against the databases results in a link, the copyeditor edits the reference for conformance to the appropriate journal style. Manuscript editors renumber references, add missing citation information, check existing links to make sure they go to the right place, and add missing links. They also standardize and link odd citations like legal references or letters or transcripts.
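
For readers curious about what a check against PubMed can involve, NCBI's E-utilities include ECitMatch, which accepts pipe-delimited citation strings (journal|year|volume|first_page|author|key|) and returns matching PubMed IDs. This is not necessarily how UCP's process works. The sketch below only builds the request URL and makes no network call; `citation_line` and `ecitmatch_url` are hypothetical helper names, and NCBI's usage guidelines (rate limits, tool and email parameters) should be checked before sending real queries.

```python
from urllib.parse import urlencode

ECITMATCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ecitmatch.cgi"


def citation_line(journal, year, volume, first_page, author, key):
    """Build one ECitMatch citation string: journal|year|volume|first_page|author|key|"""
    fields = [journal, year, volume, first_page, author, key]
    # A stray "|" inside a field would shift every later field, so replace it.
    return "|".join(str(f).replace("|", " ") for f in fields) + "|"


def ecitmatch_url(citations):
    """Return a GET URL for a batch of citations (carriage-return separated)."""
    bdata = "\r".join(citation_line(*c) for c in citations)
    query = urlencode({"db": "pubmed", "retmode": "xml", "bdata": bdata})
    return f"{ECITMATCH}?{query}"


print(ecitmatch_url([
    ("proc natl acad sci u s a", 1991, 88, 3248, "mann bj", "Art1"),
    ("science", 1987, 235, 182, "palmenberg ac", "Art2"),
]))
```

According to NCBI's documentation, the reply echoes each citation string with the matching PMID appended, and the caller-supplied key ties each answer back to the reference it came from.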

The Association for Research in Vision and Ophthalmology (ARVO) supports two distinct production workflows: one for the electronic-only open-access Journal of Vision (JOV), and another for the more traditional Investigative Ophthalmology & Visual Science (IOVS). JOV authors are required to provide their own links, which must lead to freely available versions of the articles. A production editor randomly checks the links for accuracy. Some 90% of JOV references are linked to full text or to PubMed, from which users can generally link to further sources. IOVS authors, by contrast, provide traditional reference sections that are cleaned up in the copyediting process. ARVO is in the process of migrating to the NLM XML DTD (a standard markup) for transferring files to HighWire, which then inserts links to PubMed, other HighWire-hosted articles, and CrossRef. Karen Schools Colson, ARVO’s director of publishing and communications, estimates that 95% or more of IOVS references are linked.

The computer science publisher Association for Computing Machinery (ACM) has a slightly different twist on reference processing, according to Bernard Rous, deputy publications director. ACM matches vendor-supplied XML files—where the references are tagged as references but not yet parsed into reference components—with records from an internal metadata database using technology from a third-party source. The reference links that users see in the online version of an ACM journal article are actually dynamically generated from this database rather than displayed from the author-submitted references.

At Nature Publishing Group, XML-aware editorial software is used to clean up references and run queries against PubMed and CrossRef databases at the copyediting stage. A copyeditor puts unmatched references through an individual quality assurance step. Nature’s process builds reference sections from an internal RefMaster metadata database, which queries a range of citation databases in addition to PubMed and CrossRef. A pre-query filter normalizes titles to increase the match rate, which is 44%, according to Amanda Ward, Web production manager.
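
Nature's filter is not described in detail, but a title-normalization step of this general kind can be sketched in a few lines: lowercase the title, turn punctuation into spaces, collapse spacing, drop a leading “The,” and map full titles to one canonical abbreviation. The abbreviation table here is a hypothetical two-entry sample; a real filter would draw on a complete list of journal title abbreviations.

```python
import re

# Hypothetical sample mapping normalized full titles to a canonical abbreviation.
ABBREVIATIONS = {
    "journal of biological chemistry": "j biol chem",
    "proceedings of the national academy of sciences of the united states of america":
        "proc natl acad sci u s a",
}


def normalize_title(title):
    """Reduce a journal title to a canonical key for matching."""
    t = title.lower().replace("&", " and ")
    t = re.sub(r"[^\w\s]", " ", t)  # punctuation (periods, commas, hyphens) -> space
    t = " ".join(t.split())         # collapse whitespace
    if t.startswith("the "):
        t = t[4:]
    return ABBREVIATIONS.get(t, t)


print(normalize_title("The Journal of Biological Chemistry"))  # j biol chem
print(normalize_title("J. Biol. Chem."))                       # j biol chem
```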

Vendor Workflows

So what goes on inside the “black box” of vendor-supplied copyediting services, and how does the process affect reference linking accuracy? According to SPi’s Ferrer, who manages the company’s copyediting operations for journal publishing in the Philippines, the workflow often depends on the publishers’ requirements. SPi’s Philippine operation has about 100 copyeditors in its Manila and Dumaguete facilities. Another group of copyeditors that focuses on books is based in Pondicherry and Chennai, India. In India and Manila a smaller number of freelance editors working on Web-based pre-submission editing supplement these teams.

In a typical customer workflow for SPi, the input is frequently a Microsoft Word file but can be whatever an author submits. The pre-editing or file-structuring step produces a formatted and tagged manuscript, ready for copyediting and transformation to XML or SGML (as required by the publisher). At this stage, references are tagged at a granular level. SPiCE, SPi’s proprietary Word-based editing tool, lets editors use macros and menus interactively to tag and clean the document. An optional step is reference validation against metadata databases such as CrossRef or PubMed; editors can then examine the references that fail to link and resolve the problems.

“Databases for reference validation add a useful step,” Ferrer says. “It would speed up the production process if all the references were complete and accurate during pre-editing. We find that even though the author bears responsibility to ensure that references are correct, in production there is still a lot of work to do, especially in terms of following the journal styles. References generate a lot of queries to authors. The earlier in the process the references are checked, the better it is. Typesetters shouldn’t have to work on them. Authors prefer to concentrate on their research and not worry about the references, but they are also very important.”

The next stage is for the tagged Word file to be copyedited. Because of the pre-editing and post-editing automation routines, copyeditors can concentrate on the core language editing tasks instead of spending a lot of time checking references. One of the reference tasks still left to the copyeditor is checking the completeness of citation article titles. In the post-editing stage, author queries are compiled into a document, and an editor looks over the reference list. Any incomplete references generate an author query.

A similar workflow takes place in book production, according to Rukhshad Banaji, who heads SPi’s editorial services in Pondicherry, India. There he and his staff provide end-to-end editorial and production services for STM and scholarly books. The workflow may vary from publisher to publisher, and even from book to book. The editorial process, like the process for journal production in Manila, consists of both pre-editing and copyediting phases. The pre-editing phase includes checking references. Reference callouts are checked against the citation list to make sure every callout corresponds to a reference and that every reference is cited in the text. References are also reordered to conform to the required style for the project, which may include reordering surnames and first names, replacing names with initials, and punctuating properly. The pre-editing phase uses a combination of fully automated tools and those that rely on operator interaction. As with SPi’s Journals team, SPiCE plays a major role in both the pre-editing phase and the subsequent copyediting of the manuscript.
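
The callout check lends itself to automation. A minimal sketch, assuming a numbered style with bracketed callouts such as [3] or [4-6] (author-date styles would need a different pattern); `crosscheck` is a hypothetical helper that reports callouts with no matching reference and references that are never cited.

```python
import re

# Bracketed numeric callouts: [1], [2,3], [4-6], [1, 3-5].
CALLOUT = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")


def crosscheck(text, reference_ids):
    """Return (callouts without a reference, references never cited), both sorted."""
    cited = set()
    for group in CALLOUT.findall(text):
        for part in group.split(","):
            part = part.strip()
            if "-" in part:
                start, end = (int(x) for x in part.split("-"))
                cited.update(range(start, end + 1))  # expand ranges like 3-5
            else:
                cited.add(int(part))
    refs = set(reference_ids)
    return sorted(cited - refs), sorted(refs - cited)


print(crosscheck("As shown previously[1], [3-5] and [7].", [1, 2, 3, 4, 5, 6]))
# ([7], [2, 6])
```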

After the pre-editing stage, when the copyeditor discovers any missing citations or citations with errors through a manual inspection, the editor uses additional resources such as Internet searches, CrossRef, PubMed, or Google Scholar to verify missing reference information. The copyeditor resolves as many issues as possible within the time and budget constraints of the project. PDF proofs are sent to authors with the queries compiled in the editing process. Banaji estimates that as many as 50% of the queries involve corrections to references, noting that large reference books have many more references, and thus more reference problems than other kinds of books.

Final PDF and XML files that conform to the DTD specified by the publisher include granularly tagged references. Banaji notes that references in books tend to represent a wider variety of content types than those in journals. References to proceedings, reports, and theses are common, in addition to references to journal articles. Tagging these differently structured references adds complexity to the problem.

Automation Tools

A number of tools are used by publishers and copyediting vendors, ranging from proprietary programs to licensed software.[28] A few of the more common options are discussed in this section.

CrossRef Simple Text Query

Available as a free tool for individual authors on the Web at http://www.crossref.org/freeTextQuery/, Simple Text Query is based on a customized version of eXtyles refXpress, which takes an unparsed text string, marks it up with XML, and passes it to the CrossRef system, which returns the DOI. Users paste in the references, which are capped at 10,000 characters. The DOIs that are returned are live hyperlinks. Simple Text Query works on the references regardless of their editorial style. At least one publisher has asked its authors to use Simple Text Query to incorporate DOIs in their submissions.[29]

Peer Review Systems

Both Manuscript Central and Editorial Manager have incorporated reference checking into pre-acceptance stages of manuscript processing. Aries Systems has partnered with Inera to incorporate its eXtyles technology into Editorial Manager.[30] Thomson ScholarOne has announced a pilot at Blackwell Publishing that integrates Web of Science with Manuscript Central to facilitate reference checking during the review process.[31]

Microsoft Word and Extensions

A number of publishers think that copyeditors are more productive working in a familiar environment such as Word. Several tools, including Inera’s eXtyles (http://www.inera.com/extylesinfo.shtml) and SPi’s SPiCE, make the most of both automation and human judgment by creating macros and menus that will automate reference processing and mark references in a way that makes it easier for a person to identify the components that are present and accurate in the familiar Word environment.

XML Editors

The most widely used XML editors among scholarly publishers include:

Additional editors are available, including some open source options, and these are discussed in a comprehensive review by van den Broek.[32]

Reference Processing

Automated tools to parse, tag, and match references can be built from text-mining technologies. Some organizations build their own tools, while others purchase them from vendors such as Parity Computing. Parity’s Reference Extractor technology automatically parses untagged references and normalizes them against an internal or public metadata database.[33]
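
To give a sense of what parsing involves, the sketch below splits one Vancouver-like layout (authors. title. journal. year;volume(issue):pages.) into tagged fields with a regular expression. It is a toy that assumes that exact punctuation and makes no claim about how Parity's or any other commercial tool works; anything it cannot parse is returned as `None` so that a person can review it.

```python
import re

# One Vancouver-like layout: Authors. Title. Journal. Year;Vol(Issue):First-Last.
VANCOUVER = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>[^.]+)\.\s+"
    r"(?P<year>\d{4});"
    r"(?P<volume>\d+)"
    r"(?:\((?P<issue>[^)]+)\))?:"
    r"(?P<fpage>[A-Za-z]?\d+)(?:-(?P<lpage>[A-Za-z]?\d+))?\.?$"
)


def parse_reference(text):
    """Return a dict of reference fields, or None if the layout is not recognized."""
    match = VANCOUVER.match(text.strip())
    return match.groupdict() if match else None


ref = parse_reference(
    "Hirsch JE. An index to quantify an individual's scientific research output. "
    "Proc Natl Acad Sci U S A. 2005;102(46):16569-72."
)
print(ref["journal"], ref["year"], ref["volume"], ref["fpage"])
```

Even this one layout breaks on common variations (initials with periods, titles ending in a question mark, electronic article numbers), which is why production tools combine parsing with database lookups and human review.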

This list is a sampling of tools. Publishers and service providers that specialize in electronic editing incorporate a number of these and other tools in ensuring reference accuracy.


If reference data accuracy is key to providing a better experience for readers and for authors, what are the best practices for improving it?

Publishers ranging in size from small societies to huge commercial operations have adopted a number of effective practices to improve reference quality and reference linking rates. Some publishers, whether small or large, have the technical resources to implement these practices in house. Others have outsourced the solution to organizations whose technical competencies complement their own internal strengths. Recommended actions include:

Structure Documents

A structured workflow based on XML allows tasks to be automated at every stage of production where references matter, from manuscript submission through acceptance, copyediting, proofing, and online and print publication and distribution. XML allows the same data to be transformed for different applications. Well-formed XML can drive author proofs; create aggregator metadata deposits; be transformed for CrossRef deposits and reference queries; create print and online output; and, finally, provide files to archive services such as Portico.

References must be parsed and tagged in a detailed, granular fashion to get the most benefit from reference citation applications. This tagging may be done by automated, computer-assisted, or manual methods. An XML workflow also allows publishers to create additional outputs or to check quality.[34],[35]
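
As an illustration of granular tagging, the sketch below builds one reference using element names from the NLM journal tag set (now maintained as JATS), including a `pub-id` element for a DOI. It uses Python's standard `xml.etree.ElementTree`; the field-dictionary layout and `to_element_citation` are hypothetical, and the DOI is a placeholder.

```python
import xml.etree.ElementTree as ET


def to_element_citation(ref_id, fields):
    """Build a <ref> element with granularly tagged citation parts."""
    ref = ET.Element("ref", id=ref_id)
    cit = ET.SubElement(ref, "element-citation", {"publication-type": "journal"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in fields.get("authors", []):
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    # Emit only the parts that are present, in a fixed order.
    for tag in ("article-title", "source", "year", "volume", "issue", "fpage", "lpage"):
        if fields.get(tag):
            ET.SubElement(cit, tag).text = str(fields[tag])
    if fields.get("doi"):
        ET.SubElement(cit, "pub-id", {"pub-id-type": "doi"}).text = fields["doi"]
    return ref


ref = to_element_citation("bib4", {
    "authors": [("Hirsch", "JE")],
    "article-title": "An index to quantify an individual's scientific research output",
    "source": "Proc Natl Acad Sci U S A",
    "year": 2005, "volume": 102, "issue": 46, "fpage": 16569, "lpage": 16572,
    "doi": "10.1000/placeholder",  # hypothetical placeholder, not the article's real DOI
})
print(ET.tostring(ref, encoding="unicode"))
```

Because each part sits in its own element, the same record can feed proofs, CrossRef queries, and online display without re-parsing the text.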

Check Reference Quality Earlier in the Process

Some organizations are incorporating reference validation and accuracy checks as early as the peer review stage. The ability of peer review systems to do reference checking can improve the quality of the peer review process, even for papers that may never be accepted for publication.[36] The process allows editors and reviewers to catch some of the problems of poor reference citation practices, as documented in the literature discussed previously.

CrossRef is also working with organizations to move reference matching earlier in the production process. This change in workflow allows time to correct data errors that may be causing reference mismatches. According to Koscher, reference checking is much more productive in the editorial phase than as a postproduction step. “The further upstream references get linked, the more powerful the process,” he says.

Separate Routine Cleanup from the Art of Editing

A number of required editorial changes to manuscripts, including references, are technical changes for style, font substitution, appropriate abbreviations, or order. Whether or not the more routine steps are automated (see below), separating the routine (and sometimes mind-numbing) tasks from those requiring the judgment, discretion, and art of a skilled editor allows both greater accuracy and greater creativity in handling more complex problems.

Automate Style Changes

The most successful reference processing methods incorporate some type of automation, whether Word macros, third-party tools, or batch processes, to normalize references for style. The degree of automation can vary from a fully automated batch process to a computer-assisted process where the software flags anomalies for a human to resolve. Publishers must, of course, adopt a consistent reference style.

A style change process will typically perform the following functions:

  • Identify the document type of the reference;
  • Normalize punctuation;
  • Replace journal names or abbreviations with standard journal title abbreviations;
  • Mark missing or suspicious reference elements;
  • Reorder references (alphabetical or numbered); and
  • Make author names consistent and accurate.
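
Several of these functions can be shown on references that have already been parsed into fields. The sketch below is a simplified illustration, not any vendor's tool: it reduces given names to initials, replaces a journal name using a hypothetical one-entry abbreviation table, normalizes spacing and punctuation, marks missing elements with “[?]”, and orders the list by first author before numbering it. Identifying the document type is left out, the sample references are invented, and the output style is only loosely Vancouver-like.

```python
# Hypothetical sample abbreviation table.
JOURNAL_ABBREV = {"Physical Review Letters": "Phys Rev Lett"}


def initials(given):
    """'Mary Ann' -> 'MA', 'Jean-Paul' -> 'JP'."""
    return "".join(part[0].upper() for part in given.replace("-", " ").split())


def field(ref, name):
    """Return a field as text, or a visible marker if it is missing."""
    value = ref.get(name)
    return "[?]" if value in (None, "") else str(value)


def format_reference(ref):
    authors = ", ".join(f"{s} {initials(g)}" for s, g in ref.get("authors", [])) or "[?]"
    journal = field(ref, "journal")
    journal = JOURNAL_ABBREV.get(journal, journal)
    title = field(ref, "title").rstrip(".")  # avoid a doubled period
    text = (f"{authors}. {title}. {journal}. "
            f"{field(ref, 'year')};{field(ref, 'volume')}:{field(ref, 'fpage')}.")
    return " ".join(text.split())  # collapse stray whitespace


def format_list(refs):
    """Sort by first author's surname, then year, and number the entries."""
    def sort_key(ref):
        authors = ref.get("authors") or [("", "")]
        return (authors[0][0].lower(), str(ref.get("year", "")))
    return [f"{n}. {format_reference(r)}" for n, r in enumerate(sorted(refs, key=sort_key), 1)]


refs = [
    {"authors": [("Zeller", "Fritz")], "title": "Title B",
     "journal": "Physical Review Letters", "year": 1997, "volume": 5, "fpage": 10},
    {"authors": [("Abel", "Mary Ann"), ("Roux", "Jean-Paul")], "title": "Title A.",
     "journal": "Phys Rev Lett", "year": 2001, "volume": 86},
]
for line in format_list(refs):
    print(line)
```

The missing first page in the second reference surfaces as “[?]” in the output rather than being silently dropped, which is the cue for an author query.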

Incorporate Authority Control

A number of publishers automatically match references against metadata databases. Some go as far as replacing author-supplied references with records from internal metadata databases; others supplement the reference data in the manuscript with data from matching records in databases such as PubMed or other MEDLINE variants, CrossRef, ADS, or other bibliographic citation resources.
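Both approaches can be sketched against a tiny in-memory stand-in for a metadata database. The records, the exact-match key (journal abbreviation, volume, first page), and the function name are hypothetical assumptions. A production system would query a service such as PubMed or CrossRef and match more tolerantly. The two sample records reuse citation data from this article's own reference list.

```python
# Hypothetical authority records keyed on (journal abbreviation, volume, first page).
# The two entries reuse citation data from this article's own reference list.
AUTHORITY: dict[tuple[str, str, str], dict[str, str]] = {
    ("Proc Natl Acad Sci U S A", "102", "16569"): {
        "year": "2005", "first_author": "Hirsch", "doi": "10.1073/pnas.0507655102",
    },
    ("Lancet", "356", "1445"): {
        "year": "2000", "first_author": "Siebers", "doi": "10.1016/S0140-6736(05)74090-3",
    },
}


def match_reference(ref: dict[str, str], replace: bool = False) -> tuple[dict[str, str], list[str]]:
    """Match one parsed reference against the authority records.

    With replace=False, author-supplied values are kept and only gaps are filled
    (the "supplement" approach); with replace=True, every field the record holds
    overwrites the author's value (the "replace" approach). Either way, conflicts
    are reported so an editor can see what the author got wrong.
    """
    key = (ref.get("journal", ""), ref.get("volume", ""), ref.get("first_page", ""))
    record = AUTHORITY.get(key)
    if record is None:
        return dict(ref), ["no authority match"]
    merged = dict(ref)
    problems = []
    for name, authoritative in record.items():
        supplied = ref.get(name, "")
        if supplied and supplied != authoritative:
            problems.append(f"{name}: author gave {supplied!r}, record has {authoritative!r}")
        if replace or not supplied:
            merged[name] = authoritative
    return merged, problems


if __name__ == "__main__":
    author_ref = {"journal": "Lancet", "volume": "356", "first_page": "1445",
                  "year": "2001", "first_author": "Siebers"}
    merged, problems = match_reference(author_ref)
    print(merged["doi"])  # 10.1016/S0140-6736(05)74090-3
    print(problems)       # ["year: author gave '2001', record has '2000'"]
```

The lookup key is exact, so a single wrong volume or page number yields "no authority match" instead of a link. This is why correcting such data errors before production matters.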

Take Advantage of Standards

In choosing reference styles, DTDs, and reference policies, publishers are well advised to adhere to existing standards. A variety of metadata standards exist; an exhaustive survey is beyond the scope of this paper, but one can be found elsewhere in the literature.[37] In addition, organizations such as the Council of Science Editors publish helpful policies on editorial practices, such as the policy paper on citing references with group (sometimes called corporate) authors.[38]
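To illustrate what tagging a reference to a published standard looks like, the sketch below emits one reference as XML. It uses element names from JATS, the tag set that grew out of the NLM Journal Archiving and Interchange DTD. The text does not prescribe any particular DTD, so the choice of JATS is an assumption for illustration only. The input dictionary keys are also hypothetical. The sample data is reference 4 from this article.

```python
import xml.etree.ElementTree as ET

# (JATS element name, key in the parsed reference dict); the dict keys are hypothetical.
SIMPLE_FIELDS = (
    ("article-title", "title"), ("source", "journal"), ("year", "year"),
    ("volume", "volume"), ("issue", "issue"), ("fpage", "first_page"), ("lpage", "last_page"),
)


def to_jats(ref: dict) -> str:
    """Serialize a parsed journal reference as a JATS <element-citation> fragment."""
    citation = ET.Element("element-citation", {"publication-type": "journal"})
    group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
    for surname, given in ref.get("authors", []):
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    for tag, key in SIMPLE_FIELDS:
        if ref.get(key):  # omit empty elements rather than emitting blanks
            ET.SubElement(citation, tag).text = ref[key]
    if ref.get("doi"):
        ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"}).text = ref["doi"]
    return ET.tostring(citation, encoding="unicode")


if __name__ == "__main__":
    print(to_jats({
        "authors": [("Hirsch", "J. E.")],
        "title": "An Index to Quantify an Individual's Scientific Research Output",
        "journal": "Proceedings of the National Academy of Sciences",
        "year": "2005", "volume": "102", "issue": "46",
        "first_page": "16569", "last_page": "16572",
        "doi": "10.1073/pnas.0507655102",
    }))
```

Tagging each element separately, instead of storing a formatted string, is what lets later steps re-style the reference or match it against a database without re-parsing.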

Provide Training

Some publishers prefer to have copyeditors work in the native author-supplied file format (such as Word), which may or may not be enhanced with macros to help with reference editing. Others choose to adopt XML-based editing tools or typesetting software. Either choice can work well. The key to success is ultimately the editor’s familiarity with the editing environment. The training investment is likely to be greater for XML tools.[39]

Accuracy Now Leads to More Options in the Future

The world of scholarly communication has rapidly embraced linking technology for references, and it is poised for “richer mutual linking between journals and databases, and in a 10- or 15-year time frame...the rise of new kinds of publications that offer the best of both of these worlds,” according to Nature’s Timo Hannay.[40]

The resurgence of interest in evaluation metrics for journals and other scholarly communication vehicles like institutional repositories makes the accuracy of linking citations more critical than ever before for authors, for journal publishers, and for the academic community.

Carol Anne Meyer is Product Manager at Aries Systems, where she guides the development of the Web-based Preprint Manager production tracking system. She has 25 years of experience in trade, scholarly, and professional markets, predominantly in digital publishing. Her previous experience includes consulting with publishers and vendors in the scholarly publishing community. She has also served as Journals Publisher at the Association for Computing Machinery (ACM) and Director of New Media at Little, Brown & Company. She may be reached at cmeyer@ariessys.com.


This article was originally published as a white paper by SPi in September 2007. The author would like to thank SPi for supporting the paper, especially Frank Stumpf for the original idea, Sameer Raina, Petit Ferrer, and Rukhshad Banaji for thoughtful feedback, and the unnamed copyeditors who made invaluable suggestions and checked the references. Thanks also to Judith Turner and Peter Suber for their insightful reviews. All mistakes (including broken links and bad references) remain my own. Readers may find the original white paper at http://www.spi-bpo.com/content/71/17/1/Reference_Accuracy.pdf


    1. Karen Hunter, “Adding Value by Adding Links,” The Journal of Electronic Publishing 3, no. 3 (March 1998), [doi: 10.3998/3336451.0003.307].

    2. Frank Scholze and Susanne Dobratz, “International Workshop on Institutional Repositories and Enhanced and Alternative Metrics of Publication Impact, 20–21 February 2006, Humboldt University Berlin, Report,” High Energy Physics Libraries Webzine 13 (October 2006), http://library.cern.ch/HEPLW/13/papers/2/.

    3. Marie E. McVeigh, “Open Access Journals in the ISI Citation Databases: Analysis of Impact Factors and Citation Patterns: A Citation Study from Thomson Scientific” (October 2004), http://scientific.thomson.com/media/presentrep/essayspdf/openaccesscitations2.pdf.

    4. J. E. Hirsch, “An Index to Quantify an Individual’s Scientific Research Output,” Proceedings of the National Academy of Sciences 102, no. 46 (November 2005): 16569–72, [doi: 10.1073/pnas.0507655102].

    5. Eugene Garfield, “The Agony and the Ecstasy—The History and Meaning of the Journal Impact Factor” (International Congress on Peer Review and Biomedical Publication, Chicago, September 16, 2005), http://garfield.library.upenn.edu/papers/jifchicago2005.pdf.

    6. Lokman I. Meho and Kiduk Yang, “Impact of Data Sources on Citation Counts and Rankings of LIS Faculty: Web of Science vs. Scopus and Google Scholar,” Journal of the American Society for Information Science and Technology 58, no. 13 (2007): 2105–25, [doi: 10.1002/asi.20677].

    7. Nisa Bakkalbasi, Kathleen Bauer, Janis Glover, and Lei Wang, “Three Options for Citation Tracking: Google Scholar, Scopus and Web of Science,” Biomedical Digital Libraries 3, no. 7 (2006), [doi: 10.1186/1742-5581-3-7] or http://www.bio-diglib.com/content/3/1/7.

    8. Alberto Accomazzi, Gunther Eichhorn, Michael J. Kurtz, Carolyn S. Grant, Edwin Henneken, Markus Demleitner, Donna Thompson, Elizabeth Bohlen, and Stephen S. Murray, “Creation and Use of Citations in the ADS,” in Library and Information Services in Astronomy V (LISA V), Sandra Ricketts, Christina Birdie, and Eva Isaksson, eds. (San Francisco: Astronomical Society of the Pacific, 2007), author preprint http://arxiv.org/abs/cs/0610011.

    9. Gerald de Lacey, Carol Record, and Jenny Wade, “How Accurate are Quotations and References in Medical Journals?” British Medical Journal 291 (1985): 884–86, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1416756.

    10. Ivan Krešimir Lukić, Anita Lukić, Vicko Glunčić, Vedran Katavić, Vladimira Vučenik, and Ana Marušić, “Citation and Quotation Accuracy in Three Anatomy Journals,” Clinical Anatomy 17, no. 7 (2004): 534–39, [doi: 10.1002/ca.10255].

    11. Robert Siebers and Shaun Holt, “Accuracy of References in Five Leading Medical Journals,” The Lancet 356, no. 9239 (October 21, 2000): 1445, [doi: 10.1016/S0140-6736(05)74090-3].

    12. Aldyth Holmes, “Publishing Trends and Practices in the Scientific Community,” Canadian Journal of Communication 29, no. 3 (2004), http://www.cjc-online.ca/viewarticle.php?id=839&layout=html.

    13. Dario M. Torre, Scott M. Wright, Renée F. Wilson, Marie Diener-West, and Eric B. Bass, “What Do Academic Primary Care Physicians Want in an Electronic Journal?” Journal of General Internal Medicine 18, no. 3 (March 2003): 209–12, [doi: 10.1046/j.1525-1497.2003.20529.x].

    14. Dario M. Torre, Scott M. Wright, Renée F. Wilson, Marie Diener-West, and Eric B. Bass, “Family Physicians’ Interests in Special Features of Electronic Publication,” Journal of the Medical Library Association 91, no. 3 (July 2003): 337–40, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=164396&rendertype=abstract.

    15. American Peptide Society Publications, Peptide Science Web page, http://www.americanpeptidesociety.org/publications/journal.asp.

    16. Radiological Society of North America, Radiology Publication Information for Authors, “References,” http://www.rsna.org/publications/rad/PIA/preparation/references.html.

    17. Nature Web Focus, Experiments in Publishing, “Access to the Literature: The Debate Continues,” http://www.nature.com/nature/focus/accessdebate/35.html.

    18. Paul Peters, email to LibLicense mailing list, February 5, 2007, “Hindawi Adds Ten New Titles to its Open Access Collection,” Hindawi news release, www.library.yale.edu/~llicense/ListArchives/0702/msg00031.html.

    19. PLoS Biology Guidelines for Authors, http://journals.plos.org/plosbiology/guidelines.php.

    20. C. Iverson, S. Christiansen, A. Flanagin, et al., AMA Manual of Style: A Guide for Authors and Editors, 10th ed. (Oxford; New York: Oxford University Press, 2007).

    21. Publication Manual of the American Psychological Association, 5th ed. (Washington, D.C.: American Psychological Association, 2001).

    22. “Networked Information: Finding What’s Out There: Clifford A. Lynch Interview,” Educom Review 32, no. 6 (1997), http://www.educause.edu/apps/er/review/reviewArticles/32642.html.

    23. Alexander B. Schwarzman, Hyunmin Hur, Shu-Li Pai, and Carter M. Glass, “XML-Centric Workflow Offers Benefits to Scholarly Publishers” (XML 2004 Conference & Exhibition Proceedings, November 15–19, 2004), p. 1, http://www.idealliance.org/proceedings/xml04/abstracts/paper71.html.

    24. Edward Wates and Robert Campbell, “Author’s Version vs. Publisher’s Version: An Analysis of the Copy-Editing Function,” Learned Publishing 20, no. 2 (April 2007): 121–29, [doi: 10.1087/174148507X185090].

    25. Ed Pentz, “Key Statistics,” CrossRef Newsletter (February 9, 2008), http://crossref.org/01company/10newsletter.html#anchor5.

    26. Kent A. Smith and Ed Sequeira, “Linking at the US National Library of Medicine,” Learned Publishing 14, no. 1 (January 2001): 27, [doi: 10.1087/09531510125100232].

    27. “Keyboarding Ceases as a Data Creation Method for MEDLINE® Citations,” NLM Technical Bulletin 339 (July–August 2004): e5, http://www.nlm.nih.gov/pubs/techbull/ja04/ja04_keyboard.html.

    28. Nancy Wachter, “Editing Tools that Help to Streamline the Publishing Process,” Science Editor 27, no. 5 (September–October 2004): 155, http://www.councilscienceeditors.org/members/securedDocuments/v27n5p155.pdf.

    29. Carol Anne Meyer, “CrossRef Annual Meeting: Building on Success” (Cambridge, Mass., November 1, 2006), summary (December 8, 2006), p. 5, http://www.crossref.org/10meetings/2006_mtg_summary.pdf.

    30. Aries Systems, “Aries Systems and Inera Collaborate to Reduce the Cost and Time of Manuscript Publication,” news release, November 17, 2004, http://www.editorialmanager.com/homepage/press-releases/200411.pdf.

    31. Thomson Scientific, “Thomson Scientific, ScholarOne and Blackwell Publishing Collaborate to Offer a Simple Manuscript Review Process,” news release, July 10, 2006, http://www.scholarone.com/06-07-10_press1.shtml.

    32. Thijs van den Broek, “Choosing an XML Editor,” 2004 (updated 2005), http://ahds.ac.uk/creating/information-papers/xml-editors/.

    33. Parity Computing, “Reference Processing and Linking: Improve margins by enhancing usability,” http://www.paritycomputing.com/web/solutions/reference_processing_linking.html.

    34. Wendell Piez, “XSLT for Quality Checking in a Publication Workflow,” XML Conference & Exposition 2003 Proceedings (December 7–12, 2003), http://www.idealliance.org/papers/dx_xml03/papers/04-04-02/04-04-02.html.

    35. Schwarzman et al., “XML-Centric Workflow Offers Benefits to Scholarly Publishers,” p. 8.

    36. Majied Robinson, “Beta, Collaboration and Workflow Tools: The STM Publishing Survival Kit,” EPS Insights, February 8, 2006.

    37. Cliff Morgan, “Metadata for STM Journal Publishers: A Review of the Current Scene,” Learned Publishing 17, no. 1 (January 2004): 31–37, [doi: 10.1087/095315104322710232].

    38. Council of Science Editors, Editorial Policies, “CSE Recommendations for Group-Author Articles in Scientific Journals and Bibliometric Databases,” January 28, 2006, http://www.councilscienceeditors.org/editorial_policies/groupauthorarticles.cfm.

    39. Schwarzman et al., “XML-Centric Workflow Offers Benefits to Scholarly Publishers,” p. 9.

    40. Timo Hannay, “Transforming Scientific Communication,” Towards 2020 Science (Microsoft, 2005), p. 19, http://research.microsoft.com/towards2020science/downloads/T2020S_ReportA4.pdf.