Hyperlinks are uniquely bidirectional. Links that point forward to other pages are common knowledge, but only in the last several years have backlinks, or links that point to a resource rather than from it, become a subject of study and interest. (Backlinks are defined here as links from other pages to webpage or URL x, rather than the links contained in, or pointing away from, webpage or URL x.) As Soumen Chakrabarti and his colleagues point out, citations are a “fundamental part” of academic writing because they emphasize the idea that all current knowledge builds on prior knowledge (Chakrabarti et al. 1999). “By following citations,” the paper notes, “the reader is able to trace backward in time through the evolution of ideas that leads to the current work.” Until relatively recently, tracing citations was a unidirectional process: readers could find out where an idea came from, but not much more. Backlinks transform the process into a two-way street, showing not only where an idea comes from but also where it is going: what level of attention or interest a resource, an article, or a paper is receiving or generating; what influence it is having on those who encounter it; what intellectual fruit the seed of the idea is bearing; what intellectual ferment the idea is fostering.

For a closer look at backlinks, this article focuses on the Journal of Electronic Publishing (JEP). Started by the University of Michigan Press in 1995 and now published by the University of Michigan’s Scholarly Publishing Office, JEP is a regular collection of online articles and essays examining the world of electronic publishing and its scholarly context. The publisher’s note for JEP describes it as “a forum for research and discussion about contemporary publishing practices, and the impact of those practices upon users.”

JEP featured a total of 197 unique articles between 1995 and 2002. Of those articles, almost exactly two-thirds (a total of 130) received at least one backlink. A total of 32 articles (16% of the 197) received more than five backlinks. What follows is a brief analysis of (a) the backlinks to one of the most heavily linked of those 32 articles and (b) the backlinks to the journal as a whole. This article concludes with some general observations about the implications of backlinks as a means of increasing the impact of electronic scholarly journals.

Theoretical Background

According to the deep Web research firm BrightPlanet, the World Wide Web (both surface content and “deep web” content) contained about 7,500 terabytes of data as of 2000. Another estimate, by the School of Information Management and Systems at UC Berkeley, places the surface content of the World Wide Web at 170 terabytes as of 2002. The increasingly central role of the search engine as an essential component of Web browsing has helped transform the retrieval of specialized information into an everyday activity, even a “consumer commodity” (Mauldin 1997). At one point in the development of the Web, clustering (or, as Jon Kleinberg describes it, breaking down explicit nodes of information into cohesive and easily understandable subsets) and relevance were prime criteria for narrowing down the vast amount of information available on the Web. The explosion of the amount of information on the Web, and increased consumer demand for access to that information, have made it more difficult for users to filter and narrow down the information they are seeking based solely on those two criteria. Quality and authoritativeness have become ever more crucial determinants of a Web page’s importance (Kleinberg 1999).

Google, which employs an algorithm called PageRank as what it calls “the basis for all our Web search tools,” refreshes its Web-crawl index once a month and is currently crawling eight billion Web pages. The PageRank algorithm is based not only on the importance, or as Google terms it, the “popularity” of a Web page, but more significantly, on the importance of the Web pages that link to it (the selected page’s backlinks). As Sergey Brin and Larry Page have described their algorithm, “The probability that a random surfer visits a page is its PageRank,” with “random” meaning not hit-and-miss surfing but the activity of a surfer who starts on a page, keeps clicking on links, never hits the browser’s “back” button, eventually gets bored, and moves to another random page (Brin and Page 1998). A backlink's value lies in the fact that each backlink “represents an endorsement of the target page by the author of the source page” (Thelwall 2004). The theory that animates the idea of the “random surfer” is that the probability that a page chosen at random will be visited is increased if other pages have already linked to it. According to Brin and Page, “Intuitively, pages that are well cited from many places around the Web are worth looking at” (Brin and Page 1998).
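The random-surfer idea described above can be made concrete with a small sketch. The Python below is a minimal illustration, not Google's implementation: it runs a fixed number of power-iteration steps over a toy link graph, using the normalized form of the calculation in which all ranks sum to 1. The function name pagerank, the four-page graph, and the iteration count are assumptions made for illustration; the damping factor of 0.85 is the value Brin and Page (1998) report usually using.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start the surfer with an equal chance of being on any page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer gets bored
        # and jumps to a page chosen uniformly at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link,
                # so the page's rank is split among its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links (assumed handling):
                # the surfer jumps to a random page instead.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a, b, and d all link to c; c links to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_ranks = pagerank(toy_links)
```

In this toy graph, page c, which has three backlinks, ends up with the highest rank. Page a has only one backlink, but because that link comes from the highly ranked c, a outranks b and d, which have none. This is the sense in which the importance of the linking page, and not only the number of backlinks, feeds into the score.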

A key difference between traditional citations and backlinks is that traditional citations almost always refer to sources created at an earlier point in time, whereas backlinks, which can be added to a Web page after its creation, in effect point both backward and “forward” in space and time (Bar-Ilan 2002). Search engines like Google use algorithms such as PageRank, along with other metrics, to give more “weight” to pages that are more frequently viewed and linked to, in order to address the problem of trustworthiness. Even Google, with its index of eight billion pages and growing, does not retrieve all, or even nearly all, of the linking pages that it indexes. Part of the rationale behind a “weighting” algorithm such as Google’s, however, is to give higher placement to Web pages and resources that are most frequently linked or pointed to by other pages, on the assumption that the more frequently a page is linked or pointed to, the more of an “authority” (Ding et al. 2004) it is on the subject matter it covers.

This rationale dovetails with the observation made by scientists that citations and links across the World Wide Web tend to follow a power law distribution, meaning basically that a small number of highly-linked pages have a greater impact than a much larger number of much-less-linked pages. This power law distribution tends to break down, however, in academic and scientific pages, where the distribution of links is much more of a mix of “preferential and uniform attachment,” as opposed to a popularity contest (Pennock et al. 2002). This observed phenomenon makes it easier to study the impact of backlinks in academic and scientific contexts.

Electronic journals, while not yet having supplanted the established network of formal journals with a historic presence in paper and a tradition of citation sourcing, have become increasingly critical as a scholarly medium (Sweeney 2000). Researchers increasingly rely upon Web publication to disseminate, share, and transfer information and findings (Fong et al. 2002).

A 2003 study by Liwen Vaughan and Mike Thelwall listed three major factors affecting a scholarly journal’s impact: content, age, and scholarly discipline. The more content a journal Web site contains, the longer it has had a stable Web address, and the more the content is perceived as adhering to the rigors of the discipline to which it belongs, the more impact the Web site will have (Vaughan and Thelwall 2003). Moreover, the hard distinction between citations and backlinks is not necessarily useful in gauging an electronic journal’s impact: “The citation may measure the impact within the immediate research community while [the] Web link may measure the impact to a wider constituency including practitioners and students” (Vaughan and Thelwall 2003).

As Aldrin Sweeney pointed out in the December 2000 issue of JEP, the traditional view has been that citations are an ironclad method of proving that an article is being read and paid attention to, given the wide audience of cited journals and the intense competition for entry to the market that those journals comprise. Yet citations, tied as they are to a paper-based system of incentives and rewards, are increasingly being complemented by electronic journals and backlinks (Sweeney 2000). Furthermore, a universe of open-access electronic journals and online scholarly publication facilitates access to scholarly information, maximizes the impact of that information, and enhances the progress of intellectual inquiry (Lawrence 2001).

Yet in a one-year longitudinal study, Google failed to return between 48% and 70% of the links to the home page for the electronic-only journal Cybermetrics each of the four times it was queried (Bar-Ilan 2002). Another study by the same researcher noted the increasing prevalence of manipulation of backlinks by bloggers and other strongly “motivated users” (as opposed to casual or random Web surfers) to influence a page’s rank on Google by artificially increasing or decreasing the number of backlinks, a phenomenon that would seem to militate against the reliability of link analysis as a tool for predicting the importance of a given Web page (Bar-Ilan 2006). While search engines provide the grease for the electronic citation wheel, their effect is not yet all that a scholar might wish. Nevertheless, along with the growing use of Google Scholar as a research tool, search engines are an important resource in tracking the electronic dissemination of information, and well worth using, even with a skeptical eye.

Methodology

Using Google to analyze links to JEP has, as the caveats above imply, considerable methodological limitations. It might be possible to introduce a more rigorous protocol for link counting. One way might be to develop an independent Web-crawling mechanism. Another might be to institute an alternative document model that groups Web pages together heuristically and accounts for specific classes of link anomalies, for example by assigning pages that share certain characteristics (such as domain, directory, or institutional origin) to the same document. Thomson ISI, which publishes a commonly accepted measure of scholarly journal impact, the JIF (journal impact factor), does not list JEP in its source title index, so the JIF was not employed in this research. However, as Alastair Smith’s research indicates, the source title index does not reveal how often a journal is cited by the works that are listed in the source title index (Smith 2005). Thus, an electronic journal may have significant impact and not appear in the ISI source title index or be listed with a JIF coefficient.

This article looks at each of the 197 unique URLs in JEP for the period 1995-2002. These included not only featured articles but also regular features such as “Q.A.,” “Editor’s Gloss,” and the letters to the editor section, which was published cumulatively at one page location. I used the Google link:URL search for each article URL, which returns links only to the exact URL specified. For example, typing the string link:http://www.press.umich.edu/jep/07-01/bergman.html returns only backlinks to the URL of the Michael Bergman article specified.
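The query construction described above is simple enough to sketch. The Python below only builds the link: query strings for a list of article URLs, dropping duplicates so that each unique URL is queried once; it does not contact any search engine. The function names link_query and build_queries are hypothetical, and the two sample URLs are JEP article URLs that appear elsewhere in this article.

```python
def link_query(url):
    """Return the link: query string for one exact URL."""
    return "link:" + url.strip()


def build_queries(urls):
    """Build one link: query per unique URL, preserving input order."""
    seen = set()
    queries = []
    for url in urls:
        cleaned = url.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            queries.append(link_query(cleaned))
    return queries


# Sample input with one deliberate duplicate.
sample_urls = [
    "http://www.press.umich.edu/jep/07-01/bergman.html",
    "http://www.press.umich.edu/jep/07-02/bailey.html",
    "http://www.press.umich.edu/jep/07-01/bergman.html",
]
sample_queries = build_queries(sample_urls)
```

Because the search matches only the exact URL given, deduplicating on the exact string (rather than on a normalized form) mirrors the behavior described in the text.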

Results

Backlinks to JEP as a whole tend to be from lists of online resources (both academic and non-academic), from blogs on electronic publishing and the Internet, and from library and information science-related resources. The rank-order table below shows that of the ten most heavily weighted Google results for backlinks to the journal as a whole, most were in the academic category (including three from university Web sites, one from the Web site of the director of a university department, and one from a national engineering laboratory). One link came, somewhat surprisingly, from a personal blog that seemed primarily political in nature.

HIGHEST-WEIGHTED BACKLINKS TO JEP AS A WHOLE

| Source | URL | Nature of link |
| --- | --- | --- |
| CSA Illumina/PAIS International Electronic Journals List | http://www.csa.com/factsheets/supplements/paisejournals.php | Academic/scientific (Free online journal list) |
| UNC/Chapel Hill Computer Science Department | http://www.cs.unc.edu/PubsOffice/Resources.html | Academic/scientific (Printing, Publishing and Web Resources list on university Web site) |
| Prehled URL zajímavých pro chemiky | http://www.scitech.cz/stlinky.htm | Personal/scientific (“Internet URL connections, possibly interesting to chemists, compiled originally by O. Tokar, unfortunately for some, in Czech”) |
| John Unsworth curriculum vita | http://www.iath.virginia.edu/~jmu2m/really.brief.vita.html | Personal/academic (Links to JEP from CV of member of editorial board) |
| Links to Online Journals: Computers & Composition/Rhetoric | http://www.readerly-writerlytexts.com/Links.htm | Academic/scientific (List of online journal links from online rhetorical/pedagogical theory journal Readerly/Writerly Texts) |
| Flagrancy to Reason | http://www.flagrancy.net/?begin=30 | Personal (Part of “Hangdogs and Bloodhounds” list on blog of systems administrator, developer, and political ranter Joshua Buermann) |
| Webgrafia Università degli Studi di Napoli Federico II | http://www.bfs.unina.it/firstlevel/portal/internet.htm | Academic/scientific (Internet-related links list on university Web site) |
| Revistas Electrónicas de Acesso Livre (Instituto Nacional de Engenharia, Tecnologia e Inovação) | http://193.136.150.133/ficheiros/1083338521_revistasacessolivre.htm | Academic/scientific (Free online journal links list on Portuguese state engineering laboratory Web site) |
| University of KwaZulu-Natal Library - Pietermaritzburg Campus Computer Science E-journals List | http://www.library.unp.ac.za/e-journalsComputerScience.htm | Academic/scientific (Computer science and free online journal links list on university Web site) |
| Memory Retention Studies (Jón Erlendsson) | http://www.hi.is/~joner/eaps/wh_memr.htm | Academic/scientific (Link from Web site of director of Scientific and Technical Information Services Department at University of Iceland) |

The table below lists in rank order (according to number of unique and verifiable backlinks) the ten most heavily linked JEP articles during the 1995-2002 period. Of these articles, four of the top five were from 2001 and 2002, seeming to indicate a pattern of interest in the journal’s most recent content, with a possible trickle-down effect from that interest to content in earlier volumes. In addition, a pattern can be seen in the type of content most heavily linked: Articles bearing closely on subject matter in heavy currency among an audience interested in developments related to scholarly and electronic publishing, such as peer review of electronic papers, copyright, and the transition from text to hypertext, attracted significant numbers of backlinks. One reprint article, Vannevar Bush’s “As We May Think,” also attracted numerous links, most likely because of ongoing interest in it as a prophetic essay in an age of electronic information delivery.

TEN MOST HEAVILY LINKED JEP ARTICLES

| Author | Title | URL | Unique links | Date |
| --- | --- | --- | --- | --- |
| Charles Bailey | “The Scholarly Electronic Publishing Bibliography” | http://www.press.umich.edu/jep/07-02/bailey.html | 27 | December 2001 |
| Michael Bergman | “The Deep Web: Surfacing Hidden Value” | http://www.press.umich.edu/jep/07-01/bergman.html | 21 | August 2001 |
| Rob Kling et al. | “Locally Controlled Scholarly Publishing via the Internet: The Guild Model” | http://www.press.umich.edu/jep/08-01/kling.html | 17 | August 2002 |
| William Arms | “What Are the Alternatives to Peer Review? Quality Control in Scholarly Publishing On The Web” | http://www.press.umich.edu/jep/08-01/arms.html | 16 | August 2002 |
| Vannevar Bush | “As We May Think” | http://www.press.umich.edu/jep/works/vbush/vbush.shtml | 14 | January 1995 |
| William Strong | “Copyright in the New World of Electronic Publishing” | http://www.press.umich.edu/jep/works/strong.copyright.html | 13 | January 1995 |
| Lloyd Davidson and Kimberly Douglas | “Digital Object Identifiers: Promise and Problems for Scholarly Publishing” | http://www.press.umich.edu/jep/04-02/davidson.html | 12 | December 1998 |
| Hill Rosenblatt | “The Digital Object Identifier: Solving the Dilemma of Copyright Protection Online” | http://www.press.umich.edu/jep/03-02/doi.html | 12 | December 1997 |
| Sharmilla Ferris | “Writing Electronically: The Effects of Computers on Traditional Writing” | http://www.press.umich.edu/jep/08-01/ferris.html | 11 | August 2002 |
| Mindy McAdams and Stephanie Berger | “Hypertext” | http://www.press.umich.edu/jep/06-03/McAdams/pages/ | 11 | March 2001 |

Many of the links to JEP articles are from the Scholarly Electronic Publishing Weblog [http://info4.lib.uh.edu/sepb/sepw.htm] and the Scholarly Electronic Publishing Bibliography [http://info4.lib.uh.edu/sepb/sepb.htm], updated by Charles Bailey, a JEP contributor. The most heavily linked article (with 27 unique links) is Bailey’s December 2001 article on the Scholarly Electronic Publishing Bibliography. The second most heavily linked article (with 21 links, as listed in the table above) is Bergman’s article on the deep Web.

For the Bergman article, Google indicates that it actually indexes 27 backlinks, but only 21 of those links are successfully retrieved when queried, with the remainder either dead or inaccessible from the surface Web. Of those 21, a total of 20 are unique links. Of the 20, one is from JEP itself. Of the remaining links, all but one of the ten most heavily-weighted are in the academic category, including links from conference blogs, other electronic publishing journals, and a digital library proposal.

HIGHEST-WEIGHTED FUNCTIONING BACKLINKS TO BERGMAN ARTICLE

| Source | URL | Nature of link |
| --- | --- | --- |
| Russian Digital Libraries Journal | http://www.elbib.ru/index.phtml?page=elbib/eng/journal/2003/part3/OLB | Academic/scientific (Citation in April 2003 D-Lib Magazine article, “Trends in the Evolution of the Public Web”) |
| DPC/PADI What's New in Digital Preservation | http://www.dpconline.org/docs/DigitalPreservation-WhatsNew-Issue1.pdf | Academic/scientific (Recent publication citation in May 2002 digital preservation newsletter) |
| Recherchen Blog | http://recherchenblog.ch/index.php/C8/P90/ | Academic/scientific (Swiss search-engine and Internet blog entry dated 5 November 2004 on “Deep Web reach” cites and links to Bergman) |
| Content Conference 2001 Presentations and Reports | http://www.contentsummit.com/conference/eventdetails01_day3.php | Academic/scientific (Citation from November 2001 conference blog entry titled “Everything is text: the search engine”) |
| Enhancing Infrastructure for OAI: A Proposal to the Andrew W. Mellon Foundation | http://128.82.7.230/grid/ProposalAM.doc | Academic/scientific (Citation in 2004 Open Archive Initiative proposal by Digital Library Research Group at Old Dominion University) |
| Blogika khod myslei | http://blog.seotext.ru/internet/129/ | Academic/scientific (Russian search-engine and Internet blog entry dated 7 April 2005 on “A few impressive figures on the World Wide Web” cites and links to Bergman) |
| LLRX “Notes from the Technology Trenches” | http://www.llrx.com/columns/notes47.htm | Business/technical (15 November 2001 Cindy Curling column on “The Continued Need for Web Training” cites Bergman) |
| D-Lib Magazine | http://www.dlib.org/dlib/january02/kenney/kenney-notes.html | Academic/scientific (January 2002 “Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism” cites Bergman) |
| Barista: Heartstarters for the Hungry Mind | http://barista.media2.org/?p=772 | Personal/academic (3 July 2004 entry in Australian current events/political blog titled “Invisible web grows” links to Bergman) |
| A Note on Ausarbeitung Block 08 | http://www2.iicm.edu/0x811bc833_0x00b182a2 | Academic/scientific (Link from the Web site of the Institute for Information Systems and Computer Media at Graz University of Technology cites Bergman) |
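The Bergman counts reported above (27 backlinks indexed, 21 retrieved, 20 unique, one of them from JEP itself) can be worked through as a short calculation. The Python below is only a sketch of that arithmetic; the function name backlink_funnel and the dictionary keys are hypothetical.

```python
def backlink_funnel(indexed, retrieved, unique, self_links):
    """Summarize how reported backlinks narrow down to usable ones."""
    return {
        # Share of the indexed backlinks that could actually be retrieved.
        "retrieval_rate": retrieved / indexed,
        # Backlinks reported as indexed but not retrievable.
        "not_retrieved": indexed - retrieved,
        # Retrieved backlinks that merely duplicate another one.
        "duplicates": retrieved - unique,
        # Unique backlinks from sources other than the journal itself.
        "external_unique": unique - self_links,
    }


# Counts for the Bergman article as stated in the text.
bergman = backlink_funnel(indexed=27, retrieved=21, unique=20, self_links=1)
```

On these figures, Google returned about 78% of the backlinks it reported indexing, and 19 unique backlinks came from outside JEP.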

Conclusion

Backlinks to the most heavily linked JEP articles tend to be academic/scientific in nature: links from university Web sites, links from specialty conferences, and links from blogs dealing with the Internet or information science issues. This phenomenon puts the journal squarely in the mold of other electronic scholarly journals and inter-university links, which, for the most part, skew toward the informal exchange of intellectual information and inquiry, the incubation of a new theory (Wilkinson 2003), and the capture of what Blaise Cronin terms “liminal expressions of peer esteem, influence, and approbation” (Cronin 2001). Backlinks are to electronic scholarly publishing what the citation index is to traditional scholarly publishing, and algorithms like PageRank have made it easier to conduct analysis that builds a network of electronic citations that will, with time, be analogous to the network of citations in the traditional scholarly realm. In time, such automatic tools could be a boon for scholars ranking articles that have Web-only presences, and could help build cases for tenure and promotion.

For more information about PageRank, see http://www.google.com/technology/ and http://pr.efactory.de/e-pagerank-algorithm.shtml .

Sources

Bar-Ilan, J. (2002). How much information do search engines disclose on the links to a web page? A longitudinal case study of the ‘Cybermetrics’ home page. Journal of Information Science 28 (6): 455-466. [doi: 10.1177/016555150202800602]

Bar-Ilan, J. (2006). Weblinks and search engine ranking: The case of Google and the query “Jew.” Journal of the American Society for Information Science and Technology 57 (12): 1581-1589. [doi: 10.1002/asi.20404]

Brin, S. and L. Page. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30 (1-7): 107-117. Text at http://www-db.stanford.edu/~backrub/google.html

Chakrabarti, S., B. Dom, S.R. Kumar, et al. (1999). Mining the Web's link structure. Computer 32: 60-67. [doi: 10.1109/2.781636]

Cronin, B. (2001). Bibliometrics and beyond: Some thoughts on web-based citation analysis. Journal of Information Science 27 (1): 1-7. [doi: 10.1177/016555150102700101]

Ding, C.H.Q., H. Zha, et al. (2004). Link analysis: Hubs and authorities on the World Wide Web. Society for Industrial and Applied Mathematics Review 46 (2): 256-268.

Fong, A.C.M., S.C. Hui, and H.L. Vu. (2002). Effective techniques for automatic extraction of Web publications. Online Information Review 26 (1): 4-18. [doi: 10.1108/14684520210418347]

Google Information for Webmasters (2006). http://www.google.com/webmasters/2.html.

Harries, G. et al. (2004). Hyperlinks as a data source for science mapping. Journal of Information Science 30 (5): 436-447. [doi: 10.1177/0165551504046736]

Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (5): 604-632. [doi: 10.1145/324133.324140]

Lawrence, S. (2001, May 31). Free online availability substantially increases a paper’s impact. Nature 411.

Mauldin, M. (1997, January-February). Lycos: Design choices in an Internet search service. IEEE Expert 12 (1): 8-11. [doi: 10.1109/64.577466]

Pennock, D., G. Flake, et al. (2002). Winners don’t take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences of the United States of America 99 (8): 5207-5211.

Smith, A. (2005). Citations and links as a measure of effectiveness of online LIS journals. IFLA Journal 31 (1): 76-84. [doi: 10.1177/0340035205052651]

Sweeney, A. (2000). Tenure and promotion: Should you publish in electronic journals? Journal of Electronic Publishing 6 (2). [doi: 10.3998/3336451.0006.201]

Thelwall, M. (2004). Link analysis: An information science approach. Amsterdam: Elsevier Academic Press.

Vaughan, L. and M. Thelwall. (2003). Scholarly use of the Web: What are the key inducers of links to journal Web sites? Journal of the American Society for Information Science and Technology 54 (1): 29. [doi: 10.1002/asi.10184]

Wilkinson, D., et al. (2003). Motivations for academic web site interlinking: Evidence for the Web as a novel source of information on informal scholarly communication. Journal of Information Science 29 (1): 49-56. [doi: 10.1177/016555150302900105]