When it comes to digital content, access and preservation are two sides of the same coin. Without ongoing efforts to ensure preservation, access services falter and become uncertain; without clear evidence of access and supporting services, the mandate to preserve materials loses its force. At the HathiTrust Digital Library this understanding of digital preservation has influenced development of the repository and services as well as HathiTrust’s mission “to contribute to research, scholarship, and the common good by collaboratively collecting, organizing, preserving, communicating, and sharing the record of human knowledge.”[1] Preservation and collection of digital materials are tied to access to and use of the content. One key facet of HathiTrust’s access services is support for users who have disabilities (such as blindness, dyslexia, or physical or cognitive impairments) that prevent them from easily reading printed material, commonly referred to as “print disabilities.” Outlined below are the strategies that HathiTrust has implemented to ensure that users with print disabilities can use the website and access digital materials in the library corpus, activities driven by the mission of HathiTrust as well as by its short, albeit eventful, history.

What is HathiTrust, in brief?

HathiTrust is, first and foremost, a collaboration of libraries and research institutions that work together to preserve the digital content in the HathiTrust collection long into the future. Over 100 institutions, located primarily in the United States but also in Canada, Australia, Spain, and Lebanon, are part of the membership.[2] Under the aegis of the University of Michigan, members shape the mission and activities of HathiTrust by voting on board members, approving the budget, and participating in discussions around activities through annual member meetings and working groups. Members are also responsible for the development and management of the repository.[3] The main body of materials, over 13 million volumes, consists of scanned copies of printed books held in academic institutions around the world. We strive to provide some level of access to all materials, and 38%[4] of content is in the public domain either in the United States or worldwide and can be accessed in full over the Internet.

A mission to provide access

Since its founding in 2008, HathiTrust has incorporated access for individuals who have print disabilities into its mission and planning. In the first iteration of the functional objectives, created in 2008, providing access mechanisms for persons who have disabilities (which included designing an accessible interface and establishing access to copyrighted materials) was one of the top short-term priorities.[5] To emphasize its particular aim to increase access for all users, the organization’s bylaws, approved by the membership in 2012, include the statement that “HathiTrust is organized [...] to dramatically improve access to these materials in ways that, first and foremost, meet the needs of the co-owning institutions, with a particular emphasis on ensuring access for individuals who have print disabilities.”[6] Accessibility and providing services to users with print disabilities are thus written into the very mission and goals of HathiTrust.

HathiTrust operates within a long history of libraries providing services for users who are blind or have other print disabilities. In the United States, the Library of Congress established in 1931 a nationwide program, the National Library Service for the Blind and Physically Handicapped, which circulates materials for such users. Through this program, patrons have had access both to books in accessible formats and to the devices needed to read them. The UNESCO Manifesto for Public Libraries of 1994 stated, “The services of the public library are provided on the basis of equality of access for all, regardless of age, race, sex, religion, nationality, language or social status. Specific services and materials must be provided for those who cannot, for whatever reason, use the regular services and materials, for example linguistic minorities, people with disabilities or people in hospital or prison.”[7] Libraries and other authorized entities in the United States have legal protection through the Chafee Amendment, Section 121 of the Copyright Law of the United States (17 USC §121),[8] which permits “authorized entities” to reproduce and distribute materials in accessible formats for use by people with print disabilities. Libraries have long understood that their role in providing access to information includes serving all of the population. The ARL Code of Best Practices in Fair Use for Academic and Research Libraries, a set of guidelines created to help libraries understand what US copyright law allows them to do, notes that “making library materials accessible serves the goals of copyright, not to mention the goals of a just and inclusive society.”[9]

A mission tested in court

Because of HathiTrust’s stated mission to provide access to users with print disabilities, and because of the potential of the HathiTrust Digital Library to transform the ability of individuals with print disabilities to engage in their communities, the National Federation of the Blind (NFB) intervened as a defendant when the Authors Guild sued HathiTrust in 2011.[10] Testimony by Marc Maurer, president of the NFB, revealed that only a very limited number of books are available to individuals with print disabilities. The National Library Service for the Blind and Physically Handicapped offers only 52,000 titles and has the capability to produce only 2,000 new titles every year; Bookshare, a nonprofit which is also an authorized entity under the Chafee Amendment, provides access to tens of thousands of titles.[11] Students at large universities may also have access to some services designed for students with disabilities, such as university offices that will scan books on demand; however, as Georgina Kleege said in her testimony, such an office often “cannot timely process all requests for accessible print materials. Although the people in [such an] office are highly skilled and educated to provide training and counseling for students, they spend the majority of their time scanning books for print-disabled students.”[12]

The HathiTrust Digital Library contains more than 13 million books, the majority in accessible formats (see section “How users with print disabilities interact with open content”). Even users who are not affiliated with a member, and thus do not have access to copyrighted materials, still have access to over 5 million volumes that are open to the public worldwide. Perhaps for the first time, users with print disabilities have the option of accessing the wealth of human knowledge and heritage that people who do not have print disabilities have taken for granted.

In 2011, the Authors Guild and other associated parties filed a lawsuit against HathiTrust for copyright infringement, based on the practice of member libraries of scanning materials without seeking permission from rights holders. Among other things, they sought “the impoundment of all unauthorized digital copies within Defendants’ possession,”[13] meaning the seizure of all materials believed to be copyrighted. The Hon. Harold Baer, Jr., judge for the United States District Court for the Southern District of New York, granted summary judgment in favor of HathiTrust in 2012. His opinion focused on the transformative nature both of full-text search and of providing access to those with print disabilities: “Print-disabled individuals are not considered to be a significant market or potential market to publishers or authors....As a result, the provision of access for them was not the intended use of the original work (enjoyment and use by sighted persons) and this use is transformative.”[14]

The Authors Guild appealed the case, but the court again found in favor of HathiTrust. In the opinion, written by Judge Barrington D. Parker of the Second Circuit Court of Appeals, the Circuit Court disagreed that providing access to users with print disabilities was a transformative fair use; however, the court found that such access was a traditional fair use, which means that HathiTrust could continue to provide access to users with print disabilities without seeking authorization from copyright holders. The court also let stand Judge Baer’s determination that the University of Michigan was an “authorized entity” under Section 121 of the Copyright Act, giving HathiTrust two strong codified bases for continuing to provide accessibility services. Moreover, the court opinion quotes the Americans with Disabilities Act in stating that “our Nation’s proper goals regarding individuals with disabilities are to assure equality of opportunity, full participation, independent living, and economic self-sufficiency for such individuals.”[15]

Although the Authors Guild sought to challenge the legitimacy of the HathiTrust Digital Library, the lawsuit ended up providing clear legal support for HathiTrust’s mission and services. In light of this mandate, HathiTrust has continued to build upon existing services and to seek out new ways to serve this user group.

Strategies and methods for providing access

When designing the HathiTrust website and services for users with print disabilities, HathiTrust staff took several approaches to enabling access. The website serves as a portal to the corpus, much like the building, the shelves, and the infrastructure of a bricks-and-mortar library serve as access mechanisms to the books on the shelf. There are certain requirements that a bricks-and-mortar library must meet in order to ensure that all patrons are able to get around the building, and in the same way, the website must be designed and built in order to enable users who are accessing it via assistive technology to meet their goals and to complete their desired activities. In addition to accessible website design, HathiTrust staff had to ensure the accessibility of user interactions with the digital content in the corpus, taking into consideration the challenges presented by the formats of materials, and also create a method of providing privileged access to copyrighted materials for a small set of eligible users.

Designing the website for accessibility

The HathiTrust repository was initially conceived as a preservation environment; access systems were gradually built on top of the repository. A page-viewing application and a collection-building application were built first, in order to allow users to interact with the content. Then an information website was built as the first access point by which users could learn about HathiTrust. Over time, other applications were added as well, such as the catalog and full-text search. Our developers worked to make the tools accessible as they were built, but as systems became more complex, a more intentional approach was needed to ensure that the user experience was consistent across the various tools.

In 2012, a large project was undertaken to redesign the HathiTrust website and all of the various applications that together comprise the HathiTrust system. A significant part of the redesign process was designing for accessibility and ensuring that changes met a pre-determined set of standards. All developers were trained on accessibility requirements prior to development, and time for accessibility testing was built into the deployment timeline. As various parts of the HathiTrust system were redesigned and rebuilt from the code up, developers tested the tools against two accessibility guidelines:

  • W3C Web Content Accessibility Guidelines (WCAG) 2.0[16] at level A. WCAG 2.0 provides recommendations as well as success criteria built around four principles of accessible design, which hold that content must be perceivable, operable, understandable, and robust.
  • Section 508 of the United States Rehabilitation Act of 1973, as amended in 1998.[17] Section 508 provides basic accessibility requirements for a variety of electronic technologies, including websites.

A comprehensive “accessibility audit” was maintained throughout the process, providing an overall view of the areas where the website passed and areas that needed work. The new site launched in April 2013, and ongoing development of the site continues to be checked against the accessibility guidelines.

Accessibility and accessing the digital corpus

How users with print disabilities interact with open content

The HathiTrust Digital Library provides full and open access to around 5.5 million volumes (38% of the collection) and limited access to 8.3 million volumes. Books are open access for a variety of reasons.[18] For example, some materials are in the public domain because copyright has expired, and authors and rights holders have opened other materials to the public with an assortment of licenses. All users are able to directly access the open materials on the HathiTrust website.

When accessing HathiTrust materials, the majority of users automatically see images of the printed page. Users with print disabilities, however, often choose to access the plain-text version of the book’s text rather than the page images; a user who has disabled styles in their browser will view a text-only representation of the page as well. This plain-text version has numerous benefits. A user is able to change the way the browser displays the plain text, for example, by switching to a high-contrast color scheme or by increasing the font size. Some users employ screen readers (software that detects text in a browser and provides it to a user in a different way, such as text-to-speech or a Braille display) to navigate and read the content; a screen reader would skip over the images but would be able to access and interpret the plain text.

For nearly all items in HathiTrust, the plain-text version of the books is created from the scanned images of a print volume through optical character recognition (OCR). This process searches within the scanned image of a page, recognizes the text, and outputs it in a format that is machine-readable. Errors that negatively impact a user’s experience can occur during the OCR process. For example, the quality of the original source material may be poor (e.g., faded or obscured text) or otherwise difficult for current OCR technologies (e.g., text in fonts that are difficult to decipher or represented in tabular format). The particularities of printed text may cause difficulties, as in the case of words that are split in two and hyphenated for printing purposes. OCR engines themselves are not 100% accurate. In some exceptional cases, OCRed text may not even exist for materials in HathiTrust, as with handwritten manuscripts or materials in non-Roman alphabets.
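One of the difficulties named above, words split and hyphenated at a line break, can be illustrated with a small post-processing sketch. This is an assumption-laden illustration, not a description of HathiTrust's OCR pipeline: the function name `rejoin_hyphenated` and the heuristic itself (only rejoin when the next line begins with a lowercase letter) are hypothetical choices made for the example.

```python
import re

# Heuristic (an assumption for this sketch): a hyphen at the very end of a
# line, followed by a lowercase continuation on the next line, is treated
# as a print-layout hyphen rather than part of the word.
_LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\n[ \t]*([a-z]+)")


def rejoin_hyphenated(ocr_text: str) -> str:
    """Rejoin words that were split across a line break in OCR output."""
    # The two halves are glued together and the line break between them
    # is dropped, so a screen reader meets one whole word.
    return _LINE_BREAK_HYPHEN.sub(r"\1\2", ocr_text)


if __name__ == "__main__":
    page = "digital preser-\nvation of books"
    print(rejoin_hyphenated(page))
```

A heuristic like this is deliberately conservative and still imperfect: a genuine compound such as "well-known" that happens to break at the hyphen would be joined incorrectly, which is one reason automated correction can only go so far.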

OCR errors cause particular difficulty for users of screen readers, which may mispronounce a misrecognized word and render it unidentifiable to the user. Users can alert HathiTrust staff to significant errors in OCR quality by submitting a feedback form (links to the form are in the header and the footer of the website). User support staff work with content providers and in many cases are able to obtain corrected or higher quality versions of materials.

Another tool that has been included in the design of the page-viewing application is the ability to use access keys when navigating a book. Access keys are keyboard shortcuts that allow a user to navigate without the use of a mouse. Users can skip the site navigation, switch to the OCRed view of the book, flip pages, and go to the beginning or end of a book.[19]

How users with print disabilities access copyrighted content

Access to copyrighted content is restricted to designated individuals at member institutions who serve as proxies for eligible users. The proxies access and download content on behalf of their users with print disabilities and then provide the downloaded material to their users. Such access provides additional security to ensure that copyrighted works are accessed only by eligible users.

An eligible user with a print disability (i.e., one who has been certified at his or her institution as having a print disability of such severity that such access is appropriate) accesses copyrighted materials by searching for books of interest in HathiTrust and then sending a request for the volume to the designated proxy at their institution. Because users with print disabilities are only able to access content that is currently or has been previously held at their institution, before beginning their search, it is best that they log in so that the system knows their affiliation. They can then limit their search results to materials held at their institution by selecting the appropriate checkbox in the full-text search advanced search screen. Having received a request from the student or faculty member, the proxy can then log in, download the requested materials, and share them with the student or faculty member. For many universities that provide services to students and faculty with disabilities, this can drastically reduce the staff time spent on manually scanning books cover-to-cover.

Several pieces of infrastructure must be in place in order to provide users with print disabilities with access to copyrighted materials. First and foremost, institutions must have set up authentication via Shibboleth, the mechanism used by HathiTrust for authentication, in order to allow their users to log into HathiTrust services with their institutional IDs. As part of the partnership process, partners are required to submit data about their print holdings (i.e., the books they currently hold and have held in the past in their collections). An official proxy must be selected by the institution and must be set up for privileged access to materials. HathiTrust uses a registration process and additional verification for these proxies, in order to increase security around access. In addition, institutions must have an existing process in place for determining which users are eligible for access to copyrighted materials.
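The conditions described in this section can be summarized as a single access check. The sketch below is only an illustration of that logic under stated assumptions; the `Institution` class, its field names, and `proxy_may_download` are hypothetical and do not reflect HathiTrust's actual systems or data model.

```python
from dataclasses import dataclass, field


@dataclass
class Institution:
    """Hypothetical record of what a member institution has set up."""
    name: str
    has_shibboleth: bool  # institutional login is configured
    holdings: set[str] = field(default_factory=set)            # volumes held now or in the past
    registered_proxies: set[str] = field(default_factory=set)  # verified proxy accounts
    certified_users: set[str] = field(default_factory=set)     # users certified as eligible


def proxy_may_download(inst: Institution, proxy_id: str,
                       user_id: str, volume_id: str) -> bool:
    """Return True only if every prerequisite named in the text is met."""
    return (
        inst.has_shibboleth                       # authentication is in place
        and proxy_id in inst.registered_proxies   # requester is an official proxy
        and user_id in inst.certified_users       # end user is certified eligible
        and volume_id in inst.holdings            # volume is or was held locally
    )


if __name__ == "__main__":
    example = Institution(
        name="Example University",
        has_shibboleth=True,
        holdings={"vol-001"},
        registered_proxies={"proxy-a"},
        certified_users={"user-1"},
    )
    print(proxy_may_download(example, "proxy-a", "user-1", "vol-001"))
```

Expressing the check as a conjunction makes the point of the paragraph above concrete: if any one piece of infrastructure is missing, the request cannot be fulfilled.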

Beyond accessibility

The previous section describes the work that HathiTrust has done in providing access to materials and ensuring the accessibility of its content. As an organization whose services are accessed primarily on the Internet, HathiTrust follows basic accessibility guidelines. In addition, a number of activities go beyond basic accessibility in order to expand the usefulness of our services. Detailed next are the activities that HathiTrust has undertaken and can continue to focus on. Because the Internet is a constantly changing environment and because HathiTrust serves such a wide range of users, there is always more that could be done.

HathiTrust has done work in the following areas and will continue to pursue these activities. Any academic or research organization interested in going beyond the basics of accessibility could do the same.

  • Continue and broaden user testing. Because there are many kinds of disabilities that prevent a user from being able to access print media, in order to fully understand how users with print disabilities access the website, staff should work with a broad range of users. Finding such users may be difficult, but academic institutions can begin by communicating with the services offices for students who have disabilities on their campuses.
  • Learn about and implement web accessibility guidelines. Technology changes rapidly, and developers must constantly assess how accessible new technologies are.
  • Stay current with web accessibility guidelines and strive to implement guidelines at more advanced levels. WCAG 2.0 defines three levels of success criteria. Level A is the minimum requirement for making a website usable; Levels AA and AAA are intended to increase the ease with which users can retrieve information and navigate the website.
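
As one concrete illustration of how the more advanced levels differ, WCAG 2.0 sets text contrast thresholds of 4.5:1 at Level AA and 7:1 at Level AAA for normal-size text. The sketch below computes the contrast ratio using the relative luminance formula from the WCAG 2.0 definitions; the function names are hypothetical, and the example is not a description of HathiTrust's own testing process.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.0 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def highest_level_met(ratio: float) -> str:
    """Highest WCAG 2.0 contrast level met for normal-size text."""
    if ratio >= 7.0:
        return "AAA"
    if ratio >= 4.5:
        return "AA"
    return "below AA"


if __name__ == "__main__":
    ratio = contrast_ratio((0, 0, 0), (255, 255, 255))  # black text on white
    print(round(ratio, 2), highest_level_met(ratio))
```

Contrast is only one of many success criteria, but a check like this is easy to automate and shows why moving from one level to the next can require real design changes rather than small adjustments.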

Based on user feedback and conversations with partners, HathiTrust has identified the following issues that, when addressed, would expand the impact and usefulness of our services.

  • Improve OCRed text. This is a complex problem, in part because of the immense number of volumes in HathiTrust, and because automated techniques can only do so much. Conversations are ongoing in the digital humanities field about this problem, which impacts many other organizations and projects.
  • Increase collaboration with nationwide organizations that represent users with print disabilities in order to provide access to these users. HathiTrust’s status as an institution under Section 121 increases possibilities for ways that access could be expanded, and collaborations are key to reaching relevant populations.
  • Increase communication and outreach to communities of individuals with print disabilities in order to ensure they are aware of the opportunities offered through HathiTrust, how to use our services, and what is available. This also ties into the point above regarding doing user testing, as increased interaction with a community provides increased opportunity for valuable data about user needs and habits.

Through legal protections and a history of action, libraries have long engaged in providing services to users who have print disabilities. Although a relative newcomer to the field, HathiTrust has been able to expand traditional library services in a relatively short time by providing access to a large corpus and services for users with print disabilities. Ultimately, our mission is to make our materials as useful as possible to as many people as possible for as long as possible; through our collaborations with institutions that represent a large variety of users and that bring varied expertise in many areas, HathiTrust has the opportunity to achieve this goal.

Angelina Zaytsev is a project librarian for HathiTrust, where she manages a number of activities related to partner management and user support.


