Chapter 6. Digital Databases

While the NLS relied on reading aloud and translation into braille, initiatives around the turn of the twenty-first century began to use digitization technology to convert print into e-text or digital content, a new kind of accessible format. Although these projects were not primarily concerned with the needs of readers with disabilities, they relied on a technology originally developed for accessibility (optical character recognition) and are now positioned to benefit readers with print disabilities more than any other conversion project. Today, there are various databases that specialize in accessible electronic formats, supplied directly by the publisher or converted from print. To protect copyright and the market value of published content, these databases have strict user permissions and are not available to everyone, not even to everyone with print disabilities. Yet for individuals with print disabilities who are affiliated with U.S. institutions of higher education, millions of accessible publications are now immediately available through databases like HathiTrust, Bookshare, and others, without the need for local conversion.

Early Digitization Efforts

Optical character recognition (OCR), used in today’s technologies of scanning and document digitization, has its own history in accessibility technology (Mills, 2015). One of the first machines capable of “reading” the light patterns of printed characters and converting them into usable data was developed between 1910 and 1915 expressly as an aid for the blind. The Optophone, developed by the British physicist Fournier d’Albe, was a handheld optical scanner that emitted a distinct audio tone for each printed character it moved over, so that “after learning the character equivalent for the various tones, visually impaired persons were able to ‘read’ and interpret the printed material” (Schantz, 1982, p. 3). Later developments in OCR (multifont OCR and the coupling of OCR with computer-synthesized speech) were likewise driven by accessibility applications for blind users. In the 1970s, these capabilities emerged as part of research and development for the Kurzweil Reading Machine, a “computer for blind and print-impaired individuals [that] converts printed materials directly into synthesized speech” and allows users to read “with privacy and independence” (Hoff, 2008).

By 1978, the first commercial OCR scanner was marketed broadly and quickly adopted by business, government, and especially “organizations that were paper-laden and had fairly predictable workflows” (Centivany, 2016, p. 26). From the 1980s to the 1990s, major scanning projects for preservation and digital document delivery began at the National Library of Medicine, the National Archives, Cornell University, and the University of Michigan (Centivany, 2016, pp. 26–28). These early digitization projects, focused on accessibility broadly speaking but not on accessibility for people with disabilities, created some of the first scholarly digital repositories of converted electronic texts. They laid the groundwork for later digitization but were quickly surpassed by the turn-of-the-century Google project to scan the world’s books.

The period from 1980 onward was also the period in which OCR technologies expanded to include non-Roman alphabets. Given its initial development on writing systems in which each character is bordered by white space, OCR was unprepared to process scripts like Devanagari (the script for Hindi as well as Sanskrit, Marathi, and Nepali) and Arabic (the script for the Arabic language as well as Urdu, Farsi, Chawi, Kardi) (Yadav, Sánchez-Cuadrado & Morato, 2013; Alkhateeba, Doush & Albsoul, 2017). Today, many languages are scannable with readily available OCR systems, although accuracy rates vary by writing system and OCR software, and additional work is necessary to improve accuracy rates and develop methods sensitive to multilingual and multiscript cultures (Yadav, Sánchez-Cuadrado & Morato, 2013; Risam, 2015; Alkhateeba, Doush & Albsoul, 2017).


HathiTrust

HathiTrust is an online repository of over sixteen million digitized volumes from the holdings of major research libraries, managed under a shared governance structure representing the partner institutions. Its early history began in the Google “mass digitization project,” an initiative to scan all the world’s books. Following on the heels of several localized, some failed, and some burgeoning large-scale book-scanning projects, Google’s project was the first mass digitization project that rapidly changed the landscape (Centivany, 2016). Begun in 2002, the project at its height scanned approximately 30,000 volumes per week, more than previous projects by the “most aggressive and technologically advanced library digitizers” had scanned in a decade (Centivany, 2017, p. 2361).

Although the Google book-scanning project was not pursued on the grounds of accessibility for users with print disabilities, it was quickly put to that use through the agreement between Google and the University of Michigan, which stipulated that the University would retain digital copies of Google-scanned library holdings and could make use of them for Web-based access. Since the University of Michigan library was already engaged in local scanning of holdings when necessary to meet the needs of print-disabled users, this new digital collection provided the opportunity to greatly improve services for those users (Centivany, 2017). As this early effort evolved into the multi-institution project that became HathiTrust, launched in 2008, the provision of access to electronic copies for users with disabilities grew with it. And, as HathiTrust came under legal challenge from groups representing copyright holders for scanned works, so too did this practice. The case was eventually argued before the U.S. District Court for the Southern District of New York and the Second Circuit Court of Appeals, and the courts ruled that HathiTrust’s provision of access to print-disabled users did not infringe copyright. HathiTrust and the legal challenges around it have tested and clarified the rights and responsibilities of libraries and universities in meeting the needs of students with print disabilities. (See chapter 11 “Copyright” for more details.)

Today HathiTrust operates with different access levels for the public at large and for affiliates of supporting libraries. Anyone can search the database and read works that are in the public domain, while affiliates at supporting institutions can read in-copyright works owned by their institution. Readers with print disabilities at supporting institutions may additionally read in-copyright works from the complete database by first verifying their eligibility, then requesting works through a designated staff proxy (usually based in the library or a disability service office) (HathiTrust, n.d.). This database represents a giant leap in the volume and breadth of works available in an accessible format for readers with print disabilities, especially in the area of scholarly publications. And for individuals at the more than 120 partner institutions, it represents a huge advance in access to those works. However, this service is not at present available to the reading-disabled public at large.
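The access tiers described above can be summarized as a small decision rule. The following Python sketch is purely illustrative and is not HathiTrust’s software or API: the names Reader, Work, and access_route are hypothetical, and the rule encodes only the three tiers stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Reader:
    # Hypothetical fields, named for illustration only.
    affiliated: bool = False                 # belongs to a supporting institution
    verified_print_disability: bool = False  # eligibility has been verified


@dataclass
class Work:
    public_domain: bool
    held_by_reader_institution: bool = False  # owned by the reader's own library


def access_route(reader: Reader, work: Work) -> str:
    """Return 'direct', 'via_proxy', or 'none' under the tiers described above."""
    if work.public_domain:
        return "direct"      # anyone may read public-domain works
    if not reader.affiliated:
        return "none"        # in-copyright works require an institutional affiliation
    if work.held_by_reader_institution:
        return "direct"      # affiliates may read works their institution owns
    if reader.verified_print_disability:
        return "via_proxy"   # requested through a designated staff proxy
    return "none"
```

Under this sketch, a verified print-disabled reader without an institutional affiliation still receives "none" for in-copyright works, which reflects the limitation noted at the end of the paragraph above.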


Bookshare

Bookshare is a repository of over 600,000 accessible digital publications donated directly by publishers or uploaded by individuals. The service originally began as an online collection of works scanned individually by blind readers for their personal use following the maturation of OCR and synthesized text-to-speech technologies (Candela, 2009). Under this early model, Bookshare staff would “proofread the contributions to eliminate scanning errors and make them available via the Internet for download to other users” and would also directly “scan books to increase the library collection” (Candela, 2009, p. 124). In the past decade, as publishing workflows became capable of readily producing accessible electronic copies, this original model shifted to one of primarily collecting electronic files directly from publishers. The service also collects scans made by college student disability service offices, leveraging the conversion work that is done locally in many of these offices but otherwise not usually shared or coordinated across campuses.

Since Bookshare began as a service for blind adult users in particular, it has a focus on accessible and audio formats and is optimized for individual users as well as libraries. Schools and students in the United States have free access to Bookshare (through an award from the U.S. Department of Education Office of Special Education Programs). Non-student individuals in the United States or any other country may get personal access to the database by paying a fee, the cost of which is determined by the World Bank income rating of the country: $50 per year for high-income countries, $20 for upper-middle, and $10 for low. Organizations that serve individuals with print disabilities in any country may likewise pay an income-adjusted fee based on their number of downloads (Bookshare, n.d.).

Other Models

A slightly different model is found in AccessText, a secure portal through which disability service providers may request, and publishers may supply, accessible versions of books. The service has a focus on educational publications, and membership is limited to “post-secondary educational institutions throughout Canada, the United States and its territories” (AccessText, 2017). Although AccessText is not a repository, many files are available for immediate download because publishers have authorized access to their entire digital catalog, allowing for automatic processing of requests. In the K-12 education context, other options are available. Learning Ally, for example, is a repository of 80,000 audio-format “K-12 books including popular fiction, classic literature, textbooks, test prep and study aids” for students with visual impairments or dyslexia (Learning Ally, 2018).

Apart from databases focused on accessible content for readers with print disabilities, most libraries provide access to electronic books and other content through various databases managed by outside entities or vendors. These digital databases have the potential to support accessibility but have often been found inaccessible according to digital accessibility standards and effectively unusable by individuals with print disabilities who rely on assistive technology. (See chapter 8 “Platforms.”) However, libraries are increasingly demanding accessible electronic resource databases that can be equitably used by all patrons. At present, when these databases fall short, academic libraries may have to locally convert or externally obtain accessible alternate format copies of an item they have already licensed as an electronic file. In these and other situations, databases like HathiTrust and Bookshare are key stopgap measures. They work to create, collect, and lawfully circulate accessible digital copies of publications without compromising the value of those publications in the digital marketplace.