Finding the Public Domain: Copyright Review Management System Toolkit
This work is licensed under a Creative Commons Attribution 3.0 License. Please contact firstname.lastname@example.org to use this work in a way not covered by the license.
For more information, read Michigan Publishing's access and usage policy.
This section gives brief overviews of several pilot projects we carried out during the suite of CRMS projects. Several opportunities arose to experiment with applications of the CRMS model. There is considerable interest in this work and in how it may be extended to works from other countries, subject to the laws of other nations, and to media beyond books.
We experimented with books from Spain in HathiTrust as a formal part of our second grant from the IMLS in conjunction with the Universidad Complutense de Madrid. Opportunities arose throughout the CRMS projects that allowed us to test theories and improve resources, from reviewing books from Spain with scans in the CRMS interface to reviewing Spanish-language books without scans. We also tested reviewing books from Germany and adapted the CRMS interface to develop versions of CRMS that could be used for future projects. Other work included improvement of Name Authority Cooperative Program (NACO) records and review of government documents produced by US states, which are presumptively subject to copyright, unlike the work of employees of the US federal government.
Reviewing Works Published in Spain
Collaborators: Dean Atiya, Antonio Moreno Cañizares, Nerea Llamas, Almudena Caballos Villar
CRMS developed a pilot program to review a limited set of Spanish-language volumes published in Spain. The pilot built on research performed by HathiTrust partner Universidad Complutense de Madrid.
In collaboration with Universidad Complutense de Madrid, HathiTrust’s first partner outside the United States, we piloted a project to review Spanish-language books published in Spain. Complutense was interested in a significant number of scans of books from its collection and wanted to make these available for annotation. The books had not yet been reviewed for copyright status and were thus inaccessible to users in Spain. To inform the scope of our inquiry, the Complutense researchers prepared a list of Spanish author names and death dates. Complutense approached HathiTrust with a proposal for collaboration with the LEETHI (Literaturas Españolas y Europeas del Texto al Hipermedia) and ILSA (Implementation of Language-Driven Software and Applications) research groups. Their project, “Mnemosine: The Digital Library of Rare and Forgotten Spanish Texts (1868–1939),” centered on building a system for annotating public domain digital texts. Our review of HathiTrust volumes facilitated this project.
The project was designed to review Spanish-language works through a modification to the CRMS-World infrastructure. The interface was adapted based on Spanish copyright law.
- Approximately seven hundred volumes
- Works first published in Spain
- Primary author death dates preconfirmed to be 1934 and earlier
- Monographic works only
- CRMS interface modifications—one week developer time
- Legal research and project preplanning—two to three weeks
- Review of seven hundred volumes—approximately one month
- Copyright research specialist
- Three reviewers familiar with Romance languages
- Project manager and developer
- Open volumes
- Collect data on efficacy of using an author-centered approach
- Gain experience in assessing foreign language front matter (publication conventions, terminology, inserts)
- All activity supported through CRMS grant funds and allocation of cost-share time
We created a partition within the CRMS-World interface as a low-cost way of performing Spanish-language reviews without committing to the development of a stand-alone interface solely for Spanish publications. This allowed us to proceed quickly with only minor software development.
From a spreadsheet of Spanish authors provided by Universidad Complutense de Madrid, we selected only authors with a confirmed death date of 1934 or earlier. Given the Spanish copyright term of author life + 80 years, we decided that any monograph with a primary author death date of 1935 or later was not an eligible candidate. The list of eligible authors was matched against bibliographic records in HathiTrust to create a candidate pool of volumes; often there were several volumes per author.
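The selection rule can be sketched in a few lines. This is a minimal illustration, not the CRMS implementation: it assumes the life + 80 term runs through the end of the eightieth calendar year after the author's death, and it uses a hypothetical review year of 2015, chosen so that the cutoff matches the "1934 and earlier" scope listed above. The names `Author`, `public_domain_year`, and `eligible_authors` and the sample rows are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

SPAIN_TERM_YEARS = 80  # life + 80, as stated in the text


@dataclass(frozen=True)
class Author:
    name: str
    death_year: Optional[int]  # None when no death date has been confirmed


def public_domain_year(death_year, term_years=SPAIN_TERM_YEARS):
    """First calendar year in which the author's works are out of copyright.

    Assumption: the term runs to the end of the calendar year, so works
    enter the public domain on January 1 of death_year + term_years + 1.
    """
    return death_year + term_years + 1


def eligible_authors(authors, review_year):
    """Keep only authors with a confirmed death date whose term has expired."""
    return [
        a for a in authors
        if a.death_year is not None
        and public_domain_year(a.death_year) <= review_year
    ]


# Hypothetical rows standing in for the author spreadsheet.
authors = [
    Author("Author A", 1920),
    Author("Author B", 1934),
    Author("Author C", 1935),
    Author("Author D", None),
]
selected = eligible_authors(authors, review_year=2015)  # review year is assumed
print([a.name for a in selected])
```

Authors without a confirmed death date are excluded rather than guessed at, which mirrors the pilot's requirement that death dates be confirmed before a volume enters the candidate pool.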
The copyright specialist performed a preliminary test of our review process with a limited number of volumes. This check did not identify any unforeseen issues with the candidates, so we went ahead with the CRMS double-review process, following the decision tree below:
- A concern at the outset of this project was that reviewers would need to be fluent in Spanish. We discovered, however, that a moderate familiarity with Romance languages was sufficient. Publishing conventions and similarities in front matter, combined with online translation tools, provided enough context to analyze copyright-relevant information.
- The resources most accessible to non-Spanish speakers were the Virtual International Authority File (VIAF) and Spanish Wikipedia. Language was a barrier to searching foreign language databases such as Spanish newspaper archives for author death dates. Collaboration with language specialists may help expand the scope of a copyright review project. The native speakers from Universidad Complutense de Madrid provided us with author death dates from sources such as the El País newspaper, which we would not have been able to find on our own.
- We saw greater efficiency when works by the same author were reviewed in close proximity. A number of authors tended to publish greatly similar works with repetitive use of coauthors, editors, and illustrators. Reviewing these works in succession made it easier to recall dates and sources without repeating a recently completed search.
- An author-based research process, in which a reviewer’s confirmation of an author’s death date and nationality could then be propagated to other works by that author, would be more efficient for copyright regimes based on the life of the author.
- Over the course of the project, we identified information gaps, which specialists more familiar with Spanish works could have helped us resolve. (Developing a mechanism for soliciting help from a specialist community is ideal.)
- Reviews for this pool of candidates required 230 hours. Approximately 20 hours of developer time were needed to set up the infrastructure. Average review time per volume was 18.9 minutes.
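The author-centered approach suggested in these observations can be sketched as a shared registry of confirmed death years that is consulted for every contributor to a volume. This is a hypothetical illustration only: the function name, status labels, and sample data are assumptions, and the check covers author dates alone, not third-party content such as photographs.

```python
CUTOFF_YEAR = 1934  # latest eligible death year in this pilot's scope


def screen_volume(contributors, death_years, cutoff=CUTOFF_YEAR):
    """Return a (status, reason) pair based on author death dates only.

    contributors: names of every author, coauthor, or editor of the volume.
    death_years: shared registry mapping a name to a confirmed death year.
    """
    for name in contributors:
        year = death_years.get(name)
        if year is None:
            return ("closed", "no confirmed death date for " + name)
        if year > cutoff:
            return ("closed", name + " died after " + str(cutoff))
    return ("candidate", "all contributors died in " + str(cutoff) + " or earlier")


# Hypothetical registry: each name is researched once and then reused.
death_years = {"Author A": 1920, "Author B": 1950}

volumes = {
    "vol-1": ["Author A"],
    "vol-2": ["Author A", "Author B"],
    "vol-3": ["Author A", "Author C"],  # Author C has not been found yet
}
results = {vid: screen_volume(names, death_years) for vid, names in volumes.items()}
for vid, outcome in results.items():
    print(vid, outcome)
```

Once a death year for a missing coauthor is added to the registry, every volume that lists that person can be rescreened without repeating the search.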
In total, we reviewed 730 volumes, 467 of which were determined to be in the public domain.
The primary reasons for keeping a work closed were as follows:
- The volume was coauthored by an author who died after 1934.
- We could not locate a coauthor’s death date.
- The volume included in-copyright or unknown copyright photographs, paintings, and other works created by third parties.
Latin American Works from the Benson Collection at University of Texas at Austin
This scenario describes a pilot project of Spanish-language works carried out by the University of Texas at Austin (UT). This pilot was carried out using physical volumes rather than the CRMS interface because of contractual restrictions placed on UT’s scans. The information in this report was taken from the presentation “CRMS South America: A Study of Argentine Monographs in the Benson Latin American Collection, University of Texas at Austin,” presented by Carlos Ovalle, Caron Garstka, and Georgia Harper in September 2014 to the CRMS Advisory Working Group.
This pilot was conceived and run by Georgia K. Harper, a member of the CRMS Advisory Working Group and Scholarly Communications Advisor at University of Texas at Austin Libraries. She engaged the help of Carlos Ovalle and Caron Garstka, two graduate students from the UT School of Information. The project centered around the Benson Latin American Collection, a valuable resource of UT Libraries that contains materials on Mexico, Central and South America, the Caribbean, and the Hispanic presence in the United States.
Most volumes in the Benson collection were digitized, but at the time of this inquiry, it was not possible to obtain access to the digital scans of in-copyright works. This pilot was designed to evaluate the efficacy of examining physical books for copyright review using the CRMS methodology without the interface tool.
The project was modeled after CRMS, including two independent reviews of each volume and a narrow project scope. Lack of access to digital scans meant that the project could not employ the CRMS online interface. Data collection was by spreadsheet.
- Sample of one hundred volumes
- Monographs published in Argentina
- Publication dates ranging primarily from 1906 to 2005
- Because of the nature of the collection, 88 percent were published post 1940
- Selected randomly, but selected volumes represented one hundred unique authors
- Five-month timeline
- Four months for library staff to create the book list because of problems with system software migration
- One week for library staff to pull the books from shelf; ten books could be pulled per hour, provided the books were on site
- Sixteen to twenty hours for researchers to enter catalog data
- Eight to sixteen hours for researchers to determine author death dates
- Scholarly Communications Advisor at UT
- Two UT graduate students with Spanish comprehension
- Develop proof of concept for comprehensive rights review by UT
- Collect data
- Ascertain time and labor required for completion of entire pool
- Conceptualize a longer term project
- Predict future entry into the public domain of currently copyrighted works
- Assess whether CRMS assumptions about inserts were significant and their implications for determining public domain status of a work otherwise believed to be in the public domain
- Have a basis to determine whether “principal text in the public domain” should be a rights category for allowing access to digital scans
- All activity funded internally by UT
Other factors that aided in this pilot were
- access to Benson collection curators
- access to a library cataloger for general cataloging questions
- working knowledge of written Spanish
- Google Translate
The University of Michigan CRMS team supplied informative resources, both legal and procedural, to assist UT in setting up their workflow. The UT researchers selected a set of a hundred Argentinian monographs from the Benson collection for copyright review. Because the digital scans had not been deposited in HathiTrust, the rights metadata could not be collected in the standard CRMS fashion and thereby associated with a unique volume. Therefore, UT developed their own data collection procedure, modeling it on the data collected by CRMS. Lack of access to digital scans also necessitated a revised workflow to accommodate working with physical volumes.
According to the legal research done by UT, Argentine copyright law requires registration. Verifying registration would have been very costly and impractical to implement in the workflow, so UT began with a presumption of registration for their entire sample because registration could potentially occur at any time prior to copyright expiration. UT’s legal research also indicated that in the case of translations, authorization was required for up to ten years after the death of the author. After this time, anyone could make a translation without authorization by paying an arbitrated fee. They found that whether a translation was authorized was not always clear. This has an impact on the rights a translator could hold in the translation.
UT student researchers identified at least one reliable source for each author death date—preferably two sources in accordance with CRMS-World standards. Two people at UT independently reviewed each volume and then examined the results jointly. Useful Argentinian author death date resources included the UT catalog, Biblioteca del Congreso, Wikipedia, LoC Name Authorities, Google Search, social media such as LinkedIn, university websites, newspaper articles, Biografias y Vidas, Minibiografias, and Todotango.com.
- Digital scans are essential to a viable process for copyright review. Selecting a sample from the Benson collection and then pulling volumes from the shelves was prohibitively time and labor intensive. Any large-scale review system would necessarily depend on the availability of scanned content.
- Future projects may seek ways to engage graduate students as reviewers. Features of a program involving graduate students should include a monitored, consistent process applied to all reviewed works and minimal judgment required once a framework has been established.
- Foreign language volumes raise specific issues related to the characteristics of the language. For example, accented characters proved to be a complicating factor for searching the catalog record.
- Due to a sample set that was predominantly composed of late twentieth-century volumes, many of the works in the Argentine collection will not enter the public domain for many years. However, a long-term strength of this pilot was the collection of relevant metadata to assist in determining when a work would enter the public domain in the future. UT recommended including a “predicted public domain” date within the CRMS system, with a mechanism for flagging works entering the public domain at the beginning of each year.
- Storage changes to the collection over time had an impact on how accessible the physical volumes were for this pilot.
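UT's recommendation of a "predicted public domain" date with a yearly flag can be sketched as a small calendar keyed by year. This is a hypothetical illustration for life-based terms only. The 70-year term passed in below is an assumption (the text does not state the Argentine term), as are the function names and sample records. It also assumes that protection lasts through the end of the calendar year.

```python
from collections import defaultdict


def predicted_pd_year(death_year, term_years):
    """Year in which a work is predicted to enter the public domain.

    Assumption: protection lasts through the end of the calendar year
    death_year + term_years, so the work opens the following January 1.
    """
    return death_year + term_years + 1


def build_pd_calendar(records, term_years):
    """Group volume ids by predicted public domain year."""
    calendar = defaultdict(list)
    for volume_id, death_year in records:
        if death_year is not None:  # living or unknown authors get no date
            calendar[predicted_pd_year(death_year, term_years)].append(volume_id)
    return calendar


# Hypothetical records: (volume id, author death year or None).
records = [("b1", 1950), ("b2", 1950), ("b3", 1986), ("b4", None)]
calendar = build_pd_calendar(records, term_years=70)  # 70 is an assumed term
print(calendar[2021])  # volumes to flag on January 1, 2021
```

A yearly job could look up the current year in such a calendar and queue the listed volumes for a confirming review rather than opening them automatically.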
One hundred volumes were reviewed; the project results are as follows:
Undetermined—needing further investigation
- 64 percent of volumes had inserts
- 16 percent of volumes were compilations with many authors
- 3 percent were translations
- One instance of a differing death date between independent reviews
- One instance of locating an author with two death dates
- Catalog information was 99 percent accurate in terms of author information
- One entry mentioned two authors but only one author could be found within the work
- Catalog data often indicated “et al.” for multiple author entries rather than listing all names
Found to be public domain
- Nine public domain in Argentina
- Eight public domain in the United States
- Four public domain in both Argentina and the United States
Volumes able to forecast a date of copyright expiration
- Thirty-one predicted with copyright expiration date in Argentina
- Forty-six predicted with copyright expiration date in the United States
- Thirty-one predicted with copyright expiration date in both Argentina and the United States
Results: Author information identified
- Thirty-nine authors were identified as probably still living
- Thirty-seven authors had definitive death dates
- Fifteen authors could not be found
- Three items were authored by a government or entity without individual personal attribution
Humboldt University of Berlin: Rights Research Project for German Books
Collaborators: Lovis Atze, Rebecca Behnk, Karina Georgi, Regine Granzow, Joyce Ray, Michael Seadle
This scenario describes a pilot project of German editions of Greek and Latin classical texts carried out by iSchool students at Humboldt-Universität zu Berlin. In this project, we were unable to provide access to scans for the purposes of copyright review; physical volumes were pulled from the Humboldt Library collection for examination.
Over half a million books in HathiTrust are published in German, which is the second most represented language in the collection after English. This indicates a rich source of books about art, science, medicine, and classics—all prominent areas of German scholarship and heavily represented in North American research libraries. We speculate that some of these may no longer exist in Germany because of the disruption of war. If identified as public domain, these could be made widely available.
This project was initiated by Michael Seadle, a Director and Dean at the Institut für Bibliotheks- und Informationswissenschaft (IBI) at Humboldt and led by visiting professor Joyce Ray, Program Coordinator and Lecturer for the Johns Hopkins University Museum Studies program.
Graduate students enrolled in the IBI summer project seminar learned how to make copyright determinations on German works. Legal assumptions were formulated with collaboration from Katharina de la Durantaye, Juniorprofessur für Bürgerliches Recht, insbesondere Internationales Privatrecht und Rechtsvergleichung, Humboldt-Universität zu Berlin. The class met with Melissa Levine via Skype under Professor Ray’s direction.
- Approximately 120 volumes
- German monographic works from a HathiTrust collection entitled “German editions of Greek and Latin Works 1873–1933”
- Works were by ancient authors, with additive content by more contemporary German editors
- Three months, during the IBI summer term
- Four students enrolled in the IBI project seminar
- Serve as a learning exercise for IBI students; results were not intended to be legally actionable for HathiTrust
- Identify impediments, legal and practical, to operating a collaborative rights review of German works with an international partner
- Evaluate processes and resources for performing copyright determination on works of German authorship
- All activity funded internally by Humboldt University or part of seminar requirements for the students
With the help of experts in German law, the students learned about copyright as it relates to German authors’ rights and copyright term. They compiled a list of works to be examined, created a spreadsheet of editor names extracted from those works, and identified reliable resources in which to search for death date information. The students opted to take a name-based approach by assigning each editor a unique number and searching once for all works by that editor in the candidate pool. At least two students searched each editor’s name to confirm dates in multiple sources.
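The name-based approach can be sketched as two small steps: number each distinct editor once, then accept a death year only when independent searches agree. This is a hypothetical reconstruction, not the students' actual spreadsheet logic. In particular, the rule that two matching findings are required is an assumption based on the statement that at least two students searched each name.

```python
def assign_editor_ids(works):
    """Give each distinct editor name a number, in order of first appearance."""
    ids = {}
    for _title, editor in works:
        ids.setdefault(editor, len(ids) + 1)
    return ids


def confirm_death_year(findings):
    """Accept a death year only when at least two searches agree on it.

    findings: years reported by independent searches (None means not found).
    """
    found = [year for year in findings if year is not None]
    if len(found) >= 2 and len(set(found)) == 1:
        return found[0]
    return None  # not found, found only once, or sources disagree


# Hypothetical candidate pool: (title, editor).
works = [("Work 1", "Editor X"), ("Work 2", "Editor Y"), ("Work 3", "Editor X")]
ids = assign_editor_ids(works)

# One set of searches per editor id, however many works that editor has.
searches = {1: [1915, 1915], 2: [1940, None]}
confirmed = {eid: confirm_death_year(found) for eid, found in searches.items()}

for title, editor in works:
    print(title, editor, confirmed[ids[editor]])
```

Because the lookup is keyed by editor number, the confirmed year for Editor X is applied to both of that editor's works after a single round of searching.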
In order to confirm that the works researched by students and those in HathiTrust were the same, the students photocopied the front matter of each work and submitted it to staff at the University of Michigan for verification prior to applying a rights determination to the digital scan. Upon verification that the rights determinations had been performed upon matching volumes, HathiTrust then opened up the books that students identified as being public domain in Germany.
As part of the IBI coursework, students kept a record of their search process and noted their observations of the usefulness of various death date sources. Their experiences are published in a D-Lib paper, “Testing the HathiTrust Copyright Search Protocol in Germany: A Pilot Project on Procedures and Resources,” D-Lib Magazine 20, no. 9/10.
- For foreign language works, the compilation of a glossary of terms and abbreviations was helpful. The students translated words and phrases most helpful when searching for and interpreting terminology used in the front matter of a work.
- Bibliographic metadata containing author and editor death dates immensely simplified the copyright review process. Of fifty authors represented in the sample set, only twelve required a death date search. Of those twelve, despite a detailed search being performed, some editor death dates were not findable (although rough “flourished” dates could be inferred). Perhaps some copyright determinations could be based on knowledge of life-spans and living dates even when a precise death date cannot be found.
- Students realized that volumes could have multiple entries in HathiTrust when different schools had contributed a scan of the same volume. They needed to make sure that only one copyright determination was performed when the result could be applied to multiple copies of the same work in HathiTrust.
- As in other CRMS projects, the top two sources for death date information continued to be a work’s catalog record and the VIAF, even when the project is based on non-English works.
- Students attempted without success to gain access to databases and records kept by German publisher Teubner-Verlag, the Deutsches Historisches Museum, and VG Wort, a collecting society for German authors and publishers. It is unknown whether having access to those records would have impacted the outcome for editors whose death dates could not be discovered, but it highlighted the importance of having open resources to aid copyright determination projects.
The student project resulted in the following outcomes:
- Students identified author and editor death dates for 109 volumes.
- Students identified one hundred volumes as public domain; these volumes were opened in HathiTrust.
- Students identified nine volumes as in copyright; these volumes remained closed in HathiTrust.
- Students prepared a glossary of German/English publishing terms to facilitate future research of German-language works.
- Students compiled a list of reputable death date sources for German authors and editors.
Contributing to Name Authority Cooperative Program (NACO) Records
CRMS developed a pilot program working with the Name Authority Cooperative Program (NACO) to enhance authority records during the CRMS-World grant period (2011–14).
Copyright review is most efficient when the catalog record contains an author’s death date. When a death date is absent, the reviewer must look to outside resources for this information. CRMS-World reviewers often identified author data that had not yet been added to name authority records. However, our systems were not able to update catalog records automatically, so author data captured for a single review would not be accessible for future reviews of that author’s work.
In order to address this issue, we created a pilot project in partnership with our library’s NACO liaison to funnel author information back into NACO authority records, which are exported to VIAF each month. VIAF is a primary source for finding author death dates; it receives data from national libraries around the world. The standards national libraries have established for creating name authority records are long-standing and trustworthy. Consequently, VIAF has proven to be the most central and reliable source for author death dates that is currently available on the open web.
We offer details about this pilot project below in the hope that future copyright review projects will also contribute to this important work.
The project was designed to engage the problem in a low-tech, low-cost way. The following parameters informed the design of this project:
- Improve copyright-relevant data by contributing research to authority records
- Raise awareness nationally on the value of enhancing authority records for copyright determination
- As the project progressed, a new goal emerged to explore ways for expanding the activity to additional HathiTrust institutions
- Began in 2013 and continued for the duration of the CRMS grant
Staffing and volunteers
- Four CRMS reviewers contribute monthly spreadsheets
- Two U-M Technical Services catalogers update RDA NACO authority records
- Volunteer catalogers at Northwestern University, University of Chicago, and University of Minnesota
- Reviewer time is allocated as part of their CRMS grant cost-share contribution
- U-M Technical Services time is allocated as part of salaried work time
We started the pilot project with a small group of catalogers certified to meet RDA NACO standards. A few CRMS reviewers who were interested in contributing to this pilot volunteered to collect author death dates as they performed reviews. These reviewers maintained a spreadsheet with death dates identified during the course of their work. At the end of each month, the reviewers e-mailed the spreadsheet to the U-M Technical Services Division. An RDA NACO cataloger then worked through the spreadsheet to update or create NACO name authority records.
On average, seventy-five death dates are collected each month, and it takes an estimated fifteen minutes to update one authority record. Each cataloger regularly contributes at least two hours per week, using the following workflow:
The NACO trained cataloger searches the Library of Congress Name Authority File (LC NAF) through OCLC Connexion for possible variants of the name. If there is an existing authority record, they add the following:
- A closing death date to a preexisting birth date in the 100 field
- Birth and/or death dates to an 046 field
- A 370 subfield c location to designate the author’s “associated place” (use established place headings, noting source in subfield 2)
- 670 fields to add citations that support the information we added; use subfield u to link to URLs as needed
They upgrade the record to RDA, if necessary, by
- changing the rules fixed field to z and adding rda to subfield e in the 040 field
- taking any other steps necessary to make sure that the record is fully RDA compliant
If the name does not have an existing authority record, the cataloger creates an authority record according to RDA rules and NACO and PCC guidelines, including death date and domicile/nationality (if available).
Catalogers are free to add additional information if available, such as other forms of a name in the 400 field. We are most concerned with the death date, associated place, and source documentation. Once the records are created or existing records are updated, they are sent to the NACO liaison for review and bibliographic file management.
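The record changes described in this workflow can be sketched against a simplified, dictionary-based stand-in for a name authority record. This is only an illustration under stated assumptions: real records are edited in OCLC Connexion, not in Python, and the record layout, function name, and sample values here are hypothetical. The field tags and subfield codes follow the workflow above, with 046 subfield g used for the death date.

```python
import copy


def enhance_authority_record(record, death_date, place, place_source,
                             citation, url=None):
    """Return an updated copy of a simplified name authority record."""
    updated = copy.deepcopy(record)
    fields = updated["fields"]  # list of (tag, {subfield code: value})

    # Close an open date range in 100 $d, e.g. "1870-" becomes "1870-1932".
    for tag, subfields in fields:
        if tag == "100" and subfields.get("d", "").endswith("-"):
            subfields["d"] += death_date[:4]

    # Record the death date in 046 $g, creating the field if it is missing.
    f046 = next((sf for tag, sf in fields if tag == "046"), None)
    if f046 is None:
        f046 = {}
        fields.append(("046", f046))
    f046["g"] = death_date

    # 370 $c for the associated place, with the heading's source in $2.
    fields.append(("370", {"c": place, "2": place_source}))

    # 670 citation supporting the added data; $u carries an optional URL.
    source_note = {"a": citation}
    if url:
        source_note["u"] = url
    fields.append(("670", source_note))

    # Upgrade to RDA: rules fixed field "z" plus "rda" in 040 $e.
    updated["rules"] = "z"
    for tag, subfields in fields:
        if tag == "040":
            subfields["e"] = "rda"

    fields.sort(key=lambda field: field[0])  # keep fields in tag order
    return updated


# Hypothetical pre-RDA record with an open date range.
record = {
    "rules": "c",
    "fields": [
        ("040", {"a": "DLC", "b": "eng", "c": "DLC"}),
        ("100", {"a": "Example, Author", "d": "1870-"}),
    ],
}
updated = enhance_authority_record(
    record, death_date="1932", place="Spain", place_source="naf",
    citation="Hypothetical obituary notice, 1932",
)
for tag, subfields in updated["fields"]:
    print(tag, subfields)
```

The function returns a copy so that the original record remains available for the NACO liaison's review step.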
- Incorporate copyright-relevant information in cataloging practices. Cataloging practice does not require an author death date to be included in a record. Cataloging practice was not designed to serve copyright evaluation needs, and in many cases, the focus was on the creation of sufficient metadata for the disambiguation of content, not its complete description. With library budget cuts, catalogers may need reasons to justify spending time on what may be perceived by department managers as unnecessary information. On the contrary, this basic factual information is critical metadata today.
- The majority of authors identified by this pilot did not have existing NACO authority records. This information gap is an area of opportunity for those who wish to assist with public domain determination. Some rich sources for death dates have been public domain books in HathiTrust and Google Books (e.g., published proceedings of professional societies with obituaries for members). Public domain material can be used to help discover information relevant to copyright determinations.
- Rights and access issues are a primary concern for digital collection development. Enhancing authority records with optional fields does take time but also has a significant impact on our ability to identify public domain works. For books still in copyright, prediction tools can use author metadata to anticipate when works will enter the public domain.
- The number of death dates generated by CRMS indicates the benefit of linking copyright review projects with bibliographic enhancement initiatives. However, any library with a NACO liaison can independently work on enhancing authority records. This activity does not need to be coordinated or centralized within a copyright review project like CRMS.
US State Government Documents
This scenario describes a smaller project within CRMS-US to review the copyright status of approximately 61,000 US state government documents.
When initially studying the question of state government documents in HathiTrust, we explored securing permission from authorized state representatives. We also looked for states that, through legislation, had explicitly dedicated government documents to the public domain. These lines of inquiry were inconclusive, and we shifted our focus to what could be accomplished through copyright review. We have found copyright review of state government documents to be straightforward, with few complications and a high likelihood of works found to be in the public domain.
The workflow for reviewing state government documents easily mapped onto the CRMS-US infrastructure, allowing us to avoid the costs of a new project design.
These are the parameters informing the design of this project. All work was based on existing CRMS-US infrastructure and workflow modified for US state documents.
- Approximately 61,000 volumes
- First publication in United States with publication dates between 1923 and 1977 (Hawaii and Alaska limited to items published from 1960 to 1977)
- State government documents only
- Work to continue for the duration of the CRMS grant period
- Completion of entire pool of candidates is not expected
- Copyright research specialist
- Three reviewers with previous experience on the CRMS-US process
- Project manager and developer
- Open volumes full-text within the United States
- Collect data on the following:
- Cases where copyright notice is present in US state government documents
- Time and labor required for completion of entire pool
- How often copyright notice is indicated in the back matter
- All activity supported through CRMS grant funds and allocation of cost-share time
We generated a candidate pool using standard bibliographic indicators for US state government documents. US copyright law required copyright notice through 1977, so that year became the outer boundary of our inquiry.
We selected staff that were experienced with the CRMS-US decision tree and taught them the slight modifications required for reviewing US state documents. The project followed the standard CRMS double-review process using the decision tree below.
During the process, reviewers confirmed that the work was in fact a state government document before focusing on three key elements:
- The presence or absence of a copyright notice in the government document, including whether it appears in the back matter
- Whether the work was a reprint of an earlier in-copyright work
- Whether the work contained potentially in-copyright additional materials, such as a photograph produced by a third party
When a work did not contain third-party content, was not a reprint of an in-copyright work, and did not bear a copyright notice, it was determined to be in the US public domain.
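The rule in the preceding paragraph can be written as a small function. This is a minimal sketch, not the CRMS-US decision tree itself. The class, field names, and the "out of scope" and "not cleared" labels are hypothetical, and the sketch does not say how a volume that fails a check would be classified afterward.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    is_state_document: bool
    has_copyright_notice: bool  # anywhere in the volume, including back matter
    is_reprint_of_in_copyright_work: bool
    has_third_party_content: bool


def determine(obs):
    """Apply the three-element check to one reviewer's observations."""
    if not obs.is_state_document:
        return "out of scope"
    if (obs.has_copyright_notice
            or obs.is_reprint_of_in_copyright_work
            or obs.has_third_party_content):
        return "not cleared"
    return "public domain (US)"


clean = Observations(True, False, False, False)
with_notice = Observations(True, True, False, False)
print(determine(clean))
print(determine(with_notice))
```

Under the double-review process, two reviewers would each record observations like these independently before a determination is exported.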
- Reviewing state government documents for a lack of copyright notice is a relatively simple workflow with a high probability of identifying volumes as in the public domain. Statistics from the first five-month period showed that out of 5,527 reviews performed, 71.5 percent were found to be public domain. In comparison, the public domain average of the cumulative CRMS-US project was 51.7 percent.
- A “bound-with” volume is one in which multiple, individually published documents have been bound together. Bound-withs present problems because they can require a lengthy process of checking internal sections of the volume for copyright notice. When one document bears a copyright notice, it will result in keeping the entire bound-with volume closed. We gave reviewers the option to disregard bound-withs due to their potential complexity. Our initial data collection showed that in 853 out of 17,307 reviews, the volume was determined to be a bound-with.
- Copyright review based on publication with notice can potentially be applied to other types of US publications.
The project results, current as of March 2015, are as follows:
- 25,329 total reviews
- 9,846 exported determinations
|Categories|May 2014|Jun 2014|Jul 2014|Aug 2014|Sep 2014|Oct 2014|Nov 2014|Dec 2014|Jan 2015|Feb 2015|Mar 2015|
|---|---|---|---|---|---|---|---|---|---|---|---|
|Public domain reviews|639 (62.6%)|1,107 (67.2%)|1,486 (75.5%)|781 (81.0%)|625 (75.6%)|734 (72.7%)|687 (71.0%)|450 (80.8%)|1,060 (71.2%)|1,967 (70.9%)|2,048 (67.1%)|
|In-copyright reviews|8 (0.8%)|88 (5.3%)|55 (2.8%)|7 (0.7%)|7 (0.8%)|3 (0.3%)|0 (0.0%)|0 (0.0%)|4 (0.3%)|16 (0.6%)|8 (0.3%)|
|Undetermined/needs further investigation reviews|373 (36.6%)|452 (27.4%)|427 (21.7%)|176 (18.3%)|195 (23.6%)|272 (27.0%)|280 (29.0%)|107 (19.2%)|425 (28.5%)|792 (28.5%)|994 (32.6%)|
|Time per review (minutes)|1.1|0.8|0.9|1.2|1.3|1.3|1.4|1.5|1.3|0.9|0.9|
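The monthly percentages in the table appear to be each category's share of that month's reviews across the three categories. A minimal sketch that recomputes two columns from the counts shown (the function name and dictionary keys are hypothetical):

```python
def shares(counts):
    """Return each category's share of the month's reviews, in percent."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Counts copied from the May 2014 and Mar 2015 columns of the table.
may_2014 = {"public domain": 639, "in copyright": 8, "undetermined": 373}
mar_2015 = {"public domain": 2048, "in copyright": 8, "undetermined": 994}

print(shares(may_2014))
print(shares(mar_2015))
```

The same calculation can be run on any other column to check a transcribed figure against its counts.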