Author: Deborah Lines Andersen
Title: Benchmarks: Digital Documents
Publication info: Ann Arbor, MI: MPublishing, University of Michigan Library
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact firstname.lastname@example.org for more information.
Benchmarks: Digital Documents
Deborah Lines Andersen
vol. 11, no. 1, April 2008
Benchmark: a standard by which something can be measured or judged. [1]
Additions to JAHC's Editorial Board
With this issue, the masthead of the journal has changed. We welcome Michelle Harper as the new co‐editor of the journal. Michelle will be working with me on upcoming issues and will become the next editor of the journal when I step down in the next year. Jessica Lacher‐Feldman has taken on the role of Associate Editor, Feature Articles. Steven Hoffman is now Associate Editor, Columns. Both Steve and Jessica have served on the editorial board of the journal as peer reviewers. We welcome their increased participation in the journal's production. We also look forward to the August 2008 issue of JAHC, which will focus on Geographic Information Systems and how they are and can be used by historians.
Historic Materials in Digital Form
This month both featured articles concern primary source documents that have been made digital. Mark Vajcner's "The Importance of Context for Digitized Archival Collections" explores the value of archival materials and the importance of context for collections. He is particularly concerned with what happens to context when documents are scanned and then made available, individually, for research: the documents lose their context. A paper file maintains the order of its documents; they exist within a set of evidence. Looking at a single, digitized item is like pulling one letter out of a file full of correspondence. It is impossible to know what came before or after, or how the individual document fits into the whole. He muses about the consequences of "selective digitization."
Vajcner also explores what happened before we had digital access, discussing the practice of publishing volumes of historic documents in edited form. By necessity, editors needed to be selective. The scholar of history might use these volumes to gain initial access without having to travel, but travel was necessary to see the originals, in their entirety, in their original context.
Vajcner believes that "archivists need to start thinking about... advanced finding aids now and to devote as much energy to them as to the digitization of materials themselves." He echoes Duane Webster, the Executive Director of the Association of Research Libraries (ARL), who said in a 28 February 2008 speech at the University at Albany that one of the major challenges for library science education of the future would be teaching students the skills of "electronic data curation." If we have decided to make extraordinarily large amounts of information available in digital form (e.g., the Google book project), then we will need information professionals who know not just how to do the scanning, but also how to create and make available the metadata that will allow all types of users to find the information they want, and in its appropriate context.
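As a rough illustration of what context-preserving metadata might look like, the following Python sketch records, for each scanned item, the collection, the folder, and the position it held in the paper file, so that a reader can ask what came before and after it. The class, the field names, and the sample records are all hypothetical and invented for this example; they are not drawn from Vajcner's article or from any metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitizedItem:
    """One scanned document plus the context it had in the paper file.

    All field names are hypothetical, chosen only for this illustration.
    """
    item_id: str
    title: str
    collection: str   # the archival collection the item belongs to
    folder: str       # the physical folder the item was filed in
    sequence: int     # position of the item within that folder


def neighbors(items, item_id):
    """Return (previous, next) items from the same folder; None at either end."""
    target = next(i for i in items if i.item_id == item_id)
    same_folder = sorted(
        (i for i in items
         if i.collection == target.collection and i.folder == target.folder),
        key=lambda i: i.sequence,
    )
    pos = same_folder.index(target)
    before = same_folder[pos - 1] if pos > 0 else None
    after = same_folder[pos + 1] if pos + 1 < len(same_folder) else None
    return before, after


# Invented sample records: three letters in one folder, one receipt in another.
letters = [
    DigitizedItem("a-001", "Letter of inquiry", "Example Family Papers", "Correspondence 1", 1),
    DigitizedItem("a-002", "Reply", "Example Family Papers", "Correspondence 1", 2),
    DigitizedItem("a-003", "Follow-up letter", "Example Family Papers", "Correspondence 1", 3),
    DigitizedItem("b-001", "Receipt", "Example Family Papers", "Accounts", 1),
]

before, after = neighbors(letters, "a-002")
print(before.title, "|", after.title)
```

Even a record this small lets an interface show the letter that preceded a reply and the one that followed it, which is the kind of context a single scanned image cannot supply on its own.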
Kathleen Swan and Mark Hofer's article, "The Historical Scene Investigation (HSI) Project: Facilitating Historical Thinking with Web‐Based, Digital Primary Source Documents," explores the use of digital information in the context of a particular project aimed at social studies teaching in the K‐12 classroom arena. Using a case approach, they describe the work they have done in creating the Historical Scene Investigation Project, a classroom curriculum that uses digitized primary source documents as evidence that students can use to make historical arguments, evaluate source documents, and consider multiple perspectives of an event. They use the image of the detective for their exercises, prompting students to become investigators in exploring questions from a historical point of view, in looking for clues, and in "cracking the case."
Digitization and the Web
The growing digitization of materials that are now available on the Internet is a natural outgrowth of the increased technological sophistication and decreased cost of producing digital materials. With flatbed scanners selling for as little as one hundred dollars, it is no wonder that they are appearing in libraries, on office desktops, and in individual homes. Everyone can be involved in creating information for the Internet. The obvious question, in light of Vajcner's paper, is how will we gain access to these materials, and will we be able to preserve access to their context? Alternatively, will we let Google be our main search engine, with its enormous ability to retrieve everything, and require the reader, initiated or not, to make judgments about context and worth?
Personal Digitization Assessment
One needs to go no farther than online personal photograph collections to see the enormity of the curation task that is ahead of us. Anyone who has downloaded digital photographs from camera to computer knows the frustration of looking for a particular picture and having to go through a multitude of folders to find it. We lack metadata for our own materials. At the personal level we find ourselves struggling with a growing collection of digital materials that are easily lost, misfiled or deleted. Providing metadata for a library's collection of unique materials is more like creating finding aids for someone else's personal materials, when we do not know the person, the time or specific subject matter, or the context of what was captured on film. A single photograph might very well be put in the wrong context for lack of clues to its meaning, and then assigned a subject heading which fails to capture its original contents. How many of us have found an old box of photographs in an attic or closet, looked through the images to identify known relatives and wondered who all these other individuals were?
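To make the point about missing metadata concrete, here is a minimal Python sketch of a tag index for a personal photograph collection. It assumes that someone has already attached a few descriptive tags to each file, which is precisely the curation work that usually goes undone. The file paths, tags, and function names are hypothetical.

```python
from collections import defaultdict


def build_tag_index(photos):
    """Map each lowercase tag to the sorted list of file paths that carry it."""
    index = defaultdict(list)
    for path, tags in photos.items():
        for tag in tags:
            index[tag.lower()].append(path)
    return {tag: sorted(paths) for tag, paths in index.items()}


def find(index, *tags):
    """Return the paths that carry every one of the given tags."""
    if not tags:
        return []
    sets = [set(index.get(tag.lower(), [])) for tag in tags]
    return sorted(set.intersection(*sets))


# Hypothetical file paths and tags, invented for this illustration.
photos = {
    "2007/summer/img_0412.jpg": ["picnic", "Lake", "family"],
    "2007/summer/img_0413.jpg": ["lake", "sunset"],
    "unsorted/scan_07.jpg": ["family", "unidentified"],
}

index = build_tag_index(photos)
print(find(index, "lake", "family"))
```

With such an index, finding a picture becomes a lookup rather than a walk through a multitude of folders. The hard part remains supplying accurate tags in the first place, especially for images whose people and places are unknown.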
Digitization Assessment for Unique Institutions
When we move to public and private institutions, digitization policy and process become far larger in scale and more complicated. How do we decide what to digitize and make available to the public? How do we provide context for these documents over an extended period of time? What processes and policies might we use? A recent consulting project provides an example of a process that might be used to find answers to these and other digitization questions.
While working recently with a group of information professionals for a local library system, I found we were asking just these questions. [2] Many if not all of the library system members have unique items that are not accessible to the rest of the region. They are very interested in making these materials available online. Over the course of three weeks, 22 individuals spent three hours per week discussing the goals and action items for creating a digitization project for system members. Each participant was asked to write goals, one per piece of paper, and then present these to the group, pasting the goal statements on a white board for all to see. When every participant had reported their goals, we organized them by topic area. The following items were in that list:
- Functionality: The group wanted digitized materials to be open source with a central access point for all materials.
- Collaboration and Inclusiveness: It was the hope of the participants that others would join in the digitization effort. This would include support for small libraries with limited budgets. We hoped to partner with area institutions that were not libraries and create a website that features resources of the area that were not previously available to all. Additionally, the group wanted these materials available to teachers and children who were studying the history of the region and needed primary source documents.
- Regionality: The region in which we live (upstate New York around the capital region) has a wealth of history. The participants wanted to be sure that the project developed context for the digitized collections for the region. In this way we could be using new technology to showcase some of the region's oldest holdings.
- Branding: It was critical to the group that there be a marketing campaign for the newly digitized materials so that they would be used widely and identified as a resource supported by the library system.
- Understanding and Training: Staff at member institutions needed to be trained to both digitize and provide access to the materials.
Group participants were particularly concerned about ease of access for users, the inclusion of many institutions and unique materials in this project, highlighting regional materials, preserving regional materials for future generations, making the resource attractive and available to the public, and providing training both for the library staff selecting and scanning collection items and for potential users of the materials. An underlying theme was making sure that the materials, once digitized, would actually be used. To that end, participants discussed linking to materials that explained context, creating a website, and generally integrating materials into local catalogs.
We continued in the next group exercise to list the digitization action items that we would need in order to make this project work. We used the same method as for the goals: writing action verb statements on individual pieces of paper, presenting the statements to the group, and then posting the statements on a white board wall in order to organize them. The participants included the following action items:
- Functionality: Create a database of potential items.
- Collaboration and Inclusiveness: Increase communication with potential participants and help them develop context for their collections.
- Regionality: Create a regional understanding of what the projects can offer.
- Branding: Build a website, name the site, and publicize the project.
- Training: Determine training needs, help libraries identify suitable targets for digitization, educate participants in the standards and metadata, train library staff on how to digitize, establish training programs at a variety of levels, and provide a constant cycle of classes.
- Content, Interface and Integration: Avoid redundancy, continue adding content, and create an interface that allows users to browse by topic or theme.
- Funding: Develop a stable, sustainable funding mechanism.
- IT: Install software at libraries, be technologically sustainable, set up an electronic space where participants can ask questions and help each other.
- Planning and Policy: Create a checklist or action plan for every new contributor, identify local experts, and identify potential system staff needs.
- Stakeholders: Identify and communicate with regional participants, users and potential users.
- Evaluation: Evaluate a pilot project, and create a tool to evaluate the project.
Training needs drew the largest number of comments and suggestions. The consulting team suggested online training, online video demonstrations, and FAQ tutorials on the system website as possible ways to provide information and understanding for participants. Additionally, blogs or wikis that might help libraries help each other with digitization were a possible information source, in keeping with the group's suggestion about Web 2.0 technology use. Other items that surfaced had to do with continuous addition of content and training, funding and staffing of the initiative, evaluation, and overall planning for how the project will proceed.
The Need for New Information Professionals: Data Curators
As seen in both of this issue's feature articles, digitization appears to be the future. The question for this journal, for information professionals in general, and for the participants of the aforementioned consulting project is not so much whether it should be done as how it should be done. We need experts who understand the larger issues of planning such projects, of selecting materials appropriate for digitization, and of providing context and metadata. We also need technology experts who understand how to scan materials appropriately and place them on a website where they will be easily accessible to all.
A new term has arisen in the information science field: data curation. [3] Rather than speaking of archival curators of papers, maps, diaries and the like, we are now speaking of individuals who have the skills to select, organize, preserve and make accessible the rich data that we are producing in the form of digital products. These products might be born digital, such as email messages or online maps, but they are also the products that we select to digitize and make available in a new format. Historians, librarians, and other information professionals will soon find themselves relying upon the skills of individuals we are just starting to train to perform tasks we are just starting to understand and define. [4]
1. "Benchmark," American Heritage Dictionary, 4th ed., 2000.
2. This project was a collaboration between members of the Albany, New York, Capital District Library Council and the consulting team of Fawzi Mulki, Andrew Whitmore, and Deborah Andersen from the Department of Informatics, College of Computing and Information at the University at Albany. Additional materials are available on request from this author.
3. For example, see the website for the University of Illinois Urbana-Champaign Graduate School of Library and Information Science concentration in data curation which focuses on data collection, preservation, archiving, data standards, and policy at http://www.lis.uiuc.edu/programs/ms/data_curation.html.
4. See Shannon K. Supple. "Managing and Archiving Records in the Digital Era: Changing Professional Orientations." American Archivist, 70 (2, Fall/Winter 2007): 415‐419, for a discussion of changes in professions brought about by digitization.