    Part 4. Writing with the Needles from Your Data Haystack

    How are electronic databases and text-analysis tools changing how historians research and write about the past? Are we finding more “needles in the haystack” that we otherwise might not have noticed? Ansley Erickson launches this section with “Historical Research and the Problem of Categories: Reflections on 10,000 Digital Note Cards,” which richly illustrates how using a relational database package reshaped her dissertation source-work and writing process and led her to reflect on broader questions of historical categorization. Reflecting on their long-term collaboration, Kathryn Kish Sklar and Thomas Dublin describe the transformation of their intellectual goals, technology, funding, and global audience, in “Creating Meaning in a Sea of Information: The Women and Social Movements Web Sites.” Finally, in “The Hermeneutics of Data and Historical Writing,” Fred Gibbs and Trevor Owens argue that historians should emphasize our research methods more than traditional narratives, with a case study using such tools as the Google Books Ngram Viewer.

    Historical Research and the Problem of Categories: Reflections on 10,000 Digital Note Cards

    Once while taking a break at an archive, I stood at the snack machine alongside a senior historian. She let out a tired sigh and then explained that she was at the beginning of a project, at the point “where you don’t know anything yet.” For historians, research often takes a nonlinear or even meandering form, through many phases of uncertainty and redefinition. As global historian William McNeill described it, we begin with a sense of a historical problem and explore it through reading, which cyclically “reshapes the problem, which further directs the reading.” This back-and-forth can continue right up to publication. We might be more bold, like Stephen Ramsay, and celebrate the “serendipitous engagement” that happens when “screwing around” with sources, enjoying intellectually productive browsing and exploration. Whether we look forward to or struggle through these phases, much of our work happens while our research questions are still in formation.[1]

    Uncertainty is, therefore, a core attribute of our research process, one that we might take as evidence that we are guided by our sources. Yet it can produce challenges as well. How do we proceed to do research—the real nuts and bolts of it—if we acknowledge such uncertainty? How can we organize information and keep it accessible in ways that will facilitate our ongoing thinking and writing, if we acknowledge changing focal points or areas of interest?

    To research my dissertation, “Schooling the Metropolis: Educational Inequality Made and Remade, Nashville, Tennessee, 1945–85,” I started with various questions about desegregation in Nashville, Tennessee: Why did black students ride buses more and longer than white students? Was this due to power imbalances, ideologies, or explicit policies? Was the nature of Nashville’s economy relevant? I gradually worked my way toward the question I came to address—how the pursuit of economic growth fed educational inequality.[2]

    This essay considers a central challenge of historical research, one present in any long-term research endeavor but made more acute by shifting research questions: the challenge of information management. In the summer of 2006, I had a viable dissertation prospectus and was about to embark on the first of my trips to the archives. I was excited and I was scared that I would forget things. I knew what it took to manage the information involved in a seminar-length paper. Earlier, I had filled pages with handwritten notes or word-processed text, filtering through them as I built an argument. But how would I manage a project that would extend over years of research and writing? Where, in the most literal sense, would I put all of the information, so that I could find it when drafting chapters or, much later, revising for publication? I needed something that would backstop my own memory yet allow for shifts in my thinking. I also had to ensure that information stayed in the context of its originating source, while distinguishing between material from the sources and my interpretation of them.

    Following the example of some more senior graduate students and one young faculty member in my department, I decided to use a relational database to keep my notes.[3] I was far from the cutting edge of digital history or information sciences. As I designed my database, I leaned on the very analogue metaphor of the note card. Rather than reconceptualizing my historical work in deep interaction with new tools, as many scholars in digital history (including several in this volume) have done, I used a new tool to do familiar aspects of research in a more accessible and efficient way.[4]

    In the process, I came to see information management as a consequential aspect of historical research. How we organize and interact with information from our sources can affect what we discover in them. Scholars of the archive and of the social history of knowledge have long observed the consequences of how people keep information, and historians have considered the impact of archival practices on their own findings.[5] Their work raises useful questions about historians’ own research processes—questions highlighted during work with databases. Particularly, where, when, and how do we categorize information; how do we interact with these categories as we think and write; and what can we do so that we do not become bound up in the categories we create at the most uncertain stages of our research?

    Although the quantity and functionality of digital tools for data management, as well as attention to these tools, have increased in the last few years, they are not yet fully woven into the fabric of the profession. Some of this may be generational; but it also results from our discipline’s relative lack of formal conversation about methodology at the granular level. Graduate training programs paradoxically structure their training as internships in the consumption and production of history yet offer little explicit guidance on the mechanics involved.[6] When new tools emerge, their potential utility may not be appreciated fully. Database programs can have broad impact on how we interact with information, but much discussion of them emphasizes their use in the narrower work of bibliographic and citation management.[7]

    While neither an early nor an innovating database user, I offer this account to illustrate some potential benefits and lessons from my modest use of this tool. I first lay out how I organized my research and how it related to my thinking and writing. (See images that document my process in the web version of this essay at http://WritingHistory.trincoll.edu.) Then I venture some connections between that process and questions in the social history of knowledge and the scholarship of the archive—questions about the making and impact of categories in thought.

    Database Note Keeping

    Having decided to keep notes in a database, I selected a program: FileMaker Pro. There are many alternatives: some designed for qualitative research (NVivo, Atlas.ti), some free and web-compatible (such as Zotero), and others that emerge periodically.[8] Historians who write code can create their own. I began by creating two FileMaker layouts, one for sources and another for the “note cards” from those sources.[9] Guessing at how I might later sort and analyze my notes, I made a keyword field for themes I expected to recur. Zotero, which I use in current projects, provides a similar structure for sources, notes, and keyword “tags.”
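
    For historians who do write their own code, the structure just described might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than FileMaker Pro, and every table name, field name, and sample value (sources, notes, note_keywords, the placeholder transcript) is hypothetical rather than a record of the actual dissertation database.

```python
import sqlite3

def build_note_database():
    """Create an in-memory database with sources, note cards, and keywords.

    All table and column names are hypothetical; this is not the FileMaker layout.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sources (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            archive TEXT,
            source_date TEXT            -- ISO date string, e.g. '1970-01-01'
        );
        CREATE TABLE notes (
            id INTEGER PRIMARY KEY,
            source_id INTEGER NOT NULL REFERENCES sources(id),
            quotation TEXT,             -- words taken directly from the source
            commentary TEXT             -- the researcher's own observations
        );
        CREATE TABLE note_keywords (
            note_id INTEGER NOT NULL REFERENCES notes(id),
            keyword TEXT NOT NULL,
            PRIMARY KEY (note_id, keyword)
        );
    """)
    return db

def add_note(db, source_id, quotation, commentary, keywords=()):
    """Insert one note card and attach any keywords to it."""
    cur = db.execute(
        "INSERT INTO notes (source_id, quotation, commentary) VALUES (?, ?, ?)",
        (source_id, quotation, commentary),
    )
    note_id = cur.lastrowid
    db.executemany(
        "INSERT INTO note_keywords (note_id, keyword) VALUES (?, ?)",
        [(note_id, keyword) for keyword in keywords],
    )
    return note_id

# Placeholder data only: one source and one note card drawn from it.
db = build_note_database()
source_id = db.execute(
    "INSERT INTO sources (title, archive, source_date) VALUES (?, ?, ?)",
    ("Placeholder court transcript", "Placeholder archive", "1970-01-01"),
).lastrowid
note_id = add_note(
    db, source_id,
    "Placeholder quotation.", "Placeholder observation.",
    ["vocational education"],
)
```

    Keeping quotation and commentary in separate fields mirrors the distinction between material from a source and interpretation of it, and the source_id link keeps each card attached to its originating source.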

    In trips to several archives over a year, I collected tens of thousands of pages of documents by photographing them digitally.[10] I read and took notes on a portion on site, in those collections that prohibited digital copying or charged exorbitantly for physical copies. Because I had very limited time to work on-site at archives, most of my note taking happened once I returned home. I read digital copies on one screen. On the other, I entered notes in the database, putting direct quotes in one field, my observations and tentative analysis in another (see fig. 5). (Zotero uses a single note-taking field.) The vast majority of my note cards were descriptive, but when I had a thought that tied various sources together or hinted at an argument, I made a new note card, titled “memo to self,” and then these entered the digital stack as well, tagged with keywords.

    Once I had worked through most of my documents, I had nearly 10,000 note cards. I used the database as I began my analysis and sense making. I first ran large searches based on my keywords: searching hundreds of note cards on “vocational education,” for example. I organized these cards chronologically—an action that takes only a few keystrokes—and spent a day or more reading them through. As themes or patterns began to emerge or as there were connections to other sections of my research that were not under the “vocational” heading, I ran separate searches on these, incorporating that material into the bin of quotes and comments I was building by cutting and pasting into a new text document. (Databases often have “report” functions that could help this process, but I did not explore that route.) Of course, sorting information can be done without a database. But I found it to happen quickly and more easily with one.
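
    The keyword search and chronological sort described above can be illustrated with a short sketch. It assumes a simplified note card held in memory as a Python dataclass; the NoteCard fields, the cards_for_keyword helper, and the sample cards are all hypothetical stand-ins, not the FileMaker workflow itself.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class NoteCard:
    """A simplified, hypothetical note card."""
    source: str
    source_date: date
    text: str
    keywords: set = field(default_factory=set)

def cards_for_keyword(cards, keyword):
    """Return the cards tagged with `keyword`, ordered from oldest source to newest."""
    matches = [card for card in cards if keyword in card.keywords]
    return sorted(matches, key=lambda card: card.source_date)

# Placeholder cards, deliberately entered out of chronological order.
cards = [
    NoteCard("Placeholder report B", date(1971, 5, 1), "Placeholder note.",
             {"vocational education"}),
    NoteCard("Placeholder minutes", date(1958, 3, 12), "Placeholder note.",
             {"school construction"}),
    NoteCard("Placeholder report A", date(1964, 9, 30), "Placeholder note.",
             {"vocational education", "cognitive map"}),
]

reading_list = cards_for_keyword(cards, "vocational education")
for card in reading_list:
    print(card.source_date.isoformat(), card.source)
```

    Because the sort is computed on demand, the same stack of cards can be regrouped under a different keyword at almost no cost, which is the practical advantage over physically reshuffling paper cards.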

    Having reviewed my research material, I began to draft a section of a chapter. I started to write before I was sure of the precise structure of the chapter or my detailed argument. I used writing as a way to find and refine my argument. Crafting a basic narrative often helped me identify what I was missing, what I needed to find out more about. Writing in this exploratory fashion was made easier by quick access to bits of information from the database as needed.[11]

    Using a database did accomplish the most basic of my goals. It proved a reliable and convenient way to keep notes and contextual information in the same place, and it addressed my most basic fear of forgetting, by allowing searches for information in myriad ways—by title, content of notes, direct quotations, keywords, dates. As my writing advanced, I came to appreciate how the database’s full-text searchability allowed me not only to follow my original questions but to explore ones that I had not anticipated at the start of my research. This mode of note keeping allowed me, as I thought and wrote, to access information that I would have missed otherwise—likely because of the difficulty of tracking down and reordering notes without such a database. Two examples illustrate this accessibility.

    Fig. 5. FileMaker Pro screenshot of sample notes on a court transcript

    One central problem in my work has been understanding the multiple layers of inequality at work in Nashville’s desegregation story. There are, of course, salient and central differences by race and by class, but these divisions were often expressed in the language of geography. By the mid-1960s, residents, planners, and educators used the phrase inner city to indicate predominantly black neighborhoods or neighborhoods where planners predicted black population growth. I had noticed this pattern in my own reading and had captured examples of such language and other descriptions of geographic space with a keyword: cognitive map. To read about this phenomenon, I worked through all of my “cognitive map” notes, in chronological order. Through several conference papers and draft chapters, I developed an argument about how pro-suburban bias informed Nashville’s busing plan. In early versions, I seemed to imply that in Nashville residents’ cognitive maps, the correlations between suburban space and white residents and between urban space and black residents were absolute. But were there exceptions? What could I do to test this? I searched for instances where my sources used the phrase inner city. Of course, I may not have written down each instance, as I did not plan for this textual analysis. Nonetheless, I had enough to begin.

    When I read my sources in this way—some of which I had labeled “cognitive map,” some not—I saw something new. Among the critics of schooling in the “inner city” and the smaller group of its defenders, there was a case that proved that the identification of urban space with black residents was not complete, at least for some city residents. I had earlier made notes and then forgotten about the story of a central-city school that was historically segregated white, remained largely working class, and had a local council representative fighting to retain the school in conjunction with what he labeled its surrounding inner-city neighborhood. William Higgins, the council representative, asked, “You’re taking children from the inner city and busing them to suburbia. Why place the hardship on them? Why not bring children from suburbia to the inner city?” He later proposed, “All new schools . . . should be unified with the inner-city, otherwise the city finds itself a lonely remnant, disunited and eventually abandoned.”[12] When I read these passages in the first years of my research, I had not thought to tag them with the keyword cognitive map. Thus they did not show up in that keyword search over two years later. I was able to discover them again because I could search for a phrase laden with meaning and insinuation. Doing so yielded access to notes that influenced my understanding of how categories of race, geography, and class overlapped in my story and where they diverged.
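
    The difference between searching by keyword and searching the full text of notes can be made concrete with a small sketch. It is an illustration only: the notes are plain Python dictionaries, the helper names (normalize, by_keyword, by_phrase) are hypothetical, and apart from the Higgins question quoted above, the sample text is placeholder material. Treating hyphens as spaces is an added assumption so that “inner-city” and “inner city” match one another.

```python
def normalize(text):
    """Lowercase and treat hyphens as spaces so 'inner-city' matches 'inner city'."""
    return " ".join(text.lower().replace("-", " ").split())

def by_keyword(notes, keyword):
    """Return only the notes the researcher tagged with `keyword`."""
    return [note for note in notes if keyword in note["keywords"]]

def by_phrase(notes, phrase):
    """Return every note whose quotation or commentary contains `phrase`."""
    needle = normalize(phrase)
    return [
        note for note in notes
        if needle in normalize(note["quotation"] + " " + note["commentary"])
    ]

notes = [
    {"id": 1, "keywords": {"cognitive map"},
     "quotation": "Placeholder quotation describing the inner city.",
     "commentary": ""},
    # Never tagged when first read, like the Higgins passages.
    {"id": 2, "keywords": set(),
     "quotation": "Why not bring children from suburbia to the inner city?",
     "commentary": ""},
    {"id": 3, "keywords": {"cognitive map"},
     "quotation": "Placeholder quotation about suburban growth.",
     "commentary": ""},
]

tagged = {note["id"] for note in by_keyword(notes, "cognitive map")}
phrase_hits = {note["id"] for note in by_phrase(notes, "inner-city")}
recovered = phrase_hits - tagged   # found by phrase, missed by keyword
print(sorted(recovered))
```

    The set difference isolates exactly the notes that the original categories would have hidden: material read early, before the keyword seemed relevant, but still retrievable by its own language.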

    In another case, I found that the database allowed me to reframe an initial research question into a broader one. From the start, my dissertation was concerned with why schools were built where they were, how locations got chosen, to suit whose interests. I thought of schools as a good being struggled over in political and economic terms. After analyzing the local politics of school construction, I understood that my story was not about schools alone but about how the distribution of public goods reflected the political and economic structures that supported metropolitan inequality.

    I had been tracing how urban renewal funds subsidized school construction and how, in the context of a metropolitan government, such subsidies could allow a municipality to shift more of its own tax revenues to its suburban precincts. I suspected that this use of urban renewal dollars to reduce the local commitment to supporting city areas in favor of suburban ones was visible in other areas of city services as well. How could I illustrate that broadened claim? I could see what my sources—planning reports, maps, records of community meetings—said about another kind of public good, to see if the dynamics were similar. I knew that I had made some notes about the building and repair of sewer lines for the city and surrounding suburbs, but I had not expected to write about them, so I had no related keyword. Text searchability of the database meant that I could very easily track down everything I had about sewers, organize it chronologically, and test if the pattern I saw for schools fit for sewers as well. Without fully searchable notes, I would have been looking through stacks of note cards, organized to fit another set of categories entirely. I may not have felt I had the time to expand my original question to a broader one.

    In each of the examples just presented, the database helped relevant information jump out of the noise of years of research and thinking. It helped make that information available relationally, easily connected to other information.

    Categories and the Making of Historical Knowledge

    Reflecting on my use of this digital tool for note keeping has led me to questions about how we think about our research practice, how we understand the relationship between how we research and what we learn. Recent work in the social history of knowledge and the history of the archive shares a core interest in categories—where they come from, what assumptions or values they represent, how they can be reified on paper or in practice.[13] These interests are relevant to our research methods. In researching and writing my dissertation, I was able to set out initial categories of analysis (via keywords), but it was possible, at no great expense of time, to throw these out. Sometimes I used my initial keywords, and sometimes I skipped over these to evaluate new connections, questions, or lines of analysis. If I had used pens and notebooks or a set of word processing documents, regrouping information would have required a great expenditure of time. I would have been less likely, then, to consider these new avenues, and my earlier categories of analysis would have been more determinative of my final work. Those categories would have been highly influential even though I created them when, in the words of the historian at the snack machine, I really did not know anything yet. Since there was virtually no time expended in trying out new questions utilizing the database, I could explore them easily. Thinking about how my database facilitated my analysis got me thinking about how historians construct, use, and rely on categories in our work.

    It makes sense that historians would think about categories, as we encounter them frequently in our work. As graduate students, we learn to identify ourselves by subfield: “I do history of gender” or “I’m an Americanist.” We are trained implicitly and explicitly to organize information and causal explanations into categories of analysis—race, class, gender, sexuality, politics, space, and so forth—when, in fact, these categories are never so neat and separate, whether in an individual’s life or in a historical moment. Then we research in archives that establish and justify their own categories—legal records divided by plaintiff or defendant, institutions that keep their records with an eye to confirming their power or reinforcing their independence. To make sense of a sometimes overwhelming volume of fact, all of which needs to be analyzed relationally, we rely on categories that we create as we work—like my database keywords.

    This matter of categories connects to at least two fields of scholarship. Scholars of the history of knowledge, such as Peter Burke, have examined the organizational schemes embodied in curricula, in libraries, and in encyclopedias and have shown how these structures and taxonomies represent particular ways of seeing the world. For Burke, such schemes reify or naturalize certain ways of seeing, helping to reproduce the view of the world from which they came. They also make some kinds of information more accessible, and some less.[14]

    Think, for example, of the encyclopedia. We are accustomed to its alphabetical organization of topics, but this structure, in fact, represented a break from previous reference formats that grouped subjects under the structure of classical disciplines. The alphabetized encyclopedia came about at a point when the previous disciplinary categories no longer could contain growing knowledge. A new, more horizontal model took their place, a model that allowed readers access to information by topic, outside of the hierarchies of a discipline. Burke points us to the importance of how we categorize information, where these categories come from, and how categorizations affect our access to and experience of information.[15]

    Anthropologist Ann Stoler comes to the problem of categories from a different perspective. She thinks of the archive as an active site for ethnography and seeks to understand how archives are live spaces in which the Dutch colonial state in Indonesia built, among other things, social categories. She traces how colonial administrators’ use of archiving categorized and assigned particular rights and privileges to people with different national heritages. As they categorized, they made some peoples’ experiences of the colonial state visible and obscured others. Stoler writes that categories are both the explicit subject of archives and their implicit project: “The career of categories is also lodged in archival habits and how those change; in the telling titles of commissions, in the requisite subject headings of administrative reports, in what sorts of stories get relegated to the miscellaneous and ‘misplaced.’” She then frames the archive as a place to understand “how people think and why they seem obliged to think, or suddenly find themselves having difficulty thinking,” in certain ways.[16]

    The work of scholars like Burke and Stoler implies questions for historians’ research processes. Burke’s work suggests that we investigate how categories of thought, either between disciplines or within them, affect us. Think of academic subfields, for example, the boundaries of which still shape the literatures we read (even as many try to transcend them) and still guide which archives we pursue or whether we think of particular questions as part of our domain. Stoler raises a different kind of question. At what points in our research, out of pragmatic necessity, out of a desire for intellectual order, or for yet other reasons, do we set out categories of evidence or thought that influence what we see and what we do not see? What kinds of tools could help us be more aware of these categories or could give us the flexibility to move beyond them when necessary or desirable?

    I hypothesize here that databases offer a kind of flexibility that can allow us to create and re-create categories as we work with notes, to adjust as we know more about our sources, about how they relate to one another and how they relate to the silences we are finding. That flexibility means that we can evaluate particular ways of categorizing what we know and then adapt if we realize that these categories are not satisfactory. In doing so, we are made more aware of the work of categorization and are reminded to take stock of how our ways of organizing help and what they leave out.

    The matter of flexible categorization touches on another strand of scholarship: archivists debating what postmodernism means for their work. How does the growing understanding of archives as spaces in which certain kinds of power are codified and justified and where information has to be understood relationally matter for the practice of archiving? Archival theorist Terry Cook argues that finding aids and item descriptions should be constantly evolving, adapting to new relevant knowledge about the item’s sources and its relationship to other archived and unarchived materials.[17] Working with databases provokes historians to think about how our note-keeping practices could seek such flexibility and relationality.

    Yet there are at least two cautions as well. One comes from the flatness of databases like the one I used. In Burke’s terms, my database was not a reference text organized along disciplinary lines. It was more like an alphabetized encyclopedia. Without hierarchies that keep each fact locked in relationship to others—through the structure of earlier historiography, for example, or through the categories of an archive’s collections—the historian has to be more intentional about seeing information in its context. If we can look across all of our notes at a very granular level and make connections across categories that we or others created, it becomes too easy to look at these bits of information devoid of context—a danger visible even in my own way of cutting and pasting out of my database. I linked bits of notes only to a source code, meaning that they could be read in less-than-direct connection to their origins. Digital bits seem very easily severed from their context. Zotero’s structure links sources and notes visually, which may help safeguard against this.

    More important, despite its usefulness in helping us see things we might otherwise have forgotten or missed, no database does the work of analysis. The two are, of course, interdependent—as they are in any digital or nondigital form of note keeping. The analytical work, the crucial sense making that pushes history writing from chronology to critical interpretation, still happens in our own heads. There other implicit categories or habits of thought might shape our analysis. There we decide whose stories to tell first, or we prioritize one set of historical drivers over another. Some of these habits reflect our deepest-held assumptions and beliefs. It is less easy to talk of these, and certainly less easy for an author to identify his or her own, than it is to speak of note keeping. Maybe bringing critical consciousness to the mechanical can prompt more reflection about the conceptual as well.

    It is also worth considering what kinds of concerns may arise for historians who have not yet made use of digital tools like databases in their own research. Historians surely value, maybe even romanticize, the encounter with sources in the archives. Does converting that textual, even textural, experience into digital note cards somehow deaden it? Does it render our research uncomfortably close to a social scientist’s coding and writing up of findings? Charlotte Rochez, responding to an earlier version of this essay, explained that she worried about sacrificing “some of deeper insights, interpretations and understanding induced from being more involved in sorting and interpreting the sources.”[18] Digital note taking may add to but does not of necessity replace varied encounters between researcher and sources—even “serendipitous engagement.” It remains possible to meander through your notes from a given collection or source, to look back at the original page (even in PDF or photocopied form). But it becomes newly feasible to look broadly across those collections and sources.

    One prompt for this volume came from the Journal of American History’s 1997 special issue that made public the process of academic peer review. David Thelen’s introduction to that issue raised questions about the work of history writing that seem important to revisit in light of digital innovations. The centerpiece of the issue was a submission by Joel Williamson, in which Williamson recounted his failure to perceive lynching’s centrality to and origins in American and Southern history. Two reviewers received Williamson’s piece with shock and dismay that he could have missed what they had appreciated as central for years. Despite this disagreement, or perhaps because of it, Thelen saw Williamson’s piece as issuing a challenge to historians to “think about what we see and do not see, to reflect on what in our experience we avoid, erase, or deny, as well as what we focus on.”[19] I see my attention to categories, to the possibilities and implications of how we choose to organize the information on which our interpretations rest, as a kindred effort.

    Acknowledgments: The author thanks Jack Dougherty and Kristen Nawrotzki for the invitation to reflect on research practice and for good feedback on this essay, Courtney Fullilove for reading suggestions, Seth Erickson for ongoing conversations about archives and information architecture, and all those who commented on earlier versions for their helpful remarks. The dissertation research described here was supported by a Spencer Dissertation Fellowship, a Clifford Roberts/Eisenhower Institute Fellowship, and a Mellon Interdisciplinary Graduate Fellowship at the Paul Lazarsfeld Center, Institute for Social and Economic Research and Policy, Columbia University.


    Creating Meaning in a Sea of Information: The Women and Social Movements Web Sites

    In 1997, funded by a small grant from the National Endowment for the Humanities (NEH), we set out to give U.S. women’s history a more substantial presence on the World Wide Web, then a rather modest and marginal new domain for history publishing. For six years, we focused on work with undergraduates at Binghamton University, State University of New York, and then with faculty and students at a dozen colleges and universities around the United States.[1] In this first stage, we published more than 40 document projects that constituted original research about the history of women and social movements in the United States.

    These document projects consisted of 20 to 30 primary documents complemented by an interpretive essay and other scholarly components, organized to answer a central historiographical question. Document projects have questions for titles, because our goal is to generate more focused scholarship than a topical framework might create. We sought, in this way, to combine new historical interpretations with the publication of valuable and often inaccessible primary sources. In launching this effort, we were struck by the way primary documents and interpretation supported one another and provided a distinctly richer combination for students and scholars than typically emerges from the scholarly article format.

    It was not simply a matter of the conjunction of the two kinds of resources; the electronic medium itself dramatically shaped and enriched our undertaking in important ways. The document project format was a felicitous combination of what historians do (analyze documents) with the Internet’s spaciousness and hypertext capacity. By publishing documents in their entirety and arranging them in document projects that are much more monographic than is economically feasible in print media, as well as by providing a robust database and search engine, our format and research tools permit readers to evaluate the evidence and arguments much more fully than is possible with a traditional journal. For example, our first document project, published in 1997, “How Did African-American Women Define Their Citizenship at the Chicago World’s Fair in 1893?,” is by far the most extensive monographic treatment of that topic. Bringing together 27 documents, including all the speeches of African American women at the World’s Congress of Representative Women, accompanied by an interpretive essay that analyzes the documents, it makes a substantial historiographic contribution to U.S. women’s history, African American history, and U.S. history.[2]

    We immediately recognized the power of this innovative but labor-intensive format, and with the support of NEH grants, we taught courses and employed graduate students that together produced dozens more. But since we wanted to produce authoritative, rather than student, work, we also began to involve a small group of colleagues in U.S. women’s history. In 2003, we came to a crossroads when we anticipated running out of grant funding and the modest support of Binghamton University. That year, we solved our financial crisis and entered a new stage of growth by partnering with Alexander Street Press (ASP) and becoming a peer-reviewed, online journal, Women and Social Movements in the United States, 1600–2000. ASP distributed our journal/database through subscriptions and purchases by academic libraries. Along with financial stability that permitted us to pay our staff, we thus acquired access to much more powerful database and search technology.[3]

    Innovative in its format and medium, the journal has grown remarkably in nine years. Because our partnership with ASP made it possible for us to use database functionalities and a powerful search engine, we decided that we would publish full-text scholarly collections of primary sources, often with accompanying interpretive introductions, as well as our signature document projects. Before long, we added book and website reviews, teaching tools, and news about U.S. women’s history from the archives. We also created another new format, the “document archive.” Bringing together a distinctly larger collection of primary documents—typically 60 to 80 in number—the document archive combines a brief interpretive introduction with a more extensive collection of documents. Document projects seek to “prove” a scholarly interpretation. Document archives provide a minimal interpretive framework for a larger group of documents.

    These gains came at a price—our site was no longer freely accessible. Our initial concern on that score was alleviated as the number of subscribing libraries grew, and we were soon accessed by more users than had visited our smaller Binghamton University site. By early 2013, almost 400 libraries provided access to the site for their students and faculty, about the same number of institutional subscribers as many print media journals. While subscription access imposes limits on our site’s use, we work with an online publisher that is highly respected by librarians for maintaining high standards in their online publications. So we consider ourselves part of a process by which libraries gain access to high-quality online scholarship, even though it is not funded by foundations or major research universities.

    Women and Social Movements in the United States (WASM) has steadily expanded over the years because faculty and librarians have thought well enough of it to fund it with library subscriptions. We hold ourselves directly accountable to those subscribers. In that regard, there is a more democratic aspect to our funding structure than in freely accessible sites that are designed at and funded by well-endowed institutions and foundations but are not accountable to end users.

    Our primary goal is to create new knowledge. We do so by integrating documents with the interpretation of documents. Sites that contain only documents predominate in U.S. history. Particularly notable among these are the American Memory site at the Library of Congress and Ed Ayers’s The Valley of the Shadow site, which includes documents from two counties in the Civil War period.[4] The documents on these wonderful sites are very valuable for students and scholars of U.S. history, who can use them to create new knowledge off-site. Our goal is to generate new knowledge on our site—with the publication of document projects and with extensive database functionalities that permit users to organize the data in new ways.

    To generate new knowledge, we take very seriously our responsibility to be authoritative with the documents and interpretations we publish. Our goal of being authoritative makes our editorial process extremely labor-intensive. Space prohibits us from describing it fully here, but a few examples will show what we have undertaken. First, our interaction with authors is extensive and complex. Scholars are not familiar with the document project format, so we need to help them navigate a steep learning curve. Each project presents unique challenges and new frontiers. We begin by asking authors to pose a historiographically meaningful question and then to prepare an annotated list of documents that address the question and a brief essay that shows us how the documents can be read to interpret the question. This stage of the process often takes about a year and a half, with frequent communication between us and an author to address historiographic issues or gaps or redundancies in the documentation. Then our peer review process evaluates the result, almost always suggesting more work, sometimes asking for greater clarity of interpretation based on the documents, and frequently calling for different or more documents. After authors accommodate peer review suggestions, the next stage of the process involves our work at Binghamton University with authors to provide authoritative citations and headnotes about the documents’ provenance. We and our authors contact archives to secure permission to publish online and verify that we have accurate metadata. We transcribe the documents because scanning and OCR do not produce sufficiently accurate texts for our database. Our Binghamton University shop carefully compares our transcriptions to the original documents. Our authors provide annotations for the documents and footnotes for their interpretative text, as well as bibliographies. The journal is indexed in America: History and Life and the “Research Scholarship Online” section of the Journal of American History, the two leading bibliographic resources in U.S. history, so WASM publications enter ongoing historiographical debates.

    Our search engine and database functionalities are central to our purpose of providing users the opportunity to create new knowledge by organizing the documentary data in new ways that are meaningful to them. We are constantly developing and enhancing the database functionalities. We began our site by key entering documents into HTML so that they could be more effectively indexed and made full-text searchable. With our partnership with Alexander Street Press, we shifted to standards in the Text Encoding Initiative, TEI-SGML and then TEI-XML.

    In addition to our goal of creating authoritative new knowledge, we also want to facilitate use of the site by scholars who are not historians of American women. The stream of scholarship published about American women in the past 40 years is underrepresented in the mainstream of U.S. history. Perhaps scholars who are not able to explore the wide range of secondary writings about American women might be able to read and use documents on our site in their research and teaching of primary documents about women, especially if they relate to the scholars’ own research interests. Thus, for example, historians of the American Revolution might be interested in the writings of Esther Reed and other elite Philadelphia women who raised funds to support the patriot army during the war. Or they might want to explore Woodrow Wilson’s reimposition of segregation in federal offices in 1913.[5] In the document project format, these topics invite the exploration of historical methods associated with cause, effect, periodization, audience, power, and, of course, class, race, and gender. But they can also be used simply to supplement what readers already know.

    We have a third goal of drawing scholars in U.S. women’s history (and women’s history generally) into greater dialogue across specializations. Like other historical fields, women’s history has developed subfields that often make it impossible for scholars to learn about work outside their own precincts. Thus scholars of the history of women’s health in the antebellum era might not know about recent work on women’s labor history in the Progressive Era or on the history of African American women in the civil rights movement. It might be easier for scholars to learn about fields outside their own specialty by having the opportunity to access primary sources. Secondary works sometimes require a considerable commitment of time and familiarity with related historiography to digest. Primary documents offer a more direct route to learning.

    One brief example can show how a document project in U.S. women’s history might contribute to all three of these rationales: to generate new knowledge, to influence other fields of U.S. history, and to facilitate more communication among historians of American women. Carol Faulkner’s document project “How Did White Women Aid Former Slaves during and after the Civil War, 1863–1891?” analyzes the gendered construction of power in the freedmen’s aid movement during Reconstruction.[6] She offers documents that demonstrate women reformers’ opposition to the policies of such leading men as General Oliver O. Howard, head of the Freedmen’s Bureau, or Horace Greeley, noted editor of the New York Tribune. These men supported the early closing of the Freedmen’s Bureau in 1869 because they feared that assistance to freed people would create economic dependency. Eric Foner, the leading historian of Reconstruction, has described the dominant ethos in the bureau as reflecting “not only attitudes towards blacks, but a more general Northern belief in the dangers of encouraging dependency among the lower classes.”[7] Faulkner’s exploration of correspondence between General Howard and freedmen’s aid advocate Josephine Griffing shows that this group of women reformers sought to provide more generous long-term aid to freedmen and challenged Howard’s concern about dependency. Thus Faulkner’s document project alters our understanding of the possibilities during this major period of American history. By offering the full text of the sources on which its interpretation rests, it invites scholars and students to use those documents in their own work of interpreting American history.

    Our partnership with Alexander Street Press allowed us to expand each issue of our journal to include extensive full-text sources that are not part of document projects. This enhances the site’s resources and the meanings that can be derived from the site’s database. We now publish about 5,000 pages annually of full-text sources. Our first group in the fall of 2003 included about 30,000 pages of books, pamphlets, and convention proceedings related to the struggle for woman suffrage in the United States from 1830 to 1930. There, we brought together for the first time the published proceedings of the three women’s antislavery conventions of the 1830s and the proceedings of 15 women’s rights conventions that were held between 1848 and 1870. These resources enable scholars to analyze change over time in the women’s rights convention movement, viewing that movement much more fully than ever before. The database permits the retrieval of new knowledge in response to new questions. For example, researchers can identify the number of speeches or letters in convention proceedings that mentioned married women’s property rights or education or health. By exploring topics addressed, speakers named, and rhetoric employed at these conventions, historians can explore change over time much more systematically.
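
    The kind of question described above, such as how many speeches or letters in a given year mention married women’s property rights, can be illustrated with a minimal sketch in Python. It is not the WASM database or its search interface, which are not described here. The record fields (kind, year, text), the function name, and the sample records are all hypothetical placeholders invented for illustration.

```python
from collections import Counter


def mentions_by_year(documents, phrases, kinds=("speech", "letter")):
    """Count, per year, the documents of the given kinds that contain any phrase."""
    counts = Counter()
    lowered = [phrase.lower() for phrase in phrases]
    for doc in documents:
        if doc["kind"] not in kinds:
            continue  # skip document types outside the question being asked
        text = doc["text"].lower()
        if any(phrase in text for phrase in lowered):
            counts[doc["year"]] += 1
    return counts


# Invented placeholder records, not actual convention proceedings.
SAMPLE = [
    {"kind": "speech", "year": 1850, "text": "Married women must hold property rights."},
    {"kind": "letter", "year": 1850, "text": "On property rights and education."},
    {"kind": "speech", "year": 1860, "text": "Education for all."},
    {"kind": "resolution", "year": 1860, "text": "Resolved: property rights."},
]

property_counts = mentions_by_year(SAMPLE, ["property rights"])
education_counts = mentions_by_year(SAMPLE, ["education"])
```

    Comparing such counts year by year is one simple way to trace change over time in the topics a set of proceedings addressed. A plain substring match is a crude stand-in for a real search engine, and a historian would still need to read the matching documents.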

    Thus WASM offers the advantages of a database as well as a journal. Another example is our indexing of the six-volume History of Woman Suffrage, totaling some 5,800 pages.[8] These volumes were largely compilations of published and unpublished documents, and the database indexes these works in ways that permit scholars to search for the authors and titles of hundreds of separate documents included in the volumes. The database reveals more than 800 individual documents in those volumes, including 152 speeches, which users can further identify by author, race of author, date, and place (among other variables). We also reprinted, as full-text sources, works by the national and state branches of the League of Women Voters originally published between 1920 and 2000. This collection consists of 660 items totaling 8,000 pages and provides a valuable resource for exploring women in American public life after suffrage was achieved.

    We took on a big project of compiling all the publications that we could find by state and local commissions on the status of women between 1961 and 2005. This database, fully integrated into WASM, dwarfed our earlier efforts, including almost 1,900 items with some 90,000 pages. In a practice that we began to follow increasingly for the website, we commissioned scholarly essays to explore various dimensions of the state commissions’ publications. We have published eight essays in all, exploring such issues as economic security, race, sexuality, labor feminism, and conservatism as they can be found in the publications by commissions in the WASM database.

    The combination of these two threads—the work of scholars in document projects and the publication of full-text primary sources—means that WASM is much more than a journal whose articles are accessible online. This is a lot of work, and we fully understand why most websites in U.S. history do not weave together interpretation and documentation and why those that do so usually focus on one historical actor, event, or time period. But now that the World Wide Web offers a sea of information, we wish there were more sites that took the next step and helped scholars construct meaning within that sea.

    As with other venues for scholarship in U.S. history, we have witnessed and promoted the expansion of the field to include more international perspectives. Our authors have increasingly brought us projects pertaining to U.S. women and international social movements. Nancy Hewitt and her students at Rutgers University prepared a document project, which we published in 2003, on the relationship between the women’s rights movement of the mid-19th century in the United States and contemporary British and European feminism. Colonial themes also added an international dimension to the website: Tracy Leavelle explored the interactions of 17th-century French explorers and Jesuit missionaries with Illinois women, and Patricia Cleary researched women’s sexual, familial, and public roles in 18th-century St. Louis, successively an outpost of Spanish and French empires.[9]

    As we saw this focus emerging, we actively sought to nurture it with two collaborations. First, we established a “Canadian initiative,” aimed at encouraging document projects for a special issue on Canadian women and social movements that we published in September 2009. Second, we organized a collaborative project with Japanese and American historians, encouraging the submission of document projects that explore the interaction of women reformers from Japan and the United States since the Meiji Restoration of 1868. From this collaboration, we have already published three bilingual document projects, beginning in March 2009, and we see this project as a model for how we can contribute to the internationalization of the history of women and social movements in the United States.

    In 2007, we began to create a new, complementary online archive, Women and Social Movements, International—1840 to Present (WASM International).[10] For this project, we have drawn heavily on the international community of historians of women, on archivists around the world, on our talented Binghamton University graduate students, and on the technical and editorial skills of Alexander Street Press. The project went “live” in January 2011 and should be complete by September 2013. We hope it will greatly enhance scholarship about women and social movements internationally by providing a wide range of systematic sources, including the proceedings of more than 500 women’s international conferences.

    If WASM was “born digital,” WASM International was “born digital database.” Both were constructed by scholars with a view to creating new knowledge. WASM International was much larger at its creation and designed with a view to its systematic analysis. With that analysis in mind and with a self-imposed limit of 150,000 pages of documents, we knew we needed to be thoughtful. We assembled an international advisory board of scholars who assisted in the selection of the archive’s resources, meeting with 40 of them at the Berkshire Conference on Women’s History at the University of Minnesota in June 2008. They helped us move beyond our U.S.-centric beginnings and construct a truly international resource.

    As the work of the project progressed, the international advisory board grew dramatically, reaching more than 130 scholars in 2011. Another women’s history conference, at the Aletta Institute in Amsterdam in August 2010, gave us a second opportunity to present the project-in-progress to an international group of women’s history scholars. Their comments and assistance have been enormously helpful. Throughout our work, we shared bibliographies with members of our advisory board and received excellent recommendations for additions to the archive. We also expanded the archival dimension of the project over time; by 2011, we had secured extensive materials—scanned or digitally photographed—from the Sophia Smith Collection, the Schlesinger Library, the Swarthmore College Peace Collection, the Library of Congress, Hollins University, the Aletta Institute, the International Institute of Social History, and the National Library of Australia.

    All our work on WASM International was shaped and facilitated by the electronic revolution that had taken place since we first worked in 1997 with Binghamton students on Women and Social Movements in the United States. For example, we relied from the outset on the powerful database program Zotero, which allowed us to download online catalog records from WorldCat to our own database.[11] This ensured the accuracy of our metadata and permitted us to construct intermediate bibliographies to view how the archive was taking shape. We relied on the new media and posted our topical bibliographies on the website of the Center for the Historical Study of Women and Gender at Binghamton University, which our editorial advisors accessed on the Internet.[12] We also downloaded Zotero entries to spreadsheets that permitted us to keep track of our work, particularly the major effort involved in securing permissions to publish copyrighted materials. With these spreadsheets, we could analyze the contents of our archive as it grew—analyzing it by dates, geographical regions, and topical coverage so that we could periodically take stock of our work and identify areas that remained underrepresented in the archive. This work permitted us to contact our international scholarly advisors and ask them for help with our coverage in their areas of expertise.
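
    The stock-taking described above, tallying a bibliography exported to a spreadsheet by date or region to spot thin coverage, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the editors’ actual workflow. The column names ("Publication Year", "Region"), the helper names, the threshold, and the sample rows are hypothetical, and a real export may label its columns differently.

```python
import csv
import io
from collections import Counter


def decade(year_text):
    """Map '1915' to '1910s'; anything non-numeric becomes 'unknown'."""
    return f"{int(year_text) // 10 * 10}s" if year_text.isdigit() else "unknown"


def coverage_by(rows, field, bucket=lambda value: value):
    """Tally rows by one column, passing each value through an optional bucketing function."""
    counts = Counter()
    for row in rows:
        value = (row.get(field) or "").strip()
        counts[bucket(value) if value else "unknown"] += 1
    return counts


def underrepresented(counts, minimum):
    """Return the categories holding fewer than `minimum` items, sorted by name."""
    return sorted(key for key, n in counts.items() if n < minimum)


# Invented placeholder rows standing in for a spreadsheet export.
EXPORT = """Title,Publication Year,Region
Report A,1888,Europe
Report B,1915,Europe
Report C,1919,North America
Report D,,Africa
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
by_decade = coverage_by(rows, "Publication Year", bucket=decade)
by_region = coverage_by(rows, "Region")
thin_regions = underrepresented(by_region, minimum=2)
```

    Lists such as thin_regions give a quick view of where an archive is sparse, which is the kind of information that could prompt a request for help from advisors with the relevant expertise.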

    The electronic revolution assisted our work in still other ways. From the outset, we had decided that a focus on organizations would help us identify a core of publications relevant to the history of women’s international activism. Drawing on scholarly writings and using keyword and corporate author searches in WorldCat permitted us to identify the publications of about a hundred organizations that had emerged as key players in promoting women’s networks and activism from the mid-19th century to the present.

    As we worked, we became aware that library catalogs provide a biased vision of women’s international organizing. Established organizations with North American and European memberships and long histories were much more fully represented in library holdings than groups founded recently in the Global South. To complement resources found in major academic libraries, we searched the World Wide Web for online publications of contemporary nongovernmental organizations (NGOs). In this way, we identified 15 to 20 particularly important organizations, found compelling samples of their online publications, and worked with the organizations to improve our coverage of their activism.

    In the course of constructing the online archive, our work took on preservation dimensions. For example, when we could not find a good run of the annual proceedings of the World’s Woman’s Christian Temperance Union (WWCTU) in library catalogs, we contacted (at the suggestion of a colleague in Australia) the national WCTU office in Evanston, Illinois. The WCTU library in Evanston no longer maintains regular hours, but a volunteer addressed our request and soon sent us a duplicate set of the WWCTU proceedings, for scanning and future donation to an appropriate research library. Thus our project unearthed rare copies of proceedings that were not really available to the public, and we were able to publish the proceedings online for scholarly use.

    We have had a similar experience with the International Women’s Tribune Centre in New York. As our work progressed, a professor at Hollins University who had heard about our work told us that an important international activist, Mildred Persinger, had donated her papers to Hollins. Focusing on the United Nations world conferences on women, from Mexico City in 1975 to Beijing in 1995, her papers constituted an international gold mine. We visited the archive and arranged to photograph more than 3,000 pages of manuscript and published documents.

    A preservation project emerged when, anticipating our archival trip, we met Mildred Persinger at her home in Dobbs Ferry, New York. From her, we learned that the International Women’s Tribune Centre (IWTC), which she had founded and directed for many years, was closing its office and moving its files to storage. She mentioned that we should try to get copies of slideshows that the IWTC had produced for each UN conference and find a way to include them in our archive. After months of intense effort, we eventually secured copies of slides for all four of the conferences and scripts and cassette tapes of the audio portions of the original slideshows. The resources dated back to 1975, 1980, 1985, and 1995. The slides were discolored with age, and the cassette tapes were uneven in quality, but we had everything digitized, and student workers at Binghamton University skillfully restored the slides. A skilled off-campus videographer then melded together slides and audio for each slideshow, following the scripts that IWTC staff had originally created. From the aged and deteriorating slides and cassettes of an earlier generation of activists, we now have produced four high-quality videos of the original slideshows, which should be useful to scholars and activists for decades to come. This success was only possible because of the networking that was a part of our project, the cooperation of IWTC activists who wanted the history of the conferences to be widely disseminated, and the skills and resources we were able to mobilize at our university.

    Our experiences with the WWCTU and IWTC materials were repeated with other resources. Many of the published works we have included in the online archive are available at only one or two of the libraries whose collections are recorded in the WorldCat online catalog. By borrowing resources through interlibrary loan, scanning them, and securing permission from the original copyright holders to publish the works online, we have made these rare works accessible at what we anticipate will be hundreds of research libraries around the world. Similarly, we are including hundreds of online publications of contemporary NGOs, most of which cannot be found in academic or public libraries. These documents will have brief lifetimes on NGO websites, soon to be displaced by more recent publications that better fit the organizations’ changing programmatic and fund-raising priorities. We have created an online archive that presents a slice in time, documenting the priorities of women’s international activism between the mid-1990s and 2010 as expressed on these NGO websites.

    Finally, the preparation of an online archive dramatically expands the potential audience for the rare and fugitive materials we have chosen for inclusion. We have included about 30,000 pages of manuscript and published materials from archives in the United States and Europe, selected to provide depth of coverage of significant international women’s organizations and events. In each case, we secured permission from the archive and from copyright holders and then made arrangements for digital photography or scanning to produce electronic copies of the documents that best seemed to fit the selection criteria we established for the archive. These materials do not circulate, and in many cases, it is difficult to determine that the archive actually owns the items in question. Previously, users would have had to visit the archive and conduct research onsite; now, these resources will be accessible in hundreds of academic libraries around the world. While the number of the site’s subscribers at this point is quite small, what is striking about early trends is that about a third of early subscribers or purchasers are libraries outside of the United States. Institutions in Canada, Iceland, Norway, the Netherlands, the United Kingdom, China, New Zealand, and Australia are among the site’s early subscribers.

    Although the site’s resources are not freely available on the Internet, they could never have been created without the subscription plan that funded the four years of work that produced them. Access is limited but is steadily expanding over time as more libraries subscribe. We expect that WASM International will eventually reach as broad a base of subscribers as WASM. Users who are not students or faculty at subscribing institutions can access the databases by visiting a subscribing library and using the resources there. This is the best we can do right now. We are still at an early stage in the evolution of this subscription model, and we continue to consider how we might better serve the needs of scholars and students at nonsubscribing libraries.

    A related issue is the concern about how access might be affected should Alexander Street Press decide to stop supporting the databases. Two provisions in the contracts related to the databases anticipate this possibility. First, we, as editors, hold the copyright on the documentary content of the databases, while ASP holds the copyright on the software that provides the user interface, the search engine, and the associated database. If ASP went out of business, we would be free to try to find another way to keep the sites’ content on the Internet—for example, by approaching a large research university or foundation and securing the needed funding to create a new site for our documents and document projects. Second, many academic libraries that provide access to the databases have purchased them from ASP. The press supplies purchasing libraries with a copy of the database, thereby ensuring its availability in the future. For Women and Social Movements in the United States, more than 180 libraries have purchased the database, assuring its online availability whatever might happen to ASP.

    What conclusions can we draw from this survey of our work on the Women and Social Movements websites since 1997? How does our work illuminate the emergence of electronic resources for historians in the past 15 years? First, while our topic might be perceived as narrow—women and social movements—it actually cuts a broad swath through all of women’s history, as well as U.S. and world history, and speaks to broad issues of social reform that have shaped the mainstream of other national and international histories. So it models how meaning can be constructed in the oceans of information flowing on the Internet. Second, our document projects on Women and Social Movements in the United States point to the crucial connection between historical interpretation and primary source documents, permitting historians to share their methods with others and permitting readers to examine historians’ primary sources and reach their own interpretive conclusions. Third, both projects draw on the participation of a broad community of scholars. At the same time, these sites help build and reinforce that community. The electronic revolution—supplemented by face-to-face meetings—has enabled us to involve hundreds of historians of women in the United States and elsewhere in publishing Women and Social Movements in the United States, and we have relied on the editorial advice of hundreds of historians, librarians, archivists, and activists internationally to construct Women and Social Movements, International. The electronic revolution has made this kind of collaboration easier, and we hope that our example might be useful to others who set out to create collaborative projects.

    From our reliance on WorldCat and other online catalogs to our use of the Zotero database program, WASM and WASM International have been born digital. Only with the electronic circulation of e-mail messages, attachments, bibliographies, and document scans have we been able to mobilize the women’s history community in the construction and use of these resources. The digital age is especially meaningful for historians. It is an exciting time to be working in circumstances so different from those we encountered when we first acquired the tools of our craft.


    The Hermeneutics of Data and Historical Writing

    Ongoing digitization of primary sources and the proliferation of born-digital documents are making it easier for historians to engage with vast amounts of research material. As a result, historical scholarship increasingly depends on our interactions with data, from battling the hidden algorithms of Google Book Search to text mining a hand-curated set of full-text documents. Even though methods for exploring and interacting with data have begun to permeate historical research, historians’ writing has largely remained mired in traditional forms and conventions. This essay discusses some new ways in which historians might rethink the nature of historical writing as both a product and a process of understanding.

    We argue that the new methods used to explore and interpret historical data demand a new level of methodological transparency in history writing. Examples include discussions of data queries, workflows with particular tools, and the production and interpretation of data visualizations. At a minimum, historians’ research publications need to reflect new priorities that explicate the process of interfacing with, exploring, and then making sense of historical sources in a fundamentally digital form—that is, the hermeneutics of data.[1] This may mean de-emphasizing narrative in favor of illustrating the rich complexities between an argument and the data that supports it. It may mean calling attention to productive failure—when a certain methodology or technique proved ineffective or had to be abandoned. These are precisely the kinds of lessons historians need to learn as they grapple with new approaches to making sense of the historical record.

    In this essay, we consider data as computer-processable information. This includes measurements of nearly every kind, such as census records, as well as all types of textual publications that have been rendered as plain text. We must also point out that while data certainly can be employed as evidence for a historical argument, data are not necessarily evidence in themselves. Nor do we consider data necessarily to be a direct representation of the historical record, as they are also produced by tools used to investigate or access large datasets. Given the myriad forms that data can take, making sense of data and using them as evidence has become a rather different skill for historians than it once was. For that reason, we argue that the creation of, interaction with, and interpretation of data must become more integral to historical writing.

    We call on historians to publicly experiment with ways of presenting their methodologies, procedures, and experiences with historical data as they engage in a cyclical process of contextualization and interpretation. This essay hopes to encourage more dialog about why historical writing must foreground methodological transparency and free itself from the epistemological jitters that make many historians wary of moving away from close readings or embracing the notion of the historical record as data.

    Data in History

    Use of data in the humanities has recently attracted considerable attention, no project more so than Culturomics, a quantitative study of culture using Google Books.[2] Of course, the idea of using data for historical research is hardly new, whether in the context of quantitative history, early work from the Annales school, or work done under the rubric of humanities computing. Yet the nature of data and the way it has been used by historians in the past differs in several important respects from contemporary uses of data. This is especially true in terms of the sheer quantity of data now available that can be gathered in a short time and thus guide humanistic inquiry. The process of guiding should be a greater part of our historical writing.

    Some scholars who work within the domain of the digital humanities have begun to think and write more explicitly about data and its potential for new kinds of research. For example, some Shakespeare scholars have been using statistical procedures to identify language features that signal classification in dramatic works.[3] The Stanford Literary Lab has been rethinking the nature of genre through semantic analysis. Yet most projects, including these, continue to be largely confirmatory, like reinforcing the periodization of Shakespeare’s plays or confirming the codified family of literary genres. To be clear, this is not a criticism of these projects and their outcomes—they are, in fact, a crucial step forward. As humanists continue to prove that data manipulation and machine learning can confirm existing knowledge, such techniques come closer to telling us something we do not already know. Other large-scale research projects, like those funded through the Digging into Data initiative, have begun to explore the transformative potential of data in humanities research as well.[4]

    However, even these projects generally focus on research (or research potential) rather than on making their methodology accessible to a broader humanities audience. To some extent, legitimizing digital work requires an appeal to the traditional values (and forms) of the nondigital humanities. But how can digital historians expect others to take their new methodologies seriously when new ways of working with data (even those that involve no sophisticated mathematics) remain too much like an impenetrable and mysterious black box? The processes for working with the vast, diverse, and easily accessible sets of data now available suggest a need for historians to formulate, articulate, and propagate ideas about how data should be approached in historical research.

    Toward a Hermeneutics of Data

    What does it mean to “use” data in historical work? To some extent, historians have always collected, analyzed, and written about data. But having access to vastly greater quantities of data, markedly different kinds of datasets, and a variety of complex tools and methodologies for exploring it means that the term using signifies a much broader range of data-related activities than it has previously. The rapid rate of data production and technological change means that we must continue to teach each other how we are using and making sense of data.

    We should be clear about what using data does not imply. For one, it does not refer only to historical analysis via complex statistical methods to create knowledge. Even as data become more readily available and as historians begin to acquire skills in data manipulation as part of their training, rigorous mathematics is not necessarily essential for using data efficiently and effectively. In particular, work with data can be exploratory and can deliberately forgo the mathematical rigor that social scientists must use to support their epistemological claims. Using data in this way is fundamentally different from using data for quantifying, computing, and creating knowledge as per quantitative history.

    Similarly, historians need not treat and interpret data only for rigorous hypothesis testing. This is another crucial difference between our approach and the approaches of the cliometricians of the 1960s and ’70s.[5] Perhaps such a potential dependence on numbers became even more unpalatable to nonnumerical historians after an embrace of the cultural turn, the importance of subjectivity, and a general epistemological stance against the kind of positivism that underpins much of the hypothesis testing baked into the design of statistical procedures and analytical software.

    But data does not always have to be used as evidence. It can also help with discovering and framing research questions. Especially as increasing amounts of historical data are provided via, or can be viewed with, tools like Google’s Ngram Viewer (to take a simple example), playing with data—in all its formats and forms—is more important than ever. This view of iterative interaction with data as a part of the hermeneutic process—especially when explored in graphical form—resonates with some recent theoretical work that describes knowledge from visualizations as not simply “transferred, revealed, or perceived, but . . . created through a dynamic process.”[6] Data in a variety of forms can provoke new questions and explorations, just as visualizations themselves have been recently described as “generative and iterative, capable of producing new knowledge through the aesthetic provocation.”[7]

    As the investment of time and energy to acquire data decreases, rapidly working with data can now be a part of historians’ early development and exploration of a research question. It can quickly illustrate potentially interesting pathways that are ultimately dead ends of scholarly research—“negative results,” perhaps, that should not be discarded as they likely would be for a typical scholarly book or journal article. It bears repeating that using large amounts of data for research should not be considered opposed to more traditional use of historical sources. As historical data become more ubiquitous, humanists will find it useful to pivot between distant and close readings. More often than not, distant reading will involve (if not require) creative and reusable techniques to reimagine and re-present the past—at least more so than traditional humanist texts do. For this very reason, it becomes insufficient to simply write about research as if it is independent of its methodology.

    Furthermore, rich datasets (like the Access to Archival Databases of the National Archives) and interfaces to data (like Google Fusion Tables) are making it easier than ever for historians to combine different kinds of datasets—and thus provide an exciting new way to triangulate historical knowledge.[8] Stephen Ramsay has suggested that there is a new kind of role for searching to play in the hermeneutic process of understanding, especially in the value of “screwing around” and embracing the serendipitous discovery that our recent abundance of data makes possible.[9] This could result, for example, in noticing within the context of London’s central criminal court, the Old Bailey, that trials about poisoning tend to refer to coffee more than to other beverages and very rarely refer to food.[10] Thus our methodologies might not be as deliberate or as linear as they have been in the past. This means we need more explicit and careful (if not playful) ways of writing about them.

    Methodology in Writing

    Despite some recent methodological experimentation with data, historians have not been nearly as innovative in terms of writing about how they use it. Even as scholars (at least in certain fields) have embraced communication with new media, historical writing has been largely confined by linear narratives, usually in the form of journal articles and monographs. The insistence on creating a narrative in static form, even if online, is particularly troubling because it obscures the methods for discovery that underlie the hermeneutic research process.

    Historical work has needed to tell a good story, but methodology has not made for a very good story or the kind of historical writing that is likely to be published in traditional venues. Although relatively simple text searches or charts that aid in our historical analysis are perhaps not worth including in a book, our searches and work with data have grown increasingly complex, as has the data available to us. While these can present new perspectives on the past, they can only do so to the extent that other historians feel comfortable with the methodologies that are used. This means using appropriate platforms to explain our methods. Does it make sense to explain new research methods that are wholly dependent on large datasets and their manipulation and visualization in a static book that distances the reader from the tools and techniques being described? Of course, the realities of the profession restrict publishing freedoms (no one has gotten tenure for a really good website version of their dissertation), but our work need not be restrained by a false dichotomy between new media and old media. We suggest that exploratory methodological work can exist online in a perfectly complementary way to more traditional publication venues—and that the symbiotic pairing will make both elements the better for it.

    Regardless of form, we need history writing that explicates the research process as much as the research conclusions. We need history writing that interfaces with, explains, and makes accessible the data that historians use. We need history writing that will foreground the new historical methods to manipulate text/data coming online, including data queries and manipulation, and the production and interpretation of visualizations. As John Unsworth suggested long ago with respect to hypertext projects, history writing should explicate failure wherever possible.[11] As Tim Sherratt and Bethany Nowviskie suggested in their comments on an earlier version of this essay, one inspiring model for a new kind of publication is the artist’s sketchbook that maps out ideas, explorations, false starts, and promising leads.[12]

    There is no question that humanists can be—and, in fact, are trained to be—skeptical of data manipulation. This is perhaps the preeminent reason why methodology needs to be, at least for now, clearly explained. With new digital tools, we are still groping to understand how to identify the best methods for very messy circumstances of historical data. However, the reasons why many historians remain skeptical about data are not all that different from the reasons they can be skeptical about text. Historians have long reflected on the theoretical advantages and practical limitations of various methodologies and approaches to textual research. Critical theorists and historians alike have commented on the slippery notion of a text; some excellent theoretical work on cybertext and hypertext has muddied the waters further. The last few years have complicated such a notion even more, as many traditional texts have come to be seen as data that can be quickly searched, manipulated, viewed from a variety of perspectives, and combined with other data to create entirely new research corpora. Just as the problematic notion of a text has not undermined the hermeneutic process, nor should the notion of data. It is clear that a new relationship between text and data has begun to unfold.[13] This relationship must inform our approach to writing as well as research.

    One way of reducing hostility to data and its manipulation is to lay bare whatever manipulations have led to some historical insight. Methodological tutorials, for example, not only would help legitimate the knowledge claims that employ them but would make the methodology more accessible to anyone who might recognize that the same or slightly modified approach could be of value in their own work. Beyond explicit tutorials, there are several key advantages in foregrounding our work with data: (1) it allows others to verify historical claims, (2) it is instructive as part of teaching and exposing historical research practices, and (3) it allows us to keep pace with changing tools and ways of using them. Besides, openness has long been part of the ethos of the humanities, and humanists continually argue that we should embrace more public modes of writing and thinking, as a way to challenge the kind of work that scholars do. For example, Dave Perry’s blog post “Be Online or Be Irrelevant” suggests that academic blogging can encourage “a digital humanism which takes down those walls and claims a new space for scholarship and public intellectualism.”[14] This cannot happen unless our methodologies with data remain transparent.

    Case Study: Becoming Users and Communities of Data

    Our theoretical and prescriptive remarks thus far will benefit from a concrete example—in this case, one that explores the history of the user. The notion of the user has become ubiquitous. We live in an era of usernames, user experiences, and user-centered design; we tacitly sign end user license agreements when we install software; we read user guides to figure out how to get our software to do what we want. But our omnipresent conception of ourselves as users obscures the history of the term.

    Of course, it now takes only seconds to follow this line of inquiry (the history of a term) and see the relationship between the presence of that term and any other similar terms, as Google’s Ngram Viewer allows anyone to chart the frequency of words or phrases across a subset of the digitized Google Books corpus.[15]

    Needless to say, the chart in figure 6 is not historical evidence of sufficient (if any) rigor to support historical knowledge claims about what is or is not a user. (See the original image in the web version of this essay at http://WritingHistory.trincoll.edu.) For one thing, Google’s data is proprietary, and exactly what comprises it is unclear. Perhaps more important, this graph does not indicate anything interesting about why usage of the term user spiked as it did—the real question that historians want to answer. But these are not reasons to discard the tool or to avoid writing about it. Historians might well start framing research questions this way, with quick uses of the Ngram Viewer or other tools. Conventionally, this work would remain invisible, and only “real” data would appear in published work, to support an argument of influence or causation. But foregrounding such preliminary work (like Ngram charts) will help readers to understand the genesis of the question, flag possible framework errors, and identify category mistakes, and it will perhaps inspire them to think about how such techniques might benefit their own work.

    To investigate the term user in more detail, one can use other online corpora to generate a series of radically different interpretive views. For example, searching in the Time Magazine Corpus allows one to see all of the collocates (words that appear within a specified number of words from the search term) and to display counts by decade.[16] A resulting list of the words that appear most often within a four-word window of the term user makes it easy to see that the word drug appears within four words of user thirty-two times. (See the image in the web version of this essay at http://WritingHistory.trincoll.edu.) To better make sense of these results, the collocates can be coded into two categories: those that have to do with drugs and those that have to do with technology. A spreadsheet created to chart these results drew attention to patterns in the data via cell highlighting to differentiate terms with two hits from those with more than two hits.
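
    The collocate search and the two-category coding described above can be approximated in a few lines of code. What follows is only a minimal sketch under stated assumptions, not the procedure used by the Time Magazine Corpus interface: the function names, the tiny sample text, and the coding scheme in CATEGORIES are all hypothetical and exist purely for illustration.

```python
import re
from collections import Counter

def collocates(text, term, window=4):
    """Count the words that fall within `window` tokens of each use of `term`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == term:
            start = max(0, i - window)
            # Neighbors on both sides of the hit, excluding the hit itself.
            neighbors = tokens[start:i] + tokens[i + 1:i + window + 1]
            counts.update(neighbors)
    return counts

# Hypothetical coding scheme: which collocates count as "drugs" or "technology".
CATEGORIES = {
    "drug": "drugs",
    "heroin": "drugs",
    "telephone": "technology",
    "computer": "technology",
}

def code_collocates(counts, categories=CATEGORIES):
    """Sum collocate counts by category; uncoded words are ignored."""
    totals = Counter()
    for word, n in counts.items():
        if word in categories:
            totals[categories[word]] += n
    return totals

# Invented sample text, standing in for a real corpus.
sample = (
    "The drug user was arrested. Every telephone user pays a fee. "
    "A computer user needs a manual."
)
counts = collocates(sample, "user")
coded = code_collocates(counts)
print(counts.most_common(5))
print(coded)
```

    Because the window is counted around every hit, a word that sits between two nearby occurrences of the search term is counted once for each. Whether that is desirable is exactly the kind of methodological choice that is worth stating when the results are shared.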

    Fig. 6. Google Books Ngram view of the frequency of selected words (user, producer, consumer, customer) from 1900 to 2000

    On the whole, this charting of collocates of “users” lends itself to some quick observations. For example, as far as the keywords in the Time Magazine Corpus suggest, the growth around the term user happened for drugs a bit earlier than for technology, although the latter context came to be the predominant one. We can also see that one of the first technology terms to appear is telephone, which perhaps suggests that the rise in usage of the term user may have to do less with the rise of computing (our typical conception of it now) than with the rise of networks.

    Going beyond the data—making sense of it—can be facilitated by additional expertise in ways that our usually much more naturally circumscribed historical data has generally not required. Owens blogged about this research while it was in progress, describing what he was interested in, how he got his data, and how he was working with it, as well as providing a link for others to explore and download the data.[17] Over the next week, the post was viewed over two hundred times; twenty-two researchers and librarians tweeted about the post. Most important, Owens received several substantive comments from scholars and researchers. These ranged from encouragement to explore technical guides and scholarship on the notion of the reader in the context of the history of the book to suggestions for different prepositions that could further elucidate semantic relationships about “users.” This discussion resulted from Owens having foregrounded his initial forays into data online, where it was easy to give different views of his data. Sharing preliminary representations of data, providing some preliminary interpretations of them, and inviting others to consider how best to make sense of the data at hand quickly sparked a substantive scholarly conversation. This is not to say that we should expect everyone to help with our own research. But because we have so much raw data that ranges widely over typical disciplinary boundaries, a collaborative approach is even more essential to making sense of data, and it benefits everyone involved, as the discussants can learn about data and methodologies that might be useful in their own work.

    In addition to accelerating research, foregrounding methodology and (access to) data gives rise to a constellation of questions that are becoming increasingly relevant for historians. How far, for example, can expressions of data like Google’s Ngram Viewer be used in historical work? Although a chart from historical data should not be automatically admitted as historical evidence in itself, it certainly can be used to identify curious phenomena that are unlikely to be artifacts of the data or viewer alone. But how does one cite data without “black-boxy” mathematical reductions and bring the data itself into the realm of scholarly discourse? How does one show, for example, that uses of the term sinful in the nineteenth century appear predominantly in sermon and other exegetical literature in the early part of the century but become overshadowed by more secular references later in the century? Typically, this would be illustrated with pithy, anecdotal examples taken to be representative of the phenomenon. But does this adequately represent the research methodology? Does it allow anyone to investigate for themselves or to learn from the methodology?

    It would be far better to explain the steps used to collect and reformat the data; ideally, the data would be available for download. The plain text file that was reformatted to show the aforementioned linguistic shift in the usage of sinful would be of considerable use to other researchers, who, in turn, would certainly make other observations and draw new and perhaps contradictory conclusions. Exposed data allow us to approach interesting questions from multiple and interdisciplinary points of view, in the way that citations to textual sources do not. Again, we are arguing not for wholly replacing close readings and textual analysis in historical research but, rather, for complementing them with our explorations of data. As it becomes easier and easier for historians to explore and play with data, it becomes essential for us to reflect on how we should incorporate this exploration and play as part of our research and writing practices. Is there a better way than to simply provide the raw data and an explanation of how to witness the same phenomenon? Is this the twenty-first-century footnote?


    Overall, there has been no aversion to using data in historical research. But historians have started to use data on new scales and to combine different kinds of data that range widely over typical disciplinary boundaries. The ease and increasing presence of data, in terms of both digitized and increasingly born-digital research materials, mean that the historian—irrespective of historical field—faces new methodological challenges. Approaching these materials in a context-sensitive way requires substantial amounts of time and energy devoted to understanding and exploring the particular ways and the degree of precision with which we can interpret data. Consequently, we have argued that historians should deliberately and explicitly share examples of how they are finding and manipulating data in their research, with greater methodological transparency, to promote the spirit of humanistic inquiry and interpretation.

    We have also argued that working with and writing about data does not mean that historians need to shoulder the kinds of epistemological burdens that underpin many of the tools that statisticians or quantitative historians have developed. This is not to say that statistics are not a useful tool for inquiry. But the mere act of working with data does not obligate the historian to rely on abstract data analysis. Historical data might require little more than simple frequency counts, simple correlations, or reformatting to make it useful to the historian looking for anomalies, trends, or unusual but meaningful coincidences.
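
    To suggest how little machinery a simple frequency count requires, the following sketch tallies whole-word occurrences of a term by decade. It is an illustration under stated assumptions rather than a method drawn from any of the tools discussed above: the function name, the layout of records as (year, text) pairs, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

def counts_by_decade(records, term):
    """Count whole-word occurrences of `term` per decade.

    `records` is an iterable of (year, text) pairs; this layout is an
    assumption made for illustration.
    """
    pattern = re.compile(rf"\b{re.escape(term.lower())}\b")
    totals = Counter()
    for year, text in records:
        decade = (year // 10) * 10
        totals[decade] += len(pattern.findall(text.lower()))
    return dict(sorted(totals.items()))

# Invented sample records, standing in for dated documents from a corpus.
records = [
    (1923, "The telephone user paid a fee."),
    (1927, "No relevant mention here."),
    (1985, "One user after another user logged in."),
]
decade_counts = counts_by_decade(records, "user")
print(decade_counts)
```

    Raw counts like these take no account of how much text survives from each decade, so they are better suited to spotting anomalies worth a closer reading than to supporting claims on their own.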

    To argue against the necessity of mathematical complexity is also to suggest that it is a mistake to treat data as self-evident or to assume that data implicitly constitute historical argument or proof. Historians must treat data as text, which needs to be approached from multiple points of view and as openly as possible. Working with data can be playful and exploratory, and useful techniques should be shared as readily as research discoveries. While typical history scholarship has largely kept methodology and data manipulation in the background, new approaches to writing can complement more traditional methods and venues and thus avoid some of their well-documented limitations, especially as those new approaches enable sharing data in a variety of forms.

    To best use the new kinds of historical data that have opened up new avenues of inquiry for virtually every historical specialty, gathering data, manipulating it, representing it, and, of course, writing about it should be required of all historians in training—not just those in digital history or new media courses. Of course, not all research projects will require facility with data. But just as historians learn to find, collect, organize, and make sense of the traditional sources, they also need to learn to acquire, manipulate, analyze, and represent data. Access to historical sources makes the historical record look rather different in the twenty-first century than it ever has before. Writing about history needs to evolve as well.


