Author: Deborah Lines Andersen
Title: Qualitative Data and Computers
Publication info: Ann Arbor, MI: MPublishing, University of Michigan Library
This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact email@example.com for more information.
Qualitative Data and Computers
Deborah Lines Andersen
vol. 5, no. 2, September 2002
Benchmark: a standard by which something can be measured or judged. 
.01. Struggling with Ulysses
In 1970, recently graduated from college and a new graduate student in English, I took a course on James Joyce's Ulysses, plunging into 783 pages of text that defied the rules of punctuation, narrative, and propriety that I was used to. The propriety part came from sections that some critics had called pornographic. There was, in fact, so much controversy over Ulysses that its appropriateness came before the United States District Court, Southern District of New York in "United States of America v. One Book called 'Ulysses,' Random House, Inc." After much study and deliberation, John M. Woolsey, United States District Judge, rendered a decision on December 6, 1933 that lifted the United States ban on Ulysses. In doing so, he wrote:
I am quite aware that owing to some of its scenes "Ulysses" is a rather strong draught to ask some sensitive, though normal, persons to take. But my considered opinion, after long reflection, is that whilst in many places the effect of "Ulysses" on the reader undoubtedly is somewhat emetic, nowhere does it tend to be an aphrodisiac.
"Ulysses" may, therefore, be admitted into the United States. (Opinion A. 110-59, United States District Court, Southern District of New York, December 6, 1933)
Thirty-seven years later this decision allowed my classmates and me to pore over the adventures of Leopold and Molly Bloom, Stephen Dedalus, and Buck Mulligan in the Dublin of 1904.
That poring, in 1970, was in the form of textual analysis for the inevitable papers we wrote in that graduate seminar. The analysis was by hand, searching all 783 pages for the appropriate references that would support the arguments we were putting forth. I still have my copy of Ulysses. It is full of my penciled marginalia for those papers.
.02. Ulysses and the Computer
In 1970 none of us were blessed with the technologies that exist today for careful, computer-assisted data analysis. I had a typewriter. The only scanners available were the human sort. If I were writing that paper today, I would first check whether Ulysses was available on the World Wide Web in full-text digital format. If so, I would download it to my hard drive, drag and drop it into a piece of content analysis software, and query the entire book about James Joyce's use of the word "atonement," the topic I chose for my final paper. If I could not find it on the Web, I could still use a digital scanner with good optical character recognition to move a digital version to my hard drive and perform this analysis.
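The kind of query described above can be sketched in a few lines of Python. This is only a minimal illustration and does not describe how any particular content analysis package works: the function name `keyword_in_context`, the `width` parameter, and the sample sentence are all assumptions made for the example, and a real run would read the downloaded or scanned text from a file instead.

```python
import re

def keyword_in_context(text, word, width=30):
    """Return each whole-word, case-insensitive match of `word`, together
    with up to `width` characters of context on either side."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        hits.append(text[start:end])
    return hits

# Hypothetical sample text standing in for a full digital edition.
sample = ("He spoke of atonement. Later, Atonement was mentioned again; "
          "atonements were not.")

for hit in keyword_in_context(sample, "atonement", width=10):
    print(hit)
```

Because the pattern matches whole words only, the plural "atonements" in the sample is skipped. Whether that is desirable depends on the research question, which is exactly the kind of decision the search strategies discussed later in this essay are meant to inform.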
The task of qualitative data retrieval and analysis is, perhaps obviously, critical to the historian as well as the literary critic. So much of what the historian does is based on qualitative, not quantitative, data. The hard part of constructing an historical argument is often the sifting through materials to find the right sources to support that argument. Primary source documents are surfacing on the World Wide Web at a tremendous rate, but they are often stored as bit-mapped images, so one cannot perform computerized content analysis on them. Furthermore, the grist of many historians' mills is not material found on the Web in the public domain, but the dusty papers of attics and archives. With the use of optical character recognition software, and software that will recognize handwriting and create digital text, the computer-savvy historian has the ability to peruse text at an alarming rate, eliminating documents that do not contain pertinent materials, and printing out sources, and pieces of sources, that are exactly what he or she was looking for.
.03. Tools for Textual Analysis
There are a variety of tools necessary for this sort of analysis. First and foremost, the student or scholar must understand historical methods. The computer is no substitute for a firm grasp of the basics of historical analysis. Texts and courses abound that deal with these competencies. Furthermore, students and academics need to have an understanding of qualitative data analysis in order to collect and analyze their data without fear of bias. 
Second, I believe that textual analysis using computer software requires a firm grasp of the search strategies that one learns in online database retrieval courses. As anyone who searches for information on the World Wide Web knows, one can use search terms that garner no results, or the wrong results, or so many results that the effort is worthless. Good searching and retrieval require finely tuned searching skills. 
Finally, textual analysis requires software programs that are designed to make searching efficient and effective. These programs are becoming more and more sophisticated in their applications. The digital scholar must therefore be comfortable with the technology—comfortable enough so that the greater part of his or her time is spent on historical analysis, not on struggling with the software. This issue is addressed in detail below.
Historical methods are not the focus of this essay. Search strategies and software programs, by contrast, are critical tools that deserve more attention in qualitative data analysis methodologies.
.04. Search Strategies
For the academic historian, and his or her students, the topic of information search strategies has become more and more important in the advancing age of the World Wide Web. As information is made available on the Web, students, especially undergraduates, tend to rely on it as their sole information source. Perhaps there will come a time when enough quality information is available on the Web so that scholars can do research with their computers as their only information portal. If individuals do not have the immediate help of a librarian or archivist, then they will need to have internalized a vast array of search strategies to find information for their work.
What are some of these skills? To begin with, searching requires a good grasp of language, and the tools to aid in that process. A thesaurus, a dictionary, and an encyclopedia are central to understanding a topic and finding words that would be good search terms. They are equally important to good online database searching. A quick search of the index to Roget's Thesaurus indicates that "compensation," "recompense," "reparation," "restitution," and "worship" are all searchable synonyms for "atonement." Furthermore, synonyms for "atone" include "compensate," "harmonize," "make amends," "make restitution," "put in tune," and "repay." To complicate matters a bit more, if one believes that Joyce was playing with words, then "atonement" also deals with a state of "at-one-ment" and one must also look up "agreeing," "in accord," and "unanimous" as synonyms for "at one."
Interestingly enough, the not-so-simple task of finding all the instances of "atonement" in 783 pages of Ulysses could be made much more complicated. A human being could spend months poring over the text. A computer could spend minutes. The question is, what would the scholar do with all the information that the computer might spit out?
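To make the synonym problem concrete, the following sketch counts how often each of several search terms appears in a text. It is a minimal illustration under stated assumptions: the term list simply repeats the thesaurus synonyms quoted above, while the function name `count_terms` and the sample sentence are invented for the example.

```python
import re
from collections import Counter

# Search terms taken from the thesaurus lookup described above.
SEARCH_TERMS = ["atonement", "compensation", "recompense",
                "reparation", "restitution", "worship"]

def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample text; a real analysis would load the full work.
sample = ("No reparation, no restitution; only worship and, at last, "
          "atonement. Worship again.")

for term, n in count_terms(sample, SEARCH_TERMS).items():
    print(term, n)
```

A tally like this shows at a glance which synonyms are worth pursuing and which never occur, before the scholar commits time to reading the passages themselves.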
Beyond the generation of search terms are the other skills known to the online search strategist. Boolean logic allows one to search for multiple terms in the same passage, requiring either that both terms be present (AND) or that at least one of them be present (OR). Truncation allows one to ask the computer to find all instances of "aton*," which would retrieve "atonement," "atone," and "atoning."
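Both techniques can be sketched briefly in Python. This is an illustrative sketch only, not the behavior of any particular retrieval system: the function names, the treatment of a trailing `*` as the truncation symbol, and the three sample passages are assumptions made for the example.

```python
import re

def truncation_pattern(query):
    """Turn a truncated query such as 'aton*' into a regular expression
    matching any word that begins with the given stem."""
    stem = query.rstrip("*")
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)

def contains(passage, query):
    """True if the passage contains the (possibly truncated) query term."""
    if query.endswith("*"):
        return truncation_pattern(query).search(passage) is not None
    whole_word = r"\b" + re.escape(query) + r"\b"
    return re.search(whole_word, passage, re.IGNORECASE) is not None

def boolean_and(passages, a, b):
    """Passages in which both terms are present."""
    return [p for p in passages if contains(p, a) and contains(p, b)]

def boolean_or(passages, a, b):
    """Passages in which at least one of the terms is present."""
    return [p for p in passages if contains(p, a) or contains(p, b)]

# Hypothetical passages standing in for paragraphs of a longer text.
passages = [
    "He sought to atone for the father.",
    "The son spoke of atonement and the father listened.",
    "Nothing relevant is said here.",
]

print(boolean_and(passages, "aton*", "father"))
print(boolean_or(passages, "atonement", "son"))
```

The truncated query casts a wider net than the exact term, so the AND search retrieves the passage containing "atone" as well as the one containing "atonement." The stem must begin a word, so an unrelated word such as "baton" would not be retrieved.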
With the power of the computer comes the need to create powerful researchers.
.05. Software Programs
After the scholar has identified appropriate search terms, he or she needs a computer program that will retrieve these terms in given text or texts. Searching the World Wide Web (August 2002) produced a "Software for content analysis" site  that listed content analysis software (33 programs), software for qualitative data analysis (16 programs), software for video analysis (10 programs), text management software (17 programs), image and video management software (14 programs), and audio management software (2 programs). One example of analysis software from this site is "ATLAS.ti." 
The ATLAS.ti web site describes the product as a "powerful workbench for the qualitative analysis of large bodies of textual, graphical, audio and video data,... [offering] a variety of tools for accomplishing the tasks associated with any systematic approach to 'soft' data, e.g., material which cannot be analyzed by formal, statistical approaches in meaningful ways." After the researcher "drags and drops" text from a word processing program, ATLAS.ti employs network-building features that allow the researcher to connect selected passages through diagrams, while allowing reference to the original text at any time. This diagramming facilitates the creation of pattern maps in the data in a visual as well as verbal mode. This software program supports on-screen coding of data, identification of words, sentences, and passages that contain the subject(s) under study, Boolean searching, and the use of SPSS (statistical data analysis) in cases where the text can lead to quantitative data analysis.
There is necessarily a time investment in learning to use any software program, and in learning to use it well enough so that the software is secondary to the task. The ATLAS.ti web site gives screen shots and applications so that one can see examples of its use. Furthermore, all its materials are available online, and there is an infoline for users. ATLAS.ti is just one example of the software programs that are available to qualitative researchers. It is one that has been recommended to me and therefore has been used as an example here.
.06. The Future
A future scenario inevitably will include more information available to more scholars in digital format. The archiving of E-mail and the use of computers to create almost all the documents that we see every day will provide more information than anyone knows what to do with. Thus, it would be helpful if future technologies provide scholars with means of sifting through mounds of information to find what is truly useful and valuable.
Second, it is probable that optical character recognition software will improve, that scanners will become more sophisticated and cheaper than they presently are, that computer speed and capacity will increase, and that data analysis software will be stronger and more sophisticated. These are all technological innovations that will not be directed at historical research (military applications usually lead the way), but that will nonetheless be extremely useful to the historian of the future.
Perhaps, in a Star Trek type of scenario, the historian of the future will find herself asking her computer for "instances of the word 'atonement' in Ulysses" and the computer will suggest synonyms, indicate variant spellings, give numbers of instances, and discuss how much of each passage the researcher needs to study.
Today we have software sophisticated enough to do qualitative data analysis on full textual material. That is the benchmark. It remains for the future to show how technology will streamline this process for the historical scholar.
1. "Benchmark," American Heritage Dictionary, 4th ed., 2000.
2. James Joyce. Ulysses. New York: Random House, 1922.
3. There are a variety of optical character recognition products on the market. They are evaluated by how well they can recognize letters on a page and reproduce them accurately. Ninety-eight percent accuracy has been a recent benchmark. Thus, the researcher must proofread the scanned text to eliminate inaccuracies in the translation.
4. Think of the present use of a stylus with Palm Pilots to turn handwritten letters into printed ones. There are also very sophisticated voice recognition programs such as Dragon Dictate.
5. See: Anselm Strauss and Juliet Corbin. Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Newbury Park: Sage Publications, 1990, for one example of a qualitative research methods text.
6. See: G. Walker and J. Janes. Online Retrieval: A Dialog of Theory and Practice. 2nd ed. Englewood, CO: Libraries Unlimited, 1999, for an example of a text that specifically deals with retrieving digital materials.
7. Roget's International Thesaurus. 4th edition. New York: Harper & Row, Publishers, 1977.
9. http://www.atlasti.de/atlasneu.html Thanks to Luis Reyes Luna who recommended this software and has used it for data analysis.