All Knowledge, Past and Present
ABSTRACT Susan S. Lukesh discusses the long-term desire for scholars to have all the information on one subject — in this case prehistoric pottery — gathered together for easy access in relation to possibilities that are available today. The subject is far broader than pottery and directly relates to the critical issue of modern scholarship and access to raw data underlying all analyses presented in paper and digital publications today.
The dream is an old one: to have in one place all knowledge, past and present. All books, all documents, all conceptual works, in all languages. It is a familiar hope, in part because long ago we briefly built such a library. The great library at Alexandria, constructed around 300 B.C., was designed to hold all the scrolls circulating in the known world. At one time or another, the library held about half a million scrolls, estimated to have been between 30 and 70 percent of all books in existence then. But even before this great library was lost, the moment when all knowledge could be housed in a single building had passed. Since then, the constant expansion of information has overwhelmed our capacity to contain it. For 2,000 years, the universal library, together with other perennial longings like invisibility cloaks, antigravity shoes, and paperless offices, has been a mythical dream that kept receding further into the infinite future.
Over the past thirty years, I developed an online database of pottery information for my own research. It contains information about the pots and sherds found in excavations in Sicily and Southern Italy where my work as an archaeologist has taken me. Recently I developed a version for whole pots to which I was able to add direct access to multiple images of each object. It is this new database that begins to fulfill my dream. It is, for me, the beginning of a great library, perhaps as useful to me as the great library at Alexandria may have been to scholars of that time.
In this article I try to capture how the seemingly small technical change, the ability to directly view multiple images of each object, is turning my local library into the great library at Alexandria. Not many years ago, I oversaw the academic library at my institution and became fascinated with the changing landscape of scholarly communication. In the process of writing this article I have also come to better understand the next frontier in the development of scholarly communication, sharing the data that lie behind published research.
Raw Data and the Scholarly Enterprise
With digital publications and the ease of presenting multiple supporting images, and rapid, if not immediate, access to research in preprint published form, gaining access to the raw data that lie behind published research is the next frontier in the development of scholarly communication. As Eysenbach and Sa write, the move to "opening up raw data for research has strong parallels to the 'open source' movement of the software industry, where developers freely distribute the source code and allow usage and modification" (166). Although there are compelling arguments for publishing raw data, there are numerous hurdles as well, not the least of which is the inherent compulsion to reserve one's raw data for one's own additional research. See Talja and Theologis and Davis on issues of data sharing, and Eysenbach and Sa's suggestion of a clear code of practice for data sharing.
Raw data are critical for three reasons. The first and most obvious is to verify the analyses presented, as with the recent discussion of Korean stem cell research. A second reason for making raw data available is to allow further analysis, either of the data alone or combined with similar data, and further testing of the conclusions drawn from the first set. A third reason is to present the complete picture of the material under study, in effect to produce a digital archive of material catalogs. The driving force for publishing raw data varies, and the need for validity checks in the sciences and the medical profession in particular is far more compelling than in my own field of archaeology. However, in archaeology, as in some other disciplines (natural science comes to mind), the significance lies in the corpora of material, the more detailed the better.
Those detailed corpora are often referred to as catalogues raisonnés, which have a long tradition of serving scientists and scholars, especially from the 16th century forward. The catalogue raisonné is an exhaustive work, meant to catalog all that is known on a specific subject, and is a close relative of the French Encyclopédie, Ou Dictionnaire Raisonné. Wightman's Science and the Renaissance (1962) charts the course of progress in the emergence of sciences in the 16th century, a time known not only for critical editions of early Greek works, but for accumulations of immense amounts of data. Wightman's work itself originated as a catalogue raisonné of the collection of 16th century scientific works in the University Library of Aberdeen (Rattansi).
Today the term catalogue raisonné appears to be used almost exclusively for catalogs of art works covering all known works of an artist up to the time of publication. These catalogs include the works' dimensions, descriptions, provenances, current locations and a bibliography. They are invaluable for sellers and collectors of prints, they help advanced collectors authenticate work, and they give beginners a starting point.
And today we see also a few examples of online catalogues raisonnés, limited to modern art. They include the Online Catalogue Raisonné of the works published by Gemini G.E.L. at the National Gallery of Art (http://www.nga.gov/gemini/search.html); the catalogue of the lithographs made at Tamarind (http://libxml.unm.edu/tamarind/ ); the online Catalogue Raisonné published by the Rembrandt Research Committee (http://www.vertius.org/rembrandt/catintro.php); and the online catalogue raisonné of Vincent van Gogh (http://www.vggallery.com/).
These online catalogues raisonnés are an important start since collections whose purposes include presenting all material on a specific subject can rarely be said to be complete. Online publications can be expanded easily; with the added tools of selection and comparison, the catalogues raisonnés become even more useful. My bronze age pottery archive can also be called an online catalogue raisonné, albeit in the early stages, since it cannot claim even to be an exhaustive collection of Italian prehistoric pottery.
The significance of my database of bronze age pots is the gathering and publication of a large collection of material to allow me (and my fellow scientists) to further refine hypotheses and develop new theories of use and production to better understand past civilizations. Here the raw data are the pictures and data about the material, with links to published analyses. My database follows the model set with the published "corpora" beginning at the end of the 17th century, when collections were produced as illustrated catalogues that featured geometrical plates of pot cross-sections, exact measurements, and proportions of different parts. Though rare and of poor quality because of the cost of production, such publications were extremely valuable to researchers and archaeologists.
The early developments in the 17th-19th centuries were followed early in the 20th century by Pottier's development of the multi-volume Corpus Vasorum Antiquorum (CVA). In the first volume Pottier wrote about the necessity of having "a picture of the object alongside the words describing it," and added that "without illustrations, the best descriptive catalogues remain an almost useless instrument in the hands of researchers, despite all the time and trouble that have been spent on them" (as quoted in Rouet, p.50).
The publication of pottery corpora springs from the instinct to have at one's fingertips all the raw data on one subject that might interest individual scholars and scientists: all measurements, descriptive data and images on which pottery analyses are based today. The major failure of many of the early corpora was the illustrations, both their number and their quality. Even with those problems, there were not enough corpora because of the daunting cost of printing and dissemination. These issues can readily be addressed with current technology. (For a more detailed discussion of this history of ancient pottery publication, see Lukesh, forthcoming.)
Bronze Age Archive - an example of raw data publication closely coupled to multiple images
Solutions for sharing raw data are currently and will continue to be driven by individual disciplines and the shared history their researchers hold. That there will be no one solution for all disciplines is clear; that within a discipline or sub-discipline there will only be some shared solutions is also clear. As a researcher whose raw data are pots — and often pots that have very recently seen the light of day for the first time in a few thousand years — my interests are quite different from those of medical researchers, for example.
My original database solution had worked wonderfully for locating patterns among pots, allowing selection based on multiple criteria (shape, size, neck type, and decoration motifs, for example). Until this new version, however, once I had the subset answering my query, I had to turn to shoeboxes or file cabinets with slides, photographs, and drawings to find the images of the selections for visual comparison.
Today there are numerous Web sites and CD publications that display digital pictures of archaeological material. The University of Kansas Museum of Anthropology has developed, with the assistance of a Digital Library Initiatives grant, a digital exhibit of 30 of its pots from the Kansas City Hopewell archaeological sites (http://www.anthro.ku.edu/hopewell/). The Peabody Museum of Archaeology at Harvard, the Peabody Museum at Yale, the Phoebe Hearst Museum of Anthropology at Berkeley, and the Potteries Museum of the City of Stoke on Trent, among others, offer online access to their collections. The Metropolitan Museum of Art has produced an electronic database of 424 Cypriot terracotta sculptures available on CD-ROM. Overall, these offerings are valuable as avenues to discover museum holdings and they may be of interest to the lay public and students. As tools to facilitate study of objects, however, they are limited: at this stage scholars tend to use them only to acquaint themselves with collections before traveling to study the material in depth.
These offerings usually are not supported by robust databases and rarely, if ever, allow side-by-side visual comparison. The ability to access the hundreds of fascicles of Corpus Vasorum Antiquorum on the Web today (http://www.cvaonline.org/cva/projectpages/CVA1.htm) is useful but severely limited. Only a single collection or volume can be reviewed at a time and there are no selection capabilities. Even ArtStor and the Perseus project in its recent work with MFA Roman coins only added side-by-side comparison late in their development.
By comparison, my new database is based on my research-specific needs and, for the moment, has left the technical capabilities of high resolution images (zooming, etc.) and external user needs (downloading of raw data sets), to later development. (These are also areas for which other technical solutions are available.) More important in this iteration was developing the capabilities of a robust database, wide-ranging selection options and immediate image display — in short, the immediate requirements to undertake analysis of the material.
Bronze Age Pottery Archive: A tool for the study and presentation of artefacts
Database and metadata description
Working with a simple but nonetheless interesting corpus of Bronze Age Italian pottery, I am developing an archive or repository of Bronze Age pottery that allows easy review of images of each item, access to detailed information for each item, selection of subsets of items (based on size, pot shape, site, etc.), and, of critical importance, comparison of up to eight items on a single screen. In essence, this archive facilitates the direct comparison of objects. In this case the objects are pots. They could also be coins, or plants, or even paintings. Before I turn to the selection and direct comparison of pots, let me discuss the underpinnings of the archive.
The archive is built on an earlier database that was originally designed to record pottery fragments and decorative motifs in the course of excavation. It proved invaluable for subsequent study of the pottery from the sites. This new version of the database, built on WebFOCUS software from Information Builders, is used for whole or almost whole pots. A small number of data items describes each pot in the database, and those data items are used for searching and categorization. That small number of items is the metadata. Metadata has been described as "structured information that describes, explains, locates, or otherwise makes easier to retrieve, use or manage an information resource. Metadata is often called data about data or information about information" (Hodge 3). Metadata as a concept might have existed before computers, but I use it as Tim Berners-Lee does, to refer specifically to machine-understandable information about Web resources or other things (Berners-Lee). "Machine understandable" is key, since metadata is information that software agents like databases can use. Berners-Lee says that metadata has well defined semantics and structure. "[M]etadata can be regarded as data, it can be stored in a resource. So, one resource may contain information about itself or about another resource." One interesting characteristic of metadata, identified by Stefan Gradmann, is that metadata is very user centric. "[T]hey are driven by very specific end user requirements to a very high degree. This could be seen as a disadvantage since changes in end user behaviour and the context of usage are likely to affect such an approach fundamentally with the risk of lacking continuity - however, this characteristic probably is considered a positive aspect today."
That focus on end-user needs is important to me and others as researchers: we need very much to keep specific requirements of the users in mind. I suggest that metadata for specific situations is strictly and best driven by the users of the data. Only those who toil in the fields of the data know what they need to make sense of it. With that in mind, presented below are key metadata elements of the Bronze Age Archive of prehistoric pottery.
Critical Metadata in the Bronze Age Archive
Site: The site at which a pot was found (its provenance) is critical to the analysis of pottery. Only those sites whose pots are included are available for selection.
Assemblage: This term is used to define a distinctive pottery type understood by archaeologists familiar with this material. In prehistoric Southern Italian and Sicilian pottery, for example, there are many defined pottery types, related to each other in varying fashion. Assemblages often cross sites and sites may have examples from a number of assemblages. Four sites and four assemblages are available today in this archive.
Pot type: This term is used to define specific types of well-known pot shapes. Amphora, for example, is a commonly understood shape. With the material specific to this archive, there are various shapes that cross assemblage and site lines; pedestalled vase (Figures 15 and 16) is one of these, distinct to Sicilian prehistoric pottery.
Pot Form: The generic pottery forms derive specifically from my long work with fragmentary pots and the need to come closer to understanding the pot shapes represented. Figure 1 lays out the schema used to derive pot form from fragments and Figures 2 and 3 show examples available in the archive.
Pot Size: Measurements of all pots are taken and recorded wherever possible. (The individual measurements may be a series of diameter measurements (rim, neck, shoulder, body, base) and heights (base to rim, shoulder to rim, base to handle height).)
For the archive I divided pots into "small," "medium," and "large" so that I could select pots on more general sizes rather than specific heights or diameters. The archive contains all the data and selection of data can also be made on, for example, height greater than 30 cm. Additionally, the determination of small, medium, and large can be easily reconfigured in the code.
Decoration Types: Prehistoric pottery from this period may be decorated by painting, excision, incision, or impression, or any combination.
Decoration Motifs: Up to 10 specific motifs are available for selection. Figure 8 (below) shows an example of a page of the available motifs for selection. As a new motif is added to the archive, it is immediately available in the selection routine. As with pot shapes, motifs cross the lines of sites and assemblages.
In addition to the metadata used for direct selection of pots, the database contains information on type and number of available images. In the past, this information was used to locate the actual images, whether they were in the drawing file or the photo drawer. Today the information is used by the software to determine the path for the specific image, and to select which of the available images is chosen for display.
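The metadata elements listed above can be pictured as one record per pot. The following is a minimal sketch in Python, not the WebFOCUS implementation the archive actually uses; the field names, the sample values, and the size thresholds are assumptions made for illustration, since the article does not give the real cut-off values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical cut-offs in centimetres; the archive's real values are not published.
SIZE_THRESHOLDS_CM = {"small": 15.0, "medium": 30.0}


@dataclass
class PotRecord:
    """One pot, described by the metadata elements named in the article."""
    pot_id: int
    site: str                      # where the pot was found
    assemblage: str                # pottery type recognized by specialists
    pot_type: str                  # well-known shape, e.g. "pedestalled vase"
    pot_form: str                  # generic form code derived from fragments
    height_cm: Optional[float] = None          # base-to-rim height, if measurable
    rim_diameter_cm: Optional[float] = None
    body_diameter_cm: Optional[float] = None
    decoration_types: List[str] = field(default_factory=list)  # painting, incision, ...
    motifs: List[str] = field(default_factory=list)            # motif codes
    image_counts: Dict[str, int] = field(default_factory=dict) # e.g. {"color": 2}


def size_category(height_cm, thresholds=SIZE_THRESHOLDS_CM):
    """Map a height to "small", "medium", or "large" using configurable thresholds."""
    if height_cm is None:
        return None  # unmeasured pots get no size category
    if height_cm < thresholds["small"]:
        return "small"
    if height_cm < thresholds["medium"]:
        return "medium"
    return "large"
```

Keeping the thresholds in a single mapping mirrors the point made above: the definition of small, medium, and large can be changed in one place, while the exact measurements remain available for selections such as height greater than 30 cm.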
Walking through the archive
The archive today contains records and images for close to 300 pots from four sites (San Antonio, Buccino; Ustica; La Muculufa; and Grotta Ticchiara). The images themselves are not objects within the database, which cuts the processing time required to handle very large records, especially when the database is accessed without regard to images. The software contains algorithms based on other data elements to find the source of each image and display it when necessary. Ideally, photographic images would be of a high resolution and of professional quality; this archive uses what is available. In some instances these are photographs from the late 1960s, in other instances they are scanned images from books. Even without the best photographic images, it is clear that the images provide good-enough quality for study purposes.
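Storing images outside the database and deriving their location from other data elements might look like the sketch below. The directory layout, the file-naming pattern, the site code "GT", and the function names are all hypothetical; the article does not describe the archive's actual algorithm.

```python
from pathlib import PurePosixPath

# Hypothetical root directory for image files kept outside the database.
IMAGE_ROOT = PurePosixPath("/images")
VIEWS = ("color", "bw", "drawing")  # the three kinds of image the article mentions


def image_path(site_code, pot_id, view, number=1):
    """Derive an image location from record fields instead of storing the image itself."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    if number < 1:
        raise ValueError("image numbers start at 1")
    # Assumed naming pattern: <site>_<zero-padded id>_<image number>.jpg
    filename = f"{site_code}_{pot_id:04d}_{number}.jpg"
    return IMAGE_ROOT / site_code / view / filename


def available_image_paths(site_code, pot_id, image_counts):
    """List every image path implied by the per-view image counts in a record."""
    paths = []
    for view, count in image_counts.items():
        for number in range(1, count + 1):
            paths.append(image_path(site_code, pot_id, view, number))
    return paths
```

Because only the counts and a few identifying fields live in the record, queries that ignore images never have to touch the image files at all.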
The archive is available online as a demo from the software company to show the power of their database product (see "Bronze Age Pots" at http://demoprod.informationbuilders.com).
The opening menu of the archive (Figure 4) offers options for access to the pots, to decoration schema (Castelluccian in this instance), pot shapes (designed especially for Bronze Age material but useful generically), maps and plans, and bibliographic information. Clicking through on the maps and plans link brings the user to drawings and photographs of the sites, as well as larger maps and reconstructions. (See Figures 5 and 6.)
The database also includes narrative information on the site that comes from various publications, all listed in the bibliography available on the site.
Also from the opening menu we can select the option to review decoration motifs. From the design motifs screen (Figure 7) we can select (and print) a copy of the most current report on decoration motifs (Figure 8).
Data carried in this database includes the code, an image, a verbal description, and the site at which the motif was first recognized.
Selection of pot shapes from the opening menu shows the schema as developed during the years of working primarily with fragmentary pots. It is a useful way to determine general categories of pot shapes; detail on these generic forms is also in the database (see Figure 9).
Selecting the pot image from the opening menu (Figure 4) and then the All Pots options presents a table of information on each pot in the archive. Figure 10 shows the primary table (two others, measurement ratios and design motifs, are also available). In the 1970s and 1980s, a number of publications included pages of "raw data." These lists may have offered the data on which the author based his or her analysis, but were virtually unusable. In one instance, the author offered that a tape of the raw data would be sent to any reader who wished. I can attest in this instance that there were no takers: I was listed as the provider although I was neither the analyst nor the author. This table (Figure 10), a major feature of the archive, not only presents row on row of data per object but allows immediate sorting and access to further detail behind each cell. The tables of raw data have become dynamic instruments for analysis.
Each line of the table includes a clickable link to pictures of the pot, access to the full database record behind the pot (clicking the number on the left), identification information, a thumbnail image, pot type, and various other details. If the full database record is accessed, a page with all data available and two images (pot shape and main image) appears. Additionally, the codes for the individual form components (neck, body, base etc.) can be toggled to reveal their meaning. One (body type) is toggled on the figure displayed (Figure 10). Two additional tables are available from buttons on the bottom of the screen: measurement ratios (which shows such ratios as height to width for comparison purposes) and design motifs. Clicking on a specific motif code brings up further information on the motif.
Clicking on the thumbnail image on the main table (Figure 10) brings up the full page of images and measurement data of the pot (Figure 11).
From this screen (which can also be accessed through the BROWSE button on the Pots option menu), the user can scroll forward or backward through the database as it is currently sorted.
The pot records can be sorted by pot, form, site, rim height, or even ratios of rim diameter to body diameter. After sorting a user may select up to eight pots to compare on a single screen (Figure 12).
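Sorting on a derived ratio and then picking a small comparison set can be sketched as follows. This is an illustration under stated assumptions rather than the archive's code: pots are plain dictionaries, the key names are hypothetical, and only the limit of eight comes from the article.

```python
MAX_COMPARE = 8  # the archive compares up to eight pots on one screen


def rim_to_body_ratio(pot):
    """Ratio of rim diameter to body diameter, or None if either is missing."""
    rim = pot.get("rim_diameter_cm")
    body = pot.get("body_diameter_cm")
    if rim is None or not body:
        return None
    return rim / body


def sort_pots(pots, key):
    """Sort by a computed key, placing pots without a value at the end."""
    def sort_key(pot):
        value = key(pot)
        return (value is None, value if value is not None else 0)
    return sorted(pots, key=sort_key)


def comparison_set(sorted_pots, chosen_ids):
    """Return the chosen pots in their sorted order, enforcing the eight-pot limit."""
    if len(chosen_ids) > MAX_COMPARE:
        raise ValueError(f"at most {MAX_COMPARE} pots can be compared at once")
    wanted = set(chosen_ids)
    return [pot for pot in sorted_pots if pot["pot_id"] in wanted]
```

Sorting first and choosing afterward means that pots with similar proportions sit next to each other in the table, which is what makes candidates for side-by-side comparison easy to spot.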
Users can also select color photo views, black and white photo views, or drawings (if they are available). Figure 12 displays the pots in color. The drawings of these same pots (Figure 13) clearly illustrate the similarity among these pots, almost as if they have been made to mass-production specifications. The ability to view both drawings and photographs side by side (Figure 14) relieves the concerns that the drawings are manufactured to show similarity. Users may enlarge a specific image simply by clicking on it.
This visual analysis complements statistical work done over 25 years ago on Subapennine and Protoapennine pots (Lukesh and Howe 1978) and is a critical component of the archive's capabilities. From the conclusions of that work, I quote:
More important than the implications for the question of Subapennine versus Protoapennine pottery may be the implications for the continuing study of prehistoric pottery. We feel that the analyses outlined above demonstrate not only the existence of a mental template in the minds of prehistoric potters, but also how subtle can be the variation that occur among templates of even closely related cultures. . . . These analyses demonstrate, we hope, both the existence of a precise mental template for hand-made pottery, as well as the possibilities for variation of this template within cultures and from culture to culture. (346-47)
The quick look at high-handled pots just reviewed was undertaken with the full database of close to 300 objects — a simple sort selection brought these pots to the surface. But as such an archive increases in size and complexity, it is enormously useful to have sub-selecting capabilities, and this database offers the option of seeing specific pots chosen by site, pot shape, design motif, assemblage, and size. If I choose to view a set of pedestalled pots from Grotta Ticchiara, I can compare images of the outside of the pots (Figure 15).
This screen presents the first of the images available for the pots - in this case, the outside of the pedestalled pot. The first three pots show little similarity in the decoration schema or style and technique until we select the second image (by the numbers beside each image) for interior views.
The interior of the first three pedestalled pots (Figure 16) shows a remarkable similarity of the decoration here, both in overall design, individual components and reflections of a specific hand (see Figure 17).
It is possible to hypothesize that while the interiors of these pots were painted by one craftsman, the outsides were painted by a series of other craftsmen. Similarly, one could hypothesize that the interior mattered and the outside didn't. Both hypotheses give us a basis for further studies and understandings of the Castelluccian potters and way of life.
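The sub-selection that produced this set of pedestalled pots (by site, pot shape, design motif, assemblage, or size) amounts to filtering records on several criteria at once. A minimal sketch follows, assuming pots are held as dictionaries with hypothetical key names and motif codes; the archive itself performs this selection in WebFOCUS.

```python
def select_pots(pots, **criteria):
    """Return the pots that satisfy every criterion given as a keyword argument.

    For list-valued fields such as motifs, a pot matches when the wanted
    value is among its entries; for other fields the values must be equal.
    """
    def matches(pot):
        for field_name, wanted in criteria.items():
            value = pot.get(field_name)
            if isinstance(value, (list, tuple, set)):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [pot for pot in pots if matches(pot)]


# Illustrative records only; the motif codes are invented for the example.
sample = [
    {"pot_id": 1, "site": "Grotta Ticchiara", "pot_type": "pedestalled vase",
     "size": "large", "motifs": ["M03", "M07"]},
    {"pot_id": 2, "site": "La Muculufa", "pot_type": "pedestalled vase",
     "size": "medium", "motifs": ["M03"]},
    {"pot_id": 3, "site": "Grotta Ticchiara", "pot_type": "amphora",
     "size": "large", "motifs": []},
]

pedestalled = select_pots(sample, site="Grotta Ticchiara", pot_type="pedestalled vase")
```

Each added criterion narrows the set further, so a large pool can be reduced to the handful of pots worth viewing side by side.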
The two examples of multiple pot visual comparison (Figure 12 and Figures 15/16) illustrate clearly and almost dramatically how a robust online catalogue raisonné of hand-made pottery manufacture can support scholarship. It is this level of information that archaeologists seek from the material remains they excavate. The ability to easily compare pots visually based on various criteria opens the door to far better understanding of the material remains. Especially with prehistoric remains for which there are no written records to indicate how groups worked together, how or why they chose specific pot shapes, and what groups knew the work of other groups, this level of visual comparison combined with the rich data in the database and the excavation records themselves provides opportunities for hypotheses that future excavations may confirm or change.
In summary, what this tool offers, among other capabilities, is the ability to select objects based on a rich database and then present multiple images for immediate review. Selection by individual decoration motifs and size offers additional possibilities of further refining the selections. In addition, it offers to others access to a wide range of raw data and built-in tools. In short, such a database is a tool that facilitates gathering a small number of objects from a large pool and quickly and easily studying them. Specific measurements are available at the touch of a button. When we return to consider the various corpora created since the 18th century, we see that this archive satisfies some of the clear unmet needs from that time. Both the early museum catalogs and publications of individual collections would have benefited from such a system. Today, we can clearly address the concerns of early presentations with their poor quality images both with photographs in color and black-and-white and with drawings based on commonly accepted standards. And this archive, with the ability to compare up to eight pots at once, corrects a problem outlined by Rouet.
In the absence of photographic prints enabling them to make multiple comparisons, earlier critics had had no option but to rely on general impressions, and to work by means of synthesis towards comparisons based primarily on the quality of work, or on more questionable criteria. (63)
The archive presented here offers a prototype of what can be developed, although it is far from complete. To be added are a few key capabilities: the ability to zoom into and out of images; downloading raw data for analysis with statistical or graphic software; and printing reports of selected objects. Even with these limitations, the archive allows me to explore hypotheses and verify ideas. It overcomes problems of selection of objects and allows immediate comparison of like objects. And it presents each object alongside the relevant descriptive material.
The prototype can clearly be adapted to classical pots where the design motifs include subject-specific items and the shapes available are the standard ones long since determined for this body of material. It can also be adapted for other types of artefacts; as an archaeologist I can immediately see the value of such a database of coins, while other scientists might well want to study variations of turtles or plant species.
This same prototype, developed from a database designed for excavation materials, can be used for future excavations, with the added benefit of immediate accessibility to the images of excavated materials taken with digital cameras. As time permits, drawings, plans, and high quality images can be made available and the database updated to provide access. Until then, the database can be used immediately for study and analysis purposes, some of which might inform excavation decisions in the short term.
This new potential for scholarly communication argues strongly for immediately and easily accessible approaches to information sharing. Certainly as the literature of scholarship increasingly is intertwined with the data on which the scholarly work itself is based, models such as this database will become more prevalent. Its strength is its ability to mold the metadata to specific research needs.
The task that remains (and no small one) is the development of readily available systems to both collect and study a wide variety of material as well as present it via the World Wide Web. I am convinced that archives such as this will help future researchers discover connections hitherto unconsidered.
Susan S. Lukesh has been Associate Provost for Planning and Budget at Hofstra University since 1988. From 1997 through 2000 she also served as Interim Dean of Library Services and subsequently completed an MLS. She has taught management of libraries and information centers at Pratt School of Information and Library Sciences. At Hofstra she administers an online archive and has overseen the development of an online bibliography (HofBiblio) for the Hofstra community. Currently, in addition to her position as associate provost, she serves as acting executive director of the School for University Studies, a one-year entry program into Hofstra, for students whose records don't reflect their true capabilities. She has a Ph.D. in Classical Archaeology and has completed over 30 years of archaeological work, most recently on the island of Ustica off the coast of Sicily, as part of a collaborative excavation with Brown University. You may reach her by e-mail at Susan.S.Lukesh@hofstra.edu.
Tim Berners-Lee, "Metadata Architecture," http://www.w3.org/DesignIssues/Metadata.html (accessed 4/4/06)
Gunther Eysenbach and Eun-Ryoung Sa, "Code of conduct is needed for publishing raw data," BMJ 2001 323: 166; [doi: 10.1136/bmj.323.7305.166]
Stefan Gradmann, "Cataloguing vs. Metadata: old wine in new bottles?," http://www.ifla.org/IV/ifla64/007-126e.htm (accessed 4/4/06)
Gail Hodge, "Metadata Made Simpler," http://www.niso.org/news/Metadata_simpler.pdf (accessed 4/4/06)
Eva Keuls, "The Corpus Vasorum Antiquorum, the Lexicon Iconographicum Mythologae Graecae and the Beazley Archive Project: Different Data Bases for the Study of Ancient Greek Iconography," Modern Greek Studies Yearbook, University of Minnesota, vol. 4, 1988: 213-234.
Susan S. Lukesh, "Revolutions and Images and the Development of Knowledge: Implications for Research Libraries and Publishers of Scholarly Communications," JEP, (7,3) April 2002.
Susan S. Lukesh and Sally Howe, "Protoapennine vs. Subapennine: Mathematical Distinction Between Two Ceramic Phases," Journal of Field Archaeology, 5(3), 1978: 339-347. (Available online http://hofprints.hofstra.edu/50/)
Susan S. Lukesh, Bronze Age Pottery and 21st Century Scholarly Communication: A Web-based archive of Bronze Age Pottery, forthcoming, in Studi in onore di Renato Peroni, Milan.
P. M. Rattansi, review of Science and the Renaissance: An introduction to the study of the emergence of the sciences in the sixteenth century, W.P.D. Wightman, Aberdeen University Studies, No. 143-4. Edinburgh, 1962. Philosophical Quarterly, Vol. 15, no. 60: 274-275. [doi: 10.2307/2217615]
Philippe Rouet, Beazley and Pottier, Approaches to the Study of Greek Vases, Oxford, 2001.
Sanna Talja, Information sharing in academic communities: Types and levels of collaboration in information seeking and use. New Review of Information Behavior Research, 2002 http://www.uta.fi/~lisaka/Taljaisic2002_konv.pdf (accessed 4/6/06)
Athanasios Theologis and Ronald W. Davis, To Give or Not to Give? That is the Question, Plant Physiology, May 2004, vol. 135: 4-9. http://www.plantphysiol.org/cgi/content/full/135/1/4 (accessed 4/6/06)