Author: Gavan McCarthy
Title: The Structuring of Context: New Possibilities in an XML Enabled World Wide Web
Publication info: Ann Arbor, MI: MPublishing, University of Michigan Library
April 2000

This work is protected by copyright and may be linked to without seeking permission. Permission must be received for subsequent distribution in print or electronically. Please contact for more information.

vol. 3, no. 1, April 2000
Article Type: Article

The Structuring of Context: New Possibilities in an XML Enabled World Wide Web

Gavan McCarthy

The wide acceptance of the Web and its ability to enable the inter-linking of web spaces and networks provides an opportunity to utilise documented (encoded) context entities (principally people and organisations) to build Web-based infrastructure to support cultural heritage activities. Archivists, museum curators, historians and other heritage practitioners have access to key contextual data which, if systematically and simply encoded, could provide the basis for a network of context objects that would underpin a wide variety of functions of local, national and global significance. This paper examines work undertaken in Australia, North America and Europe that is endeavouring to turn this vision into reality.

01. Introduction (The Multiple Dimensions of Context)

The documentation, management and use of contextual information emerged as one of the critical issues of the World Wide Web in the late twentieth century. Contextual information is that extra, associated, related, assumed and perhaps a priori information or knowledge that is required to meaningfully interpret the content of any given information source.

The Web, as it has evolved over the last five or six years, bridges the boundaries of space and time like no other medium yet developed. The printing press of the Middle Ages; the universal postal system, the telegraph and telephony of the nineteenth century; and radio and television in the early to mid twentieth century all represent important new media that have shaped our information environment. Is the Web unique in the challenges it poses to the credibility and authority of content, or are we facing fundamental issues that have been addressed before? Were the same issues raised with the introduction of those earlier information technologies?

These fascinating questions are not the topic of this paper, but they do provide the beginnings of a "contextual" framework from which I will develop the content of this presentation.

For the archivist, and I address you today as an archivist, the documentation and preservation of "records in context" is our primary function. However, context is multi-faceted, dynamic, highly variable and complex beyond our ability to build systems to totally contain it. Given that the effective documentation of context does involve compromise, the questions facing us are:

  • What are its most useful elements? And
  • What do we need to know about them?

This paper is about some of the more pragmatic research, underway at the moment at various sites around the world, on the structuring and digital encoding of information surrogates for people and organisations as the defining elements of context.

This research is founded on the notion that there exists a recursive, self-reflexive, dynamic relationship between content and context, and that this extends beyond the narrow world of archives and records to all forms of meaningful cultural transmission. So, while I will be talking about archives, records and museum resources, it is on the assumption that the same fundamental ideas apply to music, art, literature and indeed science.

The key premise for this paper is that context, like content, is essentially a product of the action of people and it is through the documentation of people (and organisations) that archivists and other heritage practitioners can establish a usable foundation for the documentation of context more generally.

02. A Fine Tradition (Has Not this Been Done Before?)

This paper will shortly lead you to Bright Sparcs, which at first glance appears to be a fairly straightforward, historically focused, biographical dictionary of Australian scientists published on the Web. It is indeed that, but it is also much more, and it will become increasingly powerful and useful as Web technology develops.

The need for access to information about the people, places and organisations that have been the key to helping us understand the past has been recognised for centuries. Indeed, The Nuremberg Chronicle (currently on display at the Victoria and Albert Museum), published in 1493, was regarded as the second most important publication of the 15th century after the Gutenberg Bible. [1] It is a text about the history of the world, an illustrated book of famous "men" as well as places, cities and events. At the time this tome cost two guilders, or the equivalent of one hundred and twenty pounds of beef.

Five hundred years on there is still a significant market for books of this type. Indeed, today, for two pounds you can purchase The Wordsworth Dictionary of Biography: a compact guide to the worthy and infamous throughout the ages. [2] It is not illustrated, but what can you expect for the equivalent of a small serving of filet?

It will introduce you to over seven thousand individuals, but it will not do much more. There are no references and no links to other sources. Authority and credibility are established by the "good name" of the publisher and the listing of sixty-four authors. But who are they and which entries did they contribute? The only access point is via an individual's name, which means that much of the key knowledge distilled to produce the entries is not readily usable.

For example, about one hundred entries have some direct or indirect association with Australia, covering such areas as politics, sport, literature and the arts, exploration and science, and oddballs (e.g. William Buckley, the escaped Tasmanian convict who survived for thirty-four years in the Victorian bush with the Aborigines, waiting for the Colony to be established - thus the derivation of the phrase "Buckley's chance").

From my perspective, as the Director of the Australian Science and Technology Heritage Centre, it is interesting to note that in this obviously UK-focused publication, over 50% of the entries dealt with exploration and science and only 7% dealt with sport. How did I discover these "facts"? I read the whole thing, taking notes as I went. Imagine how much more useful this data would be if it were well structured and available on the Web. But are there conceptual models that would help us in building the required structure and functionality for Web-based publication?
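As a minimal sketch of what "well structured" could mean here, suppose each dictionary entry were captured as a small record with a name and a subject area. The field names and the sample rows below are hypothetical placeholders, not data from the dictionary; the sketch only suggests that a tally compiled by hand-reading could become a short computation.

```python
from collections import Counter

# Hypothetical structured entries; names and subject areas are illustrative only.
entries = [
    {"name": "Person A", "subject": "science"},
    {"name": "Person B", "subject": "exploration"},
    {"name": "Person C", "subject": "sport"},
    {"name": "Person D", "subject": "science"},
]


def subject_shares(records):
    """Return each subject's share of the records as a percentage."""
    counts = Counter(r["subject"] for r in records)
    total = sum(counts.values())
    return {subject: 100 * n / total for subject, n in counts.items()}


shares = subject_shares(entries)
```

With entries held in this form, the same records could also be grouped by country, period or contributing author without re-reading the source.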

03. The Entity - Relationship Model (A Promising Conceptual Foundation)

The "entity - relationship" model for documenting and managing resources is a conceptual model that has emerged in the literature of various disciplines in recent times and offers the most promising foundation from which such systems may arise. [3] This was neatly articulated in the archival context by Chris Hurley in his recent paper in Archives and Manuscripts:

"Archivists can participate in recordkeeping processes by documenting complex relationships between records and context. Records must be placed in context - in time and place - by fashioning descriptive entities and documenting relationships. This is how we can understand the record and derive evidence, it must be interpreted not by reference to our observation of it in the circumstances obtaining when we access it, but by understanding the circumstances which existed at its creation and changes since. . . The two fundamental issues for discussion concerning archival description are therefore what the descriptive entities should be and what are the relationships we need to show between them." [4]

Figure 1, below, is a visualisation, from the digital libraries perspective, of the entity-relationship model in use to underpin the Resource Description Framework (RDF) which is introduced by Eric Miller as "an infrastructure that enables the encoding, exchange and reuse of structured metadata". [5]

Figure 1: A graphic example of the entity-relationship model, taken from Figure 5 in Miller, Eric, 'An Introduction to the Resource Description Framework', D-Lib Magazine, May 1998 at <>.

Archivists, in documenting records, collect information about people, organisations, and events but how well are they utilising that knowledge? Are they letting it languish because they maintain a print-based mind-set?

Many types of systems have been created over the years to describe records, and in general they attempt to capture context in terms of provenance (creator) and inherent structure or order. However, it is only in a few cultures that the documentation of some elements of context has been drawn out of the descriptions of records and managed as separate but related entities. In other words, information about creators (provenance) and order (series), within a defined environment, has been used to create a contextual framework that enhances the meaning of a broad range of records, whether they are in custody or not, destroyed or extant, or yet to be created. These examples of the entity-relationship model at work, albeit in forms limited by the technology of the day, provide a solid foundation for future research and development. The International Council on Archives recognised the essential importance of this separation of entity documentation from resource documentation through its two standards for archival documentation: ISAD(G) for archival records and ISAAR(CPF) for corporations, persons, and families. [6]
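The separation described above can be sketched in a few lines of code. This is an illustrative model only, assuming three entity kinds and two relationship types chosen for the example; the class names, identifiers and relationship labels are hypothetical and are not drawn from ISAD(G), ISAAR(CPF) or any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A context entity (person, organisation) or a described resource."""
    identifier: str
    kind: str  # e.g. "person", "organisation", "records"
    name: str


@dataclass
class Relationship:
    """A typed, directed link between two documented entities."""
    source: str  # identifier of the source entity
    target: str  # identifier of the target entity
    kind: str    # e.g. "created", "employed_by"


@dataclass
class ContextStore:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add(self, entity):
        self.entities[entity.identifier] = entity

    def relate(self, source, target, kind):
        # Relationships are only allowed between entities already documented.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must be documented first")
        self.relationships.append(Relationship(source, target, kind))

    def related(self, identifier, kind=None):
        """Entities reached from `identifier` by one outgoing relationship."""
        return [
            self.entities[r.target]
            for r in self.relationships
            if r.source == identifier and (kind is None or r.kind == kind)
        ]


store = ContextStore()
store.add(Entity("P1", "person", "Example Scientist"))
store.add(Entity("O1", "organisation", "Example Institute"))
store.add(Entity("R1", "records", "Example correspondence series"))
store.relate("P1", "R1", "created")
store.relate("P1", "O1", "employed_by")
created = store.related("P1", kind="created")
```

Because the person and the organisation are documented independently of the records, the description of the creator remains usable whether the related records are in custody, destroyed or yet to be created.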

04. The Opportunity for Archivists (a key role for historians)

In recent years, the concerns raised by electronic records and the systems in which they operate, with their high levels of assumed knowledge, low levels of embedded contextual information and rapidly changing technological basis, have been a necessary focus of the record-keeping professions. However, this in turn has led to a narrowing of focus onto the development of standards for the description and management of records. It must be remembered that this is only part of the story and in the short term provides few archivists with the tools they need to be archivists rather than just current records managers.

It is in the records themselves that there exists the information necessary to create the surrogate entities for creators. Archivists, when documenting records as part of their normal business, should be capturing this data in a structured and systematic way. Historians, when using records, should also be contributing the knowledge they gain about the people and organisations they are researching. Indeed, the establishment of defined relationships between context surrogates, defined sets of records and published historical research is the key to the successful utilisation of the entity-relationship model. The Web offers the opportunity for this to be realised.

05. The Web Today (an explosion of infodiversity?)

The Web is huge and it is growing rapidly in terms of the number of people who have access. However, much of its potential remains largely untapped - especially by archives and other history and heritage groups. It is disappointing to see organisations mounting on the Web electronic versions of their in-house information resources, often hiding them behind database query walls with few or limited general access opportunities. The key content of these resources is usually excluded from the main Web search engines and can only be "discovered" by those that already know where to look for them. Like their paper-based predecessors they usually present a one-way flow of information that inhibits participation and contribution from the broader community of users. They assume prior knowledge and lack the dynamic opportunities to grow, develop and evolve through use.

Carl Lagoze, in examining the nature of the resource discovery process both on and off the Web, noted that it "is a long-term, multi-threaded, and iterative process with complex and dynamic requirements". [7] His landmark 1997 study, "From Static to Dynamic Surrogates: Resource Discovery in the Digital Age", provides a strong argument for the use of the entity-relationship model and for the imperative of building virtual infrastructure. He observes:

"We do not intend to dismiss the current flock of web indexers as useless. In fact, in the course of the writing this paper we found ourselves using them quite frequently. Making innovative use of IR technology, the indexers are often successful at supporting resource discovery in a framework (the Web and HTTP) that provides little infrastructure support for the service. In fact, even when only marginally successful, the web indexers have a definite role in the resource discovery process."  [8]

More broadly the Web is used predominantly as a publication or marketing tool with little or no attempt to explore the wider hypertext or linking opportunities on offer. The commercialisation of the Web has led to technologies that promote this tendency given that the essence of business is to compete, not to collaborate. Cultural heritage activity, on the other hand, is predicated on collaboration between a wide variety of participants. It is critical that we develop the Web technologies that provide us with the virtual infrastructure and tools to facilitate collaboration both on and off the Web.

The Web can be many things and it can be different things to different people and different communities. There does not need to be any one particular mode which defines usage. It is important that the heritage communities look creatively at how they can build the virtual infrastructure they require. The Web is not a passing fad and is something that we can play a major role in shaping. Indeed, the current security difficulties being faced by e-commerce technologies may re-focus government programs towards supporting the development of the Web for general community and heritage purposes.

But back to the entities and surrogates. Again, as Lagoze reflected:

"We believe, however, that the greatest potential for improvement to networked resource discovery lies in the use of dynamic, or derived, surrogates. Lynch, Michelson, et al. [9] refer to this capability with the comment 'it is important to recognize that the networked information environment offers new opportunities to derive (by extraction or computation) a much richer and more diverse set of surrogates from networked objects than the surrogates that were typically found in the print world.'" [10]

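To make the idea of a surrogate derived "by extraction" concrete, the following is a minimal sketch that pulls a title and the outgoing link targets from an HTML page using Python's standard library. It is not Lagoze's method; the choice of title and links as surrogate fields, the function name and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SurrogateExtractor(HTMLParser):
    """Collect the page title and outgoing link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # Anchors without an href (e.g. named anchors) carry no link target.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def derive_surrogate(html_text):
    """Return a small derived surrogate (title and links) for one page."""
    parser = SurrogateExtractor()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical page content, for illustration only.
page = """<html><head><title>Example Person</title></head>
<body><a href="https://example.org/archive/1">Archival source</a>
<a name="top">no target</a></body></html>"""
surrogate = derive_surrogate(page)
```

A surrogate of this kind is computed from the networked object itself rather than catalogued by hand, which is the distinction the quoted passage draws with the print world.
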
06. The Yale Initiative (Looking to the Future)

In late 1998, a small group of North American archivists and information technology specialists, with funding from the US Digital Library Federation, organised a weekend meeting at Yale University to consider whether it was technically possible, and indeed worthwhile, to attempt to develop an international standard for the digital encoding of archival authority records. [11] The starting point for discussion was the existing, but under-utilised, International Council on Archives standard, ISAAR(CPF) - a standard defined before the implications of the emerging Web were apparent. The other key factor shaping the meeting was the experience of the Encoded Archival Description (EAD) initiative and the use of Standard Generalised Markup Language (SGML) for encoding archival finding aids. [12]

The meeting, composed of North Americans, Europeans and one Australian representative, examined existing online systems that treated context elements as separate entities with relationships to resources. The National Archives of Australia RINSE data [13] and the Australian Science Archive Project Bright Sparcs web resources [14] were examined along with other examples from the USA and Sweden. In summary, the meeting agreed that we were looking at an enormous untapped resource, with implications far beyond the preservation of records for archival purposes. It was agreed that an international working group be established from the core participants of the Yale meeting, with the aim of working towards a revised ISAAR(CPF) standard that takes into account the networking opportunities of the Web and the formal requirements of SGML and/or Extensible Markup Language (XML) encoding. [15]

However, what was perhaps the most exciting aspect of this meeting was the recognition that this was an achievable objective, with a low technological dependency, that could be applied across the breadth of the archival world. It has more to do with a change in what we do with the contextual data we already collect and document than with the complex and, for many, unimplementable metadata schemas for records description.

07. A Network of Context Entities (Collaboration at all Levels)

In June 1999, the Australian Science and Technology Heritage Centre quietly celebrated five years of Bright Sparcs on the Web. This celebration marked not only five years of continuous use of a stable, surrogate-structured, context-based, database-driven information space, built from a large and interactive user group, but also the beginning of the first major re-development of Bright Sparcs and the initiation of its sister site, Australian Science at Work. Funding for this re-development has come through project-based grants from both the Federal government and the Victorian State government.

At the core of these sites are Hypertext Markup Language (HTML) encoded context entities (people for Bright Sparcs, and organisations, societies and other constructs for Australian Science at Work) which are linked, by defined relationships, with other entities and information resources. The approach is based on a relatively simple conceptual model. It is easy to implement in an uncomplicated environment, but can also become complex very quickly and thus mimic the complexity of real life. It has its roots in prosopography, 'the historical technique based on collective biographies and similar sources', [16] and indeed should lead to a re-vitalisation of this historiographic tool. The challenge of further research and development is to maintain the simple and accessible foundations while building systems that will allow the complexity to evolve as new data accumulates, as new entities are defined and as new relationships are identified.

Not only does the 'entity-relationship' model provide a useful and workable model for both internal and public information systems, it also provides the conceptual foundation for building a structured network of Web-based information sites based on linked context entities. [17] A very simple example of how this could work is demonstrated in the Appendix, Figures 2 to 4. These show the Bright Sparcs entity for Phillip Law (Figure 2) and the published hyperlink (or relationship) (Figure 3) to the parallel entry in the National Archives of Australia RINSE database (Figure 4). At this stage there is no reverse link, but we hope this is something that will be achieved in the future. The re-development of the systems supporting Bright Sparcs, Australian Science at Work and the History of Australian Science and Technology Bibliography is focused on the expression of the encoded context entities and their relationships in XML. This will enable specific meaning to be given to data elements within an encoded entity and permit a much greater level of control over elements that define an entity in space and time. This in turn opens the door to the exciting analytical and access opportunities offered by data visualisation and graphic representation.
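What "specific meaning given to data elements" might look like in XML can be sketched as follows. This is an illustration under stated assumptions, not the Bright Sparcs encoding: every element name, attribute name, identifier and URL below is hypothetical and is not taken from ISAAR(CPF), EAD or any published schema.

```python
import xml.etree.ElementTree as ET


def encode_entity(identifier, name, exist_from, exist_to, relations):
    """Build an XML element for one context entity.

    Element and attribute names are hypothetical, not taken from any standard.
    """
    entity = ET.Element("entity", {"id": identifier, "type": "person"})
    ET.SubElement(entity, "name").text = name
    # Separate, named date elements let a program place the entity in time.
    dates = ET.SubElement(entity, "existence")
    ET.SubElement(dates, "from").text = exist_from
    ET.SubElement(dates, "to").text = exist_to
    rels = ET.SubElement(entity, "relationships")
    for kind, target in relations:
        ET.SubElement(rels, "relationship", {"type": kind, "target": target})
    return entity


entity = encode_entity(
    "example-001",
    "Example Person",
    "1900",
    "1980",
    [("parallelEntry", "https://example.org/other-system/example-001")],
)
xml_text = ET.tostring(entity, encoding="unicode")

# Because the elements are named, precise questions can be asked of the data.
parsed = ET.fromstring(xml_text)
targets = [r.get("target") for r in parsed.iter("relationship")]
start_year = int(parsed.findtext("existence/from"))
```

In an HTML page the same dates and links are only text and anchors for display; once each is a named element, a program can select entities by period or follow only relationships of a given type, which is the kind of control the paragraph above anticipates.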

At the Australian Science and Technology Heritage Centre, we have just recently captured the databases and software that support these online systems in a product we have called the Online Heritage Resource Manager (OHRM). This was stimulated by demand from groups in Melbourne, working in other subject areas, that wanted to run sites like Bright Sparcs. This software is being released without cost, but under license, to heritage bodies that wish to create free-to-the-Web heritage information. If you require our time for training, installation, audit or anything else, you will have to pay.

08. Conclusion

The wide acceptance of the Web and its ability to enable the inter-linking of web spaces and networks provides us with the opportunity to utilise documented (encoded) context entities to build Web-based infrastructure to support cultural heritage activities. Archivists (and historians) have access to key contextual data which, if systematically and simply encoded, could provide the basis for a network of context objects that would underpin a wide variety of functions of local, national and global significance.

The encoding and networking of context objects on the Web has been demonstrated using current Web technologies. It has been shown to be a powerful tool for network development and the building of interactive communities that contribute to the depth of information in the system. Bright Sparcs is used extensively each day by a broad range of users from around the world and every week we are contacted by new users with new information and resources to contribute.

The re-development of Bright Sparcs provides the archival community with a research and development opportunity to investigate the potential of the next generation of Web technologies utilising the power of XML encoded objects, but anchored firmly within the framework established by the International Council on Archives through ISAAR(CPF).

Indeed, if the preservation of our cultural heritage is of community concern, in the national interest and of global importance in the building of meaningful lives for ourselves and future generations, then we have a critical role in building contextual frameworks that will minimise the risks of destruction of the essential evidence of that heritage and maximise its ability to be meaningfully interpreted.

09. Acknowledgments

I would like to thank my colleagues at the Australian Science and Technology Heritage Centre, the University of Melbourne for all their work in helping develop these ideas and turning them into reality. I would especially like to thank Joanne Evans for help in the preparation of this paper.

010. Appendix

Figure 2: This shows the top section of an HTML encoded Bright Sparcs entry. Relationships to other resources are defined in the 'Online Sources', 'Archival Sources' and 'Published Sources' sections
Figure 3: A link from the 'Online Sources' connects this Bright Sparcs entity with its parallel entry in the National Archives of Australia RINSE database.
Figure 4: The top section of National Archives of Australia RINSE entry for Phillip Garth Law that defines him as a 'Commonwealth Person'.

011. NOTES

1. Schedel, Hartmann, The Nuremberg Chronicle, Anton Koberger, Nuremberg, 1493.

2. The Wordsworth Dictionary of Biography: a compact guide to the worthy and infamous throughout the ages. Wordsworth Editions Ltd, Hertfordshire, 1994. 458 pp.

3. The best starting point for accessing this literature is through D-Lib Magazine which can be found at <>.

4. Hurley, Chris, 'The Making and the Keeping of Records: (1) What are Finding Aids For?' Archives and Manuscripts, vol. 26 no. 1, pp. 74 and 75.

5. Miller, Eric, 'An Introduction to the Resource Description Framework', D-Lib Magazine, May 1998 (ISSN 1082-9873) at <>.

6. The International Council on Archives standard for archival authority records, ISAAR(CPF), can be found on the World Wide Web at: <>, the related ISAD(G) document can be located through the same site.

7. Lagoze, Carl, 'From Static to Dynamic Surrogates: Resource Discovery in the Digital Age', D-Lib Magazine, June 1997 (ISSN 1082-9873) at <>.

8. Lagoze, Carl, 'From Static to Dynamic Surrogates: Resource Discovery in the Digital Age', D-Lib Magazine, June 1997 (ISSN 1082-9873), from the section "The Current State of Networked Resource Discovery" at <>.

9. Lynch, Clifford, Avra Michelson, Cecilia Preston, and Craig A. Summerhill, CNI White Paper on Networked Information Discovery and Retrieval, Incomplete Draft, at <>.

10. Lagoze, Carl, 'From Static to Dynamic Surrogates: Resource Discovery in the Digital Age', D-Lib Magazine, June 1997 (ISSN 1082-9873), from the section "Beyond Static Surrogates - Opportunities in Networked Resource Discovery" at <>.

11. Information on the Archival Authority Information Meeting, including papers submitted, discussion, and reference, can be found on the Web at <>

12. Pitti, Daniel V., 'Encoded Archival Description: The Development of an Encoded Standard for Archival Finding Aids', American Archivist, vol. 60, Summer 1997, pp. 268-283. Online information about this work can be found at: <>.

13. This can be found on the Web at <>.

14. This can be found on the Web at <>.

15. For a more detailed account of the meeting see: McCarthy, Gavan, "Engineering Utility: A Visionary Role For Encoded Archival Authority Information In Managing Virtual And Physical Resources." AusWeb99, Ballina, Australia, April 1999, at <>.

16. Kragh, Helge, An Introduction to the Historiography of Science, Cambridge University Press, Cambridge, 1987, p. 175.

17. A more detailed development of this idea can be found in: McCarthy, Gavan, "Utilizing the Web to Build a Network of Archival Authority Records" submitted for publication in the International Council on Archives journal Janus, May 1999.

Gavan McCarthy

Director, Australian Science and Technology Heritage Centre
The University of Melbourne