University of Southampton OCS (beta), CAA 2012

Comparing the informatics of text and Cultural Heritage: the SAWS project
Stuart Dunn, Anna Jordanous, Mark Hedges, Christoph Storz

Last modified: 2011-12-20


Ancient texts, like ancient objects, may be regarded as sets of linked and linkable information. Yet, despite many conceptual similarities, there has been little examination of how computational methods for marking up and linking primary manuscripts can inform the mark-up of primary material culture, and vice versa. Like archaeological contexts, discrete and philologically significant sections of manuscripts require skill to identify, record and define. Links between related information pervade archaeology: the Harris Matrix describes links between contexts and the stratigraphic sequences between them, and database management systems have long been used to link information about artefacts and features across sites and to enable cross-searching. More recently, approaches such as the CIDOC CRM and the Semantic Web have been used to link defined entities of archaeological or cultural heritage information, identified by URIs and described and linked using controlled standards such as RDF. This paper will examine the use of such standards and methods in archaeology, and focus on their transfer to defining and linking related units of text in original manuscripts. Our case study is primary textual material from the Sharing Ancient WisdomS (SAWS) project. Comprising three international teams from the UK, Sweden and Austria, SAWS aims to present and analyse the tradition of wisdom literatures in Greek, Arabic and other languages, which present complex challenges for linking. Throughout antiquity and the Middle Ages, anthologies of extracts from larger texts, containing wise or useful sayings (gnomologia), were created and circulated widely, since few complete texts were available in manuscript form. Focusing on original manuscripts of gnomological texts (not editions), SAWS uses a bespoke TEI XML schema to mark up individual segments identified by editorial experts, which are then linked to parallel segments in other traditions.
Parallels include, for example, translations of individual sayings, derivations of a saying in one tradition from another, references to the same subject across traditions, and shared authorship of sayings. Segments are related according to their significant properties, which are described using an ontology that extends the CIDOC CRM, and the resulting links are expressed in RDF.
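The workflow described above (TEI segments identified by editors, then linked in RDF) can be illustrated with a minimal sketch. It is not the SAWS schema or ontology: the sample TEI fragment, the segment identifiers, the `http://example.org/saws/` base URI and the `isTranslationOf` property are all hypothetical placeholders, and typing each segment as the CIDOC CRM class E33 Linguistic Object is an assumption made for illustration only. The sketch uses only the Python standard library and emits N-Triples lines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: segments are typed as CIDOC CRM E33 Linguistic Object.
CRM_LINGUISTIC_OBJECT = "http://www.cidoc-crm.org/cidoc-crm/E33_Linguistic_Object"

# Hypothetical placeholders, not the project's real URIs or properties.
BASE = "http://example.org/saws/segment/"
IS_TRANSLATION_OF = "http://example.org/saws/ontology#isTranslationOf"

# Hypothetical TEI fragment: two editor-identified segments in one manuscript.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>
      <seg xml:id="arabic_ms_seg1">[first saying]</seg>
      <seg xml:id="arabic_ms_seg2">[second saying]</seg>
    </p>
  </body></text>
</TEI>"""


def segment_ids(tei_xml):
    """Return the xml:id of every <seg> element, in document order."""
    root = ET.fromstring(tei_xml)
    return [
        seg.attrib[XML_ID]
        for seg in root.iter(f"{{{TEI_NS}}}seg")
        if XML_ID in seg.attrib
    ]


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def build_triples(ids, links):
    """Type each segment, then add one statement per (source, property, target) link."""
    lines = [triple(BASE + i, RDF_TYPE, CRM_LINGUISTIC_OBJECT) for i in ids]
    for source_id, predicate, target_id in links:
        lines.append(triple(BASE + source_id, predicate, BASE + target_id))
    return lines


if __name__ == "__main__":
    ids = segment_ids(SAMPLE_TEI)
    # Hypothetical editorial judgement: seg1 translates a segment in a Greek manuscript.
    links = [("arabic_ms_seg1", IS_TRANSLATION_OF, "greek_ms_seg7")]
    for line in build_triples(ids, links):
        print(line)
```

Keeping the segment identifiers in the TEI file and the relationships in separate RDF statements means that a link between traditions can be added or revised without editing either manuscript transcription.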

It is possible to regard such segments as artefacts, albeit of a textual rather than material nature, and the parallels with archaeological information and sequencing are thus significant. While it would not be useful to impose a straight metaphor of ‘textual artefacts’ on this material, the theory that they are connected in complex ways owes much to material culture, and the latter’s language provides clues for deeper interrogation: what typologies and common attributes can be applied to segments? How do these evolve over time? Can one class of gnomic saying be demonstrated to have evolved in response to (and/or under the influence of) another? This paper will provide concrete examples from SAWS to demonstrate how the combination of XML, RDF and the CIDOC CRM is being employed, and thus delve deeper than existing secondary-literature approaches to archaeological text mining.



XML; CIDOC CRM; manuscripts; artefacts