STAR Research Demonstrator

The STAR Research Demonstrator cross searches extracts of excavation datasets from different database schemas, including Raunds Roman, Raunds Prehistoric, Museum of London, Silchester Roman (LEAP IADB) and Stanwick sampling. These datasets have been mapped to the core ontology (the CRM-EH extension of the CIDOC CRM) and RDF semantic web representations of key elements (contexts, groups, samples, finds and their properties) have been extracted.

Natural Language Processing information extraction techniques have been applied to some key concepts in the grey literature (an extract of the OASIS corpus operated by the Archaeology Data Service), producing semantic metadata in the same CRM-EH based representation as the extracted data. This allows unified searching of the different datasets and the grey literature in terms of the semantic structure of the core CRM-EH ontology.



A previous Pilot system supported free text search (with interactive query expansion via STAR terminology services), and browsing of the ontology. The current Demonstrator focuses upon (SPARQL-based) semantic search, drawing on use cases and feedback on the Pilot system from project workshops. While a free text search on Note fields is included, the main emphasis is on structured semantic search via controlled identifiers.



The Demonstrator seeks to hide the complexity of the underlying ontology. An interactive query builder offers search (and browsing) for Samples, Finds, Contexts or interpretive Groups with their properties and relationships (including stratigraphic relationships where present in the datasets). As the user selects from the interface, an underlying semantic query is automatically constructed in terms of the corresponding ontological entities.
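The way interface selections might be turned into a semantic query can be sketched as follows. This is a minimal illustration only, not the Demonstrator's implementation: the `ex:` namespace, the class and property names (`ex:Context`, `ex:hasType`, `ex:contains`, `ex:note` and so on), and the `Selection` and `build_query` helpers are all hypothetical stand-ins for the CRM-EH entities, and the free text filter assumes SPARQL 1.1 string functions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: placeholders for CRM-EH entities and properties,
# NOT the identifiers actually used by the Demonstrator.
PREFIX = "PREFIX ex: <http://example.org/star-sketch#>"
CLASSES = {"Context": "ex:Context", "Find": "ex:Find",
           "Sample": "ex:Sample", "Group": "ex:Group"}
RELATIONS = {"containing": "ex:contains",
             "stratigraphically below": "ex:stratigraphicallyBelow"}


@dataclass
class Selection:
    """One row of the (imagined) query builder: an entity plus optional properties."""
    entity: str                       # "Context", "Find", "Sample" or "Group"
    type_label: Optional[str] = None  # e.g. "hearth"; None means "any type"
    note_text: Optional[str] = None   # free text to look for in the Note field


def _escape(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def _patterns(var, sel):
    """Translate one Selection into triple patterns bound to the variable ?var."""
    lines = [f"?{var} a {CLASSES[sel.entity]} ."]
    if sel.type_label:
        lines.append(f'?{var} ex:hasType "{_escape(sel.type_label)}" .')
    if sel.note_text:
        lines.append(f"?{var} ex:note ?{var}Note .")
        needle = _escape(sel.note_text.lower())
        lines.append(f'FILTER(CONTAINS(LCASE(STR(?{var}Note)), "{needle}"))')
    return lines


def build_query(subject, relation=None, obj=None, limit=100):
    """Assemble a SPARQL SELECT from a subject and an optional related entity."""
    body = _patterns("s", subject)
    if relation and obj:
        body.append(f"?s {RELATIONS[relation]} ?o .")
        body += _patterns("o", obj)
    where = "\n  ".join(body)
    return f"{PREFIX}\nSELECT DISTINCT ?s WHERE {{\n  {where}\n}}\nLIMIT {limit}"


# "Context of Type hearth containing Find of Type coin"
print(build_query(Selection("Context", "hearth"), "containing", Selection("Find", "coin")))
```

Leaving `type_label` unset on the related entity gives the "containing Finds generally" form of the query, since only the class pattern is emitted for it.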



A set of browser based interactive controls was developed for searching and browsing the extracted archaeological data. The controls are designed to be browser agnostic and the Demonstrator will run in most commonly available internet browsers. Each control is populated via web service calls querying the remotely held data. The controls are combined to form the overall demonstration search interface. The following links serve to demonstrate various aspects of the full search system. The details pages include live links to the Demonstrator to provide further examples:




The Demonstrator is intended as a toolkit which might underpin a research session, where queries may be followed by browsing moves and an initial query may serve as a starting point for an evolving inquiry. The following example searches may serve as a starting point after exploring the controls above. They also illustrate the cross search capability – it is possible to search across all datasets (the default) or select a dataset to search individually (in Results, currently dataset origin is indicated simply by hovering over the ID). The examples show the difference between searching for Contexts containing Finds of a particular Type and Contexts containing Finds generally (select Find but leave Properties blank). They also serve to illustrate alternative search strategies depending on the focus of the inquiry, for example whether the focus is on the Find, or the Context containing the Find.

  • Context of Type hearth

  • Context of Type hearth containing Find of Type coin

  • Context of Type hearth containing Finds generally

  • Find of Type coin within Context of Type hearth

  • Find with Note including metal within Context of Type hearth

  • Context of Type hearth Stratigraphically Below Context with Note including skeleton

  • Context of Type wall Stratigraphically Below Context of Type layer with Note including oyster

  • Context of Type post-hole fill containing Find of Type animal remains

  • Context of Type corn-drying oven fill containing Samples generally
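The distinctions drawn by the example searches above can be illustrated with a small in-memory sketch. The records below are invented toy data (not STAR data), and the function names are hypothetical; the point is only to show how "Finds of a particular Type", "Finds generally" and a Find-focused search differ in what they return.

```python
# Invented toy records, for illustration only; not taken from any STAR dataset.
contexts = [
    {"id": "C1", "type": "hearth", "finds": [{"id": "F1", "type": "coin"}]},
    {"id": "C2", "type": "hearth", "finds": [{"id": "F2", "type": "pottery"}]},
    {"id": "C3", "type": "hearth", "finds": []},
    {"id": "C4", "type": "wall", "finds": [{"id": "F3", "type": "coin"}]},
]


def contexts_containing(records, context_type, find_type=None):
    """Context-focused search: contexts of a type holding at least one find.

    With find_type=None any find qualifies ("Finds generally");
    otherwise the find must be of the given type.
    """
    return [
        c["id"]
        for c in records
        if c["type"] == context_type
        and any(find_type is None or f["type"] == find_type for f in c["finds"])
    ]


def finds_within(records, find_type, context_type):
    """Find-focused search: finds of a type located in contexts of a type."""
    return [
        f["id"]
        for c in records
        if c["type"] == context_type
        for f in c["finds"]
        if f["type"] == find_type
    ]


print(contexts_containing(contexts, "hearth", "coin"))  # hearths with a coin
print(contexts_containing(contexts, "hearth"))          # hearths with any find
print(finds_within(contexts, "coin", "hearth"))         # coins found in hearths
```

The first two calls return Context identifiers and the third returns Find identifiers, which mirrors the choice between focusing an inquiry on the Context or on the Find.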


    Intended Scope

    The STAR Demonstrator is intended for the digital archaeology research community. It is not an operational system, but a research demonstration of semantic cross search of different datasets and grey literature via a core ontology. Previously cross search was not possible; each dataset remained isolated and no link was made to grey literature. The user interface is intended to demonstrate the research aims and combines various interface elements that would be separated and given more space in an operational interface. Specialised spatial and temporal elements could also be added – see STAR work on time periods.



    The server is selected as appropriate for demonstration use. Currently, the first 100 results are returned. While the user interface permits complex queries to be constructed, there are known performance issues with SPARQL generally and particularly with free text queries – here on the Notes fields (free text Note search performs best as part of a semantic search). Thus occasionally a query may time out (returning an error message). We are investigating alternative SPARQL platforms where such problems may be ameliorated.
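    A client-side view of the result cap and the occasional timeout might look like the sketch below. It is an assumption-laden illustration rather than the Demonstrator's code: `execute` is a hypothetical callable standing in for whatever sends the query to the SPARQL endpoint, and the 30 second default and the wording of the error message are invented. Only the limit of 100 results comes from the description above.

```python
import concurrent.futures
import time

MAX_RESULTS = 100  # the first 100 results are returned, as described in the text


def run_with_timeout(execute, query, timeout_s=30.0):
    """Run execute(query) in a worker thread and return (rows, error_message).

    `execute` is a hypothetical stand-in for the call to the SPARQL endpoint.
    The default timeout is an assumed value, not one documented for STAR.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(execute, query)
    try:
        rows = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return [], "The query timed out."
    finally:
        # Do not block on a still-running query; the worker finishes in the background.
        pool.shutdown(wait=False)
    return list(rows)[:MAX_RESULTS], None


def slow_endpoint(query):
    """Toy endpoint that simulates an expensive free text query."""
    time.sleep(0.3)
    return ["row"]


rows, error = run_with_timeout(slow_endpoint, "SELECT ...", timeout_s=0.05)
print(rows, error)
```

    Note that the timeout here only stops the caller from waiting; it does not cancel work already running on the server, which would need support from the SPARQL platform itself.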

    STAR Demonstrator resources

    The STAR Demonstrator operates over extracts from the following excavation datasets: Raunds Roman, Raunds Prehistoric, Museum of London Archaeology (MOLA), Silchester Roman (IADB) and Stanwick sampling. The Demonstrator also operates over an extract of archaeological grey literature from the Online AccesS to the Index of archaeological investigationS (OASIS), operated by the Archaeology Data Service. The data held within the STAR system are extracts from these databases provided solely for project demonstration purposes and have been experimentally mapped to the ontologies and terminology systems employed by STAR. Thus the displayed data should be regarded as neither necessarily current nor complete, and do not constitute any definitive publication of the original datasets.

    The STAR project gratefully acknowledges these data contributions, together with the various terminology resources (vocabularies, thesauri and CRM-EH ontology) supplied by English Heritage. Please refer to the contacts acknowledged below for further information on the original datasets.

    Acknowledgments

    Gill Campbell (Stanwick sampling), Phil Carlisle (EH NMR Thesauri), Vicky Crosby (Raunds Roman), Keith May (Raunds Prehistoric, CRM-EH extension of CIDOC CRM), English Heritage; Pete Rauxloh (MOLA), Museum of London Archaeology; Mike Rains (IADB), York Archaeological Trust; Julian Richards (OASIS), Archaeology Data Service.