Handling Time Periods in STAR

This page discusses issues associated with how dates and time periods are represented in STAR datasets and outlines various techniques developed during the project.

Background and methods

Within the STAR datasets, archaeological entities are typically associated with a time span rather than an absolute date. These time spans are expressed in a variety of textual forms, e.g. centuries, AD/BC dates, named Roman emperors or British monarchs, and the three-age system. To use these dates in any meaningful way, we first had to convert the data to a more regular form. We then needed a way to align these time spans to a controlled set of known periods (for STAR purposes, assuming England as the place concept associated with each period).

Period description
MLC2-C3
AD 341-6
Iron Age
First half 1st century?
Antonine
LC2/EC3
MLA

Table 1 – examples of time periods encountered

Records containing date information were (semi-automatically) processed to give two numeric values representing the approximate lower and upper bounds of the time span indicated by the data record. These values could then be compared to other known time spans (and to each other) to determine containment, overlap etc. To ensure a consistent approach across databases, a controlled list of time periods was first collated. The English Heritage “Timelines” thesaurus and the English Heritage periods list (formerly known as the RCHME periods list) were supplemented with dates deduced from record description fields, concept scope notes, and online historical resources. Where there was not adequate evidence to determine even an approximate start and end date for a record, both lower and upper bound values remained set to zero.
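As an illustration of the kind of conversion involved, the sketch below expands abbreviated AD labels such as "AD 341-6" into lower and upper bounds. It is a minimal example under stated assumptions, not the conversion STAR actually used (that process was semi-automatic and its rules are not given here). The function name label_to_bounds and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: covers only "AD 341-6" / "AD 1943" style labels.
_AD_RANGE = re.compile(r"^AD\s+(\d+)(?:\s*-\s*(\d+))?$", re.IGNORECASE)


def label_to_bounds(label):
    """Return (min_year, max_year) for an AD label, or (0, 0) if unparsed."""
    match = _AD_RANGE.match(label.strip())
    if match is None:
        # Follows the convention described above: with no usable evidence,
        # both bounds stay at zero.
        return (0, 0)
    start_text, end_text = match.group(1), match.group(2)
    start = int(start_text)
    if end_text is None:
        # A single year is treated as a span of one year.
        return (start, start)
    if len(end_text) < len(start_text):
        # Abbreviated end year: borrow the leading digits of the start year,
        # so "341-6" becomes 341..346 and "228-31" becomes 228..231.
        end_text = start_text[: len(start_text) - len(end_text)] + end_text
    return (start, int(end_text))


for sample in ["AD 341-6", "AD 228-31", "AD 275-402", "AD 1943", "Antonine"]:
    print(sample, label_to_bounds(sample))
```

Labels such as "Antonine" or "MLC2-C3" fall through to (0, 0) here; resolving them would need lookup tables of rulers and century abbreviations, which this sketch does not attempt.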

The resultant data was then further processed using a custom console-based application, STAR.TIMELINE, to assign a known time period identifier to each record. This allows clustering and searching for records, and also facilitates matching between database records and the grey literature. A semantic closeness measure for time periods (from previous work) was reused in the application to compare the start/end dates produced against a controlled list of known periods. Some periods overlap or are contained within others, so the matching method needs to accommodate these cases to suggest the most appropriate match. The STAR.TIMELINE application is made freely available for download and experimentation.

P1     P2       Description
0-150  0-150    P1 equals P2
0-150  200-300  P1 before P2
0-150  150-250  P1 meets P2
0-150  100-200  P1 overlaps P2
0-150  50-150   P1 contains P2

Table 2 – example relationships between periods P1 and P2
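The relationships in Table 2 can be derived directly from the two pairs of bounds. The following is a minimal sketch, assuming inclusive integer year bounds and a full set of Allen-style interval relations. The function name period_relation is hypothetical, and relation names that do not appear in the STAR.TIMELINE sample output on this page (Before, After, Meets, MetBy, Starts, FinishedBy) are assumptions.

```python
def period_relation(a_start, a_end, b_start, b_end):
    """Name the relation of period A to period B (inclusive year bounds)."""
    if a_start == b_start and a_end == b_end:
        return "Equals"
    if a_end < b_start:
        return "Before"
    if a_start > b_end:
        return "After"
    if a_end == b_start:
        return "Meets"
    if a_start == b_end:
        return "MetBy"
    if a_start == b_start:
        # Same start year: the shorter period "starts" the longer one.
        return "Starts" if a_end < b_end else "StartedBy"
    if a_end == b_end:
        # Same end year: the later-starting period "finishes" the other.
        return "Finishes" if a_start > b_start else "FinishedBy"
    if a_start > b_start and a_end < b_end:
        return "During"
    if a_start < b_start and a_end > b_end:
        return "Includes"
    # Remaining cases are partial overlaps with no shared boundary.
    return "Overlaps" if a_start < b_start else "OverlappedBy"


print(period_relation(0, 150, 150, 250))  # Meets
print(period_relation(0, 150, 100, 200))  # Overlaps
print(period_relation(0, 150, 50, 150))   # FinishedBy
```

The last row of Table 2, described there loosely as "contains", comes out as the finer-grained FinishedBy in this sketch because the two periods share an end year.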

A matching function was run against data extracted from a number of tables in the archaeological datasets. The approach allowed for multiple runs using different time period lists. The first run was against the English Heritage periods list (formerly known as the RCHME periods list), a fairly coarse-grained list encompassing a breakdown of the three-age system. Some sample results from this run are shown in Table 3. As this data originated from the Raunds Roman (RRAD) database, perhaps predictably most of the records were tagged as “ROMAN”.

Data record – dates deduced from label    Closest controlled match based on dates
ID    Label          From  To             ID   Label             From  To    Relationship
1315  AD 228-31      228   231            10   ROMAN             43    410   During
1316  AD 364-78      364   378            10   ROMAN             43    410   During
1317  AD 69-79       69    79             10   ROMAN             43    410   During
1318  AD 270-4       270   274            10   ROMAN             43    410   During
1319  AD 275-402     275   402            10   ROMAN             43    410   During
1320  AD 341-6       341   346            10   ROMAN             43    410   During
1321  AD 268-70      268   270            10   ROMAN             43    410   During
1322  AD 367-75      367   375            10   ROMAN             43    410   During
1324  AD 270-84      270   284            10   ROMAN             43    410   During
1325  AD 270-84      270   284            10   ROMAN             43    410   During
1326  AD 367-75      367   375            10   ROMAN             43    410   During
1327  AD 383-8       383   388            10   ROMAN             43    410   During
1328  AD 330-40      330   340            10   ROMAN             43    410   During
1337  Post-medieval  1540  1901           16   POST MEDIEVAL     1540  1901  Equals
1370  Medieval       1066  1540           28   MEDIEVAL          1066  1540  Equals
1371  AD 1943        1943  1943           109  SECOND WORLD WAR  1939  1945  During

Table 3 – sample of data from RRAD object_period table processed using English Heritage periods list

The second run was against the English Heritage “Timelines” thesaurus data, which we had manually supplemented with start/end dates (based on scope notes and other information) to produce a much more fine-grained controlled list of known periods. The Timelines thesaurus includes the terms from the periods list, but also more detailed periods. The sample results for the same records re-run against this list are shown in Table 4. Note that, where appropriate, a more detailed period has been selected automatically.

Data record – dates deduced from label    Closest controlled match based on dates
ID    Label          From  To             ID      Label                       From  To    Relationship
1315  AD 228-31      228   231            136122  ALEXANDER SEVERUS           222   235   During
1316  AD 364-78      364   378            900014  3RD QUARTER 4TH CENTURY AD  351   375   OverlappedBy
1317  AD 69-79       69    79             136087  VESPASIAN                   69    79    Equals
1318  AD 270-4       270   274            136164  TETRICUS I                  270   274   Equals
1319  AD 275-402     275   402            134825  4TH CENTURY AD              300   399   Includes
1320  AD 341-6       341   346            900013  2ND QUARTER 4TH CENTURY AD  326   350   During
1321  AD 268-70      268   270            136154  CLAUDIUS II GOTHICUS        268   270   Equals
1322  AD 367-75      367   375            900014  3RD QUARTER 4TH CENTURY AD  351   375   Finishes
1324  AD 270-84      270   284            135952  LATE 3RD CENTURY            266   299   During
1325  AD 270-84      270   284            135952  LATE 3RD CENTURY            266   299   During
1326  AD 367-75      367   375            900014  3RD QUARTER 4TH CENTURY AD  351   375   Finishes
1327  AD 383-8       383   388            900015  4TH QUARTER 4TH CENTURY AD  376   399   During
1328  AD 330-40      330   340            900013  2ND QUARTER 4TH CENTURY AD  326   350   During
1337  Post-medieval  1540  1901           134746  POST MEDIEVAL               1540  1901  Equals
1370  Medieval       1066  1540           134745  MEDIEVAL                    1066  1540  During
1371  AD 1943        1943  1943           134848  SECOND WORLD WAR            1939  1945  During

Table 4 – same sample data records processed using EH Timelines thesaurus
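To give a feel for why a finer-grained list yields more specific matches, the sketch below picks the known period whose bounds lie nearest to the record's bounds. The distance used (sum of the absolute differences of the start and end years) is a simple stand-in chosen for illustration. It is not the semantic closeness measure used by STAR.TIMELINE, and the function name closest_period is hypothetical. The period values are taken from Table 4.

```python
def closest_period(record_min, record_max, periods):
    """Pick the period whose bounds differ least from the record's bounds.

    `periods` is a list of (period_id, label, min_year, max_year) tuples.
    The distance here is an illustrative stand-in, NOT the closeness
    measure that STAR.TIMELINE itself uses.
    """
    def distance(period):
        _, _, period_min, period_max = period
        return abs(record_min - period_min) + abs(record_max - period_max)

    return min(periods, key=distance)


# A few known periods copied from Table 4.
known_periods = [
    (136122, "ALEXANDER SEVERUS", 222, 235),
    (135952, "LATE 3RD CENTURY", 266, 299),
    (134825, "4TH CENTURY AD", 300, 399),
    (900013, "2ND QUARTER 4TH CENTURY AD", 326, 350),
]

print(closest_period(341, 346, known_periods)[1])  # 2ND QUARTER 4TH CENTURY AD
print(closest_period(275, 402, known_periods)[1])  # 4TH CENTURY AD
```

On these few rows the stand-in distance agrees with Table 4, but it should not be expected to reproduce the application's choices in general, since overlapping and nested periods are exactly where a more careful measure matters.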

As a result of this process, each database record containing dates now has a corresponding record holding those dates in a form that can be cross searched effectively, either directly by absolute date or by thesaurus term. The processed data will next need to be extracted to RDF, conforming to the CRM model for representing time period information. This can be achieved using the existing STAR data extraction tool.

Data formats

In the interests of simplicity STAR.TIMELINE imports and exports all data in CSV format. The fields for the named periods file are:

  • periodID – identifier for the named period
  • periodLabel – text label for the named period
  • periodMinYear – numeric minimum year for the named period
  • periodMaxYear – numeric maximum year for the named period
e.g.
5, PALAEOLITHIC, -500000, -10000
54, LOWER PALAEOLITHIC, -500000, -150000
55, MIDDLE PALAEOLITHIC, -150000, -40000
56, UPPER PALAEOLITHIC, -40000, -10000
6, MESOLITHIC, -10000, -4000
etc.
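A named periods file in this layout can be read with any CSV library. The sketch below uses Python's csv module and assumes there is no header row and that a space may follow each comma, as in the example lines above. The NamedPeriod type and the read_named_periods function are hypothetical names, not part of STAR.TIMELINE.

```python
import csv
from typing import NamedTuple


class NamedPeriod(NamedTuple):
    period_id: str   # kept as text; numeric IDs are an assumption
    label: str
    min_year: int
    max_year: int


def read_named_periods(lines):
    """Parse periodID, periodLabel, periodMinYear, periodMaxYear rows."""
    # skipinitialspace drops the blank that follows each comma in the samples.
    reader = csv.reader(lines, skipinitialspace=True)
    return [
        NamedPeriod(row[0], row[1].strip(), int(row[2]), int(row[3]))
        for row in reader
        if row  # ignore empty lines
    ]


SAMPLE = """\
5, PALAEOLITHIC, -500000, -10000
54, LOWER PALAEOLITHIC, -500000, -150000
6, MESOLITHIC, -10000, -4000
"""

for period in read_named_periods(SAMPLE.splitlines()):
    print(period.label, period.min_year, period.max_year)
```

To read one of the supplied files instead of the inline sample, pass an open file object (opened with newline="") in place of SAMPLE.splitlines().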

The fields for the record data file are:

  • recordID – identifier for the data record
  • recordLabel – text label for the data record
  • recordMinYear – numeric minimum year for the data record
  • recordMaxYear – numeric maximum year for the data record
e.g.
19,300-400,300,400
20,31 BC-138 AD,-31,138
21,31 BC-AD 14,-31,14
22,3rd/4th century,200,400
23,98-117,98,117
24,AD 375-8,375,378
etc.

The fields for the processed output file are:

  • recordID
  • recordLabel
  • recordMinYear
  • recordMaxYear
  • periodID
  • periodLabel
  • periodMinYear
  • periodMaxYear
  • periodRelation – type of relationship between record and period
e.g.
3,?1st century,1,100,26, PREHISTORIC OR ROMAN,-500000,410,During
4,?Modern,1901,2030,24, 20TH CENTURY,1901,2000,StartedBy
5,?Post medieval,1540,1901,16, POST MEDIEVAL,1540,1901,Equals
10,1st-2nd century AD,1,200,10, ROMAN,43,410,Overlaps
11,20-15 BC,-20,-15,67, LATE IRON AGE,-100,43,During
12,250-400,250,400,10, ROMAN,43,410,During
etc.
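The processed output can be consumed in the same way. The sketch below reads output rows into dictionaries keyed by the nine field names listed above and converts the year fields to integers. It assumes the file has no header row. The function name read_output_rows is hypothetical.

```python
import csv

OUTPUT_FIELDS = [
    "recordID", "recordLabel", "recordMinYear", "recordMaxYear",
    "periodID", "periodLabel", "periodMinYear", "periodMaxYear",
    "periodRelation",
]
YEAR_FIELDS = ("recordMinYear", "recordMaxYear", "periodMinYear", "periodMaxYear")


def read_output_rows(lines):
    """Parse processed output rows into dicts with integer year fields."""
    # skipinitialspace handles the blank before the period label in the samples.
    reader = csv.DictReader(lines, fieldnames=OUTPUT_FIELDS, skipinitialspace=True)
    rows = []
    for row in reader:
        for field in YEAR_FIELDS:
            row[field] = int(row[field])
        rows.append(row)
    return rows


SAMPLE = """\
3,?1st century,1,100,26, PREHISTORIC OR ROMAN,-500000,410,During
10,1st-2nd century AD,1,200,10, ROMAN,43,410,Overlaps
12,250-400,250,400,10, ROMAN,43,410,During
"""

rows = read_output_rows(SAMPLE.splitlines())
during = [row["recordLabel"] for row in rows if row["periodRelation"] == "During"]
print(during)  # ['?1st century', '250-400']
```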

For the purposes of experimentation there are some files already uploaded with the application:

  • rchme.csv is a named periods file representing the English Heritage periods list.
  • ehtimelines.csv is a named periods file representing periods from the English Heritage Timelines thesaurus.
  • sampledata.csv is a file of sample records to be processed.

STAR.TIMELINE application

STAR.TIMELINE is an independent component of a larger project, written as an internal application for our own purposes. We will make the application source code freely available on request. It is a console-based application written in C#, so it requires the .NET Framework (v2) to be installed as a prerequisite. It also uses the FileHelpers component for file import/export operations. The application setup should automatically take care of any installation and configuration. The application has the following menu options:

  • Clear named periods – clear the internal list of known periods
  • Import named periods – populate the internal list of known periods from a specified CSV file
  • Process data records – process the specified CSV file. The output will be the input filename plus “.output.csv”
  • Get closest named periods – use the closeness algorithm to find the closest matching periods for the data record

To get started, try importing one of the supplied lists of known periods (rchme.csv or ehtimelines.csv) and querying it interactively using the last two menu options. You can also process the sample data records (sampledata.csv; output goes to sampledata.csv.output.csv). Note that the named period lists included here are for experimentation only. The most up-to-date source of data for the English Heritage periods list remains the FISH website; for the “Timelines” thesaurus (not formally published) it is English Heritage Thesauri.

To download: STAR.TIMELINE application setup

STAR Timeline Service

As an extension of this work, an experimental URI-based web service and test client application were also produced, exposing the “Get closest named periods” functionality against the Timelines thesaurus as a web service call. Calls to the service take the simple form <prefix>/getRelatedPeriods?startYear=0&endYear=100. The returned data is in JSON format. The test client displays the resultant service call, and the returned data represented as both a list and a graphical timeline.
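A call URL of this form can be assembled as follows. This is a minimal sketch: the function name related_periods_url is hypothetical, and the prefix shown in the usage line is a placeholder, since the real service prefix is not given on this page. The structure of the returned JSON is not documented here either, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode


def related_periods_url(prefix, start_year, end_year):
    """Build a getRelatedPeriods call URL for the given year range."""
    # Parameter names startYear and endYear come from the call form above.
    query = urlencode({"startYear": start_year, "endYear": end_year})
    return f"{prefix.rstrip('/')}/getRelatedPeriods?{query}"


# "http://example.org/timeline" is a placeholder, not the real service prefix.
print(related_periods_url("http://example.org/timeline", 0, 100))
```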

STAR Timeline Service test client

Publications

Tudhope D., Taylor C. 1997. Navigation via Similarity: automatic linking based on semantic closeness. Information Processing and Management, 33(2), 233-242. Elsevier Science. doi:10.1016/S0306-4573(96)00067-2

Binding C. 2010. Implementing archaeological time periods using CIDOC CRM and SKOS. Proceedings 7th Extended Semantic Web Conference, Heraklion, L. Aroyo et al. (Eds.): ESWC 2010, Part I, Lecture Notes in Computer Science, 6088, 273–287, Springer-Verlag Berlin Heidelberg.

Binding C. 2010. Archaeology and Terminology. EuroVoc 2010 Conference: Mind the lexical gap – EuroVoc, building block of the semantic web, EU Publications Office, Luxembourg.