Mapping of TEI to CIDOC-CRM

by Øyvind Eide and Christian-Emil Ore

Version 0.1 2007-01-02

This is work in progress; the document will change in the future. Comments are welcome and may be sent to oyvind.eide@muspro.uio.no.

Based on TEI P5, updated (I hope) to reflect ver. 0.5.

General

In P1/P2 the motivation behind TEI was to classify the function of pieces of a text and not to mark up layout information. Thus the name element is a signal that the content is used as a name, that is, to denote a person, place etc. The exceptions were the elements in the header and those describing corpus participants. The publication statement implies that there has been a publication event. The use of the person element states that there is a person who participated in the event recorded and transcribed in a part of an oral material corpus.

It is clearly tempting to add more elements describing the real or imagined world described in a text. The treatment of the elements in the manuscript description and in the name and date chapters clearly demonstrates this. The scope notes are not consistently formulated. For example, the element "Event", to be used in corpora, is defined as "(Event) any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication", while "persEvent" is defined as "a description of a particular event of significance in the life of a person".

In general the elements used in the running text will be names and can be mapped to the CRM class Appellation and its subclasses. However, this is not very satisfying from an information extraction point of view. Thus it is natural to introduce the denoted object (person, place, time etc.). In the TEI this is expressed in the attributes of the elements, as standardised dates or pointers to internal or external person and place declarations. There is no general event declaration, although the "Event" element and the custEvent element can be misused to obtain a kind of event declaration.
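The step from a name in the running text to the object it denotes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the mapping itself: the sample fragment, the `key` pointer attribute, the `appellation/...` and `entity/...` identifiers and the function name `extract` are all hypothetical, and TEI namespaces are left out for brevity. The generic CRM property P1 is identified by is used for every pairing, and reading notBefore/notAfter as P82 at some time within is one possible interpretation.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment (no namespace, invented names and keys).
SAMPLE = (
    '<p>Letter from <persName key="P001">Ola Nordmann</persName>, written in '
    '<placeName key="PL07">Bergen</placeName> '
    '<date notBefore="1850-01-01" notAfter="1850-12-31">in 1850</date>.</p>'
)

# Element name -> (appellation class, class of the denoted object).
NAME_CLASSES = {
    "persName": ("E82 Actor Appellation", "E21 Person"),
    "placeName": ("E44 Place Appellation", "E53 Place"),
    "date": ("E49 Time Appellation", "E52 Time-Span"),
}


def extract(xml_text):
    """Return (subject, property, object) statements for name-like elements."""
    statements = []
    root = ET.fromstring(xml_text)
    for index, element in enumerate(root.iter()):
        classes = NAME_CLASSES.get(element.tag)
        if classes is None:
            continue
        appellation_class, entity_class = classes
        # The element content is the appellation; whitespace is normalised.
        label = " ".join("".join(element.itertext()).split())
        appellation = f"appellation/{index}"
        # Assumption: a "key" attribute points to a person or place declaration.
        key = element.get("key")
        entity = f"entity/{key}" if key else f"entity/unidentified-{index}"
        statements += [
            (appellation, "rdf:type", appellation_class),
            (appellation, "rdfs:label", label),
            (entity, "rdf:type", entity_class),
            (entity, "P1 is identified by", appellation),
        ]
        # Assumption: standardised date bounds give the outer limits of the time-span.
        not_before = element.get("notBefore")
        not_after = element.get("notAfter")
        if not_before and not_after:
            statements.append(
                (entity, "P82 at some time within", f"{not_before}/{not_after}")
            )
    return statements


if __name__ == "__main__":
    for statement in extract(SAMPLE):
        print(statement)
```

Each name yields two nodes, the appellation found in the text and the denoted object, so that several appellations pointing to the same declaration end up attached to the same object.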

Some “ontological” elements in TEI: Events, time appellations, actors and actor appellations

| TEI section | TEI element | TEI scope note (description) | CRM mapping | Comments |
| --- | --- | --- | --- | --- |
| 12.8 | History [of a manuscript] | groups elements describing the full history of a manuscript or manuscript part. | E5 Event | The content of a history element describes a temporal entity. It involves at least one object (the physical manuscript) and may involve other objects and participants. Thus it is not a period but an E5 Event. The length is irrelevant. |
| 12.8 | Origin [of a manuscript] | contains any descriptive or other information concerning the origin of a manuscript or manuscript part | E65 Creation Event | The creation of the manuscript, that is, the writing of the text on the parchment. It may also denote the creation of the physical information carrier, e.g. the preparation of the goat/lamb skin. |
| 12.8 | Provenance | contains any descriptive or other information concerning a single identifiable episode during the history of a manuscript or manuscript part, after its creation but before its acquisition | E5 Event or subclasses, depending on the information given. | The event can be given a type (E55 Type) according to the kind of event. However, if the episode involves an actor it should be mapped to E7 Activity. If it is known to be a move, a transfer of title (ownership) or a transfer of custody, use the classes E9 Move, E8 Acquisition Event or E10 Transfer of Custody. In general this will depend on the textual description in the Provenance element. This indicates that the name of the element is both too specialized and too general. It should be replaced with a general event element with a well-defined list of types. |
| 12.8 | Acquisition | contains any descriptive or other information concerning the process by which a manuscript or manuscript part entered the holding institution. | One or more of the classes E9 Move, E8 Acquisition Event, E10 Transfer of Custody. | This is the last in time of a series of "Provenance" events and is in principle redundant. |
| 12.8 | att.datable | provides attributes for normalization of elements that contain datable events. | - | The att.datable class groups the four attributes notBefore, notAfter, from, to. The TEI definition states that the following elements can have one or more of the four attributes: acquisition affiliation age binding birth custEvent date death education faith floruit langKnowledge langKnown nationality origDate origin persEvent persName persState persTrait provenance relation residence seal sex socecStatus time. In this list one finds 1) events (birth), 2) classification types (nationality) and 3) attributes (persName). This makes the TEI an unclear mixture of state-oriented and event-oriented views of the world. For 1) events, the attributes indicate the event's place in time. For 2) classification types, one may use E3 Condition State, although sex is hardly a condition. This case is problematic. Case 3) is unclear. What does it mean when a persName has time attributes? |
| 11.2.3 | Event | (Event) any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication | E5 Event | Describes an event taking place in the real world at the time of the recording. The event is documented at a specific place on the recording. |
| 20.4.2.3 | persEvent | contains a description of a particular event of significance in the life of a person | E5 Event | |
| 20.4.2.3 | birth | (Birth details) contains information about a person's birth, such as its date and place. | E67 Birth | |
| 20.4.2.3 | death | contains information about a person's death, such as its date and place. | E69 Death | |
| 20.5 / 6.5.4 | Date | contains a date in any format. | E49 Time Appellation, E52 Time-Span, E61 Time Primitive | A date without values in the attributes can be mapped to an E52 Time-Span with an E49 Time Appellation. The use of attributes may invoke additional time-span instantiations. |
| 20.5 | Occasion (under date) | a temporal expression (either a date or a time) given in terms of a named occasion such as a holiday, a named time of day, or some notable event | Time Appellation and/or Event | It may be a recurring event? |
| 20.4.2 | Person | provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source. | E21 Person | Also personGrp: may be modelled as E74 Group, if "a group of individuals treated as a single person for analytic purposes" is always enough to define a group in the CRM sense. May also model sub-elements under person, and attributes. |
| 18.2.1 | Hand | used in the header to define each distinct scribe or handwriting style. | E21 Person and/or E55 Type | "to signal the person responsible ... for the writing..." E.g., this is a single real (but maybe unidentified) person (leaving "med påholden penn", i.e. writing with the pen guided by another person, aside). handShift may be modelled as an event involving two persons? |
| 6.11.2.2 | Author | in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item | E82 Actor Appellation --> P131 identifies --> E39 Actor --> P14 performed (--> P14.1 in the role of --> E55 Type: author) --> E65 Creation Event | This is an actor appellation. The rest of the model follows from the scope note. |
| 6.5.1 | Name | (name, proper noun) contains a proper noun or noun phrase | E41 Appellation | More detailed names in this chapter and in chapter 20 are also to be modelled similarly, with subclasses of Appellation when applicable, also based on typing in attributes: "Specific subclasses of E41 Appellation should be used when instances of E41 Appellation of a characteristic form are used for particular objects." |
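
For use in an extraction tool, the table above could be transcribed into a simple lookup, with the rule for Provenance written out as a small function. The following Python sketch is only one possible encoding: the lower-case element names, the `kind` labels ("move", "transfer of title", "transfer of custody") and the function names `crm_classes` and `classify_provenance` are assumptions made for the example and are not defined by TEI or the CRM.

```python
# Element-to-class lookup transcribed from the table above.
# Keys are TEI element names; values are the candidate CRM classes.
TEI_TO_CRM = {
    "history": ["E5 Event"],
    "origin": ["E65 Creation Event"],
    "provenance": ["E5 Event"],
    "acquisition": ["E9 Move", "E8 Acquisition Event", "E10 Transfer of Custody"],
    "event": ["E5 Event"],
    "persEvent": ["E5 Event"],
    "birth": ["E67 Birth"],
    "death": ["E69 Death"],
    "date": ["E49 Time Appellation", "E52 Time-Span", "E61 Time Primitive"],
    "person": ["E21 Person"],
    "personGrp": ["E74 Group"],
    "hand": ["E21 Person", "E55 Type"],
    "author": ["E82 Actor Appellation"],
    "name": ["E41 Appellation"],
}

# Hypothetical labels for the kind of episode described in a provenance element.
PROVENANCE_KINDS = {
    "move": "E9 Move",
    "transfer of title": "E8 Acquisition Event",
    "transfer of custody": "E10 Transfer of Custody",
}


def crm_classes(element_name):
    """Return the candidate CRM classes for a TEI element, or an empty list."""
    return TEI_TO_CRM.get(element_name, [])


def classify_provenance(kind=None, has_actor=False):
    """Pick the most specific class for a provenance episode.

    A known kind of episode wins; otherwise an episode involving an actor
    is an E7 Activity; otherwise it stays a plain E5 Event.
    """
    if kind in PROVENANCE_KINDS:
        return PROVENANCE_KINDS[kind]
    if has_actor:
        return "E7 Activity"
    return "E5 Event"


if __name__ == "__main__":
    print(crm_classes("acquisition"))
    print(classify_provenance("move"))
    print(classify_provenance(has_actor=True))
```

In practice the kind of episode would have to be derived from the textual description inside the Provenance element, which is exactly why a general event element with a well-defined list of types would be easier to process.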