The Name of the Place: Towards a Model for Interconnection of Geographical Entities

Øyvind Eide. E-mail: oyvind.eide@muspro.uio.no
Lars-Jørgen Tvedt. E-mail: l.j.tvedt@edd.uio.no
Jon Holmen. E-mail: jon.holmen@edd.uio.no
All authors: Unit for Digital Documentation, University of Oslo.

Paper presented at ALLC/ACH 2002, Tübingen

Background

The Documentation Project and the Museum Project are cooperative projects between the four universities in Norway. Since 1991, these projects have performed retro-conversion and digitization of analogue archives, books, images and other media types.

The Museum Project is also responsible for the development of common database systems for the management of collections for all the Norwegian university museums. Ideally, these database systems should be able to handle all reference information related to artifact and specimen collections inside and outside the museums, for natural history as well as cultural history. The Documentation Project has similar responsibilities for other collections in fields such as lexicography and field name studies.

The work is motivated by an ambition to develop IT-based systems that will offer users centralized and efficient access to information regarding the Norwegian cultural and natural heritage. With the help of common user interfaces and links between data from different fields of study, it will be possible to generate new information combinations and new insights in the various disciplines.

During the development of a database system for these diverse collections, the need for a model expressing relationships between places has been strongly felt, both in the modelling and the presentation of each collection and in the work towards deeper integration between the various collections.

Introduction

In this paper, we describe a model for the integration of geographical information that forms the basis of our work at the Museum Project and the Documentation Project.

First, we give some general views on how geographical entities are typically described, together with our need to formalize such descriptions so that they can be integrated digitally.

We then describe two different ways to model relations between such descriptions, and argue why we have to choose the more difficult of them.

To conclude, we give examples of how this system can be used, and describe how far we have come in implementing an instance of our model and importing Norwegian geographical data.

Geographical information in traditional and new media

People tend to use the concept "positioning of places", at least in Norway, when they are really talking about connecting coordinates to a place which has already been positioned in the form of a textual description. This is similar to some database people talking about "unstructured information" when they should be talking about "information structured in a way that is not directly expressible in a database structure".

In our place model, a place is an ever-lasting geographical entity, a geometric figure in the physical world. This figure is expressible in various coordinate systems. Our implementation of the model is being populated with data regarding historic and, eventually, prehistoric time.

A place as defined above can be referred to by several references. These references are strings of symbols. We will group such strings in two main categories:

A name must usually contain more than just the commonly used place name to be unambiguous. E.g., Bergen in itself is ambiguous, whereas

[1] Bergen in western Norway in 2002

is only referring to one single place.

The place referred to in [1] is different from Bergen as it existed in 1890, because the size of the city has changed through changes in the number and sizes of municipalities in Norway. It is also different from the several Bergens in Germany as well as in other countries.

The place references in the documents of our collections are regarded as observations. Each observation is connected to a specific time. Examples of such observations:

In large collections of information referring to places, knowing the relations between the places referred to is important to make integration possible. Thus, the integration we are trying to achieve is integration between place observations, and such integration is created by linking each observation to a place object in the implementation of the model.

In traditional media, references to places have generally been expressed by a short name ("Bergen"), and a context. The context will typically tell an informed reader enough to know that the place is in Western Norway, and that Bergen has the size it had in the mid-20th century. In some cases, the context is strict, such as in an archaeological or botanical record where the exact date of a find implies the place as it existed that day. In other situations, such as in most novels, exact information about the size of the place is neither important nor questioned by the reader, and not provided by the author.

The information expressed in the context has to be expressed in another way in digital media to provide for integration. This expression will be the reference to a place appellation. The model we use to express this is similar to the place entities in the CIDOC CRM model. Given the date of the observation, this appellation is linked to a specific place in the implementation of the place model. This implies two important links from the observation:

  1. To the place in the model
  2. To the appellation showing the various versions of this place

Going for a moment back to our Bergen example, these two links will be:

  1. to the object representing the city of Bergen at a certain time span
  2. to the appellation linking all versions of Bergen in western Norway together, but apart from Bergens in Germany or the USA.

Thus, the observation is linked to the exact geographical entity as well as to the concept of this specific Bergen, referring to various geographical entities, but at the same time referring to a distinct cultural entity - the city borders change, the city is also a county for a while, it is the capital of Norway for a while, it is a province city in the kingdom of Denmark for a while, it is named Bjørgvin, but in this context, it is still the same Bergen.
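The two links from an observation can be illustrated with a small Python sketch. It is not the project's implementation: the class names, the use of whole years for time spans and the sample data (including the boundary year) are assumptions made for the example only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Place:
    """One version of a geographical entity, valid for a span of years."""
    label: str
    valid_from: int            # first year of validity (assumed granularity)
    valid_to: Optional[int]    # last year of validity, None = still valid


@dataclass
class Appellation:
    """Groups all versions of one culturally distinct place."""
    name: str
    versions: List[Place] = field(default_factory=list)

    def place_at(self, year: int) -> Optional[Place]:
        """Return the version of the place that was valid in the given year."""
        for place in self.versions:
            starts_before = place.valid_from <= year
            ends_after = place.valid_to is None or year <= place.valid_to
            if starts_before and ends_after:
                return place
        return None


@dataclass
class Observation:
    """A place reference found in a document, tied to a specific time."""
    text: str
    year: int
    appellation: Appellation   # link 2: to the appellation

    @property
    def place(self) -> Optional[Place]:
        # Link 1 (to the place) follows from the appellation and the date.
        return self.appellation.place_at(self.year)


# Hypothetical data; the time spans are illustrative, not taken from our tables.
bergen = Appellation("Bergen in western Norway", [
    Place("Bergen (older extent)", 1800, 1971),
    Place("Bergen (current extent)", 1972, None),
])
obs_old = Observation("Bergen", 1890, bergen)
obs_new = Observation("Bergen", 2002, bergen)
print(obs_old.place.label)
print(obs_new.place.label)
```

Both observations share the same appellation but resolve to different place objects, which is the distinction between the cultural entity and the geographical entity described above.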

The links between the places

In connection with geographical reference systems for digital libraries we commonly find the use of UTM coordinates in the specification of geographical entities. Whereas this method gives many possibilities both for use in single collections and in the interconnection of heterogeneous collections of information objects, it is not suitable for collections that contain a large number of references to geographical entities for which it is impossible or unfeasible to enter such coordinates.

In information systems where the existence of coordinate references linked to geographical entities is obligatory, integrated searching can be performed as geometrical queries. Well-defined methods exist for this, and several off-the-shelf systems, e.g. Oracle Spatial, support such queries. Our problem is that such a demand would strongly restrict the place observations for which integration is possible. To populate all of our collections with such coordinate references is impossible.

Whereas the use of UTM coordinates will be an important part of our systems, we have to design a system which does not depend on this type of information. As the basic structure in the organization of the geographical information objects, we use the political geography of Norway. At any single point in history, this structure takes the form of a tree, as seen in Figure 1. However, this tree structured model is too restricted, both to cover collections spanning a time period, and also to cover other types of named places.

Figure 1: Place model 1


There are several other structures interconnected with the simple structure of Figure 1, e.g.:

In Figure 2, our basic model of links between place objects is sketched, populated with several geographical objects. The idea behind the model is not to represent all potential information about the relations between geographical entities, but to be able to include the information that is available. This is enough information to perform many forms of computer searching, browsing and interconnection between objects in an effective way.

Figure 2: Place model 2


Municipalities described as old or new in the figure have been split up or united. Some of them will be connected to the same appellation, but then with different time periods.

The direction of the graph is always from the bigger unit to the smaller: a unit pointed to is a part of the unit that points to it. To make this possible, we have split geographical entities that cover parts of several other entities into parts, shown in the figure as hp for hill parts and sp for sea parts. Each sea part, for example, is part of both the sea as a whole and the municipality in which it is located.
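A minimal sketch of such a part-of graph in Python is given here. The place names and the function names are hypothetical and chosen for illustration; the point is only that a split part, such as a sea part, has two wholes, so the structure is a directed graph rather than a tree.

```python
from collections import defaultdict

# parts[x] holds the units that are part of x.
# Edges always point from the bigger unit to the smaller one.
parts = defaultdict(set)


def add_part(whole, part):
    """Record that `part` is a part of `whole`."""
    parts[whole].add(part)


def wholes_of(unit):
    """Return every unit that `unit` is directly a part of."""
    return {whole for whole, members in parts.items() if unit in members}


# Hypothetical example: a fjord that crosses two municipalities is split
# into two sea parts (sp1, sp2), one per municipality.
add_part("county", "municipality A")
add_part("county", "municipality B")
add_part("fjord", "sp1")
add_part("fjord", "sp2")
add_part("municipality A", "sp1")
add_part("municipality B", "sp2")

print(sorted(wholes_of("sp1")))
```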

In Figure 3, we see the links from observations to the place model. In fact, these links go via the appellation, as described earlier. These observations are external to the model: whereas the model shows the objective reality (in the sense of being agreed upon and checked), observations are subjective references made by a human being at a specific time.

Figure 3: Place model 3


For many of the objects in the model, we do not know the size or location of the place in the world. This is not an uncertainty in the same sense as for the observation; it is a lack of information in the data provided.

An observation will never refer to more than one place-name. If there is more than one link from an observation to place objects, such a split expresses an uncertain fact: the observation relates to one out of several possible places, but we do not know which one.

Some observations are linked to place objects that are too large. E.g., an archaeological artifact is connected to a municipality, but could in fact have been found at any of the farms in the municipality. This is due to weaknesses in the data provided, and is common in many of our collections. The model gives us the opportunity to show such possibilities in the search system.

The use of the model - searching

One of the main reasons for using this model is the possibility of advanced integrated searching. We will sketch one search process as an example of what we are aiming at in the implementation of the system.

The task is to find:

To do this, the following steps will be performed:

  1. Find the place object of the municipality.
  2. Find all parts of this municipality. This is done by selecting all nodes in the sub-graph created by following the lines in the arrow direction on the figure.
  3. Find all observations in the archaeological artifact catalogues, the sites and monuments register and the place name collections with links to these objects.
  4. Filter out all non-medieval archaeological records, and all place names with newer roots.
  5. Present the resulting set of records to the user.
  6. If the user also wants to know which records may possibly be relevant: Follow the graph the other way from the start node. This traversal can be restricted to only certain types of places. E.g.: If the municipality is the result of a split, all records in the old, larger municipality might refer to the new smaller one. Thus, only larger objects with appellations with type set to municipality are searched. If wanted, the graph can also be followed regardless of type, all the way up to national level.
  7. Find all observations linked to this new set, and filter as above.
  8. Present the second list to the user appropriately marked as possible hits.
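The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not our PL/SQL implementation: the edge list, the place types, the observation records and all function names are hypothetical, and the filtering in steps 4 and 7 is reduced to a simple match on a period label.

```python
from collections import deque

# Hypothetical part-of edges (whole, part), pointing from bigger to smaller.
PART_OF_EDGES = [
    ("county", "old municipality"),
    ("old municipality", "new municipality"),
    ("old municipality", "farm C"),
    ("new municipality", "farm A"),
    ("new municipality", "farm B"),
]
PLACE_TYPE = {
    "county": "county",
    "old municipality": "municipality",
    "new municipality": "municipality",
    "farm A": "farm", "farm B": "farm", "farm C": "farm",
}
# Hypothetical observations: (record, linked place, period).
OBSERVATIONS = [
    ("artifact 1", "farm A", "medieval"),
    ("artifact 2", "farm B", "iron age"),
    ("place name 3", "new municipality", "medieval"),
    ("artifact 4", "old municipality", "medieval"),
    ("artifact 5", "farm C", "medieval"),
    ("artifact 6", "county", "medieval"),
]


def parts_of(node):
    """Direct parts of a node (following the arrow direction)."""
    return [part for whole, part in PART_OF_EDGES if whole == node]


def wholes_of(node, allowed_types=None):
    """Direct wholes of a node, optionally restricted to certain place types."""
    return [whole for whole, part in PART_OF_EDGES
            if part == node
            and (allowed_types is None or PLACE_TYPE[whole] in allowed_types)]


def reachable(start, neighbours):
    """All nodes reachable from start (excluding start), breadth first."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def records_at(places, period):
    """Steps 3-4 and 7: observations linked to the places, filtered by period."""
    return sorted(rec for rec, place, per in OBSERVATIONS
                  if place in places and per == period)


def search(start, period, allowed_types=None):
    certain_places = {start} | reachable(start, parts_of)               # steps 1-2
    possible_places = reachable(start,
                                lambda n: wholes_of(n, allowed_types))  # step 6
    return records_at(certain_places, period), records_at(possible_places, period)


certain, possible = search("new municipality", "medieval", {"municipality"})
print("hits:", certain)
print("possible hits:", possible)
```

Calling `search` without the type restriction follows the graph all the way up, so that records linked to the county are also returned as possible hits.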

Implementation

We have quite a job to do before this is accomplished. What we have done so far is:

We have implemented a directed graph structure in PL/SQL in Oracle. We have implemented methods for:

In the process of populating the graph with data, we have formatted a table of Norwegian municipalities and all changes from the 18th century until today, sorted by the date of each change. This table is entered into the Oracle implementation as objects using the methods described above. In a way, we "played off" the history and recorded it in the graph.
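The idea of playing the history into the graph can be sketched in Python. The event format, the event kinds ("create", "split", "merge"), the unit names and the years below are assumptions invented for this example; they do not reproduce our table or our Oracle methods.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Unit:
    """One municipality object with its period of validity."""
    name: str
    valid_from: int
    valid_to: Optional[int] = None   # None = still existing


units: Dict[str, Unit] = {}
part_of: Set[Tuple[str, str]] = set()   # (whole, part), bigger to smaller


def apply_event(year: int, kind: str, old: List[str], new: List[str]) -> None:
    """Record one change event in the graph."""
    for name in old:
        units[name].valid_to = year          # old units cease to exist
    for name in new:
        units[name] = Unit(name, year)       # new units come into existence
    if kind == "split":
        # The old, larger unit points to each new, smaller unit.
        part_of.update((old[0], name) for name in new)
    elif kind == "merge":
        # The new, larger unit points to each old, smaller unit.
        part_of.update((new[0], name) for name in old)
    elif kind != "create":
        raise ValueError(f"unknown event kind: {kind}")


# Hypothetical events, deliberately out of order to show the sorting step.
EVENTS = [
    (1964, "merge", ["B1", "B2"], ["C"]),
    (1838, "create", [], ["A"]),
    (1900, "split", ["A"], ["B1", "B2"]),
]

for event in sorted(EVENTS, key=lambda e: e[0]):
    apply_event(*event)

for unit in units.values():
    print(unit)
```

Events must be applied in date order, since a split or merge refers to units that an earlier event has created.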

We are now evaluating the results. There are some problems, but most changes are represented correctly. We are currently examining whether the problems are caused by wrong or misinterpreted data, or errors in the methods. Some very special events will probably have to be fixed by hand.

We are also developing an application for viewing and maintaining the graph. This application is not yet graphical, but this is a goal in the further development, making it easier both to view and to maintain fragments of the graph.

We will continue the implementation work, and hope to report back on a working prototype in the near future.