Introduction to databases
Previously in this module we have looked at issues surrounding digital assets: how to capture, store, and protect them and how to make them available. Now we are going to turn our attention to how computational techniques can be applied to information to make it easier to search and analyse when we have got a lot of it. Database systems are essentially nothing more than computerised record-keeping systems that can be used to keep track of sets of data. The database itself can be thought of as a kind of electronic filing cabinet and the rest of the system is there to allow users to create and delete files and to get information in and out of the files. In this session you are introduced to entity relationship diagrams as a technique for modelling data relationships and to basic ideas underlying databases, including data structure, tables, records, fields, keys, data types. You will also be introduced to the FileMaker Pro database management system.
By the end of this session you should be able to:
- Draw an entity relationship diagram to model the entities and attributes in a set of data such as an archival collection.
- Use the ERD to define the tables, fields etc needed to represent the data in your archive, on paper.
- Define key authority fields in data sets.
- Use standard Dublin Core metadata terms to define field names.
- Convert simple linked tables to a Filemaker Pro database
- Explain what is meant by “metadata” and use a metadata schema to create text based descriptions of digital objects to enable their discovery and retrieval from electronic systems.
Constructing a simple database
In this session we are going to be looking at the Roger Fenton Letters from the Crimea Web site and converting the letter information to a database.
An ideal database is sets of data, linked together, with no duplication and no ambiguity, that is readily searchable. A very simple database will consist of a single table which contains all the information. These are usually 'home-made' and may contain the same data several times, which makes such a database time-consuming to change and open to errors. 'Professional' databases consist of more than one table, and are constructed so that important data are only entered once. Databases with more than one table are called relational databases because the tables are 'related' to one another by a common piece of information. Before we start it is useful to define some key terms.
- Field: container of data, eg.date, author name, image, image title
- Primary key: unique identifier of a record in a table
- Record: set of fields containing data about an entity
- Table: set of records
We can use the Photographs Exhibited in Britain 1839-1865 Web site to illustrate how these terms are used.
- Field: exhibit title, exhibition details, catalogue number, process
- Primary key: not shown in this table. NB if PEIB contained exhibits from just one exhibition then the catalogue numbers could be used as primary keys.
- Record: an exhibit such as catalogue number 19 from the 1859 Aberdeen exhibition.
- Table of records:
Entity relationship diagrams
Entity relationship diagrams (ERDs) are annotated visual representations of how the various entities or attributes of an object relate to each other. In the case of a large and complicated data set, an ERD is a much easier way of comprehending everything that is going on than a set of tables. Here are the terms that are used to create a description of the data that will become a database:
- Entity: discrete item, eg. letter, book, photograph
- Attribute: property of an entity, eg. creator, content, date
- Entity relationship: relationship between entities, eg. [letter content]created by [author name]
In the case of PEIB we can draw a diagram like this:
- Rectangle = entity
- Oval = attribute
- Diamond = relationship
Go to the Roger Fenton Letters from the Crimea Web site and look at any one of the letters there.
- What are the component parts, ie "entities" of this letter?
- Draw an entity relationship diagram which describes the way the information in this letter is structured and how the different entities are related to each other.
Having finalised your ERD, the next step is to convert it to a table. Using the example of PEIB again, we can develop a table with these headings:
|Cat No.||Title||Process||Photographer||Date||Location||Price mounted||Price unmounted|
|19||Scott Monument||Talbotype||Adamson, Robert||1859||Aberdeen, British Association||no data||no data|
Convert your Fenton Letters ERD to a single table like this.
Constructing a more complex database
The data that lie behind the Roger Fenton Letters from the Crimea Web site are more complex than the site suggests at first reading. The Web site is actually a compilation of two sets of letters. You can read about how it came about here. The letters were written by Roger Fenton mostly to Annie Grace Fenton, but then copied out by hand by Annie Grace and by another relative, Joseph Fenton. We do not have the original Fenton letters but one set each of the copies made by Annie and Joseph have survived. These two sets of transcriptions do not reproduce exactly the same set of letters and sometimes even within the two copies of the same letter there are variations in content. To an historian, the Roger Fenton Web site is probably of only limited use as it stands. It might be more useful to be able to compare the Annie Grace and the Joseph Fenton versions of the letters with each other. In addition scholars might want to be able to inspect the actual handwritten copies of the letters created by Annie Grace or Joseph. So a more useful research resource would be one which instead of just one composite transcription of the letters, shows two different sets of digital transcriptions and two sets of handwritten pages.
- Redraw your ERD to represent this new database, make a digital copy of the ERD and post it to your notes on the wiki and paste a link to it here:
- Now convert your ERD to a table.
Unique and common in database creation
When looking at our new table the question we have to ask is 'What is it that is UNIQUE about our data?'. From this table you can see that quite a lot of the information about each letter is repetitive (who copied it/which letterbook contains it). Entering the same data again and again is tedious and creates the possibilitiy of data entry errors. In order to avoid duplication and errors it is a good idea to split the information into fields which are unique to the entity and those which contain duplicate data that is common to more than one entity. The unique fields can be gathered together in one table which can be linked to another table, containing the common information. The new tables are not two different databases, we are still in the same database which contains two different tables linked together.
The Primary and Foreign Key
When working on your database you will need a unique identifier in order to keep each record distinct from the other records. This is called your primary key. The primary key is a numerical or alpha numerical field that must be unique and must not be duplicated. The primary key stands on its own and is a reference given to each individual entry in a table of unique fields. Primary keys are used to link database tables together. The primary key in a table of unique fields links to what is known as a foreign key in a second table of data, usually containing common fields of the same data. In the case of PEIB exhibits we could use the catalogue numbers as primary keys, except that PEIB comprises a collection of exhibits from a number of different exhibitions 1839-1865, so its likely that the same catalogue number refers to several exhibits. To ensure that each exhibit has a totally unique identifier its advisable to assign each a unique number.
|Exhibit ID No.||Cat No.||Title||Process||Photographer||Date||Location||Price mounted||Price unmounted|
|001||19||Scott Monument||Talbotype||Adamson, Robert||1859||Aberdeen, British Association||no data||no data|
You can imagine how tedious it would be to have to enter the exhibition details over and over again if there were a lot of exhibits in an exhibition. A way round this is to split the exhibition details off into a separate table that can be linked back to the exhibits table using another primary key, this time an exhibition ID number.
|Exhibit ID No.||Cat No.||Title||Process||Photographer||Exhibition ID No.||Price mounted||Price unmounted|
|001||19||Scott Monument||Talbotype||Adamson, Robert||1||no data||no data|
|Exhibition ID No.||Date||Location||Host|
- Now draw a set of tables to represent the data in your ERD and use arrows to show how you would link the tables by joining the primary key to foreign keys. Post the drawing to your notes and paste a link to it here:
Suppose you are interested in photographers who exhibited their work in the late 19th century. Take a look at how such people are categorised in these two examples:
The Photographs Exhibited in Britain 1839-1865 Web site lists exhibition contributors as "photographers", whereas the Exhibitions of the Royal Photographic Society 1870-1915 Website lists them as "exhibitors", because, in each case, that's how they were identified in the original exhibition catalogues. Computer search engines don't know that in this case "exhibitor" and "photographer" mean much the same thing. So a search for "photographers" would not discover "exhibitors" and vice versa. Using metadata it's possible to link these two sets of data so that they can be discovered by a single search. The data in these two bases has been marked up using Dublin Core metdata (http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-27/html/). "Photographers" and "exhibitors" have both been marked up as "contributors" in Dublin Core. A customised search engine such as the one used at http://kmd.dmu.ac.uk/kmd_photohistory_page/ interrogates the metadata to discover that "Photographers" and "exhibitors" have both been marked up as "contributors" and so returns results for both "photographers" and "exhibitors" when either of these search terms is used. Of course, its possible to link more than two sites in this way. Which is why its important to have internationally agreed standards for metadata and to apply them rigorously and consistently.
This file shows how we used metadata in the Exhibitions of the Royal Photographic Society project to describe the original photographic exhibits, print based copies of them in the catalogues and the digital version we created for the Web site. File:Image Metadata 200707.doc
Using what you have learned from this session, map your database field names to Dublin Core elements.
Before we finish with metatdata its worth noting that there are quite a few different metadata standards or "schema", developed for different purposes. Dublin Core works well for published works such as books, but not so well for other items, such as photographs or other works of art. The most common simple metadata schema for works of art is the CDWA (Categories for the Description for Works of Art) schema. If you want to see how Dublin Core compares with CDWA and other Cultural Heritage metatdata schemas take a look at http://www.getty.edu/research/publications/electronic_publications/intrometadata/crosswalks.html.
Follow this link to the PRDA Contents page to return to the module contents.
Back to the MA: Photographic History Main Page to return to the Course contents.