Metadata in the Oxford Digital Library


Metadata is, by definition, simply "data about data": information about the objects stored within our collections, whether these are in traditional or electronic formats. In the standard library world, catalogue records are metadata, as they contain information about the library's collection of "data", i.e. the books, journals, and electronic resources that make up its collections. Metadata records in the traditional library fulfil several functions: they allow users to find items and assess their usefulness, and they allow librarians to administer them correctly. The same principles apply to objects within the digital library.

Types of metadata

Metadata can take several forms, some of which will be visible to the user of a digital library system, while others operate behind the scenes. The Digital Library Federation (DLF), a coalition of 15 major research libraries in the USA, defines three types of metadata which can apply to objects in a digital library:

  • descriptive metadata: information describing the intellectual content of the object, such as MARC cataloguing records, finding aids or similar schemes
  • administrative metadata: information necessary to allow a repository to manage the object: this can include information on how it was scanned, its storage format etc (often called technical metadata), copyright and licensing information, and information necessary for the long-term preservation of the digital objects (preservation metadata)
  • structural metadata: information that ties each object to others to make up logical units (for example, information that relates individual images of pages from a book to the others that make up the book itself)

In general, only descriptive metadata is visible to the users of a system, who search and browse it to find and assess the value of items in the collection. Administrative metadata is usually only used by those who maintain the collection, and structural metadata is generally used by the interface which compiles individual digital objects into more meaningful units (such as journal volumes) for the user.

Any metadata system for the Oxford Digital Library will have to handle all relevant types of metadata, and will have to be compatible with that used in traditional libraries, at least to the extent of allowing integrated searching. Clearly, any system employed will have to be more powerful than the standard mechanisms (such as the MARC record) currently in use in libraries, which generally limit themselves to the first two types of metadata.

XML: the language for metadata systems

A decision was reached very early in the planning of metadata for the ODL that it should be expressed in the eXtensible Markup Language (XML). This is a language designed initially for marking up electronic text, but which has since been used for a wide variety of metadata applications. Its advantages for metadata encoding are many: they include its robustness, its software independence and hence its ready interchangeability between systems, and the way in which its structure maps neatly to that of many digital objects.

The rules of an XML system can be expressed in two ways. The first, and longer established, is the Document Type Definition (DTD), which lists what tags may be employed within an XML document, and also their content and relationships to each other. A much newer method is XML Schema, which expresses the rules an XML document has to follow in a further, separate XML document. XML Schema and its related schema languages are much more powerful than a DTD and, because a schema is itself just another type of XML data, allow for easier development and maintenance.
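
The difference between the two approaches can be illustrated with a minimal sketch in Python. The "book" vocabulary below (book, title, page) is hypothetical and is not an ODL format; the point is only that the XML Schema version can be read and queried by an ordinary XML parser, while the DTD version cannot, because DTD syntax is not itself XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a tiny "book" document, written as a DTD.
# DTD syntax is not XML, so an XML parser cannot read it as a document.
BOOK_DTD = """<!ELEMENT book (title, page+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT page (#PCDATA)>
"""

# The same hypothetical rules written in XML Schema: an ordinary XML document.
BOOK_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="page" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

XS = "{http://www.w3.org/2001/XMLSchema}"


def parses_as_xml(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def declared_elements(xsd_text):
    """List the element names a schema declares, in document order."""
    root = ET.fromstring(xsd_text)
    return [el.get("name") for el in root.iter(XS + "element")]


print(parses_as_xml(BOOK_XSD), parses_as_xml(BOOK_DTD))
print(declared_elements(BOOK_XSD))
```

Note that this sketch only reads the schema as data; actually validating a document against a schema requires a validating parser, which the Python standard library does not provide.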

Some specific metadata standards

The following is a selection of key metadata standards which are currently being used or being assessed for their potential use in the Oxford Digital Library. All conform to, or can be used with, the XML encoding language, which is generally acknowledged as the most robust and easily transferable system for holding metadata.

Metadata Encoding & Transmission Standard (METS)

A newly devised standard, which refines and extends the earlier Making of America II (MOA2) system, METS is designed specifically to encode descriptive, administrative, and structural metadata for objects within a digital library. One of the few systems designed specifically for digital libraries, it can fulfil all basic requirements of electronic collections. METS is widely used in the digital library world and is the standard underpinning the ODL's digital collections. It is often used in tandem with a number of closely associated standards, such as the Metadata Object Description Schema (MODS) for handling descriptive metadata, or MIX (NISO Metadata for Images in XML) for expressing technical metadata.

METS is written in XML Schema and so requires software that can handle this new format, e.g. XML editors such as oXygen, and XML databases such as eXist or Tamino. As METS depends on an elaborate system of cross references within documents, it is better generated automatically than written by hand. The ODL has developed a database-driven system for the creation of METS that allows content producers to create, verify, and output their metadata (via a Web frontend) in the METS standard. The resulting records serve long-term preservation, can be shared with scholars and other interested parties, and can be used in any system supporting the standard or transformed for systems that do not yet support it.
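
The kind of cross referencing that makes automatic generation attractive can be sketched as follows. This is a minimal illustration in Python, not the ODL's database-driven system: the function names, the ID pattern and the example URLs are all assumptions, and a production METS record would carry much more (a header, descriptive and administrative sections). Each page image is declared once in the file section with an ID, and the structural map then points back to it through a FILEID attribute.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Build a namespace-qualified name in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(label, page_urls):
    """Build a skeletal METS record for a book made of page images.

    Hypothetical helper: IDs are generated so that every <fptr> in the
    structural map refers to a <file> declared in the file section.
    """
    mets = ET.Element(q(METS_NS, "mets"), {"LABEL": label})
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "book", "LABEL": label})

    for order, url in enumerate(page_urls, start=1):
        file_id = f"FILE{order:04d}"  # assumed ID pattern
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})
        page = ET.SubElement(book, q(METS_NS, "div"),
                             {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets


def dangling_file_ids(mets):
    """Return FILEID values that point at no declared <file> element."""
    declared = {f.get("ID") for f in mets.iter(q(METS_NS, "file"))}
    used = {p.get("FILEID") for p in mets.iter(q(METS_NS, "fptr"))}
    return used - declared


record = build_mets("Example book", ["http://example.org/p1.tif",
                                     "http://example.org/p2.tif"])
print(ET.tostring(record, encoding="unicode"))
print(dangling_file_ids(record))
```

Because the IDs and the pointers to them are produced in the same loop, they cannot drift apart, which is the main risk when such records are edited by hand.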

Encoded Archival Description (EAD)

EAD is an XML DTD used throughout the archival community for the encoding of finding aids (collection-level descriptions). Because of its extensive facilities to link to digital objects, it is able to describe digital collections as well as their more traditional counterparts. It is also designed to map closely to key standards such as MARC, which allows EAD records to be searched in tandem with those in longer established formats.

EAD is capable of describing a digital collection and its internal structure, from the topmost collection level down to individual items. Its item-level descriptions are, however, somewhat limited, making it unlikely to meet all metadata needs at the item level. It can, however, easily be used in tandem with other systems which have more extensive item-level facilities.
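
The hierarchy EAD describes, from collection down to item, can be shown with a small sketch in Python. The finding aid below is a simplified, invented fragment (it omits the header and would not validate against the EAD DTD), and the outline function is a hypothetical helper; it simply walks the nested components and records each level with its title.

```python
import xml.etree.ElementTree as ET

# Simplified, invented EAD fragment: collection > series > item.
EAD_SAMPLE = """<ead>
  <archdesc level="collection">
    <did><unittitle>Papers of an example family</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="item">
          <did><unittitle>Letter, 1 May 1850</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def outline(node, depth=0):
    """Return (depth, level, title) rows for every described component."""
    rows = []
    title = node.findtext("did/unittitle")
    if title is not None:
        # This node carries its own description, so it forms a level.
        rows.append((depth, node.get("level"), title))
        depth += 1
    for child in node:
        if child.tag != "did":
            rows.extend(outline(child, depth))
    return rows


for depth, level, title in outline(ET.fromstring(EAD_SAMPLE)):
    print("  " * depth + f"[{level}] {title}")
```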

Many digital library projects use EAD; among the most significant are MALVINE, the Online Archive of California, the American Heritage Project and ILEJ.

Text Encoding Initiative (TEI)

The TEI is the de facto standard for the encoding of most types of electronic texts, and as such is used by almost all of the world's e-text centres. A modular system, it incorporates a set of base tags, to which can be added specialized sets for use in particular applications, such as linguistic corpora, transcriptions of manuscripts, or critical editions. It also includes extensive facilities for descriptive metadata, most of which are located in the TEI header, a section of every document which holds information on the electronic text file itself and on the source from which it is taken. The header is designed to map closely to MARC for the purpose of creating a library catalogue entry for the electronic text file.
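
As a rough illustration of how the header can feed a catalogue entry, the Python sketch below pulls a few fields out of a TEI header. The sample document is invented, the use of the TEI namespace is an assumption (older TEI documents carry none), and catalogue_entry is a hypothetical helper rather than any official TEI-to-MARC mapping.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: the header describes both the electronic file
# (titleStmt, publicationStmt) and its source (sourceDesc).
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An example text: electronic edition</title>
        <author>Example, Author</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example e-text centre</publisher>
      </publicationStmt>
      <sourceDesc>
        <bibl>An example text. London, 1850.</bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Text of the document.</p></body></text>
</TEI>"""


def catalogue_entry(tei_xml):
    """Collect header fields that a catalogue record would typically need."""
    root = ET.fromstring(tei_xml)
    file_desc = root.find("tei:teiHeader/tei:fileDesc", TEI_NS)

    def text(path):
        return file_desc.findtext(path, default="", namespaces=TEI_NS).strip()

    return {
        "title": text("tei:titleStmt/tei:title"),
        "author": text("tei:titleStmt/tei:author"),
        "publisher": text("tei:publicationStmt/tei:publisher"),
        "source": text("tei:sourceDesc/tei:bibl"),
    }


print(catalogue_entry(TEI_SAMPLE))
```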

TEI has been used in a large number of projects, mostly those with an extensive textual component, although a few, such as the Bodleian's Toyota Project, use it to render metadata for images only: a complete list may be found at http://www.tei-c.org/Applications/. The TEI is readily usable for item-level descriptions within a digital library, although for those without a textual component it is likely to be unnecessarily complex and to require a large amount of redundant tagging.

Dublin Core (DC)

The Dublin Core is a list of 15 basic fields designed initially to describe web-based resources sufficiently to allow their discovery by search engines. It is not an XML application as such, but designates elements which might be incorporated into such an application (as may be done in METS, for example). Because the DC elements are so broad, they may be qualified to limit their semantic range, which limits their functionality for cross-searching but increases their precision.

Dublin Core has great potential as a basic set of metadata for digital objects, but will often have to be supplemented by more detailed information specific to the needs of these objects. As it is not in itself a DTD or XML Schema, it needs to be used in conjunction with, or embedded in, another XML application that supplies one.
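
A minimal sketch in Python of what embedding Dublin Core in another XML application can look like is given below. The 15 element names and the namespace are those of Dublin Core itself; the wrapping record element and the dc_record helper are assumptions standing in for whatever host application (METS, for example) is actually used.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the basic Dublin Core set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record(fields):
    """Wrap Dublin Core elements in a hypothetical <record> container.

    `fields` maps an element name to a list of values, since every
    Dublin Core element is repeatable.
    """
    record = ET.Element("record")  # stand-in for the host XML application
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


example = dc_record({
    "title": ["An example digitised manuscript"],
    "creator": ["Example, Author", "Example, Scribe"],
    "format": ["image/tiff"],
})
print(ET.tostring(example, encoding="unicode"))
```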

PREservation Metadata: Implementation Strategies (PREMIS)

Preservation metadata supports activities intended to ensure the long-term usability of a digital resource. It is "the information a repository uses to support the digital preservation process." PREMIS stands for "PREservation Metadata: Implementation Strategies", the name of an international working group sponsored by OCLC and RLG from 2003 to 2005. That working group produced a report called PREMIS Data Dictionary for Preservation Metadata, which includes both a data dictionary and a substantial amount of narrative about preservation metadata. An updated second version was issued in March 2008. The Library of Congress maintains a schema for representing PREMIS in XML. There is also an active PREMIS Maintenance Activity sponsored by the Library of Congress, which includes a website linking to a wide range of official and unofficial PREMIS information. The Maintenance Activity also tries to promote awareness of PREMIS, sponsors tutorials in using PREMIS, and commissions studies and publications related to PREMIS. Usually, when people refer to "PREMIS" they mean the Data Dictionary. Occasionally they may be referring to the XML schema, to the working group, or to the entire effort including the Maintenance Activity.

Resource Description Framework (RDF)

The RDF is not a metadata scheme per se, but a system for encoding such schemes within a standardized framework. Designed initially for describing electronic resources on the internet, it provides a standard way of describing element names, their content and their relationships, so making it easier to find these resources and to exchange information on them. RDF is usually expressed in XML, and can be used as a framework for any metadata scheme listed here. For further information see An Idiot's Guide to the Resource Description Framework by Renato Iannella.
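
The idea of describing element names, their content and their relationships in a standard way can be sketched in Python as follows. The record is an invented RDF/XML fragment that uses Dublin Core elements as properties, and the triples helper is a deliberately simplified assumption: it handles only flat descriptions of the kind shown, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented RDF/XML record: one resource described with two DC properties.
RDF_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/objects/1">
    <dc:title>An example object</dc:title>
    <dc:creator>Example, Author</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Read flat rdf:Description blocks as (subject, predicate, object)."""
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names look like "{namespace}local"; joining the
            # two parts gives the property's full URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The value is either a linked resource or literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            found.append((subject, predicate, obj))
    return found


for triple in triples(RDF_SAMPLE):
    print(triple)
```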


ONIX

ONIX is an XML application designed for use within the book trade to enable publishers and booksellers to exchange essential metadata. As a consequence, it has very good facilities for describing key bibliographic, pricing and stock information, but is very limited in terms of structural and administrative metadata. It also has limited capabilities to describe anything other than printed books. ONIX is therefore unlikely to be of much value for a digital library.


MARC

MARC is the established standard for the creation of machine readable cataloguing records, and underlies virtually all online library catalogues. It consequently has extensive features for describing bibliographic and copy-specific information, but has very limited structural facilities, and its administrative metadata is heavily biased towards the needs of traditional library operations. It is of limited use for incunabula, manuscripts, and other objects which may be included in a digital collection.

Mappings to MARC are incorporated into most metadata systems, so that MARC records can be readily generated to allow linking from these to library catalogues. This will allow library users to find electronic versions of library materials in conjunction with their traditional counterparts.
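
Such a mapping is often expressed as a simple crosswalk table. The Python sketch below is illustrative only: the handful of field assignments is a simplified subset modelled on the Library of Congress's unqualified Dublin Core to MARC crosswalk and should be checked against the published crosswalk before use, the function name is hypothetical, and the output is a plain list of (tag, subfield, value) tuples rather than a real MARC record.

```python
# Simplified, illustrative crosswalk: Dublin Core element -> (MARC tag, subfield).
# Assumed subset; consult the published crosswalk for authoritative mappings.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc_fields(dc):
    """Convert {element: [values]} into (tag, subfield, value) tuples.

    Elements with no entry in the crosswalk are skipped, which is one
    way detail can be lost when mapping between schemes.
    """
    fields = []
    for element, values in dc.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    # Present the fields in MARC tag order.
    return sorted(fields, key=lambda field: field[0])


for field in dc_to_marc_fields({
    "creator": ["Example, Author"],
    "title": ["An example digitised manuscript"],
    "rights": ["Not mapped in this sketch"],
}):
    print(field)
```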

Categories for the Description of Works of Art (CDWA)

Devised by the Art Information Task Force (AITF), CDWA attempts to define a set of core fields for the description of works of art. In effect, it has a similar aim to Dublin Core, but is much more specialised in its scope and function: it distinguishes between information intrinsic to the work (art object, architecture, or group) and information extrinsic to the work (such as information about persons, places, and concepts related to the work). Like DC, it is not tied to any given DTD, but may be incorporated into other XML systems.

A similar project to CDWA is the Visual Resources Association Core Categories, which similarly attempts to define core fields for the description of visual resources, and also adds information on their surrogates (such as digital images). This is still in its early testing stage, however, and is certain to undergo further revision as it is evaluated.

Instructional Management Systems Metadata (IMS)

IMS is a metadata system devised for the management of online learning resources, which could include objects within a digital library. Published as an XML DTD, it includes components to provide descriptive and administrative metadata, and is designed to map to Dublin Core. While undoubtedly powerful, it has often been criticized as over-complex, and has been little taken up by digital libraries.

CURL Exemplars in Digital Archives (Cedars)

Cedars is a key system for encoding the metadata necessary for the long-term preservation of digital materials. It aims "to promote awareness of the importance of digital preservation, to produce strategic frameworks for digital collection management policies and to promote methods appropriate for long-term preservation." Among its deliverables is a draft specification for preservation metadata, which can readily be incorporated into an XML system.

Metadata Tools

Numerous software tools exist for the encoding of metadata, ranging from freeware packages to highly complex and expensive integrated systems. Those designed specifically for XML encoding range from free packages such as Emacs to commercial packages such as XMetal or oXygen.

Some further references

An extensive body of material on metadata for digital libraries is available on the internet. A very limited selection of important resources is listed below.

Library Digital Initiative: Metadata
A basic guide to metadata standards for digital libraries, and specific metadata systems, compiled for Harvard's Library Digital Initiative
IFLANET: Digital Library Metadata Resources
An extensive collection of links to information on digital library metadata resources, compiled by IFLA.
Library of Congress Standards
A gateway page to information on standards maintained by the Library of Congress, and others used by the Library in its digital library projects.
UKOLN Metadata
The UK Office of Library Networking aims to be a national focus for digital information management, and is involved in several key metadata projects listed on this page.
Digital Library Standards and Practices
The Digital Library Federation's home page for standards and practices lists key metadata systems as well as providing information on topics such as benchmarking, quality evaluation etc.
PowerPoint Presentation for the 2005 Association for Computing in the Humanities Conference, at the University of Victoria, British Columbia, 15th June 2005. Given by Richard Gartner.



Copyright 2005 ODL, University of Oxford