About Digital Libraries


Important Digital Library Web Sites

Digital Library Federation


IFLA : Digital Library Resources and Projects


National Science Foundation : Digital Libraries Initiative Phase II


American Memory : Library of Congress National Digital Library


Metadata

Metadata is not a new concept; it has existed in the computer science field for decades, where it refers to information about electronic computer files. The term "metadata" is now used more broadly to refer to information about any digital object that exists on the Internet. The need for certain types of data (such as creation date and file size) might seem obvious if one is managing a large group of digital objects merely as files. However, the Internet and World Wide Web offer great promise for the precise management, discovery, and retrieval of digital objects such as images, e-texts, multimedia presentations, and other electronic files, which calls for richer metadata than file attributes alone. Metadata may be embedded as an integral part of the digital object, to be retrieved and manipulated for various purposes, or it may exist externally to the digital object. Metadata is often broken into three broad categories:

  • Descriptive metadata: Information that conveys some sense of intellectual content and context.

  • Structural metadata: Information that describes the attributes of an object, such as size, electronic format, and digital capture process.

  • Administrative metadata: Information regarding rights management, creation date of the digital resource, hardware configuration, etc.

A "descriptive" metadata record consists of a set of elements, such as title, creator, format, date of creation, and subject coverage, that are necessary for describing a particular resource. Obviously, some institutions will use more elements to suit their specific needs or the needs of the resource being described.
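As a rough illustration only, the three categories and a simple check for required descriptive elements might be modeled as follows. The class name `MetadataRecord`, the field names, and the sample values are all hypothetical; they are not drawn from any published metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record grouping metadata by the three broad categories."""
    descriptive: dict = field(default_factory=dict)     # intellectual content and context
    structural: dict = field(default_factory=dict)      # size, capture process, etc.
    administrative: dict = field(default_factory=dict)  # rights, hardware, etc.

    def missing_descriptive(self, required):
        """Return the required descriptive elements that are absent or empty."""
        return [name for name in required if not self.descriptive.get(name)]


# Elements named in the text above; an institution may require more.
REQUIRED = ["title", "creator", "format", "date", "subject"]

# Made-up sample values, for illustration only.
record = MetadataRecord(
    descriptive={"title": "Herd at pasture", "creator": "Unknown", "subject": "Cattle"},
    structural={"file_size_bytes": 37_800_000, "capture": "flatbed scan"},
    administrative={"rights": "Contact holding institution"},
)

print(record.missing_descriptive(REQUIRED))
```

A check like this is one simple way to catch incomplete records before they are published; which elements count as required is a local decision.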

Interoperability

Interoperability is the ability of two or more systems to exchange information and to use the information that has been exchanged. Interoperability is highly dependent upon the ability to both a) conceptually map identical or similar elements of data structures, and b) consistently and reliably extract relevant information from within data structures. Numerous strategies can promote interoperability between multiple systems, but the simplest strategy is for the owners/operators of each system to employ similar data structures and to utilize similar or identical semantics and vocabularies as information is entered into these systems. For example, consider the following:

System One contains images and information about cattle. Its database structure includes a field named TERMS, and every record related to cows or cattle has a "Cattle" entry in this field.


System Two also contains images and information about cattle. Its database structure includes a field named SUBJECTS, and every record related to cows or cattle has a "Cows" entry in this field.


Achieving interoperability between these two systems will require coordination on two levels:

  • Establishing a common mapping between the fields each system uses to hold subject terms for each record. Both system owners could agree to rename this field to a common name used by both, or they could keep their respective data structures and build a "map" that equates these and other similar data fields. When exchanging information, each system owner would then know what type of information to expect in this field.

  • Providing some means for equating the related subjects "Cows" and "Cattle" so that a cross-system search will consistently and reliably retrieve all material on this topic. Although both subject terms are valid terms from a standardized vocabulary (Library of Congress Subject Headings), owners of both System One and System Two cannot guarantee users full retrieval of relevant content until they reach some agreement on how to address differences in descriptive metadata.
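The two levels of coordination can be sketched in a few lines of Python. This is only a minimal illustration: the field names TERMS and SUBJECTS and the terms "Cattle" and "Cows" come from the example above, while `FIELD_MAP`, `TERM_MAP`, the shared field name `subject`, the record identifiers, and both functions are hypothetical.

```python
# Hypothetical field map: each system's local field name -> shared field name.
FIELD_MAP = {
    "system_one": {"TERMS": "subject"},
    "system_two": {"SUBJECTS": "subject"},
}

# Hypothetical equivalence table: variant subject term -> agreed common term.
TERM_MAP = {"cows": "cattle", "cattle": "cattle"}


def normalize(system, record):
    """Rename local fields to shared names and map subject terms to the agreed form."""
    shared = {}
    for local_name, value in record.items():
        name = FIELD_MAP[system].get(local_name, local_name)
        if name == "subject":
            value = TERM_MAP.get(value.lower(), value.lower())
        shared[name] = value
    return shared


def search(records, subject):
    """Return normalized records whose subject matches the query term."""
    wanted = TERM_MAP.get(subject.lower(), subject.lower())
    return [r for r in records if r.get("subject") == wanted]


records = [
    normalize("system_one", {"id": "one-1", "TERMS": "Cattle"}),
    normalize("system_two", {"id": "two-1", "SUBJECTS": "Cows"}),
]

hits = search(records, "Cows")
print([r["id"] for r in hits])
```

The field map is the easy part; the term map is where the real negotiation between system owners happens, since someone has to decide which terms count as equivalent.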


Obviously, the second aspect of interoperability, semantic and taxonomic compatibility, is the more difficult task. Certainly, consistently utilizing standardized vocabularies such as the Library of Congress Subject Headings, the Art & Architecture Thesaurus, or those employed by other professional communities, is a good first step. Documentation of utilized vocabularies is an essential aspect of managing large information systems. As is shown in the example above, however, the potential for variation even within the same controlled vocabulary is great. Therefore, achieving fuller interoperability between Systems One and Two would require agreement by both partners on some common application guidelines for the Library of Congress Subject Headings.

In the real world, full semantic and taxonomic interoperability across diverse systems with diverse content is impossible. Factors such as differing descriptive needs, the granularity of description, and even intended use of information, dictate that every system cannot be identical or 100 percent mappable to another system. However, by documenting employed data structure and content standards, owners of any system can still promote eventual interoperability at some level with other systems.

Controlled Vocabularies

Content data for some elements, such as the subject element, may be selected from a "controlled vocabulary," a limited set of consistently used and carefully defined terms. Using terminology from a controlled vocabulary ensures consistency and can improve the quality of search results, and may also reduce the likelihood of spelling errors when recording metadata. The description of each element indicates whether content should be selected from a controlled vocabulary, if possible.
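A minimal sketch of how a data-entry tool might enforce a controlled vocabulary is shown below. The three-term vocabulary and the function `check_subject` are hypothetical and purely illustrative; a real vocabulary would be far larger and loaded from its maintained source.

```python
# Hypothetical, tiny controlled vocabulary for the subject element.
SUBJECT_VOCABULARY = {"Cattle", "Dairy farming", "Barns"}


def check_subject(term, vocabulary=SUBJECT_VOCABULARY):
    """Return the vocabulary's form of the term, or raise ValueError if absent."""
    by_folded = {entry.casefold(): entry for entry in vocabulary}
    key = " ".join(term.split()).casefold()  # collapse whitespace, ignore case
    if key not in by_folded:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return by_folded[key]


print(check_subject("  dairy   FARMING "))

try:
    check_subject("Catle")  # misspelling is rejected rather than stored
except ValueError as error:
    print(error)
```

Rejecting unknown terms at entry time is what yields the consistency and spelling benefits described above; whether to reject outright or merely warn is a local policy choice.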

Below are links to several "controlled vocabularies" accessible online:



Mappings Between Metadata Standards

A "crosswalk" has been defined as a "set of transformations applied to the content of elements in a source metadata standard that results in the storage of appropriately modified content in the analogous elements of a target metadata standard." (NISO White Paper, October 1998) A fully specified crosswalk contains a semantic mapping as well as a conversion specification. See the NISO White Paper, "Issues in Crosswalking Content Metadata Standards," for further information on a definition and specification of crosswalks.

Crosswalks provide the ability to create and maintain a set of metadata customized for local needs, and to map that metadata into any number of related content metadata standards. In order to build successful crosswalks and mapping schemes, it is important to maintain consistent data formats and data quality across metadata standards.
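The two parts of a fully specified crosswalk, a semantic mapping and a conversion specification, can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the source elements (`PhotoTitle`, `Photographer`, `DateTaken`), the target elements, the MM/DD/YYYY source date format, and the function names are hypothetical and do not come from the NISO paper or any particular standard.

```python
from datetime import datetime


def to_iso_date(value):
    """Conversion rule: rewrite a MM/DD/YYYY date as YYYY-MM-DD."""
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


def unchanged(value):
    """Conversion rule for content that needs no modification."""
    return value


# Hypothetical crosswalk: source element -> (target element, conversion rule).
CROSSWALK = {
    "PhotoTitle": ("title", unchanged),
    "Photographer": ("creator", unchanged),
    "DateTaken": ("date", to_iso_date),
}


def apply_crosswalk(source_record, crosswalk=CROSSWALK):
    """Map a source record into the target standard; collect unmapped elements."""
    target, unmapped = {}, []
    for element, value in source_record.items():
        if element in crosswalk:
            target_element, convert = crosswalk[element]
            target[target_element] = convert(value)
        else:
            unmapped.append(element)
    return target, unmapped


local = {"PhotoTitle": "Barn", "DateTaken": "06/12/2002", "ShelfMark": "B12"}
print(apply_crosswalk(local))
```

Reporting unmapped elements, rather than silently dropping them, makes it easier to see where the local metadata has no analogous element in the target standard. The date conversion also shows why consistent data formats matter: a source date in any other format would fail to convert.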

Other Interesting Metadata Initiatives

A Word About Archival Scanning

The notion of "archival" scanning is a misnomer in most cases.  Digital reproduction is just that: a reformatting technology.  It is rarely intended to replace the original item, except when that item would disintegrate of its own accord. Even so, a high-quality digital reproduction can help preserve the original by removing it from excessive handling or casual use.  In such cases, scanning should strive to capture the essence of the item in terms of detail and color fidelity, as well as information about the original and its digital surrogate.

There are no accepted standards for "archival" scanning, although there are plenty of guidelines.  Most guidelines aim to capture as much detail relevant to the image as is practical. Most high-resolution images are stored off-line because of their size (easily 50 MB for a 600-pixel-per-inch scan at 24-bit color) and the relatively low throughput of the Web.
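The file sizes involved follow from simple arithmetic: pixel width times pixel height times bytes per pixel. The sketch below computes the uncompressed size for two print sizes; the 8 x 10 inch and 5 x 7 inch dimensions are assumptions chosen for illustration, since the text does not say what size of original it has in mind, and the function name is hypothetical.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Uncompressed raster size: pixel count times bytes per pixel."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed original sizes, in inches, scanned at 600 ppi and 24-bit color.
for width, height in [(5, 7), (8, 10)]:
    size = uncompressed_size_bytes(width, height, 600)
    print(f"{width} x {height} in: {size / 1_000_000:.1f} MB")
```

Under these assumptions a 5 x 7 inch original comes to roughly 38 MB and an 8 x 10 inch original to roughly 86 MB before any compression, which is consistent with the order of magnitude given above.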

The most underestimated expense of scanning is capturing information about the scan (metadata), as well as providing adequate descriptive information about the image.  While most archives describe their holdings collection by collection, the process of putting images on the Web demands an item-by-item description.  Archivists have an ethical requirement to explain individual images in the context of the collection in which they were found.  With no established metadata requirements, and virtually no software to connect everything together, the descriptive process may turn out to be the most expensive part of a scanning project.

Planning A Digitization Project

Numerous factors will influence the overall success of a digitization project, including staff, available expertise, equipment, and funding. The most important key to success, however, is planning and setting realistic goals, based upon knowledgeable advice. Per-item scanning costs, workflows, and many other details are bound to vary from project to project and from one original format to the next. The following links provide good advice on how to go about planning a digitization project.

Moving Theory Into Practice
Scanning Photographic Collections
Visual Resource Image Quality
RLG Guidelines for Imaging
Digital Libraries Resources
Visual Arts Digital Resources

Technical Information on Digital Formats

The following sites offer specific recommended technical formats for different types of original information:

Digital Formats for Reproductions
Preserving Digital Information
Digitization Technical Standards
Other Archival Scanning Guidelines
Library of Congress - Digital Formats for Content Reproductions
Creating and Distributing High Resolution Cartographic Images
Digital Imaging for Photographic Collections: Foundations for Technical Standards
Joint RLG and NPO Conference on Guidelines for Digital Imaging
PIMA / IT10 Technical Committee on Electronic Still Picture Imaging

Authoritative Sites on Digitization Standards

The following projects and sites provide a wealth of experience and information to anyone considering creating and managing large bodies of digital information.

Art Libraries Society of North America
Visual Resources Association
Museum Computer Network
OCLC
Research Libraries Group
CIMI (Consortium for Computer Interchange of Museum Information)
Society of American Archivists
Getty Research Institute
Digital Library Federation
Library of Congress
National Information Standards Organization

©1999-2002 University of Minnesota Libraries. All rights reserved.
Please credit the University of Minnesota Libraries if you copy or reproduce material from this page.
URL: http://digital.lib.umn.edu/digcoll.html
Last Revised: June 12, 2002