6. Metadata value space



In the previous sections, examples of data structures or metadata element sets have been introduced. "The choice of terms or words (data values) and the selection, organization, and formatting of those words (data content) are two other types of standards that must be used in conjunction with an agreed-upon data structure" (CCO Introduction, 2005). This part lists resources related to data values and data content.

6.1 Using controlled vocabularies for named entities, time and space, and subjects

•  Almost all metadata standards require or recommend the use of controlled vocabularies for some elements.

Examples from Dublin Core 1.1:

Element Name: Subject
Label: Subject and Keywords
Definition: A topic of the content of the resource.
Comment: Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

Element Name: Type
Label: Resource Type
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [ DCT1 ]). To describe the physical or digital manifestation of the resource, use the FORMAT element.


Element Name: Date
Label: Date
Definition: A point or period of time associated with an event in the lifecycle of the resource.
Comment: Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601.
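Selecting a value from a controlled vocabulary usually means mapping the free-text keywords a cataloguer or author supplies onto the vocabulary's preferred terms, following its "use for" (UF) references, and flagging anything unmatched for review. The Python sketch below illustrates that step for the Subject element. The vocabulary, its terms and the function names are hypothetical stand-ins for a real scheme such as LCSH or MeSH.

```python
from typing import Optional

# Hypothetical controlled vocabulary (illustrative only, not from a real scheme):
# a set of preferred terms plus "use for" variants that point to them.
PREFERRED_TERMS = {"Photography", "Architecture", "Textiles"}
USE_FOR = {
    "photographs": "Photography",
    "photos": "Photography",
    "buildings": "Architecture",
    "fabrics": "Textiles",
}


def normalize_subject(keyword: str) -> Optional[str]:
    """Return the preferred term for a keyword, or None if it is not in the vocabulary."""
    cleaned = keyword.strip().lower()
    # Case-insensitive match against the preferred terms themselves.
    for term in PREFERRED_TERMS:
        if term.lower() == cleaned:
            return term
    # Otherwise follow a "use for" reference to the preferred term.
    return USE_FOR.get(cleaned)


def controlled_subjects(keywords):
    """Split keywords into de-duplicated preferred terms and unmatched keywords."""
    accepted, needs_review = [], []
    for keyword in keywords:
        term = normalize_subject(keyword)
        if term is None:
            needs_review.append(keyword)
        elif term not in accepted:
            accepted.append(term)
    return accepted, needs_review


print(controlled_subjects(["Photos", "photography", "Buildings", "cats"]))
# (['Photography', 'Architecture'], ['cats'])
```

Unmatched keywords are returned rather than discarded, so a person can decide whether they need a new term or a different existing one.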

The following controlled vocabularies are commonly recommended by metadata standards and best practice guides. For a more complete list, see another source.

6.2 Standardized vocabularies

DCMI Type Vocabulary
A general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource.
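Because the DCMI Type Vocabulary is a small, closed list, a record-checking script can simply test a Type value for membership. The sketch below lists the twelve DCMI Type terms and returns a term's URI in the http://purl.org/dc/dcmitype/ namespace. The function name is our own, and the check is case-sensitive because the terms are defined in exactly this form.

```python
# The twelve terms of the DCMI Type Vocabulary, in their defined (case-sensitive) form.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})
DCMITYPE_NAMESPACE = "http://purl.org/dc/dcmitype/"


def dcmi_type_uri(value: str) -> str:
    """Return the URI of a DCMI Type term, or raise ValueError for any other value."""
    if value not in DCMI_TYPES:
        raise ValueError(f"{value!r} is not a DCMI Type Vocabulary term")
    return DCMITYPE_NAMESPACE + value


print(dcmi_type_uri("StillImage"))  # http://purl.org/dc/dcmitype/StillImage
```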

[MIME] Internet Media Types
May be used as values for the Format element.
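For digital files, a Format value can often be suggested from the file name. Python's standard mimetypes module maps file extensions to Internet media types. The sketch below uses a MimeTypes instance created without extra files, so it relies only on Python's built-in table rather than the host system's configuration. The function name format_value is hypothetical, and an extension-based guess should still be checked against the actual file.

```python
import mimetypes
from typing import Optional

# A MimeTypes instance created with no file arguments starts from Python's
# built-in extension table, so its answers do not vary with the host's mime.types.
_MEDIA_TYPES = mimetypes.MimeTypes()


def format_value(filename: str) -> Optional[str]:
    """Suggest an Internet media type for the Format element, or None if unknown."""
    media_type, _encoding = _MEDIA_TYPES.guess_type(filename)
    return media_type


for name in ["report.pdf", "scan.png", "notes.zzqq"]:
    print(name, format_value(name))
# report.pdf application/pdf
# scan.png image/png
# notes.zzqq None
```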

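RFC 4646 Tags for Identifying Languages
Defines language tags such as en-US, which combine codes from ISO 639 (languages) and ISO 3166 (countries). RFC 4646 has since been obsoleted by RFC 5646 (BCP 47).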

ISO 3166 Codes for the representation of names of countries and their subdivisions

ISO 639 Codes for the representation of names of languages
Provides two sets of language codes for the representation of names of languages: two-letter codes (ISO 639-1) and three-letter codes (ISO 639-2).
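Language tags of the kind defined in RFC 4646 combine these code lists: a language subtag drawn from ISO 639, optionally followed by a four-letter script subtag and a region subtag that is either a two-letter ISO 3166 code or a three-digit UN M.49 area code. The Python sketch below checks and normalizes only that common language-script-region subset. It deliberately ignores extended-language, variant, extension and private-use subtags, does not look codes up in the registries, and uses a hypothetical function name.

```python
import re
from typing import Optional

# Simplified subset of the RFC 4646 language-tag syntax:
#   language (2-3 letters) [-script (4 letters)] [-region (2 letters or 3 digits)]
# Extended-language, variant, extension and private-use subtags are NOT handled,
# and subtags are not checked against any registry.
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_language_tag(tag: str) -> Optional[dict]:
    """Split a simple language tag into its parts, or return None if it does not fit."""
    match = _SIMPLE_TAG.fullmatch(tag)
    if match is None:
        return None
    script, region = match.group("script"), match.group("region")
    # Conventional casing: language lowercase, script title case, region uppercase.
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(parse_language_tag("en-us"))       # {'language': 'en', 'script': None, 'region': 'US'}
print(parse_language_tag("zh-hant-tw"))  # {'language': 'zh', 'script': 'Hant', 'region': 'TW'}
print(parse_language_tag("English"))     # None
```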

W3C Date and Time Formats (W3C-DTF)
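The W3C-DTF note restricts ISO 8601 to six levels of granularity: year; year and month; complete date; and complete date plus hours and minutes, plus seconds, or plus a decimal fraction of a second. Whenever a time is given, a time zone designator (Z, or +hh:mm / -hh:mm) is required. A minimal Python check for those six patterns might look like this. It validates syntax only (it would accept 2005-02-30), and the function name is our own.

```python
import re

# The six W3C-DTF granularities:
#   YYYY | YYYY-MM | YYYY-MM-DD | YYYY-MM-DDThh:mmTZD
#   | YYYY-MM-DDThh:mm:ssTZD | YYYY-MM-DDThh:mm:ss.sTZD
# where TZD is "Z" or +hh:mm / -hh:mm. Calendar validity is not checked.
_W3CDTF = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"                       # month
    r"(?:-(?:0[1-9]|[12]\d|3[01])"                 # day
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"               # hours and minutes
    r"(?::[0-5]\d(?:\.\d+)?)?"                     # optional seconds and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d))?)?)?"   # time zone designator
)


def is_w3cdtf(value: str) -> bool:
    """True if value matches one of the six W3C-DTF patterns."""
    return _W3CDTF.fullmatch(value) is not None


print(is_w3cdtf("2005-07-16T19:20+01:00"))  # True
print(is_w3cdtf("16/07/2005"))              # False
```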

6.3 Thesauri and classification schemes

Note: Only a small number of thesauri and classification schemes, those frequently mentioned in metadata standards, are listed below. A more complete list is available online.

Subject Headings

Library of Congress Subject Headings (LCSH)

FAST (Faceted Application of Subject Terminology) Authority File
An adaptation of the Library of Congress Subject Headings (LCSH) with a simplified syntax. The headings have been built into FAST authority records, which are accessible through the OCLC FAST Test Databases Web site.

Medical Subject Headings (MeSH)
MeSH consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. At the time of writing there were 22,568 descriptors in MeSH. In addition to these headings, more than 139,000 headings called Supplementary Concept Records (formerly Supplementary Chemical Records) are kept within a separate thesaurus.
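MeSH records each descriptor's place in the hierarchy as one or more tree numbers, so a search can be broadened ("exploded") to a descriptor and everything beneath it by matching on tree-number prefixes. The Python sketch below shows that idea on a tiny invented hierarchy. The tree numbers and labels are made up for illustration and are not real MeSH data, and the function name is hypothetical.

```python
# Invented MeSH-style hierarchy: tree number -> descriptor label.
# These numbers and labels are illustrative only, not real MeSH data.
TREE = {
    "X01": "Disorders",
    "X01.100": "Infections",
    "X01.100.200": "Bacterial Infections",
    "X01.100.300": "Viral Infections",
    "X01.400": "Injuries",
}


def explode(tree_number: str) -> list:
    """Return the descriptor at tree_number plus every descriptor below it."""
    prefix = tree_number + "."  # the dot stops "X01.1" from matching "X01.100"
    return [
        TREE[number]
        for number in sorted(TREE)
        if number == tree_number or number.startswith(prefix)
    ]


print(explode("X01.100"))   # ['Infections', 'Bacterial Infections', 'Viral Infections']
print(len(explode("X01")))  # 5
```

Because a descriptor can carry several tree numbers, a fuller version would first collect all tree numbers of the chosen descriptor and explode each of them.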


Art and Architecture Thesaurus (AAT)
The AAT is a structured vocabulary of more than 133,000 terms, descriptions, bibliographic citations, and other information relating to fine art, architecture, decorative arts, archival materials, and material culture.
Linked Data SPARQL Endpoint: http://vocab.getty.edu/queries#Finding_Subjects
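The Getty vocabularies can also be queried programmatically with SPARQL. The Python sketch below builds a query for AAT concepts whose English preferred label contains a word, and it parses responses in the standard SPARQL 1.1 JSON results format. It assumes the endpoint address http://vocab.getty.edu/sparql.json, and it assumes that AAT concepts are linked to their scheme with skos:inScheme and labelled with skos:prefLabel. Confirm both assumptions on the Getty query page listed above, which also describes more efficient full-text search patterns. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and scheme URI; confirm against the Getty "queries" page before use.
ENDPOINT = "http://vocab.getty.edu/sparql.json"
AAT_SCHEME = "http://vocab.getty.edu/aat/"


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(word: str, limit: int = 10) -> str:
    """SELECT AAT concepts whose English preferred label contains `word`."""
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:inScheme <{AAT_SCHEME}> ;
           skos:prefLabel ?label .
  FILTER(langMatches(lang(?label), "en") &&
         CONTAINS(LCASE(STR(?label)), LCASE({sparql_string(word)})))
}}
LIMIT {int(limit)}"""


def build_request_url(word: str, limit: int = 10) -> str:
    """GET URL carrying the query as a percent-encoded `query` parameter."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": build_query(word, limit)})


def parse_bindings(json_text: str) -> list:
    """Extract (concept URI, label) pairs from SPARQL 1.1 JSON results."""
    data = json.loads(json_text)
    return [
        (row["concept"]["value"], row["label"]["value"])
        for row in data["results"]["bindings"]
    ]


def search_aat(word: str, limit: int = 10) -> list:
    """Run the query over HTTP (needs network access to the assumed endpoint)."""
    with urllib.request.urlopen(build_request_url(word, limit), timeout=30) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

If the endpoint behaves as assumed, search_aat("chair") would return (URI, label) pairs. A CONTAINS filter over every label is simple to read, but it can be slow on a vocabulary of this size.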

Library of Congress Thesauri

Thesaurus for the Global Legal Information Network (GLIN)
Now used for the Global Legal Information Network's multi-national database of legislation, this thesaurus has been under continuous development since 1950.

Legislative Indexing Vocabulary (LIV)
The thesaurus was developed by the Congressional Research Service for use with legislative and public policy material.

Thesaurus for Graphic Materials
The Thesaurus for Graphic Materials is a tool for indexing visual materials by subject and by genre/format. The thesaurus includes more than 7,000 subject terms and 650 genre/format terms to index types of photographs, prints, design drawings, ephemera, and other pictures.

Classification schemes  

Dewey Decimal Classification (DDC)
Website about DDC: http://www.oclc.org/dewey/default.htm
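DDC notation is hierarchical. The first digit gives the main class, the second the division, the third the section, and digits after the decimal point subdivide further. So 516 sits under division 510, which sits under main class 500, and 516.3 is narrower than 516. The Python sketch below derives the chain of broader numbers from the notation alone. It assumes the notation mirrors the hierarchy, which holds for most but not all DDC numbers. It does not attach captions, which would have to come from the schedules themselves, and the function name is hypothetical.

```python
import re


def ddc_broader(number: str) -> list:
    """Return broader DDC numbers for a notation such as '516.35', narrowest first.

    Assumes the notation mirrors the hierarchy (not true for every DDC number).
    """
    if not re.fullmatch(r"\d{3}(?:\.\d+)?", number):
        raise ValueError(f"not a DDC number: {number!r}")
    chain = []
    current = number
    while True:
        if "." in current:
            # Drop the last decimal digit (and the point when no digits remain).
            integer, decimals = current.split(".")
            current = f"{integer}.{decimals[:-1]}" if len(decimals) > 1 else integer
        elif current[2] != "0":
            current = current[:2] + "0"   # section -> division, e.g. 516 -> 510
        elif current[1] != "0":
            current = current[0] + "00"   # division -> main class, e.g. 510 -> 500
        else:
            break                         # already a main class such as 500
        chain.append(current)
    return chain


print(ddc_broader("516.35"))  # ['516.3', '516', '510', '500']
```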

Library of Congress Classification
Outline: http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html
Available as Linked Data: http://id.loc.gov/authorities/classification.html

Universal Decimal Classification (UDC)
Website about UDC: http://www.udcc.org/index.php/site/page?view=about
UDC Summary: http://www.udcc.org/udcsummary/php/index.php

The ACM Computing Classification System [2012 Version], Association for Computing Machinery

6.4 Name authority lists

VIAF (The Virtual International Authority File)
The VIAF combines multiple name authority files into a single, OCLC-hosted name authority service. As of July 2014, it drew on contributions from 34 agencies in 29 countries.
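A service like VIAF has to decide which records from different authority files describe the same entity. The Python sketch below shows only the simplest form of that idea: it groups headings that are identical after diacritics, punctuation and case are removed. VIAF's actual matching draws on much more evidence than the heading string, so this is a toy illustration, not its algorithm. The record identifiers and function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def match_key(heading: str) -> str:
    """Normalize a heading: strip diacritics, treat punctuation as spaces, lowercase."""
    decomposed = unicodedata.normalize("NFKD", heading)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    spaced = "".join(ch if ch.isalnum() else " " for ch in base)
    return " ".join(spaced.lower().split())


def cluster_records(records):
    """Group (source, record_id, heading) tuples whose headings share a match key."""
    clusters = defaultdict(list)
    for source, record_id, heading in records:
        clusters[match_key(heading)].append((source, record_id))
    return dict(clusters)


records = [  # hypothetical record IDs from three contributing files
    ("LC", "rec-1", "Dvořák, Antonín, 1841-1904"),
    ("DNB", "rec-2", "Dvorak, Antonin, 1841-1904"),
    ("BnF", "rec-3", "Smetana, Bedřich, 1824-1884"),
]
for key, members in cluster_records(records).items():
    print(key, members)
```

Exact-key matching like this both over-merges (different people with the same name and dates) and under-merges (headings recorded in different forms or scripts), which is why real clustering needs additional evidence.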

The Union List of Artist Names (ULAN)
The ULAN is a structured vocabulary containing more than 225,000 names and biographical and bibliographic information about artists and architects, including a wealth of variant names, pseudonyms, and language variants.
Linked Data SPARQL Endpoint: http://vocab.getty.edu/queries#ULAN-Specific_Queries

The Getty Thesaurus of Geographic Names (TGN)
The TGN is a structured, world-coverage vocabulary of 1.3 million names, including vernacular and historical names, coordinates, place types, and descriptive notes, focusing on places important for the study of art and architecture.
Linked Data SPARQL Endpoint: http://vocab.getty.edu/queries#TGN-Specific_Queries

LC Name Authority File (also known as the Anglo-American Authority File, AAAF)
Includes several million name authority records for personal, corporate, meeting, and geographic names.

Linked Data version: http://id.loc.gov/authorities/names.html

6.5 Best practice guidelines for data content

The best practice guides prepared by various communities and projects usually provide detailed guidelines regarding how to assign values when creating metadata records. The following are examples of standards for data content to be followed in particular communities.

Cataloguing Culture Objects (CCO), A Guide to Describing Cultural Works and Their Images

Provides guidelines for selecting, ordering, and formatting data used to populate elements in a catalogue record, in order to advance the increasing move toward shared cataloguing and to contribute to improved documentation of, and access to, cultural heritage information.
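One practical consequence of content guidelines like CCO is the separation between a value formatted for display and values used for retrieval. In CCO-style records, a display date such as "ca. 1450" is accompanied by earliest and latest years for indexing. The Python sketch below derives such a pair from a few common display patterns. The patterns handled, the ten-year margin for "ca." and the function name are our own assumptions, not rules taken from CCO; consult the guide for its recommendations.

```python
import re
from typing import Optional, Tuple


def index_dates(display_date: str, circa_margin: int = 10) -> Optional[Tuple[int, int]]:
    """Derive (earliest, latest) years from a free-text display date.

    Handles only a few illustrative patterns and returns None for anything
    else, so that such dates can be indexed by hand.
    """
    text = display_date.strip().lower()
    match = re.fullmatch(r"(?:ca\.|circa)\s*(\d{3,4})", text)
    if match:  # "ca. 1450": an assumed margin on either side
        year = int(match.group(1))
        return year - circa_margin, year + circa_margin
    match = re.fullmatch(r"(\d{3})0s", text)
    if match:  # "1890s": the whole decade
        start = int(match.group(1)) * 10
        return start, start + 9
    match = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", text)
    if match:  # "19th century", read as 1800-1899
        start = (int(match.group(1)) - 1) * 100
        return start, start + 99
    if re.fullmatch(r"\d{3,4}", text):  # a single year
        year = int(text)
        return year, year
    return None


print(index_dates("ca. 1450"))  # (1440, 1460)
```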

Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata

These guidelines deal primarily with bibliographic citations for a resource within its own metadata, but they also give some guidance on describing references to other resources.
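A citation can be recorded as a single free-text string, or it can be broken into labelled parts so that software can parse it. As we understand the DCMI guidelines, one option they discuss is to encode the parts of a journal citation as an OpenURL (ANSI/NISO Z39.88) key/encoded-value (KEV) string and to record that string in dcterms:bibliographicCitation. Treat that reading, and the exact keys to include, as assumptions to check against the guidelines. The Python sketch below builds such a string with KEV journal keys. The function name is hypothetical, and the journal data are invented.

```python
from urllib.parse import urlencode


def journal_citation_kev(jtitle, volume, issue, spage, epage, date):
    """Build an OpenURL KEV string for a journal article citation.

    Whether and how to use this in Dublin Core should be checked against the
    DCMI citation guidelines; empty parts are left out.
    """
    pairs = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.jtitle": jtitle,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
        "rft.epage": epage,
        "rft.date": date,
    }
    # urlencode percent-encodes each value (spaces become "+", ":" and "/" are escaped).
    return urlencode({key: value for key, value in pairs.items() if value})


print(journal_citation_kev("Example Journal", "12", "3", "45", "67", "2005"))
```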

Describing Archives: A Content Standard (DACS) Society of American Archivists (SAA) http://www.archivists.org/governance/standards/dacs.asp
An output-neutral set of rules for describing archives, personal papers, and manuscript collections that can be applied to all material types.
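Because DACS is output-neutral, a description can be kept as plain data and then checked or rendered in whatever form a system needs, such as EAD, MARC or a simple text display. The Python sketch below checks a description against the elements of the DACS single-level minimum description as we understand them, then renders it as text. Verify the element list against the current edition of DACS. Note also that the code treats Name of Creator(s) as always required, whereas DACS asks for it only when known. The function names and sample data are hypothetical.

```python
# Elements of the DACS "single-level minimum" description, as we understand them;
# check against the current DACS edition. Name of Creator(s) is required only
# when known, but this sketch treats it as always required.
SINGLE_LEVEL_MINIMUM = (
    "Reference Code",
    "Name and Location of Repository",
    "Title",
    "Date",
    "Extent",
    "Name of Creator(s)",
    "Scope and Content",
    "Conditions Governing Access",
    "Languages and Scripts of the Material",
)


def missing_elements(description: dict) -> list:
    """List minimum elements that are absent or blank in the description."""
    return [name for name in SINGLE_LEVEL_MINIMUM
            if not str(description.get(name, "")).strip()]


def render_text(description: dict) -> str:
    """One possible output: a plain 'Element: value' listing in DACS order."""
    return "\n".join(f"{name}: {description[name]}"
                     for name in SINGLE_LEVEL_MINIMUM if description.get(name))


draft = {"Title": "Jane Doe papers", "Date": "1901-1950", "Extent": "2 linear feet"}
print(missing_elements(draft))
print(render_text(draft))
```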

DLESE Best Practices
Lists the metadata field definitions, cataloging best practices, and vocabulary explanations for the metadata fields in the DLESE Cataloging System.

LODE-BD Recommendations 2.0
A report on how to select appropriate encoding strategies for producing Linked Open Data (LOD)-enabled bibliographic data, released as guidelines by AIMS (Agricultural Information Management Standards) of the Food and Agriculture Organization (FAO) of the United Nations.
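A recurring choice in producing LOD-enabled bibliographic data is whether to record a value as a plain string or as a URI from a published vocabulary. The Python sketch below shows one way to make that choice for subjects when writing N-Triples. When a lookup finds a URI, it emits that URI with dcterms:subject, which DCMI intends for non-literal values. Otherwise, it falls back to a literal with the Dublin Core 1.1 dc:subject element. This is our illustration of the general approach, not a procedure taken from LODE-BD. The lookup table, example URIs and function names are hypothetical.

```python
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"  # used here for literal values
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"    # intended for non-literal values


def ntriples_literal(text: str) -> str:
    """Quote text as an N-Triples string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def subject_triple(resource_uri: str, label: str, vocabulary: dict) -> str:
    """Prefer a vocabulary URI for the subject; fall back to a plain literal."""
    concept_uri = vocabulary.get(label)
    if concept_uri:
        return f"<{resource_uri}> <{DCTERMS_SUBJECT}> <{concept_uri}> ."
    return f"<{resource_uri}> <{DC_SUBJECT}> {ntriples_literal(label)} ."


vocabulary = {"Soil science": "http://example.org/vocab/soil-science"}  # hypothetical
print(subject_triple("http://example.org/book/1", "Soil science", vocabulary))
print(subject_triple("http://example.org/book/1", "Crop rotation", vocabulary))
```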

RDA: Resource Description and Access
RDA Toolkit: http://www.rdatoolkit.org/
A comprehensive set of guidelines and instructions on resource description and access covering all types of content and media.

Many metadata standards include best practice guides in their specifications; see Part 4 for a list of standards.

