2. Principles of Controlled Vocabularies

Four important principles of vocabulary control guide the design and development of controlled vocabularies. These are:

• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms where appropriate
• testing and validating terms

A major goal of vocabulary control is to ensure that each distinct concept is represented by a unique linguistic form. Ambiguity and synonymy should be controlled or regularized so that the information or content provided to a user is not spread across the system under multiple access points but is gathered together in one place. Eliminating ambiguity and compensating for synonymy through vocabulary control ensures that each term has only one meaning and that only one term may be used to represent a given concept or entity.

2.1 Ambiguity (5.3.1)

Ambiguity occurs in natural language when a word or phrase (a homograph or polyseme) has more than one meaning. Figure 2 provides an example and shows how a single word may be used to represent multiple, very different concepts.

Figure 2: Ambiguity caused by homographs and polysemes

A controlled vocabulary must compensate for the problems caused by ambiguity by ensuring that each term has one and only one meaning.
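As an illustration of the "one term, one meaning" rule, the following minimal sketch detects words used for more than one concept and turns each sense into a distinct term by adding a parenthetical qualifier, which is one common convention for homographs. The word/concept pairs, the function names `find_ambiguous` and `disambiguate`, and the choice of the concept label as the qualifier are all hypothetical; a real vocabulary would use short, carefully chosen qualifiers.

```python
from collections import defaultdict

# Hypothetical input: each (word, concept) pair records one sense in which a
# word is used in the content being described.
senses = [
    ("mercury", "chemical element"),
    ("mercury", "planet"),
    ("mercury", "Roman god"),
    ("pitch", "musical tone"),
    ("pitch", "playing field"),
    ("saturn", "planet"),
]


def find_ambiguous(pairs):
    """Return a dict of words that are used for more than one concept."""
    concepts_by_word = defaultdict(set)
    for word, concept in pairs:
        concepts_by_word[word].add(concept)
    return {w: c for w, c in concepts_by_word.items() if len(c) > 1}


def disambiguate(pairs):
    """Build a term -> concept map in which every term has exactly one meaning.

    Ambiguous words get a parenthetical qualifier, e.g. "mercury (planet)";
    unambiguous words are kept as they are.
    """
    ambiguous = find_ambiguous(pairs)
    vocabulary = {}
    for word, concept in pairs:
        term = f"{word} ({concept})" if word in ambiguous else word
        vocabulary[term] = concept
    return vocabulary


vocab = disambiguate(senses)
for term in sorted(vocab):
    print(term, "->", vocab[term])
```

Because the result is a dictionary keyed by term, each term can map to only one concept; the bare homograph "mercury" no longer appears as a term at all.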

2.2 Synonymy (5.3.2)

A different problem occurs when a concept can be represented by two or more synonymous or nearly synonymous words or phrases. This is called synonymy. As a result, desired content may be scattered around an information space or database because it can be described by different but equivalent terminology. Figure 3 illustrates this case:

Figure 3: Information scatter caused by synonyms

A controlled vocabulary must compensate for the problems caused by synonymy by ensuring that each concept is represented by a single preferred term. The vocabulary should list the other synonyms and variants as non-preferred terms with USE references to the preferred term.

Note: A synonym ring is an exception to the above rule. See section 5.4.2 for more information on this type of vocabulary.
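The USE references described above can be sketched as a simple lookup table. In this minimal sketch, which assumes hypothetical data and the hypothetical helper names `build_use_references` and `resolve`, each preferred term lists its non-preferred variants; inverting those lists yields USE references, and resolving every entry term through them gathers content described with different but equivalent terms under one preferred term.

```python
from collections import defaultdict

# Hypothetical vocabulary fragment: each preferred term with the non-preferred
# synonyms and variants that should point to it.
PREFERRED = {
    "automobiles": ["cars", "autos", "motor cars"],
    "physicians": ["doctors", "medical doctors"],
}


def build_use_references(preferred):
    """Invert the lists above into USE references (non-preferred -> preferred).

    Rejects a variant that is itself a preferred term, or one that would point
    to two different preferred terms, since either breaks the rule that each
    concept has a single preferred term.
    """
    use = {}
    for pref, variants in preferred.items():
        for variant in variants:
            if variant in preferred:
                raise ValueError(f"{variant!r} is already a preferred term")
            if use.get(variant, pref) != pref:
                raise ValueError(
                    f"{variant!r} would USE both {use[variant]!r} and {pref!r}"
                )
            use[variant] = pref
    return use


def resolve(term, use):
    """Follow a USE reference if there is one; otherwise return the term unchanged."""
    return use.get(term, term)


# Hypothetical documents, each described with a different but equivalent term.
documents = {"doc1": "cars", "doc2": "automobiles", "doc3": "motor cars", "doc4": "doctors"}

use = build_use_references(PREFERRED)
gathered = defaultdict(list)
for doc_id, term in documents.items():
    gathered[resolve(term, use)].append(doc_id)

print(dict(gathered))
# {'automobiles': ['doc1', 'doc2', 'doc3'], 'physicians': ['doc4']}
```

Without the USE references, the three automobile documents would be scattered across three separate access points.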

There are other types of “equivalent” terms besides synonyms which require vocabulary control. Section 4.2 includes a full discussion of equivalence control.

2.3 Semantic Relationships (5.3.3)

Various types of semantic relationships may be identified among the terms in a controlled vocabulary. These include equivalence relationships, hierarchical relationships and associative relationships, which may be defined as required for a particular application. Section 8 includes a full discussion of the various types of relationships that may be included in controlled vocabularies.

See also: 4. Semantic Relationships used in Controlled Vocabularies
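A practical consequence of these relationship types is that they are reciprocal: if one term is broader than another, the second is narrower than the first. The minimal sketch below stores the three types using the indicators commonly found in thesauri (USE/UF for equivalence, BT/NT for hierarchy, RT for association) and records the reciprocal link automatically. The `Thesaurus` class and its methods are hypothetical and only illustrate the bookkeeping; they are not taken from any particular tool.

```python
from collections import defaultdict

# Reciprocal pairs: USE/UF (equivalence), BT/NT (hierarchical), RT (associative).
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Minimal in-memory store of equivalence, hierarchical and associative links."""

    def __init__(self):
        # term -> relationship code -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term, code, other):
        """Record term --code--> other together with the reciprocal link."""
        if code not in RECIPROCAL:
            raise ValueError(f"unknown relationship code: {code}")
        if term == other:
            raise ValueError("a term cannot be related to itself")
        self.relations[term][code].add(other)
        self.relations[other][RECIPROCAL[code]].add(term)

    def related(self, term, code):
        """Return the terms linked to `term` by `code`, sorted; empty if none."""
        return sorted(self.relations.get(term, {}).get(code, set()))


t = Thesaurus()
t.relate("automobiles", "UF", "cars")       # equivalence
t.relate("vehicles", "NT", "automobiles")   # hierarchy
t.relate("automobiles", "RT", "driving")    # association

print(t.related("cars", "USE"))        # ['automobiles']
print(t.related("automobiles", "BT"))  # ['vehicles']
print(t.related("driving", "RT"))      # ['automobiles']
```

Recording both directions in one call keeps the vocabulary consistent, which is one of the things the testing and validation step is meant to check.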

2.4 Using Warrant to Select Terms (5.3.5)

The process of selecting terms for inclusion in a controlled vocabulary involves consulting various sources of words and phrases and applying criteria based on:

• the natural language used to describe content objects (literary warrant),
• the language of users (user warrant), and
• the needs and priorities of the organization (organizational warrant).

2.4.1 Literary Warrant

Assessing literary warrant involves consulting reference sources such as dictionaries or textbooks as well as existing vocabularies. The words or phrases chosen should match as closely as possible the prevailing descriptions of the concept in the literature.
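One rough, supplementary way to gauge which form prevails in the literature is to count how often each candidate form of a concept occurs in a sample of the texts to be described. The sketch below assumes a hypothetical list of candidate forms and a tiny hypothetical text sample; `literary_warrant` is an illustrative name, not part of any tool. Frequency is only a proxy: reference sources and existing vocabularies still need to be consulted, and the sample must be representative of the literature the vocabulary will cover.

```python
import re
from collections import Counter


def literary_warrant(candidates, texts):
    """Count occurrences of each candidate form across a sample of texts.

    Matching is case-insensitive and restricted to whole words or phrases.
    Forms that never occur are kept with a count of zero.
    """
    counts = Counter()
    for form in candidates:
        pattern = re.compile(r"\b" + re.escape(form) + r"\b", re.IGNORECASE)
        counts[form] = sum(len(pattern.findall(text)) for text in texts)
    return counts


# Hypothetical sample of the literature and candidate forms for one concept.
texts = [
    "Heart attack symptoms can include chest pain.",
    "The study followed patients after a heart attack.",
    "Myocardial infarction was confirmed by ECG.",
    "Guidelines for heart attack care vary by region.",
]
candidates = ["heart attack", "myocardial infarction", "cardiac infarction"]

counts = literary_warrant(candidates, texts)
print(counts.most_common())
```

In this sample "heart attack" occurs three times and "myocardial infarction" once, so the counts would favor the former; a different body of literature could easily reverse that.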

2.4.2 Organizational Warrant

Determining organizational warrant requires identifying the form or forms of terms that are preferred by the organization or organizations that will use the controlled vocabulary.

2.4.3 User Warrant

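Assessing user warrant involves consulting prospective users of the vocabulary, for example by:

• Creating lists of potential terms to enhance the completeness of the vocabulary.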
• Organizing candidate terms into broad categories to determine what categories users prefer and what they should be called.
• Placing candidate terms into a tentative set of broad categories to validate categories that have been created.
• Reviewing drafts of the vocabulary to add missing terms, delete terms that are incorrect or obsolete, create more useful term forms, and identify and correct missing and/or incorrect relationships among terms.
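The first two activities above are often run as card sorts: users either create and name their own groups (open sort) or place terms into predefined categories (closed sort). The text does not name this technique, so treat it as an assumption. A minimal sketch of analyzing open-sort results follows: for each pair of terms it computes the fraction of participants who grouped them together, and it tallies the group names participants chose. The sort data and the function names `co_occurrence` and `category_names` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical open card-sort results: each participant grouped the candidate
# terms and named each group.
sorts = [
    {"Money": ["invoices", "payroll", "budgets"], "People": ["hiring", "training"]},
    {"Finance": ["invoices", "budgets"], "Staff": ["payroll", "hiring", "training"]},
    {"Finance": ["invoices", "payroll", "budgets"], "Staff": ["hiring", "training"]},
]


def co_occurrence(sorts):
    """Fraction of participants who placed each pair of terms in the same group."""
    pair_counts = Counter()
    for groups in sorts:
        for terms in groups.values():
            # Sorting makes ("a", "b") and ("b", "a") count as the same pair.
            for a, b in combinations(sorted(terms), 2):
                pair_counts[(a, b)] += 1
    return {pair: n / len(sorts) for pair, n in pair_counts.items()}


def category_names(sorts):
    """How often each group name was chosen across participants."""
    return Counter(name for groups in sorts for name in groups)


co = co_occurrence(sorts)
names = category_names(sorts)
for pair, share in sorted(co.items(), key=lambda item: -item[1]):
    print(pair, round(share, 2))
print(names.most_common())
```

Pairs that nearly everyone groups together suggest a category users recognize, and frequently chosen group names suggest what that category should be called; terms with split placements, such as "payroll" here, are candidates for closer review.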


