Friday, January 20, 2017

Orphan Terms in a Taxonomy

A taxonomy has hierarchical relationships among all of its terms, so one quality control check on a taxonomy is to ensure that there are no “orphan” terms, that is, terms that lack hierarchical relationships. One purpose of a taxonomy is to let users navigate it to find terms of interest, whether the full hierarchy is displayed or only the links between selected terms. An orphan term, therefore, cannot be found by browsing, only by searching.

Taxonomy/thesaurus management software can generate orphan term reports. However, just as there are different kinds or definitions of taxonomies and thesauri, there are also different kinds or definitions of orphan terms. Some kinds of orphans are acceptable, other kinds may be permitted only in certain kinds of controlled vocabularies, and some kinds are never permitted in any taxonomy or thesaurus.

Differences between taxonomies and thesauri

There are two main differences between strictly defined taxonomies and thesauri that have an impact on orphan terms.
  1. A taxonomy has only hierarchical (broader-narrower) relationships between its terms, whereas a thesaurus has both hierarchical and associative (related-term) relationships between terms.
  2. In a taxonomy, all terms belong to a single hierarchy or a limited number of hierarchies, each with a designated, broad-meaning “top term,” whereas in a thesaurus hierarchical relationships are created between terms merely as appropriate, without regard to any larger hierarchies or top terms. A taxonomy thus has a top-down, inverted tree structure, whereas a thesaurus does not necessarily have an overarching hierarchical structure.

Different kinds of orphan terms

The loosest and easiest-to-remember definition of an orphan term is a term that lacks a “parent.” In other words, the term has no broader term, although it may have other kinds of relationships to other terms. A “top term” report in taxonomy/thesaurus management software produces the same list, since all top terms are, by this definition, orphans.

An orphan term could also be defined as a term that has no hierarchical relationships, neither broader nor narrower. In a thesaurus, such terms could still have associative relationships. In a taxonomy, which lacks associative relationships, these terms would then have no relationships to other terms at all.

By the strictest definition, an orphan term is a term that lacks any relationship to any other term. This definition applies equally to a taxonomy and a thesaurus.

Finally, some taxonomy/thesaurus management software lets you define your own orphans: you designate a relationship type and then generate a list of terms that lack that relationship type to any other term.
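To make the differences between these definitions concrete, here is a minimal sketch of the four kinds of orphan reports as simple filters. It assumes a hypothetical in-memory model (the `Term` class and the function names are illustrative, not the data format or API of any particular taxonomy management product), with relationships stored by label as BT, NT, and RT sets.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical vocabulary term with its BT, NT, and RT links (by label)."""
    label: str
    broader: set[str] = field(default_factory=set)   # BT
    narrower: set[str] = field(default_factory=set)  # NT
    related: set[str] = field(default_factory=set)   # RT


def parentless(terms):
    """Loosest definition: no broader term. Top terms are included."""
    return sorted(t.label for t in terms.values() if not t.broader)


def hierarchical_orphans(terms):
    """No broader and no narrower terms; related terms may exist."""
    return sorted(t.label for t in terms.values()
                  if not t.broader and not t.narrower)


def total_orphans(terms):
    """Strictest definition: no relationship of any kind."""
    return sorted(t.label for t in terms.values()
                  if not (t.broader or t.narrower or t.related))


def lacking(terms, rel):
    """User-defined orphans: no link of type rel ("broader", "narrower", or "related")."""
    return sorted(t.label for t in terms.values() if not getattr(t, rel))


# Hypothetical sample data; BT and NT links are kept reciprocal.
vocab = {
    "Education": Term("Education", narrower={"College admissions"}),
    "College admissions": Term("College admissions", broader={"Education"}),
    "Behavior": Term("Behavior", related={"Behavior problems"}),
    "Behavior problems": Term("Behavior problems", related={"Behavior"}),
    "Atmospheric composition": Term("Atmospheric composition"),
}

print(parentless(vocab))            # every term except "College admissions"
print(hierarchical_orphans(vocab))  # ['Atmospheric composition', 'Behavior', 'Behavior problems']
print(total_orphans(vocab))         # ['Atmospheric composition']
print(lacking(vocab, "related"))    # ['Atmospheric composition', 'College admissions', 'Education']
```

Each stricter report returns a subset of the looser one, which is why a "top term" report and a "no relationships at all" report can differ so much on the same vocabulary.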

Which kind of orphans to avoid

Orphans defined merely as terms lacking broader terms are not necessarily a problem, since every taxonomy or thesaurus has top terms. For quality control, you would want to ensure that these parent-less “orphans” are indeed the top terms that you want. For a taxonomy, there are strict criteria for top terms: they must be broad-meaning categories, each with an extensive hierarchical tree beneath it, perhaps even of similar depth and breadth for each top term. For thesauri, the requirements for top terms are usually not strict, but it is still a good idea to review the top terms to ensure that there really is no appropriate broader term to move them under.
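The top-term review described above can be partly automated. This sketch assumes a hypothetical mapping from each term to its set of broader terms, plus a list of intended top terms kept by the taxonomist; neither comes from any specific product. It flags parent-less terms that are not intended top terms, intended top terms that are not actually parent-less, and the size and depth of the tree under each real top term, so that uneven hierarchies stand out.

```python
# Hypothetical data: each term mapped to its set of broader terms (BT).
broader = {
    "Science": set(),
    "Physics": {"Science"},
    "Optics": {"Physics"},
    "Arts": set(),
    "Painting": {"Arts"},
    "Atmospheric composition": set(),  # parent-less, but not meant to be a top term
}
intended_top_terms = {"Science", "Arts"}  # hypothetical list kept by the taxonomist


def narrower_map(broader):
    """Invert BT links into NT links."""
    nt = {term: set() for term in broader}
    for term, bts in broader.items():
        for bt in bts:
            nt.setdefault(bt, set()).add(term)
    return nt


def tree_stats(top, nt):
    """Return (number of descendants, depth) of the hierarchy under a top term."""
    count, depth = 0, 0
    stack = [(top, 0)]
    seen = {top}  # avoids double-counting terms reachable by more than one path
    while stack:
        term, level = stack.pop()
        depth = max(depth, level)
        for child in nt.get(term, ()):
            if child not in seen:
                seen.add(child)
                count += 1
                stack.append((child, level + 1))
    return count, depth


def review_top_terms(broader, intended):
    parentless = {t for t, bts in broader.items() if not bts}
    nt = narrower_map(broader)
    unexpected = sorted(parentless - intended)  # parent-less but not planned as top terms
    misplaced = sorted(intended - parentless)   # planned top terms that have a BT
    stats = {t: tree_stats(t, nt) for t in sorted(parentless & intended)}
    return unexpected, misplaced, stats


unexpected, misplaced, stats = review_top_terms(broader, intended_top_terms)
print(unexpected)  # ['Atmospheric composition']
print(misplaced)   # []
print(stats)       # {'Arts': (1, 1), 'Science': (2, 2)}
```

The judgment of whether a flagged term truly needs a broader term still rests with the taxonomist; the report only narrows down what to look at.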

An orphan report that lists terms lacking any hierarchical relationship (narrower or broader), though they may have associative (related-term) relationships, is quite helpful when editing thesauri. Whether policy should permit such “hierarchical orphans” is up to the thesaurus owner. Generally, such orphans should be avoided, and perhaps permitted only in exceptional circumstances.

Orphans defined as terms that lack any relationships to other terms in the taxonomy should not be permitted under any circumstances. They don’t serve the navigation function of a taxonomy, as there is no way to find them except by searching. If a suitable broader term within the taxonomy cannot be found, then they may be out of scope of the taxonomy/thesaurus. Usually, though, such orphan terms are the result of taxonomist error. If the taxonomy management software permits duplicate terms, these orphans could be duplicates of existing synonyms (nonpreferred terms/alternative labels).
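One way to catch orphans that duplicate existing synonyms is to compare each fully unrelated preferred term against the nonpreferred terms (alternative labels) of all other terms. The sketch below assumes hypothetical data: a mapping from preferred terms to their alternative labels, and a set of the terms that have at least one BT, NT, or RT. It matches labels case-insensitively, which is an assumption; real duplicates may also differ in spelling or qualifiers.

```python
# Hypothetical data: preferred term -> its alternative labels (nonpreferred terms).
alt_labels = {
    "Automobiles": {"Cars", "Motor cars"},
    "Vehicles": set(),
    "Cars": set(),       # orphan preferred term that duplicates a nonpreferred term
    "Zeppelins": set(),  # orphan with no duplicate: needs a broader term or is out of scope
}
# Hypothetical: terms that have at least one BT, NT, or RT relationship.
has_relationships = {"Automobiles", "Vehicles"}


def orphan_duplicates(alt_labels, related):
    """Map each fully unrelated term to the preferred terms that list it as an alt label."""
    index = {}
    for pref, alts in alt_labels.items():
        for alt in alts:
            index.setdefault(alt.casefold(), []).append(pref)
    orphans = sorted(set(alt_labels) - related)
    return {o: sorted(p for p in index.get(o.casefold(), []) if p != o)
            for o in orphans}


print(orphan_duplicates(alt_labels, has_relationships))
# {'Cars': ['Automobiles'], 'Zeppelins': []}
```

An orphan with a match, like "Cars" here, is a candidate for deletion in favor of the existing nonpreferred term; an orphan with no match still needs a broader term or a scope decision.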

Resolving orphan terms

In the case of orphan terms that lack broader terms but are not obviously top terms, the taxonomist should search the taxonomy/thesaurus for a suitable broader term. If one cannot be found, the taxonomist should carefully consider whether to add a new term that would both serve as a broader term for the orphan and have a suitable broader term of its own already in the taxonomy/thesaurus. In a thesaurus, as opposed to a taxonomy, it may be OK to leave the term without a broader term, but then its related-term relationships should be checked and possibly enhanced so that the term has multiple related-term relationships.
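For a thesaurus where some terms are left without a broader term, a simple report can list those terms whose related-term links are too sparse. This sketch assumes hypothetical BT and RT mappings and a designated top-term set, and reads "multiple" related terms as a threshold of two (the `min_rt` parameter is an illustrative assumption, not a standard rule).

```python
# Hypothetical data: term -> broader terms (BT) and term -> related terms (RT).
broader = {
    "Military science": set(),          # designated top term
    "Wars": {"Military science"},
    "Conflict termination": set(),      # left without a BT on purpose
    "Peace treaties": set(),            # left without a BT on purpose
}
related = {
    "Conflict termination": {"Wars", "Peace treaties"},
    "Peace treaties": {"Conflict termination"},
    "Wars": {"Conflict termination"},
}
top_terms = {"Military science"}


def weak_orphans(broader, related, top_terms, min_rt=2):
    """Parent-less non-top terms with fewer than min_rt related terms, with their RT counts."""
    return sorted(
        (term, len(related.get(term, set())))
        for term, bts in broader.items()
        if not bts and term not in top_terms
        and len(related.get(term, set())) < min_rt
    )


print(weak_orphans(broader, related, top_terms))  # [('Peace treaties', 1)]
```

Terms on this list are the ones whose related-term relationships most need to be checked and enhanced.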

Sometimes it may be desirable to stretch the thesaurus rules for hierarchical relationships in order to provide a broader term for an orphan. This is generally acceptable in a taxonomy but not in a thesaurus. The following are examples of former orphan terms whose candidate broader terms are not 100% correct broader terms (the narrower term is not a kind of or a part of its broader term), but they are close, so these relationships could be made, even in a thesaurus. In parentheses are the theoretical broader terms, which are not practical terms to create.
  • College applications BT College admissions (rather than BT Applications)
  • Behavior problems BT Behavior (rather than BT Problems)
  • Atmospheric composition BT Atmosphere (rather than BT Composition)
  • Conflict termination (Military science) BT Wars (rather than BT Termination)

Orphans that lack any relationships are usually the result of taxonomist error. Perhaps the taxonomist was interrupted before finishing the process of relating a term and then forgot about it. In many cases, these orphans should have been created as synonyms/nonpreferred terms/alternative labels. The taxonomist should run orphan reports frequently enough to still remember whether an orphan term was intended to be a preferred or a nonpreferred term.

More examples of how to resolve orphan terms are in the PDF of my PowerPoint presentation, “Managing Mature Taxonomies: Resolving Orphan Terms,” which I gave as an SLA Taxonomy Division webinar in December 2016.