

Creating a thesaurus de novo is always a daunting task, especially in a field as multidisciplinary as health care informatics. This thesaurus has its roots in a historical collection at the University of Missouri-Columbia, where the literature of the field has been collected since the 1970s. Informatics, however, is not only multidisciplinary but also rapidly evolving, so the historical perspective of the early collection had to be balanced against the need to be current. Where the name of a concept has changed, we have generally chosen the current term, and we have concentrated, by and large, on recent literature as the source of the data from which the thesaurus was built.
Relation to MeSH (Medical Subject Headings)
There are several indexing languages for the fields of biomedicine, but in the United States the most widely known is MeSH, the Medical Subject Headings, maintained by the United States National Library of Medicine. Because we created this thesaurus for a health-related field, we found that some of our terms matched those in MeSH. To validate our belief that our thesaurus was really a different entity, we checked all terms against the 1998 MeSH volumes and noted the tree numbers for those occurring in MeSH. (The tree numbers represent a classification scheme which is part of MeSH.) We learned that, while some of the computing terms in this thesaurus were the same as those in the L tree of MeSH (the section which contains the terms for health care informatics), we had terms for many more concepts. Further, we learned that we had often placed a concept in different relationships from those it has in MeSH. Thus, our data indicated that we did have a new thesaurus which depicted new relationships between the concepts in health informatics.
From the beginning of the project, we had determined that we did not want to recreate those parts of MeSH which would be treated in the same way in both systems. Thus, we chose not to include terms for any concepts which occur in the following MeSH trees:
A (Anatomy)
B (Organisms)
C (Diseases)
D (Chemicals and Drugs)
For these concepts, we refer users to MeSH or one of the other existing schemes. So, for example, although several articles in the literature discuss computerized education programs for diabetes patients, this thesaurus does not include the term for the disease.
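For readers who keep a candidate term list in electronic form, the exclusion rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used to compile the thesaurus: the names `EXCLUDED_TREES`, `CANDIDATES` and `is_excluded` are hypothetical, and the tree numbers are placeholders rather than real MeSH values.

```python
# MeSH trees whose concepts are left to MeSH itself (see the list above).
EXCLUDED_TREES = ("A", "B", "C", "D")

# Hypothetical candidates. The tree numbers are placeholders, not real MeSH
# values; an empty list stands for a term assumed to have no MeSH match.
CANDIDATES = {
    "Diabetes Mellitus": ["C00.000"],
    "Information Systems": ["L00.000"],
    "Backward Chaining": [],
}


def is_excluded(tree_numbers):
    """Return True if any tree number falls in an excluded MeSH tree.

    Assumption: one excluded tree number is enough to drop the term.
    """
    return any(number.startswith(EXCLUDED_TREES) for number in tree_numbers)


kept = sorted(term for term, numbers in CANDIDATES.items() if not is_excluded(numbers))
print(kept)
```

In this sketch the disease term is dropped, while the L-tree term and the term without a MeSH match are kept.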
History of the Thesaurus
Early formal work in the terminology of medical informatics was done at the United States National Library of Medicine. Medical informatics appeared in the L tree of MeSH, developed in 1960. The revisions to that tree in 1963, 1965, 1966, 1975, and 1987 added new terms, eliminated terms, and changed the hierarchical relationships of terms. Rada and others helped in this revision in 1986 when they developed a medical informatics thesaurus. Their thesaurus was developed by automatically merging the thesaurus used by the Association for Computing Machinery with the information science component of MeSH. The terminology was then pruned by eliminating terms not related to those in the MEDINFO keyword list or not found in the medical informatics literature. The terminology from this thesaurus was incorporated into the 1987 version of the MeSH L tree. Rada, working with the European Committee for Standardization (CEN) under the International Medical Informatics Association, produced a 200-word thesaurus. This terminology left wide areas of informatics unrepresented because it focused on the creation of a framework for standards development.
A complete history of the early work on this thesaurus has been published in the following articles:
Ogg NJ, Sievert MC, Li R, and Mitchell JA, “The Missouri Medical Informatics Thesaurus,” MEDINFO '95: Proceedings of the Eighth World Congress on Medical Informatics, 1995, 15-156.
Ogg NJ, Sievert MC, Li R, and Mitchell JA, “Construction of a Medical Informatics Thesaurus,” Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care, 1994, 900-904.
Work on this thesaurus began in September 1992. It developed from efforts to do original abstracting and indexing of the literature of medical informatics for the D. A. B. Lindbergh Information Center for Health Management and Informatics at the University of Missouri-Columbia. Our efforts convinced us that the field needed its own informatics thesaurus, one that covered all the disciplines in the field and was large enough to offer the specificity this constantly evolving field requires. Our purpose in developing the thesaurus was to address both user and literary warrant in selecting terms for inclusion. User warrant means that the terms chosen for inclusion must be those which users in the field would employ, while literary warrant means that the terms must be found in key documents in the literature of the discipline. Using existing thesauri, the medical informatics literature and the terminology of experts in the field, we identified appropriate concepts and terms to include in this thesaurus. We arranged the terms into categories and created the hierarchical structure for them. The categories for the final version are:
Administration
Computers and Communications
Education
Engineering
Health Care
Language, Libraries and Information Science
Mathematics
In addition, we have included a list of terms under the heading Research and Evaluation Terms. These terms are designed to be used with terms from any and all of the other sections. That is, they are general terms which could appear in many parts of the language but do not belong to any one section more clearly than to another.
Structure of the book
The book is divided into three parts. The usual pattern for thesauri is to list the terms first in alphabetical, then hierarchical, then rotated order. We have chosen an order that seems more logical to us. Just as it is suggested that the Permuted MeSH be the beginning point in using MeSH, we have decided to begin with the rotated display.
The first section of the book, then, is the rotated list of all words occurring in any term naming any of the concepts in the language. Thus, when users are unsure how a multi-word term will appear in the thesaurus, they can begin with any of the single words from that term in the rotated list. In other words, this section is similar to the Permuted MeSH volume. For example, the term Information Systems appears both in the I section of the alphabet as
Information Storage and Retrieval
Information Systems
Clinical Information Systems
and in the S section of the listing as
Imaging Systems
Information Systems
Information Retrieval Systems
Thus, a user can find the term through either of the words of which it is composed. A second reason to begin with the rotated display is that, for terms which have acronyms, we have spelled out the complete name in the alphabetical list, with the acronym following in parentheses. A user going directly to the alphabetical listing, therefore, needs to know that FTP will appear as File Transfer Protocol (FTP). In the rotated list, however, the acronym appears in its own alphabetical position together with the complete spelling. The user who begins in the rotated list, then, learns the correct term to look for in the alphabetical list.
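The idea behind the rotated list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used to produce the printed display: the names `build_rotated_index` and `STOP_WORDS` are hypothetical, the stop-word list is assumed, and terms are simply sorted alphabetically under each word, which may differ from the ordering in the book.

```python
import re
from collections import defaultdict

# Sample terms for illustration only.
TERMS = [
    "Clinical Information Systems",
    "File Transfer Protocol (FTP)",
    "Imaging Systems",
    "Information Retrieval Systems",
    "Information Storage and Retrieval",
    "Information Systems",
]

# Assumption: short connecting words are not offered as entry points.
STOP_WORDS = {"and", "of", "the", "for", "in"}


def build_rotated_index(terms):
    """Map each significant word (or acronym) to the terms that contain it."""
    index = defaultdict(list)
    for term in terms:
        # Splitting on non-alphanumerics also frees "FTP" from its parentheses.
        for word in re.findall(r"[A-Za-z0-9]+", term):
            if word.lower() in STOP_WORDS:
                continue
            if term not in index[word]:
                index[word].append(term)
    # Sort the entry words, and the terms listed under each word.
    return {word: sorted(found) for word, found in sorted(index.items())}


rotated = build_rotated_index(TERMS)
for word in ("Information", "Systems", "FTP"):
    print(word)
    for term in rotated[word]:
        print("    " + term)
```

Looking up FTP in this index leads to the fully spelled-out term, which is the behavior described above for acronyms.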
The second part is an alphabetic listing of each term, showing its broader and narrower terms from the levels of the hierarchy directly above or below the term. To assist the user, all terms in the controlled language appear in boldface font. At times, we have also suggested other areas to look for terms. The alphabetical listing also contains our scope notes for terms which were distinctly defined during the creation of the language, e.g., Online Systems, SN: Historical use only. Typical entries look like the following:
Ambulatory Care Facilities
  USE FOR: Outpatient Facilities
  BT: Health Care Facilities
  RT: Ambulatory Care Environments
Backtracking Algorithm
  BT: Algorithms
  NT: Backward Chaining
Feedback
  SN: Use only in educational contexts; consider also Retrieval Feedback
  BT: Instructional Design
SN stands for scope note, BT for broader term, NT for narrower term and RT for related term. USE FOR introduces a non-preferred synonym in place of which the entry term is used.
Finally, as noted above, terms with acronyms are completely spelled out with the acronym listed at the end of the term in parentheses.
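An entry of this kind can be modeled as a small record. The Python sketch below is one possible representation, offered as an assumption rather than a description of how the thesaurus is actually stored: the class name `Entry`, its field names and the `render` method are hypothetical, and the field order simply follows the sample entries above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus term with its scope note and relationships."""
    term: str
    scope_note: str = ""
    use_for: list = field(default_factory=list)   # non-preferred synonyms
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT

    def render(self):
        """Format the entry using the labels shown in the alphabetical list."""
        lines = [self.term]
        if self.scope_note:
            lines.append("SN: " + self.scope_note)
        if self.use_for:
            lines.append("USE FOR: " + "; ".join(self.use_for))
        for label, values in (("BT", self.broader),
                              ("NT", self.narrower),
                              ("RT", self.related)):
            for value in values:
                lines.append(f"{label}: {value}")
        return "\n".join(lines)


entry = Entry(
    term="Ambulatory Care Facilities",
    use_for=["Outpatient Facilities"],
    broader=["Health Care Facilities"],
    related=["Ambulatory Care Environments"],
)
print(entry.render())
```

Keeping each relationship as a list allows a term to carry more than one broader, narrower or related term.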
The final section of the book, the hierarchical display, shows all terms in all occurrences in any of the seven broad classes into which we divided the language. Some concepts appear in more than one class or in more than one position within a class. We retained such multiple listings because we felt they delineated the concept in its multiple relationships. This multi-axial approach reflects the complexity of the language, because terms can often have meaning in more than one context. Terms can also have different meanings in different contexts, for example Audits when it appears under Accounting versus Data Security. The last listing in this section is termed “Research and Evaluation.” These terms are arranged alphabetically because they are designed to be used in combination with terms in the other seven classes. While there are obvious relations between some of these terms, the underlying concept of this group is their general application and not their relationship to each other.
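The multi-axial arrangement can be pictured as a hierarchy in which a term may sit beneath more than one broader term. The Python sketch below is an illustration under stated assumptions: the names `NARROWER`, `hierarchical_display` and `broader_terms` are hypothetical, and the placement of Accounting and Data Security under the classes shown is assumed for the example rather than copied from the thesaurus.

```python
# Hypothetical fragment: the placement of Accounting and Data Security under
# these classes is assumed for illustration, not taken from the thesaurus.
NARROWER = {
    "Administration": ["Accounting"],
    "Accounting": ["Audits"],
    "Computers and Communications": ["Data Security"],
    "Data Security": ["Audits"],
}


def hierarchical_display(term, narrower, depth=0):
    """Return indented lines for a term and everything beneath it."""
    lines = ["  " * depth + term]
    for child in sorted(narrower.get(term, [])):
        lines.extend(hierarchical_display(child, narrower, depth + 1))
    return lines


def broader_terms(term, narrower):
    """List every term that has `term` directly beneath it."""
    return sorted(parent for parent, children in narrower.items() if term in children)


for top in ("Administration", "Computers and Communications"):
    print("\n".join(hierarchical_display(top, NARROWER)))

print(broader_terms("Audits", NARROWER))
```

In this fragment Audits is printed once under each class, and asking for its broader terms returns both Accounting and Data Security.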
Feedback from users
The publication of this book is only part of the project. We also plan to make an electronic version available through the World Wide Web. In the fall of 1998 we will be setting up a Web site where the thesaurus will be available, together with a place to send comments. We expect that the electronic version will appear as a link from the home page of the Department of Health Management and Informatics, School of Medicine, University of Missouri-Columbia. The current URL is: http://dabl.hmi.missouri.edu.
The team at the University of Missouri-Columbia which created this thesaurus is committed to maintaining it. Since this was a major undertaking, we recognized that we would miss some concepts and that others would not agree with our placement of some concepts in relation to other concepts. We are interested in receiving feedback from users to help us expand and correct it. For those without access to the Web, either of two of the authors may be reached via e-mail: sievertm@missouri.edu or moxleyd@missouri.edu. Traditional mail may also be used, and comments may be sent to the first or second author at:
Department of Health Management and Informatics, 324 Clark Hall, MU Columbia, Missouri 65211, USA